The Computer Weekly Developer Network decided to cover the emerging area of DataOps… and we were overloaded.
After an initial couple of stories to define the term here and another exploring DataOps further here, we embarked upon a full feature for Computer Weekly itself.
After all that, the industry still wanted to comment more, so this post includes some of the additional commentary that we think is valuable and worth voicing.
What is DataOps?
As a reminder, DataOps is a close relative of data science, where we use machine learning to build more cohesive software applications. DataOps concentrates on the creation and curation of a central data hub, repository and management zone designed to collect, collate and then distribute application data onward.
The concept here hinges on the proposition that an almost metadata-level form of application data analytics can be more widely propagated and democratised across an entire organisation’s IT stack. Then, subsequently, more sophisticated layers of analytics can be brought to bear, such as built-for-purpose analytics engines designed to help track application performance in an altogether more granular fashion.
“The fact is, DevOps enables developers and operation teams to be efficient in managing the software lifecycle by using automation – this is valuable for DataOps. This is done using models (expressed as configuration or code), that are maintained in source code systems,” said Jitendra Thethi, AVP of Technology at Altran.
Thethi says that patterns of such implementations can be seen as Pipeline as Code, Infrastructure as Code, Deployment Playbooks or Automated Test Suites — and that this is why and how data scientists and data managers can use DevOps practices.
“They need to do this by moving to model-driven approaches for data governance, data ingestion and data analysis so that it can be managed by version control systems and enforced by an automated database system. Placing them into containers provides an environment where it is easy to test and deploy these models. For example, it can be launched over container cluster infrastructure when in production,” said Thethi.
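To make the model-driven idea a little more concrete, here is a minimal Python sketch of a data ingestion rule set expressed as code, so that it can sit in version control and be checked by an automated test suite. It is purely illustrative and is not taken from Altran or Thethi: the names ColumnRule, ORDERS_MODEL and validate are hypothetical, as are the example fields.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRule:
    """One governance rule for a single field (hypothetical structure)."""
    name: str
    dtype: type
    required: bool = True


# Hypothetical ingestion model: a change to these rules is an ordinary,
# reviewable code change that lives in source control.
ORDERS_MODEL = (
    ColumnRule("order_id", int),
    ColumnRule("customer", str),
    ColumnRule("discount", float, required=False),
)


def validate(record: dict, model: tuple = ORDERS_MODEL) -> list[str]:
    """Return a list of rule violations for one incoming record."""
    errors = []
    for rule in model:
        value = record.get(rule.name)
        if value is None:
            # Absent optional fields are fine; absent required ones are not.
            if rule.required:
                errors.append(f"missing required field: {rule.name}")
            continue
        if not isinstance(value, rule.dtype):
            errors.append(
                f"{rule.name}: expected {rule.dtype.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors
```

Because the rules are plain data, the same file could be packaged into a container image and exercised by a test pipeline before deployment, which is the general pattern the quote describes.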
Will Cappelli, CTO EMEA and global VP of product strategy at Moogsoft, argues that the crux of this topic is less a question of DevOps processes generating varying data sets at sufficient velocity to speed the model learning process than it is a question of DevOps teams and data scientists learning how to work together more effectively.
“DevOps professionals are all too often impatient. They don’t want to wait for the results of a rigorous analysis whether that analysis is carried out by humans or by algorithms. Of course, data scientists can be overly fastidious – particularly those coming from maths as opposed to computer science. The truth is, though, that DevOps needs the results of data science delivered rapidly but effectively so both communities need to overcome some of their bad habits. Perhaps it is time for an agile take on data science itself,” said Cappelli.
Nigel Kersten is VP of ecosystem engineering at Puppet. Kersten says that while DataOps is more than just DevOps applied to data, they do share the same methodology around agile processes, automated pipelines, automated testing and lifecycle optimisation.
“What I’m most heartened by however is seeing the DataOps movement focus on the people in addition to processes and tools, as this is more critical than ever in a world of automated data collection and analysis at a massive scale. If we don’t focus on people, as well as ensuring that we include a diverse range of perspectives and examine our own biases before we go all out encoding them in the form of algorithms and then weaponising them using automation, we’re going to end up amplifying and ossifying some of the very worst aspects of human society,” said Kersten.
Coming next, we hear from Morpheus and Talend… and then finally from NetApp and PagerDuty.