The Computer Weekly Developer Network decided to cover the emerging area of DataOps… and we were overloaded.
After an initial couple of stories to define the term here and another exploring DataOps further here, we embarked upon a full feature for Computer Weekly itself.
After all that, the industry still wanted to comment more, so this post includes some of the additional commentary that we think is valuable and worth voicing.
What is DataOps?
As a reminder, DataOps is a close relative of data science where we use machine learning to build more cohesive software applications. DataOps concentrates on the creation and curation of a central data hub, repository and management zone designed to collect, collate and then distribute application data onward.
The concept hinges on the proposition that application data analytics, at something close to a metadata level, can be more widely propagated and democratised across an entire organisation’s IT stack. More sophisticated layers of analytics can then be brought to bear, such as built-for-purpose analytics engines designed to help track application performance in an altogether more granular fashion.
Brad Parks is VP of business development at Morpheus Data, a unified Ops orchestration tool company.
Parks says that the requirements of a developer requesting a new web application environment are not that different from those of a data scientist requesting a new database.
This is so, he thinks, because the flow of work is similar and, in both cases, the elimination of workflow bottlenecks is a key consideration.
“In a DataOps context, enabling the rapid creation and destruction of environments for the collection, modeling and curation of data requires automation and must acknowledge that just like developers, data scientists are not infrastructure admins. The right automation and orchestration platform can enable DataOps self-service, whereby data scientists can request a data set, stand up the environment to utilise that data set… and then tear-down that environment without ever having to talk to IT Ops,” said Parks.
At the same time, he suggests that the Ops side of DataOps should assure that proper data governance and data protection policies are in place to manage security and risk.
This is the core use case that EUMETSAT had for its meteorological data when it selected Morpheus as its next-gen cloud automation platform.
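The self-service lifecycle Parks describes (request a data set, stand up an environment, tear it down, with Ops policy applied throughout) can be sketched in a few lines of Python. This is only an illustrative sketch: the `Orchestrator` and `Environment` classes, the `allowed` policy table and the `weather_2019` data set name are all hypothetical and do not represent the API of Morpheus or any other product.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
import uuid


@dataclass
class Environment:
    """Hypothetical record of one short-lived data environment."""
    dataset: str
    owner: str
    env_id: str = field(default_factory=lambda: uuid.uuid4().hex[:8])
    active: bool = True


class Orchestrator:
    """Toy stand-in for an automation platform; not a real product API."""

    def __init__(self, allowed):
        # Governance policy set by Ops: which owner may use which data sets.
        self.allowed = allowed
        self.audit_log = []

    @contextmanager
    def environment(self, owner, dataset):
        # Enforce the policy before anything is created.
        if dataset not in self.allowed.get(owner, set()):
            self.audit_log.append(("denied", owner, dataset))
            raise PermissionError(f"{owner} may not access {dataset}")
        env = Environment(dataset=dataset, owner=owner)
        self.audit_log.append(("created", owner, dataset))
        try:
            yield env
        finally:
            # Tear-down always runs, even if the analysis code fails.
            env.active = False
            self.audit_log.append(("destroyed", owner, dataset))


# Usage: the data scientist never talks to IT Ops directly.
ops = Orchestrator(allowed={"alice": {"weather_2019"}})
with ops.environment("alice", "weather_2019") as env:
    print(env.active)  # True while the environment is in use
print(env.active)      # False once the block has exited
```

The context manager is one way to guarantee that every environment that is created is also destroyed, while the audit log gives the Ops side a record of who requested what.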
“The race to the cloud among enterprise companies has been putting pressure on DevOps teams for some time now… and DataOps is a variant of this, but much more. Multi-cloud raises that pressure to a whole new level; the emergence of DataOps is demanding more from developers managing data infrastructure.”
Gourdel suggests that, as a new approach, DataOps is driven specifically by the advent of Machine Learning (ML) and Artificial Intelligence (AI). He says that the growing complexity of data and the rising need for data governance and ownership are huge drivers in the emergence of DataOps.
“[In the context of DataOps], it is important to have the right people (engineers, IT staff, scientists etc.) to get the most value from technologies like ML and AI, but also to have these people responsible for the data,” said Gourdel.
Coming next as our final comment here, we hear from NetApp and PagerDuty.