The Computer Weekly Developer Network decided to cover the emerging area of DataOps… and we were overloaded.
After all that, the industry still wanted to comment more, so this post includes some of the additional commentary that we think is valuable and worth voicing.
What is DataOps?
As a reminder, DataOps is a close relative of data science, where we use machine learning to build more cohesive software applications. DataOps concentrates on the creation and curation of a central data hub, repository and management zone designed to collect, collate and then distribute application data.
The concept here hinges on the proposition that an almost metadata-level form of application data analytics can be more widely propagated and democratised across an entire organisation’s IT stack. Subsequently, more sophisticated layers of analytics can be brought to bear, such as built-for-purpose analytics engines designed to help track application performance in an altogether more granular fashion.
Grant Caley is chief technologist for UK and Ireland at hybrid cloud data services and data management company NetApp UK.
Caley suggests that we’re seeing a continued blurring of the lines between job functions, which has been driven by the significant development of technology… and the rise in the value of data.
“DataOps addresses the unique challenges of enterprise data workflows – with one being Software Defined Data Centre strategies. As organisations and employees at all levels have become more digitally proficient, the birth of DataOps derived from the growth in disruptive technologies. This has meant the need for closer collaboration from job roles including software developers, architects and security and data governance professionals in order to evolve the people and process paradigm,” said Caley.
He further states that accelerating DataOps will require the use of cloud as well as on-premises IT.
“To deliver and protect data across this essentially hybrid landscape, organisations will need to develop a data fabric to ensure enterprise hybrid cloud data management,” he added.
George Miranda is DevOps advocate at PagerDuty, a provider of digital operations management technologies.
Miranda says that, as with DevOps, the goal of DataOps is to accelerate time to value where a “throw it over the wall” approach existed previously. For DataOps, that means setting up a data pipeline where you continuously feed data into one side and churn it into useful results (models, views, etc.) on the other.
“This is essentially the same concept used by developers continuously delivering new features to production. The keys in both of these models are reproducibility and automation. Properly validating every new development before it goes into the hands of users requires a lot of stringent analysis and governance,” said Miranda.
He continues, “The myth of DevOps is that teams simply don’t have to meet the same governance requirements as teams operating in more traditional models. But we’ve seen from years of data that this simply isn’t true. What development teams have learned to do is codify those stringent requirements into automated tests that are automatically applied every time a new development is submitted.”
Miranda concludes that, similarly, when it comes to managing data, continuous testing must be applied to any new data intended for use by your users.
He thinks that making that process easier for developers can mean using containers, which provide a simple-to-use method for applying consistent tests all the way from local development on a developer's own workstation through to making that data available in production.
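To make the idea of codified, continuously applied data tests a little more concrete, here is a minimal Python sketch. It is purely illustrative and does not come from Miranda or PagerDuty: the field names (`customer_id`, `amount`, `country`), the three rules and the `publish` callback are all hypothetical stand-ins for whatever governance requirements a real team would need to encode.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Record = dict  # one row of incoming data (hypothetical shape)


@dataclass(frozen=True)
class Check:
    """A single governance rule, codified as a named predicate."""
    name: str
    predicate: Callable[[Record], bool]


# Hypothetical rules; a real team would derive these from its own policies.
CHECKS: List[Check] = [
    Check("customer_id is present",
          lambda r: bool(r.get("customer_id"))),
    Check("amount is non-negative",
          lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
    Check("country is a 2-letter code",
          lambda r: isinstance(r.get("country"), str) and len(r["country"]) == 2),
]


def validate_batch(records: Iterable[Record],
                   checks: List[Check] = CHECKS) -> List[Tuple[int, str]]:
    """Run every check against every record; return (row_index, check_name) failures."""
    failures = []
    for i, record in enumerate(records):
        for check in checks:
            if not check.predicate(record):
                failures.append((i, check.name))
    return failures


def publish_if_valid(records: Iterable[Record],
                     publish: Callable[[List[Record]], None]) -> List[Tuple[int, str]]:
    """Gate a batch: call publish only if no check fails; return any failures."""
    batch = list(records)
    failures = validate_batch(batch)
    if not failures:
        publish(batch)
    return failures
```

Because the checks are plain code with no environment-specific dependencies, the same script could in principle be packaged into a container image and run unchanged on a developer's workstation and in the production pipeline, which is the kind of consistency Miranda describes.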