
Feature

How DataOps helps organisations make better decisions

DataOps is helping organisations to accelerate the time it takes to derive value from the data they collect

Adrian Bridgwater

Published: 15 Apr 2019

The race to the cloud among enterprises has been putting pressure on DevOps teams for some time now. DataOps, a variant of DevOps, is being used to deliver new data models and test data more quickly, keeping pace with the speed at which organisations are building out data-driven initiatives.

Whereas DevOps is used to build cohesive software applications, DataOps applies a similar approach to accelerate the speed with which data models are built, tested and deployed. In doing so, organisations can shorten the time it takes to derive value from the customer data they collect.

Thibaut Gourdel, technical product manager at Talend, says: “DataOps is a new approach, driven by the advent of machine learning and artificial intelligence. The growing complexity of data and the rise of needs for data governance and ownership are huge drivers in the emergence of DataOps. Data must be governed, stored in specific datacentres, and organisations should know who has access to data, which data and who owns it.”
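Gourdel's governance questions (which data, who owns it, where it is stored and who has access) can be made concrete with a small catalogue. The sketch below is a hypothetical, in-memory illustration: `DataCatalogue` and `DatasetRecord` are invented names, not the API of Talend or any other governance product, and a real deployment would rely on a dedicated governance tool with an audit trail.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry: which data, who owns it, where it lives, who may read it."""
    name: str
    owner: str
    datacentre: str  # location the data must be stored in, e.g. for residency rules
    readers: set[str] = field(default_factory=set)


class DataCatalogue:
    """Hypothetical in-memory catalogue, for illustration only."""

    def __init__(self) -> None:
        self._datasets: dict[str, DatasetRecord] = {}

    def register(self, record: DatasetRecord) -> None:
        self._datasets[record.name] = record

    def grant(self, dataset: str, user: str) -> None:
        self._datasets[dataset].readers.add(user)

    def can_read(self, dataset: str, user: str) -> bool:
        record = self._datasets.get(dataset)
        # Unknown datasets are denied by default; owners can always read their own data.
        return record is not None and (user == record.owner or user in record.readers)

    def who_has_access(self, dataset: str) -> set[str]:
        record = self._datasets[dataset]
        return {record.owner} | record.readers
```

Denying access to anything not registered is a deliberate choice: data that nobody has catalogued is exactly the data whose owner and location are unknown.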

More sophisticated analytics

DataOps concentrates on creating and curating a central data hub, repository and management zone designed to collect and collate application data and data models, and then distribute them onward. The concept hinges on the proposition that application data analytics, at an almost metadata-like level, can be propagated and democratised across an organisation's entire IT stack. This in turn allows more sophisticated layers of analytics to be applied.
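The collect, collate and distribute cycle can be sketched in a few lines, assuming records from different source systems share a common key (here a hypothetical `customer_id`). `DataHub` is an illustrative name only; a production hub would add persistence, schemas and access control.

```python
from collections import defaultdict
from typing import Callable

Record = dict[str, object]


class DataHub:
    """Hypothetical hub: collect records from many sources, collate them per key,
    then distribute the merged view to every subscriber."""

    def __init__(self, key: str) -> None:
        self.key = key
        self._collated: dict[object, Record] = defaultdict(dict)
        self._sources: dict[object, set[str]] = defaultdict(set)
        self._subscribers: list[Callable[[Record], None]] = []

    def subscribe(self, callback: Callable[[Record], None]) -> None:
        self._subscribers.append(callback)

    def collect(self, source: str, record: Record) -> None:
        key = record[self.key]
        # Collate: merge fields from every source into a single entry per key.
        self._collated[key].update(record)
        self._sources[key].add(source)
        merged = {**self._collated[key], "sources": sorted(self._sources[key])}
        # Distribute: push the merged view onward to each subscriber.
        for callback in self._subscribers:
            callback(merged)
```

Here a CRM system and a web-analytics feed could each contribute fields about the same customer, and every downstream consumer would receive one combined record rather than two partial ones.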

As Tamr database guru Andy Palmer puts it: “DataOps acknowledges the interconnected nature of data engineering, data integration, data quality and data security/privacy. It helps an organisation rapidly deliver data that accelerates analytics and enables previously impossible analytics.”

DataOps is not a product. Rather, it is a methodology and an approach. As such, it has its theorists, its naysayers and its fully paid-up, card-carrying believers. Some argue that DataOps provides the means to deliver data and data models for continuous testing under version control.
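One common way to give data the kind of version control that code already has is to identify each dataset snapshot by a hash of its contents, so that every test run can record exactly which data it used. The sketch below assumes small, JSON-serialisable datasets and uses only the Python standard library; `DatasetVersions` is a hypothetical name, and the code illustrates the idea rather than any particular data-versioning tool.

```python
import hashlib
import json


class DatasetVersions:
    """Content-addressed snapshots: identical data always gets the same version id."""

    def __init__(self) -> None:
        self._snapshots: dict[str, bytes] = {}
        self.history: list[str] = []  # version ids in the order they were first committed

    def commit(self, rows: list[dict]) -> str:
        # Canonical JSON (sorted keys) so key order within a row does not change the id.
        payload = json.dumps(rows, sort_keys=True).encode("utf-8")
        version = hashlib.sha256(payload).hexdigest()[:12]  # truncated for readability
        if version not in self._snapshots:
            self._snapshots[version] = payload
            self.history.append(version)
        return version

    def checkout(self, version: str) -> list[dict]:
        # Decode a fresh copy so callers cannot mutate the stored snapshot.
        return json.loads(self._snapshots[version])
```

A continuous-testing job could commit its test fixture, log the returned id alongside the model's test results, and later check out the same id to reproduce that run exactly.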

George Miranda, DevOps advocate at PagerDuty, a provider of digital operations management, says: “The goal of DataOps is to accelerate time to value where a ‘throw it over the wall’ approach existed previously. For DataOps, that means setting up a data pipeline where you continuously feed data into one side and churn that into useful results.”
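Miranda's description maps naturally onto a chain of streaming stages. The sketch below assumes a hypothetical input of `sensor,value` text lines; each stage is a Python generator, so results emerge as data is fed in rather than after a whole batch has been processed.

```python
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Stage 1: turn raw 'sensor,value' lines into records, skipping malformed input."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue
        name, raw = parts
        try:
            value = float(raw)
        except ValueError:
            continue
        yield {"sensor": name, "value": value}


def running_mean(records: Iterable[dict]) -> Iterator[tuple[str, float]]:
    """Stage 2: churn records into a useful result, here a per-sensor running mean."""
    totals: dict[str, float] = {}
    counts: dict[str, int] = {}
    for rec in records:
        sensor = rec["sensor"]
        totals[sensor] = totals.get(sensor, 0.0) + rec["value"]
        counts[sensor] = counts.get(sensor, 0) + 1
        yield sensor, totals[sensor] / counts[sensor]


def pipeline(source: Iterable[str]) -> Iterator[tuple[str, float]]:
    # Stages are chained lazily: each input line flows straight through to a result.
    return running_mean(parse(source))
```

Because the source can be any iterable, the same pipeline works on a fixed test file or on an endless feed, which is what makes it easy to test continuously.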

Making it easier for people to work with data is a key requirement in DataOps. Nigel Kersten, vice president of ecosystem engineering at Puppet, says: “The DataOps movement focuses on the people in addition to processes and tools, as this is more critical than ever in a world of automated data collection and analysis at a massive scale.”

DataOps practitioners (DataOps engineers or DOEs) generally focus on building data governance frameworks. A good data governance framework, one that is fed and watered regularly with accurate, de-duplicated data drawn from the entire IT stack, can help data models evolve more rapidly. Engineers can then run reproducible tests in consistent test environments that ingest customer data in a way that complies with data protection and privacy regulations.
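The ingredients in this paragraph (de-duplicated data, reproducible tests and privacy-conscious handling of customer data) can be sketched as three small functions. This assumes records keyed by a hypothetical `email` field. Salted hashing is pseudonymisation, not anonymisation, so whether it satisfies a given regulation is a legal question this sketch does not answer; the salt must also be kept secret, or common values can be recovered by hashing guesses.

```python
import hashlib
import random


def deduplicate(rows: list[dict], key: str) -> list[dict]:
    """Keep the first row for each key, comparing normalised (trimmed, lower-case) values."""
    seen: set[str] = set()
    unique = []
    for row in rows:
        norm = str(row[key]).strip().lower()
        if norm not in seen:
            seen.add(norm)
            unique.append(row)
    return unique


def pseudonymise(rows: list[dict], fields: tuple[str, ...], salt: str) -> list[dict]:
    """Replace personal fields with salted SHA-256 digests. The mapping is deterministic,
    so joins across tables still work, but raw values never reach the test environment."""
    out = []
    for row in rows:
        masked = dict(row)  # copy, so the source rows are left untouched
        for f in fields:
            masked[f] = hashlib.sha256((salt + str(row[f])).encode("utf-8")).hexdigest()[:16]
        out.append(masked)
    return out


def reproducible_sample(rows: list[dict], k: int, seed: int = 42) -> list[dict]:
    """A fixed seed gives every test run the same sample, so results can be reproduced."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))
```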

The end result is a continuous and virtuous develop-test-deploy cycle for data models, says Justin Reock, chief architect at Rogue Wave, a Perforce Company. “At the core of all modern business, code is needed to transport, analyse and arrange domain data,” he says. “This need has given rise to entirely new software disciplines, such as enterprise federation, API-to-API [application programming interface] communication, big data and big data analytics, stream processing, machine learning and data science.

“As the complexity and scale of these applications expand, as is often the case in sophisticated environments, the need for convergence arises. We must be able to reconcile data security, integrity, accessibility and organisation into a single mode of thought – and that mode of thought is DataOps.”

It is important to remember that data has a lifecycle, and a data model produced by a diligent DataOps process should account for that entire lifecycle.

Some data is new, raw, unstructured and potentially quite peripheral; other data may be live, current and possibly mission-critical; and there will always be data that is effectively redundant or needs to be retired. Still other data may simply be inaccessible because of access-control policies or system incompatibilities.
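These categories can be turned into an explicit lifecycle rule. The sketch assumes each dataset carries three hypothetical attributes (last access date, whether it has passed validation, and whether policy and systems allow it to be read); the one-year staleness threshold is an arbitrary illustrative default, not a recommendation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Stage(Enum):
    RAW = "new/raw"
    LIVE = "live"
    RETIRE = "redundant, candidate for retirement"
    INACCESSIBLE = "inaccessible"


@dataclass
class DatasetInfo:
    name: str
    last_accessed: date
    validated: bool  # has it passed data quality checks?
    readable: bool   # do access policies and system compatibility allow it to be read?


def classify(ds: DatasetInfo, today: date, stale_after: timedelta = timedelta(days=365)) -> Stage:
    """Assign a lifecycle stage; the order of checks and the threshold are assumptions."""
    if not ds.readable:
        return Stage.INACCESSIBLE
    if today - ds.last_accessed > stale_after:
        return Stage.RETIRE
    if not ds.validated:
        return Stage.RAW
    return Stage.LIVE
```

Checking accessibility first means that data nobody can read is flagged as a policy or integration problem rather than silently marked for deletion.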

Containerised form

Tim Mackey, senior technical evangelist at Synopsys, says: “Data scientists may create an experimental model which is deployed in containerised form. As they refine their model, deployment of the updated model can be quickly performed – potentially while leaving the previous model available for real-time comparison. As their model proves itself, they can quickly scale underlying resources seamlessly, confident that each node in the model is identical to its peers, both in function and performance.”
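The pattern Mackey describes, deploying an updated model while the previous one stays available for comparison, can be sketched independently of the container platform. In the hypothetical `ModelRouter` below, models are plain Python functions and mean absolute error stands in for whatever metric a team actually uses; packaging, scaling and routing traffic between containers are deliberately left out.

```python
from typing import Callable, Optional, Sequence

Model = Callable[[float], float]


class ModelRouter:
    """Serve a live model while keeping the previous version alongside it for comparison."""

    def __init__(self, live: Model) -> None:
        self.live = live
        self.previous: Optional[Model] = None

    def deploy(self, new_model: Model) -> None:
        # The old model is retained rather than discarded, so it can be compared or restored.
        self.previous, self.live = self.live, new_model

    def rollback(self) -> None:
        if self.previous is not None:
            self.live, self.previous = self.previous, None

    def compare(self, inputs: Sequence[float], truth: Sequence[float]) -> dict[str, float]:
        """Mean absolute error of the live and previous models on the same (non-empty) inputs."""
        def mae(model: Model) -> float:
            return sum(abs(model(x) - y) for x, y in zip(inputs, truth)) / len(inputs)

        scores = {"live": mae(self.live)}
        if self.previous is not None:
            scores["previous"] = mae(self.previous)
        return scores
```

Keeping the previous model available also gives a one-step rollback if the updated model underperforms on live data.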

A number of so-called data science platforms that support DataOps are starting to emerge. MoneySuperMarket.com has deployed one of them, Domino Data Lab, and Atwal says it gives the company's data scientists a self-service way of working.

Rogue Wave’s Reock believes DataOps, when combined with modern data analytics practices and emerging machine learning technologies, can help organisations to prepare for the coming surge in data-driven business models.

Improve decision-making

As organisations make greater use of data to improve decision-making, for example by applying advanced analytics to internet of things (IoT) sensor streams, the volume of data involved is likely to dwarf, by orders of magnitude, the already astronomical amount now being generated.

This is likely to lead to greater emphasis on the management of data models and test data, which means DataOps will play an increasingly important role.

Will Cappelli, CTO and global vice-president of product strategy at Moogsoft, says DevOps teams and data scientists should learn how to work together more effectively. “DevOps professionals are all too often impatient,” he says. “They don’t want to wait for the results of a rigorous analysis, whether it is carried out by humans or by algorithms. Data scientists can be overly fastidious – particularly those coming from maths, rather than computer science.

“The truth is, though, that DevOps needs the results of data science delivered rapidly but effectively, so both communities need to overcome some of their bad habits. Perhaps it is time for an agile take on data science itself.”
