In a world of big data and cloud computing, software application development professionals are becoming data developers, and the terms ‘data scientist’ and ‘data engineer’ are becoming very real (and increasingly prevalent) as job titles.
Swimming the data lake
In this data-centric world of development, developers have to navigate the so-called data lake in the data warehouse: that unknown pool of unstructured, uncategorised, un-schemed data that has yet to be ascribed a form and function.
WhereScape’s Barton says that modern hybrid data warehouse environments are not data lakes; they are information refineries, no less.
“Automation can help us manage the time, cost and risk associated with building, maintaining and operating these complex hybrid data warehouse environments. This has never been more important, now that the sources – and consumers – of relevant data have both changed significantly. In conjunction with this, the underlying infrastructure is now far more complicated (heterogeneous, open-source, etc) and constantly in flux,” he said.
What is an information refinery?
Barton’s notion of an information refinery is a place that processes real-time and near-real-time data inputs of variable quality and produces a wide range of outputs, in different forms, for different consumers (both human and programmatic ones), via a complex of constantly changing software machinery.
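For developers who prefer to see the idea in code, the refinery notion can be loosely sketched as below. This is only an illustration under stated assumptions, not anything Barton or WhereScape has described: the `Record` type, the `quality` score, the `refine` function and the two consumer names are all hypothetical. It shows inputs of variable quality being filtered and then fanned out in a different form for each consumer.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class Record:
    """One incoming data item (hypothetical shape, for illustration only)."""
    source: str
    payload: Dict[str, Any]
    quality: float  # assumed score from 0.0 (poor) to 1.0 (clean)


def refine(
    records: Iterable[Record],
    min_quality: float,
    outputs: Dict[str, Callable[[Record], Any]],
) -> Dict[str, List[Any]]:
    """Drop low-quality inputs, then render each kept record once per consumer."""
    results: Dict[str, List[Any]] = {name: [] for name in outputs}
    for record in records:
        if record.quality < min_quality:
            continue  # input judged too poor to refine
        for name, render in outputs.items():
            results[name].append(render(record))
    return results


records = [
    Record("sensor", {"temp": 21.5}, quality=0.9),
    Record("web_log", {"temp": None}, quality=0.2),
]

# Two hypothetical consumers: a human-facing one and a programmatic one.
outputs = {
    "dashboard": lambda r: f"{r.source}: {r.payload}",
    "api": lambda r: r.payload,
}

refined = refine(records, min_quality=0.5, outputs=outputs)
```

In this sketch the low-quality `web_log` record is discarded, while the `sensor` record reaches both consumers in different forms. A real environment of the kind Barton describes would also have to cope with the sources, consumers and machinery themselves changing over time, which this toy example does not attempt.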
“Managing not only the flows of data within hybrid data warehousing environments, but also the machinery and processes that consume data, and produce data, within such environments, is a significant challenge, even for experienced data warehousing practitioners — the largest the data warehousing industry has faced, to date,” said WhereScape’s Barton.
Barton typically speaks on the tasks, challenges and requirements associated with hybrid data warehouse automation, ranging from discovery, design and development through to deployment, operations, governance and DevOps.
WhereScape is a data warehouse automation company that offers WhereScape RED, a productivity tool that provides a framework for development and ongoing support of an enterprise data warehouse.