This is a guest post for the Computer Weekly Developer Network written by Barry Devlin, founder and principal of 9Sight Consulting.
Devlin believes that data warehouse developers (Ed – is that actually a role and a ‘thing’ already?) have historically walked a narrow line between data quality and business agility.
At the same time, Devlin points out that they very clearly need to balance the needs and relationships of both IT itself and internal business clients [users, operators].
So what to do?
The argument put forward here is that technology has answered this dilemma with two separate approaches:
- the data vault optimised for data warehouse agility and
- data warehouse automation for faster and more reliable development.
Devlin writes as follows:
Data vault modeling is designed for the long-term historical storage of data from multiple operational systems, with an emphasis on auditability, data traceability, loading speed and resilience. Its inventor, Dan Linstedt, first conceived the approach in the early 2000s.
Data vault modeling is now in its second generation.
While data vaults are growing in popularity and gaining features such as a new development methodology, developers who want to implement one continue to face numerous challenges. Here are a few of the toughest obstacles, with tips on how to overcome them.
Rift between IT & business
The age-old struggle between IT and business explicitly challenges data vault projects.
An overly engineering-focused mindset in IT may alienate business interests. As data warehouse staff concentrate on implementing a new data vault model, they may spend less face-to-face time with the business, leading to poorer, less detailed or delayed delivery of specific business solutions.
Delays can widen the rift between business and IT and prompt the business to look elsewhere for quick-fix solutions. An automated approach to data vault design, development, deployment and operation can both accelerate data vault delivery and make it possible to collaborate iteratively with business users early in the project, increasing engagement, trust and the chance of delivering value to the business the first time.
The first step in adhering to data vault principles is to understand the source systems, their structures, relationships and underlying data quality.
Although time-consuming, this task is necessary to validate the model design and implementation approach. Automated discovery and data quality profiling reduce the time needed to design and populate the model.
Business and IT can collaborate in compressed time windows to iterate on model designs and validate with live data. This approach eliminates assumptions, enables the model to be validated before deployment and ensures the data warehouse can evolve at the pace needed by the business.
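To make the idea of automated data quality profiling concrete, here is a minimal sketch in Python. It is not how any particular automation product works; the function name, the sample rows and the "key candidate" rule are all assumptions chosen for illustration. It simply measures how complete and how unique a source column is, which is the kind of evidence business and IT could review together before settling on business keys.

```python
from collections import Counter

def profile_column(rows, column):
    """Return basic quality metrics for one column of a list-of-dicts sample."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing (an assumption for this sketch).
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    total = len(values)
    return {
        "column": column,
        "rows": total,
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        "distinct": len(counts),
        # Only a fully populated, unique column is flagged as a key candidate.
        "key_candidate": total > 0 and len(non_null) == total and len(counts) == total,
    }

# Hypothetical sample extracted from a source system.
sample = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": ""},
    {"customer_id": "C3", "email": "a@example.com"},
]
profiles = [profile_column(sample, c) for c in ("customer_id", "email")]
```

In this sample, `customer_id` is complete and unique, while `email` has a missing value and a duplicate, so only the former would be put forward as a possible business key.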
Set rules and follow them
The data vault model involves an extensive framework of rules and recommendations. A data vault’s data objects, from the common hubs, links and satellites to the lesser-known point-in-time and bridge helper tables, must adhere to specific standards and definition rules to ensure data vault agility and ease of maintenance. When developers “re-invent” these structures, problems arise that demand rework, both in the initial build and in ongoing operation.
Whether sharing tasks between diverse teams or onboarding new team members, ensuring sustained best practices requires strict design standards, documentation, error handling and auditing.
Automation helps here. Because generated code removes the idiosyncrasies of each developer’s coding style, it is consistent across the team and adheres to the same naming standards, which eases maintenance and future upgrades and speeds the onboarding of new developers.
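As a rough illustration of why generated code stays consistent, the Python sketch below builds hub and link table definitions from a small amount of metadata. Everything in it is an assumption made for the example: the `hub_`/`link_` prefixes, the `_hk` suffix, the `load_date` and `record_source` audit columns, the column types and the MD5-based `hash_key` helper are one plausible convention, not a prescribed standard or the output of any specific tool.

```python
import hashlib

# Audit columns appended to every generated table (illustrative names).
AUDIT_COLUMNS = (
    "    load_date TIMESTAMP NOT NULL,\n"
    "    record_source VARCHAR(50) NOT NULL"
)

def hash_key(*business_keys):
    """Derive a deterministic key from normalised business key values."""
    normalised = "|".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()

def hub_ddl(entity, business_key):
    """Generate a hub table definition from an entity name and its business key."""
    return (
        f"CREATE TABLE hub_{entity} (\n"
        f"    {entity}_hk CHAR(32) NOT NULL PRIMARY KEY,\n"
        f"    {business_key} VARCHAR(100) NOT NULL,\n"
        f"{AUDIT_COLUMNS}\n"
        ");"
    )

def link_ddl(entities):
    """Generate a link table definition relating two or more hubs."""
    name = "link_" + "_".join(entities)
    hub_refs = "".join(f"    {e}_hk CHAR(32) NOT NULL,\n" for e in entities)
    return (
        f"CREATE TABLE {name} (\n"
        f"    {name}_hk CHAR(32) NOT NULL PRIMARY KEY,\n"
        f"{hub_refs}"
        f"{AUDIT_COLUMNS}\n"
        ");"
    )

print(hub_ddl("customer", "customer_id"))
print(link_ddl(["customer", "order"]))
```

Because every table comes from the same template, a change to the standard (for example, a renamed audit column) is made once and regenerated everywhere, rather than corrected by hand in each developer's scripts. Satellites and helper tables could follow the same pattern.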
Automated culture of maintenance
Perhaps the most under-appreciated challenge for a data warehouse team is the ongoing operation, maintenance and upgrade of the environment.
Prepare for it now.
Data vaults, like data warehouses, require ongoing operations overhead to schedule, execute and monitor the data feeds, including handling failed jobs and restarts, while ensuring everything is processed in the correct order. Data vaults face the added complexity of scheduling and managing numerous data and processing objects.
Manual approaches are inadequate to address this. A particular challenge is that in manual deployment the necessary logging and auditing capabilities are often sidelined when projects fall behind schedule.
Capabilities in automation software, such as integrated scheduling tools and automated logging and auditing capabilities, help IT teams to meet the complexity and continuous need for operational attention head on.
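The scheduling concerns described above (correct processing order, restarts of failed jobs, and logging that is never skipped) can be sketched in a few lines of Python using the standard library's `graphlib` module (Python 3.9 or later). The job names, the dependency map and the retry limit are hypothetical, and a real scheduler would do far more; the point is only that ordering and audit logging can be built into the runner rather than left to each developer.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it starts.
# These dependencies are illustrative assumptions, not a required load order.
dependencies = {
    "hub_customer": set(),
    "hub_order": set(),
    "link_customer_order": {"hub_customer", "hub_order"},
    "sat_customer_details": {"hub_customer"},
}

def run_in_order(dependencies, run_job, max_attempts=2):
    """Run jobs in dependency order, retrying failures and logging every attempt."""
    log = []
    for job in TopologicalSorter(dependencies).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                run_job(job)
                log.append((job, attempt, "ok"))
                break
            except Exception as exc:
                log.append((job, attempt, f"failed: {exc}"))
        else:
            # Reached only when every attempt failed; stop before dependants run.
            raise RuntimeError(f"{job} failed after {max_attempts} attempts")
    return log
```

A caller would pass its own `run_job` function, for example one that executes a load procedure, and could persist the returned log as the audit trail.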
Non-stop, high-speed change
Businesses are inundated with change—constant, rapid and unpredictable change—sometimes even before the first data warehouse iteration is rolled out. A key driver of the data vault model and methodology is to ease the problems associated with such ongoing change.
At the level of practical implementation, response to change first requires the ability to carry out extensive and effective impact analysis. What tables and columns will be affected by changing this code? What are the unintended, downstream consequences? How can we reduce risk and, simultaneously, expedite necessary change?
Documentation is supposed to provide answers, but the reality is that manual approaches to development are seldom accompanied by complete, up-to-date documentation.
Beyond the productivity and standardisation gains associated with eliminating the vast majority of hand-coding required to deliver a data vault, documentation automation may be the most visible and impactful contribution to a project seen by IT teams. With code and documentation tied to metadata, change management can be automated and reduced to hassle-free review rather than decoding ancient programming. Such metadata-driven automation is key to keeping pace with the ever more rapidly changing business needs.
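The impact-analysis questions raised above become straightforward to answer once lineage is held as metadata. The Python sketch below assumes a hypothetical lineage map in which each column points to the objects fed directly from it; the object names are invented for the example. A breadth-first walk then lists everything downstream of a proposed change.

```python
from collections import deque

# Hypothetical lineage metadata: each object maps to what is fed directly from it.
lineage = {
    "crm.customer.email": ["sat_customer_details.email"],
    "sat_customer_details.email": ["mart.dim_customer.email"],
    "mart.dim_customer.email": ["report.churn_dashboard"],
    "crm.customer.name": ["sat_customer_details.name"],
}

def downstream_impact(lineage, changed):
    """Return every object reachable from the changed one, in sorted order."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)

affected = downstream_impact(lineage, "crm.customer.email")
```

Here `affected` contains the satellite column, the data mart column and the dashboard that depend on the source email field, while objects fed only from `crm.customer.name` are left out.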
During the last decade and a half, businesses have been gradually adopting the data vault model as a new foundation for their data warehouses. Its design and approach have been instrumental in successfully addressing the growing need for agility in business analytics and decision-making support.
However, many companies have found that the structural complexity of the model can challenge the IT teams charged with implementation. Automation software built to tackle data vault development, such as WhereScape Data Vault Express, can improve collaboration between business and IT, boost developer productivity, increase organisational consistency and standardisation, better position teams for change and help organisations reap the benefits of Data Vault 2.0 much more quickly.
About the author
Barry Devlin has worked in the IT industry for more than 30 years, many of those years as a distinguished engineer at IBM. He is now founder and principal of 9Sight Consulting, specialising in the design and the human, organisational and IT implications of deep business insight applications.