The Computer Weekly Developer Network (CWDN) continues its Infrastructure-as-Code (IaC) series of technical analysis discussions to uncover what this layer of the global IT fabric really means, how it integrates with the current push to orchestrate increasingly cloud-native systems more efficiently and what it means for software application development professionals now looking to take advantage of its core technology proposition.
This piece is written by Stephen Manley in his capacity as CTO of Druva – a company known for its cloud data protection and management software, which it calls ‘data management as a service’ for data protection, governance and intelligence.
The full title of this piece was intended to be Infrastructure-as-Code: Speed, quality & resiliency and they form a clear path for virtualised application stacks – and now Manley writes as follows…
As part of a DevOps workflow, Infrastructure-as-Code (IaC) streamlines application development and deployment, improves product quality and enhances application availability.
It is important to remember, though, that the increased agility of IaC has the potential to break data management. There are three IaC data management patterns that fit different workloads, but they all have limitations. Now is the time to lay the foundation for the future – Data as Code.
The value of IaC
IaC brings version control to infrastructure. Teams can now ‘check in’ and ‘check out’ an entire environment – compute infrastructure, network configuration, application instances and more. In the cloud, I can now deploy my entire application with one command. Even better, anybody else can deploy a copy of my application stack. With IaC, teams can develop and deploy applications faster with better quality and higher availability.
Data requirements in an IaC environment
Applications, even in an IaC environment, need data to deliver value.
With data, however, comes the risk of security breaches, data loss and compliance violations. Ransomware attacks target data, not infrastructure. You can rapidly restore an IaC environment after a disaster or malicious attack, but recovering the data is a much bigger challenge.
Testers can near-instantly bring up the application, but it takes hours to create a test data set. Finally, courts expect organisations to be able to reproduce their AI/ML results – with the original infrastructure and training data.
Organisations will still need to provision, secure and protect IaC data, but they will need to do it more quickly, more reliably and at a global scale.
Data Management in IaC environments
Today, organisations use three approaches to manage data in their IaC environments.
For analytics-oriented use cases, developers store data in the application containers, as they would in a VM. Since code and data are bundled, the organisation can run the container anywhere knowing it will have access to the data. The limitation is that the containers become bloated, making them difficult to transfer and modify.
Still, if you are running weeks of analytics against the same data set, this can be an appealing option.
For lightweight modern applications, organisations leverage external object storage and managed databases. By splitting the compute and data environments, the organisation can leverage its existing data management and avoid burdening the IaC infrastructure. Unfortunately, this split leads to silos that complicate self-service provisioning, application-consistent data copy creation and data migration. Fortunately, a well-designed modern application can mitigate those issues, making external storage an excellent option.
Companies modernising their applications with Kubernetes increasingly create data volumes using the Container Storage Interface (CSI). CSI volumes tightly couple the containerised application with the data, but leverage external storage, so that the container itself stays small. As a result, application owners can reap the benefits of self-service provisioning and application-consistent data as well as a robust storage infrastructure. Of course, CSI volumes tie the compute and data layers together, which can constrain the flexibility of IaC. Still, for modernised applications, Kubernetes CSI volumes are the best choice.
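As a rough summary of the three patterns, the sketch below maps each workload type to the approach and main trade-off described in the text. The enum names, the `PATTERNS` table and the `recommend` helper are purely illustrative assumptions, not part of any tool.

```python
from enum import Enum


class Workload(Enum):
    # Hypothetical labels for the three workload types discussed in the text.
    LONG_RUNNING_ANALYTICS = "analytics"
    LIGHTWEIGHT_MODERN_APP = "lightweight"
    KUBERNETES_MODERNISED_APP = "kubernetes"


# (pattern, main limitation) for each workload, paraphrased from the text.
PATTERNS = {
    Workload.LONG_RUNNING_ANALYTICS: (
        "data stored inside the container",
        "bloated containers that are hard to transfer and modify",
    ),
    Workload.LIGHTWEIGHT_MODERN_APP: (
        "external object storage or managed database",
        "silos that complicate provisioning, consistent copies and migration",
    ),
    Workload.KUBERNETES_MODERNISED_APP: (
        "CSI volume on external storage",
        "compute and data layers tied together",
    ),
}


def recommend(workload: Workload) -> tuple:
    """Return the (pattern, limitation) pair for a workload type."""
    return PATTERNS[workload]


pattern, limitation = recommend(Workload.KUBERNETES_MODERNISED_APP)
print(f"Use: {pattern}; watch out for: {limitation}")
```

In practice the choice is rarely this clean, which is why most organisations end up mixing the approaches.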
Since there is no ‘one size fits all’ option, most organisations use a mixture of the approaches based on their application needs. Without a standardised approach, organisations struggle to monitor, secure and protect their data. Legacy approaches simply do not meet the requirements of modern IaC applications, so it is time to modernise data management and protection.
The future: Data-as-Code (DaC)
Just as IaC transformed how organisations manage their compute infrastructure, Data as Code will revolutionise data infrastructure. Just as teams can now check in and check out an environment, they will be able to do the same with datasets.
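One way to picture checking a dataset in and out is a small content-addressed store, sketched below. This is only a toy illustration under stated assumptions: the `DatasetStore` class and its methods are hypothetical, data lives in memory, and a dataset is simply a list of byte chunks. Because chunks are keyed by their hash, versions that share data store it only once.

```python
import hashlib


class DatasetStore:
    """Toy content-addressed store (hypothetical, not a real product API)."""

    def __init__(self):
        self._chunks = {}    # chunk hash -> chunk bytes, stored once
        self._versions = {}  # version id -> ordered list of chunk hashes

    def check_in(self, chunks):
        """Store a dataset given as a list of byte chunks; return its version id."""
        hashes = []
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            self._chunks.setdefault(digest, chunk)  # identical chunks are kept only once
            hashes.append(digest)
        # The version id is derived from the ordered chunk hashes.
        version_id = hashlib.sha256("".join(hashes).encode()).hexdigest()[:12]
        self._versions[version_id] = hashes
        return version_id

    def check_out(self, version_id):
        """Rebuild the exact chunks that were checked in under this version id."""
        return [self._chunks[h] for h in self._versions[version_id]]

    def stored_chunk_count(self):
        """Number of unique chunks physically held by the store."""
        return len(self._chunks)


store = DatasetStore()
v1 = store.check_in([b"customers", b"orders"])
v2 = store.check_in([b"customers", b"orders", b"refunds"])  # shares two chunks with v1
```

Here `store.check_out(v1)` returns the original two chunks even after `v2` has been checked in, and the store holds three unique chunks rather than five.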
To enable agility with control, data management will be run entirely by policy. Each data set will have an associated protection, retention, security and geographical policy. The data then will be near-instantly available to whatever user is allowed to access it. To implement the policies, the data infrastructure will need efficiency techniques such as deduplication, tiering and on-demand retrieval to deliver the near-instant data access that the IaC users have come to expect.
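To make the idea of policy-driven data concrete, here is a minimal sketch of a dataset that carries its own protection, retention, security and geographical policy. Every name in it (`DataPolicy`, `DataSet`, `can_access`, the roles and the region label) is an assumption made for illustration; it does not describe an existing product or standard.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DataPolicy:
    """Hypothetical policy attached to one data set (all field names are assumptions)."""
    backup_interval: timedelta   # protection: how often a copy is taken
    retention: timedelta         # retention: how long copies are kept
    allowed_roles: frozenset     # security: roles permitted to read the data
    allowed_regions: frozenset   # geography: regions where the data may be accessed


@dataclass
class DataSet:
    name: str
    policy: DataPolicy


def can_access(dataset: DataSet, role: str, region: str) -> bool:
    """Allow a request only if both the security and geographical policies permit it."""
    return (
        role in dataset.policy.allowed_roles
        and region in dataset.policy.allowed_regions
    )


training_data = DataSet(
    name="ml-training",
    policy=DataPolicy(
        backup_interval=timedelta(hours=24),
        retention=timedelta(days=365),
        allowed_roles=frozenset({"data-scientist", "auditor"}),
        allowed_regions=frozenset({"eu-west"}),
    ),
)

print(can_access(training_data, "data-scientist", "eu-west"))  # True
print(can_access(training_data, "tester", "eu-west"))          # False
```

Keeping the policy alongside the dataset, rather than in a separate system, is what would let a platform decide access and placement automatically when someone checks the data out.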
While Data as Code is still a nascent idea, now is the time to lay the groundwork for highly efficient global versioned data access.
IaC has emerged from the hype cycle and companies are now developing and deploying applications with more speed, quality and resiliency. The agility of IaC is accelerating data sprawl, which makes protecting and securing data more challenging. Even with guidelines for how to store data based on the type of application, organisations must evolve their data infrastructure to keep pace with the compute infrastructure. As we lay the groundwork for Data as Code, we are on the path to finally creating fully virtualised application stacks.