Delphix: managing your data like code (Project Titan)

This is a guest post for Computer Weekly Open Source Insider written by Derek Smart in his role as senior staff engineer at Delphix.

As noted here on TechTarget, instead of working on live datasets, Delphix provides the ability to automate the process of creating a virtual instance of a database.

Users (or developers) can then access or perform analytics on those virtual databases to avoid performance hits that might result from working on live data. The Delphix database virtualisation approach is thought to also reduce the need for additional hardware to support multiple copies of the data.

Smart writes as follows…

It’s a core truth: software developers no longer work alone.

Even amid the COVID-19 (Coronavirus) pandemic at the time of writing, teams have evolved many different ways to work efficiently and to reap the benefits of having multiple perspectives on a problem.

Regardless of whether developers work on distributed teams throughout the world or pair-program on the same computer, they all manage their code using version control. Life before version control was messy, difficult and led to individual developers working on code in isolation. Then, Git emerged as the de facto way developers manage and version their code, but applications need more than just code to work.

Titan for the data lifecycle

Developers need data to run their applications throughout the entire development lifecycle.

With Titan, they can manage the data for their application as they would manage code with Git.

The problems developers currently face with their application data are similar to those of the days before code was under version control. These challenges include using out-of-date local copies, the inability to share data sets with other developers, and time-consuming processes to work around issues with their application data.

Git has been fundamental in allowing application code to be better managed by a team rather than an individual. Each member has the ability to undo mistakes, maintain code history and experiment with new code.

Git also enables teams to create many different workflows to increase productivity. Workflows can be as basic as a single branch with multiple commits, allowing developers to easily share their work and navigate through a range of commits. Collaboration workflows can be as complex as teams need them, and there are some notable ones like feature branching or Gitflow.

Only then can teams determine how best to collaborate, share their work and ultimately release the product to the end user.

As with code, so with data

Likewise, developers need the ability to control their data in that same way when fixing a bug or creating a new feature. The data inside a database should easily be administered to undo mistakes, maintain a version history, and allow for experimentation. With Titan, developers can manage data just like code.

Setting up: `git init` is the basic command that creates a new code repository in your current working directory. Git initially uses local storage to create the repository. Later on, it can push to and pull from a remote repository. `titan install` is the equivalent command to set up the storage for a Titan repository.

Titan repositories rely on different storage than Git does. Since Titan uses containerised databases for the data, the storage can be Docker volumes or Kubernetes persistent volumes. After the storage context is created, `titan run` starts the containerised database server, pulling the image from any container registry.

Git: `git init`

Titan: `titan install`, `titan run`, `titan migrate`, `titan cp`

Creating & managing versions

The main workflows for proper version control revolve around repeated use of `git commit` and `git checkout`. Commits are unique states of your code that are identified by a unique hash.

Checkout is used to revert, roll back, and change your local application state for different development tasks. Titan allows you to do the same for your application data with `titan commit` and `titan checkout`, allowing developers to easily keep their application code and application data in the same state.

An easy way to manage and group commits is with tagging. Tags allow a developer to identify and manage commits in a more readable manner instead of relying on IDs that are uniquely generated and not easy to remember.

Git: `git commit`, `git checkout`, `git branch`, `git tag`

Titan: `titan commit`, `titan checkout`, `titan tag`
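To make the commit, checkout and tag ideas concrete, here is a minimal sketch in Python. It is a toy in-memory model and not how Titan or Git is implemented; the `DataRepo` class and its method names are hypothetical and exist only for this illustration. It shows a data state being saved under a hash, labelled with a readable tag, and later restored.

```python
import copy
import hashlib
import json


class DataRepo:
    """Toy in-memory model of commit/checkout/tag for application data.

    Hypothetical illustration only; this is not Titan's API or storage format.
    """

    def __init__(self):
        self.working = {}   # current, mutable data state
        self.commits = {}   # commit id -> saved snapshot
        self.tags = {}      # readable tag name -> commit id

    def commit(self, message):
        # Hash the message together with the data to derive a commit id.
        payload = json.dumps(
            {"message": message, "data": self.working}, sort_keys=True
        )
        commit_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()
        # Store a deep copy so later edits do not alter the saved state.
        self.commits[commit_id] = copy.deepcopy(self.working)
        return commit_id

    def tag(self, name, commit_id):
        if commit_id not in self.commits:
            raise KeyError(f"unknown commit: {commit_id}")
        self.tags[name] = commit_id

    def checkout(self, ref):
        # Accept either a tag name or a raw commit id.
        commit_id = self.tags.get(ref, ref)
        self.working = copy.deepcopy(self.commits[commit_id])


repo = DataRepo()
repo.working["users"] = ["alice"]
first = repo.commit("seed users")
repo.tag("baseline", first)

repo.working["users"].append("bob")   # experiment with the data
repo.commit("add bob")

repo.checkout("baseline")             # roll back using the readable tag
print(repo.working)                   # {'users': ['alice']}
```

The tag is what makes the rollback readable: the developer asks for `baseline` rather than a 40-character hash.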

Collaborating

With Git, individual developers are able to make code changes and then share them with other team members for collaboration, review, and even automated testing. The `git remote` commands configure the remote repositories with which commits are shared via `git push` and `git pull`. Titan also allows data states to be shared with `titan remote` commands. New team members can copy a repository to their local machine with `git clone` and `titan clone`.

Git: `git remote`, `git push`, `git pull`, `git clone`

Titan: `titan remote`, `titan push`, `titan pull`, `titan clone`
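The sharing workflow can be sketched in the same spirit. The following Python is a toy model under stated assumptions: the `Repo` class and the `push`, `pull` and `clone` functions are hypothetical names for this illustration, and the transfer logic is a simplification that does not reflect Titan's or Git's actual protocols. It shows only the basic idea that pushing and pulling copy whichever commits the other side is missing.

```python
import copy


class Repo:
    """Toy repository: a mapping of commit id -> data snapshot (hypothetical)."""

    def __init__(self):
        self.commits = {}


def push(source, target):
    """Copy commits that target lacks from source; return the number copied."""
    copied = 0
    for commit_id, snapshot in source.commits.items():
        if commit_id not in target.commits:
            target.commits[commit_id] = copy.deepcopy(snapshot)
            copied += 1
    return copied


def pull(local, remote):
    """Pulling is modelled as a push in the opposite direction."""
    return push(remote, local)


def clone(remote):
    """Create a fresh local repository holding everything on the remote."""
    local = Repo()
    pull(local, remote)
    return local


remote = Repo()
alice = Repo()
alice.commits["c1"] = {"users": ["alice"]}
alice.commits["c2"] = {"users": ["alice", "bob"]}

print(push(alice, remote))      # 2
bob = clone(remote)             # a new team member gets the full history
print(sorted(bob.commits))      # ['c1', 'c2']

alice.commits["c3"] = {"users": ["alice", "bob", "carol"]}
push(alice, remote)
print(pull(bob, remote))        # 1
```

After the final pull, both developers hold the same three data states, which is the property that lets a team reproduce each other's bugs against identical data.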

Writing code, not mocks

Developers have been reaping the benefits from version control for application code, but the data layer of their application is largely still an afterthought — remaining rigid and resistant to speed, self-service control, and automation.

Having access to high quality, accurate, portable data sets that are properly versioned allows developers to spend their time writing code and not writing mocks.

True Agile software development is only possible when the data itself is managed like code.

Derek Smart, quite smart, for sure.