This Computer Weekly Developer Network series is devoted to examining the leading trends that go towards defining the shape of modern software application development.
As we have initially discussed here, with so many new platform-level changes now playing out across the technology landscape, how should we think about the cloud-native, open-compliant, mobile-first, Agile-enriched, AI-fuelled, bot-filled world of coding and how do these forces now come together to create the new world of modern programming?
This contribution comes from Ariel Assaraf in his role as CEO of Coralogix — the company is known for its technology that maps software flows to automatically detect production problems and delivers ‘pinpoint insights’ for log analytics.
Assaraf writes as follows…
In 2009, high availability, seamless deployments and machine images exploded onto the software engineering scene. In response, Amazon, Google and Microsoft released a vast array of options and engineering took a huge leap forward.
Overnight, you could create a database with automatic backups, cross-site failover, seamless upgrades and much more. We had arrived at our DevOps paradise, but a new destination already loomed on the horizon.
But what was so special about 2009?
Patrick Debois ran the first ‘DevOps Days’ meetup and changed how we build and run software for the next decade. Cloud tooling was forced to step up its game. It democratised ‘production-readiness’ and gave everyone (with a credit card) the power to build the next Netflix.
So we got everything we wanted, right?
Right, but now there’s a new target. Let’s look at an example. Elastic Kubernetes Service (EKS) is a managed Kubernetes offering from Amazon. Running a Kubernetes cluster requires some skill, so a managed solution is a great option for organisations that don’t want to build the next Netflix. But if I want to maintain 99.95% uptime, which button do I press?
The truth is, this capability is hidden inside the Kubernetes universe. If I wish to ensure that my application bounces back in 30 seconds, there is a laundry list of values to configure. I have to dive into the exact detail I sought to avoid. We’re closing the chasm between dev and ops, but a new one has opened. Our new challenge is between our contracts and our code. We asked for a self-service IT department and the cloud delivered.
Now, we need more. We need automated compliance. So what can we do?
There are exciting changes coming in our cloud platforms, but first, we need a mindset shift. We can begin by making sure that we have a shared understanding of the importance of each component of our system. For example, if our service processes orders, what is the average value of those orders over an hour? This gives us some idea of the cost of failure.
Secondly, we can make engineers aware of the incentive behind each new release. If we believe that our new feature is going to bring in $10,000 a week, then delaying the change by a week costs $10,000. This is our cost of delay. Weighing up these two measurements, the cost of failure and the cost of delay, enables engineers to make an economic decision, rather than a purely technical one. A pattern like this is what Donald Reinertsen calls a decision rule.
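To make that trade-off concrete, here is a minimal sketch of such a decision rule in Python. It is an illustration rather than Reinertsen's own formulation: the function and field names are hypothetical, and the failure probability and expected outage length are assumptions that a team would have to estimate for itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseEconomics:
    """Both sides of the trade-off, expressed in the same currency."""
    cost_of_delay: float
    expected_cost_of_failure: float

    @property
    def ship_now(self) -> bool:
        # Ship when waiting is expected to cost more than a failure would.
        return self.cost_of_delay > self.expected_cost_of_failure


def weigh_release(weekly_feature_value: float,
                  weeks_of_delay: float,
                  hourly_order_value: float,
                  expected_outage_hours: float,
                  failure_probability: float) -> ReleaseEconomics:
    """Compare the cost of delaying a release with its expected cost of failure.

    failure_probability and expected_outage_hours are estimates (assumptions),
    not values that can be read off a dashboard.
    """
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    cost_of_delay = weekly_feature_value * weeks_of_delay
    expected_cost_of_failure = (
        failure_probability * hourly_order_value * expected_outage_hours
    )
    return ReleaseEconomics(cost_of_delay, expected_cost_of_failure)


if __name__ == "__main__":
    # The $10,000-a-week feature from the text, delayed by one week, against
    # a service assumed to process $2,000 of orders an hour with a 10% chance
    # of a two-hour outage.
    decision = weigh_release(10_000, 1, 2_000, 2, 0.10)
    print(decision, "ship now:", decision.ship_now)
```

The point of the sketch is not the arithmetic, which is trivial, but that both numbers end up in the same unit, so the conversation can be about money rather than instinct.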
Some companies go a step further with error budgets. Calculating an error budget is simple.
For example, if we have an SLA (or SLO) of 99.95% uptime in a given year, that leaves us with 4.38 hours of downtime to work with (0.05% of the 8,760 hours in a year). Downtime can be calculated automatically, from a variety of sources, such as logs and metrics services. If we have accrued 4 hours of downtime, we should scrutinise our deployments. If we have only 30 minutes of downtime, we may want to take risks. This gives us a data-driven, contractual approach to risk, rather than the ‘gut feeling’ or isolated engineering criteria that usually rule these situations.
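The error budget arithmetic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, a year is taken as 365 days, and the 80% threshold at which deployments come under scrutiny is an arbitrary example value rather than an industry standard.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def error_budget_hours(slo_percent: float,
                       period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime allowed by an uptime target over the given period."""
    if not 0.0 <= slo_percent <= 100.0:
        raise ValueError("slo_percent must be between 0 and 100")
    return period_hours * (100.0 - slo_percent) / 100.0


def budget_consumed(slo_percent: float, downtime_hours: float,
                    period_hours: float = HOURS_PER_YEAR) -> float:
    """Fraction of the error budget already used (1.0 means exhausted)."""
    return downtime_hours / error_budget_hours(slo_percent, period_hours)


def risk_posture(slo_percent: float, downtime_hours: float,
                 scrutiny_threshold: float = 0.8) -> str:
    """Turn budget consumption into a coarse deployment stance.

    scrutiny_threshold is a hypothetical cut-off chosen for illustration.
    """
    consumed = budget_consumed(slo_percent, downtime_hours)
    if consumed >= scrutiny_threshold:
        return "scrutinise deployments"
    return "room to take risks"


if __name__ == "__main__":
    print(f"Budget: {error_budget_hours(99.95):.2f} hours")  # about 4.38
    print("After 4 hours down:", risk_posture(99.95, 4.0))
    print("After 30 minutes down:", risk_posture(99.95, 0.5))
```

In practice the downtime figure would come from logs or a metrics service rather than being passed in by hand, which is what lets the check run automatically before each deployment.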
Closing the gap
The next big step in cloud tooling is going to be monumental, but before we embark on this new adventure, we need to bridge this gap between our contracts and our code. That means engineering taking ownership of every part of the system and making economic as well as technical decisions. With this outlook, we can unlock another level of DevOps success.