Percona engineering lead: Into the ‘open’ database universe
This is a guest post for Computer Weekly Open Source Insider written by Barrett Chambers in his capacity as director for solutions engineering at open source database specialist company Percona.
Suggesting that there are perhaps three key trends we need to think about in the world of database deployment today, Chambers points to open source database adoption, data sovereignty and the negotiation of cloud workloads (hint: Kubernetes) as primary drivers.
Chambers writes in full as follows…
Trend 1 – open source
The first trend we need to talk about is open source. The majority of developers use open source technology, but there are some nuances here. I feel you should have an ‘open source first’ methodology. Using proprietary software can be the right thing for you, but it can also lead to a lot of unnecessary costs. For example, NASA was found to be overspending on its Oracle licensing by an estimated $15 million… and that was just to avoid going into a full-scale audit. The result of this is that companies will spend more on licenses than they need to, which means that budgets are then not going into areas where you actually need that money in order to meet your business objectives.
Alongside this, you can see that a lot of companies are focused on reducing their costs right now, based on the current economic climate. This has always been a big factor in why people consider using open source – in a survey we ran, 80 percent of the people who responded said that moving to open source technology was driven by the potential to save on costs. However, it’s not the only reason – we’ve also got freedom from vendor lock-in. Some 62 percent of respondents said vendor lock-in was an important reason for migrating to open source. Companies – and tech professionals themselves – don’t want to be too dependent on any one provider, in case that comes back to bite them in the future.
Another reason is the community that exists around an open source project – some of these open source databases have a huge community of contributors, as well as open source extensions that you can use for backups and high availability. This makes it easier to get help with implementations or with specific tooling to fill a need. If you have that need, doubtless someone else will have had that need in the past as well.
To make the most of this, establish an open source software adoption strategy internally. This means confirming what your approach to software is and how this will run in practice. Rather than giving developers carte blanche to pick tools they like, there should be some thought and justification that goes into those decisions over time. As well as evangelising a particular open source project, developers will also have to make sure they have confidence in the long-term future of that project, for example that it has those high availability tools, robust backup tools and so on.
I would also say that data-developers need to prioritise Postgres. At Percona, we support MySQL, MongoDB and PostgreSQL. We started around MySQL and it is our largest customer base. However, what we’re seeing is a trend for people to want to choose PostgreSQL instead of MySQL or MongoDB. The community around Postgres is a large one and there are lots of great tools developed by the community that are well-supported.
Organisations also have to look at how to evaluate the return on investment they get from integration work. We have talked to a lot of customers at Percona who are looking to migrate away from their proprietary backend software, but it’s not always feasible for them to do so. Sometimes you have to leave those systems in place if they have significant amounts of embedded code. If that amount of code is large – say thousands of lines – it can take months or years to migrate off that system and then it may not be worth the cost to move. Instead, it may be better to bring in other applications over time that can run in parallel and then let the original one slowly die off.
In order to make this decision, you will want to evaluate and compare the ROI you would get from migrating your existing application and working on that code to what it would cost to run in parallel. This will be what helps you make sure that whatever path you take is a good step forward.
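As a rough illustration of that comparison, the sketch below totals the cost of each path over a planning horizon and reports which one comes out cheaper. It is a deliberately simplified model: it compares costs only and ignores any difference in benefits, and every name and figure in it (`Option`, `total_cost`, the amounts) is a hypothetical placeholder rather than real customer data.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One path forward; every figure used below is a made-up placeholder."""
    name: str
    upfront_cost: float      # one-off engineering or migration cost
    annual_run_cost: float   # licences, hosting and support per year


def total_cost(option: Option, years: int) -> float:
    """Total cost of an option over the planning horizon."""
    return option.upfront_cost + option.annual_run_cost * years


def cheaper_option(a: Option, b: Option, years: int) -> Option:
    """Return whichever option costs less over the given number of years."""
    return a if total_cost(a, years) <= total_cost(b, years) else b


# Hypothetical numbers, purely for illustration.
migrate = Option("migrate now", upfront_cost=500_000, annual_run_cost=50_000)
parallel = Option("run in parallel", upfront_cost=100_000, annual_run_cost=200_000)

for years in (1, 3, 5):
    best = cheaper_option(migrate, parallel, years)
    print(f"{years} year(s): {best.name}")
```

With these placeholder figures, the answer flips as the horizon lengthens, which is why the time period you plan over matters as much as the headline migration cost.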
Trend 2 – data sovereignty & repatriation
The next major trend is around data sovereignty and the repatriation of cloud workloads. Over the past 10 years, we’ve seen many companies choose to run their applications, databases and service technologies in the cloud. These are great for getting started fast. However, these options don’t necessarily translate to every on-premises deployment or support those hybrid cloud strategies. For those in charge of IT, data sovereignty and industry compliance factor into their decisions around designing IT architectures.
Cloud providers can and do handle a lot of those issues for you. But there are more rules and regulations to follow around data location and these can lead to more compliance concerns. There are regional considerations, like the European Union with GDPR, and there are also specific countries, like Germany for example, that have their own laws on privacy and data security.
In response to these rules, you have to be agile enough to deploy your application infrastructure and your database stack where it suits you, rather than relying on third-party providers to deliver. Taking a platform engineering approach supports this – Gartner predicts that 80 percent of software engineering organisations will establish platform teams by 2026, using self-service portals that can deliver what developers use to build their applications. This platform engineering approach essentially brings the user experience from cloud service providers internally into your own organisation. For developers, this makes things simple: they just click deploy and automatically get their own instances, with the right database backup approach in place and integration with the other application backend services.
To manage these challenges, first establish a platform strategy. If you haven’t already done so, consider what tools you will use and what to prioritise first. Secondly, look at your regulatory compliance requirements and what data you have to store. If you only have customers in one country, like the United States (or just here in Florida, for example), it might not make sense to start creating a cloud-native, deploy anywhere solution for your infrastructure when you can use a cloud service provider and start with Database as a Service (DBaaS). However, if you have operations in more than one country or have to handle more sensitive data, then you may have to adapt or change your strategy over time.
Third, move to a deployment-agnostic approach. Rather than depending on a particular provider to run your workloads, ask how you can adopt technology that lets you choose where it runs. Adopting a container-based deployment methodology allows you to do this, while using Kubernetes can make it easier to manage and orchestrate these containers at scale. Lastly, looking at your cloud workloads can also help you manage your costs.
On smaller workloads, cloud deployments can be very cost-effective. However, as you scale up, the costs can escalate. For example, you may have to move to a larger cloud instance to keep up with I/O or processing demand and this can be expensive as you move from instance to instance. Rather than simply moving up a tier every time, compare your costs here to other options that you could use instead, or look at whether your workload is actually still set up in the right way around queries and indexes. What might seem like a huge amount more work could be a change in your query needs that you have not yet optimised for.
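To show what checking queries and indexes can look like in practice, here is a minimal sketch that asks the database for its query plan before and after adding an index. It uses Python's built-in SQLite module purely as a stand-in so that the example runs anywhere; MySQL and PostgreSQL have their own `EXPLAIN` output, and the table, column and index names here are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

sql = "SELECT total FROM orders WHERE customer_id = 42"

# Without an index, the filter on customer_id forces a scan of the whole table.
plan_before = query_plan(conn, sql)

# With an index on the filtered column, the database can look the rows up directly.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of the table and the second a search using the new index. A query that scans every row is the kind of workload that can push you up an instance tier without the data itself having grown much.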
Lastly on this, repatriation – moving that workload or data from one service to another – is a move you will have to consider, even if you are completely satisfied with the cloud service you use. You never know what might happen in the future, from a change in business circumstances to a disaster event that you could not predict. Getting data out of one service and into another is part of any disaster recovery plan, so you should investigate any data egress costs you might have to pay. If you are already committed to one service, this may be a cost you have to factor in for your future planning; if not, then you should be aware before you sign any contract.
Trend 3 – Kubernetes
Deploying applications in the cloud started off by following the same rules as traditional on-premises tiered application deployments. However, as more developers used cloud-native technologies for their applications, other elements of the application infrastructure followed suit. This includes databases. The Cloud Native Computing Foundation found that 96% of organisations are either adopting or have already adopted a Kubernetes strategy.
First off, it’s important to recognise that databases normally use persistent storage to store their data over time. In comparison, tools like Kubernetes sprang up to manage workloads that were not persistent and where stateless workloads could be created at any time to keep up with demand. To run stateful databases alongside stateless application containers, you have to bridge that gap. Kubernetes operators were created to step into the breach, ensuring that you can have persistent volume claims for your database containers. This allowed developers to build the services they need from their databases on top of those containers.
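To make the idea of a persistent volume claim concrete, the sketch below builds a minimal Kubernetes StatefulSet manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. This is a plain Kubernetes resource written by hand, not the output of any particular operator, and the function name, resource name, image tag and storage size are all assumptions chosen for illustration. It also leaves out the credentials, services and backup configuration that a real deployment (or an operator) would add.

```python
import json


def postgres_statefulset(name: str, storage: str, replicas: int = 1) -> dict:
    """Build a minimal StatefulSet manifest with one persistent volume claim per pod."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "postgres",
                        "image": "postgres:16",  # example image tag, an assumption
                        # The database writes to this path, which is backed by
                        # the claim named "data" defined below.
                        "volumeMounts": [{
                            "name": "data",
                            "mountPath": "/var/lib/postgresql/data",
                        }],
                    }],
                },
            },
            # Each pod gets its own claim, so its data survives a pod restart.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


manifest = postgres_statefulset("demo-db", storage="10Gi")
print(json.dumps(manifest, indent=2))
```

The `volumeClaimTemplates` section is what separates this from a stateless deployment: the storage is requested once per pod and reattached when that pod is rescheduled, rather than being thrown away with the container.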
So why isn’t everyone moving their databases to Kubernetes, making this the de facto way to deploy already? The biggest reason, according to the CNCF report, is lack of in-house skills and manpower. Some companies [Ed: and we can say that that list might include CBOE, Amazon and Percona] have made things easier to get started so you can build up your internal team to deploy these open source databases on Kubernetes-based backends.
Kubernetes operators can get you started with managing your containers, but that is not all they can provide. Look at the community and the tools that are available to provide those enterprise-level features you need – do they support or integrate with your Kubernetes operator? Or can you pick an operator that already includes and supports those features as standard? Either way, this can make it easier to adopt databases on Kubernetes and run them in production.