Auto-tech series - Civo: Why 'open' accessibility is critical for Machine Learning success

This is a guest post for the Computer Weekly Developer Network written by Josh Mesout, CIO at Civo – the company is known for its Kubernetes platform, its Platform-as-a-Service (PaaS) technologies and its wider approach to cloud-native deployment and management technologies.

Mesout writes in full as follows…

Interest in Artificial Intelligence (AI) is higher than ever.

The arrival of GPT-4 has brought interest in the benefits of AI to a fever pitch, with businesses investing significant capital to scope out how automation and AI can benefit the enterprise. These benefits vary, but some of the most prominent in the minds of executives include more efficient business processes and enabling the creation of new products and services. One of the most challenging areas of AI to implement is Machine Learning. In a world where many ML projects do not deliver, and others fail to survive the journey from prototype to production, how can engineers and developers realise the potential of ML in the new era of automation?

Figuring out reconfiguring

The time it takes to get insight from ML projects remains a huge obstacle to success.

Google Research found that using current methods, eight hours of ML engineering requires a huge backend of preparation: 96 hours of infrastructure engineering, 32 hours of data engineering and 24 hours of feature engineering. In effect, this means just 5% out of 160 hours total work is actually spent on ML engineering.
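The breakdown above can be checked with a few lines of Python. This is only an illustrative sketch: the hour figures are the ones quoted in the text, while the variable names are made up for the example.

```python
# Hours per activity, as quoted in the text (attributed to Google Research).
hours = {
    "ML engineering": 8,
    "infrastructure engineering": 96,
    "data engineering": 32,
    "feature engineering": 24,
}

total_hours = sum(hours.values())

# Share of the total effort taken by each activity, as a percentage.
shares = {task: 100 * h / total_hours for task, h in hours.items()}

for task, pct in shares.items():
    print(f"{task}: {hours[task]}h of {total_hours}h ({pct:.0f}%)")
```

On these figures, infrastructure engineering alone accounts for more than half of the total effort, which is the overhead the rest of this piece is concerned with.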

Developers are left to pour vast amounts of time into managing and reconfiguring complex components across their infrastructure. This can leave ML a closed shop for smaller firms. According to Anaconda’s State of Data Science 2022 report, 65% of companies lack the investment in tooling to enable high-quality ML production.

The role of open source 

The demanding nature of running ML is bringing open source approaches to the fore to start cutting down on this complexity and reducing barriers to entry. For smaller firms, it delivers a cost-effective and resource-efficient way of running ML algorithms. Indeed, many businesses simply do not have the time to invest two months getting up to speed with platforms like AWS SageMaker before accessing ML insights.

Open source tooling is also far more in demand than proprietary alternatives, and of superior product quality. Free from proprietary dependencies, open source tooling can be readily adjusted for specific cases and can therefore cut down on the complexity of extracting insights from ML.

Crucially, open source offers businesses a way to tap into the best available ML expertise. Just look at one of the most popular ML toolkits, Kubeflow. It helps organisations deploy and run ML workflows on cloud-native Kubernetes infrastructure and regularly receives contributions from leading lights in the industry. The latest release, Kubeflow 1.7, received code contributions from over 250 people from across the world of tech.

This gives firms access to advanced domain expertise that may have otherwise been out of reach. 

Interoperable tooling, please

If we are to drive continuous adoption and accessibility of ML, we need to work together as a community to build a thriving open source cloud ecosystem.

This begins with interoperable tooling. Developers do not want needlessly complex tooling. They want tools that are familiar, don’t require huge onboarding processes to get algorithms up and running and leave them well-placed to start realising value from AI.

There is also a range of technical solutions to ML’s problems. One particularly promising area is GPU Edge boxes. These allow ML to run effectively across a range of different use cases, supporting businesses where security or regulatory requirements mandate workloads being kept in-house. GPU instances themselves are built on fast launch times, bandwidth pooling and transparent pricing. This gives firms a way to rapidly get up and running with ML, without any damaging surprise costs.

There is a huge opportunity with ML.

By empowering the developer community with the open source tooling they need, anything is possible.
