This series examines the leading trends shaping modern software application development.
As we initially discussed here, with so many platform-level changes now playing out across the technology landscape, how should we think about the cloud-native, open-compliant, mobile-first, Agile-enriched, AI-fuelled, bot-filled world of coding, and how do these forces combine to create the new world of modern programming?
This contribution comes from Daniel Kobran, co-founder of Paperspace — the company is known for its GPU cloud tools built for developers that it says power next-generation workflows and the future of intelligent applications.
Paperspace is a term used in CAD to refer to the window into a 3D world… the company says that it’s a metaphor for the notion of a portal into the limitless power of the cloud.
Kobran writes as follows…
MLOps is the inevitable confluence of DevOps and Machine Learning. Why inevitable? For years the machine learning world has operated in a manner reminiscent of siloed software development patterns from the 1990s — there is practically zero automation, collaboration is a mess, pipelines are hacked together from brittle scripts, visibility is practically nonexistent… and CI/CD is an altogether foreign concept.
MLOps offers a framework for overcoming this entropy and has brought a new wave of change to machine learning developers and enterprise teams.
So why is MLOps the topic on everyone’s mind? Why does MLOps matter? And what does the trend of MLOps suggest for the future of enterprise machine learning?
The software toolstack
Let’s face it, there are a lot of tools in the average software toolstack.
Software engineering teams rely on dozens of deployment tools for each release — from source code management systems to automated testing suites, performance monitoring tools and event-based alert systems. Everybody recognises these components of the deployment toolstack as essential blocks in the software development cycle — but this wasn't always the case. Now, popular tools such as GitHub, CircleCI, Jenkins, Docker, New Relic and an endless number of equivalents and contemporaries are in wide use around the world.
Pure software concepts like these can and should be applied homogeneously to machine learning. In machine learning, you still need source control for your code. You still need to monitor data pipelines, containerise applications, provision machines and test deployment endpoints. Machine learning may demand greater scale and more complex manipulation (especially with data), but the primitives are the same and so are the benefits of pipeline automation.
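As a minimal sketch of that carry-over (the schema and function names here are invented for illustration, not any particular product's API), a data-pipeline check in an ML codebase can be written and unit-tested exactly like any other software component:

```python
# Hypothetical data-pipeline validation step: rows that don't match the
# expected schema are filtered out before they can poison training.
EXPECTED_SCHEMA = {"user_id": int, "age": int, "score": float}

def validate_rows(rows):
    """Return only the rows matching the expected schema; raise on empty input."""
    if not rows:
        raise ValueError("pipeline received no data")
    valid = []
    for row in rows:
        if set(row) == set(EXPECTED_SCHEMA) and all(
            isinstance(row[col], typ) for col, typ in EXPECTED_SCHEMA.items()
        ):
            valid.append(row)
    return valid

good = {"user_id": 1, "age": 34, "score": 0.9}
bad = {"user_id": 2, "age": "unknown", "score": 0.5}  # age has the wrong type
print(len(validate_rows([good, bad])))  # only the well-formed row passes
```

Nothing here is ML-specific — which is exactly the point: the same testing discipline applies whether the payload is a web request or a training batch.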
The four components of successful MLOps
MLOps has a model shape all of its own.
MLOps is a set of practices that provide determinism, scalability, agility and governance in the model development and deployment pipeline. This new (and essentially very modern) paradigm focuses on four key areas within the model training, tuning and deployment cycle: machine learning development must be reproducible, it must be collaborative, it must be scalable and it must be continuous.
So let’s look at each of these in turn.
Reproducibility means the ability to reconstruct a previous machine learning model within a few percentage points of accuracy in order to improve it or to communicate its methodology to internal, external, or regulatory stakeholders. This requires traceability for inputs, which may include dataset, code commit, dependencies and packages, driver version, low-level libraries, container or runtime, parameters used to train the model, training hardware specifications and specific ML inputs such as initialisation of layer weights.
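One lightweight way to get that traceability — a sketch only, with illustrative field names rather than any vendor's format — is to hash a manifest of every input to a training run, so any two runs can be compared for exact reproducibility:

```python
import hashlib
import json

def run_fingerprint(manifest):
    """Hash a manifest of training inputs; identical inputs give identical IDs."""
    canonical = json.dumps(manifest, sort_keys=True)  # stable key order
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

manifest = {
    "code_commit": "9f2c1ab",            # source control revision
    "dataset_version": "v3",             # immutable dataset snapshot
    "dependencies": {"torch": "2.2.0"},  # pinned packages
    "cuda_driver": "535.104",            # low-level library version
    "params": {"lr": 0.001, "seed": 42},
    "hardware": "1x A100-40GB",
}
print(run_fingerprint(manifest))
```

If any input changes — a different seed, a new driver version — the fingerprint changes with it, making silent drift between "identical" runs easy to detect.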
This is not a trivial task!
For a single developer working on a single model, collaboration is largely irrelevant. But for teams that are scaling both headcount and productionised models, the system will fail extraordinarily quickly without good collaborative processes in place.
Successful collaboration for ML teams requires a unified hub where all activity, lineage and model performance is tracked. This includes the full stack (from concept to R&D and through to production) and requires visibility of notebooks, training runs, hyperparameter searches, visualisations, metrics, datasets, code references and model artifacts.
That’s a lot of surface area that needs to be shared!
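To make the idea concrete, here is a deliberately toy, in-memory sketch of the kind of shared registry such a hub provides (not a real product API — class and field names are invented):

```python
class RunRegistry:
    """Toy in-memory hub tracking lineage and metrics for every training run."""

    def __init__(self):
        self.runs = []

    def log(self, name, dataset, code_ref, metrics):
        """Record a run along with the inputs that produced it."""
        self.runs.append(
            {"name": name, "dataset": dataset, "code_ref": code_ref, "metrics": metrics}
        )

    def best(self, metric):
        """Return the run with the highest value for the given metric."""
        return max(self.runs, key=lambda r: r["metrics"].get(metric, float("-inf")))

hub = RunRegistry()
hub.log("baseline", "imagenet-v1", "9f2c1ab", {"accuracy": 0.81})
hub.log("tuned-lr", "imagenet-v1", "b41d77e", {"accuracy": 0.84})
print(hub.best("accuracy")["name"])  # -> tuned-lr
```

A real hub adds notebooks, visualisations and artifact storage on top, but the core value is the same: every run is queryable by anyone on the team, not trapped on one laptop.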
It is possible, but not advisable, to ask each machine learning engineer to master infrastructure deployment.
Because of the massive datasets and expensive compute requirements in machine learning, infrastructure management is a black hole of complexity that can consume a team’s productivity. Far better is to make compute infrastructure available on-demand and preconfigured for each member of the data science team.
If each person can self-provision preconfigured resources, the team can nearly eliminate scale-related provisioning bottlenecks.
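In practice that can look like a catalogue of vetted machine templates that anyone on the team can request — the template names and fields below are hypothetical, purely for illustration:

```python
# Hypothetical catalogue of preconfigured compute templates.
TEMPLATES = {
    "cpu-small": {"vcpus": 4, "ram_gb": 16, "gpu": None, "image": "ml-base:1.4"},
    "gpu-standard": {"vcpus": 8, "ram_gb": 64, "gpu": "A100", "image": "ml-base:1.4"},
}

def self_provision(template_name, owner):
    """Return a ready-to-use machine spec from a vetted template."""
    if template_name not in TEMPLATES:
        raise KeyError(f"unknown template: {template_name}")
    spec = dict(TEMPLATES[template_name])  # copy so the template stays pristine
    spec["owner"] = owner
    return spec

machine = self_provision("gpu-standard", owner="dana")
print(machine["gpu"])  # -> A100
```

Because the templates are preconfigured (drivers, libraries, container image), no data scientist ever has to become an infrastructure expert just to start training.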
The most important feature of great MLOps is a CI/CD pipeline for model development.
Pushing code to GitHub should trigger automatic compiling, testing and deployment — and this process should be the same for every member of the team in pursuit of perfect determinism. Through standardisation of the ML lifecycle it's possible to increase model output velocity manyfold.
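A deterministic pipeline of stages, run identically on every push, might be sketched like this (the stage bodies are placeholders standing in for real test, train and deploy steps):

```python
def run_pipeline(stages):
    """Run CI/CD stages in order; stop at the first failure."""
    results = []
    for name, stage in stages:
        ok = stage()
        results.append((name, ok))
        if not ok:
            break  # never deploy past a failed stage
    return results

# Placeholder stages; in a real pipeline each would shell out to tooling.
stages = [
    ("unit-tests", lambda: True),
    ("train-model", lambda: True),
    ("eval-threshold", lambda: False),  # e.g. accuracy below the release bar
    ("deploy", lambda: True),
]
print(run_pipeline(stages))  # deploy never runs after the eval gate fails
```

The key property is the gate: a model that misses its evaluation threshold can never reach deployment, for any team member, on any push.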
The future of MLOps?
There is mounting pressure to standardise the machine learning lifecycle.
In a few short years the thought of manually training and deploying models will seem like a preindustrial curiosity. The lessons and legacy of the software development industry's push toward DevOps years ago will make this transitional period in machine learning fly by, faster than anyone (or any model) could predict.
For all of our sakes, please don’t be a preindustrial curiosity exhibit.