AI developer toolset series: Paperspace CEO on why abstraction is key to training machine brains
The Computer Weekly Developer Network is in the engine room, covered in grease and looking for Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) tools for software application developers to use.
This post is part of a series which also runs as a main feature in Computer Weekly.
With so much AI, ML & DL power in development and so many new neural network brains to build for our applications, how should programmers ‘kit out’ their AI toolbox?
How much grease and gearing should they get their hands dirty with… and, which robot torque wrench should we start with?
This is a guest post written in its entirety by Dillon Erb, CEO of Paperspace, a high-performance GPU-accelerated cloud computing and Machine Learning (ML) development platform for building, training and deploying machine learning models.
Erb contends that Deep Learning (DL) development is easier than classical Machine Learning (ML) and writes as follows…
One of the key advantages of deep learning is that very little ‘feature engineering’ is required, which means a lot less hand-coding.
With a classical ML approach, laborious hand-crafted features were application specific, so features that worked in computer vision were completely useless in (for example) Natural Language Processing (NLP).
Today, with DL, which works on raw data, these different problems (NLP, image recognition etc.) become more similar in terms of the code itself.
Thus, doing simple tasks with known or structured data in a Proof of Concept (PoC) phase is surprisingly easy.
But working with unstructured data and stitching together disparate systems requires AI expertise combined with systems, networking, data and security skills, which is an entirely different domain.
This makes the move from R&D to production a very difficult task for most teams.
Abstraction: key to training DL models
Much of the early work in AI has gone into adding higher levels of abstraction to the underlying math libraries (CUDA, Intel’s MKL-DNN etc.). Frameworks such as PyTorch, TensorFlow, MXNet etc. operate at a much higher level and let a developer train a powerful deep learning model in fewer than 100 lines of code.
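To make the layering idea concrete, here is a minimal sketch in plain Python. It is not taken from any of the frameworks named above and all of its names (`fit`, `predict`, `squared_error_grad`) are hypothetical. It assumes a toy one-variable linear model and separates a generic training loop, written once, from the model-specific gradient. In a real framework the gradient would typically be derived automatically and the arithmetic handed to a lower-level math library.

```python
def fit(model, loss_grad, data, lr=0.05, epochs=200):
    """Generic training loop (hypothetical helper): written once, reused for any model."""
    for _ in range(epochs):
        for x, y in data:
            grads = loss_grad(model, x, y)      # model-specific part
            for name, g in grads.items():
                model[name] -= lr * g           # plain gradient-descent update
    return model


def predict(model, x):
    """Toy model: a straight line with weight w and bias b."""
    return model["w"] * x + model["b"]


def squared_error_grad(model, x, y):
    """Hand-written gradient of (predict(x) - y)**2 with respect to w and b."""
    err = predict(model, x) - y
    return {"w": 2 * err * x, "b": 2 * err}


# Synthetic, noise-free data generated from y = 3x + 1 (an assumption for illustration).
data = [(k / 10, 3.0 * (k / 10) + 1.0) for k in range(-10, 11)]
model = fit({"w": 0.0, "b": 0.0}, squared_error_grad, data)
print(round(model["w"], 3), round(model["b"], 3))
```

The point of the split is that `fit` never needs to change when the model does; only the small model-specific functions are swapped, which is the kind of reuse the higher-level frameworks provide at scale.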
More recently there has been a large amount of effort invested in so-called “AutoML” systems that can automate even the higher-level parts of developing ML pipelines.
Having a defined ecosystem is particularly hard today because in the early days of any technology cycle there is a lot of churn — meaning that the go-to technology stack changes relatively quickly and is not fully agreed upon.
Certain patterns have emerged for sure, e.g. Jupyter Notebooks for exploration, batch job-running architectures for large-scale model training etc. But the particulars of the ecosystem are not fully defined. We are seeing a lot of growth in PyTorch, for example, and in tools like ONNX, which let ML models be more reusable.
But open source offerings will largely be sufficient.
As we move up to the enterprise, there will be a degree of scrutiny on the tools, along with particular requirements that will likely necessitate an investment in proprietary tools in some areas of the stack, or at least a desire for commercial-grade support.
Benchmarking abstraction tools
Benchmarking tools would certainly be ideal. However, there are so many different parameters by which to judge a tool stack that each use case might have its own metrics. For example, a developer tool must be expressive and powerful, whereas reliability and determinism might be more highly valued in certain contexts (healthcare, cybersecurity, banking etc.).
AI developers end up spending an unreasonable portion of their time on training setup and execution due to a severe lack of abstraction tools.
Because these are such highly paid engineers, this places a massive burden on the industry. Any tool that can enable devs to become more efficient and productive has potential.
Quantifying/benchmarking this is not easy, but we can intuitively say that tools like Stripe, GitHub etc. increase developer velocity by some meaningful percentage.
A concrete example would be distributed training: setting up core infrastructure to enable distributed training is incredibly low-level and, frankly, not possible for most developers, but the performance wins are huge.
An abstraction layer (platform) that gives this capability to every developer will be very powerful.
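As a rough illustration of what such a layer would hide, the sketch below simulates data-parallel training in a single Python process. It is an assumption-laden toy, not Paperspace's implementation or any real library's API: the function names are hypothetical, the 'workers' are just list slices, and `all_reduce_mean` stands in for the network communication step that real infrastructure has to provide.

```python
def local_gradient(w, b, shard):
    """Mean squared-error gradient for a line y = w*x + b on one worker's shard."""
    gw = gb = 0.0
    for x, y in shard:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    n = len(shard)
    return gw / n, gb / n


def all_reduce_mean(grads):
    """Average gradients across workers (stand-in for a networked all-reduce)."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def train_data_parallel(data, num_workers=4, lr=0.1, steps=500):
    """Hypothetical entry point: the one call a developer would ideally see."""
    # Split the data into one shard per simulated worker.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w = b = 0.0
    for _ in range(steps):
        # On a real cluster these would run concurrently on separate machines.
        grads = [local_gradient(w, b, s) for s in shards]
        gw, gb = all_reduce_mean(grads)
        w -= lr * gw
        b -= lr * gb
    return w, b


# Synthetic, noise-free data from y = 2x - 0.5; 20 points split evenly over 4 workers.
data = [(k / 10, 2.0 * (k / 10) - 0.5) for k in range(-10, 10)]
w, b = train_data_parallel(data)
print(round(w, 3), round(b, 3))
```

Everything below the `train_data_parallel` signature (sharding, gradient exchange, synchronised updates) is the low-level plumbing; in a real deployment it would also involve provisioning machines, networking and fault handling, none of which is modelled here.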