Encord is a data-centric computer vision company.
The firm’s Encord Active is a free, open source, industry-agnostic (Ed: they mean it works in all verticals, presumably) toolkit that enables machine learning (ML) engineers and data scientists to understand and improve their training data quality.
It is also meant to help boost model performance.
The company says that for many use cases (self-driving cars and diagnostic medical models being prime examples), AI suffers from a ‘production gap’ between successful proof-of-concept models and models capable of running ‘in the wild’, i.e. in the real world.
Proof-of-concept models perform well in research environments but struggle to make predictions accurately and consistently in real-world scenarios.
This gap stems from problems with model robustness and reliability, which have hindered the widespread adoption of AI.
With Encord’s open source toolkit, ML engineers can bridge this gap using a new approach for investigating the quality of their data, labels and model performance. Data and label errors can severely impact a model’s performance, so continuously evaluating and improving training datasets is critical for ensuring high-quality predictions.
What is an AI label?
As TechTarget reminds us, “Data labeling is used when constructing ML algorithms for autonomous vehicles. Autonomous vehicles such as self-driving cars need to be able to tell the difference between objects in their course so that they can process the external world and drive safely. Data labeling is used to enable the car’s artificial intelligence (AI) to tell the difference between a person, the street, another car and the sky by labeling the key features of those objects or data points and looking for similarities between them.”
Encord’s new tool claims to give machine learning teams the power to find failure modes in their models, prioritise high-value data for labeling and drive smart data curation to improve model performance.
What is active learning?
Active learning, a process for training models in which the model asks for data that can help improve its performance, has gained traction as a theory among researchers, start-ups and enterprises.
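To make the idea concrete, here is a minimal sketch of one common active learning strategy, uncertainty sampling, in which the model 'asks' for labels on the items it is least sure about. This is an illustration only and is not taken from Encord Active; the function names, item ids and probabilities are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` unlabeled items the model is least sure about."""
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs: item id -> predicted class probabilities.
predictions = {
    "img_001": [0.98, 0.01, 0.01],
    "img_002": [0.40, 0.35, 0.25],
    "img_003": [0.70, 0.20, 0.10],
    "img_004": [0.34, 0.33, 0.33],
}

# The two most ambiguous images are sent to annotators first.
to_label = select_for_labeling(predictions, budget=2)
print(to_label)
```

In a real loop, the newly labeled items would be added to the training set, the model retrained and the selection repeated.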
Smaller AI companies, however, have not yet been able to implement usable active learning techniques. Encord Active allows companies of all sizes to move from theory to implementation by providing a new methodology based on ‘quality metrics’. Quality metrics are computed indexes added on top of your data, labels and models based on human-explainable concepts.
“As many ML engineers know, the performance of all models depends on the quality of their training data. Encord Active is first and foremost a framework built to help machine learning engineers understand and improve their data quality iteratively and effectively,” said Eric Landau, co-founder and CEO at Encord. “We want to contribute to the progression of the computer vision space as much as possible, so making Encord Active open source was a no-brainer.”
The quality metrics approach focuses on the automatic calculation of characteristics of images, labels, model predictions and metadata. ML teams are then presented with a breakdown of their data, label distribution and model performance by each metric.
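As a rough sketch of what a breakdown by metric can look like, the example below computes one human-explainable metric (image brightness) and reports model accuracy per brightness bucket. It is an assumption-laden illustration and not Encord Active's API; the metric, bucket edges, sample data and function names are all made up for the purpose.

```python
from collections import defaultdict


def brightness(pixels):
    """Mean grayscale intensity scaled to [0, 1]; a simple image-level metric."""
    return sum(pixels) / (255 * len(pixels))


def accuracy_by_bucket(samples, metric, edges):
    """Group samples by metric bucket and report prediction accuracy per bucket."""
    hits = defaultdict(list)
    for sample in samples:
        value = metric(sample["pixels"])
        # Index of the first bucket whose upper edge lies above the value.
        bucket = next((i for i, edge in enumerate(edges) if value < edge), len(edges))
        hits[bucket].append(sample["label"] == sample["prediction"])
    return {b: sum(h) / len(h) for b, h in sorted(hits.items())}


# Hypothetical samples: tiny grayscale images with a label and a model prediction.
samples = [
    {"pixels": [10, 20, 30, 20], "label": "car", "prediction": "person"},
    {"pixels": [15, 25, 10, 30], "label": "person", "prediction": "person"},
    {"pixels": [120, 130, 140, 110], "label": "car", "prediction": "car"},
    {"pixels": [200, 220, 240, 230], "label": "sky", "prediction": "sky"},
]

# Buckets: 0 = dark, 1 = medium, 2 = bright.
report = accuracy_by_bucket(samples, brightness, edges=[0.33, 0.66])
print(report)
```

A low score in one bucket (here, dark images) points to a slice of data worth inspecting, relabeling or collecting more of.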
According to Landau and team, “Encord Active is also the first tool to provide actionable end-to-end active learning workflows to create an environment where models can continuously learn and improve, similar to how humans do. Within the Encord ecosystem, users can not only find valuable data to label and find label errors to re-label but also complete the workflow cycle to fix these issues.”
Current active learning methods rely on ML engineers building their own tools and creating their own versions of quality metrics, which makes the process time-consuming and expensive. Encord Active removes that work by automating the computation of an assortment of pre-built quality metrics across the data, labels and model predictions.