Using the now well-worn example of how computer vision systems learn, Talaga reminded us that a human infant typically needs to see only four or five dogs to be able to recognise one in the future.
But, as we know, training a computer to recognise a dog in an image and to eliminate false positives (a computer can mistake a fox, a coyote, a wolf, a dingo or a jackal for a dog) is likely to require large data sets of hundreds of thousands of images.
He thinks that no matter how much technology improves, we’re unlikely ever to be able to train a computer vision system in the same way as a human baby, because the resulting margin for error would be unacceptable.
So what’s the secret to feeding computer vision systems properly? Talaga says that when training machine systems, data integrity is key.
“Especially when data sets are often pulled from the Internet, appropriate data governance processes must be put in place to ensure the integrity of data used for learning and limit access to the data to those who need it. Because they are so mission critical and can be time-intensive to assemble, huge data sets used for training machines can also be very valuable – meaning that data security must also be a key consideration, particularly when companies and nation states are engaged in an AI arms race and looking to gain any possible advantage,” said Talend’s Talaga.
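One small part of the integrity checking Talaga describes can be sketched as a checksum manifest: record a digest for every training image when the set is assembled, then re-check the files before each training run. This is a minimal illustration using Python's standard library, not Talend's method; the folder layout and the function names `build_manifest` and `find_tampered` are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(image_dir: Path) -> dict[str, str]:
    """Record a digest for every file in a (hypothetical) training folder."""
    return {p.name: sha256_of(p) for p in sorted(image_dir.iterdir()) if p.is_file()}


def find_tampered(image_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose content no longer matches the manifest."""
    problems = []
    for name, expected in manifest.items():
        path = image_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return sorted(problems)
```

A check like this only catches files that were changed or removed after the manifest was built. It does not flag newly added files, and it says nothing about whether an image was labelled correctly in the first place, which is where the wider governance processes come in.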
Talend cites a working example where AI has had an advantage over humans.
Farmers fight a constant battle to recognise and treat the weeds that harm their crops and reduce yields. Weeds can be so similar in appearance that humans can struggle to distinguish between varieties and therefore to identify the most effective and least disruptive treatment.
Bayer Digital Farming has been working with Talend to build a computer vision system called Weedscout, which enables farmers to send in photos of weeds and receive an immediate answer on weed type.
“Talend processes and combines farmers’ photos, information from farmers about the weeds and their crops along with geolocation information from the mobile device, plus other bundled XMP metadata… and stores this data in the photo database. Talend also sends this data to an image-recognition module, which uses neural networks trained by a database of weed photos. When a farmer sends in a photo, the app identifies it and sends the result back to the farmer’s mobile device, usually in less than a minute. The app can recognise nearly 70 different varieties of weeds by matching them with its database of more than 100,000 photos,” said Talaga.
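The flow Talaga describes can be sketched in a few lines: combine the photo with the farmer's input, geolocation and XMP metadata, store the record, then pass the photo to a recognition step and return the answer. This is an illustrative outline only, not Talend's or Bayer's code. The `Submission` fields, the list standing in for the photo database and the `classify` callable standing in for the neural-network module are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Submission:
    """What a farmer's phone sends in (field names are hypothetical)."""
    photo: bytes
    crop: str
    farmer_notes: str
    latitude: float
    longitude: float
    xmp: dict = field(default_factory=dict)


def build_record(sub: Submission) -> dict:
    """Combine the photo with farmer input, geolocation and XMP metadata."""
    return {
        "photo": sub.photo,
        "crop": sub.crop,
        "notes": sub.farmer_notes,
        "location": (sub.latitude, sub.longitude),
        "xmp": dict(sub.xmp),
    }


def handle_submission(
    sub: Submission,
    database: list,
    classify: Callable[[bytes], str],
) -> str:
    """Store the combined record, then ask the recognition step for a weed type."""
    record = build_record(sub)
    database.append(record)           # stand-in for the photo database
    return classify(record["photo"])  # stand-in for the image-recognition module
```

In use, the caller would supply a real model behind `classify`; a stub such as `lambda photo: "unknown"` is enough to exercise the flow end to end.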
The technology used for weed-image recognition is based on self-learning algorithms.
To help ensure the app’s answers are comprehensive and error-free, the image database must be fed with further weed images. To date, more than 100,000 photos have been uploaded to the database, which is hosted on a private AWS cloud, and the app can identify nearly 70 different varieties of weed, helping farmers to increase yields and profits through computer vision.