Alibaba’s artificial intelligence (AI) algorithm has outperformed humans in answering image-related questions, taking pole position in the global VQA (visual question answering) challenge for the first time.
The annual VQA challenge pits AI algorithms from the likes of Facebook and Microsoft against each other to produce natural language answers to questions about an image, such as working out how many children appear in a photo of a room full of people.
This year, the challenge contained more than 250,000 images and 1.1 million questions. Alibaba’s AliceMind algorithm recorded an 81.26% accuracy rate in answering questions related to images, higher than the 80.83% accuracy rate recorded by humans.
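The article does not say how these accuracy rates are calculated. In the VQA challenge, each question comes with 10 human answers, and a machine answer earns partial credit according to how many humans gave the same answer: roughly, full credit if at least three did, with the score averaged over every way of leaving one annotator out. The sketch below implements that published rule for one question and averages it over a dataset. It assumes a simplified answer normalization (lowercasing and trimming only), the function names are hypothetical, and it is not the official evaluation script.

```python
from itertools import combinations
from typing import Sequence


def normalize(answer: str) -> str:
    # Simplified normalization (assumption): the official scorer also
    # strips punctuation and articles and maps number words to digits.
    return answer.strip().lower()


def vqa_accuracy(predicted: str, human_answers: Sequence[str]) -> float:
    """Soft accuracy for one question.

    For each subset that leaves out one annotator, the answer scores
    min(#matching annotators / 3, 1); the subset scores are averaged.
    """
    if not human_answers:
        raise ValueError("need at least one human answer")
    pred = normalize(predicted)
    answers = [normalize(a) for a in human_answers]
    n = len(answers)
    scores = []
    for subset in combinations(range(n), n - 1):
        matches = sum(1 for i in subset if answers[i] == pred)
        scores.append(min(matches / 3, 1.0))
    return sum(scores) / len(scores)


def dataset_accuracy(
    predictions: Sequence[str], references: Sequence[Sequence[str]]
) -> float:
    """Mean per-question accuracy as a percentage (e.g. 81.26)."""
    if len(predictions) != len(references) or not predictions:
        raise ValueError("need one non-empty reference list per prediction")
    per_question = [vqa_accuracy(p, r) for p, r in zip(predictions, references)]
    return 100.0 * sum(per_question) / len(per_question)
```

For a question such as "How many children are in the photo?" where two annotators answered 2 and eight answered 3, predicting 3 scores 1.0 and predicting 2 scores about 0.6, so an answer that only some humans gave still earns partial credit.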
The algorithm was designed by Alibaba’s Damo Academy research group using a slew of proprietary technologies, including diverse visual representations, multimodal pretrained language models, adaptive cross-modal semantic fusion and alignment technology.
This has enabled Alibaba to make significant progress not only in analysing images and understanding the intent behind questions, but also in answering them with proper reasoning, expressed in a human-like conversational style.
Alibaba has already applied the VQA technology widely. For example, it is used in AliMe, Alibaba’s intelligent chatbot, which tens of thousands of merchants use on Alibaba’s retail platforms.
“We are proud that we have achieved another significant milestone in machine intelligence, which underscores our continuous efforts in driving the research and development in related AI fields,” said Si Luo, head of natural language processing (NLP) at Alibaba Damo Academy.
“This is not implying humans will be replaced by robots one day. Rather, we are confident that smarter machines can be used to assist our daily work and life, and hence people can focus on the creative tasks that they are best at,” said Si.
Si added that VQA could also be used to search for products on e-commerce sites, to analyse medical images to diagnose diseases, and in smart driving, where it could perform basic analysis of photos captured by in-car cameras.
Alibaba has been making waves in NLP. In March 2020, it achieved a significant breakthrough by topping the General Language Understanding Evaluation (GLUE) benchmark, which assesses how well NLP models perform on single-sentence and sentence-pair language understanding tasks.
These models represent text as continuous vectors that capture its meaning, so that the relationships between sentences can be computed and used to perform sentiment analysis, offer product recommendations, and power chatbots and search engines.
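The idea of representing sentences as continuous vectors can be illustrated with cosine similarity, a common way of comparing such vectors. In this minimal sketch the four-dimensional embeddings are invented for illustration; a real pretrained language model would compute them and typically uses hundreds of dimensions. It is not Alibaba's model, and `most_similar` and `EMBEDDINGS` are hypothetical names.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query: str, embeddings: Dict[str, Sequence[float]]) -> str:
    """Return the other sentence whose vector points closest to the query's."""
    q = embeddings[query]
    candidates = [s for s in embeddings if s != query]
    return max(candidates, key=lambda s: cosine_similarity(q, embeddings[s]))


# Hypothetical embeddings, invented for illustration only.
EMBEDDINGS = {
    "The delivery arrived late.": [0.9, 0.1, 0.3, 0.0],
    "My parcel was delayed.": [0.8, 0.2, 0.4, 0.1],
    "This jacket is very warm.": [0.0, 0.9, 0.1, 0.7],
}

if __name__ == "__main__":
    # prints "My parcel was delayed."
    print(most_similar("The delivery arrived late.", EMBEDDINGS))
```

Cosine similarity ignores vector length and compares only direction, which is one reason it is a common choice for comparing sentence embeddings in search and recommendation.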
In 2019, Alibaba’s model also exceeded the human score on the Microsoft Machine Reading Comprehension (MS Marco) dataset, one of the AI world’s most challenging tests of reading comprehension. The model scored 0.54 on the MS Marco question-answering task, beating the human benchmark of 0.539 provided by Microsoft.
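The article reports the MS Marco result as a single number. To our understanding, the MS Marco question-answering leaderboard ranks systems by ROUGE-L (alongside BLEU-1), a score based on the longest common subsequence (LCS) of words shared by a generated answer and a reference answer. Assuming the 0.54 and 0.539 figures are ROUGE-L, the sketch below shows how such a score is computed for one answer pair. Whitespace tokenization, a single reference answer and the recall weighting `beta = 1.2` (taken from the widely used pycocoevalcap implementation) are all simplifying assumptions; this is not Microsoft's official scorer.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.2) -> float:
    """ROUGE-L F-measure; beta > 1 weights recall more than precision."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return ((1 + beta**2) * precision * recall) / (recall + beta**2 * precision)
```

For example, `rouge_l("the cat sat on the mat", "the cat is on the mat")` shares the five-word subsequence "the cat on the mat", so precision and recall are both 5/6 and the score is about 0.833; the `beta` weighting only changes the result when the two answers differ in length.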
In 2018, Alibaba also beat the human benchmark on the Stanford Question Answering Dataset (SQuAD), another of the most popular machine reading comprehension challenges worldwide.