Inside Alibaba’s NLP labs

Alibaba is among a growing crop of technology companies that are rising to the challenge of solving the toughest problems in natural language processing

At Alibaba’s natural language processing (NLP) research facilities spread across the US, China and Singapore, some of the world’s top researchers are solving the most challenging problems in the field of artificial intelligence (AI).

In March 2020, the researchers achieved a significant breakthrough by topping the General Language Understanding Evaluation (Glue) benchmark, which assesses the performance of NLP models in single-sentence and sentence-pair language understanding tasks.

These models enable text to be represented semantically as continuous vectors, so that meanings and relationships between sentences can be computed to perform sentiment analyses, offer product recommendations, and power chatbots and search engines.
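As a rough illustration of how relationships between sentences can be computed once text is represented as vectors, the sketch below ranks candidate sentences by cosine similarity to a query. The vectors and sentences are made up for the example, and the function names are hypothetical; a real system would obtain the vectors from a trained language model, and nothing here is drawn from Alibaba's own code.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, candidates):
    """Sort (sentence, score) pairs from most to least similar to the query."""
    scored = [
        (sentence, cosine_similarity(query_vector, vector))
        for sentence, vector in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up 4-dimensional sentence vectors, for illustration only.
query = [0.9, 0.1, 0.3, 0.0]
candidates = {
    "where is my parcel": [0.8, 0.2, 0.4, 0.1],
    "how do I reset my password": [0.1, 0.9, 0.0, 0.3],
}

for sentence, score in rank_by_similarity(query, candidates):
    print(f"{score:.3f}  {sentence}")
```

A score close to 1 means the two vectors point in nearly the same direction, which is how a search engine or chatbot might decide that two differently worded sentences are about the same thing.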

“When you enter a query, you want to know what the question is about so you can provide more accurate answers,” said Si Luo, head of Alibaba’s NLP research. “That will help to improve customer service and clickthrough rates of search results.”

Alibaba isn’t the only company honing its NLP capabilities, with other technology giants such as Baidu, Microsoft, Google and Ping An doing the same. Alibaba has since been overtaken by Ping An and Baidu in the Glue benchmark, though the performance scores of their NLP models remain close.

In an interview with Computer Weekly, Seattle-based Si said that although the benchmark is an important test of Alibaba’s NLP capabilities, just one researcher has been tasked with working on it, on a part-time basis. “Our major task is still to use our technical development to support Alibaba’s business,” he said.

Si was a tenured professor of computer science at Purdue University. While he has earned several academic accolades, including a career award from the US National Science Foundation, he wanted to do something with real-world impact.

“When you develop an algorithm, you want to show that it can benefit hundreds of thousands of customers. Publishing academic papers won’t help you achieve that,” he added.

Alibaba’s growing technology clout and footprint spanning e-commerce and logistics to financial services and cloud computing have attracted researchers like Si, who singled out Alibaba’s “big platforms for NLP and artificial intelligence” as a big draw. The company also taps fresh talent from top universities such as Tsinghua and Peking in China, and Princeton and Duke in the US. 

One of the beneficiaries of NLP is translation systems. To facilitate cross-border e-commerce, Si said Alibaba has built a translation platform to overcome the language barrier.

“As you can imagine, we want to do cross-border business, so if the information about products is originally in Chinese, we need to translate the information into Russian if the products are to be sold in Russia,” Si said.

In the past 18 months, Alibaba has also started building domain-specific NLP capabilities to handle the lexicon used in different fields. To process the data, it uses techniques such as word segmentation and named entity recognition, before applying other “vertical technologies” to analyse e-commerce revenues, contracts and electronic medical records, among other types of information.

More recently, in the fight against the Covid-19 pandemic, Si’s team developed models that were used in the text analysis of medical records and in epidemiological investigations conducted by China’s Centre for Disease Control and Prevention in several Chinese cities.

“We’ve optimised our machine translation techniques in the medical domain, to enable doctors around the world to communicate with each other through DingTalk,” he said. “For example, the service can translate between Italian and Chinese so doctors in the two countries can share their experience in fighting Covid-19.” 

But despite the promise of NLP in breaking down language barriers in commerce and expanding access to the world’s information, it is still challenging to codify the cultural nuances of human language, which continues to evolve over time.

To improve the accuracy of machine translations, computers can be programmed to compare machine translations with human translations. “This can be done automatically, so if the machine translation overlaps a lot with the human translation, then it’s good,” Si said.
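The overlap check Si describes can be sketched in a few lines of code. The example below computes a clipped n-gram precision, the basic building block of overlap metrics such as BLEU: the share of word sequences in the machine translation that also appear in the human one. It is a minimal sketch under simplifying assumptions (whitespace tokenisation, a single reference translation), with hypothetical function names and example sentences, and is not a description of the scoring Alibaba actually uses.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-word sequences in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(machine, human, n=1):
    """Fraction of the machine translation's n-grams found in the human one.

    Each n-gram is credited at most as many times as it occurs in the
    human translation, so repeating a matching word does not raise the score.
    """
    machine_counts = Counter(ngrams(machine.lower().split(), n))
    human_counts = Counter(ngrams(human.lower().split(), n))
    total = sum(machine_counts.values())
    if total == 0:
        return 0.0  # machine output is shorter than n words
    matched = sum(
        min(count, human_counts[gram])
        for gram, count in machine_counts.items()
    )
    return matched / total


# Hypothetical product-description translations, for illustration only.
human = "the red dress ships in two days"
machine = "the red dress ships within two days"

print(f"1-gram overlap: {clipped_precision(machine, human, n=1):.2f}")
print(f"2-gram overlap: {clipped_precision(machine, human, n=2):.2f}")
```

Longer n-grams reward correct word order as well as word choice, which is why overlap metrics typically combine several values of n rather than relying on single words alone.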

Human translators can also help determine whether a translation is accurate, and a representative sample of their translations can be parsed by NLP algorithms. In cross-border e-commerce, this helps to speed up translations of large corpora of product information, Si said.

“In this case, we can improve the translation quality for hundreds and millions of products by only utilising a small set of human translations,” he added.
