
Alibaba debuts LLMs for Southeast Asia

Alibaba’s SeaLLMs are built to address the linguistic diversity and nuances of Southeast Asia, enabling businesses to deploy localised chatbots and translation applications

Alibaba has launched a slew of large language models (LLMs) that have been trained on Southeast Asian languages including Thai, Lao, Khmer, Burmese, Vietnamese, Indonesian, Malay and Tagalog.

Developed by Alibaba’s Damo research arm, these so-called SeaLLMs were designed to cater to the linguistic diversity of Southeast Asia. They enable businesses to deploy chatbots that not only understand local languages but also reflect the region’s social norms, customs, stylistic preferences and legal considerations.

The SeaLLMs have been open-sourced on Hugging Face, where their checkpoints are available for research and commercial use.

“In our ongoing effort to bridge the technological divide, we are thrilled to introduce SeaLLMs, a series of AI models that not only understand local languages, but also embrace the cultural richness of Southeast Asia,” said Lidong Bing, director of the language technology lab at Alibaba Damo Academy.

Luu Anh Tuan, assistant professor at Nanyang Technological University’s School of Computer Science and Engineering, said Alibaba’s efforts to create a multilingual LLM are impressive.

“This initiative has the potential to unlock new opportunities for millions who speak languages beyond English and Chinese. Alibaba’s efforts in championing inclusive technology have now reached a milestone with SeaLLMs’ launch,” he added.

Alibaba said a notable technical advantage of SeaLLMs is their efficiency, particularly with languages that do not use the Latin script. For languages such as Burmese, Khmer, Lao and Thai, the models need fewer tokens to represent the same text, which lets them process text up to nine times longer than other models such as ChatGPT can. This helps to reduce operational and computational costs, as well as the models’ environmental footprint.

“Furthermore, SeaLLM-13B, with 13 billion parameters, outshines comparable open-source models in a broad range of linguistic, knowledge-related, and safety tasks, setting a new standard for performance,” the company added.

When evaluated on the M3Exam benchmark that comprises exam papers of different academic levels, Alibaba said SeaLLMs displayed a “profound understanding of a spectrum of subjects, from science, chemistry, physics to economics, in Southeast Asian languages, outperforming its contemporaries”.

It claimed that SeaLLMs also excelled on the Flores benchmark, which assesses machine translation between English and languages such as Lao and Khmer, for which little data is available to train conversational AI systems.

Besides Alibaba, the Singapore government is also driving the development of LLMs trained in Southeast Asian languages. Earlier this month, it launched a two-year initiative to build multimodal, localised LLMs for Singapore and the region and drive deeper understanding of how LLMs work, among other goals.
