Alibaba has launched a slew of large language models (LLMs) that have been trained on Southeast Asian languages including Thai, Lao, Khmer, Burmese, Vietnamese, Indonesian, Malay and Tagalog.
Developed by Alibaba's Damo research arm, these so-called SeaLLMs were designed to cater to the linguistic diversity of Southeast Asia, enabling businesses to leverage chatbots that not only comprehend but also reflect the social norms, customs, stylistic preferences and legal considerations in the region.
The SeaLLMs have been open-sourced on Hugging Face, with checkpoints released and available for both research and commercial use.
“In our ongoing effort to bridge the technological divide, we are thrilled to introduce SeaLLMs, a series of AI models that not only understand local languages, but also embrace the cultural richness of Southeast Asia,” said Lidong Bing, director of the language technology lab at Alibaba Damo Academy.
Luu Anh Tuan, assistant professor at Nanyang Technological University’s computer science and engineering school, said Alibaba’s efforts to create a multilingual LLM are impressive.
“This initiative has the potential to unlock new opportunities for millions who speak languages beyond English and Chinese. Alibaba’s efforts in championing inclusive technology have now reached a milestone with SeaLLMs’ launch,” he added.
Alibaba said a notable technical advantage of SeaLLMs is their efficiency, particularly with languages written in non-Latin scripts. For languages such as Burmese, Khmer, Lao and Thai, the models can interpret and process text up to nine times longer than other models such as ChatGPT can, which is another way of saying they need fewer tokens for the same length of text. This helps to reduce operational and computational costs and lowers the environmental footprint.
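To see why token counts matter for non-Latin scripts, consider a toy comparison. The sketch below is not how SeaLLMs or ChatGPT tokenise text; it assumes two hypothetical tokenisers, one that falls back to a token per UTF-8 byte and one whose vocabulary (the illustrative `thai_vocab`) holds each Thai character as a single entry. Thai characters occupy three bytes each in UTF-8, so the byte fallback spends three tokens where the extended vocabulary spends one.

```python
def byte_fallback_tokens(text: str) -> int:
    """Token count if every UTF-8 byte becomes its own token."""
    return len(text.encode("utf-8"))


def char_vocab_tokens(text: str, vocab: set) -> int:
    """Token count if characters in `vocab` cost one token each and
    any other character falls back to one token per UTF-8 byte."""
    return sum(1 if ch in vocab else len(ch.encode("utf-8")) for ch in text)


# Thai greeting "sawasdee", written with escapes so the code points are explicit.
sample = "\u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35"

# Hypothetical vocabulary that covers every character in the sample.
thai_vocab = set(sample)

baseline = byte_fallback_tokens(sample)           # 6 characters x 3 bytes
extended = char_vocab_tokens(sample, thai_vocab)  # 6 characters x 1 token
ratio = baseline / extended

print(f"byte fallback: {baseline} tokens, extended vocab: {extended} tokens, ratio: {ratio}")
```

Production tokenisers merge bytes and characters into longer units, so real ratios differ from this toy figure, but the principle is the same: the fewer tokens a model needs per sentence, the more text fits into its context window and the less computation each request costs.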
“Furthermore, SeaLLM-13B, with 13 billion parameters, outshines comparable open-source models in a broad range of linguistic, knowledge-related, and safety tasks, setting a new standard for performance,” the company added.
When evaluated on the M3Exam benchmark that comprises exam papers of different academic levels, Alibaba said SeaLLMs displayed a “profound understanding of a spectrum of subjects, from science, chemistry, physics to economics, in Southeast Asian languages, outperforming its contemporaries”.
It claimed that SeaLLMs also excelled in the Flores benchmark, which assesses machine translation capabilities between English and languages such as Lao and Khmer that have limited data for training conversational AI systems.
Besides Alibaba, the Singapore government is also driving the development of LLMs trained on Southeast Asian languages. Earlier this month, it launched a two-year initiative to build multimodal, localised LLMs for Singapore and the region, and to drive deeper understanding of how LLMs work, among other goals.