Date: 2024-09-10

Sony Research and AI Singapore have teamed up to enhance the capabilities of Sea-Lion, a family of large language models (LLMs) pre-trained and instruction-tuned for Southeast Asian languages.

The initial focus will be on refining Sea-Lion for the Tamil language, estimated to be spoken by 60 to 85 million people worldwide, with many in India and Southeast Asia.

Sony Research, which has a strong presence in India, said the partnership aims to address the underrepresentation of Southeast Asian languages in the global LLM landscape. The work will be conducted through its Sony AI division.

“Access to LLMs that address the global landscape of language and culture has been a barrier to driving research and developing new technologies that are representative and equitable for the global populations we serve,” said Hiroaki Kitano, president of Sony Research.

“In Southeast Asia specifically, there are more than a thousand different languages spoken by the citizens of the region. This linguistic diversity underscores the importance of ensuring AI models and tools are designed to support the needs of all populations around the world,” he added.

Kitano has been an active participant in the Singapore technology community and is affiliated with a number of research organisations and initiatives in the city-state. He is a member of Singapore’s Advisory Council on the Ethical Use of AI and Data.

Leslie Teo, senior director of AI products at AI Singapore, said the Sea-Lion model, with its Tamil language capabilities, holds great potential to boost the performance of new AI applications.

“We are particularly eager to contribute to the testing and refinement of the Sea-Lion models for Tamil and other Southeast Asian languages, while also sharing our expertise and best practices in LLM development,” he added.

Sea-Lion was developed by AI Singapore to address the need for localised LLMs that better reflect the context and values of Southeast Asia. The open-source model was designed to be smaller, more flexible and faster than commonly used LLMs in the market today.

Besides Sony Research, AI Singapore has partnered with IBM to test Sea-Lion using IBM’s AI technology and to incorporate AI governance into the model, helping companies manage AI compliance, risk and the model lifecycle.

It is also working with Google on a research project to build a corpus of training data that can be used to train, fine-tune and evaluate LLMs in Southeast Asian languages.