Feature

Translation software advances

Developing accurate translation software remains complicated and elusive.

Brian Clegg

Language is a barrier to business and demand for translation of foreign-language documents far outstrips the capabilities of expensive and over-stretched translation services. E-mail and the Web exacerbate the problem and the only long-term solution is to let the computer take over.

The concept of mechanical translation is surprisingly old. When Charles Babbage was petitioning the Government for funds to research a computing engine, one of the carrots he dangled in front of the bureaucrats was automatic translation to and from foreign languages. Babbage never got around to realising this claim - it is a big step from the cog wheels of Victorian computing to effortless machine translation.

Context is key

Even now, consistently good computer translation has proved elusive - it is all a matter of context. It is trivial to look up every word of a sentence in a dictionary and convert it into the equivalent in a different language, but languages have plenty of scope for confusion. One word can mean many different things and part of the richness of language is the use of metaphor to illustrate insubstantial concepts.

Take the sentence "time flies like an arrow". English readers know that this is not literal but computer software assumes that it is a description of time's aerodynamic properties. A machine translator could be programmed to treat the sentence specially but the rules would also fit "fruit flies like an apple". To realise that "flies" is now a noun, applied to small, irritating insects, takes a huge leap of context. A computer system could handle one of these phrases, but is unlikely to cope with both.

This does not make machine translation impossible. Present day software does make comprehensible, if inelegant, translations, particularly of technical and business documents which use a limited vocabulary and tightly-defined jargon.

According to Benoit Goes, translation product manager for Lernout & Hauspie (L&H), real-time simultaneous translation with a specialised vocabulary will be eminently possible in a few years. All that is missing, he claims, is the computing power to handle the three-part process of recognition, translation and speech.

One of the drivers behind the effort that has been put into machine translation is the Internet. Multi-language e-mails fly around the world daily. Although English is by far the most common language for Web sites, about 17% are in Japanese, Spanish and German and another 10% in French.

The only practical solution is an in-place translation service, like the one offered by the search engine AltaVista, that retains the format of the Web page but converts the text.

AltaVista also provides a facility to type in text and have it instantly rendered in a different language. The AltaVista site is powered by Systran, L&H's main rival, while L&H has its own Web server vehicle used by Microsoft and the free translation facility at the BT Connect site.

Multinational corporates have particular problems with translation. For such companies, the translation suppliers provide client-server systems. One or more translation servers on the company's network handle the conversion both of documents and e-mails to and from the local language. Inevitably, such translations can only be regarded as a rough draft - the L&H enterprise solution includes the option to route important documents to human translators to ensure exact translation.

Desktop translation

Where an enterprise solution is not available but a more sophisticated translation is required than is available from a free Internet service, there are desktop translation packages. The best-known is Power Translator Pro, now part of the L&H portfolio. Recent versions of such products can translate documents directly from word processors and plug into Web browsers and e-mail software to provide on-the-fly translation.

Whether using a client-server approach or a standalone solution, the technology of machine translation breaks down into a number of components.

To achieve a good result, text has to be translated on a sentence-by-sentence basis. Within the sentence the individual words are linked to possible dictionary meanings, the sentence is parsed and then the parsed elements of the sentence are translated. A key resource in this process is the dictionary, or, more accurately, dictionaries.

The heavy-duty workhorse is a stem dictionary. This assigns special codes to each word to specify properties like part of speech, syntax, semantics and potential target language meanings. Where a word has several different meanings, the word will be cross-referenced to the alternative meanings and possible ambiguities over which part of speech is being dealt with, until parsing makes this clear. For example, is "flies" a noun or a verb?
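As a rough illustration of the idea, a stem dictionary can be pictured as a table that maps each stem to every sense it might carry, leaving the choice open until the parser has decided on a part of speech. The sketch below is a toy under stated assumptions, not how any commercial product stores its data: the `Sense` record, the `STEMS` table, the function names and the French target words are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    part_of_speech: str  # e.g. "noun" or "verb"
    target: str          # candidate target-language word (French in this toy)


# Hypothetical toy stem dictionary: stem -> all candidate senses.
STEM_DICTIONARY = {
    "fly": [Sense("verb", "voler"), Sense("noun", "mouche")],
    "time": [Sense("noun", "temps")],
    "arrow": [Sense("noun", "flèche")],
}

# Hypothetical hand-written table mapping inflected forms to stems.
STEMS = {"flies": "fly", "time": "time", "arrow": "arrow"}


def candidate_senses(word):
    """Return every sense recorded for a word; nothing is decided yet."""
    stem = STEMS.get(word.lower(), word.lower())
    return STEM_DICTIONARY.get(stem, [])


def resolve(word, part_of_speech):
    """Keep only the senses matching the part of speech chosen by the parser."""
    return [s for s in candidate_senses(word) if s.part_of_speech == part_of_speech]
```

Here `candidate_senses("flies")` returns both the verb and the noun reading, and only a later call such as `resolve("flies", "noun")` narrows the choice once parsing has settled the question.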

A second dictionary is required to handle multiple-word expressions, where a combination of words can be regarded as a single word for the purposes of translation, or where the syntax depends on word combinations. There are also likely to be tertiary dictionaries for customer-specific terms and vocabulary.
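One simple way to picture the multiple-word dictionary is a greedy longest-match pass that runs before the word-by-word look-up and merges any known expression into a single token. This is a minimal sketch of that general technique, not a description of any vendor's implementation; the `MULTIWORD` entries and the `group_expressions` function are hypothetical.

```python
# Hypothetical multi-word dictionary: tuples of lower-case words -> one unit.
MULTIWORD = {
    ("fruit", "flies"): "fruit_flies",
    ("in", "spite", "of"): "in_spite_of",
}
MAX_LEN = max(len(key) for key in MULTIWORD)


def group_expressions(words):
    """Greedy longest-match: merge known multi-word expressions into single tokens."""
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest possible expression first, down to two words.
        for n in range(min(MAX_LEN, len(words) - i), 1, -1):
            key = tuple(w.lower() for w in words[i:i + n])
            if key in MULTIWORD:
                tokens.append(MULTIWORD[key])
                i += n
                break
        else:
            # No expression starts here, so keep the single word.
            tokens.append(words[i])
            i += 1
    return tokens
```

With this toy table, "fruit flies like an apple" is grouped so that "fruit flies" becomes one unit, while "time flies like an arrow" passes through unchanged and "flies" remains ambiguous for the parser to resolve.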

With the dictionaries to call on, the most sophisticated part of the translator is the parser. This analyses each sentence and attempts to understand its structure and the context of the words in the sentence.

The combination of parsing and dictionary look-up alone is not enough. Depending on the target, the parsed and translated components of the sentence will have to be reassembled in different ways, for example, putting the verb at the end of a phrase in German. Finally, a sentence is produced in the target language.
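The reassembly step can be sketched as a reordering of already-translated constituents according to a pattern for the target language. The example below assumes a single, very simple subordinate clause and a hypothetical `CLAUSE_ORDER` table; real systems need far richer grammar rules than this.

```python
# Hypothetical role orderings for one simple subordinate clause pattern.
CLAUSE_ORDER = {
    "en": ["conjunction", "subject", "verb", "object"],
    "de": ["conjunction", "subject", "object", "verb"],  # verb goes last
}


def reassemble(parsed, target):
    """Join translated constituents in the order the target language expects."""
    order = CLAUSE_ORDER[target]
    return " ".join(parsed[role] for role in order if role in parsed)


# Constituents as they might look after parsing and dictionary look-up.
clause_en = {"conjunction": "because", "subject": "I", "verb": "read", "object": "the book"}
clause_de = {"conjunction": "weil", "subject": "ich", "verb": "lese", "object": "das Buch"}

english = reassemble(clause_en, "en")
german = reassemble(clause_de, "de")
```

The same parsed roles yield "because I read the book" in English order and "weil ich das Buch lese" in German order, with the verb moved to the end of the clause.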

Perfect machine translation is still a long way off, but the computer is encroaching upon the business requirements. Increasingly we can use technology to peer over the language barrier.

The pitfalls of current machine translation

This is an actual e-mail received by the company Creativity Unleashed. The author wanted to point out that the company's Web address, www.cul.co.uk, has unfortunate connotations in the French language. Unable to write in English, he used an automatic translator. Although the approximate meaning is clear, it demonstrates very effectively why machine translation is not yet a practical business tool for outgoing messages:

Please allow the transfer, I use a mechanical software because I very English of cannot. On the 14éme, in the porque one, I slap a search with the form returned www.cul.co.uk. Then to say to you, cul is a bad French word? It average rest-on the flesh of the rectum of anybody. Since this, cannot think you the need to want the nation French with the arrangement of creative. Thus I give to help in all fraternity, to think please for the change.

Familiar the most pleasant

Henri.

Online translation links

  • AltaVista - www.babelfish.altavista.com

  • BT Connect - www.btconnect.com

  • Lernout & Hauspie - www.lhsl.com

  • Systran - www.systransoft.com

  • Centre for Speech Technology Research - www.cstr.ed.ac.uk

  • Stanford Research Institute - www-speech.sri.com/

  • Carnegie Mellon University - www.speech.cs.cmu.edu/speech/



    This was first published in January 2001

     
