She's the sedentary Lara Croft - real-time news, virtually, from a
virtual person. Danny Bradbury reports
PA Newswire put an artificial person online last week, with the
launch of Ananova, a computer-generated newsreader designed to
deliver personal bulletins over the Web.
Computer-generated cartoon characters are not a new thing -
visitors to online chat rooms have been using symbolic figures
called avatars for the past few years. But this is one of the first
times that such a character has been designed with a voice and
facial expressions of its own.
The virtual newsreader, who made her debut on 19 April, reads
news articles created by journalists, but also relies on a
real-time news search engine that provides the basis for the
text-based news that PA subsidiary Ananova currently provides to
portal customers.
How Ananova speaks
The service brings to mind the old Max Headroom TV series of the
mid-1980s. It works using a combination of online news feed
processing, text-to-speech technology from computerised speech and
voice recognition specialist Lernout & Hauspie, and a digitally
animated character created by specialist multimedia house Digital
Animations.
News articles are coded into an XML-based format that includes
tags specifically designed to influence the tone of the character's
voice, and her facial expression.
Jonathan Jowitt, project manager for the Ananova character,
explains, "We had to develop a technique where traditional text
wouldn't sound flat and boring. We had to take that initial level
of conversation and animate it, putting in facial and body
movement. Reporters enter data into the XML template."
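The article does not publish Ananova's markup, so the following is only a rough sketch of how a script with tone and expression tags might be read. The element and attribute names (bulletin, story, line, tone, expression) are invented for illustration and are not the real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element and attribute names are invented
# for illustration and are not Ananova's actual XML format.
SCRIPT = """
<bulletin>
  <story id="1">
    <line tone="serious" expression="concerned">Traffic is at a standstill on the motorway.</line>
    <line tone="light" expression="smile">The weather, at least, is looking bright.</line>
  </story>
</bulletin>
"""


def read_cues(xml_text):
    """Return one (text, tone, expression) tuple per <line> element."""
    root = ET.fromstring(xml_text)
    cues = []
    for line in root.iter("line"):
        cues.append((
            (line.text or "").strip(),
            line.get("tone", "neutral"),        # fall back to a neutral voice
            line.get("expression", "neutral"),  # fall back to a neutral face
        ))
    return cues


cues = read_cues(SCRIPT)
for text, tone, expression in cues:
    print(f"[{tone}/{expression}] {text}")
```

Keeping the cues as attributes on each line of text means a reporter could fill in a template without touching the speech or animation software.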
The XML-based script is fed into the Realspeak text-to-speech
engine that Lernout & Hauspie configured for the company. This
produces the initial pass for the sound file that will be used as
the character's voice. The output also contains timing information
that dictates how the phonemes (the small pieces of speech that make
up a statement) will be timed to sound natural.
Ananova uses the American English version of the text-to-speech
engine, which also determines the prosody (melody) of the sentence,
so that the intonation sounds human. The sound is created by
choosing phonemes from a large database of sounds, built from a
large number of recordings of a real person.
Patrick Salenbien, account manager at Lernout & Hauspie,
explains that the software produces a set of markers describing
which phonemes it will use, enabling third-party software
developers to synchronise their own programs with the speech
output. This output, accessed through an API, enables Digital
Animations to synchronise the face of the character with the spoken
words.
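As an illustration of the synchronisation step, the sketch below turns a list of timed phoneme markers into animation keyframes. It does not use the Realspeak API, whose calls are not described here. The PhonemeMarker type, the mouth-shape table and the 25 frames-per-second rate are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PhonemeMarker:
    """Hypothetical marker: a phoneme label plus its timing in the audio."""
    phoneme: str
    start_ms: int
    duration_ms: int


# Assumed mapping from phoneme labels to mouth shapes (illustrative only).
MOUTH_SHAPES = {
    "HH": "open",
    "EH": "open",
    "L": "tongue_up",
    "OW": "rounded",
    "M": "closed",
}


def to_keyframes(markers, fps=25):
    """Convert markers to (start_frame, end_frame, mouth_shape) tuples."""
    keyframes = []
    for m in markers:
        start_frame = m.start_ms * fps // 1000
        end_frame = (m.start_ms + m.duration_ms) * fps // 1000
        shape = MOUTH_SHAPES.get(m.phoneme, "neutral")
        keyframes.append((start_frame, end_frame, shape))
    return keyframes


markers = [
    PhonemeMarker("HH", 0, 80),
    PhonemeMarker("EH", 80, 120),
    PhonemeMarker("L", 200, 80),
    PhonemeMarker("OW", 280, 200),
]
keyframes = to_keyframes(markers)
print(keyframes)
```

Because each keyframe is derived from the same timings as the audio, the mouth shapes stay aligned with the spoken words however long the sentence is.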
Facial expressions
Mike Hambly, chief executive of Digital Animations, says the
company created the graphical character to be scalable in its
complexity, so that the movement of the face could be made more or
less realistic depending on the amount of processing power
available.
Many computer-animated models in the past have worked by simply
detaching the jaw and moving it independently of the rest of the
face. Digital Animations engineered its model with a virtual set of
'muscles' that are animated by the output from the L&H Realspeak
engine, so all parts of the face move accordingly, he says.
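One way to picture a muscle-based face that scales with available processing power is sketched below. It is not Digital Animations' engine: the muscle names, weights and priority order are hypothetical, and the idea of simply capping the number of active muscles is an assumption made for illustration.

```python
# Hypothetical muscle activations (0.0 to 1.0) for two mouth shapes.
POSES = {
    "open": {"jaw": 0.8, "lip_corner": 0.2, "cheek": 0.1},
    "rounded": {"jaw": 0.4, "lip_pucker": 0.9, "cheek": 0.2},
}

# Assumed ordering of muscles from most to least visually important.
MUSCLE_PRIORITY = ["jaw", "lip_pucker", "lip_corner", "cheek", "brow"]


def pose_for(shape, max_muscles):
    """Return the activations for a shape, limited to the top-priority muscles."""
    active = set(MUSCLE_PRIORITY[:max_muscles])
    return {m: w for m, w in POSES.get(shape, {}).items() if m in active}


low_detail = pose_for("rounded", 1)   # jaw only, like the older models
high_detail = pose_for("rounded", 5)  # every muscle the pose defines
print(low_detail)
print(high_detail)
```

With a budget of one muscle the model degrades to the jaw-only approach described above; raising the budget brings in the lips and cheeks as processing power allows.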
Jowitt says the system works by processing the speech and the
image synchronisation at the back end, and then rendering it in a
streaming protocol. It will initially be delivered in Realplayer
format, but others may become available later.
At present, Jowitt estimates that the hardware takes about three
times as long to process a formatted news article into speech and
animation as the character takes to speak the story, so there is
still a slight lag between the article being entered into the
system and its announcement. "Within a very short time we will
be able to speed up so that it only takes as long as it would to
read the story," he adds. This will turn it into a truly real-time
service.
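To put rough numbers on that lag, the short calculation below applies Jowitt's estimate to a story of a given spoken length. The 60-second story is an arbitrary example, and treating the lag as a simple multiple of spoken length is a simplifying assumption.

```python
def processing_lag_seconds(spoken_seconds, processing_factor):
    """Time needed to turn an article into speech and animation."""
    return spoken_seconds * processing_factor


story_length = 60  # seconds of speech; an illustrative figure only

current_lag = processing_lag_seconds(story_length, 3.0)  # about three times real time
target_lag = processing_lag_seconds(story_length, 1.0)   # processing matches reading speed

print(f"Current lag: {current_lag:.0f}s, target lag: {target_lag:.0f}s")
```

On these assumptions a one-minute story would take about three minutes to prepare today, falling to one minute once processing keeps pace with reading.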
Where next
The company is providing localised servers around the world to
maintain the quality of the multimedia signal by avoiding network
congestion.
Concentrating on server processing reduces the processing power
and memory footprint necessary to run the service on a client
device. This strategy is designed to keep the company's options
open. It wants to license the character so that it can make
different announcements in specific settings. It anticipates that
once the technology develops, mobile phone operators will make the
character accessible to cellular users. This could make online
mobile phone services much more consumer-friendly.
"She could also pop up in your car to say that the road you are
driving down is blocked, and to ask if you would like to be told a
different route," says Jowitt. On its Web site, Ananova speculates
about radio alarm clocks that are able to display the character.
She could wake people up early if news arose that might affect
their day, such as congested traffic. Any licence fees
produced by the character will generate a cut for Digital
Animations.
The Press Association is selling off the Ananova subsidiary,
claiming that its focus on consumer-oriented markets is
incompatible with the parent firm's business-to-business focus. PA
launched its business-to-business Web site on the same day as the
Ananova announcement, and you can find it at
www.pressassociation.press.net. Although it remains cagey about the
status of the sale, sources indicate that a number of buyers have
been shortlisted. Ananova made just over £4m last year from its
conventional Web news feeds.
The face that launched a thousand chips
- XML - used to provide Ananova with information about the news
stories she is reading so that she can adapt the tone of her voice
and facial expressions appropriately
- Text-to-speech - Lernout & Hauspie's Realspeak
text-to-speech engine uses an XML script to time Ananova's speech
patterns so that they sound more natural
- Animation - programming interfaces within Realspeak are used by
the bespoke animation engine to model Ananova's facial expressions
based on the words she is speaking
What Ananova will offer
- News bulletins updated every five minutes
- Personalised news updates based on your own profile
- Updates on transactions (for example, reporting when spare
theatre tickets become available), and automatic fulfilment of
those transactions
- Web searching to find information about subjects that you
specify
- Sports information including statistics, scores and
gossip
- Updates on shopping opportunities
- Information on UK-based entertainment events such as cinema,
comedy and music, based on a personal profile. This will also
include TV listing information