Merrill Lynch has developed an
enterprise computing grid that lets it run applications 800
times faster than before by harnessing the processing power of
disaster recovery servers and other under-utilised
resources.
The investment bank plans to use the grid to run simulations and
risk analysis for high-value derivatives trades. Ten applications
currently run on the grid, and the bank plans to have 30 running by
the end of the first quarter of 2007.
The speed at which crucial calculations can be completed has a material impact
on profitability. "If you are looking at a £200m deal and it takes
600 hours to run the calculations, you need to get this down to an
hour," said Juan Lando, who heads the grid centre of expertise
at Merrill Lynch.
Dedicated servers left hardware under-utilised, because users had
to over-specify them to cope with peaks in demand, Lando said.
"Management wanted to avoid having to build datacentres," he said.
Lando's team therefore identifies suitable applications and moves
them onto the bank's grid.
The grid works because all applications are written to a common
standard, said Lando. It uses the Red Hat Linux and Windows operating
systems, GemStone for data caching, and DataSynapse for management
and as the grid programming environment.
Applications are developed either for Microsoft .NET 2.0 on
Windows Server 2003, or for the standard Java Enterprise virtual
machine on Red Hat Linux.
How the grid works
Merrill Lynch's strategy for running applications on the grid is
known as "intelligent scavenging".
While some grid users try to exploit spare desktop PC capacity,
the investment bank prefers to boost processing power with
datacentre processors, because datacentre hardware runs in a
standard environment.
The Merrill Lynch grid implements a "follow the moon" policy,
using spare datacentre processing power at the end of the working
day, when the servers are less heavily used.
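The article does not describe how the grid decides which sites count as "end of the day". The following is a minimal Python sketch of a follow-the-moon eligibility check, assuming each site is tagged with a fixed UTC offset and that an overnight window of 19:00 to 07:00 local time marks spare capacity. The `Site` class, the site names and the window are hypothetical illustrations, not details of Merrill Lynch's or DataSynapse's software.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Site:
    name: str              # hypothetical site label
    utc_offset_hours: int  # fixed offset; a real scheduler would need DST-aware time zones


def is_off_hours(local_hour: int, start: int = 19, end: int = 7) -> bool:
    """True if local_hour falls in the assumed overnight window [start, 24) or [0, end)."""
    return local_hour >= start or local_hour < end


def eligible_sites(sites, now_utc):
    """Return the names of sites whose local clock is currently in off-hours."""
    result = []
    for site in sites:
        local = now_utc + timedelta(hours=site.utc_offset_hours)
        if is_off_hours(local.hour):
            result.append(site.name)
    return result


sites = [Site("london", 0), Site("new_york", -5), Site("tokyo", 9)]
now = datetime(2006, 11, 1, 2, 0, tzinfo=timezone.utc)
print(eligible_sites(sites, now))  # prints ['london', 'new_york']
```

At 02:00 UTC it is 02:00 in London and 21:00 the previous evening in New York, so both are lent to the grid, while Tokyo (11:00) keeps its servers for daytime work. As the clock advances, the set of eligible sites moves around the globe with the night.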
It also takes advantage of servers in disaster recovery sites and
datacentres that are usually unavailable for normal business
use.
The software monitors keyboard and mouse activity, automatically
releasing the disaster recovery site from the grid whenever a
system administrator needs to use it.
There is also a "red button", a script that a system
administrator can run to disconnect the site from the grid manually
if a disaster recovery plan is invoked. This approach gives
applications a massive amount of processing capacity to tap
into.
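The two release mechanisms described above, automatic release on administrator input and a manual "red button" override, can be sketched as a small state model. This is a hypothetical Python illustration, not the bank's actual script: the `DrSite` class, its method names and the assumed 300-second idle threshold before a site rejoins the grid are all inventions for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class DrSite:
    """Hypothetical model of a disaster recovery site lent to the grid."""
    name: str
    idle_threshold_s: float = 300.0  # assumed idle time before the site may rejoin
    last_input: float = field(default_factory=time.monotonic)
    red_button: bool = False         # set when a disaster recovery plan is invoked
    on_grid: bool = False

    def record_input(self, now=None):
        """Call on any keyboard or mouse event; releases the site immediately."""
        self.last_input = time.monotonic() if now is None else now
        self.on_grid = False

    def press_red_button(self):
        """Manual override: disconnect the site and keep it off the grid."""
        self.red_button = True
        self.on_grid = False

    def update(self, now=None):
        """Rejoin the grid only if there is no override and input has been idle long enough."""
        now = time.monotonic() if now is None else now
        idle = now - self.last_input
        self.on_grid = (not self.red_button) and idle >= self.idle_threshold_s
        return self.on_grid


site = DrSite("dr-east", last_input=0.0)
print(site.update(now=400.0))   # True: idle for 400 s, no override
site.record_input(now=410.0)    # administrator touches the keyboard
print(site.update(now=500.0))   # False: only 90 s idle
site.press_red_button()
print(site.update(now=2000.0))  # False: red button keeps the site off the grid
```

The design keeps the red button "sticky": once pressed, idle time alone can never return the site to the grid, which mirrors the article's point that the script is for cases where a disaster recovery plan has actually been invoked.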
In one instance, Lando's team assessed a complex Excel model, and
developers recoded its calculations in C++ so that they could
run on the grid, making the spreadsheet 10,000 times
faster.