Merrill Lynch has developed an enterprise computing grid that allows it to run applications 800 times faster than previously by putting to work the power of disaster recovery servers and other under-utilised resources.
The investment bank plans to use the grid to run simulations and risk analysis for high-value derivatives trades. Ten applications currently run on it, and the bank plans to have 30 running by the end of the first quarter of 2007.
How fast crucial calculations can be made has a material impact on profitability. "If you are looking at a £200m deal and it takes 600 hours to run the calculations, you need to get this down to an hour," said Juan Lando, who heads up the grid centre of expertise at Merrill Lynch.
Using dedicated servers led to an under-utilisation of hardware, as users needed to over-specify servers to cope with peaks in demand, Lando said. "Management wanted to avoid having to build datacentres," he said. Accordingly, Lando's team looks to identify and move suitable applications onto the bank's grid.
The grid works because applications are all written to a common standard, said Lando. It uses Red Hat Linux and Windows operating systems, GemStone for data caching and DataSynapse for both management and the grid programming environment.
Applications are developed either for Microsoft .NET 2.0 on Windows 2003 or for the standard Java Enterprise virtual machine on Red Hat Linux.
How the grid works
Merrill Lynch's strategy for running applications on the grid is known as "intelligent scavenging".
While some users try to make use of spare desktop PC capacity, the investment bank prefers using datacentre processors to boost processing power, because the datacentre hardware runs in a standard environment.
The Merrill Lynch grid implements a "follow the moon" policy to use free datacentre processing power at the end of the day, when the servers are being used less.
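As a rough illustration of a "follow the moon" policy, the Python sketch below picks the datacentres whose local working day has ended. It is not Merrill Lynch's implementation: the site names, the fixed UTC offsets and the 08:00 to 18:00 business hours are all assumptions made for the example.

```python
from datetime import datetime, time, timedelta, timezone

# Hypothetical sites with fixed UTC offsets (daylight saving ignored for brevity).
SITES = {"london": 0, "new_york": -5, "tokyo": 9}

BUSINESS_START = time(8, 0)   # assumed start of the local working day
BUSINESS_END = time(18, 0)    # assumed end of the local working day


def off_peak_sites(now_utc):
    """Return the sites whose local time falls outside business hours."""
    free = []
    for name, offset_hours in SITES.items():
        local = now_utc.astimezone(timezone(timedelta(hours=offset_hours)))
        if not (BUSINESS_START <= local.time() < BUSINESS_END):
            free.append(name)
    return free


if __name__ == "__main__":
    # At 22:00 UTC, London and Tokyo are outside the assumed working day.
    print(off_peak_sites(datetime(2007, 1, 15, 22, 0, tzinfo=timezone.utc)))
```

A scheduler built along these lines would send work to whichever sites the function returns, so the pool of available servers moves around the globe as each region's working day ends.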
It takes advantage of servers in disaster recovery sites and datacentres, which would usually be unavailable for normal business use.
The software monitors keyboard and mouse activity, automatically releasing the disaster recovery site from the grid whenever a system administrator needs to access the site.
There is also a "red button", a script that the system administrator can run to manually disconnect the site from the grid if a disaster recovery plan is invoked. This approach gives applications a massive amount of processing capacity to tap into.
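The release rules described above can be sketched in a few lines of Python. This is a simplified model rather than the bank's software: the `Site` class, the five-minute idle threshold and the site names are assumptions, and a real system would take activity readings from the operating system rather than from a stored number.

```python
from dataclasses import dataclass

IDLE_THRESHOLD_SECONDS = 300  # assumed: how long the console must be quiet


@dataclass
class Site:
    name: str
    seconds_since_input: float  # time since the last keyboard or mouse event
    red_button: bool = False    # set when an administrator disconnects the site


def available_for_grid(site: Site) -> bool:
    """A site may take grid work only if it is idle and no override is set."""
    if site.red_button:
        return False  # manual disconnect always wins
    return site.seconds_since_input >= IDLE_THRESHOLD_SECONDS


def schedulable(sites):
    """Names of the sites the grid may currently use."""
    return [s.name for s in sites if available_for_grid(s)]
```

For example, given an idle site, a site where an administrator has just touched the keyboard, and an idle site with the red button pressed, only the first would be offered to the grid.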
In one instance, Lando's team assessed a complex Excel model, and the developers recoded its calculations in C++ so that they could run on the grid. This made the spreadsheet 10,000 times faster.