Moving from the mainframe to a distributed SOA is hotel group Starwood’s overall IT strategy. Helen Beckett reports on how installing a middle-tier cache is playing a key role
It is hard to put a mainframe out to pasture, however dated the technology, when it continues to provide reliability and high performance. In today’s hard-nosed business climate, this has nothing to do with sentimentality and everything to do with finding a successor that can do the job as well.
Companies such as hotel group Starwood, which owns brands including Le Meridien and Sheraton, have discovered that implementing a middle-tier cache makes a distributed platform and high performance a possibility. By putting the data closer to the application, a cache can boost the number of transactions achieved on smaller servers.
Starwood realised it had reached crunch point with its IBM MVS mainframe when it became difficult to source skills. “It is hard sometimes to find experienced programmers who know Cobol,” says Song Park, director of pricing and availability technologies at Starwood.
Other key drivers were the agility to scale to meet new business demand, and the ability to develop quickly applications that could bolt onto the mainframe. The idea behind using smaller servers was to gain the capability to scale in smaller increments to cope with different spikes of traffic.
For these reasons, Starwood decided that its long-term strategy had to be to “get off the mainframe”. The central reservation system was the biggest and most commercially important mainframe application, and Starwood decided to tackle this first. Park was given the task of developing the pricing and availability function of the legacy central reservation system as a service.
The project was part of an overall strategy to replace the mainframe platform with a distributed, service oriented architecture. Migrating to this model would enable greater flexibility to respond to business changes. “If the pricing model changes, for example, we can respond to that dynamically in a way you could not with Cobol,” says Vanessa Lapins, senior director of emerging systems development at Starwood.
Starwood considered two options: migrating data off the mainframe DB2 database onto a distributed platform, or keeping data on the mainframe and putting a layer of middleware in front to handle the more complex enquiries. “Long term, we wanted to be off the mainframe and so we decided to bite the bullet,” says Park.
The firm installed an Oracle database and built an application around it to serve its sales channel partners. Starwood takes business from multiple channels, including business-to-consumer sites such as Travelocity and Expedia, Starwood’s own site, and a host of non-affiliated agents.
The lion’s share of business comes from the global distribution system (GDS), powered by GDS suppliers Amadeus and Sabre. Connection to the hub requires suppliers to meet stringent service level agreements, and to process hundreds of transactions a second.
A mid-tier web service was designed to enable different channels to interrogate the booking data in any way required. “Everyone wants to come in via web services and XML and to book using their own particular format. This might be to satisfy a particular clientele or to calculate prices differently to inform internal revenue teams,” says Park.
The pilot was not an unqualified success, however. An ability to scale is crucial to cope with the so-called “tsunami” effect, where a marketing promotion may bring in a huge wave of traffic. But when Starwood simulated loads in a test environment, the Oracle database reached only 300 transactions per minute on two servers.
“The only way the Oracle database could have achieved the required throughput and performance would have been to throw vast amounts of hardware at it,” says Park. Additional hardware costs and licensing fees were not an option, but gaining a throughput at least equal to the IBM predecessor remained a prerequisite.
One part of the pilot that had worked unexpectedly well was the caching component, and this revelation provided the way forward. “We had custom-built a cache that could serve second- and third-time requests directly out of the Oracle database. We found that the response times were really incredible – less than 100 milliseconds serving hundreds of requests concurrently per second on a four-CPU machine.”
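The pattern Park describes – answer the first request from the database, then serve repeat requests straight from memory – can be sketched in a few lines of Java. The class name and the backing lookup function here are invented purely for illustration; this is not Starwood’s code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal read-through cache: the first request for a key goes to the
// backing store (standing in for an Oracle lookup); second- and
// third-time requests are served from memory.
public class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> backingStore;

    public ReadThroughCache(Function<K, V> backingStore) {
        this.backingStore = backingStore;
    }

    public V get(K key) {
        // computeIfAbsent calls the backing store only on a cache miss
        return cache.computeIfAbsent(key, backingStore);
    }

    public int size() {
        return cache.size();
    }
}
```

On a hit, `computeIfAbsent` returns the cached value without touching the backing store at all, which is where sub-100-millisecond response times on modest hardware come from.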
Starwood decided to build a cache as an integral part of the new distributed architecture and looked at a variety of technologies.
Objectstore from Progress Software was chosen for its ability to persist the cache – that is, keep data available in cache memory to speed accessibility. Most other caches achieve persistence through a spill-over of data onto disc. “I am sure some other cache technologies also give you the option to offload the cache onto disc on demand. Objectstore does this in real time, yet lets us maintain in-memory performance numbers.”
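The distinction being drawn here – persisting the cache without putting disc writes on the request path – is essentially write-behind caching. A minimal sketch of the technique follows, using an invented `PersistentCache` class and a plain map standing in for durable storage; it is a generic illustration, not Objectstore’s actual API:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Write-behind sketch: reads and writes hit memory at full speed,
// while changed keys are drained to slower durable storage later.
public class PersistentCache<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final Map<K, V> disc; // stands in for durable storage
    private final Set<K> dirty = ConcurrentHashMap.newKeySet();

    public PersistentCache(Map<K, V> disc) {
        this.disc = disc;
    }

    public void put(K key, V value) {
        memory.put(key, value); // fast, in-memory write
        dirty.add(key);         // persisted later, off the request path
    }

    public V get(K key) {
        return memory.get(key); // never touches disc
    }

    // Drain pending writes to durable storage; a real product would
    // run this continuously on a background thread.
    public void flush() {
        for (K key : dirty) {
            disc.put(key, memory.get(key));
        }
        dirty.clear();
    }
}
```

Reads and writes stay at in-memory speed; only the deferred `flush` step touches slower storage, which is how a cache can be durable yet still post in-memory performance numbers.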
While Starwood was busy building its data architecture, it took comfort from the feedback of consultants that the same design pattern had been adopted by financial traders. “Like stock traders with the need to compute masses of finely-tuned transactions at very high speeds, we have said, ‘Let’s bring the data as close as possible to the application’,” says Park.
Both objectives – mainframe-level throughput and a distributed platform – were achieved by creating a specialised middle tier. “We still have Oracle at the back end, but it serves a different purpose. It is a database of records,” says Park. Nor is Starwood planning to replace its relational database. “We want to retain our investment in SQL people and tools.”
The Objectstore middle tier has brought another benefit. Because it aligns closely with Java and web services architecture, Park and his team had the opportunity to build a pure object model, with query and cache dovetailing closely.
“It means we could circumvent entirely the layer of object-relational mapping,” says Park. Such mapping is normally needed when an application’s objects have to query a database that stores data in relational tables. This bridge between the two worlds still has to happen somewhere in the Starwood configuration, but crucially, not in real time.
If the mapping is done in real time it increases the complexity tenfold, says Park. Developers need to know not only how to pull SQL from three tables, but from a slew of other technologies too, such as database drivers and Java Entity Beans, in order to represent data elements from the relational database. Additional third-party tools can help with the mapping, but this requires yet another skillset.
Instead, Starwood’s development team built an object model in Java and plugged it into Objectstore. The mapping piece of the data architecture, where the object model talks to the Oracle database, is now a relatively minor task that is done offline.
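The shape of such a design can be roughly sketched as follows, with invented class and field names (the real Starwood object model is of course far richer): the domain objects live in the cache as plain Java objects, and the only translation from relational rows happens in an offline loader, never on the query path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative domain object: runtime queries work directly on objects
// like this, with no SQL or object-relational mapping involved.
class RoomRate {
    final String hotel;
    final String roomType;
    final double nightlyRate;

    RoomRate(String hotel, String roomType, double nightlyRate) {
        this.hotel = hotel;
        this.roomType = roomType;
        this.nightlyRate = nightlyRate;
    }
}

class RateCache {
    private final Map<String, RoomRate> byKey = new ConcurrentHashMap<>();

    // Offline loader: the one place relational rows are translated into
    // objects, so the mapping never runs during a real-time query.
    void loadFromRelationalRow(String hotel, String roomType, double rate) {
        byKey.put(hotel + "/" + roomType, new RoomRate(hotel, roomType, rate));
    }

    RoomRate lookup(String hotel, String roomType) {
        return byKey.get(hotel + "/" + roomType);
    }
}
```

Because the cache stores the objects themselves, a lookup is a single map access rather than SQL across several tables plus driver and mapping machinery.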
The biggest challenge was getting used to a different mindset. “There has not been an enormous amount of pain, other than getting the object model right, which took about three iterations. The big plus is that it lets us focus on the business problem and logic.” The two main benchmarks were how naturally the cache performed and how easy it was to query, and how easy the model was to support and maintain.
The pricing and availability service has been cracked, but it is just one piece of the puzzle, and the replacement of the mainframe system with a distributed service oriented architecture remains to be completed. However, the ambition is to join up all channels to the new, distributed and cache-enabled reservation system by the end of the year.
The fruits of the IT team’s efforts will be realised later this year when Starwood hopes to capitalise on the extensibility, or greater granularity, of the system. Multiple room details can be held as objects within the object model and queried and booked by the user, says Lapins. “Users will be able to compose a reservation using a host of details beyond whether a room is non-smoking or not.”
Why use middle-tier cache?
Explaining the purpose of middle-tier cache memory, Clive Longbottom, service director at analyst firm Quocirca, says Windows-based systems require screen refreshes that mainframe “green screen” applications do not. Legacy applications are able to survive on minimal bandwidth and mainframes can serve data rapidly in this environment.
In a distributed configuration, getting the data from the disc to the CPU is a problem, and a lot can be gained from in-memory processing. Effectively this provides memory in silicon rather than on magnetic media. However, it does not solve bandwidth and latency issues, and in-memory databases are not cheap.
Rather than looking for a one-off solution, companies can tune several different technologies and achieve many, small performance improvements. The idea should be to “virtualise” – whether storage, database or network – and to bring data closer to the memory.
Adding cache as a middle tier is a good option because it decentralises the data and this tackles the latency problem.
The downside is that cache poses problems of data integrity. For example, a booking transaction can be processed only to discover that the main database has already sold that vacancy. And the “heartbeat” of confirming cached data against the database requires additional CPU power.
Cache, therefore, needs a lot of intelligence to make it work, for example knowing which data to store locally. You do not want to replicate the entire database – that is self-defeating – but too many requests back to the database create an overhead.
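One common form of that intelligence is a freshness window: serve from memory while an entry is recent, and re-confirm it against the database once it goes stale. A minimal sketch, with an assumed five-second time-to-live and invented names (not a description of any particular product); the clock is passed in as a parameter purely to keep the example deterministic:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache entries carry a load timestamp; stale entries trigger the
// costly "heartbeat" back to the authoritative database.
public class ValidatingCache<K, V> {
    private static final long TTL_MILLIS = 5_000; // assumed freshness window

    private static final class Entry<T> {
        final T value;
        final long loadedAt;
        Entry(T value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> database; // authoritative source

    public ValidatingCache(Function<K, V> database) {
        this.database = database;
    }

    public V get(K key, long now) {
        Entry<V> e = cache.get(key);
        if (e == null || now - e.loadedAt > TTL_MILLIS) {
            // Miss or stale: go back to the database and refresh
            V fresh = database.apply(key);
            cache.put(key, new Entry<>(fresh, now));
            return fresh;
        }
        return e.value; // fresh enough: served from memory
    }
}
```

Tuning the time-to-live is exactly the trade-off described above: too long and the cache sells vacancies the database no longer has; too short and the stream of confirmations back to the database becomes the overhead.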
The object model underpins everything, but it is still important to get the relational model right too, otherwise cache will struggle to get data out of it on time. The business model, object model, cache and relational database are all interdependent. You need a lot of skill to get it right, says Longbottom.