The Jindal group, a US$10 billion conglomerate, has grown from a single-unit steel plant in Hisar, Haryana, into a multi-product steel giant. The group has been technology-driven and has a varied product portfolio, yet steel has always been the focus at group company Jindal Steel Limited (JSL). With operations spanning everything from iron ore mining to the manufacture of value-added steel products, JSL holds a prominent position in India's flat steel segment. Because the business depends on keeping the entire setup running 24/7, JSL takes its disaster recovery capabilities very seriously.
Taking disaster recovery seriously
JSL's entire manufacturing activity, which spans different time zones (nine locations in India and 30 overseas), is powered by SAP ERP. Without on-site disaster recovery, the organization would expose itself to substantial risk. According to Ajay Dhir, CIO of JSL, "When we created our data center landscape, the first thing we wanted was built-in risk assurance so that neither a natural nor man-made calamity could disrupt the business." In the organization's present environment, every transaction has to pass through SAP before it is processed. This makes the ERP system the nerve center of the organization, and the leading reason for having on-site disaster recovery.
JSL's disaster recovery effort started in 2007, when its data center in Hisar was set up. The location was selected primarily for security reasons. As the organization brings more plants online, they will be connected to this data center through a virtual private network (VPN).
JSL built its on-site disaster recovery using clustering and replication: enterprise-class Sun servers run in cluster mode, with data replicated continuously. The implementation, for which JSL partnered with Sun and Wipro, was completed in five months. Notably, uptime is one of the IT team's key result areas, which makes the team take disaster recovery even more seriously.
For disaster recovery, JSL reused existing clusters and older servers, some of them as much as 10 years old. Says Dhir, "The disaster recovery setup is all about 100% utilization of existing resources, coupled with new ones." JSL followed best practice guidelines from Sun, and Dhir mentions that one reference, an article from SearchStorage.com on synchronous applications, helped the team a lot. Data is stored on Sun storage, and four Sun Enterprise 20000 servers run in a cluster environment. SAP (which powers 90% of the organization) and Microsoft Exchange run in a cluster along with the central file storage.
The core SAP server is a Sun E23000, which hosts three environments: test and development, quality assurance, and production. The quality assurance environment is heavily populated. "One also has to plan for the headroom. The sizing of servers is almost a scientific process for us," says Dhir.
JSL's IT team first calculated the number of SAPS (SAP Application Performance Standard units) required to run the operational system and, based on that figure, sized the three environments. The team then clustered these servers, using an SAP sizer solution for this step. The next step was synchronizing and replicating the three servers. These Sun SPARC servers run Solaris, with an Oracle database as the back end.
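The article gives neither JSL's SAPS figures nor its sizing formula. As a rough illustration of what "planning for the headroom" can mean, here is a minimal sketch, assuming a hypothetical target peak utilization of 65% and made-up SAPS demand for each environment: installed capacity is chosen so that the required load uses no more than the target fraction of it. The names `capacity_with_headroom` and `size_environments` and all numbers are hypothetical.

```python
import math

# Hypothetical SAPS demand per environment: illustrative numbers only,
# not JSL's actual figures.
REQUIRED_SAPS = {
    "test_and_development": 4_000,
    "quality_assurance": 6_000,
    "production": 20_000,
}

# Assumed planning target: peak load should use at most this fraction of
# installed capacity; the remainder is headroom for growth and failover.
TARGET_UTILIZATION = 0.65


def capacity_with_headroom(required_saps: float,
                           target_utilization: float = TARGET_UTILIZATION) -> int:
    """Return the installed SAPS capacity needed so that required_saps
    runs at no more than target_utilization of the server."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(required_saps / target_utilization)


def size_environments(demand: dict[str, float]) -> dict[str, int]:
    """Apply the same headroom rule to every environment."""
    return {env: capacity_with_headroom(saps) for env, saps in demand.items()}


if __name__ == "__main__":
    for env, capacity in size_environments(REQUIRED_SAPS).items():
        print(f"{env}: {capacity} SAPS")
```

A single utilization ceiling keeps the sketch simple; a real sizing exercise would typically also account for growth projections and for cluster failover, where a surviving node may have to absorb a failed node's load.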
Comments Dhir, "On-site disaster recovery should be done at the beginning. Later on it is too risky to try out disaster recovery, especially in an online environment. On-site disaster recovery is like a pacemaker which has to keep running on a parallel basis. Otherwise, the heart will stop." While selecting partners, JSL considered their competencies as well as the reliability of the hardware and software provided by them.
Dhir recommends that any manufacturing company involved in processing or automation look at disaster recovery models from the BFSI (banking, financial services and insurance) and telecom segments. Because these verticals cannot afford downtime and run near-online disaster recovery, their setups offer many lessons.
Off-site disaster recovery in the works
JSL plans to have off-site disaster recovery in place in 2010, though the company has yet to decide whether to use its own data centers or outsource the service. With this setup, JSL aims to achieve a recovery time objective (RTO) of four to six hours and a recovery point objective (RPO) of 30 minutes.
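The RPO bounds how much recent data may be lost (roughly, the time between the last successfully replicated copy and the failure), while the RTO bounds how long it may take to restore service. A minimal sketch of checking a single outage against these targets, assuming hypothetical timestamps and taking six hours, the upper end of JSL's stated range, as the RTO ceiling. The function names are hypothetical and not part of any JSL or vendor tooling.

```python
from datetime import datetime, timedelta

# Targets stated for JSL's planned off-site DR setup. The RTO range is
# four to six hours; this sketch checks against the six-hour upper bound.
RPO = timedelta(minutes=30)
RTO = timedelta(hours=6)


def data_loss_window(last_replicated: datetime, failure_time: datetime) -> timedelta:
    """Time span of committed data not yet replicated when the primary
    site failed; that data would be lost on failover."""
    return max(failure_time - last_replicated, timedelta(0))


def meets_objectives(
    last_replicated: datetime,
    failure_time: datetime,
    service_restored: datetime,
    rpo: timedelta = RPO,
    rto: timedelta = RTO,
) -> tuple[bool, bool]:
    """Return (rpo_met, rto_met) for a single hypothetical outage."""
    rpo_met = data_loss_window(last_replicated, failure_time) <= rpo
    rto_met = service_restored - failure_time <= rto
    return rpo_met, rto_met


if __name__ == "__main__":
    # Hypothetical outage: last replication at 09:40, failure at 10:00,
    # service restored at the DR site at 14:30.
    day = datetime(2010, 1, 1)
    result = meets_objectives(
        last_replicated=day.replace(hour=9, minute=40),
        failure_time=day.replace(hour=10),
        service_restored=day.replace(hour=14, minute=30),
    )
    print(result)  # (True, True): 20 minutes of data loss, 4.5 hours of downtime
```

In this framing, the replication interval largely governs the RPO, while the RTO depends on how quickly the off-site environment can be brought up and switched over.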
According to Dhir, JSL's business continuity planning ensures minimal business disruption due to infrastructure, network and application issues. JSL also plans to go in for data center certification as part of its IT security, information security management system (ISMS) and IT governance initiatives.