doubleIQ increases data warehouse performance

Looking to improve data delivery performance from its data warehouse, doubleIQ employed EMC's Greenplum solution.

doubleIQ is in the business of delivering business information to customers by building, operating and maintaining the systems at client sites or using its own secure environment. doubleIQ has assisted clients in banking, insurance, telecommunications, retail and utilities to develop in-house data warehousing applications and integrate, aggregate and distribute data.

Recently, the Melbourne-based organisation has added a hosted data warehouse infrastructure to its portfolio. The aim is to provide clients with a secure storage environment, for fast and efficient access to and analysis of massive amounts of information via cloud-based services.

“As a company, we have developed skills, techniques and technology to build the latest generation of information systems,” said Dennis Claridge, Business Director, doubleIQ. “This has enabled us to solve a range of business problems quickly and cost-effectively for some of the largest companies in Australia. Increasingly, our clients are interested in cloud-based services. Our cloud service allows us to dramatically streamline and automate the delivery of business intelligence.”

In August 2009, doubleIQ decided to deploy EMC’s Greenplum Database as the foundation for its hosted data warehouse and real-time analytics service for big data and cloud-based services.

doubleIQ had focused on in-house database development as the foundation of its client services, but has now developed a database architecture that uses the EMC Greenplum Database as the foundation of a cloud-based data warehouse, delivered to clients as a public or private service. Of the offering, called the easiFacts Cloud Analytics solution, Claridge says: “We can now integrate and transform data from many different sources into a presentation layer that simplifies access for business users. With our proprietary knowledge and specialisation in cloud computing, we can streamline and automate the management of data, automate the delivery of business intelligence, and provide a change management capability that customises features for each client.”

When deciding on a database to underpin its data warehouse, doubleIQ looked at several alternatives before settling on the EMC Greenplum Database. However, they chose to leverage the PostgreSQL open-source database they already had. “We reviewed a PostgreSQL GridSQL shared-nothing clustered database system for data warehousing, but decided not to formally evaluate alternative databases such as Oracle Database and Microsoft SQL Server, as we already had a lot of experience with these,” says Claridge.

doubleIQ relies on its internal data warehousing infrastructure to handle marketing and purchasing data, including information regarding merchants and customers. Once this information is fed into the warehouse from clients’ operational systems, doubleIQ undertakes business intelligence and competitive analytics before passing market intelligence data back to its clients. For utilities companies, doubleIQ analyses gas and electricity metering information, while for financial clients, it provides insights from information on merchant profiles and performance. For retail clients, doubleIQ processes customer and competitor data showing revenue, geographic penetration and market share statistics. The company distributes this information to its clients via its web interface.

After deploying the EMC Greenplum Database, doubleIQ has been able to process queries at least twice as fast, meeting client requirements across terabytes of big data. “One of the key things we wanted to see after the deployment was how fast we were able to generate a query and deliver the data back to the end user, regardless of the volume of data involved,” said Claridge. “So far the speeds have been very good. I’d say it’s at least two to three times faster than any comparable alternative system.”

EMC’s Greenplum Database data warehouse infrastructure currently contains 3TB of performance data and has approximately 14TB of disk space attached. The database handles all the ‘heavy lifting’ of client data, particularly transaction processing. By using the database as a basis for its data warehousing infrastructure, doubleIQ has found that extract, transform and load (ETL) tasks are completed much faster than by other systems. “We’ve run very similar customer segmentation processes for one of our clients using a different database and I know they take around three days to complete,” said Claridge. “On our own infrastructure, essentially the same process takes three hours.”

The core production cluster is made up of five dual quad-core servers. Each server has 48GB of RAM, and the storage is directly attached to each server. Surrounding this are a number of dual quad-core servers which act as application servers. Each application runs in its own virtual machine and is zoned according to the individual application's security requirements.

With a growing customer base, doubleIQ needed to make sure that the solution they chose could scale. “We’ve already scaled the environment once in the last 12 months and we will need to do it again in the coming year,” said Claridge. “EMC’s Greenplum Database offers very simple scaling abilities; it worked very well for us the first time and I’m expecting the whole process to be very smooth again.”

Due to the parallel nature of the database, adding more nodes also improves the performance and the speed of the warehouse. “From a capacity planning perspective, the environment is beautifully predictable,” said Claridge. “With a traditional database it’s harder to predict what scaling the environment will do. With Greenplum it’s very linear; if we double the size we know that our processing times will typically halve.”
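The linear scaling Claridge describes can be illustrated with a short calculation. The following is a minimal sketch that assumes ideal linear scaling, where processing time is inversely proportional to node count. The function name and the use of the three-hour job and five-server cluster as sample inputs are illustrative assumptions, not doubleIQ's capacity planning method or measured results.

```python
def estimated_processing_time(baseline_hours: float,
                              baseline_nodes: int,
                              new_nodes: int) -> float:
    """Estimate processing time after resizing a cluster.

    Assumes ideal linear scaling (an assumption, not a guarantee):
    doubling the node count halves the processing time.
    """
    if baseline_nodes <= 0 or new_nodes <= 0:
        raise ValueError("node counts must be positive")
    return baseline_hours * baseline_nodes / new_nodes


# Illustrative inputs only: a three-hour job on a five-node cluster,
# re-estimated for a cluster doubled to ten nodes.
print(estimated_processing_time(3.0, 5, 10))  # 1.5
```

Real workloads rarely scale perfectly, so an estimate like this is best treated as a lower bound on processing time and checked against measured run times.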
