In the data warehousing tools market, mergers and acquisitions (M&A) have been the story so far. Over the past two years, M&A activity has placed the appliance-based startups Netezza and Greenplum into the hands of large vendors IBM and EMC, respectively. At the same time, SAP and Sybase, established enterprise data warehousing tools vendors with limited appliance offerings, merged and began to ship appliances strengthened with innovative approaches such as all-in-memory architectures. During this two-year period, Oracle released three generations of its Exadata appliance, with the latest incorporating hardware from its Sun Microsystems acquisition. And Microsoft, which acquired DATAllegro in mid-2008, finally began shipping its long-promised SQL Server 2008 R2 Parallel Data Warehouse appliance with petabyte scalability.
The commodity warehouse
The established enterprise data warehousing tools vendors, Teradata, IBM/Netezza, Oracle/Exadata, Microsoft, and SAP/Sybase, have released new appliance product families. In particular, Teradata, IBM/Netezza, and Oracle have also launched industry-specific and/or business-function-focused analytic solution appliances that build on their core platforms, leveraging industry- and function-specific logical domain models (LDMs). The enterprise data warehousing tools price war has gone into overdrive. The market has commoditized as vendors flood it with solutions and budget-conscious customers maintain their laser focus on price-performance.
Says James Kobielus, senior analyst at Forrester Research, “Vendors continue to cut prices, with many hovering around a starting-price threshold of US $20,000 (Rs 9.84 lakh, approximately) per terabyte of usable raw data on the enterprise data warehousing appliance. Oracle and IBM/Netezza were the prime movers in popularizing their price points with their latest-generation appliances, but Microsoft has done even better with a starting price of US $11,000 (Rs 5.39 lakh, approximately) per terabyte on its Fast Track appliances and US $13,000 (Rs 6.37 lakh, approximately) per terabyte on its new PDW (Parallel Data Warehouse) product.”
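To put those per-terabyte figures in perspective, the short Python sketch below multiplies each quoted starting price by a warehouse size. The 50 TB size and the function name are hypothetical, chosen only for illustration; real quotes depend on configuration, discounts, and support, none of which are modeled here.

```python
# Starting prices per terabyte as quoted by Kobielus (US dollars).
STARTING_PRICE_PER_TB = {
    "Typical appliance threshold": 20_000,
    "Microsoft Fast Track": 11_000,
    "Microsoft PDW": 13_000,
}


def starting_cost(price_per_tb: int, terabytes: int) -> int:
    """Return the starting-price floor for a warehouse of the given size."""
    return price_per_tb * terabytes


if __name__ == "__main__":
    size_tb = 50  # hypothetical warehouse size, not a figure from the article
    for name, price in STARTING_PRICE_PER_TB.items():
        print(f"{name}: US ${starting_cost(price, size_tb):,}")
```

At this assumed size, the gap between the US $20,000 threshold and the Fast Track starting price comes to US $450,000, which is why price per terabyte dominates purchasing discussions.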
Scope of DW tools
Today, data warehousing tools typically need to perform the following major operations:
- Batch and near real-time loads to integrate data from multiple sources (internal and external)
- Basic reporting with no drill-down/drill-across
- Online analytical processing (OLAP)
- Predictive analytics
- Operational business intelligence
Over time, data warehousing tools must also meet a set of new challenges. Some of these are discussed in detail here.
1. Big data
Some of the big challenges facing data warehousing tools today are exploding data volumes, emerging types of data, demands for near real-time latencies, the lack of agility in delivering data to the business, and more mission-critical use of data warehouses in operations and decision making. “The ratio of growth of data vis-à-vis that of storage capacity is about 2:1. Analyzing such rapidly growing data volumes and extracting business value from them will be a key challenge to data warehousing tools,” says Suganthi Shivkumar, managing director - Asia South at Informatica. To meet the challenges of real-time operations and big data, Informatica released native connectivity with Hadoop in June 2011. This enables customers to deliver all types of data at varying latencies. IBM, another large vendor, has employed its core technologies InfoSphere BigInsights and InfoSphere Streams as its platform for big data.
2. Real-time ETL
For enterprise data warehousing tools, extract, transform, and load (ETL) operations against the information sources represent another challenge, and performing them in real time is tougher still. Most ETL tools operate in batch mode on the assumption that information will be available on a certain schedule. Real-time operation is far more challenging because the ETL operations must run while the transactional systems are experiencing peak loads. “As OLAP and query tools are designed to operate on unchanging data, their operations on real time data can lead to inconsistent and confusing results,” says Sheshagiri Anegondi, vice-president – technology business at Oracle India.
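One common way to narrow the gap between scheduled batch loads and real-time loads is to extract only the rows that changed since the previous run. The Python sketch below illustrates that idea under stated assumptions: the `Row` type, its `updated_at` change timestamp, and the in-memory dict standing in for the warehouse are all hypothetical, and it does not describe how any vendor named in this article implements ETL.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Row:
    id: int
    amount: float
    updated_at: int  # hypothetical change timestamp, in seconds


def extract_changes(source: Iterable[Row], watermark: int) -> List[Row]:
    """Extract only the rows modified after the last successful load."""
    return [row for row in source if row.updated_at > watermark]


def transform(row: Row) -> dict:
    """Convert the amount to integer cents for the warehouse schema."""
    return {"id": row.id, "amount_cents": round(row.amount * 100)}


def incremental_load(source: Iterable[Row], warehouse: Dict[int, dict],
                     watermark: int) -> int:
    """Upsert changed rows by key and return the new watermark."""
    changes = extract_changes(source, watermark)
    for row in changes:
        warehouse[row.id] = transform(row)
    return max((row.updated_at for row in changes), default=watermark)


if __name__ == "__main__":
    source = [Row(1, 10.0, 100), Row(2, 5.5, 200)]
    warehouse: Dict[int, dict] = {}
    watermark = incremental_load(source, warehouse, watermark=0)
    source.append(Row(3, 1.25, 300))  # a new transaction arrives
    watermark = incremental_load(source, warehouse, watermark)
    print(watermark, sorted(warehouse))
```

Because each run touches only the changed rows, it can be scheduled every few seconds instead of nightly, although it still competes with the transactional system for resources during peak load.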
3. Complex queries
At present, many companies are using traditional, proprietary data warehousing tools that aren’t designed to handle complex analytic queries against billions of rows of data. Answering even simple questions typically requires time-consuming retooling: creating indexes, partitioning the data, and re-indexing the database.
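To see why partitioning matters for query speed, consider the minimal Python sketch below. It groups rows by month so that a monthly query scans one partition instead of the whole table. The row layout (an ISO date string plus an integer amount) and the function names are assumptions made for illustration; production warehouses implement partitioning inside the database engine.

```python
from collections import defaultdict


def partition_by_month(rows):
    """Group (date 'YYYY-MM-DD', amount) rows under their 'YYYY-MM' key."""
    partitions = defaultdict(list)
    for date, amount in rows:
        partitions[date[:7]].append((date, amount))
    return partitions


def monthly_total(partitions, month):
    """Return (total, rows scanned), reading only the requested partition."""
    scanned = partitions.get(month, [])
    return sum(amount for _, amount in scanned), len(scanned)


if __name__ == "__main__":
    rows = [
        ("2011-05-03", 10),
        ("2011-06-01", 20),
        ("2011-06-15", 5),
        ("2011-07-09", 7),
    ]
    total, scanned = monthly_total(partition_by_month(rows), "2011-06")
    print(total, scanned)
```

The trade-off is the one described above: the partitioning key has to be chosen up front, and a question that cuts across a different key may force the data to be reorganized.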
Commenting on what data warehousing tools will look like in the near future, Sanjay Raj, director of BI/DW practice at Syntel, says, “Doing all that the business wants will need a design that is modular, scalable and can sustain the performance required.”
1. Cloud computing
Enterprise data warehousing tools delivered on a cloud/SaaS model have entered the enterprise. “Over the next two to three years, these will gain greater enterprise adoption as a complement or outright replacement for appliance- and software-based data warehousing tools,” says Kobielus of Forrester. Adds Syntel’s Raj, “Public clouds will take some time to mature because of security concerns. Private cloud is a reality and many organizations in the services and BFSI sectors are aggressively pushing it.”
2. Transformational technologies
Technologies such as in-database analytics and transaction processing can transform the role of enterprise data warehousing tools. The current best-of-breed platforms support these application integration scenarios through features and interfaces such as MapReduce, in-database function pushdown, embedded statistical algorithm libraries, predictive modeling integration, decision automation, and mixed workload management.
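In-database function pushdown means running the computation where the data lives rather than shipping rows to the application. The sketch below contrasts the two approaches using Python's standard `sqlite3` module as a stand-in for a warehouse; the `sales` table and its contents are hypothetical, and the example is only an illustration of the concept, not of any vendor's platform.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 100), ("south", 250), ("north", 50), ("south", 25)],
    )
    return conn


def totals_client_side(conn: sqlite3.Connection) -> dict:
    """Pull every row out of the database and aggregate in the application."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0) + amount
    return totals


def totals_pushed_down(conn: sqlite3.Connection) -> dict:
    """Let the database aggregate, so only one row per region is returned."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query).fetchall())


if __name__ == "__main__":
    conn = build_demo_db()
    print(totals_client_side(conn))
    print(totals_pushed_down(conn))
```

Both functions return the same totals, but the pushed-down version transfers one row per region instead of one row per sale, a difference that grows with table size.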
3. Social media and unstructured data
Social media has driven unstructured data and real-time architectures into the enterprise data warehouse. A key application is social media analytic dashboards that monitor customer awareness, sentiment, and propensities in real time. To address these requirements, and the convergence of in-database data mining and text analytics, next-generation data warehousing tools incorporate unstructured sources, hybrid storage architectures, in-memory execution, distributed cache, complex event processing, solid-state drives, geospatial data sets, and rich metadata.
4. NoSQL
Most commercial applications and solutions use a relational database under the hood as a metadata/content store. Non-relational, distributed NoSQL databases will have to stand the test of time before they convince solution developers and business users and create a marketplace. Customers are highly sensitive to the availability of application and database support for a solution; NoSQL databases, being open source offerings, will have to address and overcome support challenges before they can be part of any critical business system.
5. In-memory technology
In-memory databases catering to sub-second response requirements do not share any common business space with data warehousing tools. Informatica is working closely and innovating with data warehouse vendors in these areas, such as Greenplum, Teradata/Aster Data, and HP/Vertica. Oracle acquired TimesTen in 2005, and today it forms the cornerstone of Oracle’s offerings in in-memory databases. However, with the advent of solid-state drives, data warehousing tools are marching towards an in-memory paradigm in order to achieve higher performance.
6. Open Source
Open source isn’t new, of course. When the Internet took flight in the mid-1990s, Linux sparked a free software movement that today supports everything from operating systems to application servers to middleware and databases. So, why is open source a particularly smart strategy for data warehousing tools?
As of today, many features that are part of proprietary products are not available in open source analytical databases. But some features that were missing earlier, such as partitioning, bitmapped indexes, materialized views, parallel loading and query processing, and support for SQL windowing functions, are now being made available, making the sun of open source data warehousing tools shine brighter.
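Of the features listed above, the bitmapped index is easy to illustrate. The Python sketch below keeps one bitmap per distinct column value and answers a two-condition filter with a single bitwise AND. It uses plain Python integers as bitmaps and made-up column data; real analytical databases compress these bitmaps and store them on disk, which is not shown here.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose set bits mark matching rows."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def rows_from_bitmap(bitmap):
    """Return the row ids whose bits are set in the bitmap."""
    rows, row_id = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


if __name__ == "__main__":
    # Hypothetical columns of a four-row table.
    region = ["north", "south", "north", "east"]
    status = ["open", "open", "closed", "open"]
    region_index = build_bitmap_index(region)
    status_index = build_bitmap_index(status)
    # WHERE region = 'north' AND status = 'open'
    matches = region_index["north"] & status_index["open"]
    print(rows_from_bitmap(matches))
```

Bitmap indexes suit low-cardinality columns such as region or status, which is why they appear so often in warehouse workloads.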
According to Raj, the financial sector is taking the lead in experimenting with open source options. “Few Fortune 500 companies are moving to open source BI tools,” he says.
“Given enough time and money, corporate IT departments can develop a system perfectly designed to answer any question quickly. With this backdrop it is only natural that the flexibility of open source would make its way into the data warehouse market,” admits Nitin Singhal, country manager - information management software at IBM India.
But some vendors are downplaying the competition from the open source world. “We don’t see many open source data warehousing tools as of today. We’ve seen some organizations trying to develop some open source tools but very few have deployed them in the production environment on an enterprise scale. Usually they isolate such tools to small departmental projects,” concludes Shivkumar of Informatica.