Data virtualisation on rise as ETL alternative for data integration

The Phone House and Novartis have turned to data virtualisation from Denodo and Composite to gain a single logical view of disparate data sources.

This article can also be found in the Premium Editorial Download: IT in Europe: Compliance and risk

Data virtualisation is emerging as a technique that businesses can use to tie together disparate databases and become more agile in both their business operations and their data integration processes.

Traditionally, companies have relied on data integration technologies, such as extract, transform and load (ETL) tools, to pull data from transactional systems and populate data warehouses for business intelligence (BI) and analytics uses. But for applications that require real- or near-real-time decision making, getting critical business insight out of an ETL-fed data warehouse can seem as effective as sending Lewis Hamilton out to qualify for a Grand Prix in an Alfa Romeo Series 1 Spider. The iconic 1960s roadster is a lovely machine, but one that’s likely to fall far short of Formula One’s uncompromising need for speed.
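To make the latency problem concrete, here is a minimal sketch of a batch ETL run, assuming two in-memory SQLite databases stand in for a transactional system and a warehouse. The table names, the pence-to-pounds transform and the `run_etl` helper are all hypothetical and are not taken from any product mentioned in this article.

```python
import sqlite3

# Hypothetical transactional source and a separate warehouse copy.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount_pence INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1250), (2, 800)])

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, amount_pounds REAL)"
)


def run_etl():
    """One batch run: extract from the source, convert units, load a copy."""
    extracted = source.execute("SELECT id, amount_pence FROM orders").fetchall()
    transformed = [(order_id, pence / 100) for order_id, pence in extracted]
    with warehouse:  # commit on success, roll back on error
        warehouse.execute("DELETE FROM fact_orders")  # full refresh for simplicity
        warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?)", transformed)


def warehouse_total():
    """Total order value as seen by a report running on the warehouse."""
    return warehouse.execute("SELECT SUM(amount_pounds) FROM fact_orders").fetchone()[0]


run_etl()
total_after_load = warehouse_total()

# A new transaction arrives after the batch; the warehouse copy cannot see it.
source.execute("INSERT INTO orders VALUES (3, 500)")
total_before_next_run = warehouse_total()

run_etl()
total_after_next_run = warehouse_total()
print(total_after_load, total_before_next_run, total_after_next_run)
```

In this sketch the report total only changes once the next batch has run, which is the gap that real- or near-real-time applications struggle with.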

Another challenge for ETL processes is the increasingly large number of data sources that organizations are looking to tap. Such pressures are encapsulated by the pharmaceutical industry. Every year, billions of pounds are poured into research and development efforts, with companies hungering to create new and improved drugs that can provide booster shots to their businesses. Data is the lifeblood of pharmaceutical makers -- and there is no dearth of it for them to analyse.

As Fatma Oezdemir-Zaech, a knowledge engineering consultant at Switzerland-based Novartis Pharma, explained, her IT team serves a research department that needs to pull data from a huge variety of sources. That may include troves of trial research from medical publishers or commercial data sources, along with an abundance of data from internal systems. “Our team has extensive experience and skills in using ETL, and there are procedures that can be done in a semi-autonomous way,” said Oezdemir-Zaech. “But the more data sources we used, the more time it was taking to get the data in the format we want.”

Traditional data warehouses haven’t become redundant, said Gary Baverstock, UK regional director at data virtualisation vendor Denodo Technologies. But as the pressure for real-time insight and increased business agility intensifies, and companies increasingly look to utilise external data sources, many IT chiefs are seeking alternative ways to deliver data to business users.

Data virtualisation keeps data in its place

One option is data virtualisation, which provides a layer of abstraction that can sit atop enterprise applications, data warehouses, transaction databases, Web portals and other data sources, enabling companies to pull together data from different systems without having to create and store new copies of the information. That eliminates the need to replicate data or move it from source systems, reducing IT workloads as well as the risk of introducing data errors.
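The idea of a logical view that leaves data where it lives can be sketched in a few lines. This is an illustration only, assuming an in-memory SQLite database and a CSV string as the two sources; `customer_view`, `read_contracts` and the sample records are hypothetical and do not represent how Denodo, Composite or any other vendor implements the technique.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a transactional database (in-memory SQLite here).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Luis")])

# Hypothetical source 2: a flat file exported by another system.
CONTRACTS_CSV = "customer_id,tariff\n1,prepaid\n2,contract\n"


def read_contracts():
    """Read the flat-file source on demand; nothing is cached or stored."""
    return list(csv.DictReader(io.StringIO(CONTRACTS_CSV)))


def customer_view():
    """Logical view: join both sources at query time and yield combined rows."""
    tariffs = {int(row["customer_id"]): row["tariff"] for row in read_contracts()}
    for customer_id, name in crm.execute("SELECT id, name FROM customers ORDER BY id"):
        yield {"id": customer_id, "name": name, "tariff": tariffs.get(customer_id)}


rows = list(customer_view())
print(rows)
```

Because the view is resolved each time it is queried, a row added to the source database appears in the next call without any load job. The trade-off in this sketch is that every query pays the cost of reading both sources.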

The technology also supports the writing of transaction data updates back to the source systems. This, proponents say, is one of the clear benefits that set data virtualisation apart from data federation and enterprise information integration (EII), two earlier techniques with similar aims of making it easier to analyse data from a disparate array of sources.
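Write-back can be illustrated with a small sketch in which an update made against a virtual record is routed to whichever source system owns the field. Everything here is an assumption for illustration: the two in-memory SQLite databases, the `FIELD_OWNERS` routing table and the `update_view` and `read_view` helpers are hypothetical, and commercial tools handle transactions and conflicts far more thoroughly.

```python
import sqlite3

# Two hypothetical source systems, each owning different fields.
billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
billing.execute("INSERT INTO accounts VALUES (1, 20.0)")

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'old@example.com')")

# Routing table: which source, table and column back each virtual field.
FIELD_OWNERS = {
    "balance": (billing, "accounts", "balance"),
    "email": (crm, "customers", "email"),
}


def update_view(customer_id, field, value):
    """Write a change made against the virtual view back to its source."""
    conn, table, column = FIELD_OWNERS[field]  # identifiers come from the fixed map
    with conn:  # commit on success, roll back on error
        cursor = conn.execute(
            f"UPDATE {table} SET {column} = ? WHERE id = ?", (value, customer_id)
        )
    return cursor.rowcount  # number of source rows changed


def read_view(customer_id):
    """Read the combined record straight from both sources."""
    (balance,) = billing.execute(
        "SELECT balance FROM accounts WHERE id = ?", (customer_id,)
    ).fetchone()
    (email,) = crm.execute(
        "SELECT email FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return {"id": customer_id, "balance": balance, "email": email}


update_view(1, "email", "new@example.com")
print(read_view(1))
```

The caller never needs to know which system stores the email address; the routing table carries that knowledge, which is the essence of the read-and-write abstraction described above.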

While the three share some capabilities and are sometimes viewed as the same thing under different names, EII technology offered a read-only approach to data querying and reporting, said Brian Hopkins, a US-based analyst with Forrester Research.

Data federation emerged more than a decade ago and was meant to do away with ETL tools, data staging areas and the need to create new data marts. But critics say its initial promise masked key weaknesses: Data federation software was ill-suited to very large data sets or environments requiring complex data transformations. Worse still, it was, in the minds of many, intimately linked to the world of service-oriented architecture (SOA).

“There were a lot of good things associated with SOA, such as the efforts to drive complexity from organizations’ IT infrastructure, break down the information silos and untangle the spaghetti diagram of IT architecture,” said Baverstock. “But as the economic winds shifted, these tremendously complex IT projects fell out of favour, as businesses focused on those efforts that would bring quick wins.”

Retailer looks to drive out data errors

The Phone House -- the trading name for the European operations of UK-based mobile phone retail chain Carphone Warehouse -- implemented Denodo’s data virtualisation technology between its Spanish subsidiary’s transactional systems and the Web-based systems of mobile operators because of the dual read-and-write capability supported by the tools, said David Garcia Hernando, business exchange manager for The Phone House Spain.

The retailer acts as an intermediary between its customers and the mobile operators. But, Hernando said, Phone House’s sales staff had to enter customer data into the company’s internal systems and then rekey it into the mobile operators’ systems because the different applications could not talk to each other.

“Whenever you have manual data entry, you're going to create errors,” said Hernando. “We'd have customer records that didn't match those held by the operators, and that was costing us money.”

And with approximately 1.5 million transactions processed each year in Spain, cutting the data entry time in half by removing the rekeying step was a huge productivity boon for the retailer’s sales teams.

While there were simpler ways to achieve the integration, Hernando knew that the data virtualisation tools could provide other benefits, too. “Our invoicing system and CRM systems are pretty good, but they're 20 years old, so it can be tough when you want to introduce new things quickly,” he said. “But thanks to the Denodo technology, we can create new reports wanted by the business really quickly.”

Phone House’s data virtualisation experience is typical of many of the implementations Forrester sees. “Most organizations get into data virtualisation for tactical reasons, but once that's done they find that the benefits of not having to physically move the data around for integration has much wider use cases,” said Hopkins.

Data virtualisation: no limits?

It's a similar tale at Novartis, which implemented a data virtualisation tool from Composite Software to enable its researchers to quickly combine data from both internal and external sources into a searchable virtual data store. “Our particular challenge was taking vast column-based biological data sets from external sources and integrating that with our own Oracle database,” said Oezdemir-Zaech. “But Composite built us a proof of concept within three days. Once we were able to get easy access to all those data sources, the idea really took hold.”

She added that with data virtualisation, “there are no limitations -- it doesn't matter whether the data sets were huge or tiny. For us, that's really important.”

Hitherto, organizations may have been tempted to make their data easier to manage by undergoing a database consolidation programme. That has some obvious advantages, Hopkins said. “But it is a massive undertaking,” he warned. “It’s hard enough for structured data, never mind the morass of unstructured data swirling around the enterprise. Data virtualisation promises to deliver some of the same benefits -- most obviously, the ease of analysing data -- without the burden of massive data and application integration.”

Such benefits, combined with the belief that tactical data virtualisation projects will give rise to more strategic programmes designed to treat data as a utility-like service, lead Forrester to predict that the demand for data virtualisation is set to boom. It anticipates that organizations will spend $8 billion globally on data virtualisation licences, maintenance and services by 2014.

Still, even data virtualisation vendors acknowledge that the technology isn’t the answer to all data integration questions. “Data virtualisation is not the apogee of information management that means you can do away with all the other tools you've relied on over the years,” said Ash Parikh, director of product management at Informatica. “It's like a Swiss Army knife -- this is just one of the tools to get the job done.”

Join the conversation



Not having to create new tables or flat-file stores to hold transitory results also saves time and space at analysis time. Were ETL jobs easier to create and faster to deliver, though, fewer virtualization applications might be needed. One vendor in between is IRI (CoSort). It uses Eclipse to connect to diverse sources, integrate and transform changed or full data sets in a chosen file system, and produce near-real-time targets such as unstored BIRT displays.


Nice post! No doubt Hadoop is a powerful data management framework, with MapReduce providing all the required aggregation capabilities across massive structured and unstructured data sets. We can definitely perform ETL via custom coding, but that does not replace traditional ETL tools. Custom coding takes us back more than a decade, to when tools like DataStage started changing the industry. Though I agree, they can complement existing ETL tools or replace core layers to improve performance. For customers, they still remain ETL tools with lots of built-in functionality for data cleansing, alignment, modeling and transformation, which no one wants to replace with custom management in future. We need to be progressing rather than moving backward in history. Where Hadoop/MapReduce can help is in processing very large data sets, and if we merge their capabilities with traditional engines, next-generation ETL tools are going to be far stronger, with seamless processing of traditional and large data. MapReduce can also be used for complex processing or unstructured data. More at

It’s really a great pleasure to provide an opinion about ETL tools. They are important and useful to people all over the world, helping to transform any data into any database quickly, easily and comfortably.

Great article. Your readers may also find real user reviews for all the major Datacenter Virtualization tools on IT Central Station to be helpful:

According to the IT Central Station user community, Oracle Data Integrator is the #1 Datacenter Virtualization tool. This user writes, "Any IT company that relies on data coming from its clients or internal users requires data maintenance. All that collected data should be able to provide insight into the various behaviors of the clients across different platforms. For us, ODI has been key in collecting, transforming, and storing information from our various sources." You can read the rest of his review here: