Data virtualization is the wave of the future, analyst Rick van der Lans will tell a forthcoming data management conference in London.
Van der Lans is giving the keynote address at IRM UK’s compendium event in November, Data Management, Information Quality and Data Warehouse & Business Intelligence. His opening talk, on Tuesday, 8 November, is titled “The Impact of Data Virtualization on Data Warehousing and MDM.”
“Data virtualization allows you to see all your data stores as one big logical database,” he said, in a pre-conference interview with SearchDataManagement.co.UK. The main driver of interest “is the need for agility. Most of the data warehouse architectures are very layered, consisting of staging areas, data warehouse, data marts, cubes -- a whole chain of databases.
“So if we want to add a column to a report, say, we might have to change the cube and the ETL [extract, transform and load] process that feeds it and the data mart and so on. So a simple change leads to a waterfall of consequences. These days, that is no good. Customers want agility and faster decision making, so we have to have less of what we have just now.”
Data virtualization means a shorter chain of databases and ETL processes, according to Van der Lans. “We could get rid of some of these cubes and data marts.” It also makes self-service business intelligence, realized through tools like QlikView and Tibco’s Spotfire, more amenable to overarching control by IT, he said.
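The idea of replacing part of the physical chain with a virtual layer can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: sqlite3 stands in for a real data virtualization server, and the table and column names are invented.

```python
# Instead of materializing a data mart through an ETL job, define a
# virtual view over the source table. A change to the report then means
# editing one view definition, not a chain of ETL processes.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('North', 100.0), ('South', 250.0);

    -- The "data mart" is a view, not a physically loaded table.
    CREATE VIEW sales_mart AS
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region;
""")

for row in conn.execute("SELECT region, total FROM sales_mart ORDER BY region"):
    print(row)
```

Because nothing is physically copied, the view always reflects the current source data, which is the agility Van der Lans describes.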
Van der Lans maintained that while data virtualization’s main antecedent, data federation, was mainly driven by technical considerations, there has more recently been a “business pull as well as a technical push.” Data federation is one aspect of data virtualization, which also includes data profiling and data cleansing, among other things, he added.
The existing architectures are too complex, he said. “We have reached the end of the first era of data warehousing. How agile your architecture is will dominate [the new]. That was not the old thinking. It was more ‘Can we build it?’ ”
“Big data,” he said, is another feature of the new era. He disparaged a definition of the term that stresses its being too big for relational databases. “That is a silly definition. If I put a SQL layer on top of Hadoop, then that is a relational database! No, big data is generated by automatic processes: clickstream logs, sensor-based data. The first real big data comes from the Internet companies -- Google, LinkedIn, Twitter -- that keep everything. It is a good catchy term, but I’d prefer ‘event data.’ ”
Data virtualization will also make master data management (MDM) and data governance easier to do, he said, because it is primarily about abstraction. A virtual layer can enable users of Excel, SAS and BusinessObjects -- to name but three -- to share the same business rules. For example, if an organization creates a rule that says, “The northern sales region does not include Scotland,” data virtualization will allow all three applications to follow that rule. This will also give more control to those charged with data governance.
“That kind of specification could end up anywhere in a classic data warehouse architecture, but with data virtualization you have the chance to know where to go. It won’t replace MDM but will make it more practical to build.”
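The Scotland example above can be sketched as a rule that lives in one place and is applied once, in the virtual layer, so every consuming tool sees the same answer. All names and data here are invented for illustration.

```python
# The single, governed business rule: the northern sales region
# does not include Scotland. It is defined once, not copied into
# each BI tool.
NORTHERN_REGION_EXCLUDES = {"Scotland"}

def northern_sales(rows):
    """Virtual view of northern sales; every consuming tool queries this."""
    return [r for r in rows
            if r["region"] == "North"
            and r["country"] not in NORTHERN_REGION_EXCLUDES]

raw = [
    {"country": "England",  "region": "North", "amount": 120.0},
    {"country": "Scotland", "region": "North", "amount": 80.0},
    {"country": "England",  "region": "South", "amount": 200.0},
]

# All consumers get the same result because the rule is applied
# in one place, which is also where governance can inspect it.
print(northern_sales(raw))
```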
He depicted the three main suppliers in the market -- Denodo, Composite and Informatica -- as equal in popularity, but said, “If this [data virtualization] really takes off, IBM, Oracle, Microsoft and so on will move in. IBM already has InfoSphere Federation Server and Cognos has Virtual View Manager, which is actually Composite. And Oracle’s OBIEE [Oracle Business Intelligence Enterprise Edition] stack includes federation technology as well.”