Data's main drivers: volume, velocity, variety and variability


Trends typifying data usage today appear to fall into four categories: volume, velocity, variety and variability. Allow me to explain...

Volume -- of data is bigger than ever.
Velocity -- of data is increasing, e.g. Complex Event Processing of real-time data.
Variety -- of data is spiralling, e.g. unstructured video and voice.
Variability -- of data types is also increasing.

These are the findings of the October 2011 Forrester report Enterprise Hadoop: The Emerging Core Of Big Data.
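The "velocity" dimension is where Complex Event Processing comes in: continuously evaluating a stream of events as they arrive, rather than querying data at rest. As a purely illustrative sketch (the class and field names here are invented, not any vendor's API), a time-based sliding window is the basic CEP building block:

```python
from collections import deque

class SlidingWindow:
    """Keep only the events that fall inside the last N seconds of the stream."""

    def __init__(self, seconds):
        self.seconds = seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        # Evict events that have fallen out of the time window.
        while self.events and self.events[0][0] < timestamp - self.seconds:
            self.events.popleft()

    def average(self):
        return sum(v for _, v in self.events) / len(self.events)

# Feed a stream of (timestamp, price) ticks; at t=15 the 10-second window
# holds only the ticks from t=8 and t=15.
window = SlidingWindow(seconds=10)
for ts, price in [(0, 100), (3, 102), (8, 110), (15, 120)]:
    window.add(ts, price)
print(window.average())  # 115.0
```

A real CEP engine adds pattern matching, joins across streams and guaranteed delivery on top of this idea, but the windowed aggregate is the core of it.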

According to the report, "This growing tsunami of intelligence feeds downstream business processes in both the front and back office, helping organisations optimise their interactions and operations through powerful analytics."

As a result of these realities (if we accept Forrester's statements to be true), the market for data analytics is also potentially expanding. Vendors eyeing this space are busy developing tools and analytical algorithms that can work on the data stored in databases.

Logically then, it is the database companies that are trying to spearhead this mission.

Everybody's at it

IBM has its Smart Analytics System 7710 based on the IBM Power7 chips as well as the IBM DB2 Analytics Accelerator. Oracle's strategy in the big data analysis market encompasses NoSQL, Hadoop and R analytics. Compuware's Gomez Application Performance Management system now comes packaged with deep-code analysis - largely as a result of the company acquiring dynaTrace earlier this year. Plus there is also Sybase and its IQ column-based analytics database, which has just hit its version 15.4 release.

Speaking directly to the Computer Weekly Developer Network blog, Sybase business development manager Andrew de Rozairo explained that this latest release of Sybase IQ includes a native MapReduce API, Predictive Model Markup Language (PMML) support, integration with Hadoop and an expanded library of statistical and data mining algorithms.

The new product uses Sybase IQ PlexQ massively parallel processing (MPP) technology, as well as new APIs that let developers implement in-database algorithms, achieving what the company claims is greater than 10x performance acceleration over existing approaches.
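For readers unfamiliar with the pattern such an API exposes, here is a hedged, plain-Python sketch of MapReduce itself -- the canonical word count, not Sybase's actual API (which is surfaced inside the database rather than in application code). Every function name here is illustrative:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit (key, value) pairs -- here, (word, 1) for every word."""
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate each key's values -- here, sum the counts."""
    return {key: sum(values) for key, values in grouped.items()}

rows = ["big data", "Big Data analytics", "data"]
counts = reduce_phase(shuffle(map_phase(rows)))
print(counts)  # {'big': 2, 'data': 3, 'analytics': 1}
```

The appeal of an in-database, MPP implementation is that the shuffle and reduce steps run next to the data across the parallel nodes, instead of in a separate cluster the data must first be shipped to.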

"What we see today is that organisations have an array of different tools and techniques to leverage big data and gain insight. These different tools include MapReduce, predictive modeling and data mining tools, in-database or embedded analytics. The issue is that until now, all these tools have been separate, in different analytic environments. With Sybase IQ 15.4, we deliver a single analytics platform to bring together all these different tools and techniques, ensuring consistency and simplifying the architecture," said de Rozairo.

"Sybase IQ 15.4 delivers MapReduce functionality against data held entirely in IQ or in a combination of IQ and other storage systems, including Hadoop. PMML support will mean that statisticians and data scientists will be able to bring their sophisticated models from their favourite data mining tool into IQ and execute this against large volumes of data," he added.
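To make the PMML point concrete: PMML is an XML interchange format, so a model trained in one tool can be parsed and scored anywhere. The sketch below uses a deliberately minimal, hand-written PMML fragment for a linear regression (real exports from a mining tool are far richer, and this is not how IQ executes PMML internally):

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal PMML document: y = 1.5 + 2.0*x1 - 0.5*x2
PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_1"}

def load_model(pmml_text):
    """Pull the intercept and coefficients out of the PMML document."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coeffs = {p.get("name"): float(p.get("coefficient"))
              for p in table.findall("p:NumericPredictor", NS)}
    return intercept, coeffs

def score(row, intercept, coeffs):
    """Apply the regression model to one row of input data."""
    return intercept + sum(coeffs[name] * row[name] for name in coeffs)

intercept, coeffs = load_model(PMML)
print(score({"x1": 3.0, "x2": 4.0}, intercept, coeffs))  # 1.5 + 6.0 - 2.0 = 5.5
```

The portability is the point: the statistician's tool writes the XML once, and any PMML-aware engine -- a database included -- can score new rows against it without re-implementing the model.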

The concept here is for extended in-database analytics capabilities to eliminate the time wasted transporting data to the analytics engine. So therein lies Sybase's attempt to justify its claims of increased speed. This "single platform for data analytics" is being promoted as a key advantage for BI (business intelligence) programmers and report-writers.

Editorial disclosure: Adrian Bridgwater works in an editorial capacity for the International Sybase User Group, an independent association that represents thousands of users of Sybase products in more than sixty countries around the world. He is not an employee of Sybase but seeks to work with ISUG to support its work challenging Sybase product development and training.

1 Comment

Great post Adrian. Note that when I first posited the now ubiquitous three-Vs in a 2001 Gartner (then Meta Group) research note, "variety" was defined to encompass both new data types and structures, which are arguably one and the same. Since that time, I and others have also suggested other data complexity dimensions including validity, volatility, viscosity (resistance to flow), etc. But the message remains the same: be aware of and tend to actual data management challenges; "big data" is just marketing jargon. -Doug Laney, Gartner

About this Entry

This page contains a single entry by Adrian Bridgwater published on November 3, 2011 7:32 AM.
