Intel graphs aim to make 'big data inside' easier

Intel is aiming to extend its recognition in the open source, developer and big data arenas by now offering an open source programmer tool aimed at supporting big data analysis.

The Intel GraphBuilder tool has been specifically engineered to help handle big data for machine learning.

GraphBuilder is currently a beta release, and its Ronseal naming is appropriate enough given its ability to construct large-scale graphs for software frameworks devoted to big data analysis.

The tool also claims to offload many of the complexities of graph construction, which would typically include:

• graph formation,

• cleaning,

• compression,

• partitioning,

• … and serialisation.
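The stages listed above can be illustrated with a short Python sketch on a toy edge list. This is a hypothetical example of what each stage does, not GraphBuilder's API: the function names, the "src -> dst" text record format and the modulo-by-source partitioning scheme are all assumptions made for illustration.

```python
from collections import defaultdict


def form_edges(records):
    """Graph formation: turn raw 'src -> dst' text records (assumed format) into (src, dst) pairs."""
    edges = []
    for line in records:
        if "->" not in line:
            continue  # skip records that do not describe a relationship
        src, dst = (part.strip() for part in line.split("->", 1))
        edges.append((src, dst))
    return edges


def clean_edges(edges):
    """Cleaning: drop empty names, self-loops and duplicate edges, keeping first-seen order."""
    seen = set()
    cleaned = []
    for src, dst in edges:
        if not src or not dst or src == dst:
            continue
        if (src, dst) in seen:
            continue
        seen.add((src, dst))
        cleaned.append((src, dst))
    return cleaned


def compress_ids(edges):
    """Compression: replace string vertex names with dense integer IDs (first seen gets 0)."""
    ids = {}

    def vid(name):
        if name not in ids:
            ids[name] = len(ids)
        return ids[name]

    return [(vid(s), vid(d)) for s, d in edges], ids


def partition_edges(edges, num_parts):
    """Partitioning: assign each edge to a partition by its source vertex ID modulo num_parts."""
    parts = defaultdict(list)
    for s, d in edges:
        parts[s % num_parts].append((s, d))
    return [parts[p] for p in range(num_parts)]


def serialise(partition):
    """Serialisation: write one tab-separated 'src<TAB>dst' line per edge."""
    return "\n".join(f"{s}\t{d}" for s, d in partition)
```

Fed the records `["alice -> bob", "bob -> carol", "alice -> bob", "carol -> carol", "garbage", "carol -> alice"]`, the cleaning stage drops the duplicate, the self-loop and the malformed line, leaving three edges that compress to `(0, 1)`, `(1, 2)` and `(2, 0)`. A real tool would run each stage in parallel over far larger inputs; the sketch only shows the order and purpose of the steps.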

Intel principal scientist Ted Willke asserts that GraphBuilder makes it easy for “just about anyone” to build graphs for interesting research and commercial applications.

Many forms of big data analysis rely on the extensive use of complex graph data, and Intel has reportedly worked with researchers at the University of Washington in Seattle to bring this innovation about.

Big data made easy

Discussing the perceived complexity of big data, Willke says that big data does have structure: “It just needs to be discovered from within the raw text, images, video, sensor data, etc., that comprise it.”

Intel suggests that until recently, only the “wizards of big data” were able to (rapidly) extract knowledge from a different type of structure within the data, a type that is best modeled by tree or graph structures.

According to Willke’s blog on Intel Labs, “Many of these graphs are very large, with tens of billions of vertices (i.e., things being related) and hundreds of billions of edges (i.e., the relationships). And, many that model natural phenomena possess power-law degree distributions, meaning that many vertices connect to a handful of others, but a few may have edges to a substantial portion of the vertices. For instance, a graph of Twitter relationships would show that many people only have a few dozen followers while only a handful of celebrities have millions. This is all very problematic for parallel computation in general and MapReduce in particular.”
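The load-balancing problem Willke describes can be made concrete with a toy Python example (an illustration of the general point, not code from his post). It assumes a hypothetical follower graph in which vertex 0 is a "celebrity" followed by everyone, and it simulates a MapReduce-style job that groups edges by the followed account, using the vertex ID modulo the reducer count as a stand-in for a hash partitioner.

```python
from collections import Counter


def follower_graph(num_users, follows_each=3):
    """Hypothetical Twitter-like graph as (follower, followee) edges.

    Vertex 0 is a celebrity followed by every other user; users 1..n-1 also
    follow the next `follows_each` ordinary users, wrapping around.
    Assumes follows_each < num_users - 1 so no user follows themselves.
    """
    ordinary = num_users - 1
    edges = []
    for u in range(1, num_users):
        edges.append((u, 0))  # everyone follows the celebrity
        for k in range(1, follows_each + 1):
            edges.append((u, 1 + (u - 1 + k) % ordinary))
    return edges


def in_degrees(edges):
    """Follower count per vertex; heavily skewed towards vertex 0 in this toy graph."""
    return Counter(dst for _, dst in edges)


def reducer_loads(edges, num_reducers):
    """Edges each reducer receives when edges are keyed by the followed vertex.

    All edges sharing a key go to the same reducer, so a high-degree vertex
    cannot be split across reducers by this scheme.
    """
    loads = [0] * num_reducers
    for _, dst in edges:
        loads[dst % num_reducers] += 1
    return loads
```

With `follower_graph(1000)` the celebrity has 999 followers while every other user has 3. Spread across 16 reducers, the reducer that owns vertex 0 handles 1,185 of the 3,996 edges, against an average of about 250. Adding more reducers does not help, because all 999 of the celebrity's edges share one key and must land on a single machine.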


A visualization of Wikipedia edits created by IBM. At multiple terabytes in size, the text and images of Wikipedia are a classic example of big data.