Big Data demystified

Big Data is the hot new buzzword in IT circles. The proliferation of digital technologies, and of digital storage and recording media, has created massive accumulations of diverse data (Big Data) that can be used for marketing and other purposes. This tip gives you a quick overview of Big Data.

What is Big Data
Big Data refers to massive, often unstructured data sets that are beyond the processing capabilities of traditional data management tools. Big Data can occupy terabytes or even petabytes of storage in diverse formats, including text, video, sound, and images. Websites such as Facebook and Twitter are good examples: the data they hold grows every day. Traditional relational database management systems, built around structured tables with fixed schemas, struggle to handle data at this scale and in this variety.

Kinds of Big Data
Big Data includes search indexes, image and video archives, social network data, research data generated by R&D centers, weather and surveillance data from satellites and other sources, archives of all kinds (company records, medical records, and so on), and the data generated in data-heavy fields such as astronomy, genomics, and economics. Because all this data is now stored digitally, digital data is accumulating on a massive scale.

Technological impact
Big Data requires vast storage capacity and new kinds of data mining tools to make it accessible and useful. Major data storage vendors such as EMC, IBM, and Hitachi are developing new products to meet Big Data needs, and companies such as Greenplum (now part of EMC) are investing heavily in Big Data mining tools. These tools require parallel processing capabilities, so that work can be split across many processors, and storage media with high data throughput rates.
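To make the parallel-processing requirement more concrete, here is a minimal sketch of the split/map/reduce pattern that many Big Data tools are built around (Hadoop MapReduce is one well-known example). It is not any vendor's product: the function names `map_chunk`, `split_into_chunks`, and `parallel_word_count` are hypothetical, the "dataset" is an in-memory list of text lines, and the workers are threads in a single process. A real deployment would spread chunks across many machines and read them from high-throughput storage. Python threads do not run CPU-bound code in parallel because of the global interpreter lock, so `concurrent.futures.ProcessPoolExecutor` could be swapped in for real CPU parallelism on one machine.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List


def map_chunk(lines: List[str]) -> Counter:
    """Map step: count words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines: List[str], n_chunks: int) -> List[List[str]]:
    """Partition the input so each worker gets a roughly equal share."""
    size = max(1, -(-len(lines) // n_chunks))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def parallel_word_count(lines: List[str], workers: int = 4) -> Counter:
    """Hypothetical helper: split the data, map chunks concurrently, then reduce."""
    chunks = split_into_chunks(lines, workers)
    # Threads keep the sketch self-contained; a cluster would use separate machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    # Reduce step: merge the small per-chunk results into one total.
    return reduce(lambda a, b: a + b, partial_counts, Counter())


docs = ["big data needs parallel tools", "parallel tools scale out", "big data"]
counts = parallel_word_count(docs, workers=2)
print(counts["parallel"])  # 2
```

Each chunk is counted independently, so adding workers (or machines) scales the map step; only the compact per-chunk counts have to be moved and merged at the end.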

Impact of Big Data on datacenters and datacenter professionals

  • Big Data generates vast quantities of data to work with, so datacenters will see larger workloads.
  • Datacenter professionals must store, process, and secure an explosion of data. They may need to develop new skills to stay competitive.

What to consider while evaluating Big Data products

  • Data storage vendors have begun introducing Big Data products. Pay attention to the maximum file system size supported and the maximum data throughput rate of the storage media being offered.
  • Focus on the data transfer rate (throughput) of the storage media rather than its IOPS (input/output operations per second). High IOPS matters most for workloads made up of many small reads and writes; Big Data analysis typically reads large volumes of data sequentially, so throughput is the better measure.
  • Big Data needs software tools that can process data significantly faster than traditional data mining tools. Make sure the Big Data hardware and software are mutually compatible. Vendors may recommend specific combinations of Big Data hardware and software that they have tested and certified for compatibility, and for optimum performance it is also recommended to purchase both from the same vendor.
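The throughput-versus-IOPS advice comes down to simple arithmetic: throughput is roughly IOPS multiplied by the size of each I/O. The sketch below uses purely illustrative figures, not any vendor's specifications, to compare a device rated for many small random I/Os with one rated for fewer large sequential I/Os, and estimates how long one full pass over a 100 TiB dataset would take on each. The function names `throughput_mib_s` and `scan_hours` are hypothetical, and the model ignores caching, protocol overhead, and contention.

```python
def throughput_mib_s(iops: float, io_size_kib: float) -> float:
    """Approximate throughput in MiB/s: IOPS x I/O size (KiB) / 1024."""
    return iops * io_size_kib / 1024


def scan_hours(dataset_tib: float, mib_per_s: float) -> float:
    """Hours to read a dataset once at a sustained rate (ignores caching and overhead)."""
    dataset_mib = dataset_tib * 1024 * 1024  # 1 TiB = 1024 * 1024 MiB
    return dataset_mib / mib_per_s / 3600


# Illustrative (hypothetical) ratings, not real product figures.
small_random = throughput_mib_s(iops=20_000, io_size_kib=4)        # 4 KiB random I/O
large_sequential = throughput_mib_s(iops=1_000, io_size_kib=1024)  # 1 MiB sequential I/O

print(small_random)                                  # 78.125
print(large_sequential)                              # 1000.0
print(round(scan_hours(100, small_random), 1))       # 372.8
print(round(scan_hours(100, large_sequential), 1))   # 29.1
```

Although the second device is rated for 20 times fewer IOPS in this example, it moves 12.8 times more data per second, and that sustained rate is what a full scan of a large dataset depends on.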

The future of Big Data

  • The pool of digital data is expected to grow steadily. New digital data is being generated every day by individuals on the internet, by governments and businesses, by universities and research laboratories, by media houses, and by organizations of every size.
  • There is a demand for a new generation of storage media and analytical software that can handle the vast storage and processing requirements of Big Data.
  • Big Data is tipped to be the hot new IT trend for 2011.
Anuj Sharma

About the author: Anuj Sharma is an EMC Certified and NetApp accredited professional. Sharma has experience handling implementation projects related to SAN, NAS, and BURA (backup, recovery, and archiving). He has also published several research papers on SAN and BURA technologies internationally.
