Big data tutorial: Everything you need to know
The growth of data – both structured and unstructured – will present challenges as well as opportunities for organisations over the next five years.
As data volumes grow, it is essential that the business can extract useful real-time information from its IT systems; otherwise it risks being swamped by a data deluge. Meanwhile, competitors that use data to deliver better insights to decision-makers stand a better chance of thriving through the difficult economy and beyond.
The aim is to be able to use real-time data for real-time decision-making to become a real-time business.
Dave Lounsbury, chief technology officer (CTO) at standards body The Open Group, says social and economic changes are fuelling the rise of business analytics and big data, and organisations must respond to these pressures to be successful.
“Mobile devices, social networks and real-time information are driving big data – be prepared to handle this by developing competence in data architecture and analysis tools. Business leaders know that the ability to get and understand competitive data is gold dust, and they will be knocking on your door requesting it,” he says.
Treating big data as a problem which must be addressed is misguided.
“It is only a problem when viewed from an IT-centric perspective,” says Lounsbury, and he advises discovering the opportunities business analytics presents.
“Understanding and taking charge of your big business data needs are the next challenge. The CIO really needs to look beyond the bits and apps and really work with business heads to understand how they make decisions, and get them the data feeds and analytical tools they need to do this. As data volume increases, the ability to collect and present data in a way that the business can understand, so it can make decisions faster than the competition, will be the key to keeping the business competitive,” he says.
Suranjan Som, head of the business intelligence practice at consultancy Information Management Group, says big data should be part of a wider information management strategy.
“Rather than just collecting and storing data, decisions need to be made about which data is of use to the business. The true value of data comes from being able to contextualise and understand it, in order to deliver insights which can give businesses a competitive advantage,” he says.
Organisations have been looking at big data in a siloed way with groups of individuals playing around with clusters, but Som says this will change.
“The way the economy is going, the world is becoming more finance focused and over the next five years, it will be important to have a business case to justify what you are doing with big data, whether that is gaining competitive advantage or increasing revenue, so tangible value can be demonstrated over and above what the organisation has already achieved with data,” he says.
For big data to achieve insights which deliver real results, it is essential to understand what the business is doing.
“Big data needs to add significant value,” says Som. “For example, if it is in the online space, you need to understand what data is doing beyond the website – and must extend to social sites, such as LinkedIn and Tweets, so you get to know your customer better and can sell up, or in different ways.”
How big data will be used, and how fast it will grow, cannot be predicted with any confidence over the next five years, but it is important to make forays now to understand what is involved.
“The traits of big data are varied data types, speed of growth or velocity, and volume with billions of rows,” says Andrew Logie, CTO at consultancy DrPete.
He says the key issues over the next five years will be data security; locality, privacy and regulation; human resources; and project requirements.
“Solutions always work more effectively when they are near users. The challenge with collecting and reporting on big data is that users can be anywhere, and in huge numbers. Using distributed infrastructure and software to deliver solutions over different geographic territories creates its own challenges. For example, within some European countries, legislation requires data relating to individuals to be identified on individual disk drive spindles. This scenario makes using cloud-based infrastructure challenging,” he says.
These are key questions to consider and answers will depend on the nature of the business and its compliance commitments, but for many organisations, finding a practical and cost-effective use for big data is the first step. Som says this often points to cloud services which offer flexibility.
“Big data is a very scalable model. You can start off on the big data journey without having large capital investment because there are interfaces for organisations to build big data clusters on the cloud, so you can start with a simple evaluation and if you find value, you can expand,” he says.
Som suggests starting with a small subset of data. For example, an organisation with 1,000 product lines might look at its top 10 products alongside data generated by social media sites, and analyse how the approach works and what insights are gained.
Although big data does not lend itself to SQL queries and benefits from a different storage model to that of traditional relational databases, where data is held in rows and columns, Som says the gap between the two will close over the coming years.
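A pilot of the kind Som describes could start with something as small as the sketch below, which picks the top 10 product lines by revenue and then counts only the social media mentions that refer to them. The field names (`product_id`, `text`), the revenue figures and the helper functions `top_products` and `mentions_for` are illustrative assumptions rather than anything drawn from Som's practice.

```python
from collections import Counter
import heapq


def top_products(revenue_by_product, n=10):
    """Return the ids of the n highest-revenue products, best first."""
    return heapq.nlargest(n, revenue_by_product, key=revenue_by_product.get)


def mentions_for(products, mentions):
    """Count social media mentions per product, limited to the chosen subset."""
    wanted = set(products)
    return Counter(m["product_id"] for m in mentions if m["product_id"] in wanted)


# Hypothetical catalogue: 1,000 product lines with made-up revenue figures.
revenue = {f"P{i:04d}": i for i in range(1000)}

# Hypothetical social media mentions, already tagged with a product id.
mentions = [
    {"product_id": "P0999", "text": "love it"},
    {"product_id": "P0001", "text": "never heard of it"},
    {"product_id": "P0999", "text": "arrived late"},
    {"product_id": "P0990", "text": "good value"},
]

pilot = top_products(revenue)           # the 10 product lines to evaluate
counts = mentions_for(pilot, mentions)  # mentions of those products only
```

Restricting the evaluation to `pilot` keeps the first experiment cheap; if it shows value, raising `n` widens the scope without changing the code.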
“The industry is going through an interesting phase, and there is massive activity to converge traditional relational databases and the big data world with technology that allows you to spoof the way you write SQL, so you can leverage the skills base you already have. It will be a battle to see who brings out the most convergent model, but big data won’t eradicate traditional databases and will stand in parallel to existing structures,” he says.
Trials are under way and will increase in number over the coming years, but it is early days.
“A large investment bank decided to replace its risk database with [big data application] Hadoop to see if it gave a higher performance and throughput – the result was, not necessarily. What works on a traditional database doesn’t necessarily work better on big data technology,” says Som.
However, convergence will allow organisations to relate global information generated by social media sites to enterprise information.
“As organisations aim to become more efficient, they need to work on getting more light on the meaning of data to increase market share, but it is no good if they can’t take information from reviews and on Twitter and translate that into revenue. The best way of succeeding is to be able to look at all the information available in real time in one place,” he says.
Accessing information as close to real time as possible relies upon having a flexible, responsive IT infrastructure, so business users can react appropriately. If this is not achieved, there are major risks, warns Som: “In today’s social media paradigm, you can kill or grow a brand within 15 minutes.”
Adfonic is a mobile advertising buying platform, offering advertisers a smarter way to buy mobile advertising space. The company’s business is built on business analytics: it offers access to 100 billion monthly ad impressions, reaches more than 250 million mobile users and handles more than 6,000 campaigns per month.
“The basic premise of our business is we are able to find the exact users and audience that perform well for advertisers’ goals, and you are looking at needles in haystacks,” says CTO and co-founder Wes Biggs.
“Every time someone uses an app, it creates an ad request, and this is a chance to display an ad which is given to a different number of advertising platforms for real-time bidding for that particular option – how do you value that? This is the crux of why we need big data in our industry; we store and work with tens of terabytes each month.”
With big data, extracting meaning is the Holy Grail. “The key question is, how do we extract value from that data and at what point is it cost-effective? It is less about what data you have, and more about how effectively you can mine that,” says Biggs.
Adfonic has a team of data scientists. Acquiring such skills is a critical task, and will remain so over the next five years as competition for big data skills intensifies.
“Our team is made up of mathematicians, many with PhDs, who provide the algorithms across 100 billion opportunities to find out what worked and what didn’t, and the key factors in the market to make it most effective to place an ad. For example, a vacation package advertised between 6 and 8pm, aimed at 25- to 34-year-olds, using iOS and so forth. There are a lot of different parameters about each opportunity and advertisers will have different metrics, such as clicks on an ad or based on downloads and installations,” he says.
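The kind of question Biggs describes, which combination of parameters performs best against a given metric, can be illustrated with a very small sketch. It groups ad impressions by time slot, age band and platform and computes a click-through rate for each segment. This is not Adfonic's algorithm; the record fields (`slot`, `age_band`, `platform`, `clicked`), the sample log and the function `click_rate_by_segment` are all assumptions made for illustration.

```python
from collections import defaultdict


def click_rate_by_segment(impressions):
    """Group impressions by (time slot, age band, platform) and return the
    click-through rate of each segment."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for imp in impressions:
        segment = (imp["slot"], imp["age_band"], imp["platform"])
        shown[segment] += 1
        clicked[segment] += imp["clicked"]  # True counts as 1, False as 0
    return {seg: clicked[seg] / shown[seg] for seg in shown}


# Made-up impression log; a real one would hold billions of rows.
impressions = [
    {"slot": "18-20", "age_band": "25-34", "platform": "iOS", "clicked": True},
    {"slot": "18-20", "age_band": "25-34", "platform": "iOS", "clicked": False},
    {"slot": "18-20", "age_band": "25-34", "platform": "iOS", "clicked": True},
    {"slot": "09-11", "age_band": "35-44", "platform": "Android", "clicked": False},
    {"slot": "09-11", "age_band": "35-44", "platform": "Android", "clicked": True},
    {"slot": "09-11", "age_band": "35-44", "platform": "Android", "clicked": False},
    {"slot": "09-11", "age_band": "35-44", "platform": "Android", "clicked": False},
]

rates = click_rate_by_segment(impressions)
best_segment = max(rates, key=rates.get)  # segment with the highest click rate
```

An advertiser measuring downloads or installations instead of clicks would swap the `clicked` field for the relevant outcome; the grouping step stays the same.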
Although big data skills are at a premium, Biggs says the data scientists are “facilitators” and organisations must ensure big data is meaningful to more than just specialists.
“There may be a skills gap, but we use a specialised skill set to unlock value in order to ask the right questions. The tools need to be made available to leverage the benefits of data and the algorithms we have – for example, the planning tools we have for mobile campaigns,” he says.
But uses for big data must be democratised to support real-time decision-making, or it has no future.
“Big data is not an ivory tower pursuit and if only data scientists have the keys to the kingdom, the field will not advance. It may take data scientists to ask the right questions and create the right configurations, but once it’s established it should be put in the hands of everyone within the organisation,” says Biggs.
The technology for analytics and big data needs to be responsive, but there are barriers.
“With how much data we’ve got and how fast it’s generated, you can hit bandwidth constraints. Naturally the move is towards more of a cloud model, but you need to understand the trade-off between processing in the cloud and moving data around, versus the cost of storage of data and in-house analytics. Everything we do has to improve the return on investment for ourselves and our customers,” he says.
Biggs believes the anxiety over which tools to use will ease over the coming years. He expects traditional relational database and SQL tools to be used alongside big data technologies and NoSQL tools, as the two camps converge and take on each other’s qualities.
“Organisations get hung up on what format and storage to use, but the important question is what you do with the data which is useful to the business. From my point of view, it’s what tools help you answer the questions you need to answer most efficiently and there are different tools for different scenarios, but there is a lot less difference than the purveyors of each world want us to believe,” he says.
Biggs says MySQL has a very advanced cluster product, which shares many traits of NoSQL big data technologies, while NoSQL products have added structured query functionality.
Adfonic uses a mixture of tools to pinpoint value, drawing on fuzzier unstructured data alongside structured data to get the most accurate answers.
“There’s not much value in big data for big data’s sake. It’s just a code word for a set of new tools that try to solve business analytics in a more economic way,” he says.
The better approach for organisations is to ask, “Do I have a business intelligence strategy fit for purpose for my business?”
“If the answer is no, then look at the big data tools on the market,” says Biggs. “They will all start to roll up into big services offered by platform providers, but it’s beneficial the start-ups have disturbed some of the complacency in the market to help drive the efficiency organisations need to deal with growing volumes of data.”