This is a guest blog from Steve Totman, data integration business unit executive at Syncsort.
Big data is considered by many to be an integral part of an organisation's IT strategy. But as the technology and the strategies built around it develop at a rapid pace, a gap is widening in the skills needed to unlock the business value of big data.
One of our customers is already seeing this. For comScore, a global Internet information provider to which leading companies turn for consumer behaviour insight, data is its commodity. The company has a global cross-section of more than two million members who have granted it permission to confidentially capture their browsing and transaction behaviours. As comScore's customer base grows, so too does the sheer amount of information it is tasked with analysing. This puts pressure on the company's own strategy and on its need to ensure that staff are equipped with the right technical skills to handle big data efficiently.
With big data also comes a discussion about the new technology and solutions needed to support it. Technical skills in Hadoop, MapReduce and proprietary commercial big data frameworks are becoming increasingly scarce and, as such, those with experience in these areas are commanding higher salaries, even if their experience is limited. This bucks the traditional pattern of IT employment, in which pay follows years of hard work, training and experience, and so highlights the employment issues facing the IT industry.
So how can this skills gap be addressed?
Big data, in its very essence, requires two things: the IT infrastructure to store the data and a skilled team of analytical minds to enable businesses to understand the vast amount of information. Web logs, machine-generated data such as that from sensor systems, social media and transaction data are among the raft of information that needs to be stored correctly and then analysed by trained and able individuals.
Similarly, there are two avenues to consider for companies looking to up-skill within their big data projects. Firstly, organisations should start by looking right under their noses.
The Extract, Transform and Load (ETL) team, otherwise known as data warehousing specialists, already understands the context in which data is used within the business; its members know how data is being moved and transformed, and they recognise its value when it is turned into useful information. For this reason, this team is often best suited to begin the implementation of Hadoop-based big data solutions. Companies should look to bring additional skills into their ETL teams to create collaborative Data Scientist teams.
But technical proficiency isn't always enough. Organisations need skills that transcend pure data analysis, and they need to hire individuals who are able to ask the right questions of the data and come up with analytical insights that really add business value. As such, the scientific community could offer the IT industry the bridge it needs across the big data skills gap. Companies should look within the science departments of universities and research houses, as scientists already deal with vast amounts of data and, importantly, are coming up with the right questions with which to query the information.
Secondly, organisations could look to the third party providers that are responding to the skills gap by offering outsourced big data services, in much the same way that Red Hat did with Linux. These providers offer Hadoop platforms through which organisations can leverage the benefits of big data without having to invest in the necessary in-house resources.
But as the value of data increases and as more sensitive data is produced, do businesses really want to entrust it to a third party? Data is fast becoming the most valuable asset of a business and, with questions over security and intellectual property continuing to be asked, big data will fast become a function that companies will want to keep in-house.
Outsourcing big data expertise is a false economy, and businesses need to recognise it as one. Rather than outsourcing to enjoy short-term cost efficiencies, businesses should ideally bring in their own big data clusters and employ people with the necessary skills. For Hadoop to deliver its full advantage, it is important that the right data scientists are employed and allowed to roll up their sleeves and get stuck in.
It's equally important to give the IT team the tools to move data into Hadoop simply and efficiently (including from mainframe sources, which can be particularly tricky) and to cope with the steady stream of complex data from disparate sources, including relational, non-relational, cloud-based and SaaS systems, as well as emergent, less structured data types. Such tools reduce the time and resources required to collect the data and query against it.
With every additional month that businesses can experiment with Hadoop and the data at their disposal, they will become more competent at monetising big data and will achieve more competitive advantage.