Four signs your business needs a data lake

This is a guest blogpost by Dr. Thore Rabe, vice-president Europe, Middle East, Africa – Isilon Division at EMC

It is now well known that the digital universe, which comprises most businesses’ data needs, is growing exponentially.

In this environment, it is critical that businesses use data analytics to enhance competitiveness and meet the needs of the ‘information generation’: millennials and others born into the digital era. From helping to predict buying behaviours to driving innovation projects that will enhance customer service or improve business productivity, data lakes that can collate, store and analyse vast amounts of data have great power to transform a business for the better. Analytics should no longer be an aspiration, but a necessity.

However, many organisations get stuck early on in the journey. One of the main reasons is that IT and the rest of the business aren’t always aligned on the best use cases and business goals of big data projects. While some businesses might be experimenting with basic data analysis (and some haven’t even started), many just aren’t prepared for the next level, which is far more complex and in-depth. In fact, we estimate that only a minority of businesses currently have the capacity to be always on and operate in real time across the organisation, and almost a third haven’t even started doing this.

So, how do businesses know when they need to scale up and invest in a data lake?

There are four tell-tale signs:

1. Operational complexity: In a pre-data lake environment, if a business is trying to scale its infrastructure but doesn’t have the option of adding FTE (full-time equivalent) managers, there’s a good chance that its data requirements will outstrip its ability to manage them. Traditional tier 1 data resources aren’t always pooled virtually, which limits the amount of storage an individual manager can cope with and makes a clear case for a more flexible common storage resource, i.e. a data lake.

2. Operational cost: When a company finds that business demands on IT keep growing even when it is trying to reduce OpEx, it is time to look at a new approach. The same operational overheads that limit the ability to add FTEs also result in growing OpEx for managing IT resources. To address these requirements, businesses need either more FTEs or additional third-party support to monitor, manage, deploy and improve their systems. The latter approach scales an order of magnitude better, or more, than simply adding headcount.

3. Production strain: Another key indicator of the need for a data lake is when existing analytics applications are putting a strain on the production systems of a business. Real-time analytics can be extremely resource-intensive, whether it involves deriving insights through video analytics from dozens of HD video streams or poring over a vast waterfall of social content. Dedicated resources are needed so that people using the production systems don’t experience a drop-off in performance. Data lakes are key to ensuring that real-time analytics can run at optimum performance.

4. Multiprotocol analytics: A final key indicator that a business needs a data lake is when data scientists are running apps on a variety of Hadoop distributions and need to hook their data up to each of them. Businesses will need multiprotocol support in the future as analytics experimentation carries on, and they need to plan for this with a data lake strategy.

Departments like marketing have led the way in analytics adoption, gathering insights to better understand their customers and tailor their communications accordingly, but other business areas are now interested in the benefits it can bring to them, from HR to IT to operations and beyond.

Across industries, from finance to retail and from manufacturing to media, each company thinks that its problems, challenges and opportunities are unique. But when you abstract away the specifics, you’ll always come back to the same universal challenges I’ve mentioned in this piece. What unifies and characterises all of these is the transformation brought about by information technology and the potential of big data.

Not every business will be ready to deploy data analytics yet, but most will, at the very least, need to start planning for it or risk losing out to competitors that embrace the technology. Eventually, all businesses will need to embrace data analytics, and those that don’t will fade into obscurity.