3 ways data lakes are transforming analytics


This is a guest blogpost by Suresh Sathyamurthy, senior director, emerging technologies, EMC

Data lakes have arrived, greeted by the tech world with a mix of scepticism and enthusiasm. In the sceptic corner, the data lake is under scrutiny as a "data dump," with all data consolidated in one place. In the enthusiasts' corner, data lakes are heralded as the next big thing for driving unprecedented storage efficiencies in addition to making analytics attainable and usable for every organization.

So who's right?

In a sense, they both are. Data lakes, like any other critical technology deployment, need infrastructure and resources to deliver value. That's nothing new. So a company deploying a data lake without the needed accoutrements is unlikely to realize the promised value.

However, data lakes are changing the face of analytics quickly and irrevocably -- enabling organizations that struggle with "data wrangling" to see and analyze all their data in real time. This results in increased agility and more thoughtful decisions regarding customer acquisition and experience -- and ultimately, increased revenues.

Let's talk about those changes and what they mean for the world today, from IT right on down to the consumer.

 

Breaking data silos

• Data silos have long been the storage standard -- but they are operationally inefficient and limit the ability to cross-correlate data to drive better insights.

• Cost cutting is also a big driver here. In addition to management complexity, silos require multiple licensing, server and other fees, while the data lake can be powered by a single infrastructure in a cost-efficient way.

• As analytics become progressively faster and more sophisticated, organizations need to evolve in the same way in order to explore all possibilities. Data no longer means one thing; with the full picture of all organizational data, interpretation of analytics can open new doors in ways that weren't previously possible.

 

Bottom line: by breaking down data silos and embracing the data lake, companies can become more efficient, cost-effective, transparent -- and ultimately smarter and more profitable -- by delivering more personalized customer engagements.

 

Leveraging real-time analytics (Big Data wrangling)

Here's the thing about data collection and analytics: it keeps getting faster and faster. Requirements like credit card fraud alert analytics and stock ticker analytics need to happen seconds after the action has taken place. But real-time analytics aren't necessary 100% of the time; some data (such as monthly sales data, quarterly financial data or annual employee performance data) can be stored and analyzed only at specified intervals. Organizations need to be able to build the data lake that offers them the most flexibility for analytics.

Here's what's happening today:

• Companies are generating more data than ever before. This presents the unique problem of equipping themselves to analyze it, instead of just storing it -- and the data lake coupled with the Hadoop platform provides the automation and transparency needed to add value to the data.

• The Internet of Things is both a data-generating beast and a continuous upsell opportunity -- provided that organizations can deliver compelling offers in real time. Indeed, advertisers are on the bleeding edge of leveraging data lakes for consumer insights, and converting those insights into sales.

• Putting "real-time" in context: data lakes can reduce time-to-value for analytics from months or weeks down to minutes.

Bottom line: Analytics need to move at the speed of data generation to be relevant to the customer and drive results.

 

The rise of new business models

Data lakes aren't just an in-house tool; they're helping to spawn new business models in the form of Analytics-as-a-Service, which offers self-service analytics by providing access to the data lake.

Analytics-as-a-Service isn't for everyone -- but what are the benefits?

• The cost of analytics plummets due to outsourced infrastructure and automation. This means that companies can try things out and adjust on the fly with regard to customer acquisition and experience, without taking a big hit to the wallet.

• Service providers who store, manage and secure data as part of Analytics-as-a-Service are a helpful avenue for companies looking to outsource.

• Knowledge workers provide different value -- with the manual piece removed or significantly reduced, they can act more strategically on behalf of the business, based on analytics results.

• Analytics-as-a-Service is an effective path to early adoption, and to getting ahead of the competition in industries such as retail, utilities and sports clubs.

Bottom line: companies don't have to DIY a data lake in order to begin deriving value.

Overall, it's still early days for data lakes, but global adoption is growing. For companies still operating with data silos, perhaps it's time to test the waters of real-time analytics.

App-based approach key to achieving efficient self-service Business Intelligence (BI)


This is a guest blog by Sylvain Pavlowski, senior vice president of European Sales at Information Builders

As workers and business units clamour for more control over data analysis to gain insights at their fingertips, there is a rise in the use of self-service business intelligence (BI) tools to meet demands. But this is not without its challenges, for IT teams in particular.

A gap between business users and IT has ensued because, historically, IT departments have created a centralised BI model and taken ownership of BI. They want to maintain control over aspects like performance measures and data definitions, but workers are striving to gain access to the data they want, when they want it, and don't want IT to 'hand hold' them. This is forcing a redistribution of control over self-service BI and could inhibit business success if IT departments and business users don't find a happy medium.

Gartner argues that, "Self-service business intelligence and analytics requires a centralised team working in collaboration with a finite number of decentralised teams. IT leaders should create a two-tier organisational model where the business intelligence competency centre collaborates with decentralised teams."

I agree that managing all types of data in one place, in one structure, is difficult at the best of times, but it's all the more difficult these days with a move towards individualism and personalisation, where users want to help themselves to the data they need for their job roles, in real time. To manage the push and pull between IT and users, businesses need to look at ways to redefine self-service BI, and it's not just about the IT organisational model. An approach needs to address more than IT departments' needs.

Implementing an app-based approach to self-service BI can help appease everyone concerned. IT departments can build apps for self-service BI to serve every individual, irrespective of back end systems and data formats. "Info Apps", for example, is a new term used to describe interactive, purpose-built BI applications designed to make data more readily accessible to those business users who simply don't have the skills or the technical know-how to use complex reporting and analysis tools, to satisfy their own day-to-day needs. Some studies have even shown that such individuals can make up more than 75% of an organisation's BI user base. Using an app-based approach is therefore an extremely effective way to give business professionals the exact information they need, through an app paradigm, without requiring any analytical sophistication.

Next-generation BI portals play an important role here too. They can provide enterprises with a way to seamlessly deliver self-service BI apps to business users. By organising and presenting BI apps to users in a way that is simple and intuitive (similar to the Apple App Store), companies can empower workers with faster, easier, more interactive ways to get information.

These next-generation portals also offer high levels of customisation and personalisation so business users have full control over their BI content at all times. They will be empowered with the ability to determine what components they view, how they're arranged, how they're distributed across multiple dashboard pages, and how they interact with them. By offering unparalleled ease and convenience - giving them what they need, when and how they want it - organisations can encourage business users to take advantage of self-service BI in new and exciting ways, whilst having the peace of mind that IT departments are ensuring data quality and integrity in the background. This will all drive higher levels of BI pervasiveness, which in turn, will boost productivity, optimise business performance, and maximise return on investments. 

Understanding data: the difference between leaders and followers


A guest blogpost by Emil Eifrem, CEO of Neo Technology.

Data is vital to running an efficient enterprise. We can all agree on that.

Of course, from there, thoughts and opinions differ widely, and it's no surprise why.

Too much of the data conversation is focused on acquiring and storing information. But the real value of data is derived from collecting customer insights, informing strategic decisions and ultimately taking action in a way that keeps your organisation competitive.

Leaders who conduct this level of analysis distinguish themselves from the rest. Data followers merely collect; data leaders connect.

Yet, with so many ways to analyze data for actionable insights, the challenge is to find the best approach.

The most traditional form of analysis is the simplest: batch analysis where raw data is examined for patterns and trends. The results of batch analysis, however, depend heavily on the ingenuity of the user in asking the right questions and spotting the most useful developments.

A more sophisticated approach is relationship analysis. This approach derives insights not from the data points themselves but from a knowledge and understanding of the data's entire structure and its relationships. Relationship analysis is less dependent on an individual user and also doesn't analyse data in a silo.

Real-World Success

Take a look at the biggest and best leading companies and you'll see a strong investment not only in data analysis but also analysis of that data's structure and inherent relationships.

For example, Google's PageRank algorithm evaluates the density of links to a given webpage to determine the ranking of search results. Or consider Facebook and LinkedIn: each site evaluates an individual's network to make highly relevant recommendations about other people, companies and jobs.
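For readers who like to see the idea in code, here is a deliberately tiny power-iteration sketch of the link-density principle behind PageRank; the toy graph, damping factor and iteration count are illustrative assumptions, not Google's production algorithm.

```python
# Minimal PageRank power iteration over a toy link graph.
# Illustrative only -- not Google's production implementation.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            if not outgoing:  # dangling page: spread its rank evenly
                share = damping * rank[page] / len(pages)
                for p in pages:
                    new_rank[p] += share
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

toy_web = {"home": ["about", "blog"], "about": ["home"], "blog": ["home", "about"]}
print(pagerank(toy_web))
```

Pages with denser inbound linking end up with higher scores, which is the relationship insight rather than anything about the pages' own content.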

Together, these three organisations have developed real insight into their customers, markets and future challenges. In turn, they have become leaders in the Internet search, social media and recruitment sectors, respectively.

Every Data Point Matters

When it comes to effective data analysis, your enterprise must be gleaning insight from all of the data at its disposal, not just a portion of it.

With so much data to sift through, it's no surprise that most organisations fall into a similar trap, focusing their data analysis efforts on a small subset of their data instead of looking at the larger whole.

For instance, it's much easier for enterprises to only examine transactional data (the information customers supply when they purchase a product or service). However, this subset of data can only tell you so much.

The vast store of data a typical enterprise doesn't use is known as "dark data." Defined by Gartner as "information assets that organisations collect, process and store during regular business activities, but generally fail to use for other purposes," mining your dark data adds wider context to insights derived from transactional data.

Of course, data only tells part of the story with surface-level analysis. Enterprises need curious and inquiring minds to ask the right questions of their data. That's why so many leading organisations recruit data scientists solely to make sense of their data and then feed these insights back to strategic decision makers.

Ultimately, the real value of data lies not only in bringing your enterprise closer to the customer but also to prospective customers. And building a better bottom line is something we can all agree on.


"Our data has to be perfect." No, it doesn't.


Data is not really perfectible and ultimately, perfection is the enemy of progress, writes James Richardson, business analytics strategist at Qlik and former Gartner analyst, in a guest blog.

Ask people what slows or stops the use of business intelligence (BI) in their organisations and poor data quality is often one of the first things they say.

Now, I'm not going to argue with that view - I've been around BI for too long to do so, and know that a lack of good quality data is a very real issue, particularly as organisations begin to put focus on BI and using data to drive decisions. In fact, I've written in the past on how to approach the procedural issues that give rise to much poor quality data. (If you have a Gartner subscription you can read my 2008 research note 'Establish a Virtuous Cycle of Business Intelligence and Data Quality'.)

There's no doubt that addressing data sourcing processes helps ameliorate basic errors, build trust and overcome the initial resistance that's common in any BI programme's initial stages.

What I do take issue with is when I hear that "data quality needs to be perfect in order for us to roll out our dashboards".  In this case the aphorism that "perfection is the enemy of progress" is really true.  So, why do people set themselves this impossible goal?  Well, first you've got to consider who's saying this. In the main, it's IT or technical staff.  They often feel exposed because although they know the data's safe, secure and backed up (hopefully!), they've often no idea if its content is good or bad (and nor should they - it's not their job). The irony is that it is not until the data is exposed to active usage by managers, decision makers or analysts that the quality and therefore usefulness of data sets becomes truly apparent.    

The effort spent improving data has to be balanced against the value of doing so. For most uses, it's not perfect data that's needed, but data within acceptable tolerances. Of course, the level of tolerance varies by function and use case. For financial data the tolerance for error is obviously very low. That's not such an issue, as the perfectibility of financial transaction data is within reach, but only because of the huge effort that goes into its stewardship. The whole practice of accountancy and auditing is fundamentally about data quality, as its aim is to remove as much error from the financial record as possible. The fact that the data is generated and controlled inside our organizations also helps. In other, less regulated, functions the tolerance can afford to be somewhat less rigorous. Why is this? Because people need data to answer business questions right now! Data that's 80%+ accurate may be enough for many operational or tactical decisions. They may only need a hint at the direction that data is taking overall for it to be valuable, and they may only need it for this instance. Immediacy often trumps purity.
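To make "within acceptable tolerances" concrete, here is a minimal sketch of the idea in Python; the records, validation rule and thresholds are invented for illustration, not a prescription.

```python
# Accept a data batch if the share of failing records stays within a tolerance.
# Field names, the validation rule and the thresholds are illustrative assumptions.

def within_tolerance(records, check, tolerance):
    """check(record) returns True when the record passes validation."""
    failures = sum(1 for r in records if not check(r))
    error_rate = failures / len(records)
    return error_rate <= tolerance, error_rate

orders = [
    {"order_id": 1, "amount": 120.0, "region": "UK"},
    {"order_id": 2, "amount": -5.0,  "region": "UK"},   # bad: negative amount
    {"order_id": 3, "amount": 80.0,  "region": "DE"},
    {"order_id": 4, "amount": 64.5,  "region": "FR"},
    {"order_id": 5, "amount": 99.0,  "region": "UK"},
]

valid = lambda r: r["amount"] >= 0 and r["region"] is not None

ok_finance, rate = within_tolerance(orders, valid, tolerance=0.001)  # near-zero tolerance
ok_tactical, _ = within_tolerance(orders, valid, tolerance=0.20)     # "80%+ accurate is enough"
print(f"error rate {rate:.0%}: finance ok={ok_finance}, tactical ok={ok_tactical}")
```

The same batch fails a financial-reporting tolerance but is perfectly usable for a tactical dashboard, which is the point about fitness for purpose.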

Getting perfect (or even near perfect) data requires herculean efforts. Zeno's paradox applies: it becomes harder and harder to reach perfection as the volume and diversity of data grows. It simply isn't possible (or cost effective) to make all sources perfect to the same degree. There's another big question - what does the perfectionist do about data that is not perfect? Conform it? That means changing it, with the risk of over-cleaning the data, or "correcting" errors incorrectly. We have to accept the fact that data is rarely perfectible, and getting less so, and mature our approach to data quality to ensure that it's fit for purpose in today's information environment, where burgeoning and varied data flows into our organizations.

I'd go further and say that the myth of perfect data is more dangerous than decision makers understanding that the data they have is dirty, with anomalies and glitches. Finally, reaching perfection, were it even possible, might not be a good thing. Why? Because perfection also implies a rigid standard, and a fixed frame of reference can itself limit innovative thinking, by stopping people answering really fundamental questions. Blind faith in the certainty of perfect data can never stand up to the shock of the new, to those things we can't or won't see coming.

Why data is more valuable when it's shared


This is a guest blogpost by Glen Rabie, CEO, Yellowfin.

 The role of collaboration in decision-making has been a question for academics and business leaders since modern business began. In fact, well before. In ancient Athens, back in 500 BC, the Greeks ran what can arguably be viewed as the world's first formal collaborative decision-making process. Each Athenian (excluding women, slaves and people from the Greek colonies of the time) was invited to vote, not for a representative to make laws, but actually on the merits of each individual law.

Today, business leaders are also acutely aware of the merits of making decisions collaboratively - involving different stakeholders that can help the company arrive at the best course of action. In a recent Economist Intelligence Unit study titled Decisive Action, 87% of senior decision-makers claimed to involve others when making decisions. Similarly, when asked what single factor would improve their ability to make better decisions, over a third said "being able to take decisions more collaboratively".

 BI's historic shortfalls

In the Business Intelligence (BI) industry, our job is to empower people and organisations to make more decisions based on a solid foundation of trustworthy and easy to interpret data. Unfortunately though, BI vendors have done a poor job of thinking beyond the initial analysis - the focus was placed on core analytics and the technical community alone. Usability (the ease of data consumption by business users) and, importantly, an enterprise's ability to share those insights tended to be afterthoughts - if they were considered at all. When I founded Yellowfin, after a long career working with inflexible BI tools on behalf of one of Australia's 'Big Four' banks, I wanted to change exactly this situation. I wanted to remove the cost and complexity. I wanted Yellowfin to help make BI easy.

 In a world where BI technology is becoming more pervasive, and insights can be valuable to ever increasing numbers of managers and employees, the trick was surely to make things simple for BI consumers. That is, to empower business users to make better, faster and more independent fact-based decisions by focusing on how data is displayed and shared. Collaboration is a huge part of this. There is very little point in having world-beating analysis if it is the exclusive preserve of a limited number of people and is hard to interpret and act on amongst decision-making groups.

 Moving to a collaborative BI environment

Advances in the Internet - particularly the Web-based interfaces of pervasive social media platforms - have taught me and many others some valuable lessons. The rise of social media tools like Facebook, Twitter and Instagram demonstrates how successful you can be by playing the role of a content facilitator - allowing content generated by users to be shared, distributed and interacted with by an interested user base of content consumers. If people on Facebook want to comment on or share a photo, they can. Why should BI be any different?

The omnipresent and collaborative nature of such social media platforms has many people in the modern workforce quite rightly wondering why enterprise BI can't be architected in a similar way. Downloading analysis to a static dashboard or spreadsheet and emailing it to colleagues, then phoning to discuss, then undertaking new analysis based on the new questions and then emailing another static chart simply isn't competitive or efficient practice. It won't deliver better, more accurate decision-making, and it's painfully slow.

 Closing the gap between decision-making and the data

What's needed is an acknowledgement that human decision-making today is, on the whole, taking place outside the BI platform. It could be in a meeting room or a conference call but, all too often, it is not where the data resides. Why not collaborate, and make collective decisions, within the BI platform itself? Why not facilitate the decision-making process alongside the data and data analysis, where stakeholders can interact with live datasets in real-time, add comments, make revisions and collaborate until the correct decision is reached? This is the direction in which our developers have been moving for some years now, and it's consistently been one of the areas our customers have told us they value.

Imagine a scenario where a restaurant manager wants to know how a new product line has been performing to decide if he will continue to stock it. Not only can he instantly view product performance via a self-service chart or dashboard, he can then annotate the chart and share it with other store managers to obtain their thoughts and insights.  He can even start an entire discussion thread around the performance of this new product, allowing others to contribute knowledge and other relevant BI content, to establish the underlying factors impacting performance and to agree on a desired course of action. Allowing such collaboration enables users to connect trends in their data to real-world events more readily, providing more context and deeper, faster insight. Perhaps the product has undersold due to a company-wide stock take shutdown during the end of the financial year? Or perhaps other store managers have experienced more success because they've promoted the new product with a series of discount coupons.

Data alone doesn't deliver ROI; it's the quality of the respective business decisions that yield the benefit. When data is shared - and therefore complemented with a range of appropriate human insights and other contextual information - it is easier to take smarter collective action that delivers better business outcomes. That's why I believe organisations should be focusing attention on collaboration as a means of increasing the value of their data, which improves decision-making processes and enhances the business benefits derived from BI.

Alan Turing Institute head Howard Covington has opportunity to boost UK economy


This is a guest blog by David Richards, President, co-founder and CEO of WANdisco

Alan Turing may not have known it at the time, but he was one of the first pioneers of the data science industry. Seventy years on, we're seeing the rise of the data scientist, fuelled by an increasing realisation by organisations across the world that they require new leadership if they are to get the most out of their data. This is more than just another fickle trend; a quick search on LinkedIn reveals that "data scientist" now appears in roughly 36,000 help-wanted posts.

 If there's one thing big data can teach business leaders, it's this: be prepared to challenge your assumptions.

 Take the case of a major US insurance broker who decided, after years in the business, to stress test the actuarial assumptions which had formed the basis of their policies up until that point. Applying big data analysis to their business model showed them that their assumptions had all been wrong -- that the policies they had been selling had been flawed all along.

Or that of a global bank, which had identified China as a key market for expansion. After months of planning and investment, its data was too fragmented to run an analysis on the return on investment of the campaign to date - so efforts continued. When the data was brought together and analysed using Hadoop, the result was startling: not only had the bank made no profit, but it had been running its Chinese expansion at a loss.

 With traditional analytics methods rapidly being replaced by big data science, there is a great opportunity for businesses and governments alike.

Howard Covington has been announced as the first chair of the Alan Turing Institute. His task will not be easy: when the government-backed institute was first announced, George Osborne said it should enable Britain to "out-compete, out-smart and out-do the rest of the world" - no walk in the park. But if Covington plays his cards right, this could be a big step towards realising Britain's big data opportunity.

 The Alan Turing Institute is a move straight from the Silicon Valley playbook, as the government hopes it will be "a world leader in the analysis and application of big data." Covington, whose background lies in investment banking and asset management, has said that the priority of the institute will be "leading-edge scientific research", with industry application of that research a close second.

The Institute won't be able to do this alone. The strength of UK universities' research centres can be amplified tenfold by the involvement of the private sector - industry experts will be able to throw a little Silicon Valley know-how into the mix.

 Companies like Hortonworks, Pivotal and WANdisco are helping industries from banks and utility providers, to hospitals and government agencies deploy big data strategies. It is something we have been doing for years, and rather than developing our products in isolation, we have been part of a continuous dialogue with customers. The experience of such companies will be invaluable to the Institute as it sets up its priorities, in my view.

 This is all the more important as data science is counted in dog years - the industry is accelerating so fast that, in terms of changes, it's like packing seven years into one.

 As Covington considers which industry professionals to consult, it is vital that the private sector representatives include vendors rather than end users alone - he will need to consult the companies designing and operating the technology, rather than those simply benefiting from it. Not doing so would be like investigating national spending habits without consulting a single bank.

It is encouraging to see the UK government realise the importance of big data science and invest in projects such as the Alan Turing Institute. But it is important that it also appreciates that heavy-handed legislation could do more harm than good in the long run. We need researchers and industry leaders to communicate with policy makers, to ensure innovative thinking is safeguarded from stifling regulation. The Institute is in a prime position to act as a communicator between these bodies.

 The big data industries are set to be worth £216 billion to the UK economy by 2017, and I expect that the Alan Turing Institute will play an important part in ensuring that the UK delivers on its big data promise. But with limited resources, it needs to make every action count. A focused plan, one that speaks to government as much as it does industry, will be critical in doing this.

Dave Goldberg: on history, Silicon Valley, failure as a virtue, and London as a technology hub


Dave Goldberg, who died in May 2015, was someone I was looking forward to talking with more about the history and significance of Silicon Valley, and about east London's Tech City and its attempt to emulate that success.

London Technology Week, this week, is a good moment to reflect.

Mr Goldberg was a Silicon Valley executive - CEO of Survey Monkey at the time of his untimely death - who studied History and Government at Harvard University. His widow, Sheryl Sandberg, is, as is well known, the COO of Facebook and author of Lean In, an influential book about how women can flourish better in leadership positions in business and government.

When I met him, I asked Dave who was his favourite historian, and he mentioned, as influential on him, the journalist and historian David Halberstam, author of The Best and the Brightest about the origins of the Vietnam War, and the young consiglieri around John F. Kennedy.

He was kind enough to ask me who mine were. EP Thompson and Hugh Trevor-Roper was my reply - both great English stylists, though at opposite ends of the political spectrum, democratic communist and high Tory respectively. But I digress.

Does Silicon Valley, I asked, lack a sense of history, and does that matter? He responded: "I don't know that it lacks a sense of history, but there is a healthy scepticism about incumbency, and a desire to be disruptive that is unusual. Entrepreneurship is about the triumph of hope over experience.

"It's not that people don't know the history and don't think it is important. But they are willing to go against the odds, and any rational decision making process, and do something that does not look probable.

"There is a lot of utopianism and an idealistic view of the future, and I am not sure I believe in all of that stuff, but I am generally an optimist. One of Silicon Valley's distinguishing strengths is its pervasive optimism in the face of great challenges.

"Also, and I think this is a good thing, failure is a virtue and not a black mark in Silicon Valley. People will look at someone who has started a company and failed and someone who has learned some stuff, and will do better next time. Most other places will say: 'well, that guy failed, so why would we want to invest in him'?

"So, failing fast, yes, but failure - failure is a virtue. You know, Travis [Kalanick], who founded Uber founded two other companies, one of which failed and one of which was a modest success. Reid Hoffman, who started LinkedIn, founded a previous social networking company that failed. The history is littered with people who had failure before they had success. Even Steve Jobs, being fired from Apple.

"The history of Silicon Valley is about not letting failure get in the way of success, and that is different".

Could the UK, and London specifically, replicate that?

"I studied American history and government, and we would talk a lot about institutional memory, and where that was located, whether in companies or government. In Silicon Valley that memory lies in the service professionals around the entrepreneurs - the lawyers, accountants, the PR firms, the real estate agents, the recruiters, and so on. That is an advantage that is not well understood, that connective tissue that transmits the knowledge to the next group of 24-year old entrepreneurs who come along. Now, London is becoming one of those places, too, with Berlin maybe second. In fact there is probably more of that connective tissue in London than in New York. That is a big change over the past five years.

"Is London going to be bigger than Silicon Valley for technology companies? That is unlikely. But should it be the hub for Europe? Yes. Big global companies can start here [in London]".

Nevertheless, Dave Goldberg registered the negatives of Silicon Valley. "We've got a lot of things to work on. There is terrible infrastructure - roads, poor mass transit, high real estate prices -- rivalling London's!"

"The biggest issue is we don't have enough diversity. Not enough women or ethnic minorities. And there is ageism, speaking as an older person myself! There is a myth that all tech companies are founded by 24-year old college drop outs, and it is not true. Most of the data shows that the most successful entrepreneurs are those who start companies in their late thirties".

Dave Goldberg was 47 when he died. Too young.

Different horses for different courses: NoSQL and what your business needs


This is a guest post by Manu Marchal, managing director EMEA at Basho Technologies

While the importance of distributed databases has become more apparent for a large number of businesses, with more and more enterprises in a wide variety of industries identifying the power of harnessing unstructured data, there are still many misconceptions about NoSQL.

It is a common misconception that NoSQL databases act as all-purpose Swiss-army knives, with each platform able to address each enterprise's specific data needs. This is a myth that should be dispelled - NoSQL platforms conform to the old adage that it takes different horses for different courses, with each one offering a variety of strengths and weaknesses, and each capable of catering to enterprises' own specific needs, whether they require speed, reliability, flexibility, or scalability.

When faced with a multitude of choices, it is a natural human reaction to seek out the quickest option. With databases, there is often the assumption that the platform providing the fastest speeds is the one most suited to the organisation. This, however, is not the case; just as a carpenter wouldn't use a hammer to sand a surface, enterprises should select the platform that suits their needs best.

In the gaming industry, for example, it is of the utmost importance to process huge amounts of data quickly and reliably, ensuring that customers who want to place a bet at a specific time can do so. This data is changing so frequently - from score updates, to red cards, to number of corners - that it is absolutely imperative that gaming companies can empower users to place bets without delay while also updating odds and processing pay-outs. Needless to say, there's a lot of scope for things to go wrong here, and if this were to happen it would cost the company a great deal of money. Of course speed is important to the gaming industry, but arguably not as vital as a platform that can be relied upon to smoothly process the data under extreme duress and to not falter during failure scenarios.

Our own technology, Riak, for example, is fast but not the fastest on the market. What it does do well, however - and why it is now being used by bet365 to process the enormous amount of data the company relies upon - is reliably scale and ensure performance under pressure, a vital asset for businesses that can't afford any increased latency during peak times. Riak is made for mission-critical applications and gives organisations that rely on such applications peace of mind. Now, we're not saying that this is what your organisation is looking for - perhaps you actually do need explosive speed - what we are saying is that enterprises are different, and so are NoSQL platforms.
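For a flavour of the key-value access pattern involved, here is a rough sketch against a Riak node's HTTP interface; the host, port, bucket and payload are assumptions for a default local setup, and this is in no way a description of bet365's actual system.

```python
# Illustrative key-value round trip against a Riak node's HTTP interface.
# Host, port, bucket, key and payload are assumptions for a default local node.
import json
import requests

BASE = "http://localhost:8098/buckets/bets/keys"  # assumed default Riak HTTP endpoint

def store_bet(bet_id, payload):
    # Write the bet as a JSON object under its key.
    resp = requests.put(
        f"{BASE}/{bet_id}",
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()

def fetch_bet(bet_id):
    # Read the stored object back by key.
    resp = requests.get(f"{BASE}/{bet_id}")
    resp.raise_for_status()
    return resp.json()

store_bet("match42-user7", {"market": "next_goal", "stake": 10.0, "odds": 3.5})
print(fetch_bet("match42-user7"))
```

The value of the distributed design is what you don't see in the snippet: the same simple reads and writes keep working when nodes fail or load spikes.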

By being aware of just how differently each platform can serve their enterprise, IT managers can better cater to their requirements and select the most appropriate platform for them, rather than finding out the hard way that for businesses and NoSQL platforms, there is no one-size-fits-all.

The growing importance of customer data governance


This is a guest post by Sid Banerjee, CEO, Clarabridge

We are living in the "age of the customer," as Forrester Research recently dubbed it. There are more and more communications channels -- from company websites to call centres to social media -- for customers to interact with the companies they do business with. As a result, they also have high expectations when it comes to having their feedback heard and considered.

For businesses, this era of customer-centricity presents both challenges and opportunities. Acting on feedback straight from the customer's mouth can directly impact a company's bottom line by reducing metrics such as customer churn.

But there are a few steps between customer feedback and that impact. Companies must make sure that they have the technical skills and capabilities to connect to all relevant customer experience data sources, and be equipped to bring all that data together for holistic and meaningful analysis. But even before that, companies must have the right data governance in place, which is a foundational piece for any advanced analysis and action.

Traditional business intelligence (BI) data is generally very explicit and structured, focusing on what has already happened. The universe of structured data is vast, including demographic information, purchase history, digital engagement, multiple-choice survey responses, and other CRM data. Businesses have had their hands full analyzing and interpreting these data sets for years, but now the big data challenge is becoming increasingly urgent.

Adding unstructured customer data

When it comes to customer experience management, all of this data must be combined with unstructured customer feedback data. This data includes social media comments, online reviews, call center recordings, agent notes, online chat, inbound emails, and free-form survey responses. Businesses need to consider this data alongside structured data for a complete picture of the customer experience. That's why, in the age of the customer, next-generation experience technology and techniques -- like text analytics, sentiment analytics and emotion detection -- are not optional.
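To give a sense of what even the simplest form of this processing looks like, here is a toy lexicon-based sentiment scorer for free-text feedback; the word lists are invented for the example, and production text analytics relies on far more sophisticated models, entity extraction and emotion detection.

```python
# Toy lexicon-based sentiment scoring of unstructured customer feedback.
# The word lists are illustrative assumptions, not a real analytics product.

POSITIVE = {"great", "helpful", "fast", "love", "excellent"}
NEGATIVE = {"slow", "rude", "broken", "waiting", "refund"}

def sentiment(comment):
    words = {w.strip(".,!?").lower() for w in comment.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

feedback = [
    "Great service, the agent was really helpful!",
    "Still waiting for my refund, support is slow.",
    "Delivery arrived on Tuesday.",
]
for comment in feedback:
    print(sentiment(comment), "-", comment)
```

Even this crude pass turns free text into something that can sit alongside structured CRM fields, which is exactly where governance questions about meaning and consistency begin.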

Data governance is crucial here, as it ensures that everyone is speaking the same language when it comes to the information's meaning. When you drop customer feedback and sentiment data on top of historical data tied to transactions and promotions, there needs to be a standardized process for interpreting it and distributing it. While many companies have processes already in place to manage high-priority enterprise data, it can be challenging to incorporate new streams of large, unstructured data into those rules and processes. Just consider this estimation from Anne Marie Smith, principal consultant at Alabama Yankee Systems: "I would venture to say that if you took the totality of companies that are engaging in some form of structured data cost governance, not even 1%, maybe one-half of 1% of them, are engaging in any form of unstructured data governance, for a variety of reasons."

Know what you are looking for

One reason that data governance is especially difficult in the omni-channel, unstructured age of the customer is because data governance, at a basic level, requires an understanding of what information and insights you're looking for. Companies shouldn't create high quality data for the sake of creating high quality data, but should have their eyes on a specific business goal. When it comes to customer data, one of the big challenges we've seen is that folks don't know exactly what data is relevant in the first place; they aren't sure what data to listen to and analyze, much less how to consistently work with it to gain meaningful insights that will impact business performance.

The lesson: when it comes to new sources of data, figure out what you want to get from it before you dive in. And then, use industry templates garnered from others' work as a platform to start. Basics like data governance make up the foundation for a common understanding of the customer across your business, as they enable high-quality and advanced analysis for both explicit and implicit information, which is becoming a requirement in order to deliver the increased level of attention being demanded by customers.

Hadoop - is the elephant packing its trunk for a trip into the mainstream?


This is a guest blog by Zubin Dowlaty, head of innovation and development at Mu Sigma.

 

Hadoop, the open-source software platform for distributed big data computing, has been making waves recently. The IPO of Hortonworks in December 2014 contributed to that, and the stock market ambitions of the other two main distributors of Hadoop, Cloudera and MapR, have also been fanning the flames. Getting these big data technology companies trading as public companies will create greater confidence in the technology. The increased funding levels will signal that these technologies are now proven, boosting their uptake.

A Schumpeterian wave of creative destruction is occurring in the analytics technology space right now, triggered by Hadoop. It is quite amazing to witness the speed with which this is occurring. Larger enterprises are now eyeing it up for their corporate infrastructure; the technology is en route to becoming more accessible to business users rather than just data scientists. However, to exploit this opportunity, enterprises need to be willing to adopt a different mindset.

En route to an enterprise-scale solution?

Over the last year, the industry has seen widespread deployment of Hadoop and associated technologies across many verticals. Furthermore, significant momentum has started building in the enterprise segment, with Fortune 500 companies taking Hadoop more seriously as a data-operating platform for enterprise-scale and -grade applications. Companies of this size have the muscle to take the technology from the 'early adopter' to the 'early majority' stage and beyond, creating a network effect: as more - and more significant - companies implement Hadoop, others follow.

From the Hadoop solution perspective, the technology stack using Hadoop 2.0 and YARN is the critical technology component that has enabled Hadoop to become more of a general OS or computing platform for an analytics group, and not just a niche computing tool.

Technologies such as Apache Spark, Impala, Solr, and Storm, plugged into the YARN component model, have accelerated adoption for running real-time queries and computation. Technologies like ParAccel, Hive on Tez, Spark SQL and Apache Drill, from a range of vendors, have been created to support data exploration and discovery applications. SQL on Hadoop is another area which has seen a lot of traction in terms of development.

Spark stands out as it has given the data science community a programming framework for creating algorithms that run more quickly compared to other technologies. It has come a long way towards being considered the new open standard in Hadoop and, with robust developer support, it is expected to become the de facto execution engine for batch processing. Batch MapReduce is slow for computation but great for handling big data. With Spark, data scientists will have fast in-memory capabilities for running algorithms on Hadoop clusters.
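As an illustration of what working with Hadoop-resident data can look like from the data scientist's side, here is a minimal PySpark sketch; the HDFS path and column names are assumptions made purely for the example, not a reference to any particular deployment.

```python
# Minimal PySpark aggregation over data stored in HDFS.
# The path and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("event-counts").getOrCreate()

# Hypothetical clickstream data landed in the data lake as JSON.
events = spark.read.json("hdfs:///data/clickstream/")

daily = (
    events
    .groupBy("event_date", "event_type")
    .agg(F.count("*").alias("events"))
    .orderBy("event_date")
)
daily.show(20)

spark.stop()
```

The point is less the specific query than the in-memory, cluster-wide execution behind it: the same few lines scale from a laptop sample to the full Hadoop cluster.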

Governance and security for Hadoop clusters is still evolving, but these areas have progressed and the main vendors have recognized them as weaknesses, so they can be expected to improve in the short to medium term.

Wringing a lot more ROI for business people

In 2015, apart from scaling their Hadoop initiatives, companies will also be looking for the return on their data and infrastructure investments.

From a technology perspective, YARN will continue to gain momentum as it can support additional execution and programming engines other than MapReduce. Given the flexibility it brings to the table, it will help build more big data applications for better consumption by business users rather than just data scientists.

Analytical applications leveraging concurrent analysis will push analysts to adopt real-time or near real-time computation over the traditional batch mode.

Adoption of scalable technologies in storage, computing and parallelization will increase as more and more machine-generated data becomes available for analysis. Current BI, hardware and analytics-led software architectures are not suitable for scale. They will need to be revisited and carefully thought through. The industry is looking out for standards in this area, and a unified platform that offers an end-to-end solution.

Toolset, skillset, mindset

When it comes to the adoption of advanced technologies such as Hadoop, an organization can acquire toolsets and skillsets over a period of time, but the largest challenge lies in changing the mindset of the enterprise community, which is deeply ingrained.

For example, large organizations are still struggling with the need to shift from central Enterprise Data Warehouse frameworks towards more distributed data management structures. Similarly, deep-seated trust in paid solutions needs to give way to greater adoption of open source models and technologies, which are now very mature.

It is important to move away from the current 1980s technology and application mindset, and truly scale up in order for enterprise end users to reap the full benefits of Big Data insights and make better decisions. A holistic approach bringing math, business and technology together within a 'Man-Machine' ecosystem would be the key to achieving it.

Think scale, think agility, think continuous organizational learning - that is what technologies like Hadoop can make possible.

The Rise of Punk Analytics


This is a guest blog by James Richardson, business analytics strategist at Qlik and former Gartner analyst.

England's dreaming ...

By the mid-1970s rock music was dominated by "prog-rock" and long, complex, concept-laden albums. The music was created using multi-track recording and was very difficult to replicate live without trucks full of equipment and lots of highly-skilled session musicians.

But things changed after 1976; the 'anyone can play guitar' do-it-yourself (DIY) ethic of punk altered everything in rock, stripping it back to its essence and making it simple again. Further, the punk attitude then extended to fashion, to art, to design. This empowered people. We all thought "I can do that." Punk made us fearless. Sure, it was stupid sometimes. But it was joyful, and inclusive.

This transition, from exclusive domination by specialists to inclusive accessibility, is a trend repeated in many fields. Take this blog: through the medium of the internet I can write and publish without needing the help of editors, typesetters, printers, distributors and so on. Anyone with an opinion can share it. It's punk publishing. We're all in control of the presses.

So, what have we seen in BI until very recently? A field dominated by mavens, a small number of technical specialists whose role was predicated on arcane skills, and a large number of business people in their thrall. People who, just like rock fans in the early 70s waiting for the next double album to be released, waited months for a data model to be designed and a report coded that would deliver what they needed. These truly were data priests. Like Rick Wakeman behind a stack of expensive keyboards, this approach stacked costly technology on technology. Even the nomenclature was defiantly and deliberately obscure: "yeah, we need an EDW fed via ETL from an ODS, and then a fringed MOLAP hypercube to enable drilling with a hyperbolic tree UI...". And the business people went "wow, it's really complicated" whilst feeling vaguely shut out of the process of creation and remote from the data. The mavens sought virtuosity and aspired to deliver to a high concept - a set of clear user requirements - the whole of which they could deliver in one 'great' work, no matter how long it took. But business decision makers got bored of this, bored of waiting, bored of complexity - it wasn't helping them - and looked for an alternative way, a do-it-yourself way.

So, by now you're likely anticipating where I'm going with this train of thought. In the last few years we've entered the era of punk-style analytics. With the rise of new technologies that circumvent much of the need for mavens anyone can play data nowadays. This new approach displays characteristics shared with punk:

  • No barriers. You can download data discovery products for free, and get started with nothing to stop you except access to the data you want to explore.  There's no need to wait for someone else to provide you with the means to get started.
  • Mistakes are part of the process. Jamming with data is very often a trigger to finding insights. We get better through trying stuff out.  Both in terms of our use of an analytic software product and our familiarity with the data.
  • Fast is good. Think of a Ramones song. Fast and to the point. Fact is that business questions come thick and fast, and being able to riff through data at speed often works best. Many of the questions we want to analyse and answer are transient, and the visualizations and apps are throwaway. Use and discard.
  • Perfection and polish are not the aim. If it's perfect it has likely been manipulated to adhere to an agenda or to push a conclusion. The idea should not be to create flawless visualizations (think infographics) but a more transparent, less processed route to data that can flex.
  • Engagement with issues of the moment. Punk songs are about the world as it is right now. Data discovery prompts engaged debate too. Questioning orthodoxies about how we measure and evaluate the subject being analysed. It does that because the framework used is a starting point for active exploration, not an endpoint for passive consumption.
  • The collective experience is valuable in itself. No solos thanks! While self-expression and creativity are important, they're secondary to the collaborative act of working and playing together with data to achieve a common result, as this in turn prompts action.

Further, and despite the marketing messages, not all new analytics has a punk ethos. Some approaches are just building a new wave of mavens - the new visualization gurus, often yesterday's Excel gurus, still revelling in their virtuosity. Sitting alone in their bedrooms (sorry, cubes), these specialists create beautifully crafted songs (visuals) - which are just so - and then distribute them as perfect tapes (dashboards) to people with cassette players (Reader software) to listen to (look at) and be impressed by. Not punk: perfectly polished, self-publicizing, one-to-many, maven-created artefacts. It is exclusive rather than collective in its approach; it's not about engaging as many people in playing with data as possible. What's the aim of creating 'just so' visualizations? Who benefits?

The real work happens when more people can explore data and learn through play together.

When they pick up their data and play.

Fast and loud and loose.

Bobbies beat the self-service BI conundrum


This is a guest blog by Michael Corcoran, SVP & CMO at Information Builders

Having recently attended the Gartner BI and Master Data Management Summits in London, it is clear that now is an exciting yet confusing time for data discovery and analytics. Of course, they have always been exciting areas for analysts, but now the industry as a whole is turning to self-service business intelligence (BI) to deliver critical information and data analytics to a much wider audience. This can only be done, however, by "operationalising" insights to bring together employees, the supply chain, and customers.

What do I mean by this? The issue at the moment is the market need to deliver self-service information and analytics to a broader audience. Businesses need to look at simpler ways of doing this than using complex dashboard tools. Visualisation and data discovery tools are, at their heart, still an analyst's best friend but leave the average employee scratching their heads.

Businesses need to stop deterring staff from using BI and analytics, and instead offer ease of use and high functionality. This requires an app-based approach to easily and quickly view corporate data, as a next step in truly bringing big data to the masses. The average person does not have the time or inclination for formal training, and would much rather download an app that delivers analysis directly to their mobile device. Advanced analytics tools provide a mechanism for analysts to build sophisticated predictive and statistical models, but the ultimate value will come when we embed these models and their outcomes into consumable apps for operational decision-making.

Law enforcement is a great example of where self-service apps make a big impact, with analytics available at the tap of a mobile device to help the police force to work more efficiently. The amount of available data on crime grows day by day, and harnessing this to gain useful insights is an extremely powerful tool. Significant value can be derived from historical crime data, which helps predict and prevent crimes based on variables. It's not about individuals, but more about populations and environmental factors; weather, traffic, events, seasons, and so on. It sounds a bit like sci-fi, but it's actually very accurate. Think of it this way - how much crime would you expect at a London football derby, which happens twice yearly? Or in the rough part of town on payday? Or at a packed annual summer festival on a particularly humid day? By offering data on how likely it is for a crime to happen, these insights can help with prevention and more importantly help police forces accurately plan resourcing for such variables and events.
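As a purely illustrative sketch of the kind of model described (not any police force's actual system), the snippet below fits a simple logistic regression on invented environmental features to estimate the likelihood of a high-incident shift.

```python
# Illustrative incident-likelihood model on invented environmental features.
# Data, features and labels are entirely hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per historical shift: [derby_match, payday, festival, humidity]
X = np.array([
    [1, 0, 0, 0.40],
    [0, 1, 0, 0.55],
    [0, 0, 1, 0.80],
    [0, 0, 0, 0.35],
    [1, 1, 0, 0.60],
    [0, 0, 0, 0.50],
])
y = np.array([1, 1, 1, 0, 1, 0])  # 1 = high-incident shift in the historical record

model = LogisticRegression().fit(X, y)

upcoming_shift = np.array([[1, 0, 0, 0.70]])  # derby day on a humid evening
print("estimated incident probability:", model.predict_proba(upcoming_shift)[0, 1])
```

The probability itself is only useful once it reaches the officer or planner, which is where the app layer comes in.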

Self-service apps can make this predictive model easily accessible to 'bobbies on the beat'. Even a cop on their first day can access the same level of insightful knowledge as a veteran officer through their mobile device to make smarter decisions in real time. An app to find vehicle licence plate numbers, for example, is just one way to speed up police procedures, saving police time and resources and ultimately making them more efficient.  

However, this isn't the only place where an analytic app could add value. Real time data in an easy-to-use format will have a massive impact on all customer service professions. Providing customer-facing staff with access to key data about customers allows them to deliver a more personalised service to customers. Staff would be able to better understand complaints as they'd be able to quickly access their previous experiences or purchase history, in real time.

An added benefit of using an app-based approach is that you can gather data from many different sources and combine it. For example, you can combine various types of enterprise data with other data available in public and private clouds, such as weather services, to pull in variables. This comprehensive combination provides an accurate, collective view which delivers self-service for daily and operational decisions - in real time. This approach is the future of self-service for the masses, via an easy-to-consume app for the hungry user.
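A minimal sketch of that combination step, assuming a hypothetical internal sales extract and an external weather feed already downloaded as CSV files:

```python
# Join internal transaction data with an external weather feed by date and city.
# File names and column names are hypothetical assumptions.
import pandas as pd

sales = pd.read_csv("daily_sales.csv")      # assumed columns: date, city, revenue
weather = pd.read_csv("weather_feed.csv")   # assumed columns: date, city, temp_c, rainfall_mm

combined = sales.merge(weather, on=["date", "city"], how="left")

# One collective view: revenue alongside the conditions that may help explain it.
print(combined.groupby("city")[["revenue", "rainfall_mm"]].mean())
```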

Open data and the horizon of the next government


On the first official day of the general election campaign, it is opportune to pose the question of what the future of open data might be.

The good news is that, in the UK at least, the future of the movement to open up government and private sector data - to benefit the common good and create opportunities for the launch of new businesses - would seem to be bright.

At a recent ODI [Open Data Institute] Connect event, held at the Department for Business, Innovation and Skills, representatives of the two main Westminster parties expressed support for open data.

Chi Onwurah, the Labour MP for Newcastle Central and Shadow Cabinet Office Minister for Digital Government, said that it is great that the UK is leading the open data movement, according to the Open Data Barometer. She also said to beware "blockers" to the fulfilment of open data as a public good. "It needs to be democratised and understood. Just publishing data can be a screen for unaccountability. It should not be a way of avoiding the need for Freedom of Information requests. Open data needs to be in the right format, open and standard. And it needs to be shown to be delivering something. For example the open spending data is great but it needs to be in a context where citizens can see performance".

She also called for the principle of people owning their own data, and for the application of appropriate ethical standards.

From the other side of the house - though not literally - Conservative peer Ralph Lucas, who sits on the Lords Digital Skills Select Committee, recalled how he learned the value of open data at the Department of Agriculture in 1996, at the time of the BSE crisis. "We had great scientists and great data, but for four years we had failed to understand what was happening with BSE. We threw the data open to the scientific community, and within three weeks we had an answer".

But he bewailed what he called "data huggers", giving the example of UCAS. "In the UK 25,000 kids drop out of university each year. Another 100,000 say they have made the wrong choice of course. UCAS has vast amounts of data that could help with that problem. But they won't release it. 250 students making a better choice each year would equal what UCAS makes from selling its data. That is a plum opportunity for the UK to make millions of pounds.

"Keep pushing government", he said.

Professor Nigel Shadbolt, the Chairman and Co-Founder of the Open Data Institute, who chaired the event, commented that open data is indeed a relatively non-partisan topic. In his summing up he said: "open data creates different kinds of value which are not mutually exclusive, from economic to social to environmental. If we are really going to guarantee high-quality supply the best way is to generate sufficient demand for that data. One of the things that the ODI cares hugely about is building that demand side whether it's the government itself or a vibrant commercial component.

"The idea of rebalancing the asymmetry of ownership between business and consumers, government and citizens is really fundamental. To get trust back when data breaches occur you need to empower people with responsibilities and rights as well as being simply benign receivers of data".

At the beginning of the event, Gavin Starks, the CEO of the ODI, recounted some of the progress the institute has made in its first two years, including its incubation of 18 start-up businesses.

It was Starks who signalled the UK's number one position on the Open Data Barometer, ahead of the US and Sweden.

"This is not just about open data changing the nature of business. It's about reflecting a cultural shift to an open society", he said.

Opening up government data does seem to be an area of violent agreement between the blue and the red sides of politics. But Onwurah and Lucas, at this ODI event, did register some shadows: publishing open data as an anti-FoI screen, and data hugging.

Beyond the BI Tipping Point

| No Comments
| More

This is a guest blog by James Richardson, business analytics strategist at Qlik, and former Gartner analyst

So what happens now that data discovery is the "new normal"?* Well, I'm no prognosticator (it was the task I always found strangest when I was an industry analyst), but we can make some reasoned guesses by extrapolating current trends.

Information activism will continue to grow

The fact is that we now live in a world shaped by data, at both a personal and a professional level, and people express themselves through what they do with it. Inside each organisation, users want to be actively engaged with their data; however, few of them have had the technology to do so effectively. In 2015, with true self-service BI, more and more people will move from passively consuming data to actively using it.

Data from external sources will be increasingly used to provide much needed context

Organisations invest in BI to help them 'make better decisions', but most BI systems contain only internal data. As a result, the decisions made on the back of BI lack external context and are unlikely to be optimal. Analysts will increasingly add that missing context through curated, normalised data from external sources. Internal-only data myopia is no longer going to be acceptable.

Data discovery usage will get smarter

The shift in BI requirements from reporting-centric to analysis-centric will continue. Data discovery will begin to make more use of some types of advanced analytics for prediction and optimisation, and will deliver those analyses in ways more people can use, through richer visualisation types and more sophisticated data navigation.

Governed data discovery becomes an essential

Self-service doesn't mean there are no rules. The spread of data discovery demands that organisations (and especially their IT management) ensure good governance of how it is used: governance that gives them certainty that self-service analysis is carried out within a managed framework.

Interactive data storytelling will trigger decisions

The most insightful analysis in the world is useless unless it's communicated to those who are in a position to act on its findings. Telling the story of the analysis is the best way to do that. Data storytelling will become central to how we work, helping to create compelling narratives to convince team members and executives to take action. BI is no longer going to be about collating reports; it's becoming much more about interactive decision-making. As such, static stories are no good, as they lead to unanswered questions. The option to dive into the data from the story to answer questions in real time is what's needed. "Let's take it offline" is an anachronism. Tell the story and debate with data to reach an outcome.

These are just five threads that we see developing. What others do you see?

 

*Gartner, "Magic Quadrant for Business Intelligence and Analytics Platforms", by Rita L. Sallam, Bill Hostmann, Kurt Schlegel, Joao Tapadinhas, Josh Parenteau, Thomas W. Oestreich, February 23, 2015.

Avoiding the potholes on the way to cost effective MDM

| No Comments
| More

This is a guest blog by Mark Balkenende, a manager at Talend

 

Master data management is one of those practices that everyone in business applauds. But anyone who has been exposed to the process realises that MDM often comes with a price. Too often what began as a seemingly well thought out and adequately funded project begins accumulating unexpected costs and missing important milestones.

First, we need to know what we're talking about. One of the best definitions of an MDM project I've heard is from Jim Walker, a former Talend director of Global Marketing and the man responsible for our MDM Enterprise Edition launch. Jim describes MDM as, "The practice of cleansing, rationalising and integrating data across systems into a 'system of record' for core business activities."

While working with other organisations, I have personally observed many MDM projects go off the rails. Some of the challenges are vendor-driven. For example, customers often face huge initial costs just to begin requirements definition and project development, and they can spend millions of dollars upfront on MDM licences and services. Even before the system is live, upgrades and licence renewals add millions more to the programme cost without any value being returned to the customer. Other upfront costs may be incurred when vendors add various tools to the mix: the addition of data quality, data integration and SOA tools, for example, can triple or quadruple the price.

Because it is typically so expensive to get an MDM project underway, customer project teams are under extreme pressure to realise as much value as they can, as quickly as possible. But they soon realise that the relevant data is either stored in hard-to-access silos or is of poor quality: inaccurate, out of date and riddled with duplication. This means revised schedules and, once again, higher costs.

Starting with Consolidation

To get around some of these problems, some experts advise starting small with the "MDM consolidation" method. This approach consists of pulling data into an MDM Hub (the system's repository) and performing cleansing and rationalisation there; because the source systems are left untouched, consolidation has little impact on them.

While consolidation is also a good way to begin learning critical information about your data, including data quality issues and duplication levels, the downside is that these findings can trigger several months of refactoring and rebuilding of the MDM Hub. That is a highly expensive proposition, involving a team of systems integrators and multiple software vendors.

In order to realise a rapid return on MDM investment, project teams often skip the consolidation phase and go directly to a co-existence type of MDM. This approach includes consolidation and adds synchronisation with external systems to the mix. Typically, data creation and maintenance will co-exist in both the MDM system and the various data sources. Unfortunately, this solution introduces difficult governance issues regarding data ownership, as well as data integration challenges such as implementing a service-oriented architecture (SOA) or data services.

There are other types of MDM, each with its own set of problems. The upshot is that the company implementing an MDM system winds up buying additional software and undertaking supplementary development and testing, incurring more expense.

An alternative approach

Rather than become entangled in the cost and time crunches described above, you should look for vendors whose solutions let you get underway gradually and with a minimum of upfront cost.

In fact, part of the solution can include open source tools that allow you to build data models, extract data and conduct match analysis while building the business requirements and the preliminary MDM design, all at a fraction of the resource costs associated with more traditional approaches.

Then, with the preliminary work in place, this alternative approach provides the tools needed to scale up your user base. It is efficient enough to let you do the heavy development work necessary to create a production version of your MDM system without breaking the bank.

Once in an operational state, you can scale up or down depending on your changing MDM requirements. And, when the major development phase is over, you can ramp down to a core administrative group, significantly reducing the cost of the application over time.

You should look for vendors offering pricing for this model based on the number of developers - a far more economical and predictable approach when compared to other systems that use a pricing algorithm based on the size of data or the number of nodes involved.

This approach to MDM deployment is particularly effective when combined with other open source tools that form the foundation of a comprehensive big data management solution. These include big data integration, quality, manipulation, and governance and administration.

By following this path to affordable, effective MDM that works within a larger big data management framework, you will have implemented a flexible architecture that grows along with your organisation's needs.

Last month the UK government's Competition and Markets Authority (CMA) issued a 'call for information' on the 'commercial use of consumer data'.

While data governance professionals have been taking the measure of the ethics of using data garnered from the web for quite some time, the issue is gaining a higher public profile, with healthcare data an especially sensitive topic.

The CMA's call for information takes as its starting point the recent increased sophistication of data capture: "The last decade has seen rapid growth in the volume, variety and commercial applications of consumer data, driven by technological developments which enable data to be collected and analysed in increasingly rapid and sophisticated ways. Data exchange is likely to become even more important as mobile internet devices and smart devices become more prevalent".

Just because you can ...


Just because you can do something does not mean that you should do it. That's been a theme of Computer Weekly's coverage of Deloitte's research on data matters in the past few years, particularly its annual Data Nation report, which is produced under the direction of Harvey Lewis, and surveys around 2,000 UK consumers.


The first of the Deloitte reports, in July 2012, found that companies and public sector organisations in the UK needed to tread warily when performing customer data analytics. But, Deloitte added, there were opportunities for those who educate their constituencies and are clear about what customer data is used for.


A year later, in 2013, Lewis's colleague Peter Gooch, privacy practice leader at Deloitte, said of the second Data Nation report that their survey showed people were: "More aware that something is happening with their data, but they don't know what that is and there is increased nervousness.


"There is no real sign of a tipping point, where people see their own data as an asset that can be exploited. Consumers recognise their data as an asset to the extent that they want to protect it, but not to the extent of exploiting it.


"This almost lines up with the path that organisations have followed, going from protection to exploitation, from information security to analytics. Consumers might follow the same journey, but it will happen in pockets".


The 2014 report, interestingly, found that the NHS was much more trusted than the private sector with personal data: 60% of the 2,025 respondents were most trusting of public healthcare providers, and 51% of other public sector organisations. By contrast, 31% trusted social media companies with their data, 34% trusted telephone companies and internet service providers, and 39% were "least concerned" about banks and credit card companies having their personal data.


Does this comparative trust in the NHS offer a platform on which better health outcomes can be built through clinical research on open data, or is it a fragile element that could be squandered? The Nuffield Council on Bioethics has published a report critical of the Department of Health's care.data programme, which aims to transfer the medical records of all NHS patients in England from GP surgeries to a central database, under an opt-out rather than an opt-in model.


The report, 'Public participation should be at the heart of big data projects', makes these arresting points:

"as data sets are increasingly linked or re-used in different contexts to generate new information, it becomes increasingly difficult to prevent the re-identification of individuals. On its own, consent cannot protect individuals from the potentially harmful consequences of data misuse, nor does it ensure that all their interests are protected. Therefore, good governance is essential to ensure that systems are designed to meet people's reasonable expectations about how their data will be used, including their expectations about a sufficient level of protection".

The report also cites Professor Michael Parker, of the University of Oxford: "Compliance with the law is not enough to guarantee that a particular use of data is morally acceptable - clearly not everything that can be done should be done. Whilst there can be no one-size-fits-all solution, people should have a say in how their data are used, by whom and for what purposes, so that the terms of any project respect the preferences and expectations of all involved."

Data governance in the era of big data is a troublesome business. I'm chairing a panel on 'Data governance as corporate IT is remade by cloud, mobile, social and big data' at the Data Governance Conference Europe in London on 20 May, which will explore some of these issues.

Meanwhile, the CMA call for information deadline is 6 May.

What do businesses really look for in open data?

| No Comments
| More

This is a guest blog by Harvey Lewis, Deloitte

 

"The value of an idea lies in the using of it." Thomas A. Edison, American Inventor.

 

In 2015, the UK's primary open data portal, www.data.gov.uk, will be six years old. The portal hosts approximately 20,000 official data sets from central government departments and their agencies, local authorities and other public sector bodies across the country. Just over half of these data sets are available as open data under the Open Government Licence (OGL). Data.gov.uk forms part of an international network of over three hundred open data efforts that have seen not just thousands but millions of data sets worldwide becoming freely available for personal or commercial use. [See http://datacatalogs.org and www.quandle.com].

Reading the latest studies that highlight the global economic potential of open data, such as that sponsored by the Omidyar Network, you get a sense that a critical mass has finally been achieved and the use of open data is set for explosive growth.

These data sets include the traditional 'workhorses', like the census data published by the Office for National Statistics, which provides essential demographic information to policy makers, planners and businesses. There are also many more obscure data sets, such as one covering the exposure of burrowing mammals to radon (Rn-222) in north-west England, published by the Centre for Ecology and Hydrology.

Although I'm not ruling out the possibility there may yet be a business in treating rabbits affected by radiation poisoning, simply publishing open data does not guarantee that a business will use it. This is particularly true in large organisations that struggle to maximise use of their own data, let alone be aware of the Government's broader open data agenda. The Government's efforts to stimulate greater business use of open data can actually be damaged by a well-intentioned but poorly targeted approach to opening up public sector information - an approach that may also leave more difficult-to-publish but still commercially and economically important data sets closed.

But is business use predicated on whether these data sets are open or not? And what is the impact on economic success?

Businesses would obviously prefer external data to be published under a genuinely open licence, such as the OGL, under which the data is free for commercial use with no restrictions other than the requirement to share alike or to attribute the data to the publisher. However, if businesses are building new products or services, or relying on the data to inform their strategy, a number of characteristics beyond openness become critical in determining success (a rough automated check against some of these is sketched after the list):

·         Provenance - what is the source of the data and how was it collected? Is it authoritative?

·         Completeness and accuracy - are the examples and features of the data present and correct, and, if not, is the quality understood and documented?

·         Consistency - is the data published in a consistent, easy-to-access format and are any changes documented?

·         Timeliness - is the data available when it is needed for the time periods needed?

·         Richness - does the data contain a level of detail sufficient to answer our questions?

·         Guarantees of availability - will the data continue to be made available in the future?
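
As a rough illustration of how a business might automate part of this assessment, here is a minimal Python sketch that checks a downloaded CSV for completeness, consistency of its date values and timeliness. The column name and staleness threshold are assumptions for illustration only, not part of any formal certification scheme.

```python
import csv
from datetime import datetime, timedelta

def assess_open_dataset(path, date_column="date", max_age_days=90):
    """Rough checks against a few of the characteristics listed above:
    completeness (missing values), consistency (parseable dates) and
    timeliness (how recent the latest record is)."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))

    total = len(rows)
    missing = sum(1 for r in rows for v in r.values() if v in ("", None))

    dates = []
    for r in rows:
        try:
            dates.append(datetime.strptime(r[date_column], "%Y-%m-%d"))
        except (KeyError, TypeError, ValueError):
            pass  # counted implicitly as an inconsistency below

    return {
        "rows": total,
        "missing_cells": missing,
        "parseable_dates": f"{len(dates)}/{total}",
        "latest_record": max(dates).date().isoformat() if dates else None,
        "stale": bool(dates) and datetime.now() - max(dates) > timedelta(days=max_age_days),
    }

# Example (hypothetical file name):
# print(assess_open_dataset("local_authority_spending.csv"))
```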

If these characteristics cannot be guaranteed in open data, or are unavailable except under a commercial licence, then many businesses would prefer to pay to get them. While some public sector bodies - particularly the Trading Funds - have, over the years, established strong connections with business users of their data and understand their needs implicitly, the Open Data Institute is the first to cement these characteristics into a formal certification scheme for publishers of open data.

A campaign is needed to get publishers to adopt these certificates and to recognise that, economically at least, they are as important as Sir Tim Berners-Lee's five-star scale for linked open data.  For example, although spending data may achieve a three- or even a four-star rating in the UK, not all central government departments publish in a timely manner, in a consistent format or at the same level of richness, and some local authority spending data is missing completely. These kinds of deficiencies, which are shared by many other open data sets, are inhibiting innovation and business take-up, yet are not necessarily penalised by the current set of performance indicators used to measure success.  

It's time for open data to step up. If it is to be taken seriously by businesses then the same standards they expect to see in commercially licensed data need to be exhibited in open data - and especially in the data sets that form part of the 'core reference layer' used to connect different data sets together.

Publishing is just the first and, arguably, the easiest step in the process. The public sector's challenge is to engage with businesses to improve awareness of open data, to understand business needs and harness every company's constructive comments to improve the data iteratively. We may have proven that sunlight is the best disinfectant for public sector information, but understanding and working with business users of open data is the best way of producing a pure and usable source in the first place. 

 

Harvey Lewis is the research director for Deloitte Analytics and a member of the Public Sector Transparency Board.

Data quality everywhere

| No Comments
| More

This is a guest blog by Jean Michel Franco, Talend

Data quality follows the same principles as other well-defined, quality-related processes. It is all about creating an improvement cycle to define and detect, measure, analyse, improve and control.

This should be an ongoing effort - not just a one-off. Think about big ERP, CRM or IT consolidation projects where data quality is a top priority during the roll out, and then attention fades away once the project is delivered.

A car manufacturer, for example, makes many quality checks across its manufacturing and supply chain and needs to identify the problems and root causes in the processes as early as possible. It is costly to recall a vehicle at the end of the chain, once the product has been shipped - as Toyota experienced recently when it recalled six million vehicles at an estimated cost of $600 million.

Quality should be a moving picture too. While working through the quality cycle, there is the opportunity to move upstream in the process. Take the example of General Electric, known for years as best-in-class for putting quality methodologies such as Six Sigma at the heart of its business strategy. Now it is pioneering the use of big data for the maintenance process in manufacturing. Through this initiative, it has moved beyond detecting quality defects as they happen. It is now able to predict them and do the maintenance needed in order to avoid them.

What has been experienced in the physical world of manufacturing applies in the digital world of information management as well. This means positioning data quality controls and corrections everywhere in the information supply chain. And I see six usage scenarios for this.

Six data quality scenarios

The first is applying quality when data needs to be repurposed. This scenario is not new; it was the first principle of data quality in IT systems, and most companies adopted it in the context of their business intelligence initiatives. It consolidates data from multiple sources, typically operational systems, and gets it ready for analysis. To support this scenario, data quality tools can be provided as stand-alone tools with their own data marts or, even better, tightly bundled with data integration tools.

A similar usage scenario, but "on steroids", happens in the context of big data. Here, the role of data quality is to add a fourth V, for Veracity, to the well-known three Vs defining big data: Volume, Variety and Velocity. Managing extreme Volumes mandates new approaches to processing data quality; controls have to move to where the data is, rather than the other way around. Technically speaking, this means that data quality should run natively on big data environments such as Hadoop, leveraging their distributed processing capabilities, rather than operating on top as a separate processing engine. Variety is also an important consideration. Data may come in different forms such as files, logs, databases, documents, or data interchange formats such as XML or JSON messages. Data quality then needs to turn the "oddly" structured data often seen in big data environments into something more structured that can be connected to the traditional enterprise business objects, like customers, products, employees and organisations. Data quality solutions should therefore provide strong capabilities in profiling, parsing, standardisation and entity resolution. These capabilities can be applied before the data is stored and designed by IT professionals, which is the traditional way to deal with data quality. Alternatively, data preparation can be done on an ad-hoc basis at run time by data scientists or business users, which is sometimes referred to as data wrangling or data blending.
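
To make the parsing and standardisation step concrete, here is a minimal, single-machine Python sketch that turns semi-structured records into clean, structured rows and separates the rejects. In a real big data environment the same logic would be pushed down into Hadoop or Spark as a distributed job, as described above, and the field names here are illustrative assumptions.

```python
import json
import re

# Illustrative raw events as they might arrive in a landing zone.
RAW_EVENTS = [
    '{"user": " Alice SMITH ", "email": "ALICE@EXAMPLE.COM", "amount": "12,50"}',
    '{"user": "bob jones", "email": "bob(at)example.com", "amount": "8.00"}',
    'not even json',
]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def standardise(raw):
    """Parse one raw record and map it onto a structured customer event.
    Returns (record, None) on success or (None, reason) for the reject pile."""
    try:
        obj = json.loads(raw)
    except ValueError:
        return None, "unparseable"

    name = " ".join(obj.get("user", "").split()).title()   # trim and normalise case
    email = obj.get("email", "").strip().lower()
    if not EMAIL_RE.match(email):
        return None, "invalid email"

    # Normalise decimal commas so the amount can be treated numerically.
    amount = float(obj.get("amount", "0").replace(",", "."))
    return {"name": name, "email": email, "amount": amount}, None

clean, rejects = [], []
for raw in RAW_EVENTS:
    record, reason = standardise(raw)
    if record is not None:
        clean.append(record)
    else:
        rejects.append((raw, reason))

print(clean)
print(rejects)
```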

The third usage scenario lies in the ability to create data quality services, which allow data quality controls to be applied on demand. An example could be a website with a web form to capture customer contact information. Instead of letting a visitor type whatever they want into the form, a data quality service could apply checks in real time, verifying information such as the email address, postal address, company name and phone number. Even better, it can automatically identify the customer without requiring them to explicitly log on or provide contact information, as social networks, best-in-class websites and mobile applications such as Amazon.com already do.
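
As a rough sketch of the checks that might sit behind such a data quality service, the following Python function validates a submitted contact form field by field. In practice this would be exposed as a real-time endpoint; the field names and patterns are deliberately simplified assumptions, not a production-grade validator.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
UK_PHONE_RE = re.compile(r"^(\+44|0)\d{9,10}$")  # deliberately simplified

def check_contact_form(form):
    """Validate a submitted contact form and return a dict of
    field -> error message; an empty dict means the data may enter
    the system."""
    errors = {}
    if not EMAIL_RE.match(form.get("email", "")):
        errors["email"] = "does not look like a valid email address"
    phone = re.sub(r"[\s\-()]", "", form.get("phone", ""))
    if not UK_PHONE_RE.match(phone):
        errors["phone"] = "does not look like a valid UK phone number"
    if not form.get("company", "").strip():
        errors["company"] = "company name is required"
    return errors

# Example call, as a web endpoint might use it before accepting a submission:
print(check_contact_form({
    "email": "jane@example.com",
    "phone": "+44 20 7946 0000",
    "company": "Acme Ltd",
}))
```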

Going back to the automotive example, this scenario provides a way to cut the costs of data quality, because controls can be applied at the earliest steps of the information chain, even before erroneous data enters the system. Marketing managers may be the best people to understand the value of such a scenario; they struggle with the poor quality of the contact data they get through the internet. Once it has entered the marketing database, poor-quality data becomes very costly and badly impacts key activities such as segmentation, targeting and calculating customer value. Of course, the data can be cleansed at later stages, but this requires significant effort and the related cost is much higher.

Then there is quality for data in motion. This applies to data that flows from one application to another; for example, an order that goes from sales to finance and then to logistics. As explained in the third usage scenario, it is best practice for each system to implement gatekeepers at the point of entry, in order to reject data that doesn't match its data quality standards. Data quality then needs to be applied in real time, under the control of an enterprise service bus. This fourth scenario can happen inside the enterprise, behind its firewall. Alternatively, data quality may also run in the cloud, and this is the fifth scenario.

The last scenario is data quality for Master Data Management (MDM). In this context, data is standardised into a golden record, and the MDM system acts as a single point of control: applications and business users share a common view of the data related to entities such as customers, employees, products and the chart of accounts. Data quality then needs to be fully embedded in the master data environment and to provide deep capabilities in matching and entity resolution.
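
The following is a minimal Python sketch of the matching and survivorship idea behind a golden record: records that share an email address or have very similar names are merged, keeping the most complete value per field. The similarity threshold and merge rule are illustrative assumptions; a real MDM platform does far more.

```python
from difflib import SequenceMatcher

# Illustrative customer records from two source systems.
customers = [
    {"source": "CRM",     "name": "Jon Smith",  "email": "jon.smith@example.com", "phone": ""},
    {"source": "Billing", "name": "John Smith", "email": "jon.smith@example.com", "phone": "07700900123"},
    {"source": "CRM",     "name": "Ann Lee",    "email": "ann.lee@example.com",   "phone": ""},
]

def similar(a, b, threshold=0.85):
    """Crude fuzzy name match; real entity resolution uses richer rules."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def build_golden_records(records):
    """Cluster records that look like the same entity and merge each cluster
    into a single golden record by filling empty fields from later records."""
    golden = []
    for rec in records:
        match = next(
            (g for g in golden
             if g["email"] == rec["email"] or similar(g["name"], rec["name"])),
            None,
        )
        if match is None:
            golden.append({k: v for k, v in rec.items() if k != "source"})
        else:
            for field in ("name", "email", "phone"):
                if not match[field] and rec[field]:
                    match[field] = rec[field]
    return golden

print(build_golden_records(customers))
```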

Designing data quality solutions so that they can run across these scenarios is a driver for my company. Because our unified platform generates code that can run everywhere, our data quality processing can run in any context, which we believe is a key differentiator. Data quality is delivered as a core component in all our platforms; it can be embedded into a data integration process, deployed natively in Hadoop as a MapReduce job, and exposed as a data quality service to any application that needs to consume it in real time.

Even more importantly, data quality controls can move up the information chain over time. Think about customer data that is initially quality-proofed in the context of a data warehouse through data integration capabilities. Later, through MDM, this unified customer data can be shared across applications. In this context, data stewards can learn more about the data and be alerted when records are erroneous, which helps them identify the root cause of bad data quality, for example a web form that brings junk emails into the customer database. Data quality services can then come to the rescue to prevent erroneous inputs on the web form, and to reconcile the entered data with the MDM system through real-time matching. Finally, big data could provide an innovative approach to identity resolution, so that the customer can be automatically recognised by a cookie after they opt in, making the web form redundant.

Such a process doesn't happen overnight. Continuous improvement is the target.

The rise of the Chief Data Officer

| No Comments
| More

This is a guest blog by Karthik Krishnamurthy, Global Business Leader for Enterprise Information Management, Cognizant

 

While there is a huge amount written about data scientists, much less has been said about the role of the Chief Data Officer (CDO).  However, the value of this individual to any business must not be underestimated. In fact, Wired describes the emergence of the Chief Data Officer as "a transformational change that elevates the importance of data to the top of the organisation".

In the last couple of years, businesses have recognised the role of data, and many have identified data as part of their core business strategy. Businesses are also acknowledging that data in the business setting is separate from the systems running it. There is now an understanding that data is hugely valuable; if harnessed and analysed properly, it can make businesses run better, by realising cost efficiencies, and run differently, by bringing innovative products and services to market. Insight from data gives a better understanding of customer preferences, helping organisations develop new commercial models, deliver tangible business value and remain competitive. This evolution has created demand for new business roles, the most prominent of which is the CDO.

The CDO in financial services

The role of the CDO first emerged as a valid and valuable role in the financial services industry to deal with the extreme pressure that arose from the financial crisis and rapidly evolving regulations. While a large part of the CDO's immediate focus was around helping banks to manage and orchestrate their risk response, the focus then shifted to identifying data-driven revenue opportunities through micro-personalization and marketing triaging. As a result, the CDO's focus is now on building integrated data ecosystems that bring in social cluster data to identify unusual patterns in transactional behaviour, flagging them to prevent loss and fraud. Interestingly, this is not something that is traditionally part of financial services per se, but is increasingly central to financial businesses.

The CDO plays a pivotal role in helping financial companies stay ahead by managing risk and remaining compliant more efficiently.

The CDO in retail

Retail, which has witnessed a huge change in the way its global value chains work, is another industry where CDOs are bringing real business value. By harnessing customer data, retailers can offer targeted products and services and improve customer satisfaction levels significantly. What data analysis has revealed for retailers is that shoppers have fewer issues with the cost of products and are more concerned with the overall retail experience. Sentiment analysis can detect subtle nuances in the relationship between customer and retailer. Focusing on the tone of a customer's voice, both face-to-face and when liaising with them over other touch points such as the phone, social media and forums, can help retailers detect the true feelings of their customers.

Other industries are rapidly catching up: many new technology companies are driven by their ability to collect vast amounts of data and their desire to monetize that data or utilize it in product design and services delivered to customers. Sectors such as telecommunications, energy & utilities, pharmaceuticals, and automotive manufacturing have all identified the value of the data and are creating business leaders responsible for data.

Data management now sits at the C-suite level, emphasising the value the role of the CDO brings to organisations.

CDO traits

Here are Cognizant's insights into the traits that make up the ideal CDO:

·         Has a deep knowledge of data and ability to identify this as a corporate asset

·         Has strong business acumen and ability to identify business opportunities through new information products

·         Provides vision, strategy and insight for all data-related initiatives within the organization

·         Takes ownership and accountability of data and its governance within the organization

·         Has a passion for, and interest in, technology

Preparation needs to start now for imminent European data protection changes

| No Comments
| More

This is a guest blog by Mike Davis, author of a report published by AIIM.

 

The forthcoming European General Data Protection Regulation (GDPR) changes signal a major opportunity for cloud providers to deliver EU-wide services under a single operations model.

'Making sense of European Data Protection Regulations as they relate to the storage and management of content in the Cloud' is an AIIM report that details the changes the IT industry will need to make in response to imminent pan-European data protection changes.

These are changes that will affect anyone interested in hosting content in the cloud, be they service provider or end user.

The study examines the forthcoming GDPR, which is set to inaugurate major change in how customer data regarding EU citizens is stored and how organisations must respond if a data breach occurs.

The change, effectively the creation of a single European data law, will mean organisations could incur fines of up to €100 million if found guilty of a 'negligent breach' of privacy or loss of data.

That is a serious threat. However, GDPR also presents a number of opportunities and could clarify a lot of issues, as well as offer prospects for long-term planning by cloud specialists.

Aim and scope

The purpose of the GDPR is to provide a single law for data protection to cover the whole of the EU, instead of the present Directive that has ended up being implemented differently in each member state.

The GDPR will also see the establishment of a European Data Protection Board to oversee the administration of the Regulation, a move Brussels is confident will make it easier for European and non-European companies to comply with data protection requirements.

The GDPR also covers organisations operating in Europe irrespective of where data is physically stored. The new regulation is a major opportunity for cloud providers to deliver EU-wide services under a single operations model; meanwhile it also means US based cloud firms need to demonstrate compliance with Europe's new privacy operating model.

A broader definition of 'personal' data

In addition to a common approach to privacy, the GDPR covers privacy for cloud computing and social media, extending the definition of personal data to include email addresses, the IP addresses of computers, and posts on social media sites.

That extension has implications for cloud-delivered services that both users and cloud firms need to be aware of.
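
As a rough illustration of what the broader definition means in practice, here is a minimal Python sketch that flags email and IP addresses in stored content so it can be reviewed as personal data. The patterns are simplified assumptions; real personal-data discovery (names, postal addresses, identifiers in free text) is considerably harder.

```python
import re

# Simplified patterns for two of the newly covered categories of personal data.
PATTERNS = {
    "email": re.compile(r"\b[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def flag_personal_data(text):
    """Return any email or IP addresses found in a block of stored content,
    so it can be reviewed against the broader definition of personal data."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}

sample = "Complaint raised by chris@example.org from 192.168.10.45 via the portal."
print(flag_personal_data(sample))
```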

A GDPR-compliant plan of attack

Organisations need to set a GDPR-compliant strategy, in whichever part of Europe they operate, before the end of the transition period (currently 2017; track whether this changes).

An important part of that work will be to establish GDPR-supportive procedures and start the process of gaining explicit consent for the collection and processing of customer data ready for the new regime.

If you're a cloud provider, we recommend drafting a GDPR-compliant strategy, educating your staff on the implications of the changes and amending your contracts and provisioning to be fully compliant.

To sum up: if handled correctly, GDPR will help organisations make more informed decisions about cloud versus on-premise storage; while for the cloud services market, there may be opportunity to deliver truly pan-European services that customers can have assurance are privacy-safe.

 

The author is a principal analyst at msmd advisors and wrote the AIIM report on EU data issues, produced in collaboration with the London law firm Bird & Bird.
