The evolution of big data technologies has been a grand theme this year. Whether in the Hadoop ecosystem, Spark or NoSQL databases, these technologies have been maturing. The spat between Hadoop distributors Cloudera and Hortonworks over the Hortonworks-backed Open Data Platform provided some drama, while the growth of Spark, a data processing framework that might replace MapReduce, was notable in 2015.
Representing NoSQL in this “decade” of stories is agricultural information provider CABI’s use of MarkLogic to feed the world better.
On the business intelligence side, food retailer EAT offered an instructive predictive analytics case study. Computer Weekly's Top 10 also includes a feature on online mapping tools, a classic way of visualising data.
Machine learning and predictive analytics, as the data platform for new kinds of business, are at the heart of a feature on natural language generation and of a column on the ubiquitous Uber.
But it would be fair to say that the maturation of the big data technologies developed over the last few years in Silicon Valley has been the big topos of 2015.
The younger, nimbler Spark technology looks set to replace MapReduce in big data architectures. What is the pace, scope and scale of replacement?
The message from big data suppliers who are throwing their weight behind Apache Spark is: Step aside, MapReduce. You have had a good run, but today’s big data developers are hungry for speed and simplicity.
The hype around the early generation of Hadoop is giving way to the reality of business programmes based on Hadoop 2 and Apache Spark. Even so, according to Gartner, investment in the open-source environment – which enables the distributed processing of large datasets across commodity computing clusters – remains “tentative” in the face of what it describes as “sizeable challenges around business value and skills”.
The Open Data Platform (ODP) initiative is “an affront to the Apache Software Foundation”, declared Cloudera CEO Tom Reilly. The ODP brings together a group of companies whose goal is to promote Hadoop, the open source distributed computing framework used in big data, but does not include Cloudera or fellow Hadoop distributor MapR.
What should enterprise customers make of the launch of the Open Data Platform (ODP), an initiative by IT industry leaders to establish a core set of common technologies for the Hadoop big data platform? Not much, according to Gartner analyst Nick Heudecker. “At this point, we are advising clients not to take ODP into account at all when evaluating their Hadoop options,” he said.
Speaking at Hadoop Summit Europe 2015 in Brussels, Herb Cunitz, president of Hortonworks, said the Open Data Platform (ODP) initiative marked growing co-operation between IBM, Hortonworks and Pivotal.
Andrea Powell is a CIO on a mission. She leads the business technology strategy of not-for-profit agricultural science organisation CABI, which has been gathering data about insects, and much other environmental science besides, for over a century. She says the organisation aims to solve the world's agricultural problems with data, both structured and unstructured.
Find out how food retailer EAT uses predictive analytics from Blue Yonder to cut food waste by predicting future demand better. The big question for the "on-the-go" food retailer is how much to prepare. Too much risks discarding good food, while too little can mean running low on stock and disappointing customers.
While the robots are coming for journalists, might natural language generation also make inroads into other fields? Software-written journalism attracts a lot of coverage from journalists fearful for their jobs, possibly more than is justified. Nevertheless, Associated Press now produces more than 1,000 company results stories a month using software.
But suppliers of natural language generation (NLG) systems are keen to apply their technology to any kind of writing based on structured data. While the opportunity, or spectre, of automated journalism has provided good publicity for such software, it is financial and business services, healthcare and intelligence that are driving growth.
Councils are drawing on Ordnance Survey, Google Maps and OpenStreetMap for online mapping. What's the optimal mix?
Uber's significance may lie more in its being a data-rich platform that taps into unused human economic resources than in its role as a passenger pick-up service. Its head of corporate communications is David Plouffe, President Obama's 2008 campaign manager. That the ride-hailing company has a figure of such stature in that role says something about how much it needs to tell its story in a way that supports its narrative of social benefit.