More than one technology pundit has used the “drinking data from a fire hose” analogy to describe the spiraling volume of data we are now creating globally.
Indeed, IBM has recently suggested that 80 percent of data is unstructured, and that this is mainly down to video, email and texting.
So how do we bring value to this data? The short answer is data analysis, right? The trouble is that turning raw information into something useful requires knowing how to extract precisely what you need.
If you believe the messages coming out of IBM (regarding Cognos), and from Oracle and Sybase for that matter, then 2011 is set to be the year of analytics. But these are largely proprietary technologies, so what of open source?
“Data Analysis with Open Source Tools: A hands-on guide for programmers and data scientists” is a new book from O’Reilly aimed at intermediate to experienced programmers who are interested in data analysis. It sets out to teach them techniques for working with data in a business environment.
“As corporations get more data-driven, it is important to understand what you are doing with data. Otherwise, you are just adding to the confusion. This book is the first on data analysis that was written for programmers, taking a hands-on approach that is accessible for anyone with some software development skills,” said data consultant and author Philipp K. Janert.
The publishers say that the book will teach open source-focused readers how to look at data to discover what it contains, how to capture ideas in conceptual models, and then how to feed that understanding back into an organisation through business plans, metrics dashboards and other applications.
In this book you will find open source-flavoured advice on:
• Developing conceptual models using “back-of-the-napkin” calculations, as well as scaling and probability arguments.
• Mining data with computationally intensive methods such as simulation and clustering.
• Making your conclusions understandable through reports, dashboards, and other metrics programs.