This is a guest blog post by Mike Weston, CEO of data science consultancy Profusion, in which he discusses supercomputing and its implications for data science.
We are all creating data. A lot of data. The figures involved are mind-blowing. According to Information Service ACI, five exabytes of content were created between what it calls "the birth of the world" and 2003. In 2013, five exabytes of content were created each day. Just so you know, an exabyte is a quintillion bytes. Every minute (on average) we send around 204 million emails, make four million Google searches, and send 277,000 tweets.
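To put those per-minute figures on a daily scale, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above; the variable and function names are illustrative, and the bytes-per-second figure assumes the five exabytes are spread evenly across the day, which is a simplification.

```python
# Per-minute averages as quoted in the post.
PER_MINUTE = {
    "emails": 204_000_000,
    "google_searches": 4_000_000,
    "tweets": 277_000,
}

MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60

# An exabyte is a quintillion (10**18) bytes.
EXABYTE = 10**18
DAILY_BYTES_2013 = 5 * EXABYTE  # "five exabytes of content ... each day"


def per_day(per_minute: int) -> int:
    """Scale a per-minute average up to a full day."""
    return per_minute * MINUTES_PER_DAY


daily = {name: per_day(rate) for name, rate in PER_MINUTE.items()}

# Assumption: content creation is spread evenly over the day.
bytes_per_second = DAILY_BYTES_2013 / SECONDS_PER_DAY

for name, total in daily.items():
    print(f"{name}: {total:,} per day")
print(f"content created: roughly {bytes_per_second / 10**12:.1f} terabytes per second")
```

On these figures, that works out to roughly 294 billion emails a day and almost 58 terabytes of new content every second.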
With each individual creating and receiving more and more data, computers are in an arms race to keep up. Earlier this month another shot was fired: President Obama issued an executive order designed to ensure that the US leads the field in supercomputers, by building an exascale computer capable of undertaking one quintillion calculations per second. The computer will be used for, among other things, climate science, medicine and aerospace. However, from my perspective, the most exciting proposition is the application of exascale computers to data science.
The first noticeable advantage of increased computing power is a reduction in the time it takes to carry out data science projects. Faster results will allow for more real-time decision making. This will have a significant impact on industries such as retail, where a shop could automatically and instantaneously alter its pricing strategy based on weather data, customer demographics and footfall.
Next, the processes involved in data science will become ultra-efficient. There will be decreased processing time and less time spent accumulating and preparing data. This will open up data science to work with data that was previously inaccessible. For instance, it could assist in mapping the human brain and combine that information with data on a participant's emotions and lifestyle to build a picture of how the brain is affected by external factors.
The advanced computing power will also lead to greater accuracy and the ability to create more detailed and advanced models. This will enable data science to answer more complicated questions with a larger range of structured, unstructured, historical and real-time datasets. Machine learning will become much more powerful, because more computing power will allow a machine to be presented with more interactions from which to learn. Eventually, the majority of computations will become automated, with data scientists managing the AI as opposed to carrying out the day-to-day processes.
These new algorithms could be applied to everyday activities, such as tracking the real-time weather conditions affecting aircraft, along with their locations and speeds, the identities of all passengers on board and overall customer satisfaction as expressed through individuals' social media accounts. All of this information could be combined into one user-friendly interface that airline staff could monitor and respond to.
There will be additional benefits to product design, especially in the field of aeronautics. Proposed designs could be simulated without the need for wind tunnels and other expensive tools that are not readily available. Potentially one of the most exciting advances will be the development of personalised medicines. Data science will be able to look at an individual's genome and lifestyle, and alter drug properties accordingly to make them more effective.
The analysis of big data has already had revolutionary impacts on the commercial sector and within scientific discovery, from assisting in relief efforts following natural disasters to tailoring the consumer journey on eBay. In the future we can expect to see more advanced weather forecasts, natural disaster prediction services and more accurate cancer diagnoses. Data science is also unlocking key Islamic State military strategies, so it is going to play a bigger role within US national security.
In the short term, the biggest impact for consumers will be in relation to the 'Internet of Things'. With more real-time data readily available, the productivity of autonomous vehicles would greatly improve. Imagine a scenario where every vehicle within a city could be mapped onto a central computer, with all those vehicles able to tell each other their locations, speeds and proposed routes. Driving would certainly be better informed and safer than it is currently.
Data science is going to undergo a rapid transformation into a faster, more accurate and more efficient process. The range of tasks undertaken by machines will increase, spurred along by advances in machine learning and faster computer speeds. What takes us a week to calculate today will, in the future, take minutes. The scope of data we will be able to deal with will also increase, and a greater variety of data will allow more insights to be drawn from seemingly disparate datasets.
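As a rough sense of scale for that week-to-minutes shift, the sketch below works out the speedup it implies. The target times are hypothetical examples, not figures from any real system, and the calculation assumes a workload that scales perfectly with added computing power, which real workloads rarely do.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes


def required_speedup(target_minutes: float) -> float:
    """Speedup needed to shrink a week-long job to the target time.

    Assumption: the job scales perfectly with extra computing power.
    """
    return MINUTES_PER_WEEK / target_minutes


# Hypothetical target run times, in minutes.
for target in (60, 10, 1):
    print(f"{target:>2} min target: about {required_speedup(target):,.0f}x faster")
```

Under that assumption, getting a week-long calculation down to ten minutes needs a machine about a thousand times faster.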
This will lead to an exciting future where we are better informed and, by virtue of that, should be able to make more educated decisions. A master painter is only as good as his brush, and the advent of better computing will create better data scientists who will produce better insights. More powerful computers will lead to a more empowered society.