
Microsoft’s grand plan aims to make big data intelligence accessible to all

Microsoft CEO Satya Nadella wants to democratise access to big data intelligence across the enterprise through the use of cloud, machine learning and new datacentre chip technologies

This article can also be found in the Premium Editorial Download: Computer Weekly: How Microsoft aims to tackle information overload

Microsoft wants to help enterprises make sense of their big data with the help of the software giant’s cloud, artificial intelligence and machine learning product portfolio, CEO Satya Nadella said during his keynote presentation at Microsoft Ignite 2016 in Atlanta, Georgia.

Before a 23,000-strong crowd, Nadella said innovations such as the emergence of printing technology in the 15th century and the global spread of internet connectivity since the 1990s have ensured humans have near unlimited access to information on all manner of subjects.

What has not changed, though, is that humans have neither the time nor the attention spans to deal with all the data they need to sift through, and that is a conundrum Nadella says Microsoft is well positioned to help them work through.

“We’ve used technology very successfully to democratise both the creation of information and the distribution and access of that information. And now we need to turn to technology to democratise creation and access to intelligence,” he said.

This process is already under way at Microsoft, where there are plans to embed its Cortana personal assistant software deeper into its productivity software, for example, with Nadella talking up its ability to ensure applications are better tuned to users’ individual needs.

To emphasise this point, Nadella described Cortana as a “new organising layer” that will take its place alongside PC and mobile operating systems as a means of helping enterprises get stuff done.

For example, if a user has a meeting booked in their Outlook calendar, Cortana can alert them to scheduling conflicts and pull in data from LinkedIn about who else is invited.

“We are well on our way here with Cortana. In fact, we have 133 million active users each month using Cortana across 116 countries, and they’ve already asked 12 billion questions,” he said.

“It can take text input, it can take speech input, [and] it knows you deeply. It knows your context, your family, your work. It knows the world. It is unbounded,” he said.

During the keynote, Nadella also set out Microsoft’s plans to expand the types of information Cortana can access, including health and fitness data. The aim is to give enterprise users a comprehensive overview of their business and leisure plans for the day, so they can see how work commitments may impede their exercise plans.

From an application perspective, Nadella discussed the ongoing work Microsoft is doing to ensure important emails do not get buried in its cloud-based productivity suite, Office 365, through the deployment of machine learning and neural network technology.

“It understands the type of mail, the people you’re corresponding with, the content, the semantic content of your inbox, and to be able to focus your attention on things that matter the most,” added Nadella.

Real-world applications

While Nadella’s keynote was firmly focused on the future and the potential for machine learning to change the way enterprises operate, some of these capabilities are already in production.

Many of the machine learning capabilities Microsoft talked about incorporating into Office 365 to make it easier for users to access and act on their business data, for example, were discussed at Microsoft conferences in 2013 and 2014 under the code-name Project Oslo.

This initiative gave rise to Office Delve, which was rolled out to Office 365 enterprise accounts in September 2014, and draws on the data gathered about who users interact with and what they talk about to offer up information that might help them do their jobs.

The company has also pushed out services that allow Office 365 users to benchmark their productivity through the use of analytics dashboards, so they can see at a glance what they spend most of their working time on.

While Nadella talked up the productivity potential of these technological advances, the company’s work should also help enterprises grappling with how to monetise their big data stores, said David Smith, principal programme manager for machine learning at Microsoft.

“Companies have made such a massive investment in data between storing it, collecting it and hosting it, and are really trying hard to make that data pay its way,” he told Computer Weekly.

Their ability to do this is often hampered by the industry-wide shortage of skilled data scientists, and because machine learning and big data analytics tools are notoriously complex to use.

The appliance of data science

These are both areas Microsoft has worked to address, through the roll-out of data science certification programmes, and the introduction of machine learning and big data analytics tools in Azure.

“They have not been designed as point and click-type things,” he said. “Configuring Hadoop used to be an enormously complex process of getting 40 or so machines in a room and individually installing 10 different pieces of software and configuring it all together, and it was enormously painful.”

The cloud has gone some way to lowering the skills barrier, added Smith, and made it possible for smaller firms to take their first steps into the world of data science.

“The idea now that we can just go into the Azure portal and fill in a form requesting a [Hadoop] cluster with 20 machines, each one with eight cores and they have all these services on them, and then it’s up and running minutes later, that is pretty magic,” he added.

Meanwhile, the types of data some organisations deal in may have made it harder for some industries to make the most of the information they have, Smith said, but things are changing.

“Financial services companies were the first on the wagon when it came to data science because their data was relatively simple – it’s just numbers measured hourly, daily, weekly or whatever,” he said.

“Then we have industries that have much messier data sources. The fashion industry, where data relates more to images, clothing types and runways, is opaque and traditionally difficult for data scientists to deal with.”


Now, though, there are application programming interfaces (APIs) that allow programmers to extract numerical data from image-based data, giving data scientists a lot more material to work with.

“If we stick with the fashion example, if you have a database of models, you can use our face APIs, sort through the headshots, and it will come back with how old the model is, their gender, dimensions and all the information useful to a data scientist working in the fashion industry,” said Smith.
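The workflow Smith describes maps onto the Face API’s detect endpoint, which accepts an image URL and returns per-face attributes such as age and gender. The sketch below assembles such a request and parses a response of the documented shape; the region, subscription key and sample values are placeholders, not real data.

```python
import json

# Region and attribute list are illustrative; a real call needs a valid
# Cognitive Services subscription and endpoint.
FACE_DETECT_URL = (
    "https://westus.api.cognitive.microsoft.com/face/v1.0/detect"
    "?returnFaceAttributes=age,gender"
)

def build_detect_request(image_url, subscription_key):
    """Assemble the pieces of an HTTP POST to the Face detect endpoint."""
    return {
        "url": FACE_DETECT_URL,
        "headers": {
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        "body": json.dumps({"url": image_url}),
    }

def summarise(detect_response):
    """Pull (age, gender) pairs out of the JSON list the service returns."""
    return [
        (face["faceAttributes"]["age"], face["faceAttributes"]["gender"])
        for face in detect_response
    ]

# Shape of a typical detect response; the values are invented for illustration
sample = [{"faceId": "abc123", "faceAttributes": {"age": 27.0, "gender": "female"}}]
print(summarise(sample))  # → [(27.0, 'female')]
```

Iterating `build_detect_request` over a database of headshots and feeding each response to `summarise` would yield exactly the kind of numerical table a data scientist can work with.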

Similarly, Microsoft-based technology has recently been deployed by taxi firm Uber, which is incorporating it into its mobile app so the identities of its drivers can be verified in real time.

The driver uploads a selfie to the app, which matches it to their profile photo, so passengers can be assured the person behind the wheel is registered with Uber, while protecting drivers from losing custom through identity theft.
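The matching step corresponds to the Face API’s verify operation, which returns an `isIdentical` flag and a `confidence` score for a pair of faces. A minimal sketch of the resulting decision logic, assuming those documented fields; the threshold is illustrative, not Uber’s actual policy:

```python
CONFIDENCE_THRESHOLD = 0.5  # illustrative cut-off, not a published value

def driver_checks_out(verify_response, threshold=CONFIDENCE_THRESHOLD):
    """Interpret a Face API verify result: does the selfie match the profile photo?"""
    return (
        verify_response["isIdentical"]
        and verify_response["confidence"] >= threshold
    )

# Responses of the documented shape, with invented scores
print(driver_checks_out({"isIdentical": True, "confidence": 0.86}))   # → True
print(driver_checks_out({"isIdentical": False, "confidence": 0.31}))  # → False
```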

Future-proofing the infrastructure

Having the capacity to deal with such queries at speed and scale is an area Microsoft is in the process of addressing through the roll-out of reprogrammable chips, known as field programmable gate arrays (FPGAs), in its Azure datacentre estate.

During the show, the firm demonstrated how replacing CPUs with FPGA boards in its infrastructure markedly cuts the time required to process big data-style queries, such as photo classification and language translation tasks.

The company’s use of FPGA has been in development since 2011, with the work overseen by Doug Burger, a distinguished engineer in Microsoft’s research division, who demonstrated the technology during the keynote.

The demonstration included using an FPGA-based setup to translate the 1,440 pages of War and Peace from Russian to English, pitted against an infrastructure featuring 24 CPU cores that consumed around one-fifth more power.

The FPGA-based translation task took 2.5 seconds, while the CPU setup took eight times longer to complete the same task.
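Those figures can be sanity-checked with a little arithmetic: eight times 2.5 seconds puts the 24-core CPU setup at 20 seconds, and dividing the 1,440 pages by each duration gives the implied throughput of the two rigs.

```python
pages = 1440                     # length of War and Peace in the demo
fpga_seconds = 2.5               # reported FPGA translation time
cpu_seconds = fpga_seconds * 8   # the CPU setup took "eight times longer"

print(cpu_seconds)               # → 20.0 seconds
print(pages / fpga_seconds)      # → 576.0 pages per second on the FPGA rig
print(pages / cpu_seconds)       # → 72.0 pages per second on 24 CPU cores
```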

The technology has been deployed in Azure datacentres in 15 countries, prompting the company to declare itself the first in the world to run FPGA technology at scale, creating what Microsoft has termed an AI Supercomputer.

Speaking to Computer Weekly at Ignite, Burger outlined how adding FPGA technology to its cloud infrastructure affects the economics of using Azure compared with the Amazon Web Services (AWS) cloud.

“Amazon’s highest end and very expensive virtual machines can drive 20Gbps, but when Azure rolls out its accelerated networking, because of its use of FPGA, we’ll be able to go to 25Gbps for any virtual machine as a standard feature. It drops the latency down by 10 times, with no CPU,” he said.

“Once we [roll it out] to all our customers, they’ll be able to do many more transactions per second. Latency will improve, bandwidth will improve and, because they are reprogrammable, we can just roll out upgrades to the network and protocols on a consistent basis. It gives us the ability to turn the crank on innovation very quickly.”

The roll-out of FPGA technology across its infrastructure will underpin the spread of Cortana and machine learning capabilities to other parts of Microsoft’s product portfolio, so Microsoft can make good on its pledge to make big data intelligence accessible to all enterprises.

“We want to pursue democratising AI, just like we pursued information at your fingertips,” said Nadella, during the closing remarks of his Ignite keynote. “But this time around, we want to bring intelligence to everything, to everywhere, and for everyone.”
