Russian big data technologies evolve with one eye on US open source

Russian data technicians have, until recently, shunned full-scale use of US-based open-source big data technology, while keeping one eye on it

Russian technology firms have largely focused on developing their own technologies for big data, and these are put to work across many sectors of the Russian economy.

Russia, a country with some of the most highly regarded programmers in the world, is also a top technology market. Russia’s economy – one of the world’s largest, with a GDP of over $2tn, according to World Bank figures – attracts extensive investment in its burgeoning technology sector.

But, as Russia turns inwards during a period of increasing isolation from the west, resulting from the turmoil in Ukraine, many of the best Russian programmers are choosing to stay in Russia, owing both to a sense of patriotism and to rising wages.

Russian data technicians have, until recently, largely shunned full-scale use of open-source big data technology, because “the need arose before the technology had developed”, says Jane Zavalishina, who heads Yandex Data Factory. To meet its own needs, Yandex, Russia’s largest search engine, developed its own big data technologies.

Since then, Yandex has begun to use Hadoop in some instances, but still relies largely on its own custom system. However, Zavalishina notes that, because open-source big data technologies such as Hadoop are expanding so rapidly, Yandex’s programmers must pay careful attention to developments to make sure the company is not left behind.

According to Zavalishina, large Russian companies generally prefer to buy packaged software for big data technology. However, the future may hold more in store for Hadoop and other open-source big data technologies, as open-source projects develop more quickly with more collaborators contributing to them. Smaller companies, which have become more common in Russia, are also more interested in the cheaper solutions that Hadoop offers, even if implementation can be trickier.

Technology startup Pediatr 24/7, which provides online consultations with paediatric doctors via its own video, audio and chat services, does not currently use big data technology, says founder and CEO Denis Yudchyts. However, as the company grows, it may come to favour a standardised technology such as Hadoop because of its lower costs.

Pediatr 24/7 operates as part of a tech incubator where the technology could be applied to companies across the group, providing scale. However, many of these companies are still at an early stage, and have little need for such technologies.

Machine learning hits the road

For Yandex, the most important part of big data technology lies in analysis via its machine learning technology MatrixNet, which does everything from sorting the spam out of mailboxes to tailoring web searches and ads specifically for each user. 

The idea is the same as for any other search engine or email provider: absorb and process large datasets from internet users so rapidly that real-time systems can deliver more accurate and relevant information.

As the quality of big data technology improves and machine learning grows, more real-world applications are emerging, such as the Yandex Maps traffic tracking function. Seen in almost every taxi in Moscow, and frequently used by passengers to track journeys, Yandex’s tool allows drivers to see the worst traffic ahead of time. The tool shows green, yellow and red markings indicating the severity of traffic, from which drivers can identify the clearest route to a destination.

The Yandex Maps traffic analysis tool is also used to help maintain roads in Russia. The tool takes large datasets, combining road types and weather conditions, among other variables, to create a predictive analysis of what road conditions are likely to be in the near future. “A simple statistical analysis is seven times less efficient,” according to Yandex Data Factory’s Zavalishina.

The traffic mapping would not be possible without the opportunity big data technology provides to absorb and store massive datasets. While big data technology is relatively advanced in Russia, data scientists are still in short supply, Zavalishina says.

Building a pool of data scientists to mine big data

Although Harvard Business Review has labelled theirs “the sexiest job of the 21st century”, data scientists are a rare find in Russia. To resolve this challenge, Yandex has created its own Yandex University to educate its already adept technology workforce in the growing profession. The graduates of Yandex University have helped Yandex Data Factory, a subsidiary of Yandex that focuses on monetising big data for other Russian ventures beyond its search engine.

One example is an HR product the company has been working on for an unnamed client. In a proof-of-concept project, Yandex Data Factory conducted a retrospective analysis of more than 2,000 engineers at the company. From an extensive dataset on the workers, Yandex Data Factory identified the 50 employees most likely to leave the company in the next year. Not all of those 50 left, but 26 of them did, whereas few of the employees who were not labelled high-risk left.
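To put the figures above in perspective, the short Python sketch below works out the hit rate among the flagged employees. It is only an illustration: the function name and the employee IDs are hypothetical, and the only numbers taken from the project are the 50 flagged employees and the 26 of them who left. The number of leavers outside the flagged group is not reported, so it is left out.

```python
def hit_rate(flagged, leavers):
    """Share of flagged employees who actually left (precision of the flag)."""
    flagged = set(flagged)
    if not flagged:
        raise ValueError("flagged must not be empty")
    return len(flagged & set(leavers)) / len(flagged)


# Hypothetical IDs standing in for real employees.
flagged_ids = range(50)   # the 50 employees labelled high-risk
leaver_ids = range(26)    # the 26 flagged employees who went on to leave

rate = hit_rate(flagged_ids, leaver_ids)
print(f"{rate:.0%} of flagged employees left")
```

A hit rate of just over half only says something useful when compared with how often engineers at the company leave in general, a figure the project description does not give.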

While this type of big data analysis is unlikely ever to be entirely accurate, it can give employers a better sense of the human resources risks they face. Given large enough datasets and enough variables, this type of analysis could be put to use in many areas previously thought to be unpredictable.

Other typical uses of big data analysis are also common in Russia. Yandex Data Factory offers tools, built on its MatrixNet machine learning technology, that help major retailers target their products to buyers more effectively. Through both in-store analysis and internet purchasing analytics, Yandex’s big data technology can help retailers improve sales by 15% or more, which for large retailers can translate into millions of dollars.

Industry and infrastructure form another key area where big data technology plays a major role in Russia. Following the privatisation of the Russian utilities industry seven years ago, capital production has become far more important for the industry. Reliability needs are also increasing with the modernisation and growth of Russian industry. Big data technology allows companies such as Yandex Data Factory to provide analytics tools that can predict failures in machinery, improving consistency and profits.

As big data expands in Russia, we can expect more of the same. Russian programmers will keep one eye on the constantly growing open-source technologies such as Hadoop, while developing their own custom systems.
