Ideas about big data are spreading across Polish enterprises and established thinking about data storage is slowly changing.
The big data pioneers in Poland are websites, utilities and other companies that process very large volumes of data – and which are the first to build frameworks for big data projects.
The rapid increase in the volume of data processed presents business decision-makers with the challenge of deriving the knowledge hidden in that data.
“During conversations with CIOs of large companies, I often ask why not try translating a wide range of information and available data into business proposals for the board. And, after discussion about the necessary actions that should be taken, I hear that my suggestion has gained some attention,” says Adam Wojtkowski, general manager of EMC Poland.
Often in Poland, when plans for new systems are needed, the board will make assumptions based on conventional IT infrastructure.
“The CIO ensures the company has access to valuable information, but – almost as a rule – he does not know where the data is stored; in what kind of system, in what format and on what medium,” says Radoslaw Machnica, consultant at Hitachi Data Systems Poland.
IT administrators in large companies often struggle with infrastructure for big data.
“The answer is to make converged infrastructure that enables easier deployment of applications and responds to the need for advanced analysis,” says Wojtkowski.
Some business executives question whether big data is just another way of saying analytics – and, if so, why there should be a problem implementing it.
But Grzegorz Chmielowski, head of architecture practice at Teradata Poland, says: “It’s impossible to replace traditional analytics with big data and vice versa. You should use both methods simultaneously, and only such an approach will help to achieve the desired results.”
Kinga Piecuch, CEO at SAP Poland, says: “A company needs efficient equipment and appropriate software for quickly analysing large amounts of data.”
Big data requires an investment in hardware, software and additional specialist work, but “the costs of implementing big data systems are within the reach of many Polish companies”, says Alicja Wiecka, managing director at SAS Institute Poland.
“If we add the costs to a calculation of benefits for the board, it turns out that investment in big data should be profitable, even for a medium–sized company.”
The first big data projects were developed by the larger companies: banks, telecoms, utility companies and heavily transactional websites.
NaszaKlasa is a Polish school-based social networking site with 7.2 million users.
Every year NaszaKlasa organises a Big Data Summer Camp competition for young, ambitious programmers. Participants who create the most interesting projects for analysing data and extracting hidden knowledge win financial prizes and compete for the title of NaszaKlasa Data Scientist.
“Our data warehouse, built on NetApp FAS filers, contains 0.5PB of data with information about our users’ behaviour, relationships, communications, shopping patterns and their consumption of entertainment and advertising,” says Krzysztof Sobieszek, research representative on the board of NaszaKlasa.
“We have created a big data system for analytical goals using our own efforts. Every day the volume of data increases by more than 1.3TB.”

Allegro.pl, the largest internet auction service in Poland, has done something similar, using Oracle databases, analytical systems and Exadata database machines.
Wojciech Szczęsny, CTO of Allegro.pl, says: “Big data software is created and tested mostly by our startups, Allegro subsidiaries. The systems have to analyse users' behaviour, to assist them in moving through the site, finding wanted goods and shopping. We use cloud storage from Beyond.pl for building big data systems.”
PKP PLK manages the state railways in Poland and controls 18,500km of lines. It plans to thoroughly modernise the railway network in Poland – but that begins with its IT infrastructure.
“We anticipate a rapid growth of data in the near future. And there will be demand for quick access to detailed analysis that will force us to increase our IT infrastructure to collect, process and analyse data,” says Rafal Zbirog, IT director at PKP PLK.
So far, the organisation has virtualised servers with VMware and gradually consolidated and simplified the entire IT infrastructure. Data has been migrated from Oracle databases onto an SAP data warehouse with analytics running on SAP Hana.
“Today, the process of generating an analytical report is not counted in hours, but minutes and seconds,” says Zbirog.
“At the same time we’ve got dynamic data growth and need to find a way to capture and manage big, fast data and provide a platform for our users to analyse this information in real time.”
Under the pressure of rapidly growing data and the need to process it, oil and gas exploration company Geofizyka Torun implemented a big data system. Geofizyka Torun has an extensive international presence and works as a contractor for companies such as Chevron, ExxonMobil, GSPC, Oil India, Shell and Total.
Every day Geofizyka Torun stores, manages and processes about 100TB of production data, related to seismic research and geophysical measurements. The volume of data processed has grown rapidly in recent years, with the rise of shale gas exploration.
Geofizyka Torun was using IBM iDataplex and Sun Fire X4100 servers running SeisSpace/Pro MAX, GeoDepth and Echos software. Data was processed by more than 50 analysts in four labs who analysed data in different stages.
“We see continuous progress in the field of tools for geospatial analysis and ground examination. But big data systems need great efficiency, flexibility and capacity that cannot be ensured by traditional storage systems,” says Michał Słupski, ICT manager at Geofizyka Torun.
Geofizyka Torun looked for a system to automatically distribute data flow and ensure continuous data access at the application level.
The company also required advanced performance monitoring to allow users to perform detailed disk array load analysis and create reports on disk array condition.
EMC Isilon met all these requirements. The system provides advanced control and allocation of specified storage resources to particular departments and individual users with high throughput and I/O.
“Isilon is an advanced tool designed to manage capacity allocation. This made it possible to allocate capacity on multiple levels; for example, to a unit in the organisation and to individuals in that unit. It also made it possible to identify and solve problems associated with the incorrect use of system memory,” says Słupski.
Isilon considerably improved the functioning of Geofizyka Torun’s whole system and streamlined the IT department’s operations. The system is largely self-operating and does not require continuous monitoring. An appropriate amount of working memory is allocated to each user. One feature of the system is its substantial scalability. Currently it uses 10 nodes and has the capacity to expand to 144.
“Isilon offers the possibility to grow the system quickly – adding a node takes less than a minute – and this is an entrance to big data processing,” says Słupski.