
Digital Transformation Week Amsterdam: The growing divide between the haves and have-nots

The most successful companies carefully manage how data is taken in and processed – and they use tools to ensure they squeeze the most they can out of every bit of information

Pat Brans, Pat Brans Associates/Grenoble Ecole de Management

Published: 24 Oct 2022 13:25

Data and machine learning ecosystems change as organisations scale. What works for a large company differs from what works for a startup, as the experience of online travel site Booking.com demonstrates.

Speaking at the Digital Transformation Week conference in Amsterdam in September, Sanchit Juneja, director-product of the firm's data science and machine learning platform, presented an ideal data ecosystem for a large tech company and showed how this differs from the data ecosystem of a startup. He used a layered description of all the data processing activities that need to take place in any company that has to process a lot of data and apply machine learning tools to maintain a competitive advantage.

In a big tech organisation, there are various data sources, he explained. These can be separated by vertical product groups that consumers interact with – for example, flights, attractions or hotels. This layer of processing is called the data formative layer. At this level, a user performs an action on the data – and, based on that action, the data is created. A decision is then made as to how the data will be formatted for downstream processing.
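The formative step described above can be sketched in a few lines of Python. This is only an illustration, not Booking.com's implementation: the `ActionEvent` fields, the function names and the choice of JSON as the downstream format are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class ActionEvent:
    """Hypothetical record created when a user acts on a product vertical."""
    vertical: str      # e.g. "flights", "attractions" or "hotels"
    action: str        # e.g. "search" or "book"
    user_id: str
    occurred_at: str   # ISO 8601 timestamp in UTC


def record_action(vertical, action, user_id, now=None):
    """Create the data for one user action (the formative step)."""
    now = now or datetime.now(timezone.utc)
    return ActionEvent(vertical, action, user_id, now.isoformat())


def to_downstream(event):
    """Format the event for downstream processing (JSON is an assumed choice)."""
    return json.dumps(asdict(event), sort_keys=True)


if __name__ == "__main__":
    event = record_action("hotels", "search", "user-123")
    print(to_downstream(event))
```

The point of the sketch is the ordering: the action happens first, the record is created from it, and only then is a format chosen for whatever consumes the data next.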

The data flows from the formative layer into a DataOps layer, which is a very new concept in the industry. At this layer, DevSecOps principles and tools, such as Git, are applied to data pipelines. This layer also tells the formative layer, where the data is produced, how the data will be used downstream.
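One way to picture this feedback from downstream consumers to data producers is a small, version-controlled check that runs whenever a pipeline changes. The sketch below is an assumption for illustration only: the talk did not describe such a check, and the `REQUIRED_FIELDS` contract and `contract_violations` helper are hypothetical names.

```python
# Hypothetical contract: the fields downstream consumers say they need.
# In a DataOps setup, a file like this could live in Git next to the pipeline code.
REQUIRED_FIELDS = {"vertical": str, "action": str, "user_id": str}


def contract_violations(record, contract=REQUIRED_FIELDS):
    """Return a list of human-readable problems; an empty list means the record conforms."""
    problems = []
    for name, expected_type in contract.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
    return problems


if __name__ == "__main__":
    print(contract_violations({"vertical": "hotels", "action": "search"}))
```

Because the contract is plain code, a change to it can be reviewed and rolled back like any other commit, which is the sense in which software practices carry over to data pipelines.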

From the DataOps layer, the data flows into a data aggregation layer, where it can be processed as a transaction, or it can be used for analytical decision-making. In the first case, the data is treated by a set of processes called online transactional processing (OLTP); in the second case, it is treated by a second set of processes, called online analytical processing (OLAP).

For transactional processing, e-commerce platforms might be used. For analytical processing, big data platforms are used. In a typical startup, this distinction doesn’t exist – one platform does both the transactional processing and the analytics. Only larger organisations can afford to make the distinction between the two types of system.
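The difference between the two kinds of workload can be shown with a minimal Python sketch. It uses one in-memory SQLite database for both, which resembles the startup case rather than the split platforms of a large organisation. The `bookings` table, its columns and the function names are assumptions made for the example.

```python
import sqlite3


def make_store():
    """Create a single store that serves both workloads (the startup case)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bookings ("
        "id INTEGER PRIMARY KEY, vertical TEXT NOT NULL, amount REAL NOT NULL)"
    )
    return conn


def place_booking(conn, vertical, amount):
    """OLTP-style work: one small write, committed as a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO bookings (vertical, amount) VALUES (?, ?)",
            (vertical, amount),
        )
    return cur.lastrowid


def revenue_by_vertical(conn):
    """OLAP-style work: a read-only aggregate over many rows."""
    rows = conn.execute(
        "SELECT vertical, SUM(amount) FROM bookings "
        "GROUP BY vertical ORDER BY vertical"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    store = make_store()
    place_booking(store, "flights", 100.0)
    place_booking(store, "hotels", 150.0)
    print(revenue_by_vertical(store))
```

At larger scale, the write path and the aggregate query would typically run on separate systems, so that heavy analytical scans do not compete with customer-facing transactions.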

After the data is stored at the data aggregation layer, the data consumption part begins. If the data is being used for machine learning applications, part of this layer is called MLOps, which is a hot area with a lot of different tools being applied – Pachyderm, for example.

Some big organisations, such as Uber and Amazon, built their own MLOps layer – and what they built was so good that they are now selling it. Amazon calls its platform SageMaker; Uber calls its platform Michelangelo. Both are available as software as a service (SaaS) for smaller companies.

The data aggregation layer consists of several sets of activities. One group of product managers is concerned with data protection, another works on how data is stored, and a further manager looks at how data is presented.
