Date: 2019-05-21

By using Google Kubernetes Engine (GKE), price comparison site MoneySupermarket.com has been able to parallelise its data pipeline. This is part of a wider deployment of analytics services on Google’s public cloud. The company recently moved to Google Cloud Platform (GCP), which has enabled it to take advantage of the platform’s built-in analytics services.

“GCP is being used as our analytics cloud platform,” says Harvinder Atwal, head of analytics at MoneySupermarket.com. “We did a proof of concept with Google. It has invested a lot on analytics services, which means there are many managed services on GCP, so our team of data scientists have a lot less to worry about.” He adds that GCP offers MoneySupermarket.com an easier-to-maintain analytics platform.

For its enterprise data warehouse, MoneySupermarket.com collects large volumes of data from its website and stores it in Google’s BigQuery. It uses Google Kubernetes Engine (GKE) to orchestrate containerised applications that clean the data and load it into BigQuery.
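
As a rough illustration of the kind of cleaning step such a container might run, the sketch below normalises raw web events and serialises them as newline-delimited JSON, a format that BigQuery load jobs accept. The field names (`user_id`, `page`, `ts`) and the cleaning rules are hypothetical and are not taken from MoneySupermarket.com's pipeline. The load into BigQuery itself is left out.

```python
import json
from datetime import datetime, timezone

# Hypothetical required fields; the real schema is not described in the article.
REQUIRED_FIELDS = ("user_id", "page", "ts")


def clean_event(raw):
    """Return a cleaned event dict, or None if the record is unusable."""
    fields = {name: raw.get(name) for name in REQUIRED_FIELDS}
    if any(value is None or str(value).strip() == "" for value in fields.values()):
        return None
    try:
        # Assumption: timestamps arrive as Unix epoch seconds.
        ts = datetime.fromtimestamp(float(fields["ts"]), tz=timezone.utc)
    except (TypeError, ValueError, OverflowError, OSError):
        return None
    return {
        "user_id": str(fields["user_id"]).strip(),
        "page": str(fields["page"]).strip().lower(),
        "ts": ts.isoformat(),
    }


def to_ndjson(raw_events):
    """Clean events and serialise the survivors as newline-delimited JSON."""
    cleaned = (clean_event(event) for event in raw_events)
    return "\n".join(json.dumps(event) for event in cleaned if event is not None)


if __name__ == "__main__":
    sample = [
        {"user_id": " 42 ", "page": "/Car-Insurance", "ts": 1558396800},
        {"user_id": "", "page": "/energy", "ts": 1558396801},    # dropped: no user
        {"user_id": "7", "page": "/loans", "ts": "not-a-time"},  # dropped: bad timestamp
    ]
    print(to_ndjson(sample))
```

Keeping the cleaning logic in small pure functions like these makes it straightforward to package as a container image and run many copies side by side, each on a different slice of the raw data.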

“BigQuery is very fast and scalable,” says Atwal. “We don’t need to worry about fixed-size queries and Google takes care of scaling. BigQuery becomes a main point of truth, and it also becomes the integration point for other data, enabling the data science team at MoneySupermarket.com to integrate third-party data.”

Using BigQuery also helps MoneySupermarket.com speed up the process of extracting, transforming and loading (ETL) data into its enterprise data warehouse. “It takes a lot of work taking raw data to ingest into a data warehouse,” says Atwal.

“The ETL pipeline can be quite brittle. Rather than wait for the ETL developers to create a data pipeline, we now ingest direct into GCP.”

He says MoneySupermarket.com has also created a training and model scoring pipeline using GKE and containers to break up model training into individual tasks.

Machine learning data pipeline

The steps in the machine learning data pipeline cover data quality checks, preprocessing for normalisation and standardisation, feature extraction to identify new classes of data, model training, and assessment of model accuracy using a test data set. The flexibility of GKE allows MoneySupermarket.com to use it for several projects, including machine learning (ML) and web-facing application programming interfaces (APIs), using Python and mostly XGBoost as the ML classifier in the container application code.
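
The pipeline stages listed above can be sketched as small, independent Python functions, each of which could in principle run as its own containerised task. This is a minimal illustration under stated assumptions, not MoneySupermarket.com's code: the function names, the synthetic data and the derived feature are all hypothetical, and a simple nearest-centroid classifier stands in for XGBoost so that the example needs only the standard library.

```python
import random
import statistics


def check_quality(rows):
    """Data quality: drop rows that contain missing values."""
    return [row for row in rows if None not in row]


def standardise(rows):
    """Preprocessing: rescale each feature column to zero mean, unit variance.

    Simplification: statistics are computed on all rows; a production
    pipeline would fit them on the training split only.
    """
    *feature_cols, labels = zip(*rows)
    scaled = []
    for col in feature_cols:
        mean, stdev = statistics.fmean(col), statistics.pstdev(col)
        scaled.append([(v - mean) / stdev if stdev else 0.0 for v in col])
    return [tuple(feats) + (label,) for *feats, label in zip(*scaled, labels)]


def extract_features(rows):
    """Feature extraction: append a derived feature (hypothetical: the sum)."""
    return [tuple(feats) + (sum(feats), label) for *feats, label in rows]


def train(rows):
    """Model training: nearest-centroid stand-in for a real classifier."""
    by_label = {}
    for *feats, label in rows:
        by_label.setdefault(label, []).append(feats)
    return {
        label: [statistics.fmean(col) for col in zip(*members)]
        for label, members in by_label.items()
    }


def predict(model, feats):
    """Return the label whose centroid is closest to the feature vector."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def accuracy(model, rows):
    """Assessment: share of held-out rows classified correctly."""
    hits = sum(predict(model, feats) == label for *feats, label in rows)
    return hits / len(rows)


def run_pipeline(rows, test_fraction=0.25, seed=0):
    """Chain the stages and report accuracy on a held-out test set."""
    rows = extract_features(standardise(check_quality(rows)))
    random.Random(seed).shuffle(rows)
    split = int(len(rows) * (1 - test_fraction))
    model = train(rows[:split])
    return accuracy(model, rows[split:])


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic two-class data: (feature_1, feature_2, label).
    data = [(rng.gauss(0, 1), rng.gauss(0, 1), 0) for _ in range(100)]
    data += [(rng.gauss(5, 1), rng.gauss(5, 1), 1) for _ in range(100)]
    data.append((None, 1.0, 0))  # a bad record for the quality check to drop
    print(f"held-out accuracy: {run_pipeline(data):.2f}")
```

Because each stage takes rows in and hands rows (or a model) out, stages can be swapped or scaled separately. Replacing `train` and `predict` with calls to an XGBoost classifier would not require changes to the other steps.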