A closer look at Google Container Engine

In just 12 months, Docker has transformed from a simple open-source container management project into a powerful platform

2014 was a big year for Docker. In just 12 months, it transformed from a simple open-source container-management project into a powerful platform.

Along with hundreds of smaller open-source projects surrounding Docker, large platform companies such as AWS, Google, IBM, Microsoft, Red Hat and VMware pledged their support. Google was one of the first public cloud companies to officially offer container hosting. Backed by proven experience of managing billions of containers, Google quickly moved towards opening up its internal tools to developers.

It first announced Kubernetes, an open-source container-orchestration and cluster-management tool. It then launched Google Container Engine (GKE), which uses Kubernetes.

GKE combines the power of Google's infrastructure-as-a-service (IaaS) platform, Google Compute Engine (GCE), with Docker.

Although it is still in alpha (Google’s terminology for early access/technical preview), GKE is one of the first tools to have all the key building blocks of container management.

The goal of GKE is to make development and operations teams productive in managing container-based workloads. It hides the complexity and mundane management tasks behind a simple user experience and easy-to-use command line tools.

Understanding Kubernetes

Kubernetes is the fabric of GKE. While developers need not learn it to use GKE, it helps to understand the concepts. Containerised applications don’t rely on underlying infrastructure. Since they have the required operating system (OS), runtime, libraries, frameworks and dependencies packaged as one unit, it is possible to deploy multiple containers on a single host or distribute them across multiple hosts.

As long as containers can discover the other containers they depend on, it doesn’t matter where they are deployed. This attribute of containers makes infrastructure a mere commodity. Once a fleet of virtual machines (VMs) is provisioned, they can be collectively treated as a single compute unit, making it possible to run clusters of containers.

While Docker effectively manages the lifecycle of each container, developers need a tool to manage an entire cluster of containers. Containerised applications need a discovery mechanism for dynamically connecting to each other. For example, the web server container needs to discover and connect to the database container. Kubernetes is the tool that provides discovery and lifecycle management for containerised applications running in clusters. From a DevOps perspective, Kubernetes is the remote control to configure and manage clusters, from running a few containers to large deployments dealing with tens of thousands of containers.
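To make the discovery idea concrete, here is a minimal sketch of a name-based lookup. It is a toy model, not the Kubernetes API: the `ServiceRegistry` class, the service name and the IP addresses are all assumptions made up for illustration.

```python
class ServiceRegistry:
    """Toy name-to-address lookup (hypothetical; not the Kubernetes API)."""

    def __init__(self):
        self._endpoints = {}

    def register(self, name, host, port):
        # A later registration replaces the earlier one for the same name.
        self._endpoints[name] = (host, port)

    def lookup(self, name):
        # Raises KeyError if nothing is registered under this name.
        return self._endpoints[name]


registry = ServiceRegistry()
registry.register("database", "10.0.0.7", 5432)  # illustrative address

# The web server asks for "database" by name instead of hard-coding a host.
db_host, db_port = registry.lookup("database")
print(f"web server connects to {db_host}:{db_port}")

# If the database container moves to another host, only the registry entry
# changes; the web server's lookup code stays the same.
registry.register("database", "10.0.0.12", 5432)
```

The point of the indirection is that the web server never needs to know which host the database container landed on.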

Kubernetes terminology

• Clusters - Compute resources that run containerised applications;

• Pods - Homogeneous sets of containers that share data and must be deployed on the same host;

• Replication controllers - They manage the lifecycle of pods, ensuring that a specified number of pods is running at all times;

• Services - Load balancers that abstract a logical set of pods and route traffic to one of them;

• Labels - Identifiers that help in selecting homogeneous pods to perform common tasks.
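The way labels, replication controllers and services fit together can be sketched in a few lines. This is a simplified in-memory model, not Kubernetes code: `Pod`, `select`, `reconcile` and `route` are hypothetical names, and the round-robin routing is an assumption chosen for simplicity.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    labels: dict = field(default_factory=dict)


def select(pods, selector):
    """Return the pods whose labels contain every key/value pair in selector."""
    return [p for p in pods
            if all(p.labels.get(k) == v for k, v in selector.items())]


def reconcile(pods, selector, replicas, prefix):
    """Replication-controller idea: add or drop pods until exactly
    `replicas` pods match the selector. Other pods are left untouched."""
    matching = select(pods, selector)
    others = [p for p in pods if p not in matching]
    matching = matching[:replicas]          # too many: keep the first ones
    taken = {p.name for p in pods}          # avoid reusing an existing name
    for i in itertools.count():
        if len(matching) >= replicas:
            break
        name = f"{prefix}-{i}"
        if name not in taken:               # too few: start a new pod
            matching.append(Pod(name, dict(selector)))
    return others + matching


def route(pods, selector):
    """Service idea: hand out the matching pods in round-robin order."""
    return itertools.cycle(select(pods, selector))


pods = [Pod("web-0", {"tier": "frontend"}), Pod("db-0", {"tier": "backend"})]
pods = reconcile(pods, {"tier": "frontend"}, replicas=3, prefix="web")

frontend = route(pods, {"tier": "frontend"})
print([next(frontend).name for _ in range(4)])
```

Here the label `tier: frontend` is the only link between the controller, the service and the pods, which is why the same selector appears in both calls.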

Technically, Kubernetes is not tied to Google’s cloud. It can be wired to work with a variety of infrastructure providers, ranging from bare metal to hypervisors to public cloud. Microsoft has integrated it with Azure VMs, while VMware is working on getting it ready for vSphere and vCloud Air. Red Hat’s OpenShift already works with Kubernetes.

Those familiar with platform as a service (PaaS) will relate to Kubernetes. The metadata required to define the specifications and constraints is declared in JSON. The details of controllers, services and pods are declared in individual files and submitted to Kubernetes via the command line. The master server receives and parses each JSON file to decide the placement of each container. It is possible to define constraints that prevent conflicting containers from being provisioned on the same host. This is useful for master/slave configurations of database servers, for example. Kubernetes brings PaaS-like orchestration and lifecycle management to containers. In the future, we may see multiple open-source PaaS implementations powered by Docker and Kubernetes.
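The placement step can be illustrated with a small sketch. The JSON below uses an invented, simplified schema (the `pods` and `conflicts_with` fields are assumptions, not the real Kubernetes format), and the first-fit `place` function is a stand-in for whatever the master actually does.

```python
import json

# Hypothetical, simplified spec. The field names are invented for this
# sketch and do not follow the real Kubernetes schema.
SPEC = """
{
  "pods": [
    {"name": "db-master", "conflicts_with": ["db-slave"]},
    {"name": "db-slave",  "conflicts_with": ["db-master"]},
    {"name": "web",       "conflicts_with": []}
  ]
}
"""


def conflicts(pod, resident):
    """True if either pod names the other as a conflict."""
    return (resident["name"] in pod["conflicts_with"]
            or pod["name"] in resident["conflicts_with"])


def place(spec_json, hosts):
    """First-fit placement: put each pod on the first host that holds
    no conflicting pod. Returns {host: [pod names]}."""
    placed = {host: [] for host in hosts}
    for pod in json.loads(spec_json)["pods"]:
        for host in hosts:
            if not any(conflicts(pod, r) for r in placed[host]):
                placed[host].append(pod)
                break
        else:
            # No host satisfied the constraints for this pod.
            raise RuntimeError(f"cannot place {pod['name']}")
    return {host: [p["name"] for p in pods] for host, pods in placed.items()}


placement = place(SPEC, ["host-a", "host-b"])
print(placement)
```

With two hosts, the database master and slave end up on different machines, while the unconstrained web pod shares a host with the master.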

Google Container Engine

Announced at the Google Cloud Platform Live event in November 2014, GKE is still in its early days. It acts as an abstraction layer that orchestrates container management on GCE.

GKE provides two mechanisms to deploy and manage containers: a web-based console and command line tools. In the current release, the command line is more powerful than the web console. Although the containers are provisioned on top of VMs running in GCE, there is no need to log into any of the VMs. The command line tools interact with the Kubernetes agent, called kubelet, that runs on each VM.

GKE simplifies Kubernetes by adding a further layer of abstraction. The first release introduced the concepts of clusters and nodes.

A cluster contains master and worker nodes. The master exposes an application programming interface (API) endpoint for creating and controlling compute resources. Both the console and command line tools talk to it to perform tasks. Worker nodes provide the compute and memory resources to applications. They are the workhorses of the deployment. A node typically belongs to only one cluster and is provisioned as part of cluster creation. Together, the nodes represent the cumulative compute power of the underlying VMs. The cluster master schedules work on each node. All nodes in the cluster are based on the same VM instance type.
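Because every node in a cluster shares one instance type, the cluster's total capacity is simply the per-node figure multiplied by the node count. A minimal sketch follows; the `MachineType` and `Cluster` classes and the machine figures are illustrative assumptions, not GKE's API or pricing data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineType:
    name: str
    cpus: int
    memory_gb: float


@dataclass
class Cluster:
    name: str
    machine_type: MachineType   # one instance type for every node
    num_nodes: int

    def capacity(self):
        """Return (total CPUs, total memory in GB) across all worker nodes."""
        return (self.machine_type.cpus * self.num_nodes,
                self.machine_type.memory_gb * self.num_nodes)


# Hypothetical machine type; the figures are for illustration only.
small = MachineType("example-small", cpus=1, memory_gb=3.75)
cluster = Cluster("demo", small, num_nodes=3)

total_cpus, total_memory_gb = cluster.capacity()
print(f"{cluster.name}: {total_cpus} CPUs, {total_memory_gb} GB")
```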

Apart from GKE, developers can also use managed VMs available through Google App Engine. They offer the best of both worlds, PaaS and IaaS, based on Docker containers.

With Docker moving fast to add orchestration and clustering capabilities to its core engine, it will be interesting to see how the project differentiates itself from implementations such as Kubernetes.