Inside Grab’s DevOps practices

A senior software engineer at Grab offers a peek into the DevOps practices at one of Southeast Asia’s technology unicorns

Aaron Tan, Informa TechTarget

Published: 26 Jan 2021 2:31

For the most part, DevOps is about bringing development and operations teams together to speed up application delivery and keep pace with the needs of a business. At Grab, one of Southeast Asia’s technology unicorns, DevOps responsibilities are shared across the whole organisation.

The company’s DevOps teams are primarily focused on enabling engineers to test and deploy their code on their own instead of handing that responsibility off to a different team.

In an interview with Computer Weekly, Allwin Baby, a senior software engineer who is part of a team that manages the DevOps lifecycle at Grab, offers a glimpse into the DevOps practices at the company, how he became a DevOps engineer and the challenges of the job.

What exactly is your job like on a typical day? Is it deskbound or on shifts, who might you be with, where might you be and what might you be doing?

Allwin Baby: I am part of the foundations servers and enablement team, which is one of the teams that focuses on managing the DevOps lifecycle at Grab. There are five of us in total, and we ensure our engineers can work efficiently and publish their work safely.

It is, for the most part, deskbound. My day starts with a stand-up meeting at 10am when we regroup and provide updates on tasks we are working on and share any challenges we are facing so that other team members can lean in and support whenever needed.

After that, we are largely left to work on our own tasks, which include fixing reported bugs, keeping our developer tools up to date, developing and testing new features for our infrastructure, improving availability, proactively fixing potential bottlenecks, conducting post-mortems for past incidents and liaising with suppliers to address issues. We end the day at around 7pm.

We also have a rotational on-call schedule, where the person on call will be on standby to address any customer issues or service disruptions that may occur. A shift lasts a week, so we end up doing about a shift a month.

Was it a conscious decision or a serendipitous event that led you to a career as a DevOps engineer at Grab?

Baby: A little bit of both. Prior to joining Grab, I had very limited experience with DevOps or cloud infrastructure.

Back then, I was working at a startup and split my time between developing server back-ends and doing a poor job trying to translate Sketch designs to web apps. I was working in a very small team, which meant that there was no real need for a dedicated DevOps team and hence fewer opportunities for me to learn about it.

It was during my interview with Grab that I realised just how much I didn’t know. So when the opportunity presented itself, I was happy to take it.

Did you pursue any specific education and personal training regime to give you an edge in this career?

Baby: I graduated with a bachelor’s degree in computer engineering, and while that helped, I don’t feel it is necessary for developing a career in DevOps.

What you do need is interest, and a drive to learn and be better. I used to participate in a lot of hackathons as a student and that helped me improve a lot. You also need a good foundation in algorithms and data structures, familiarity with a programming language such as Go, Ruby, Python or Java, an understanding of system design and practical knowledge of Git and Linux. Whatever I could not learn in university I was able to learn online with some effort.

How is your DevOps team organised? Who are the members and what are their responsibilities?

Baby: At Grab, DevOps responsibilities are shared across the whole organisation. We are primarily focused on enabling engineers to test and deploy their code changes on their own instead of handing that responsibility off to a different team.

When an engineer needs something created or changed, it has to go through several stages from being built to being tested on a staging environment before it goes live. Each of these stages has to be properly instrumented to collect data and detect anomalies. The DevOps team is responsible for providing the tools and systems necessary to enable our engineers to do all of this.

To that end, our team is split into smaller teams focused on a specific part of the DevOps lifecycle:

The build automation team focuses on maintaining the first stage (continuous integration) and is responsible for policing the general quality of code on our mono repository.
The test automation team focuses on the second stage and is responsible for building systems that perform engineer-specified end-to-end tests.
The deployment automation team focuses on providing our engineers with ways to deploy their changes safely.
The observability team focuses on providing tools and software necessary for metrics and logs collection.
The frameworks team (internally called “flip”) is responsible for all the machinery and libraries used by our microservices to do what they need to do, including configuration management and inter-service communication.
The foundations teams are responsible for the general health of our cloud infrastructure and certain services used across the organisation.
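
The staged flow described above, in which a change is built, tested and staged before it goes live, with every stage instrumented, can be sketched roughly as follows. This is a minimal illustration and not Grab’s tooling: the `Pipeline` and `StageResult` classes, the stage names and the fields of the `change` dictionary are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class StageResult:
    name: str       # stage that ran
    ok: bool        # whether the stage's check passed
    seconds: float  # how long the stage took (basic instrumentation)


@dataclass
class Pipeline:
    # Ordered (name, check) pairs; each check returns True on success.
    stages: List[Tuple[str, Callable[[dict], bool]]] = field(default_factory=list)

    def add(self, name: str, check: Callable[[dict], bool]) -> "Pipeline":
        self.stages.append((name, check))
        return self

    def run(self, change: dict) -> List[StageResult]:
        results = []
        for name, check in self.stages:
            start = time.perf_counter()
            try:
                ok = bool(check(change))
            except Exception:
                ok = False  # a crashing check counts as a failed stage
            results.append(StageResult(name, ok, time.perf_counter() - start))
            if not ok:
                break  # later stages never see a change that failed earlier
        return results


# Hypothetical stages and change record, for illustration only.
pipeline = (
    Pipeline()
    .add("build", lambda c: c["compiles"])
    .add("end_to_end_tests", lambda c: c["tests_pass"])
    .add("staging", lambda c: True)
    .add("production", lambda c: True)
)

change = {"compiles": True, "tests_pass": False}
results = pipeline.run(change)
for r in results:
    print(r.name, "passed" if r.ok else "failed")
```

In this example the run stops at the end-to-end test stage, so the change never reaches staging or production, and each stage leaves behind a timing record that a monitoring system could collect.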

Are there any roles that are not usually seen as DevOps roles but are instrumental to the success of DevOps teams?

Baby: Yes, definitely. DevOps is almost an independent service provider within the company given the size of our operations, with our customers being the thousands of engineers working in various roles. We run campaigns and communicate changes and new features to the engineering organisation to encourage engineers to experiment with and adopt them.

As such, our broader team includes members who audit our processes and operations to make sure we have adequate documentation for all our services, and who create training materials for engineers interested in learning something new.

Most of the tools we build come with user interfaces, so we have a few front-end engineers and user experience designers working in our organisation even though this is not very common.

What are the skills required of a DevOps engineer? Could you elaborate in terms of platform familiarity, programming/scripting languages, configuration, provisioning and deployment, security, integration and communication?

Baby: I’d argue you’d need to have most of these listed skills to some degree. Familiarity with your chosen cloud provider and supplier tools such as GitLab, and the ability to write scripts, are mandatory for completing most tasks.

Programming skills are also important when you’re designing new internal tools to make a process or workflow easier for your engineers. Other skills such as deployment or configuration management are obviously crucial if you belong to the team responsible for managing that for the whole company.

And while we work in different teams, each focusing on a different part of the DevOps lifecycle, those parts are all tightly coupled, so we must constantly communicate and sync with all the other teams to make sure that a pending change does not cause disruptions further down the chain. We often conduct operational excellence meetings and knowledge sharing sessions to keep each other on the same page.

So far, what has been the biggest challenge you have ever faced in your job?

Baby: My biggest challenge was adapting to the sheer scale at which Grab Engineering operates, which was obviously very different from my previous experience in a startup. The engineering challenges faced here are very different.

For example, soon after I joined, one of the biggest challenges we faced was that our Git remote repository was silently dropping commits every now and then. An engineer would develop some feature, get it reviewed, then merge that to the trunk, and a couple of hours later there would be no evidence of that change ever being merged.

After a long and arduous investigation, we discovered the root cause of the problem to be a tiny bug in the Linux kernel. Our many engineers were updating the remote so frequently that it inadvertently lost track of certain commits when updating the branch. But we do get to learn from these issues, which helps us deal with similar problems better.
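
The symptom described above, a merged change that later leaves no trace on the trunk, amounts to a commit no longer being reachable from the branch head. The sketch below shows one way such a loss could be detected on a toy commit graph. It is not how Grab investigated the problem, which the interview does not describe: the `reachable` and `missing_merges` helpers and the commit names are hypothetical.

```python
from typing import Dict, List, Set


def reachable(parents: Dict[str, List[str]], head: str) -> Set[str]:
    """Return every commit reachable from head by following parent links."""
    seen: Set[str] = set()
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def missing_merges(parents: Dict[str, List[str]], head: str,
                   merged_log: List[str]) -> List[str]:
    """List commits recorded as merged that the branch head no longer contains."""
    live = reachable(parents, head)
    return [c for c in merged_log if c not in live]


# Toy history: "c" and "d" were both merged on top of "b", but the branch
# now points at "d", whose only parent is "b", so "c" has been dropped.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"]}
lost = missing_merges(parents, head="d", merged_log=["b", "c", "d"])
print(lost)
```

Here the audit reports `c` as the only commit that was recorded as merged but can no longer be reached from the head of the branch.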
