How Agoda is shaping up to be a technology powerhouse

Agoda’s CTO Idan Zalzberg explains why the online travel agency with a massive technology footprint prefers to run things in-house and not rely too much on public cloud services

With a massive technology footprint, including half a million CPU cores that process some two trillion Kafka messages a day, Agoda has the scale and expertise to run its IT operations efficiently and at a lower cost than many of its peers.

That’s why the online travel agency, which caters primarily to consumers in Asia-Pacific, runs its IT infrastructure almost entirely on its private cloud. The company, an emerging technology powerhouse, operates four datacentres across the region and uses a lot of open-source software to build its services.

In an interview with Computer Weekly, Idan Zalzberg, chief technology officer (CTO) at Agoda, talks up the company’s technology strategy, such as building its own expertise to avoid vendor lock-in. He also shares his thoughts on the challenges of public cloud adoption and what Agoda is doing to manage the complexities of working with a large partner ecosystem.

Aaron Tan: Tell us more about Agoda’s overall technology strategy.

Idan Zalzberg: I think what drives our technology strategy is not only our business needs, but also the culture and values we want to drive within the company. First, we’d like to have a culture where teams can work independently, be as self-service as possible and not have to wait for somebody to approve anything before they move forward. And they have to move fast in small increments and learn.

They also must be able to experiment. One of the things we’ve built is an extensive experimentation platform that runs, at any point in time, around 1,000 experiments to improve our user experience. A user would be subjected to about 72 experiments on average, which means no two users would have the same experience on Agoda.
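As an illustration only: the interview does not describe how Agoda’s platform assigns users, but a common technique for running many concurrent experiments is deterministic hash-based bucketing. The sketch below assumes hypothetical function names, a two-variant split and a 7.2% exposure per experiment, a figure chosen so that 1,000 experiments yield roughly 72 per user.

```python
import hashlib
from typing import Dict, Iterable, Optional


def bucket(user_id: str, experiment: str) -> float:
    """Map (user, experiment) to a stable pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    # Use 48 bits so the result is exactly representable and strictly below 1.
    return int.from_bytes(digest[:6], "big") / 2**48


def assign(user_id: str, experiment: str, exposure: float) -> Optional[str]:
    """Return 'A' or 'B' if the user is enrolled in the experiment, else None.

    `exposure` is the (assumed) fraction of users enrolled in each experiment.
    """
    x = bucket(user_id, experiment)
    if x >= exposure:
        return None  # user is not part of this experiment
    return "A" if x < exposure / 2 else "B"


def enrolled_experiments(
    user_id: str, experiments: Iterable[str], exposure: float
) -> Dict[str, str]:
    """Collect the variant for every experiment the user is enrolled in."""
    result = {}
    for name in experiments:
        variant = assign(user_id, name, exposure)
        if variant is not None:
            result[name] = variant
    return result


if __name__ == "__main__":
    names = [f"exp-{i}" for i in range(1000)]  # hypothetical experiment names
    mine = enrolled_experiments("user-42", names, exposure=0.072)
    print(len(mine), "experiments for user-42")
```

Because the assignment depends only on the user and experiment identifiers, a returning user sees the same variants without any stored state. With 72 two-way experiments there are 2^72 (about 4.7 × 10^21) possible variant combinations, which is consistent with the point that no two users are likely to share the same experience.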

When you take all those things together, that’s what makes us special. We want to be self-service, and still be able to do it efficiently at scale while maintaining quality. So, there are a lot of things we want to do at the same time, which is not easy for us as a tech organisation.

That leads us to our strategy of running things on our own and not relying on the public cloud. We’re almost completely on-premise. We have four datacentres, and we use a lot of open-source software. We build a lot of stuff ourselves and we see the value of doing that. We also try very hard to avoid vendor lock-in.

One of the things that’s important to us is to keep people moving fast for the long term and to keep our technology stack to the minimum. If we let people use whatever database they want on the cloud, not only would it be difficult to offboard later, but we could also have a shallow organisation in terms of knowledge. For us, it’s important that we have deep knowledge and become experts in different technologies to achieve efficiency at scale.

Sometimes, the cloud makes it easy for people to choose the services they want, but they lose the ability to become experts. I have no problem with paying for a good service, but what I don’t like is people paying out of ignorance. They may not even know how a service works, but they just use it because they can click on a button, and it just works. So, it’s very important from a strategic perspective that we don’t use something that we don’t understand.

Now, there are some services that are awesome in the cloud. We use some SaaS [software-as-a-service] services as well, but in some cases, a service feels almost like an open-source project, and the vendor charges an insane amount of money for it because they manage it for you as a service.

While it’s not common to see companies running off the cloud today, every time we test the use of cloud, it always comes out far more expensive when I look at the cost per core and account for the cost of land and cooling infrastructure for our datacentres.

That’s why we took the private cloud approach where we run everything on Kubernetes with our platform on top of that. We want to be in a position where we don’t need to feel like we’re stuck on public cloud or on-premise. We want to have a flexible platform that can run anywhere, whether it’s some cloud in some location because of connectivity, or requirements for us to have datacentres in some countries. But if it’s more cost effective for us to manage our machines as it is today, then we will do that.

Tan: I’ve spoken to companies that are also running a fair bit of on-premise infrastructure. From a cost perspective, would it make more sense for certain types of workloads to run on the public cloud than on-premise in terms of total cost of ownership (TCO)?

Zalzberg: I agree with you that we should look at TCO, but we must come from a position of knowledge. When someone comes to me and says we can go for the cloud, we can build things ourselves or we can buy proprietary software, it’s all on the table. I’m not dogmatic, but do the research. Don’t come to me because the vendor has a nice homepage or it’s so easy to log in. Maybe it’s great that it’s easy to onboard, but let’s look at the whole picture. What does it mean? What’s the cost structure? What would be the equivalent of doing it ourselves? And what happens if we don’t want to use it anymore?

That’s where a lot of the hardship with public cloud adoption comes in. With many public cloud services, you’re often working with startups, and sometimes they disappear when they get acquired, leaving us to find another solution and figure out how to offboard. If we are in control, then we can make sure that on the first day we onboard, we already know how to offboard if we need to. And that’s only possible when you’re coming from a position of knowledge.

A lot of vendors like to say that it’s too hard and you can’t do it yourself. But honestly, it’s a double-edged sword because once you start relying too much on public cloud, how can you even get talented people to be interested in what you’re doing?

Today, you can talk to anyone at Agoda about the internals of Kubernetes, Spark and Kafka, even down to the CPU level. We celebrate technical know-how and expertise, and that’s important. I wouldn’t say it’s the most important thing because the business case should also make sense. But by itself, it’s what makes our employees happy and why they enjoy working for us.

That said, it’s important that we keep validating our TCO. We’ve proven again and again that we are way more cost efficient than current public cloud offerings in running Kubernetes containers. Even if you look at data engines, like the ones from the major vendors that cost more, we’ve shown that we can achieve the same performance in our own datacentres. Every few months, we run tests and if the cloud becomes cheaper, I’m fine with that. But I want to make sure I am down to earth in that sense because it’s hard not to be like everyone else. The philosophical side of being a company with knowledge and expertise is a bonus, but it cannot be the sole driver of our decisions.

The other side of it is the fear. Obviously, what will drive us to the public cloud today is not cost, but how long we can maintain our expertise. We need to be experts in hardware, Kubernetes, big data and Kafka, on top of other things like React and mobile development. It’s a tall order and in the worst-case scenario that I’m unable to maintain our expertise in the long term, I will always have this big lever that I can use to jump to the public cloud with my containers, even if it’s less cost-effective to do so.

Tan: So how does Agoda sustain its talent pipeline?

Zalzberg: It’s important for us to bring in our own talent. We hardly use consultants and when we use them in the short term, we will learn things on our own, again because we want to avoid vendor lock-in. I have no problem with consultants who can do a great job, but you are locked into their services. You have to pay whatever they want and every time they raise prices, it’s your problem.

To attract talent, you need to know what they want. Generally, they want to work with smart people, learn and be challenged. So, there is a positive feedback loop when you start to undertake big projects. You don’t say, this is too much for us – you go for it. Take the experimentation platform we talked about: the ones from the big vendors don’t do it very well, so we built our own from scratch.

You might wonder why a travel company would want to do that. While everything we do has a business case, not being afraid to take on big projects is what inspires people to work with us. I’ve had so many people in the last couple of years who left for companies like Meta, Grab and Google but came back because of our culture of getting stuff done.

Being in this region helps, too. If we were in Silicon Valley, which has a concentration of talent, we would be competing with other companies that are also doing great technical stuff. Here, I don’t know many companies that would be competing with us for talent.

Today, our talent comes from across the region, including Thailand, Indonesia, Vietnam, Korea and India. We do a lot of code competitions, and we work with universities to get hundreds of interns every year – that’s a great way to connect with tech talent.

We also coped well during the pandemic. We didn’t have a spike in hiring that we had to undo with layoffs – and people appreciate that consistency. If you look at work-from-home arrangements, many companies go back and forth between working from home the entire time and returning to the office. We try to be consistent and not just jump on the bandwagon. If people cannot trust decisions made by the company, then their stress level goes up and they won’t feel safe in their jobs.

Tan: Could you give me a sense of how your teams are organised?

Zalzberg: We’re organised around different functions that follow the flow of the user. For example, in acquisition marketing, we take the user through the marketing funnel before they make their bookings. Then, you have post-booking, customer support and finance – and each of them is a tech organisation. We also have our platform and data teams.

While it seems like we are very product oriented in the way we organise ourselves, it’s also very tech oriented. If you look at the marketing funnel, for example, we move very quickly and run a lot of experiments because we’re trying to improve conversion when customers search for hotels, flights and other products. So, we need to work with databases to get the results quickly and optimise performance.

Now, compare that to a person working on our booking site. For them, it’s all about being accurate, transactional and making sure nothing gets lost. Everything has to happen exactly as planned, so it’s not so much about moving fast. And obviously, it’s not the same load as the search process.

The marketing side, on the other hand, is very data oriented. There’s a lot of data science involved, and you have to do automated bidding and email generation, but it’s offline work for the most part. So, each of our teams takes ownership of a function, understands its product areas and shares similar expertise to solve problems in that function.

Tan: Agoda works with a lot of partners such as hotels and airlines and some of them might be in various stages of technology maturity. What sorts of challenges are you facing in that regard?

Zalzberg: That’s a great point and so many people can miss the fact that when you run a business like ours at scale that works with thousands of partners, there’s a lot of complexity involved. There are no standard APIs [application programming interfaces], and some partners have a limited number of API calls they can take. We make tens of billions of API calls a day and sometimes they can only handle a fraction of a fraction of that.

To overcome that, we use data models to predict how long a room rate is valid before we make the call to get a new rate again. We also look at the chance of a customer being interested in a certain property or rate and get those to the customer earlier. Some suppliers have very good deals, but they may not have a certain API response time. Our mission is to get the best deals for our customers, and we want to make sure we can find them.
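The idea of predicting how long a rate stays valid before calling the partner again can be pictured as a cache whose expiry time comes from a model rather than a fixed setting. The sketch below is a minimal illustration under stated assumptions, not Agoda’s system: `RateCache`, `fetch_rate` (standing in for a partner API call) and `predict_validity_seconds` (standing in for the data model) are all hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedRate:
    price: float
    expires_at: float  # clock value after which the rate is considered stale


class RateCache:
    """Cache partner rates, refreshing only when the predicted validity lapses."""

    def __init__(
        self,
        fetch_rate: Callable[[str], float],  # hypothetical partner API call
        predict_validity_seconds: Callable[[str, float], float],  # hypothetical model
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_rate = fetch_rate
        self._predict = predict_validity_seconds
        self._clock = clock
        self._entries: Dict[str, CachedRate] = {}

    def get(self, room_id: str) -> float:
        now = self._clock()
        entry = self._entries.get(room_id)
        if entry is not None and now < entry.expires_at:
            return entry.price  # still predicted valid: no partner call needed
        # Rate is missing or predicted stale: spend one API call on a refresh.
        price = self._fetch_rate(room_id)
        ttl = self._predict(room_id, price)
        self._entries[room_id] = CachedRate(price, now + ttl)
        return price


if __name__ == "__main__":
    # Toy stand-ins: a constant price and a flat 60-second validity prediction.
    cache = RateCache(lambda room_id: 120.0, lambda room_id, price: 60.0)
    print(cache.get("room-1"))  # first call fetches from the "partner"
    print(cache.get("room-1"))  # second call is served from the cache
```

The design choice here is that the better the validity prediction, the fewer calls are spent on rates that have not changed, which matters when a partner can accept only a small share of the traffic.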

Tan: You talked about leveraging open-source software like Kubernetes and Kafka. Are you working on projects that you are planning to release to the open-source community?

Zalzberg: One of the differences between us and a Silicon Valley company is communication. For us, getting a blog post out involves more friction than you would think, partially because of the language barrier. It’s harder for my team to write the articles and clear documentation that are needed before we can proudly release a project to the open-source community.

We have a few projects that are publicly available on GitHub, but they make up a small subset of what we build. One of the things I’m trying to push for is a log database like what Grafana Loki does, but completely built from scratch. We have 100 terabytes of logs a day, so once we are confident about using this technology in production, we’ll be happy to release it.

Tan: For the open-source software that you use, do you use any enterprise-grade support from vendors like Red Hat?

Zalzberg: Not a lot. We tried it a few times and we were often disappointed, maybe because we don’t get the best people allocated to us in the region. When we tried to get support, it was not fast enough, and they didn’t seem to have the passion to solve the problem.

For example, they could say something is a known issue, but just telling us that doesn’t solve the problem. Maybe it isn’t an issue but give us the workaround. Very few companies have the mentality of trying to solve a problem, not just to close the support ticket. That’s been my experience. I’m sure there are great companies out there, but for us the experience has been that it’s always better to build our own expertise.

Tan: Besides the infrastructure pieces, what sorts of capabilities are you looking to build for the Agoda platform?

Zalzberg: I talked a lot about the internals and that’s the stuff I’m very passionate about. For sure, we are building exciting things this year. First, we’re evolving our travel platform into a travel super app. As you might have noticed, we’ve added offerings for attractions and activities recently. We’ve had flight offerings for some time, but those are now being taken up like crazy. We’ll probably have a car transport offering later this year.

So, we’re building these pillars of travel on top of our accommodation business, but the key is to connect them. We already have this idea of a shopping cart that people recognise from e-commerce. The idea is for you to add things to your cart to build a package and get better deals.

We’re also excited about B2B [business-to-business] opportunities where we work with organisations like banks and airlines to power their travel websites, and that’s a testament to our technology. In fintech, we are also looking at the ability to take a booking that’s not refundable or cancellable and make it cancellable, basically putting the risk on us.

That’s where all the data science and understanding of the market comes in. The hotel may not allow me to cancel a booking, but I will cancel it for you and try to sell it to someone else for a smaller fee. Or, if you’ve made a booking and the price dropped, I could make that booking again for you and return the difference to you.
