Five lessons learned from a year in the cloud

Startup online estate agency Purplebricks was built on the cloud. Its technical director shares the company's experiences and lessons learned

A great variety of startups are benefiting from the scalability of cloud computing, growing their business and customer offerings as a result. Online estate agency Purplebricks, launched in April 2014, is one of them. Our verdict? While cloud poses certain challenges at first, it is well worth the investment in the long run.

Here are five lessons for startups in the early stages or considering cloud technology for the first time, based on our experiences.

Message queuing - essential but fiddly

A message typically represents a task created by someone – the “producer” – that has to be processed by someone else – the “consumer”. Each message has a body and some attributes. The main architectural benefit is loose coupling: the producer and the consumer need to know only about the message, not about each other.

A message queueing service aims to remove the traditional overhead associated with operating in-house messaging infrastructures. As well as reducing cost, queues in the cloud simplify access to messaging resources and facilitate integration efforts within and between organisations.

Queues leverage cloud computing resources such as storage, network, memory and processing capacity. By using virtually unlimited cloud resources, message queueing services provide an internet-scale messaging platform.

At the start of a project, it’s difficult to predict what its future needs will be. By introducing a layer in between processes, message queues create an implicit, data-based interface that both processes implement. This allows you to extend and modify these processes independently, simply by ensuring they adhere to the same interface requirements.
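As a rough illustration of that data-based interface, here is a minimal Python sketch. It uses the standard library's in-process `queue.Queue` as a stand-in for a cloud queueing service, and the `Message` class, the `valuation_request` type and the addresses are all hypothetical names invented for the example, not a description of any real system.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Message:
    """A task description: a body plus free-form attributes."""
    body: str
    attributes: dict = field(default_factory=dict)


def produce(q, addresses):
    """Producer: describes the work, without knowing who will do it."""
    for address in addresses:
        q.put(Message(body=address, attributes={"type": "valuation_request"}))


def consume(q):
    """Consumer: relies only on the message format, not on the producer."""
    handled = []
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            break  # nothing left to process
        if msg.attributes.get("type") == "valuation_request":
            handled.append(f"valuation booked for {msg.body}")
        q.task_done()
    return handled


# In-process stand-in for a cloud queue (assumption for this sketch).
task_queue = queue.Queue()
produce(task_queue, ["1 High Street", "2 Mill Lane"])
results = consume(task_queue)
print(results)
```

Either function could be rewritten, or moved to another machine behind a hosted queue, without touching the other, as long as both keep to the same message shape.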

Optimise to reduce fees

Optimising is key if you want quality performance. Microsoft Azure, or any cloud service, is built to penalise you if you use its resources poorly. The challenge is to fix this before the invoice for an unenlightened design decision arrives.

Saving money and application performance go hand in hand. When we launched in April 2014 and our first TV adverts hit the screens later that year, we made a conscious effort to scale up our servers.

This meant our highest monthly bill to date was for our first month, despite it having the lowest traffic. Through experience, we have found the key is to set alerts and auto-scale correctly to ensure the cloud powers up extra resources only when they are needed.
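The kind of rule an auto-scale setting encodes can be sketched in a few lines of Python. This is an illustration only: the function name, the CPU thresholds and the instance limits below are assumptions chosen for the example, and real cloud platforms expose such rules through their own configuration rather than through code like this.

```python
def desired_instances(current, avg_cpu_percent,
                      scale_out_above=70.0, scale_in_below=30.0,
                      minimum=1, maximum=10):
    """Return the instance count a simple threshold rule would ask for.

    All thresholds and limits are hypothetical defaults for this sketch.
    """
    if avg_cpu_percent > scale_out_above:
        target = current + 1   # busy: add one instance
    elif avg_cpu_percent < scale_in_below:
        target = current - 1   # quiet: stop paying for one instance
    else:
        target = current       # within the comfortable band: no change
    # Never drop below the floor or exceed the budgeted ceiling.
    return max(minimum, min(maximum, target))


print(desired_instances(current=2, avg_cpu_percent=85.0))
print(desired_instances(current=2, avg_cpu_percent=10.0))
```

The ceiling matters as much as the scale-out rule: it is what stops a misconfigured rule, or an unexpected spike, from turning into an unexpected invoice.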

Architecting the cloud is a learning curve

The cloud stores information in different places and the rules are different for each. Expect a learning curve and learn from your mistakes.

When unexpected things happen in the cloud, it is worth taking the time to dig deep into what went wrong and determine a course of action to help mitigate the issue in the future. After significant outages, suppliers such as Microsoft Azure and Amazon (for EC2 and S3) publish a root cause analysis. It is important to read these and become familiar with them.

We constantly ask ourselves whether this is something that can happen to our code. In the same way, if your systems experience any downtime as a result of cloud outages, share your root cause analysis with your customers, especially on how you plan to prevent it from happening again.

It is important to acknowledge the cloud is a constantly moving platform and features are released weekly. Keep up to date with these and create a learning and sharing culture in the development team.

If it wasn’t logged, it never happened

We need to be careful about what we log, but we should log everything that can help us figure out what went wrong.

When working with the cloud it’s normal to experience failure every so often. Never build an application without thinking about how you will recover from a fault and how long it will take.

Sometimes, when you start on a project, you only have time to plan for the straightforward cases. This means we can learn a lot from a new application with real users. To facilitate this learning process and support it properly, you need logging in place.
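A minimal sketch of what "logging in place" can look like, using Python's standard `logging` module. The `book_viewing` function, the logger name and the property IDs are hypothetical, and the in-memory buffer simply stands in for a log file or logging service.

```python
import io
import logging

# In-memory buffer standing in for a file or log service (assumption).
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)

logger = logging.getLogger("booking")  # hypothetical component name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example's output in the buffer only


def book_viewing(property_id, slot):
    """Hypothetical operation that logs its input, outcome and any failure."""
    logger.info("booking requested property=%s slot=%s", property_id, slot)
    try:
        if not slot:
            raise ValueError("no slot supplied")
    except ValueError:
        # logger.exception records the message plus the full traceback.
        logger.exception("booking failed property=%s", property_id)
        return False
    logger.info("booking confirmed property=%s slot=%s", property_id, slot)
    return True


ok = book_viewing("P-100", "2015-06-01T10:00")
failed = book_viewing("P-101", "")
print(log_buffer.getvalue())
```

The point is less the library than the habit: record enough context (which property, which slot, which error) that an unplanned case seen once by a real user can be reconstructed afterwards.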

Prepare for failure

Cloud providers want us to believe the cloud never fails. The reality is different. It has been proved several times that even well-managed clouds will fail. The problem is not that they fail, but that most people are unprepared for such failures because they believe the cloud is an indestructible silver bullet.

Cloud providers do not explicitly plan for the failover of your services – they just provide the platform and the tools and it’s your job to plan and implement your own failover system.

Cloud services are known for their accessibility, but they are still bound to Murphy’s law: “Anything that can go wrong, will go wrong”. Amazon Web Services (AWS), Microsoft Azure and Google Mail, among others, have all failed in the past and most of them will fail again in the future.

An important step for us was to lessen the reliance on third-party integrations. Our working assumption is that any third-party system will fail and – if handled badly – a third-party slowdown could quickly escalate into a slowdown of our own systems.

To mitigate this, the vast majority of our services run through an out-of-band message bus. Messages are sent to the bus in a “fire and forget” fashion, and sending one takes an average of 2ms, regardless of the state of the third party. This mechanism allows us to handle requests in a fashion that does not affect the user’s experience.

All of our emails are handled by a specialist email provider, which provides an application programming interface (API) that we use to send emails. This service has proved to be highly reliable, but if it has performance issues or the service becomes unavailable, the user experience is not affected because failed messages are stored and placed in the queue to be re-sent later. 

This mechanism allows us to handle a complete outage from a range of providers using the same principle, without having to worry about our users being affected. Once a provider resumes service, we simply pick up the previously failed messages.
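The store-and-resend principle described above can be sketched as follows. This is a simplified, single-process illustration under stated assumptions: `EmailOutbox` and `FlakyProvider` are hypothetical classes, a `ConnectionError` stands in for any provider failure, and an in-memory deque stands in for a durable message bus.

```python
from collections import deque


class EmailOutbox:
    """Hypothetical outbox: accept messages instantly, deliver them later."""

    def __init__(self, send_func):
        self._send = send_func   # call into the third-party provider
        self._pending = deque()  # stand-in for a durable queue

    def submit(self, message):
        """Fire and forget: the caller never waits on the provider."""
        self._pending.append(message)

    def drain(self):
        """Try each pending message once; keep failures for a later pass."""
        sent = 0
        for _ in range(len(self._pending)):
            message = self._pending.popleft()
            try:
                self._send(message)
                sent += 1
            except ConnectionError:
                self._pending.append(message)  # re-queue for the next pass
        return sent

    def pending_count(self):
        return len(self._pending)


class FlakyProvider:
    """Hypothetical provider that can be switched between down and up."""

    def __init__(self):
        self.available = False
        self.delivered = []

    def send(self, message):
        if not self.available:
            raise ConnectionError("provider unavailable")
        self.delivered.append(message)


provider = FlakyProvider()
outbox = EmailOutbox(provider.send)
outbox.submit("welcome email")
outbox.submit("viewing confirmation")

first_pass = outbox.drain()   # provider down: nothing sent, nothing lost
provider.available = True
second_pass = outbox.drain()  # provider back: the backlog is delivered
print(first_pass, second_pass, provider.delivered)
```

A production version would need persistent storage, a retry limit and a delay between passes, none of which are shown here.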

David Kavanagh is technical director at Purplebricks.com, the online estate agency.
