DevOps done right: Why work-life balance matters to digital transformation success

DevOps practitioners warn enterprises off neglecting the health and well-being of the IT staff responsible for delivering their digital transformation projects

This article can also be found in the Premium Editorial Download: Computer Weekly: Making the UK fit for 5G

As enterprises in every industry grapple with digital transformation, and fixate on meeting user demands for always-on services, IT departments find themselves under growing pressure to perform and deliver.

It is not difficult to see why, against this backdrop, enterprises are increasingly looking to DevOps as a means of speeding up their software development and testing cycles, as well as their responsiveness to ever-changing customer demands and expectations.

The long-term success of any DevOps transformation, though, requires a company culture that is collaborative, experimental, and – crucially – empathetic to the needs of the IT workers responsible for delivering it.

In the drive to ensure user expectations are met, companies can fall into the trap of overlooking the importance of empathy and – in turn – the work-life balance needs of their IT department.

Rob Elkin, chief technology officer of Busuu, a global social networking site for people learning new languages, says the “always-on” nature of digital services puts pressure on IT departments to have someone on-call at all times of the day and night in case something goes wrong.

“We’re in a 24-hour business now where our websites, applications and services are always on and available and we, as technology professionals, need to be always-on as well,” says Elkin.

“People expect it from every part of the business, but especially from the IT department because technology is becoming just a massive focus for so many businesses.”

Oliver Wood, an Amazon Web Services (AWS) architect at managed services provider Solarwinds, backs this view. “You see a lot of products and services supported through goodwill and there is no on-call rota in place. So, when something goes wrong, everyone piles in to help, and that means you’re always kind of on duty,” he says.

And how on-call work gets organised, along with senior management’s attitude to those carrying it out, is a good barometer of how empathetic an enterprise is.

For instance, is responsibility for responding to out-of-hours downtime alerts shared equally amongst the team, and are people compensated when their personal lives and sleep patterns are disrupted by out-of-hours work?

“If I’m up until 2am trying to fix something, I’m not going to be very effective the next day and we need to recognise that because it can seep into other parts of the company culture,” says Elkin.

“Expecting people to work crazy hours and be in the office the next day will contribute to them getting burned out really, really quickly and that becomes a really bad culture for a technology team to operate under.”

For that reason, it is important to have systems and processes in place that allow stressed-out employees to raise a red flag when they run into trouble, and make senior management aware of any emerging problems. This matters all the more because people are often reluctant to seek help for workplace stress or feelings of burnout, says Elkin.

“People have a tendency to just deal with it. They will moan about it a little bit, but it can soon start to affect family life, their social life and every other part of it,” he says.

Hannah Foxwell, product manager at software as a service infrastructure monitoring company Server Density, says – based on her experience working as an IT programme manager at a large UK supermarket chain – it is all too easy for burnout to creep up on people.

“I was in a delivery-focused role, so I was incentivised to hit deadlines and you put a lot of pressure on yourself and the team around you to hit those milestones, and working very hard,” she recalls.

“The morning meetings would creep earlier and earlier to fit everything in, starting with an 8am update call, and then at 7pm you would find yourself having to decide whether or not to order takeaway for the team because everyone is still in the office working,” she says.

“Then, when you do leave the office, it’s like – right, down the pub, because you need to decompress as quickly as possible. I don’t think I burned myself completely out, but it was not a healthy time for me either,” she adds.

Ask for help

Employers also need to recognise that there will be people in the organisation who may not feel confident or comfortable opening up about this to senior management, says Jon Cowie, web operations manager at online marketplace Etsy.

“Many of your employees may have spent years in the technology industry being penalised for speaking out and causing a fuss,” he says.

“Don’t assume your employees will necessarily come to you if there is a problem affecting their lives, but be switched on to the fact there might be and pay attention.”

Employers that ignore the warning signs or fail to take appropriate action when a member of staff does speak up can leave themselves open to legal action, warns Emma O’Leary, employment law consultant at business support consultancy Elas.

“If you tell your employer the working conditions are impacting on your health, they have a duty to act reasonably and a duty of care to protect your health and safety,” she says.

“If your employer is refusing to do this, or dismisses you or treats you differently as a result, your only recourse would be the tribunal.”

Humanising operations

The importance of empathy in the context of IT has emerged as a major talking point over the past 12 months, giving rise to conference talks, standalone events and online discussions about how the push for continuous delivery can affect the health and well-being of DevOps practitioners.

Server Density has played an active role in getting the DevOps community talking about this issue, through a series of events in the UK under the HumanOps banner.

This, in turn, has given rise to community-led events in San Francisco and mainland Europe, where DevOps practitioners share best practice and guidance for creating empathy-laden working cultures within their teams, and the wider benefits this has brought to the organisations they work for.

Server Density CEO and founder David Mytton tells Computer Weekly his organisation’s interest in “HumanOps” started back in early 2016 with some realisations about the impact its system monitoring products could have on the work-life balance of the IT departments using them.

“The product is designed to wake people up [when an incident occurs], and we started to look internally at how our team deals with problems in our own infrastructure, and the number of times our team has to deal with incidents out of hours, and thought other people are probably looking at this too,” he says.

From speaking to industry contacts and Server Density’s own client base, Mytton says the company quickly came to realise very few enterprises are keeping tabs on how out-of-hours system alerts, unsociable working hours and punishing on-call rotas affect the work-life balance of IT teams.

Etsy is an oft-quoted example of a company that does, after introducing an opt-in programme whereby the sleeping patterns of its developers are monitored using wearable fitness trackers. The data from the devices is then used to inform managers about how to organise their staff’s on-call schedules.

Automation software supplier Chef also devoted an entire keynote, during its annual user conference in July 2016, to the importance of creating supportive and empathetic working environments for developers and operations staff.

According to Mytton, the big tech firms and the startup community tend to have a better track record than most enterprises when it comes to ensuring the work-life balance needs of their IT staff are met.

“The massive technology companies, such as Etsy and Google, are doing this well because this way of working is already embedded into their culture, and then there are the startups who are influenced by that and want to work the same way,” he says.

“Then there is a massive gap in the middle made up of companies that aren’t cool startups or big tech firms that aren’t doing anything, and they are in the majority.”

Digital transformation dangers

Many of the organisations in this “majority” are likely to be either in the throes of (or on the cusp of) digital transformation initiatives, and safeguarding the health and well-being of their IT staff should be a high priority, argues Busuu’s Elkin.

After all, if IT staff start to experience burnout, the risk of absenteeism through illness rises. It might also push some people to reconsider their future with the company altogether.

This, in turn, could have a dampening effect on the company’s digital transformation ambitions, says Elkin. “It is incredibly dangerous because people do not tend to speak up [about how unhappy they are] until it’s too late, which is really dangerous for an organisation because they start losing people.”

And it can prove difficult to replace them, particularly if the organisation has a reputation for not looking after its staff properly, says Wood.

“I get people emailing me asking what it is like to work for company x, for example, and you can only be honest because you can’t say it was great if it wasn’t because you are always going to be associated with that,” he says.

“Equally, when I look at prospective employers, I get in touch with friends that have worked there because there are relatively few tech firms in the UK where you don’t know somebody who knows somebody who works there.”

It can also cost firms dearly, in terms of productivity and effectiveness, if the people tasked with delivering a company’s digital transformation are suffering from tiredness and exhaustion.

“Staff retention is one thing, because there is a cost involved with replacing someone, but there is also a cost associated with having someone sat at their desk who is not being effective because they’ve had no sleep,” Elkin says.

“That person is likely to create more bugs, more problems and create more frustration in a team that is likely to be tired and frustrated already, which is all going to add up to negative sentiment in your team, as well as economic problems.”

Economically speaking

When getting this point across to senior management, it is worth pointing out that – in other fields of work – there are measures put in place to prevent people turning up to work fatigued for precisely this reason.

“In specialised areas, such as medicine, there is a lot of evidence about the effect unsympathetic shift patterns and lack of sleep have on performance, accuracy and the emotional state of people,” says Dr. Paul McLaren, medical director and adult psychiatrist at mental healthcare provider The Priory Group.

“If complex tasks need to be done, you want people to be sharp and not cognitively impaired, regardless of whether they are writing code, putting carburettors in Ford cars or operating on patients.”

Setting it out in these terms should help IT departments make the case for addressing how their on-call rota is organised, and ensure adequate compensation – such as time off in lieu – is in place for staff who find their free time disrupted by outages, says Mytton.

Therefore, IT leaders who want to initiate a workplace culture change need to focus on selling the business benefits to senior management if they want them to approve any proposed working practice changes.

“If you just focus the discussion on the fact that people are being woken and losing sleep, the response is likely to be, ‘it is on-call work, what do you expect?’” he says.

“But, if you emphasise the economic impact of that – because it is likely to result in people making mistakes, and not being as effective as they should be because they’re tired – you stand a better chance of getting their attention.”

In DevOps-adopting organisations, though, it’s not just operations staff and engineers who can find themselves roped into on-call work: developers are often expected to take greater ownership of the code they produce by retaining responsibility for it once it enters production.

For this reason, Mytton says it is likely that enterprise IT leaders will start to find themselves under increased pressure from within their own teams to push through change, as more people get added to the on-call rota.

And the changes that need to happen to ensure a fairer deal for on-call teams need not be particularly complex or onerous, says Mytton.

“One thing we do is, if you’re woken up or you deal with an out-of-hours call incident, you are off-call for 24 hours to give people a chance to recover.”

“Because the worst thing is when you’re woken up in the middle of the night, deal with an issue, and then the next night the same thing breaks – or maybe something completely different – and the fatigue builds up as you’re being woken up, night after night while on-call.”

However, the ease with which an organisation can adopt a similar way of working largely depends on how big their IT team is, Mytton concedes.

“Once the team size goes over 2-3 people, you should be able to have enough people to provide cover,” he says.

“The difficulty lies when you just have one person or if there is just two of you, because there is no-one else who can take over.”
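The rest rule Mytton describes can be sketched in a few lines of Python. This is only an illustration of the idea, not Server Density's actual tooling: the `Engineer` class, the `next_on_call` and `record_incident` helpers, the names and the dates are all hypothetical, and the only figure taken from the quote is the 24-hour recovery period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Taken from the quote above: 24 hours off-call after an out-of-hours incident.
REST_PERIOD = timedelta(hours=24)


@dataclass
class Engineer:
    """Hypothetical record for one person on the on-call rota."""
    name: str
    last_out_of_hours_incident: Optional[datetime] = None

    def is_rested(self, now: datetime) -> bool:
        # Someone who has never been woken is always available.
        if self.last_out_of_hours_incident is None:
            return True
        return now - self.last_out_of_hours_incident >= REST_PERIOD


def next_on_call(rota: List[Engineer], now: datetime) -> Optional[Engineer]:
    """Return the first rested engineer in rota order, or None if nobody is."""
    for engineer in rota:
        if engineer.is_rested(now):
            return engineer
    return None


def record_incident(rota: List[Engineer], engineer: Engineer, when: datetime) -> None:
    """Note that an engineer handled an out-of-hours incident and rotate them out."""
    engineer.last_out_of_hours_incident = when
    # Move them to the back of the rota so cover passes to someone else.
    rota.remove(engineer)
    rota.append(engineer)


if __name__ == "__main__":
    rota = [Engineer("alice"), Engineer("bob"), Engineer("carol")]
    woken_at = datetime(2017, 1, 10, 2, 0)  # illustrative 2am alert
    record_incident(rota, next_on_call(rota, woken_at), woken_at)
    # Twenty hours later the first responder is still resting, so cover moves on.
    print(next_on_call(rota, woken_at + timedelta(hours=20)).name)  # bob
```

With a rota of one, `next_on_call` returns `None` for the 24 hours after an incident, which is the gap Mytton points to for the smallest teams.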

Under pressure to perform

While company culture is a big determinant of how empathetic an organisation is, instances of staff burnout are not necessarily solely down to how senior management runs the company. How employees choose to work and the pressure they put on themselves to perform can also be a factor.

This is a situation Solarwinds’ Wood can relate to, after the pressure to go live on a troublesome IT project prompted him to work round the clock, and shrug off the nagging pain in his back as one of those things.

In time it emerged Wood needed major surgery, requiring him to take three months off work, after a disc in his spine collapsed.

“The company didn’t do this to me – I did this to me. They were very supportive,” says Wood.

“We can be our own worst enemies, and I cannot fault my managers in that job because their attitude was, just do what you need to do for you and get this sorted out,” he says.

At the time, Wood was working on (what he terms) an “ugly government IT project” in 2008 with a team who had jokingly adopted the mantra “go live or die trying” to describe their attitude to getting elements of the project into production.

“We were a small team and you don’t want to let people down. Looking back on it, it was stupid, but it was me who let it get that stupid,” he says.

“I don’t know when it started. I remember one day swapping out my office chair for a conference room chair, just because it meant I could perch differently. It hurt, and looking back on it, that should have been a real red flag.”

Wood says the condition is one he will live with for the rest of his life, as the discs above and below the one that collapsed are at heightened risk of following suit, but it is manageable.

Wood shared his story at the 2016 DevOpsDays Conference, where he urged attendees to treat his experience as a cautionary tale, and a reminder to put their own health and well-being above the servers and systems they look after.

“I don’t think it is hard to sell the benefits of looking after your IT staff to a company that wants to hear them, and get them to understand why this is a sensible thing to do,” he says.

“But then you get people who like running their business that way, and they like everything to be in chaos, with last-minute things happening and people pulling out all the stops to get things done because they think it is a good way to be. Those are the companies you eventually walk away from.”

Why antisocial working is nothing new in IT

Since before the dawn of the internet age, IT departments have needed to exercise a degree of flexibility with regard to how long, how late and how antisocial their working hours are, says Helen Beal, a DevOpsologist at London-based digital transformation consultancy Ranger4.

Even organisations that are not pursuing a continuous delivery strategy (but may plan to in future) might still require their technology teams to stay late or work weekends to carry out system upgrades or service rollouts, she points out.

“Lots of organisations ask a large proportion of IT staff to be on-call or on-site over a weekend while they do a formal release, because they do them so infrequently and it’s a scary prospect for them,” she says.

“That can be a very stressful environment to work in because everyone is on edge and waiting to see what’s going to go wrong.”

In organisations that have embraced a DevOps-style approach to software delivery, the stress generated by the rollout of new code updates tends to be lower as the releases tend to be smaller, more frequent, and less of a daunting prospect to embark upon.

But getting to this point, where frequent, low-stress code deploys become the norm, hinges on how much an organisation’s senior management team trusts those pushing DevOps to get on with the job at hand.

“A high-performing DevOps environment, where frequent code deploys are the norm, is a very high trust environment, but trust is not something that is handed to you – it has to be earned,” says Beal.
