Chef CTO Adam Jacob is calling on the DevOps community to help foster a more empathetic working culture within their organisations to improve the work-life balance of their continuous delivery teams.
Speaking at the ChefConf summit in Austin, Texas, Jacob opened up about the pressure that developers and operations teams often find themselves under to ensure the applications and services they work on are always available and functioning as expected.
“This is the feeling of nights spent away from the people you love. This is the feeling of stressful days, and that feeling in the pit of your stomach where you’re not sure the thing you said would work is actually going to,” he said.
“This is those moments where [you start thinking] if I don’t hit that deadline, will I get to keep my job? Will I get promoted? Will I get to be the person I want to be?”
“How many of you have had a fight with someone you love because work was so stressful and you didn’t have a place to put it, and you were so worried about it that when you went home you took it out on the people you care most about?” he said.
“That ain’t right and it doesn’t feel good at all. Literally, when I do that, that’s the moment when I feel like I’m failing the most as a human being.”
Action is needed now to cultivate company cultures where people are not constantly in fear of losing their jobs and do not feel they have to shoulder the burden of keeping things running on their own, said Jacob.
Unless things change, the community is condemning future generations of developers and operations staff to similar difficulties.
“We should make a conscious and intentional choice to build the future we want to be a part of, with our technology and culture,” he said.
“If we don’t, what we’re really saying is we’re going to let it ride and let that anguish and difficulty happen to the next generation of people who come after us, and [ensure they] remain on that same treadmill of stress and difficulty.”
He pointed out that the IT industry has a vested interest in supporting this type of change because it is the output of stressed-out teams that determines how successful their companies will be in the long term.
“If we do that, what we start to do is get people who are happier and more productive, feel safer, and have a shot at building better products,” he said.
“If we build better products, it’s possible the products we build will then turn into great companies, but it goes in that order. If you get that right, the revenue will come. People will buy what you are selling and will want you to succeed.”
Human cost of IT
As organisations strive for continuous delivery, the toll that long, anti-social hours can take on the mental health and physical well-being of developers and operations staff has emerged as a major talking point within the DevOps community.
It has led to some adopting the term “HumanOps” to describe the human side of managing IT infrastructure. It has also given rise to several standalone events on the subject in London and San Francisco in recent months.
Speaking to Computer Weekly, Chef CEO Barry Crist said there are organisations that penalise people for outages and errors. That is why developers often live in fear of losing their jobs or having their pay docked should an errant line of code end up in production.
“If you want to unlock innovation within an organisation, you have to stop people running around like chickens with their heads cut off at all hours of the day and night,” he said.
“We’re all high-performance people. Whether you’re an athlete or in IT, you need to have a good night’s sleep and eat right, and not have all the stress that operations sometimes experience.”
At the first London HumanOps meetup in May 2016, representatives from the Government Digital Service (GDS) and music-streaming site Spotify shared details of the steps they have taken to reduce the stress and burden on their operations teams.
In the case of GDS, this involved limiting the range of scenarios for which operations staff will be paged out of hours should a problem arise. Spotify has adopted a joint responsibility model whereby developers are as accountable as the operations team for what happens to their code in production.
Empathy in action
Nicole Forsgren, director of organisational performance and analytics at Chef, is a DevOps researcher who also works with enterprises to help them negotiate the process of digital transformation.
Speaking to Computer Weekly, she said the joint responsibility model employed by Spotify is popular elsewhere in the industry, with Google adopting a similar approach to help reduce the amount of pressure its developers find themselves under.
“Google developers have to maintain their own code for a period of six months. If there are no problems with it, then they hand it over to operations to maintain, but if it starts acting up they hand it straight back,” she explained.
It is a great way, she said, to cultivate a culture of empathy within an organisation, as developers get first-hand experience of what operations staff are up against.
“It helps the developers think differently about writing code because on many occasions they are writing code on a development environment that is very different [to production] and may not have thought about how to write code that scales, for example,” she said.
“They get to see and understand what an environment they might not see otherwise looks like, and how to troubleshoot that so they can understand what type of scalability issues they should be thinking about in future.”
Online marketplace Etsy has an opt-in programme whereby developers can wear fitness tracker-type devices that monitor when they are sleeping, she said, which influences who might be paged in the middle of the night should something go wrong.
Again, it is a system that shows Etsy is sympathetic to the plight of its DevOps teams and their need to get a good night’s sleep, Forsgren added.
“There is someone who works there and he doesn’t turn up to work until 10am or noon, because he likes to work late and sleep in, so they have tried to – where possible – set his pager cycle to be late so they can accommodate that,” she said.
“It’s completely opt-in and it is empathetic because they want to make sure you’re not getting woken up or paged too often.”
If something does go wrong, such as an outage or an app feature failing to work, organisations further along the empathy journey will hold a post-mortem where the circumstances leading up to the incident are picked over and learnt from without the finger of blame being pointed at anyone.
“The goal is to improve our information and communication to avoid making the same mistake twice,” said Forsgren.
“Making mistakes is okay, but making the same mistake multiple times should hopefully be avoided and that should be the goal.”