Staying Cool When Firefighting Production Outages
She's on fire and she burns through the night at the speed of light
- Keith Ott
- Software Engineering
- April 27, 2019
I work for a large, international company on one of their main e-commerce websites. This past week, the company had a substantial product launch (one even the CEO was heavily involved in), and the website I work on released a few new features alongside it. And all the new website features went live without any issue.
…okay, obviously not. There were some serious launch issues, and I took point on fixing most of them. It was a full, non-stop day of updates to nervous managers, researching, fixing code, testing, and releasing, but by the end of the work day we had the issues fixed. I did a lot of things that day, but one thing I didn’t do was panic.
A few days later I was talking to my manager about how the day unfolded, and he remarked that we did a good job handling it. We got on the subject of how some engineers freeze up or panic when they’re put on the spot: the system isn’t working correctly, and managers and product owners are looking to them for answers. At one point I was that engineer; I would panic the minute someone questioned me as to why something wasn’t working right. It’s taken me a long time to become cool and confident under pressure. And it’s a skill that I believe anyone can learn, no matter how stress-prone you are.
Why You Want to Stay Cool Under Pressure
Before we get into how you can become cooler under pressure, let’s first talk about why you don’t want to panic when you have to firefight a production outage.
Software Development is Hard
First, software development is hard enough when you’re thinking clearly. Throw in worried managers focusing on you and time pressure to fix the problem as fast as possible, and that’s a recipe for disaster. You need to be thinking clearly to ensure that you can properly diagnose the issue and fix it without breaking something else.
Keep Others Cool
Second, staying calm helps others stay cool and make better decisions. While you’re busy fixing issues, those responsible for the system may be communicating with customers or discussing mitigation strategies. If decision makers see you’re confident in fixing the problem, they will be more confident in the decisions they have to make.
Why Add More Stress?
Finally, life is stressful enough. Why add extra stress to your life for something that you’ll have fixed in a few hours and you’ll soon forget even happened?
Learn to be Cool when Production is on Fire
Clearly, staying calm and collected when you’re under pressure is a useful skill, but how do you learn this skill? Here are some things that have helped me mature from freaking out when I can’t open a jar of pickles to being calm and under control when the world is burning down.
It’s Not You, It’s Me
First, realize that this isn’t about you. Even if you were the one responsible for the bug that’s causing the production outage, when the system is down, managers and product owners are more concerned with getting the system running again than with who broke it.
We’re all human and we all make mistakes. Even huge tech companies like Facebook and Google have had their share of outages. None of the managers I’ve worked with has been concerned with blaming someone; they just want the problem fixed. (Of course, this advice doesn’t apply if this is a trend. If an engineer can’t release a piece of code to production without causing huge outages, then he or she should be worried come review time.)
The More you Know
Second, learn all you can about the system and the technology it’s built on. Not only will this help you become more confident and trust your own skills more, but it will help you diagnose and fix problems quickly. Plus, when managers and product owners start interrogating you, you’ll be able to confidently answer their questions.
Practice Staying Calm Under Pressure
Third, practice staying calm in stressful situations. Years ago my friends and I were hooked on the zombie mode in the Call of Duty video games, where you had to fight never-ending hordes of zombies that would relentlessly attack you. When the zombies broke through our defenses, we needed to stay calm and communicate who needed backup and what defenses needed repairing. If we were panicking, we were guaranteed to lose. A second spent panicking was a second we could have spent fixing the problems.
I’m not saying video games are the only way to practice this skill. There are plenty of other ways to practice staying calm and thinking on your feet, such as sports or chess.
Finally, practice mindfulness. Meditation teaches you to allow thoughts to come and go. When you’re researching a bug and thoughts creep in, such as “What if I can’t solve this? I’m going to get fired!”, you can allow those thoughts to pass and get back to fixing the problem.
If you work in software, you will inevitably spend part of your time firefighting production issues. There are steps that we as engineers can take to prevent issues from reaching production, such as coding defensively and building extensive automated test suites, but ultimately, we’re human and we all make mistakes. When those mistakes occur, staying calm lets you fix the issues quickly, without losing your mind.