
A Showstopper Escapes into Production: What Do You Do Next?

Here's an example from real life. Let's see how many of you can identify with it. It's a typical Tuesday afternoon in the middle of a busy day. You are the Quality Assurance Manager for a company that is developing an enterprise solution. Your team recently released a relatively minor version that included all of the fixes from the previous two updates, as well as two small features requested by product management in order to "close some pretty important deals."

The phone unexpectedly rings. Your R&D director is "inviting" you to an emergency meeting in his office. You arrive to find the R&D director waiting next to the whiteboard, along with his development group directors, the product marketing manager, and the support project manager in charge of your product...

As you sit there, the support project manager informs you about an urgent showstopper that shipped in last week's version and is likely to affect approximately one-third of the customers who apply the update. I'd like to freeze this situation (which happened to me roughly a decade ago) and ask you a simple question: What would you and your colleagues do in this circumstance? I believe that no two teams would behave the same way, and I don't want to present these as the best-case and worst-case scenarios, but here are two opposing approaches to serve as points on a continuum of possible behavior.

Panic and Blame

The meeting quickly devolves into a witch hunt: development blames testing for not finding the bug, testing blames development for not documenting all the relevant changes, and both blame product marketing for pressuring the teams to release even though not all tests had been completed, and so on.

After the meeting concludes with no concrete action items, support begins telling customers not to install the new release, programmers begin working on a fix without fully understanding the problem, and you are left on the sidelines wondering how you missed this flaw and trying to find the person who should be held accountable.

Because the developers believe this is an emergency, they choose to send the fix directly to support while also forwarding it to your team for testing and approval. They do this at 7:30 p.m., when no one remains in the office, so you can only begin testing it the next day at 9 a.m. About half an hour into your testing, you discover bugs in their fix. The problem is that your support team has already begun offering this fix to the first group of customers who ran into the bug.

Your team completes the system testing by noon. They discover that the original patch works on only around half of the supported permutations and, more significantly, that it also introduces a regression bug in an area unrelated to the change. Within two to three hours you receive a new version, which is verified and delivered to the support staff by 5 p.m.

Customer support wants to kick the a#@ of both your testing team and the developers, because they now have to track down every customer who received the original patch and ask them not to install it or, worse, to install another patch on top of it.

Product marketing is also furious with you. They have already heard from customers, sales teams, and even some of your company's top executives about the shambles and bad PR this catastrophe is causing in the field. They inform you that, as a result, your firm will have to offer big discounts to any customers who complain, and they believe that this issue may cause several large deals to be postponed or canceled.

Step inside your time machine and advance three to four months. Everything remains the same. No one was fired as a result of the incident, and although the atmosphere was uncomfortable for about ten days, it then became water under the bridge. Your team is set to release another minor version that includes all of the updates from the previous months, as well as three new features needed to close some important deals.

As usual, product marketing is pushing for the release to be on time, even though your team received the final version a week late and you only found out yesterday that it includes another feature you were unaware of.

"Nothing changes if nothing changes," as the saying goes.

Problem-Solve, Study, and Develop

The very last thing you want to do is start looking for someone to point the finger at. There is a significant chance that more than one individual is "to blame" for the faults that led to the problem escaping into production. It is almost certain that no one did this on purpose, and blaming will not help you address the problem!

Resist the temptation to blame someone else for the problem. If another team member begins blaming, immediately ask them how this is helping to solve the problem any quicker. You may also point out that there will be enough time, once the problem has been resolved, to figure out what went wrong and why. So, there's a problem out there, and you need to solve it right away! Put together the brightest team you can to assess the problem, design the quickest, safest, and most effective solution, implement it, and lastly determine how to test it.

Deciding how to test the fix is not easy either. Depending on the bug and the product, you may choose to send the patch without testing and verify it only after the immediate issue has been resolved. (For example, if your application is web-based and the servers are down, it is preferable to restore them first and test once they are operational.) In other situations, you may decide to perform some or all of the necessary tests in-house before releasing the patch. This is frequently the case if the bug isn't critical and a faulty fix could cause worse problems, such as data loss or business interruption.

Once the critical period has passed and the problem has been resolved, but before people "go on" with their lives and activities, it is important to make sure you understand what went wrong. The investigation should never be a witch hunt. It should be a safe space for everyone to cooperate and bring up all of the factors, both internal and external, that contributed to the problem occurring in the first place. This type of exercise is known as a retrospective, root cause analysis, or post-mortem.

Once the contributing factors have been identified as part of the analysis, the next stage is to establish corrective measures to prevent this from happening again. Make certain that these measures are clearly stated and actionable (obviously!). Many times, we read items like "Make sure communication is improved," but this is not actionable and will not help anyone change the way they have communicated in the past. So, as simple as it may sound, make sure your corrective measures are actionable and will result in a change in the way things have been done in your firm up until now.

I recall saying a few years back at a presentation to a group of engineers, "When you are a true expert, you see failure as a chance to become a better professional." I've observed numerous development teams that conduct retrospectives but are hesitant or embarrassed to share their findings with the rest of the company. Would you blame the group for being egotistical and self-centered? I would actually blame their managers for not ensuring that their work environment encourages teams to take chances and learn from their mistakes.

You must ensure that your teams, and preferably your entire organization, encourage the sharing of retrospectives and corrective measures. It is one of the best free sources of guidance available. Furthermore, teams with the confidence and maturity to communicate their risks and mistakes openly are also the most pleasant and interesting to work in.
