Post-Incident Questionnaire for Managers

This is my light-hearted attempt to help engineering managers get the most value out of a downtime incident.

Introduction

So you had an incident? Condolences.
On the bright side, however, perhaps there is an opportunity to learn and make things better?

Process

I will offer you a series of questions that you should ask yourself about the incident and its root cause.
The goal here is to suggest a way forward based on the incident’s underlying cause.
Please note that the questions are ordered by priority.

Questionnaire

  1. Many incidents are caused by a mistake. Assuming this incident was caused by a mistake, is it a mistake you personally could have made?
    • If No, please proceed to question 6
  2. Is there any knowledge, which you either have now or still need to acquire, that would have prevented this incident?
    • If Yes:
      1. How do you plan to acquire this knowledge, and by when?
      2. How do you plan to disseminate this knowledge to your team?
  3. Did any of the tools you use contribute to the cause of the incident?
    • If Yes:
      1. Can we improve or extend the tool to avoid this incident?
        • If Yes, is this planned and prioritized?
      2. Would the removal of the tool have helped to avoid this incident?
        • If Yes, is this planned and prioritized?
      3. Is there another tool that we could adopt that would help avoid this incident?
        • If Yes, how does the cost of acquiring, learning, and using this tool compare to the cost of this incident?
  4. Did any of your practices contribute to the cause of this incident?
    • If Yes:
      1. Would removing this practice help avoid this cause?
        • If Yes, is this planned and prioritized?
      2. Can we improve our practices to avoid this incident?
        • If Yes, is this planned and prioritized?
      3. Is there another practice that we can adopt that would avoid this incident?
        • If Yes, how does the cost of learning and adopting this practice compare to the cost of this incident?
  5. Time for some “out of the box” thinking.
    Have you asked the engineers (even fresh grads) how to solve this problem?
    • If No, please go ask them and then repeat questions 2 through 4
  6. Great, given that you would not have made this mistake, the problem can be solved with knowledge and/or experience.
    Ask yourself the following questions and then re-examine questions 2 through 4 from the perspective of your engineers.
    • How can you make your engineers as capable as you?
    • Do they lack knowledge?
    • Do they lack good engineering practices/rigor?
    • Can you use your experience to contribute tests (or test scenarios) to account for their lack of experience?
  7. NOTE: This question is intentionally last and should only be used if nothing else has worked.
    Did any of your processes contribute to the cause of this incident?
    • If Yes:
      1. Would removing this process help avoid this cause?
        • If Yes, is this planned and prioritized?
      2. Can we improve our process to avoid this incident? Improving can include removing or automating steps.
        • If Yes, is this planned and prioritized?
      3. Is there another process that we can adopt that would avoid this incident? Please take special note of the fact that the cost of creating, implementing, and improving a new process is astronomical and should always be the absolute last option.
        • If Yes, how does the cost of learning and adopting this process compare to the cost of this incident?
    • If No:
      1. Would removing any of your processes help avoid this incident?
        • If Yes, is this planned and prioritized?
      2. Can we improve our processes to avoid this incident? Improving can include removing or automating steps.
        • If Yes, is this planned and prioritized?

If you found this useful, check out the companion article Post-Incident Questionnaire for Engineers.
