Building an Incident Response Playbook

Why You Need a Playbook Before an Incident

During an incident, adrenaline is high and clear thinking is hard. A pre-written playbook removes decision-making from the crisis and lets your team execute a proven process. The time to figure out your incident response is not during a production outage at 2 AM.

The Four Phases of Incident Response

Every incident follows a predictable lifecycle. Structure your playbook around these phases.

1. Detection

How do you learn about incidents: through FourSight monitoring, customer reports, or internal alerts? Define your detection channels and make sure they all route to the same on-call system.
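
As a rough illustration of routing every channel to one place, the sketch below normalizes alerts from different sources into a single shape and appends them to one queue. The channel names, payload field names, and the in-memory `on_call_queue` are all hypothetical stand-ins, not part of FourSight or any real paging tool.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical mapping: which payload field holds the summary for each channel.
SUMMARY_FIELD = {
    "monitoring": "check_name",
    "customer_report": "subject",
    "internal_alert": "message",
}


@dataclass(frozen=True)
class Alert:
    source: str
    summary: str
    received_at: datetime


# Stand-in for the single on-call system that every channel feeds into.
on_call_queue: list[Alert] = []


def route(source: str, payload: dict) -> Alert:
    """Normalize a channel-specific payload and hand it to the shared queue."""
    if source not in SUMMARY_FIELD:
        raise ValueError(f"unknown detection channel: {source}")
    summary = str(payload.get(SUMMARY_FIELD[source], "(no summary)"))
    alert = Alert(source, summary, datetime.now(timezone.utc))
    on_call_queue.append(alert)
    return alert
```

Rejecting unknown channels loudly is a deliberate choice here: a new detection source that silently bypasses the on-call queue is the failure this step is meant to prevent.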

2. Triage

Within 5 minutes of detection, determine the severity (critical/major/minor), the blast radius (how many users are affected), and the initial response team. FourSight's incident severity classification helps standardize this.
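
To make triage repeatable, it can help to write the severity rules down as code. The following is a minimal sketch only: the 25% and 5% thresholds and the `core_flow_down` flag are illustrative assumptions, and it does not reproduce FourSight's own classification.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"


# Assumed thresholds for illustration; tune them to your own product.
CRITICAL_FRACTION = 0.25
MAJOR_FRACTION = 0.05


def classify(affected_users: int, total_users: int, core_flow_down: bool) -> Severity:
    """Map blast radius (share of users affected) to a severity level."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    fraction = affected_users / total_users
    # A broken core flow (for example login or checkout) is critical at any scale.
    if core_flow_down or fraction >= CRITICAL_FRACTION:
        return Severity.CRITICAL
    if fraction >= MAJOR_FRACTION:
        return Severity.MAJOR
    return Severity.MINOR
```

With these assumed thresholds, 60 affected users out of 1,000 would classify as major, while 10 out of 1,000 would be minor unless a core flow is down.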

3. Mitigation

Focus on stopping the bleeding, not finding the root cause. Typical moves are rolling back the last deployment, scaling up infrastructure, or switching to a backup provider. Speed matters more than elegance.

4. Resolution & Post-Mortem

After the incident is resolved, update your status page, notify affected customers, and schedule a blameless post-mortem within 48 hours.

Building Your Communication Plan

Incident communication is just as important as the technical response. Define who communicates, where they communicate, and what they say at each severity level.

💡 Write your status page update templates before you need them. During an incident, you want to fill in blanks, not compose prose under pressure.

Status update template:

[INVESTIGATING] We are investigating reports of [issue].
  We are aware of the impact and working to resolve it.

[IDENTIFIED] We have identified the cause of [issue].
  [Brief technical explanation]. Working on a fix.

[MONITORING] A fix has been deployed for [issue].
  We are monitoring to confirm resolution.

[RESOLVED] [Issue] has been resolved.
  Total impact: [duration]. A post-mortem will follow.
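
If you want to fill in the blanks programmatically, the template above can be kept as format strings. This is one possible sketch; the function name `render_update` and the blank names `explanation` and `duration` are assumptions chosen to match the bracketed placeholders.

```python
# The four stages from the template, with bracketed blanks turned into fields.
TEMPLATES = {
    "INVESTIGATING": (
        "[INVESTIGATING] We are investigating reports of {issue}.\n"
        "  We are aware of the impact and working to resolve it."
    ),
    "IDENTIFIED": (
        "[IDENTIFIED] We have identified the cause of {issue}.\n"
        "  {explanation}. Working on a fix."
    ),
    "MONITORING": (
        "[MONITORING] A fix has been deployed for {issue}.\n"
        "  We are monitoring to confirm resolution."
    ),
    "RESOLVED": (
        "[RESOLVED] {Issue} has been resolved.\n"
        "  Total impact: {duration}. A post-mortem will follow."
    ),
}


def render_update(stage: str, issue: str, **blanks: str) -> str:
    """Fill in one stage of the status template, failing on missing blanks."""
    if stage not in TEMPLATES:
        raise ValueError(f"unknown stage: {stage}")
    # "Issue" is the same text with its first letter capitalized,
    # for the stage where it starts the sentence.
    values = {"issue": issue, "Issue": issue[:1].upper() + issue[1:], **blanks}
    try:
        return TEMPLATES[stage].format(**values)
    except KeyError as missing:
        raise ValueError(f"stage {stage} needs the blank {missing}") from None
```

For example, `render_update("RESOLVED", "elevated API error rates", duration="42 minutes")` fills the last stage, and leaving out `duration` raises an error instead of publishing a half-filled update.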

On-Call Rotation Best Practices

Sustainable on-call requires fair rotation, adequate compensation, and clear escalation paths. Use FourSight's notification rules to route alerts to the current on-call engineer automatically. Set up secondary escalation if the primary doesn't acknowledge within 10 minutes.
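
The rotation and escalation rules can be sketched in a few lines. The example below assumes a simple weekly round-robin and a single secondary responder; the function names and roster are hypothetical, and it does not use or represent FourSight's notification rules API.

```python
from datetime import date, datetime, timedelta
from typing import Optional

ACK_TIMEOUT = timedelta(minutes=10)  # from the escalation rule above


def on_call_for(day: date, roster: list[str], rotation_start: date) -> str:
    """Weekly round-robin: each engineer takes one week in roster order."""
    weeks_elapsed = (day - rotation_start).days // 7
    return roster[weeks_elapsed % len(roster)]


def page_target(
    sent_at: datetime,
    acked_at: Optional[datetime],
    now: datetime,
    primary: str,
    secondary: str,
) -> Optional[str]:
    """Return who should be paged right now, or None once the alert is acknowledged."""
    if acked_at is not None:
        return None
    # Escalate once the primary has had the full acknowledgement window.
    if now - sent_at >= ACK_TIMEOUT:
        return secondary
    return primary
```

A fixed weekly round-robin is only one way to keep rotation fair; teams with uneven time zones or holidays will likely need overrides on top of it.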

Post-Mortem Template

Every incident deserves a post-mortem, even a minor one. Document what happened, why it happened, how you detected it, how you resolved it, and what you'll change to prevent recurrence. Share post-mortems internally so the whole organization learns from them.
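
One way to keep post-mortems consistent is to give the five questions a fixed structure. The sketch below is an assumption about how such a template might look in code; the class and field names are illustrative.

```python
from dataclasses import dataclass, fields


@dataclass
class PostMortem:
    # One field per question the post-mortem should answer.
    what_happened: str
    why_it_happened: str
    how_detected: str
    how_resolved: str
    prevention_changes: str

    def missing_sections(self) -> list[str]:
        """Names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Plain-text document with one heading per section."""
        lines = []
        for f in fields(self):
            lines.append(f.name.replace("_", " ").capitalize())
            lines.append(getattr(self, f.name).strip() or "(to be completed)")
            lines.append("")
        return "\n".join(lines).rstrip() + "\n"
```

Calling `missing_sections()` before sharing a post-mortem gives a quick check that no question was skipped.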