Why You Need a Playbook Before an Incident
During an incident, adrenaline is high and clear thinking is hard. A pre-written playbook removes decision-making from the crisis and lets your team execute a proven process. The time to figure out your incident response is not during a production outage at 2 AM.
The Four Phases of Incident Response
Every incident follows a predictable lifecycle. Structure your playbook around these phases.
1. Detection
How do you learn about incidents: through FourSight monitoring, customer reports, or internal alerts? Define your detection channels and route all of them to the same on-call system, so that no report depends on someone noticing it in a side channel.
2. Triage
Within 5 minutes of detection, determine severity (critical/major/minor), blast radius (how many users affected), and initial response team. FourSight's incident severity classification helps standardize this.
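The triage step above can be sketched as a small function, which makes the decision repeatable under pressure. This is a minimal illustration, not FourSight's classification logic: the percentage thresholds, the responder roles, and every name below are hypothetical and should be replaced with values that fit your service.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    MAJOR = "major"
    MINOR = "minor"


@dataclass
class TriageResult:
    severity: Severity
    affected_users: int      # blast radius
    responders: list[str]    # initial response team


# Assumed thresholds, expressed as the fraction of users affected.
CRITICAL_FRACTION = 0.25
MAJOR_FRACTION = 0.05

# Assumed initial response team for each severity level.
RESPONDERS = {
    Severity.CRITICAL: ["on-call engineer", "incident commander", "communications lead"],
    Severity.MAJOR: ["on-call engineer", "incident commander"],
    Severity.MINOR: ["on-call engineer"],
}


def triage(affected_users: int, total_users: int) -> TriageResult:
    """Map blast radius to a severity level and an initial response team."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    fraction = affected_users / total_users
    if fraction >= CRITICAL_FRACTION:
        severity = Severity.CRITICAL
    elif fraction >= MAJOR_FRACTION:
        severity = Severity.MAJOR
    else:
        severity = Severity.MINOR
    return TriageResult(severity, affected_users, RESPONDERS[severity])
```

Writing the thresholds down in advance matters more than the exact numbers: the on-call engineer should not be debating what counts as "critical" in the first five minutes.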
3. Mitigation
Focus on stopping the bleeding, not finding the root cause. Roll back the last deployment, scale up infrastructure, or switch to a backup provider, whichever restores service fastest. Speed matters more than elegance; root-cause analysis can wait for the post-mortem.
4. Resolution & Post-Mortem
After the incident is resolved, update your status page, notify affected customers, and schedule a blameless post-mortem within 48 hours.
Building Your Communication Plan
Incident communication is just as important as the technical response. Define who communicates, where they communicate, and what they say at each severity level.
Status update template:
[INVESTIGATING] We are investigating reports of [issue].
We are aware of the impact and working to resolve it.
[IDENTIFIED] We have identified the cause of [issue].
[Brief technical explanation]. Working on a fix.
[MONITORING] A fix has been deployed for [issue].
We are monitoring to confirm resolution.
[RESOLVED] [Issue] has been resolved.
Total impact: [duration]. A post-mortem will follow.
On-Call Rotation Best Practices
Sustainable on-call requires fair rotation, adequate compensation, and clear escalation paths. Use FourSight's notification rules to route alerts to the current on-call engineer automatically. Set up secondary escalation if the primary doesn't acknowledge within 10 minutes.
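The escalation rule above can be expressed in a few lines. The sketch below is independent of FourSight's notification rules, whose configuration is not shown here. It assumes a simple weekly round-robin rotation, and the function names are hypothetical; only the 10-minute acknowledgement window comes from the text.

```python
from datetime import datetime, timedelta
from typing import Optional

# Acknowledgement window before escalating to the secondary.
ACK_TIMEOUT = timedelta(minutes=10)


def on_call_for(rotation: list[str], week_index: int) -> str:
    """Pick the on-call engineer for a given week (assumed round-robin)."""
    if not rotation:
        raise ValueError("rotation must not be empty")
    return rotation[week_index % len(rotation)]


def who_to_page(
    primary: str,
    secondary: str,
    alerted_at: datetime,
    acknowledged_at: Optional[datetime],
    now: datetime,
) -> str:
    """Return who should currently be paged for an open alert."""
    if acknowledged_at is not None:
        # The primary has acknowledged and owns the incident.
        return primary
    if now - alerted_at >= ACK_TIMEOUT:
        # No acknowledgement within the window: escalate.
        return secondary
    return primary
```

For example, with a rotation of `["alice", "bob", "carol"]`, week 4 falls to `bob`. An alert that is still unacknowledged 10 minutes after it fired is routed to the secondary.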
Post-Mortem Template
Every incident deserves a post-mortem, even minor ones. Document what happened, why it happened, how you detected it, how you resolved it, and what you'll change to prevent recurrence. Share post-mortems internally to build organizational learning.
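One way to keep post-mortems consistent is to treat the five questions above as required fields. The following is a minimal sketch under that assumption. The class, field names, and plain-text output format are illustrative rather than a prescribed template.

```python
from dataclasses import dataclass, fields


@dataclass
class PostMortem:
    what_happened: str
    why_it_happened: str
    how_detected: str
    how_resolved: str
    prevention_changes: str


# Section headings, in the order the questions are listed in the text.
HEADINGS = {
    "what_happened": "What happened",
    "why_it_happened": "Why it happened",
    "how_detected": "How we detected it",
    "how_resolved": "How we resolved it",
    "prevention_changes": "What we will change",
}


def missing_sections(pm: PostMortem) -> list[str]:
    """List the headings of sections that are still empty."""
    return [HEADINGS[f.name] for f in fields(pm) if not getattr(pm, f.name).strip()]


def render(pm: PostMortem, title: str) -> str:
    """Render the post-mortem as plain text, marking empty sections as TODO."""
    lines = [f"Post-Mortem: {title}", ""]
    for f in fields(pm):
        lines.append(HEADINGS[f.name])
        lines.append(getattr(pm, f.name).strip() or "TODO")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Calling `missing_sections` before publishing gives a quick check that no question was skipped, which is useful for minor incidents where the temptation is to write only a one-line summary.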