Wolf Incident Postmortem

January 8th, 2023
kids, satire, tech

Incident #210

Status

Complete, one action item outstanding.

Summary

Sentinel consumed by wolf after repeated false alarms.

Impact

Loss of sentinel. No flock impact.

Root causes

Sentinel generated noisy alerts due to premature deployment, incomplete training, and overly monotonous task. Oncalls failed to respond to true positive due to alert fatigue.

Trigger

Wolf.

Resolution

Gathered flock. Deployed replacement sentinel.

Detection

Sentinel did not report at end of shift.

Action Items

Priority  Action Item                                            Type      Status
P0        Gather flock                                           mitigate  complete
P0        Deploy replacement sentinel                            mitigate  complete
P1        Update playbook for wolf alerts                        prevent   complete
P2        Update remaining sentinels                             prevent   complete
P2        Revise sentinel training program                       prevent   complete
P2        Investigate equipping sentinels with flutes or slings  prevent   in progress

Lessons Learned

What went well

  • Flock gathering proceeded without issues.
  • No flock injuries or losses.
  • Replacement sentinel did not exhibit false positive alerts.

What went wrong

  • Noisy alerts not addressed.
  • Alerts silenced contrary to playbook.
  • Loss of sentinel.

Where we got lucky

  • Only one wolf.
  • Wolf sated after sentinel consumption.
  • Replacement sentinel available.

Timeline

All times local

March 3rd:

  • 16:32 Oncalls paged "wolf".
  • 16:34 First oncall arrives at sentinel location.
  • 16:34 Alert diagnosed as false positive. No corrective action performed.

March 4th:

  • 14:15 Oncalls paged "wolf".
  • 14:19 First oncall arrives at sentinel location.
  • 14:19 Alert diagnosed as false positive. No corrective action performed.

March 5th:

  • 17:03 (Reconstructed) Outage begins, sentinel notices wolf.
  • 17:03 Oncalls paged "wolf".
  • 17:04 Oncalls paged "wolf".
  • 17:04 Oncalls paged "real wolf".
  • 17:05 (Reconstructed) Wolf consumes sentinel.
  • 18:45 Sentinel does not report at end of shift.
  • 19:05 Primary oncall dispatched to field.
  • 19:10 Oncall diagnoses issue.
  • 19:10 Incident begins, secondary and tertiary oncalls paged.
  • 19:15 First sheep located.
  • 19:52 Last sheep located.
  • 20:05 Flock safe in pens.
  • 20:05 Outage ends, flock protection fully restored.
  • 20:45 Replacement sentinel identified.

March 6th:

  • 07:38 Replacement sentinel deployed.
  • 18:45 Replacement sentinel reports at end of shift.
  • 18:45 Incident ends, 24hr without wolf alerts or activity (exit criterion).
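
The March 5th entries above are enough to work out the usual response metrics. A minimal sketch, assuming the reconstructed 17:03 entry marks the true start of the outage; the variable and function names are illustrative, not part of any sentinel tooling:

```python
from datetime import datetime

# Timestamps copied from the March 5th timeline (all times local).
FMT = "%H:%M"
outage_start = datetime.strptime("17:03", FMT)  # (reconstructed) sentinel notices wolf
detected = datetime.strptime("18:45", FMT)      # sentinel does not report at end of shift
diagnosed = datetime.strptime("19:10", FMT)     # oncall diagnoses issue
outage_end = datetime.strptime("20:05", FMT)    # flock safe in pens


def minutes_between(start, end):
    """Whole minutes elapsed from start to end."""
    return int((end - start).total_seconds() // 60)


time_to_detect = minutes_between(outage_start, detected)
time_to_diagnose = minutes_between(outage_start, diagnosed)
outage_duration = minutes_between(outage_start, outage_end)

print(f"Time to detect:   {time_to_detect} min")
print(f"Time to diagnose: {time_to_diagnose} min")
print(f"Outage duration:  {outage_duration} min")
```

Under that assumption, detection took 102 minutes, diagnosis 127, and the outage lasted 182. Three pages were sent in the first two minutes, so most of the detection time reflects the unanswered alerts rather than any delay in alerting.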
