It is well documented that being a leader can be a lonely experience. If you've ever led a project, a team, a department, or a company, you know how the weight of the mantle of responsibility for outcomes and decisions can lead to isolation. But it is in moments of crisis, when teams must be formed, guided, and motivated to rise to a challenge, that leadership takes on a new dimension. This is the situation I found myself in on the morning of July 19, 2024. By the time I was involved, the emergency management function was well engaged: understanding the scope and impact of the problem, developing remediation steps, and planning how best to use the small army of volunteer "technologists" who would serve as the backbone for the on-site, hands-on recovery work. I jumped in and helped where I could, but there was already significant momentum.
Then an opportunity presented itself: someone had automated a fix. I tracked down Jeremiah Heyt (JR) and confirmed he was having success recovering systems centrally. Was this a chance to fix many systems without having to touch them? It was worth exploring. JR didn't work for me, and I had never talked to him before that moment, but he was ready and willing to jump into a bigger effort.
We pulled in a few other people, and before long we had formed a "script SWAT team" to look into automating and expanding the idea of centralized recovery. By the end, there were 12 people working on various aspects of the script and supporting tech. This team also spearheaded the development of telemetry to confirm when systems had been fixed, a key element for the hands-on teams as well.
In the end, it is estimated that the script resolved between 6% and 8% of impacted systems. That doesn't sound like much, but it meant that the amazing army of over 500 in-field volunteers didn't have to touch almost 2,500 systems, and those systems were put back into the hands of users faster than they might otherwise have been.
Of the 12 people on the team, I was the "upline" of only three. The common cause was greater than hierarchy and positional authority. It was a privilege to be a part of this great team: Jeremiah Heyt, Dan Boik, Eric Craddock, Kenny Daldine, Joshua Hineline, Travis Hughes, Adam C. Johnson, Jeremy Jordan, Jason Lehman, Alex Meyer, Jeremy Reynolds, and John Schiffman. Thank you, Scott Dresen, for your support and participation! Jerry Matt and Nina Padavil, thank you for being there every step of the way!
This view is but a small part of the work that went into the recovery. The untold part of this story is that the Corewell Health team was in the go-live of a multi-year project to consolidate the EMR when the CrowdStrike outage occurred. Our CIDO, Jason Joseph, described it as "launching a rocket in a hurricane." This was only possible because of a team dedicated to a mission greater than themselves.
Awesome job, team! I'm proud to be a small part of this amazing group of dedicated professionals!