CrowdStrike's outage should not have happened
A global IT outage occurred on [2024-07-18 Thu], causing significant economic damage across several industries (see Appendix 1: The impact for some quotes on what happened). The outage was caused by a bug in the remote update system of the software of CrowdStrike, a popular Threat Intelligence/Response company. The company published its Post Incident Review (Crowdstrike 2024a) right after the incident and has just released its root cause analysis (Crowdstrike 2024b).
According to the RCA, the essence of what happened was an out-of-bounds index, which is a special case of a buffer overflow and is undefined behavior in C++, the language that seems to be used to develop CrowdStrike's system (Stack 2024).

Once the implementation has been completed against this full specification and all VCs generated by the analyzer have been proved, we have reached Platinum level of SPARK assurance.

| Finding | Mitigation |
|---|---|
| The number of input fields .. not validated at sensor compile time | Validate the number of input fields at compile time |
| Missing runtime array bounds check | Add runtime input array bounds checks |
| Lack of variety in testing | Increase test coverage |
| Inconsistency between validator and interpreter | Fix the instance of inconsistency and add checks |
| No validation in the interpreter | Add tests |
| No staged deployment | Add staged deployment |

Figure 1: St Nedelya Church, partially destroyed in a terrorist attack by the Bulgarian Communist Party.
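To make the out-of-bounds index and the two bounds-related mitigations listed above more concrete, here is a minimal C++ sketch. It is not CrowdStrike's code: the names `kDeclaredFields`, `covers_declared_fields`, and `read_field`, as well as the field counts, are hypothetical and chosen only for illustration. It assumes a component that declares a fixed number of input fields and a caller that may supply fewer.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical number of input fields that the consumer expects to read.
constexpr std::size_t kDeclaredFields = 21;

// Compile-time style check: usable inside static_assert to reject a build
// in which the supplied input count does not cover every declared field.
constexpr bool covers_declared_fields(std::size_t supplied) {
    return supplied >= kDeclaredFields;
}

// Runtime bounds check: returns the field at `index`, or std::nullopt when
// `index` is outside the supplied inputs. A plain `inputs[index]` with
// index >= count would be undefined behavior in C++.
inline std::optional<std::string_view> read_field(const std::string_view* inputs,
                                                  std::size_t count,
                                                  std::size_t index) {
    if (inputs == nullptr || index >= count) {
        return std::nullopt;  // caller decides how to handle the missing field
    }
    return inputs[index];
}
```

With 20 supplied inputs, `read_field(inputs, 20, 20)` returns an empty optional instead of reading past the end of the array, and `static_assert(covers_declared_fields(20))` would stop the build, since 20 inputs do not cover 21 declared fields.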