The human factor: How companies can prevent cloud disasters
Let's explore human engineering techniques that have worked at scale across the most successful tech companies in history.
Large companies work very hard to make sure their services don’t go down, and the reason is simple: significant outages hurt the brand and drive customers to competing products with a better track record. The latency tolerance of live interactions (chat) is much lower than that of asynchronous workloads (training a machine learning model, uploading a video). If chaos engineering seems like overkill, you should at least require your teams to run ‘game days’ (simulated outage practice runs) once or twice a year, or in the lead-up to any major feature launch.
Or read this on Venture Beat