How important is uptime to your business? As more business moves online, an increasing number of companies have to answer that critical question.
When products and services fail, customers can’t buy online or use their software. They churn, and the company suffers brand damage. Too much downtime can also breach the service level agreements (SLAs) written into software contracts, which starts cancellation conversations.
Many businesses are moving their infrastructure and software to the cloud to adopt Kubernetes and microservices. Those technologies offer terrific benefits in exchange for some complexity.
Chaos engineering or resiliency engineering is the dominant way to mitigate the intricacies of modern cloud stacks. By stressing a system ahead of time to find its weaknesses, engineers can buttress the weak points of a system, ensuring the software runs smoothly more often.
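To make the idea concrete, here is a minimal sketch of a chaos experiment in Python. It is a toy under stated assumptions, not how any particular chaos tool works: `get_price` is a hypothetical stand-in for a real service call, the 30% failure rate is an arbitrary illustrative value, and the retry wrapper stands in for whatever fix a team might apply after finding the weakness.

```python
import random

def flaky(call, failure_rate, rng):
    """Wrap `call` so it randomly raises ConnectionError (the injected fault)."""
    def wrapped(*args, **kwargs):
        if rng.random() < failure_rate:
            raise ConnectionError("injected fault")
        return call(*args, **kwargs)
    return wrapped

def with_retries(call, attempts):
    """Wrap `call` so a ConnectionError is retried up to `attempts` times in total."""
    def wrapped(*args, **kwargs):
        last_error = None
        for _ in range(attempts):
            try:
                return call(*args, **kwargs)
            except ConnectionError as err:
                last_error = err
        raise last_error
    return wrapped

def success_rate(call, requests):
    """Fraction of `requests` calls that complete without a ConnectionError."""
    ok = 0
    for _ in range(requests):
        try:
            call()
            ok += 1
        except ConnectionError:
            pass
    return ok / requests

def get_price():
    # Hypothetical dependency; a real experiment would target a real service.
    return 42

rng = random.Random(0)  # fixed seed so the experiment is repeatable
unprotected = success_rate(flaky(get_price, 0.3, rng), 1000)
protected = success_rate(with_retries(flaky(get_price, 0.3, rng), 3), 1000)
print(f"without retries: {unprotected:.1%} of requests succeed")
print(f"with retries:    {protected:.1%} of requests succeed")
```

The experiment first exposes the weak point (requests fail when the dependency fails) and then confirms that the fix holds up under the same injected fault.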
Just how effective is chaos engineering? The chart above, from the 2021 edition of the State of Chaos Engineering, shows the impact visually. Teams that perform chaos have more uptime.
82% of those surveyed who run chaos regularly have 99% uptime or better. 87% of those who run chaos twice per month, and 92% who run daily have 99% uptime or better. The more active the chaos program, the better the uptime for a business.
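It helps to translate an uptime percentage into the downtime it permits. The short Python sketch below does that arithmetic; the 99.9% and 99.99% targets are added purely for comparison and do not come from the survey, and the calculation assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

def downtime_hours_per_year(uptime_percent):
    """Hours of downtime per year permitted by a given uptime percentage."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR

for target in (99.0, 99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% uptime allows about {hours:.1f} hours of downtime per year")
```

By this measure, 99% uptime still leaves room for roughly 87.6 hours of downtime a year, which is more than three and a half days.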
There’s a critical corollary benefit. Teams running chaos fix problems faster. The mean time to resolution (MTTR, aka the time it takes to fix an issue) for companies running chaos improves dramatically. Half of those running chaos boast an MTTR of less than a day, compared with fewer than 30% of those not using chaos.
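For readers who want to track this number themselves, MTTR is most often calculated as the average time between an incident being opened and being resolved. The Python sketch below shows one way to compute it; the incident timestamps are invented for illustration, and real data would come from an incident tracker.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident log: (opened, resolved) timestamps.
incidents = [
    (datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 11, 0)),    # 2 hours
    (datetime(2021, 3, 8, 14, 0), datetime(2021, 3, 9, 14, 0)),   # 24 hours
    (datetime(2021, 3, 20, 22, 0), datetime(2021, 3, 21, 2, 0)),  # 4 hours
]

def mttr_hours(records):
    """Average hours from open to resolution across all incidents."""
    return mean(
        (resolved - opened).total_seconds() / 3600
        for opened, resolved in records
    )

print(f"MTTR: {mttr_hours(incidents):.1f} hours")
```

With these sample incidents the MTTR works out to 10 hours, which would fall in the "less than a day" group.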
Amazon and Netflix pioneered the use of chaos to improve uptime and decrease MTTR. Today, many more leaders adopt chaos to ensure smooth delivery of their software to customers. Expedia, JP Morgan, Mailchimp, Qualtrics, Target, Twilio, Under Armour, and Walmart are just some of those adopting chaos.
Chaos is the next step after adopting Kubernetes and microservices. These modern technologies are powerful but complex, and chaos is the toolkit modern teams use to ensure the infrastructure sustains the pounding traffic of customers and users.