Dr. Richard Cook, formerly an associate professor at the University of Chicago, published a paper in 1998 entitled "How Complex Systems Fail." In it, Dr. Cook lists 18 observations, drawn from his research in medicine, about how complex systems fail. His insights apply directly to running software at scale, and they informed our latest investment, in Gremlin.
Some of Dr. Cook’s observations are obvious. Complex systems are intrinsically hazardous systems. Complex systems are heavily and successfully defended against failure. Catastrophe requires multiple failures; single-point failures are not enough.
Others are surprising. Catastrophe is always just around the corner. Human operators both produce failure and defend against it. All practitioner actions are gambles. Change introduces new forms of failure.
All of this applies to software. Every company is becoming a software company, and every company relies on software to make money. When that software fails, companies lose money. Gremlin addresses that problem by instilling resiliency through orchestrated chaos.
Kolton and Matt, the founders of Gremlin, deeply understand the challenges of operating colossal web-scale systems. They lived them at Amazon, Salesforce and Netflix. To achieve ever greater uptime at those organizations, they relied on chaos engineering.
Chaos engineering is the discipline of deliberately introducing failure into a software system to identify how, when and why it might break in the future, and then fixing those weaknesses. Chaos teams start small, with experiments that challenge the resilience of a single server or virtual machine. As a team gains confidence in its infrastructure, these experiments grow in scope.
Gremlin’s software provides a platform for planning and executing these experiments, understanding their impact and identifying the right next steps to fix weaknesses before they cause outages. Companies like Twilio, Expedia and Under Armour use Gremlin today to fight the failures Dr. Cook observed in complex systems.
We’re thrilled to partner with Gremlin and to work with Kolton and Matt to scale their business and bring the discipline and benefits of chaos engineering to many more companies.