CHAOS ENGINEERING Companies People Tools Practices
People
ChaosIQ
Platform for your teams to apply Chaos Engineering to their rapidly evolving, business critical Cloud Native microservices and platforms so they can build confidence that those systems won't fail your users.
Tools
Paul Harris
Staff Software Engineer
Chaos Kong
King of the Gorillas: drops a full Amazon Region
Tools
James Hamilton
AWS VP, Ex-Microsoft Research
About testing in production, 2007
Chaos Gopher
Chaos testing/engineering in GO
People
Simian Army
Charles Torre
Chaos Engineering, Programming, Technical Leadership
https://msdevshow.com/2016/11/chaos-engineering-with-charles-torre/
Gremlin
Framework to safely, securely, and easily simulate real outages with an ever-growing library of attacks.
Tools
Matt Fornaciari
CTO - avid practitioner of #chaosengineering
Chaos Engineering: the history, principles, and practice
Formerly at Salesforce and Amazon
Hosting and Cloud Companies
People
Matthew Campbell
Ex-General Purpose GO Hacker at DigitalOcean
Cofounder at Loom Network
Experiment
Tools & Framework
Latency Monkey
Introduces delays at the communication layer to test how well a system tolerates degraded performance in an external component it depends on, up to simulating a complete cut (an infinite delay), without having to ask the partner concerned to cut its service.
Heather Nakama
Software Engineer at Microsoft
https://azure.microsoft.com/en-us/blog/inside-azure-search-chaos-engineering/
https://www.slideshare.net/MatthewCampbell7/presentationchaos monkey
People
https://fr.slideshare.net/alexLM/chaos-engineering-for-docker
Disaster Recovery Program (DiRT)
Google runs an annual, company-wide, multi-day Disaster Recovery Testing event (DiRT), the objective of which is to ensure that Google's services and internal business operations continue to run following a disaster.
Tools
Hailstorm drives integration tests and simulates peak load during off-peak times
FIT: Failure Injection Testing
Platform that simplifies creation of failure within our ecosystem with a greater degree of precision for what we fail and who we will impact. FIT also allows us to propagate our failures across the entirety of Netflix in a consistent and controlled manner.
https://fr.slideshare.net/KoltonAndrus/breaking-things-on-purpose-with-gremlin
Tools
uDestroy intentionally breaks things so we can get better at handling unexpected failures
Kripa Krishnan
Director, Cloud Ops & Site Reliability Engineering
Google's Queen of Chaos
ChAP: Chaos Automation Platform
ChAP enables engineering teams to run Chaos Engineering experiments on live traffic in production in order to build confidence that their service will degrade gracefully when non-critical downstream services fail.
Tammy Butow
Site Reliability Engineering Manager
Now at Gremlin Inc.
Experiment
People
https://arxiv.org/pdf/1702.05849.pdf
Tools
Nemesis
Simulate error conditions using "disruptors"
Search Chaos Monkey
Search Chaos Monkey has been instrumental in providing a deterministic framework for finding exceptional failures and driving them to resolution as low-impact errors with planned, automated solutions.
Thomissa Comellas
SRE causing chaos at Dropbox, previously at StanfordEng, TeslaMotors.
People
Nora Jones
Senior Chaos Engineer at Netflix, formerly at Jet.
Co-author Chaos Engineering (O'Reilly 2017)
Shay Holmes
Sr. Director, Engineering Services
People
Storm
Experiments In Production chaos-engineering
CHAOS ENGINEERING
Companies, People,
To prepare for the loss of a datacenter, Facebook regularly tests the resilience of its infrastructure to extreme events. Known as the Storm Project, the program simulates massive data center failures.
Jay Parikh
Vice president and head of engineering and infrastructure
Experiment
Suresh Visvanathan
Nemesis Architect & Lead
LinkedOut
Framework and tooling to test how user experience will degrade in different failure scenarios associated with downstream calls. It provides a seamless way to simulate failures across our application stack with minimal effort.
People
Benjamin Gakic
SRE Architect
Gameday IT & #ChaosEngineering
David Halsey
VP, Performance Engineering, Fidelity Investments
People
Gremlin Fault Injection Tool
https://fr.slideshare.net/madrockriss/paris-chaos-engineering-meetup-1
Tools
Greg Orzell
Cloud Distributed Systems Architecture Consulting, at Crispy Mountain GmbH
Founded the Simian Army
Luke Koweski
Senior Software Engineer and a founding member of the Traffic & Chaos team at Netflix
Ali Basiri
Senior Software Engineer
Wreaking Havoc
Yury Izrailevsky
VP, Cloud Computing and Platform Engineering
https://fr.slideshare.net/InfoQ/chaos-kong-endowing-netflix-with-antifragility
Experiment
Days Of Chaos
https://www.slideshare.net/devopsrex/days-of-chaos-le-dveloppement-de-la-culture-devops-chez-voyagessncfcom-laide-de-la-gamification-80396202
Monké Go
Not a monkey, but an automation platform to run monkeys during integration testing
https://fr.slideshare.net/AmazonWebServices/ent101-embracing-the-cloud-final
Casey Rosenthal
Philosopher. Traffic and Chaos Engineering Manager
Tools
Chaos Monkey
Randomly selects instances in the production environment and deliberately puts them out of service.
Processkiller Monkey
Cousin of Chaos Monkey, a little more definitive...
Experiment
Days of Chaos
Inspired by AWS GameDays: to test the resilience of their applications, teams volunteer them for a Day of Chaos. Every 30 minutes, operators simulated failures in pre-production. Teams earned points based on detections, diagnoses and resolutions. This type of gamified event helps to introduce development teams to the concept of resilience.
Gameday AWS at Veolia Water Technologies
https://www.slideshare.net/D2SI/aws-summit-paris-2017-gameday-veolia
Gameday AWS
Interactive, six-part series to get hands-on cloud computing experience
https://fr.slideshare.net/AmazonWebServices/game-days-crash-test-your-application-and-your-team
Gameday by DiUS
https://fr.slideshare.net/DiUSComputing/gameday-achieving-resilience-through-chaos-engineering
The Ultimate Resources to prepare your Gameday by DiUS
Gameday at TIAD by D2SI
https://fr.slideshare.net/TIADParis/tiad-2016-gameday-aws
Principles of Chaos Engineering
Minions Bestiary
Latency Monkey
Introduces delays at the communication layer to test how well a system tolerates degraded performance in an external component it depends on, up to simulating a complete cut (an infinite delay), without having to ask the partner concerned to cut its service.
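The Latency Monkey idea above can be illustrated with a minimal sketch. This is not the tool's actual implementation: the names `call_with_injected_latency`, `fetch_price_with_fallback` and `DependencyTimeout` are hypothetical, and the delay is injected in-process rather than at a real network layer. A delay of `None` stands in for the "infinite delay" (complete cut) case.

```python
import time
from typing import Callable, Optional


class DependencyTimeout(Exception):
    """Raised when the (simulated) dependency does not answer in time."""


def call_with_injected_latency(
    dependency: Callable[[], str],
    delay_s: Optional[float],
    timeout_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `dependency` after an artificial delay (hypothetical helper).

    delay_s=None models an infinite delay, i.e. a complete cut.
    """
    if delay_s is None or delay_s >= timeout_s:
        # The caller gives up after timeout_s, so wait only that long.
        sleep(timeout_s)
        raise DependencyTimeout(f"no answer within {timeout_s}s")
    sleep(delay_s)
    return dependency()


def fetch_price_with_fallback(
    delay_s: Optional[float],
    timeout_s: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothetical caller that degrades to a cached value on timeout."""
    try:
        return call_with_injected_latency(
            lambda: "live-price", delay_s, timeout_s, sleep
        )
    except DependencyTimeout:
        return "cached-price"


if __name__ == "__main__":
    print(fetch_price_with_fallback(0.05))  # small delay: live answer
    print(fetch_price_with_fallback(None))  # complete cut: fallback
```

Passing the `sleep` function as a parameter keeps the experiment fast to run in tests, since the waiting can be replaced by a no-op.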
https://fr.slideshare.net/madrockriss/paris-chaos-engineering-meetup-1
Fulldisk Monkey
Fills a disk to test the resilience of an application, especially its logging.
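A small sketch of the kind of check a Fulldisk Monkey experiment targets, under a stated substitution: instead of really filling a disk, it gives the logger a stream whose writes fail with `ENOSPC`, which is what a full disk produces. `FullDiskStream`, `handle_request` and `run_full_disk_experiment` are hypothetical names for illustration only.

```python
import errno
import logging


class FullDiskStream:
    """Stand-in for a log file on a full disk: every write fails with ENOSPC."""

    def __init__(self) -> None:
        self.failed_writes = 0

    def write(self, text: str) -> None:
        self.failed_writes += 1
        raise OSError(errno.ENOSPC, "No space left on device")

    def flush(self) -> None:
        pass


def handle_request(logger: logging.Logger, payload: str) -> str:
    """Hypothetical business function that logs before doing its work."""
    logger.info("handling %s", payload)
    return payload.upper()


def run_full_disk_experiment():
    """Check that a failing log write does not break request handling."""
    stream = FullDiskStream()
    logger = logging.getLogger("fulldisk-sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    try:
        # logging's default handleError reports the failure on stderr
        # instead of raising, so the request should still complete.
        result = handle_request(logger, "order-42")
    finally:
        logger.removeHandler(handler)
    return result, stream.failed_writes


if __name__ == "__main__":
    print(run_full_disk_experiment())
```

A real experiment would fill an actual volume in a controlled environment; the simulated stream only covers the logging path.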
Properties Monkey
Modifies the properties of an application to test its resilience.
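As an illustration of the Properties Monkey idea, the sketch below corrupts one property at a time and checks that a loader still produces usable settings. Everything here is an assumption made for the example: the property names in `DEFAULTS`, the `load_settings` loader and the set of mutations are hypothetical and do not describe the real tool.

```python
import random
from typing import Dict, List

# Hypothetical application properties, used only for this sketch.
DEFAULTS: Dict[str, str] = {"timeout_ms": "500", "retries": "3"}


def load_settings(props: Dict[str, str]) -> Dict[str, int]:
    """Hypothetical loader that falls back to defaults on bad values."""
    settings: Dict[str, int] = {}
    for key, default in DEFAULTS.items():
        raw = props.get(key, default)
        try:
            value = int(raw)
            if value < 0:
                raise ValueError("negative value")
        except ValueError:
            value = int(default)
        settings[key] = value
    return settings


def mutate_properties(props: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return a copy of props with one property removed or corrupted."""
    mutated = dict(props)
    key = rng.choice(sorted(mutated))
    mutation = rng.choice(["delete", "empty", "garbage", "negative"])
    if mutation == "delete":
        del mutated[key]
    elif mutation == "empty":
        mutated[key] = ""
    elif mutation == "garbage":
        mutated[key] = "not-a-number"
    else:
        mutated[key] = "-1"
    return mutated


def run_experiment(rounds: int, seed: int) -> List[str]:
    """Apply random mutations and collect every case the loader mishandles."""
    rng = random.Random(seed)
    failures: List[str] = []
    for _ in range(rounds):
        mutated = mutate_properties(DEFAULTS, rng)
        try:
            settings = load_settings(mutated)
        except Exception as exc:  # any crash counts as a resilience failure
            failures.append(f"{mutated!r}: {exc!r}")
            continue
        if set(settings) != set(DEFAULTS) or any(v < 0 for v in settings.values()):
            failures.append(f"{mutated!r}: invalid settings {settings!r}")
    return failures


if __name__ == "__main__":
    print(run_experiment(rounds=50, seed=1))
```

Seeding the random generator makes a failing run reproducible, which helps when a mutation exposes a real weakness.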