CHAOS ENGINEERING: Companies, People, Tools & Practices


Sylvain Hellegouarch
Engineering and Learning Chaos; ChaosIQ founder & CTO
https://fr.slideshare.net/SylvainHellegouarch/mucon-2017-build-confidence-in-your-system-with-chaos-engineering

Bruce M. Wong
Stitch Fix Eng - keeper of chaos, breaker of systems :: formerly practiced at Twilio, Netflix, Adobe
https://fr.slideshare.net/BruceWong3/the-journey-of-chaos-engineering-begins-with-a-single-step

Chaos Lemur
Cousin to Chaos Monkey, but built for Pivotal Cloud Foundry.

Russ Miles
Chaos Engineering Officer (CEO) of ChaosIQ.io
https://fr.slideshare.net/russmiles/chaos-engineering-101-

James Burns
Software Architect at Stitch Fix; former Tech Lead at Twilio

Sergiu Bodiu
Passionate IT craftsmanship #blitzscaling, avid student of life, autodidact, #cloudnative evangelist.
https://fr.slideshare.net/sbodiu/from-resilient-to-antifragile-chaos-engineering-primer-devseccon

Kube-monkey
An implementation of Netflix's Chaos Monkey for Kubernetes clusters.

Chaos Toolkit (by ChaosIQ)
Free, open source project that enables you to create and apply Chaos Experiments to various types of infrastructure, platforms and applications.

Chaos Monkey
The first tool developed by Netflix. It randomly selects instances in the production environment and deliberately takes them out of service.
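
The following minimal Python sketch shows the random-termination idea behind Chaos Monkey. It runs against a hypothetical in-memory fleet. `Instance`, `Fleet` and `unleash_monkey` are illustrative names, not Chaos Monkey's code or any cloud SDK, and the sketch does not model the real tool's scheduling or opt-in rules.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    # Hypothetical stand-in for a production VM; not a real cloud API object.
    instance_id: str
    group: str
    running: bool = True


@dataclass
class Fleet:
    instances: list[Instance] = field(default_factory=list)

    def running_in(self, group: str) -> list[Instance]:
        """Return the instances of `group` that are still in service."""
        return [i for i in self.instances if i.group == group and i.running]


def unleash_monkey(fleet: Fleet, group: str, rng: random.Random) -> Instance | None:
    """Pick one running instance of `group` at random and take it out of service."""
    candidates = fleet.running_in(group)
    if not candidates:
        return None  # nothing left to break in this group
    victim = rng.choice(candidates)
    victim.running = False  # simulated termination
    return victim


# Usage: stop one random instance of the hypothetical "api" group.
fleet = Fleet([Instance(f"i-{n}", "api") for n in range(3)])
victim = unleash_monkey(fleet, "api", random.Random(7))
print(victim.instance_id, "terminated;", len(fleet.running_in("api")), "still running")
```

Passing a seeded `random.Random` makes a run reproducible, which helps when an experiment has to be replayed.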

ChaosIQ
Platform for your teams to apply Chaos Engineering to their rapidly evolving, business-critical Cloud Native microservices and platforms, so they can build confidence that those systems won't fail your users.

Paul Harris
Staff Software Engineer

Chaos Kong
King of the Gorillas: drops a full Amazon Region.

James Hamilton
AWS VP, Ex-Microsoft Research
About testing in production, 2007

Chaos Gopher
Chaos testing/engineering in Go

Simian Army

Charles Torre
Chaos Engineering, Programming, Technical Leadership
https://msdevshow.com/2016/11/chaos-engineering-with-charles-torre/

Gremlin
Framework to safely, securely, and easily simulate real outages with an ever-growing library of attacks.

Matt Fornaciari
CTO - avid practitioner of #chaosengineering; formerly at Salesforce and Amazon
Chaos Engineering: the history, principles, and practice

Hosting and Cloud Companies

Experiment Tools & Framework

Matthew Campbell
Ex-General Purpose GO Hacker at DigitalOcean; Cofounder at Loom Network
https://www.slideshare.net/MatthewCampbell7/presentationchaosmonkey

Heather Nakama
Software Engineer at Microsoft
https://azure.microsoft.com/en-us/blog/inside-azure-search-chaos-engineering/

Latency Monkey
Introduces delays at the communication layer to test how well a system tolerates degraded performance of an external component it depends on, up to simulating a complete cut (an infinite delay), without having to ask the partner concerned to shut down its service.
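
The Latency Monkey description can be illustrated with a small Python sketch of latency injection around a dependency call. `call_with_latency`, `DependencyTimeout` and `get_recommendations` are hypothetical names. The sketch also assumes that an infinite delay is modeled by waiting out the caller's timeout rather than blocking forever. Its purpose is to check that the caller falls back instead of hanging.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

INFINITE_DELAY = float("inf")  # models a complete cut of the dependency


class DependencyTimeout(Exception):
    """Raised when a dependency does not answer within the caller's budget."""


def call_with_latency(dependency: Callable[[], T], injected_delay_s: float, timeout_s: float) -> T:
    """Call `dependency` after an injected delay, failing like a client-side timeout."""
    if injected_delay_s >= timeout_s:
        # Too slow (or cut entirely): wait out the budget, then give up.
        time.sleep(timeout_s)
        raise DependencyTimeout(f"no answer within {timeout_s}s")
    time.sleep(injected_delay_s)
    return dependency()


def get_recommendations(fetch: Callable[[], List[str]], injected_delay_s: float,
                        timeout_s: float = 0.05) -> List[str]:
    """Hypothetical caller: degrade to a static fallback instead of failing."""
    try:
        return call_with_latency(fetch, injected_delay_s, timeout_s)
    except DependencyTimeout:
        return ["fallback"]


print(get_recommendations(lambda: ["a", "b"], injected_delay_s=0.01))            # real answer
print(get_recommendations(lambda: ["a", "b"], injected_delay_s=INFINITE_DELAY))  # fallback
```

Waiting out the caller's budget instead of sleeping forever keeps the simulated cut bounded. It still exercises the same timeout path that a real outage would.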

Matt Jacobs
Engineering; previously at Netflix
https://fr.slideshare.net/MattJacobs11/using-hystrix-to-build-resilient-distributed-systems-58836753

Pumba
Chaos testing and network emulation for Docker containers (and clusters)

Alexei Ledenev
Chief Research Officer at Codefresh
https://fr.slideshare.net/alexLM/chaos-engineering-for-docker

Kolton Andrus
Co-founder & CEO
https://fr.slideshare.net/KoltonAndrus/breaking-things-on-purpose-with-gremlin

Disaster Recovery Program (DiRT)
Google runs an annual, company-wide, multi-day Disaster Recovery Testing event (DiRT). Its objective is to ensure that Google's services and internal business operations continue to run following a disaster.

Kripa Krishnan
Director, Cloud Ops & Site Reliability Engineering; Google's Queen of Chaos

Hailstorm
Drives integration tests and simulates peak load during off-peak times.

uDestroy
Intentionally breaks things so we can get better at handling unexpected failures.

FIT: Failure Injection Testing
Platform that simplifies the creation of failure within our ecosystem, with a greater degree of precision over what we fail and whom we impact. FIT also allows us to propagate our failures across the entirety of Netflix in a consistent and controlled manner.

Tammy Butow
Site Reliability Engineering Manager; now at Gremlin Inc.

Nemesis
Simulates error conditions using "disruptors".

ChAP: Chaos Automation Platform
ChAP enables engineering teams to run Chaos Engineering experiments on live traffic in production. The goal is to build confidence that their service will degrade gracefully when non-critical downstream services fail.
https://arxiv.org/pdf/1702.05849.pdf
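
This map does not describe how ChAP is implemented. The following Python sketch therefore only illustrates one common way to run a ChAP-style experiment on live traffic, under these assumptions:
- A small, deterministic slice of users is split into a control group and an experiment group.
- The downstream failure is injected only for the experiment group.
- The run is stopped if the experiment group's success rate drops too far below the control group's.

`assign_cohort`, `Cohort` and `should_abort` are hypothetical names, and the 5% threshold is an arbitrary example value.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Cohort:
    requests: int = 0
    successes: int = 0

    def record(self, ok: bool) -> None:
        self.requests += 1
        self.successes += int(ok)

    @property
    def success_rate(self) -> float:
        return self.successes / self.requests if self.requests else 1.0


def assign_cohort(user_id: str, slice_pct: float) -> str:
    """Deterministically route `slice_pct` percent of users into the experiment.

    Half of the slice is the control group, half the experiment group (where the
    failure is injected); everyone else gets normal, untouched traffic.
    """
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000 / 100  # 0.00..99.99
    if bucket < slice_pct / 2:
        return "control"
    if bucket < slice_pct:
        return "experiment"
    return "untouched"


def should_abort(control: Cohort, experiment: Cohort, max_drop: float = 0.05) -> bool:
    """Stop the experiment if the injected failure hurts users more than allowed."""
    return control.success_rate - experiment.success_rate > max_drop


control, experiment = Cohort(), Cohort()
for n in range(20):
    control.record(True)
    experiment.record(n % 10 != 0)  # pretend 1 request in 10 fails without a fallback
print(control.success_rate, experiment.success_rate, should_abort(control, experiment))
```

Hashing the user id keeps each user in the same group for the whole run. Keeping the slice small limits how many real users an experiment can affect.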

Search Chaos Monkey
Search Chaos Monkey has been instrumental in providing a deterministic framework for finding exceptional failures and driving them to resolution as low-impact errors with planned, automated solutions.

Thomissa Comellas
SRE causing chaos at Dropbox; previously at StanfordEng, TeslaMotors

Nora Jones
Senior Chaos Engineer at Netflix, formerly at Jet. Co-author of Chaos Engineering (O'Reilly, 2017)
https://fr.slideshare.net/InfoQ/choose-your-own-adventure-chaos-engineering

Shay Holmes
Sr. Director, Engineering Services

DRT: Disaster Recovery Test

Experiments In Production

Storm
To prepare for the loss of a data center, Facebook regularly tests the resilience of its infrastructure to extreme events. Known as the Storm Project, the program simulates massive data center failures.

Jay Parikh
Vice president and head of engineering and infrastructure

Suresh Visvanathan
Nemesis Architect & Lead

LinkedOut
Framework and tooling to test how the user experience will degrade in different failure scenarios associated with downstream calls. It provides a seamless way to simulate failures across our application stack with minimal effort.

Waterbear
"Application resilience" as a service

Lorin Hochstein
Putting the engineering in computer science and the science in software eng. Academic refugee. Chaos engineer, Netflix

FireDrill
Provides an automated, systematic way to trigger/simulate infrastructure failure in production, with the goal of helping build applications that are resistant to these failures.

Disaster Recovery Testing

End-User Companies

Pavlos Ratis
Graduate Software Engineering MSc student at the University of Glasgow; Open Source Developer

Too big to test: Breaking a production brokerage platform without causing financial devastation *
https://cdn.oreillystatic.com/en/assets/1/event/124/Too%20big%20to%20test_%20Breaking%20a%20production%20brokerage%20platform%20without%20causing%20financial%20devastation%20Presentation%202.pdf

Simoorg
Open Source Failure Induction Framework

Free eBook at O'Reilly

Bhaskaran Devaraj
Senior Director, Site Reliability Engineering at LinkedIn

Aaron P Blohowiak
Co-Author of O'Reilly's "Chaos Engineering". Works on distributed system reliability and design @ Netflix.
O'Reilly Velocity San Jose 2017: Precision Chaos

Training

Christophe Rochefolle
Experienced IT executive providing technology & organization to improve quality & agility of IT systems; Chaos Engineering fan
https://fr.slideshare.net/madrockriss/paris-chaos-engineering-meetup-1

Ariel Tseitlin
Investor, entrepreneur, and accomplished technology executive; former Cloud Director at Netflix
https://fr.slideshare.net/atseitlin/aws-reinvent-2012-chaos-monkey-the-netflix-simian-army

Jesse Robbins
Former Amazon « Master of disaster »; OrionLabs Founder and CEO; creator of Gameday AWS; former fireman
https://fr.slideshare.net/jesserobbins/ameday-creating-resiliency-through-destruction

Kyle Parrish
Innovative, multi-dimensional leader focused on Technology Risk and Information Security in Financial Services

Map based mainly on: https://github.com/dastergon/awesome-chaos-engineering
Map created with the help of the Chaos Engineering Slack team and the Chaos Community.
Something or someone missing? Don't want to be on the map? Please send me your feedback.

Benjamin Gakic
SRE Architect; Gameday IT & #ChaosEngineering
https://fr.slideshare.net/madrockriss/paris-chaos-engineering-meetup-1

David Halsey
VP, Performance Engineering, Fidelity Investments

Gremlin Fault Injection Tool

Greg Orzell
Cloud Distributed Systems Architecture Consulting at Crispy Mountain GmbH; founded the Simian Army

Ali Basiri
Senior Software Engineer and a founding member of the Traffic & Chaos team at Netflix

Luke Koweski
Senior Software Engineer

Wreaking Havoc

Days Of Chaos
https://www.slideshare.net/devopsrex/days-of-chaos-le-dveloppement-de-la-culture-devops-chez-voyagessncfcom-laide-de-la-gamification-80396202

https://fr.slideshare.net/AmazonWebServices/ent101-embracing-the-cloud-final

Yury Izrailevsky
VP, Cloud Computing and Platform Engineering
https://fr.slideshare.net/InfoQ/chaos-kong-endowing-netflix-with-antifragility

Monké Go
Not a monkey, but an automation platform to run monkeys during integration testing.

Casey Rosenthal
Philosopher. Traffic and Chaos Engineering Manager

Gameday AWS at Veolia Water Technologies
https://www.slideshare.net/D2SI/aws-summit-paris-2017-gameday-veolia

Days of Chaos
Inspired by AWS GameDays, teams volunteered their applications for a Day of Chaos to test their resilience. Every 30 minutes, operators simulated failures in pre-production, and teams earned points based on detections, diagnoses and resolutions. This type of gamified event helps introduce development teams to the concept of resilience.

Gameday AWS
Interactive, six-part series to get hands-on cloud computing experience
https://fr.slideshare.net/AmazonWebServices/game-days-crash-test-your-application-and-your-team

Principles of Chaos Engineering

Gameday by DiUS
https://fr.slideshare.net/DiUSComputing/gameday-achieving-resilience-through-chaos-engineering
The Ultimate Resources to prepare your Gameday by DiUS

Gameday at TIAD by D2SI
https://fr.slideshare.net/TIADParis/tiad-2016-gameday-aws

https://fr.slideshare.net/madrockriss/paris-chaos-engineering-meetup-1

Minions Bestiary

Chaos Monkey
Randomly selects instances in the production environment and deliberately takes them out of service.

Processkiller Monkey
Cousin of Chaos Monkey, a little more definitive...

Latency Monkey

Fulldisk Monkey
Fills a disk to test the resilience of an application, especially its logging.
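
Filling a real disk is disruptive, so this Python sketch simulates the Fulldisk Monkey scenario instead. `QuotaStream` is a hypothetical stand-in for a log file on a nearly full disk; it raises `ENOSPC` once a character budget is used up. `ResilientLog` shows one way an application's logging can keep the service running by counting dropped lines instead of crashing. Both classes are illustrative assumptions, not part of any monkey's implementation.

```python
import errno
import io


class QuotaStream(io.StringIO):
    """Hypothetical log file on a nearly full disk: writes fail past `quota_chars`."""

    def __init__(self, quota_chars: int) -> None:
        super().__init__()
        self.quota_chars = quota_chars

    def write(self, s: str) -> int:
        if self.tell() + len(s) > self.quota_chars:
            raise OSError(errno.ENOSPC, "No space left on device")
        return super().write(s)


class ResilientLog:
    """Logging that degrades (drops lines) instead of taking the service down."""

    def __init__(self, stream: io.TextIOBase) -> None:
        self.stream = stream
        self.dropped = 0

    def log(self, message: str) -> None:
        try:
            self.stream.write(message + "\n")
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise  # only "disk full" is tolerated here
            self.dropped += 1


log = ResilientLog(QuotaStream(quota_chars=20))
for i in range(10):
    log.log(f"req {i}")  # the service keeps handling requests
print(repr(log.stream.getvalue()), "dropped:", log.dropped)
```

Only `ENOSPC` is swallowed. Any other I/O error is re-raised, so unrelated failures stay visible.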

Properties Monkey
Modifies an application's properties to test its resilience.
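
Here is a minimal Python sketch of the Properties Monkey idea. It assumes a hypothetical application whose configuration is a flat dictionary. `perturb` replaces one property with an invalid value. `load_config` shows the resilience being tested: it validates each property and falls back to a safe default. `DEFAULTS`, `perturb` and `load_config` are illustrative names only.

```python
from __future__ import annotations

import random
from typing import Any

# Hypothetical application defaults.
DEFAULTS: dict[str, Any] = {"timeout_ms": 500, "max_retries": 3, "feature_x": False}

BAD_VALUES = [None, -1, "garbage"]  # wrong type or out of range for every property


def perturb(props: dict[str, Any], rng: random.Random) -> dict[str, Any]:
    """Return a copy of `props` with one randomly chosen property corrupted."""
    mutated = dict(props)
    key = rng.choice(sorted(mutated))
    mutated[key] = rng.choice(BAD_VALUES)
    return mutated


def is_valid(value: Any, default: Any) -> bool:
    if type(value) is not type(default):  # exact type check, so True is not an int here
        return False
    if type(value) is int:
        return value >= 0
    return True


def load_config(props: dict[str, Any]) -> dict[str, Any]:
    """Keep valid properties; fall back to the default for missing or invalid ones."""
    return {
        key: props[key] if key in props and is_valid(props[key], default) else default
        for key, default in DEFAULTS.items()
    }


corrupted = perturb(DEFAULTS, random.Random(3))
print(corrupted)
print(load_config(corrupted))  # the application still starts with sane values
```

Every value in `BAD_VALUES` is invalid for every property. A correct loader should therefore always restore the default for the corrupted key.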
