Awesome Chaos Engineering

A curated list of awesome Chaos Engineering resources.

What is Chaos Engineering?

Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production. - Principles Of Chaos Engineering website.

Contents

Culture

Books

Education

Notable Tools

  • Chaos Monkey - A resiliency tool that helps applications tolerate random instance failures.
  • orchestrator - MySQL replication topology management and HA.
  • kube-monkey - An implementation of Netflix's Chaos Monkey for Kubernetes clusters.
  • Gremlin Inc. - Failure as a Service.
  • Chaos Toolkit - A chaos engineering toolkit to help you build confidence in your software system.
  • steadybit - A Chaos Engineering platform (SaaS or On-Prem) with auto discovery features, different attack types, user management, and more.
  • PowerfulSeal - Adds chaos to your Kubernetes clusters, so that you can detect problems in your systems as early as possible. It kills targeted pods and takes VMs up and down.
  • drax - DC/OS Resilience Automated Xenodiagnosis tool. It helps to test DC/OS deployments by applying a Chaos Monkey-inspired, proactive and invasive testing approach.
  • Wiremock - API mocking (Service Virtualization) which enables modeling real-world faults and delays.
  • MockLab - API mocking (Service Virtualization) as a service which enables modeling real-world faults and delays.
  • Pod-Reaper - A rules-based pod killing container. Pod-Reaper was designed to kill pods that meet specific conditions that can be used for Chaos testing in Kubernetes.
  • Muxy - A chaos testing tool for simulating real-world distributed system failures.
  • Toxiproxy - A TCP proxy to simulate network and system conditions for chaos and resiliency testing.
  • Chaos engineering for Docker:
    • Pumba - Chaos testing and network emulation for Docker containers (and clusters).
    • Blockade - Docker-based utility for testing network failures and partitions in distributed applications.
  • chaos-lambda - Randomly terminate ASG instances during business hours.
  • Namazu - Programmable fuzzy scheduler for testing distributed systems.
  • Chaos Monkey for Spring Boot - Injects latencies, exceptions, and terminations into Spring Boot applications.
  • Byte-Monkey - Bytecode-level fault injection for the JVM. It works by instrumenting application code on the fly to deliberately introduce faults like exceptions and latency.
  • GomJabbar - ChaosMonkey for your private cloud.
  • Turbulence - Tool focused on BOSH environments capable of stressing VMs, manipulating network traffic, and more. It is very similar to Gremlin.
  • chaosblade - An Easy to Use and Powerful Chaos Engineering Toolkit.
  • KubeInvaders - Gamified Chaos engineering tool for Kubernetes Clusters.
  • Cthulhu - Chaos Engineering tool that helps evaluate the resiliency of microservice systems by simulating various disaster scenarios against a target infrastructure in a data-driven manner.
  • VMware Mangle - Orchestrating Chaos Engineering.
  • Byteman - A Swiss Army Knife for Byte Code Manipulation.
  • Litmus - Framework for Kubernetes environments that enables users to run test suites, capture logs, generate reports and perform chaos tests.
  • Perses - A project to cause (controlled) destruction to a JVM application.
  • ChaosKube - chaoskube periodically kills random pods in your Kubernetes cluster.
  • Chaos Mesh - Chaos Mesh is a cloud-native Chaos Engineering platform that orchestrates chaos on Kubernetes environments.
  • failure-lambda - A small Node module for injecting failure into AWS Lambda using latency, exception, statuscode or diskspace.
  • aws-chaos-scripts - Collection of Python scripts to run failure injection on AWS infrastructure.
  • chaos-ssm-documents - Collection of AWS SSM Documents to perform Chaos Engineering experiments.
  • aws-lambda-chaos-injection - A library for injecting chaos into AWS Lambda. It offers simple Python decorators for delay, exception and statusCode injection, and a class to add delay to any 3rd party dependencies.
  • chaos-dingo - A tool to mess with Azure services using the Azure NodeJS SDK.
  • Chaos HTTP Proxy - Introduce failures into HTTP requests via a proxy server.
  • Chaos Lemur - A self-hostable application to randomly destroy virtual machines in a BOSH-managed environment.
  • Simoorg - LinkedIn’s very own failure inducer framework.
  • react-chaos - A chaos engineering tool for your React apps.
  • vue-chaos - A chaos engineering tool for your Vue apps.
  • Chaos Engine - A tool designed to intermittently destroy or degrade application resources running in cloud-based infrastructure. Documentation
  • kubedoom - Kill Kubernetes pods by playing Id's DOOM.
  • kubethanos - Kills half of your randomly selected Kubernetes pods.
  • go-fault - Fault injection middleware in Go.
  • Proofdock's Chaos Engineering Platform - A chaos engineering platform that seamlessly integrates in Azure DevOps and has a focus on the Azure cloud platform.
  • Pystol - Pystol is a fault injection platform allowing users to execute fault injection Actions in cloud-native environments in a controlled and prescribed way.
  • AWSSSMChaosRunner - Amazon's light-weight open-source library for chaos engineering on AWS. It can be used for EC2, ECS (with EC2 launch type) and Fargate.
  • Kraken - Chaos and resiliency testing tool for Kubernetes and OpenShift.
  • kube-burner - A tool aimed at stressing Kubernetes clusters by creating or deleting a high quantity of objects.
  • Chaos Experimentation Framework - An extensible platform for infrastructure management including Chaos Engineering.
  • NetHavoc - A Chaos Engineering Tool for Linux, K8s, Windows, PCF, Cloud, and Containers for injecting Resource, Infrastructure, Network, and Application failures.
  • gorm-sqlchaos - A runtime SQL manipulator for your Golang applications based on gorm.
  • Chaos Frontend Toolkit - A set of tools to apply Chaos Engineering to frontend.
  • Mitigant - The Continuous Security Verification Platform, which enables confidence in cloud security posture by leveraging security chaos engineering.
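
Several of the tools above inject latency and exceptions at the application level, some through decorators. As a rough illustration of that idea only, here is a minimal Python sketch. It is not the API of any tool listed here; the names `inject_fault`, `latency_s`, `error_rate`, and `get_price` are hypothetical and chosen for this example.

```python
import functools
import random
import time


def inject_fault(latency_s=0.0, error_rate=0.0, exception=RuntimeError, rng=random):
    """Hypothetical decorator: delay every call, then fail a fraction of calls.

    latency_s  -- seconds to sleep before each call (assumed parameter name)
    error_rate -- probability in [0, 1] of raising instead of calling through
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if latency_s > 0:
                time.sleep(latency_s)  # injected latency
            if rng.random() < error_rate:
                raise exception("injected fault")  # injected failure
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Illustrative target function: a stand-in for a call to a real dependency.
@inject_fault(latency_s=0.01, error_rate=0.0)
def get_price(item):
    return {"apple": 3}.get(item, 0)
```

With `error_rate=0.0` the wrapped function only becomes slower; raising it toward `1.0` makes a growing share of calls fail, which is one way to check that callers handle timeouts and errors. A real experiment would normally also limit where and when such faults are enabled.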

Retired tools

  • The Simian Army - A suite of tools for keeping your cloud operating in top form.
  • ChaoSlingr - Introducing Security Chaos Engineering. ChaoSlingr focuses primarily on the experimentation on AWS Infrastructure to proactively instrument system security failure through experimentation.

Cloud Services

Papers

Gamedays

Blogs & Newsletters

Podcasts

  • Break Things On Purpose - Monthly podcast about Chaos Engineering presented by Gremlin Inc. Also available on Spotify, Google Play, and Stitcher.

Conferences & Meetups

Forums

Contributing

Please take a look at the contribution guidelines first. Contributions are always welcome!