Releases: litmuschaos/litmus
2.0.0-Beta7
Added CRD for Event-tracker (#2812). Signed-off-by: Jonsy13
2.0.0-Beta6
Major Updates
- Added MongoDB go-interface and refactored the database operations and structure to accommodate the test cases easily.
- Support for adding custom container image registry to chaos workflow manifest.
- Enhanced the performance of the analytics APIs with memory caching and added APIs for fetching labels and values for a Prometheus series.
- Added support for reordering the workflow steps by drag and drop, with the changes reflected live in the DAG.
- Enhanced the workflow graph to show other node phases such as Omitted, Skipped, and Error for a better user experience.
- Enhanced the verify and commit page to allow users to do a final review and edit their workflow details before scheduling the workflow.
- Fixed bugs in some user management operations and refactored the teaming APIs to improve performance.
- Enhanced the litmusportal user interface to speed up the onboarding process.
Minor Updates
- Adding support for liveness check of the dependent applications in the agent plane before going active.
- AirGapped support for the pre-defined workflows by moving the fetching logic to the backend.
- Added instance-id label in the chaos workflow manifest to avoid multiple scheduling in the multi-Argo server cluster.
- Added validations for workflow name, GitHub URL, and different probe inputs.
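The instance-id label from the list above can be pictured with a small sketch. It assumes the label key is the one Argo uses for controller instance IDs (the release note does not name the key), and the helper names and the instance ID value are hypothetical; this is an illustration, not the portal's actual code.

```python
import copy

# Assumption: Argo's controller instance-ID label key; the release note
# itself does not name the key that Litmus uses.
INSTANCE_ID_LABEL = "workflows.argoproj.io/controller-instanceid"


def add_instance_id(manifest: dict, instance_id: str) -> dict:
    """Return a copy of a workflow manifest carrying the instance-id label."""
    labelled = copy.deepcopy(manifest)
    labels = labelled.setdefault("metadata", {}).setdefault("labels", {})
    labels[INSTANCE_ID_LABEL] = instance_id
    return labelled


def is_owned_by(manifest: dict, instance_id: str) -> bool:
    """A controller should only pick up workflows labelled with its own ID."""
    labels = manifest.get("metadata", {}).get("labels", {})
    return labels.get(INSTANCE_ID_LABEL) == instance_id


workflow = {"kind": "Workflow", "metadata": {"name": "sample-chaos-workflow"}}
labelled = add_instance_id(workflow, "litmus-agent-1")  # hypothetical ID
print(is_owned_by(labelled, "litmus-agent-1"))  # True
print(is_owned_by(labelled, "other-argo"))      # False
```

With several Argo servers in one cluster, each server would act only on the workflows whose label matches its own ID, so a workflow is scheduled once.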
2.0.0-Beta5
Minor SA fix in eventtracker (namespace) (#2760). Signed-off-by: Raj Das
2.0.0-Beta4
Major Updates
- Fixes the inability to successfully register the agents/targets when litmus portal server is brought up with loadbalancer/nodeport service type
- Makes MyHub source configurable by branch so that latest stable versions of experiments are pulled for custom & predefined workflows
- Updates the chaos operator dependencies on the subscriber to make use of the latest api changes for chaos resources
- Updates the chaos operator, runner & exporter image tunables/ENVs in the subscriber so that the latest stable versions are installed on the targets
- Updates Okteto dev setup instructions to reflect latest image versions and changes in specification (env) as well as instructions
- Updates the chaosengine CRD validation schema for annotation injection in the manifests maintained & installed by the subscriber
Minor Updates
- Improves the icons for revert chaos and workflow scheduling
- Optimizes the teaming code to remove redundant conditions
- Improved styling & background adopted from litmus-ui
2.0.0-Beta3
Litmus 2.0.0-Beta3
Major Updates
- Support for policy-based control of the event tracker: users can define their own policy as a JMESPath query, and the event tracker reacts to application changes based on that policy.
- Enhanced UI for workflow scheduling, giving users the ability to tune annotations, target application details (application namespace, labels, and kind), and probe data from the user interface.
- New UI for workflow visualization for showing information about workflow and nodes in a better way.
- Made the onboarding process easier for users through the new UI.
- Enhanced the homepage to show information like Recent workflow runs, Agent details, and Project details.
- Shifted project switching from a Redux-based technique to a URL-based technique to avoid caching problems.
- Migrated CircleCI to GitHub workflow and enhanced the continuous integration of the project.
- Enhanced the analytics module in terms of UI and computation.
- Enhanced the browse workflows table to show resilience score and the total number of experiments passed for the listed workflows.
- Support role-based access control in the backend for handling authorization for all requests.
- Support for storing scheduled workflow templates and adding some new podtato-head predefined workflow templates
Minor Updates
- Improved the Better Code Hub (BCH) score.
- Optimized the frontend by shifting the resiliency score calculation to the backend.
- Restructured the directory structure for settings in the frontend to modularise the code.
- Support for reinstalling litmus agents by making the litmus-portal-config configmap independent of the subscriber.
- Support for Ingress and LoadBalancer network types for connecting external agents with Litmus Portal. The endpoint for the external agent is generated based on the server's service type.
2.0.0-Beta2
Added beta2 fixes for auth and teaming (#2612). Signed-off-by: Saranya-jena
2.0.0-Beta1
Major Updates
- Support for in-built analytics, where users can connect their data sources and generate dashboard panels.
- Support for Git as a single source of truth for workflow artifacts. This enables users to have their workflows synced between the portal and Git source.
- Introduces the event-tracker microservice to trigger chaos workflows automatically upon change to application images. This feature works in tandem with GitOps frameworks that rollout changes to applications upon manual changes in the Git source or upon image push to registries.
- Support for re-running of existing chaos workflow from the litmus portal.
- Adding a command-line tool called litmusctl to manage litmus portal services. The key role of litmusctl is to connect the external cluster with the litmus server and install the external agents.
- Redesigning the teaming user interface and adding some significant features such as leave project and decline invitation.
- Recreating litmus docs for litmus 2.0.x. For more information, visit https://litmusdocs-beta.netlify.app/
- Integration of Litmus-UI with litmus portal components
- Major directory restructuring of litmus portal’s server for database handlers
Minor Updates
- Changing MongoDB kind from Deployment to StatefulSet
- Adding chaos-exporter as a default external cluster agent for litmusportal
- Refactoring authentication server to accommodate new teaming integration
- Removing some unnecessary inputs from the welcome modal and predefined chaos workflow
2.0.0-Beta0
Fixed default error state for password fields and fixed modal padding…
1.13.8
New Features & Enhancements
- Introduces upgraded pod-cpu-hog & pod-memory-hog experiments that inject stress-ng based chaos stressors into the target container's PID namespace (non-exec model).
- Supports multi-arch images for the chaos-scheduler controller
- Supports CIDR in addition to destination IPs/hostnames in the network chaos experiments
- Refactors the litmus-python repository structure to match the litmus-go & litmus-ansible repos. Introduces a sample python-based pod-delete experiment with the same flow/constructs as its go-equivalent to help establish a common flow for future additions. Also adds a BYOC folder/category to hold non-litmus native experiment patterns.
- Refactors the litmus-ansible repo to remove the stale experiments (which have been migrated and improved in litmus-go). Retains (and improves) samples to help establish a common flow for future additions
- Adds GCP chaos experiments (GCP VM stop, GPD detach) in technical-preview mode
Major Bug Fixes
- Fixes erroneous logs in the chaos-operator seen while attempting to remove the finalizer on the chaosengine
- Fixes a condition where the chaos revert information is present in both the annotations and the status of the chaosresult CR (the inject/revert status is typically maintained/updated as an annotation on the chaosresult before it is updated into the status and cleared/removed from annotations)
- Removes the hardcoded experiment job entrypoint, instead picking it from the ChaosExperiment CR’s .spec.definition.command
- Fixes a scheduler bug that interpreted a minChaosInterval specified in hours (e.g., 1h) as minutes
- Improves the scheduler reconcile to stop flooding/logging every “reconcile” seconds irrespective of the minChaosInterval
- Enables the scheduler to start chaos injection immediately upon application of the ChaosSchedule CR, without waiting for the first minChaosInterval period to elapse, when running in repeat mode with only the minChaosInterval specified
- Handles edge/boundary conditions where the chaos StartTime is behind the CreationTimeStamp of the ChaosSchedule OR the next iteration of chaos as per minChaosInterval is beyond the EndTime
- Adds a check to ignore chaos pods (operator, runner, experiment/helper/probe pods) and blacklist them from being chaos candidates (esp. needed when appinfo.applabel is configured with exclusion patterns such as: !keys OR <key> notin <value>)
- Removes hostIPC, hostNetwork permissions for pod stress chaos experiments
- Fixes an incorrect env key for TOTAL_CHAOS_DURATION in pod-dns experiments
- Fixes a regression introduced in 1.13.6 wherein the experiment expected the parent workloads (deployment, statefulset et al) to carry the labels specified in appinfo.applabel, apart from just the pods, even when .spec.annotationCheck was set to false in the ChaosEngine. Prior to this, the parent workloads needed to have the label only when .spec.annotationCheck was set to true. This has been corrected to match the earlier behavior.
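Several of the scheduler fixes above concern how minChaosInterval and the schedule window are interpreted. The sketch below, written in Python rather than the scheduler's own language, shows one way such logic could look: a duration parser that keeps hours and minutes distinct, and a next-run calculation that fires the first iteration immediately, clamps a StartTime that is earlier than the creation timestamp, and stops once the next iteration would fall beyond EndTime. The function names, the supported units, and the exact clamping behavior are assumptions made for illustration, not the scheduler's actual implementation.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text: str) -> timedelta:
    """Parse strings such as '30m', '1h' or '1h30m' into a timedelta."""
    parts = re.findall(r"(\d+)([smh])", text)
    # Reject anything that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported interval: {text!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


def next_run(created: datetime, interval: timedelta,
             last_run: Optional[datetime] = None,
             start: Optional[datetime] = None,
             end: Optional[datetime] = None) -> Optional[datetime]:
    """Return the next chaos injection time, or None if the window is over."""
    # A StartTime earlier than the creation timestamp is clamped to creation.
    window_start = max(start, created) if start else created
    # The first iteration fires right away instead of waiting one interval.
    candidate = window_start if last_run is None else last_run + interval
    if end is not None and candidate > end:
        return None
    return candidate


created = datetime(2021, 5, 1, 10, 0)
print(parse_interval("1h"))                     # 1:00:00
print(next_run(created, parse_interval("1h")))  # 2021-05-01 10:00:00
```

Keeping the unit attached to the number until the final conversion is what prevents a value such as 1h from being read as one minute.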
Limitations
- Chaos abort (via .spec.engineState set to stop OR via chaosengine deletion) is known to have an issue with the namespace-scoped chaos-operator in 1.13.8, i.e., an operator running with the WATCH_NAMESPACE env set to a specific value and using role permissions. In such cases, the finalizer on the ChaosEngine needs to be removed manually and the resource deleted to ensure the operator functions properly.
This is not necessary for cluster-scoped operators, which are the default mode of usage (WATCH_NAMESPACE env set to an empty string to cover all namespaces, leveraging clusterrole permissions).
The fix for correcting the behavior of namespace-scoped operators will be added in the next patch.
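The manual cleanup described above can be done with a kubectl merge patch that clears the finalizers on the ChaosEngine. The Python sketch below builds that command and runs it only when explicitly asked to; the engine name and namespace are placeholders, and the helper names are hypothetical.

```python
import json
import shlex
import subprocess
from typing import List


def finalizer_removal_command(engine: str, namespace: str) -> List[str]:
    """Build a kubectl command that clears the finalizers on a ChaosEngine."""
    # A JSON merge patch that sets metadata.finalizers to null removes them.
    patch = json.dumps({"metadata": {"finalizers": None}})
    return ["kubectl", "patch", "chaosengine", engine,
            "-n", namespace, "--type", "merge", "-p", patch]


def remove_finalizer(engine: str, namespace: str,
                     dry_run: bool = True) -> List[str]:
    """Print the command; execute it only when dry_run is False."""
    command = finalizer_removal_command(engine, namespace)
    print(shlex.join(command))
    if not dry_run:
        subprocess.run(command, check=True)
    return command


# Placeholder names; substitute the stuck ChaosEngine and its namespace.
remove_finalizer("nginx-chaos", "default")
```

Once the finalizer is cleared, the ChaosEngine can be deleted with kubectl delete chaosengine, using the same name and namespace.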
Installation
kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.13.8.yaml
Verify your installation
- Verify if the chaos operator is running:
  kubectl get pods -n litmus
- Verify if chaos CRDs are installed:
  kubectl get crds | grep chaos
For more details, refer to the Litmus documentation.
1.13.6
New Features & Enhancements
- Supports automated rollback/abort of chaos depending upon predefined conditions (defined in the probes). The probes can now be configured with a StopOnFailure property set to true or false to control the execution flow of the experiment.
- Enhances the ChaosResult status schema to provide details of (a) the target resource impacted and (b) the success of the chaos revert operation.
- Introduces additional labels for the “interleaved” chaos metrics (litmus_awaited_experiments & litmus_experiment_verdict) to indicate workflow name & chaos injection timestamp. This is expected to help in the construction of more meaningful dashboards to track app behavior under chaos.
- Adds the golang chaoslib and experiment logic for docker-service-kill (from ansible)
- Introduces the tech-preview of a new category (aws-ssm) of chaos experiments that can inject common resource and network chaos in EC2 instances (which may be part of a kubernetes cluster or standalone/vanilla instances).
- Introduces the tech-preview of refactored pod-cpu-hog & pod-memory-hog chaos experiments that can inject resource chaos on target apps externally (non-exec mode) via cgroup operations.
- Improves/dockerizes the build process for most components (removes vendor packages stored on the repo and migrates to github workflows)
- Reduces the size of the experiment (go-runner) image by creating a single chaos helper component that takes specific chaos operations as flags
- Extends the StatusCheckTimeout property to the helper pods (earlier releases had this only for pre/post chaos checks), thereby allowing flexible evaluation of application availability/readiness during the chaos
- Adds a new event for “Abort” on the ChaosResult
- Increases coverage in the commit-based e2e runs on the litmus-go repo with the addition of node chaos tests
- Adds a new helm chart for kube-aws (chaos experiment bundle) in the litmus-helm repository.
- Enhances the litmus-sdk to (a) create a highly generic experiment scaffolding that can trigger and kill chaos via shell commands passed as environment variables (a change from the earlier pod-delete sample) and (b) push all non-code files (CR yamls) into a dedicated directory that can be directly copied/committed to the chaos-charts repo.
- Cuts the first tagged release on the test-tools repository and sets up downloadable artifacts for the dependent chaos utils (nsutil, pauseutil, promql, dns-interceptor).
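The StopOnFailure behavior from the list above can be pictured as a simple control-flow rule: a failing probe with the property set aborts the experiment, while a failing probe without it only affects the final verdict. The Python sketch below illustrates that reading; the Probe type, the function name, and the outcome strings are hypothetical and are not Litmus APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Probe:
    name: str
    check: Callable[[], bool]      # returns True when the probe passes
    stop_on_failure: bool = False  # mirrors the StopOnFailure property


def run_probes(probes: List[Probe]) -> Tuple[str, List[str]]:
    """Run probes in order; return (outcome, names of failed probes)."""
    failed: List[str] = []
    for probe in probes:
        if probe.check():
            continue
        failed.append(probe.name)
        if probe.stop_on_failure:
            # Abort immediately; remaining probes are not evaluated.
            return "Aborted", failed
    return ("Fail" if failed else "Pass"), failed


probes = [
    Probe("frontend-reachable", lambda: True),
    Probe("error-rate-ok", lambda: False, stop_on_failure=True),
    Probe("latency-ok", lambda: True),
]
print(run_probes(probes))  # ('Aborted', ['error-rate-ok'])
```

In this reading, setting the property to false lets the experiment run to completion and report the failure at the end, whereas true stops the run at the first failing probe that carries the flag.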
Major Bug Fixes
- Adds missing environment variables for kill sequence and pod affected percentage in the kafka-broker-pod-failure experiment
- Fixes the missing environment variable for defining the spoof map within the dns-spoof experiment.
- Fixes the ChaosScheduler to work with the latest versions of the chaos-operator and updates documentation with missing mandatory properties in the .spec.engineTemplate
Installation
kubectl apply -f https://litmuschaos.github.io/litmus/litmus-operator-v1.13.6.yaml
Verify your installation
- Verify if the chaos operator is running:
  kubectl get pods -n litmus
- Verify if chaos CRDs are installed:
  kubectl get crds | grep chaos
For more details, refer to the Litmus documentation.