Site Reliability Engineering (SRE) Foundation℠ BLUEPRINT

Site Reliability Engineering (SRE) is a discipline and a role that incorporates aspects of software engineering and applies them to infrastructure and operations problems to create ultra scalable and highly reliable distributed software systems.

Toil Reduction
Reduce Non-Value Add Work using Tooling, Automation, VSM, and Platform Engineering

Work Sharing
Address Technical Debt in Small Increments, Manage Load Percentage for Ops, Dev, and On-Call Work

SLAs/SLOs/SLIs
Metrics such as Availability, Latency, and Response Time with Error Budgets

Deployments
Progressive Deployments using Green/Blue, A/B, Canary Deployments, Automation Scripts, Testing and Monitoring

Measurements
Observability, Monitoring, Telemetry, Instrumentation, and AIOps

Performance Management
Monitoring, APM, Capacity Testing, Auto-Scaling, and AIOps

Culture
Reliability @ Scale, Shift-Left “Wisdom of Production”, Learn from Failure, and Continuous Learning

Anti-Fragility
Improve Resilience using Fire Drills, Chaos Engineering, Security and Automation

Incident Management
Emergency Response, On-Call, and Blameless Retrospectives
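
The blueprint pairs availability metrics with error budgets but does not show the arithmetic. As a minimal sketch, assuming a request-based availability SLI (good requests divided by total requests) measured over a single SLO window, the budget could be tracked as below. The function name `error_budget`, its parameters, and the returned keys are hypothetical and chosen for illustration; they do not come from the blueprint.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarize error-budget consumption for one SLO window.

    Assumption: the SLI is request-based availability, and slo_target is a
    fraction such as 0.999 (i.e. "three nines").
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests < 0 or not 0 <= failed_requests <= total_requests:
        raise ValueError("need 0 <= failed_requests <= total_requests")

    # The error budget is the share of requests the SLO allows to fail.
    allowed_failures = (1.0 - slo_target) * total_requests

    # Measured SLI; an empty window is treated as fully available.
    availability = 1.0 if total_requests == 0 else 1.0 - failed_requests / total_requests

    # Fraction of the budget already spent (0.0 when nothing has been served yet).
    consumed = failed_requests / allowed_failures if allowed_failures > 0 else 0.0

    return {
        "availability": availability,
        "allowed_failures": allowed_failures,
        "budget_consumed": consumed,
        "budget_remaining": 1.0 - consumed,
        "exhausted": failed_requests > allowed_failures,
    }


if __name__ == "__main__":
    # Example window: 1,000,000 requests, 250 failures, 99.9% target.
    report = error_budget(0.999, 1_000_000, 250)
    print(f"budget consumed: {report['budget_consumed']:.0%}")
```

A team might use the `exhausted` flag as one input when deciding whether to pause risky deployments in favor of reliability work; the exact policy is a local decision and is not specified by the blueprint.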