SRE Detailed With Google

This document outlines the training plan for a Site Reliability Engineering course on Google Cloud Platform (GCP). The course will cover concepts of site reliability engineering, implementing SRE practices, Linux basics, Bash scripting, Docker, Kubernetes, and GCP services and infrastructure. Key topics include building reliable systems, error budgets, service level objectives, Linux commands, Bash scripting, Docker architecture and images, GCP networking, compute, storage, databases and autoscaling services. Hands-on assignments include automating system tasks with scripts and networking/security scripts.

Uploaded by

likhitha.liki27
Copyright
© All Rights Reserved

Site Reliability Engineering on GCP

Training Plan
Course Duration

● TBD (to be shared after finalizing the content).

Prerequisites

● A foundational understanding of IT infrastructure

● Basic knowledge of the Software Development Life Cycle

● Basic knowledge of production deployment

Learning outcomes

● Learn the concepts of Site Reliability Engineering.

● Learn how to implement SRE practices.

Course Overview - Docker & Kubernetes


Environment setup

● GCP connectivity
● Prometheus and Grafana
● Dynatrace, Datadog
● ELK
● Linux on Google Compute Engine
● GitHub, Jenkins

Course Content

Introduction to SRE
o Part 1: Introduction to SRE
 What is SRE?
 The history of SRE and its development
 The principles of SRE
 Differences between SRE and traditional operations roles
 The benefits of implementing SRE
 SRE teams and organizational structure
 The role of SRE in DevOps
o Part 2: Building Reliable Systems
 Define reliability by embracing risk
 Measure reliability through SLIs, SLOs, and error budgets
 Lab Activity: Define SLx
 Lab Activity: Build Error Budget Policy
 Reliability concepts and metrics
 Understanding the "error budget" concept
 Setting service level objectives (SLOs)
 How to measure and report on reliability
 Designing reliable systems
 Building resiliency into systems
 Managing risk and failure modes
 Scaling systems for growth
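
As a rough illustration of how an SLO translates into an error budget, the sketch below assumes a request-based availability SLI (good requests divided by total requests over a window). The function names and the sample numbers are hypothetical and are not part of the course material.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of requests that may fail in the window without breaching the SLO."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent; a negative value means the SLO is breached."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # A 100% target leaves no budget at all: any failure breaches it.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / budget


if __name__ == "__main__":
    # Hypothetical window: 99.9% SLO, 1,000,000 requests, 250 failed requests.
    slo, total, failed = 0.999, 1_000_000, 250
    print(f"Allowed failures: {error_budget(slo, total):.0f}")
    print(f"Budget remaining: {budget_remaining(slo, total, failed):.1%}")
```

An error budget policy would then state what happens as the remaining fraction approaches zero, for example pausing feature releases until reliability work restores the budget.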

Linux Basics
o Linux Basics
o Linux Directory Structure
o Linux Basic Commands
o clear
o pwd
o cd
o echo
o ls
o history
o whoami
o sudo su
o Copy, Remove, Move and Time Commands
o touch and cat
o watch
o env
o diff and grep commands
o Head, tail, sort and more commands
o zip and tar
o tr and wc commands
o Disk utilities like free, fdisk, df and du commands
o Getting help from the command-line interface
o w, who, hostname, hostnamectl and uname commands
o Search for files and directories using find and locate commands
o top command and an explanation of its output
o User and group management commands
o id
 id -u <user>
 id -g <user>
o sudo useradd <user>
o sudo passwd <user>
o sudo userdel <user>
o sudo groupadd <group>
o sudo groupdel <group>
o sed, awk, vmstat and netstat commands
o vnstat command
o cut command
o Merge multiple files using paste command
o Connect and Manage remote machine using SSH
o Changing files and directory permissions
o tar and zip commands
o Scheduling future jobs using crontab
o PATH environment variable
o Curl
o short tutorial on ssh
o short tutorial on vi text editor
o ifconfig, ip, netstat, nslookup
o short tutorial on apt-get and yum
Bash Scripting

o 1. Introduction to Bash Scripting

 Understanding the Bash Shell


 Basics of Shell Scripting
 Advantages and Use Cases of Bash Scripts

o 2. Getting Started with Bash

 Launching the Bash Shell


 Basic Commands and Navigation
 Working with Files and Directories

o 3. Variables and Data Types

 Variable Declaration and Assignment


 Data Types in Bash (Strings, Numbers)
 Environment Variables

o 4. Input and Output

 Reading User Input


 Displaying Output
 Formatting Output

o 5. Conditional Statements

 if, elif, else Statements


 Case Statements
 Logical Operators in Bash

o 6. Looping Constructs

 for Loop
 while Loop
 until Loop
 Loop Control Statements (break, continue)

o 7. Functions in Bash

 Defining and Calling Functions


 Function Parameters and Return Values
 Local and Global Variables in Functions

o 8. Error Handling

 Exit Status and Return Codes


 Handling Errors with trap
 Error Messages and Logging

o 9. String Manipulation

 Concatenation
 Substring Extraction
 Searching and Replacing in Strings

o 10. Arrays in Bash


 Declaring and Initializing Arrays
 Accessing and Modifying Array Elements
 Iterating Through Arrays

o 11. File Operations


 Reading from Files
 Writing to Files
 File Permissions and Ownership
 File Testing Operations

o 12. Advanced Topics


 Regular Expressions in Bash
 Process Control and Background Jobs
 Signals and Traps

o 13. Command-Line Arguments


 Parsing Command-Line Arguments
 Options and Arguments Handling

o 14. External Commands and Scripting Tools


 Running External Commands from Scripts
 Using Awk and Sed in Scripts

o 15. Best Practices and Tips


 Writing Modular and Readable Scripts
 Commenting and Documentation
 Debugging Techniques

Bash Scripting for SRE


Assignment: To be done by engineers with little help from trainer.

1. Automating System Tasks

 System Maintenance Scripts


 Scheduling Jobs with Cron
 Task Automation Best Practices

2. Networking and Security Scripts

 Network Configuration and Monitoring


 Security Auditing with Bash
 Automating Firewall Rules and Security Policies

GCP (Google Cloud Platform)

o Introduction to Google Cloud Platform (GCP)


 Overview of Cloud Computing
 Introduction to GCP
 GCP's Core Infrastructure and Services

o Google Cloud Fundamentals - Core Infrastructure


 Understanding GCP Regions and Zones
 Introduction to Virtual Machines (Instances)
 Networking Basics in GCP
 Overview of Persistent Disks and Images
 Identity and Access Management (IAM) in GCP

o Essential Google Cloud Infrastructure: Foundation


 Billing and Pricing in GCP
 Overview of GCP Networking Services
 Interconnecting Networks in GCP

o Essential Google Cloud Infrastructure: Core Services


 Overview of GCP's Core Services
 GCP Compute Services (Compute Engine)
 GCP Storage Services (Google Cloud Storage)
 GCP Database Services (Cloud SQL)

o Elastic Google Cloud Infrastructure: Scaling and Automation

 Understanding Elastic Infrastructure


 Introduction to Autoscaling in GCP
 Instance Groups and Load Balancing

Docker
 Virtualization and Containerization
 Install Git on Windows as well as on a VM
 Overview of virtualization
 Overview of Hypervisors
 Install docker
 Docker Architecture
 Lab:
 Pull docker images from DockerHub
 Create containers from Docker images
 Access it outside the machine
 Get inside a container
 Get inside an already created container
 Install Java in it
 Commit and create a new image
 Verify that a container created from the new image has Java in it
 Pushing our image to Docker Hub
 Push/pull images to/from an ECR repository
 Export and import images and containers
 Discussion of the many other options available when creating containers

o Understanding the internals


 Namespaces.
 Control Groups
 Filesystem
 COW
 How are the images stored?
 Hands on session
 How are the containers stored?
 Hands on session
 Docker Lifecycle
 Hands-on: understanding the lifecycle
 Discussion on COW
 Discussion on docker info

o Build our own Docker image


 Discuss each of the Dockerfile instructions.
 Difference between RUN, ENTRYPOINT and CMD
 Difference between ADD and COPY
 Lab:
 Create Docker images with
o FROM
o RUN
o ENTRYPOINT
o CMD
o ARG
 Build a Docker image of an existing small application hosted on Tomcat.

o Volume
 What is a volume and why do we need one?
 Different types of Docker volumes
 Lab
o Create a container attached to
 volume and understand the internals
 Named volumes
 Anonymous volumes
 Bind mounts and understand the internals
 tmpfs mounts and understanding the internals
o Understand the various options
 How to distinguish them
 Which volume type to use? When to use them?
 Lab:
 Create a Docker web container connected to a backend MySQL container.
 Crash and restore the MySQL container.

o Discussion on Multi Stage Build


o Docker Networking
 Understanding networking in general
 Docker networking
 Lab
o Use the default bridge
 Understand the internals
o Create a custom bridge
 Understand the internals
o Understand the difference between the default and custom bridge
o Use the host network
 Understand the internals
o Use the none network
 Understand the internals
o Discussion on the overlay network and how it works
 Understand the internals

Assignment: To be done by engineers with little help from trainer.

1. Automating System Tasks

 Create or download a simple multi-container app in Java and Python
 Dockerize it
 Explain the differences in Dockerizing applications written in different languages

Kubernetes

o Kubernetes
 Overview of Orchestration
 Install a three-node cluster (one master and two workers) using kubeadm
 Understand each component being installed
 Kubernetes architecture
 kube-system namespace
 Advantages of Kubernetes
 Pods
 Deep dive into pods
 Labs: Creating our own Pods
o Using imperative approach
o Using declarative approach
 How was the Pod created?
 Hands on Deep dive into Pods
 Namespaces
 Lab: Namespaces
 Labels and selectors
 Lab: Labels and selectors
 ReplicationController and Replica Set
 Lab
o Hands on impact of Replica Set
o Difference between them

o Deployment
 Lab:
o Create a deployment
o Overview of deployment strategies
o Rolling update
o Scale out and scale in
o Update and rollback
o Recreate and OnDelete

o Overview of StatefulSet and Daemonset


o Kubernetes Jobs
o Lab:
 Kubernetes jobs

Scheduling, eviction, affinity, taints and tolerations


 Lab on scheduling a Pod on a Node.
o Secrets
o Lab:
 Kubernetes Secrets

o Kubernetes volumes
 Discussion on the internals
 emptyDir
 hostPath
 PV
 PVC
 Connecting Pods to NFS
o Lab:
 Kubernetes volumes
o ServiceTypes
 ClusterIP
 Lab: ClusterIP
 NodePort
 Lab: NodePort
 LoadBalancer
 Deep dive into how Kubernetes networking works
 DNS Lookup
 Deep dive into how DNS lookup works
 Ingress
o Kubernetes security
 Network policies
 RBAC
Assignment: To be done by engineers with little help from trainer.

1. Assignment: Automating System Tasks


 Orchestrate the application dockerized earlier.
 Implement all standard practices in Orchestration including
o Service discovery
o Load balancing
o HPA
o Ingress through NodePort
o Kubernetes security

GKE

o Introduction to GKE
 1.1 Overview
 1.2 Key Features
 1.3 Benefits

o Getting Started
 2.1 Setting up a GKE Cluster
 2.2 Installing gcloud CLI
 2.3 Configuring kubectl

o Managing GKE Clusters


 3.1 Creating and Deleting Clusters
o 3.4 Cluster Autoscaler

Monitoring
- Monitoring using native GCP
 Introduction to Monitoring on GCP
 1.1 Overview of Monitoring in GCP
 1.2 Importance of Monitoring for Cloud Applications
 1.3 GCP Monitoring Services Overview

o Setting Up Monitoring Infrastructure


 2.1 Creating a Monitoring Workspace
 2.2 Configuring Monitoring Agents
 2.3 Defining Monitoring Policies

o Google Cloud Monitoring Console


 3.1 Navigating the Monitoring Console
 3.2 Overview of Monitoring Dashboards
 3.3 Customizing Dashboards

o Metrics and Alerting


 4.1 Understanding GCP Metrics
 4.2 Creating Custom Metrics
 4.3 Setting Up Alerts and Notifications
 4.4 Best Practices for Alerting

o Logging and Stackdriver Logging


 5.1 Introduction to Stackdriver Logging
 5.2 Configuring Log Exports
 5.3 Querying and Analyzing Logs
 5.4 Creating Log-Based Metrics

o Tracing and Stackdriver Trace


 6.1 Tracing Overview
 6.2 Instrumenting Applications for Tracing
 6.3 Analyzing Trace Data

o Debugging with Cloud Debugger


 7.1 Setting Up Cloud Debugger
 7.2 Inspecting and Debugging Applications in Production

1. Assignment: Automating System Tasks

 Implement the monitoring for the Node and Kubernetes cluster created earlier using
GCP services.

- Monitoring using Prometheus and Grafana


o Introduction to Monitoring with Prometheus and Grafana
 1.1 Overview of Prometheus
 1.2 Introduction to Grafana
 1.3 Benefits of Using Prometheus and Grafana Together

o Installing and Configuring Prometheus


 2.1 Installing Prometheus Server
 2.2 Configuring Prometheus Targets
 2.3 Prometheus Configuration Best Practices

o Installing and Configuring Grafana


 6.1 Installing Grafana Server
 6.2 Connecting Grafana to Prometheus
 6.3 Configuring Data Sources in Grafana
 6.4 Dashboard for monitoring kubernetes cluster
 6.5 Dashboard for monitoring a node

o Prometheus Query Language (PromQL)


 3.1 Basics of PromQL
 3.2 Aggregation and Filtering
 3.3 Common PromQL Functions

o Alerting with Prometheus


 5.1 Setting Up Alerting Rules
 5.2 Alertmanager Configuration
 5.3 Notification Integrations

1. Assignment: Automating System Tasks

 Implement the monitoring for the Node and Kubernetes cluster created earlier using
Prometheus and Grafana.

- Monitoring using Dynatrace


o Introduction to Dynatrace
 1.1 Overview of Dynatrace
 1.2 Key Features and Capabilities
 1.3 Use Cases and Benefits
o Getting Started with Dynatrace
 2.1 Creating a Dynatrace Account
 2.2 Navigating the Dynatrace Web Console
 2.3 Installation and Setup
o Basic Concepts in Dynatrace
 3.1 Dynatrace Entities (Applications, Services, Hosts)
 3.2 Understanding Transactions and Requests
 3.3 User Experience Monitoring
o Monitoring Applications with Dynatrace
 4.1 Instrumenting Applications
 4.2 Auto-Discovery of Application Components
 4.3 Application Performance Monitoring (APM)
o Infrastructure Monitoring
 5.1 Host and Server Monitoring
 5.2 Container Monitoring
 5.3 Cloud Platform Integration with GCP
o Dynatrace Dashboards
 6.1 Creating Custom Dashboards
 6.2 Dashboard Widgets and Metrics
 6.3 Sharing and Collaborating on Dashboards
o Alerting and Notifications
 7.1 Setting Up Alerting Rules
 7.2 Integrating with Notification Channels
 7.3 Best Practices for Effective Alerts
o Dynatrace AI and Automation
 8.1 AI-Driven Problem Detection
 8.2 Root Cause Analysis
 8.3 Automated Remediation
o Synthetic Monitoring with Dynatrace
 9.1 Overview of Synthetic Monitoring
 9.2 Setting Up Synthetic Tests
 9.3 Analyzing Synthetic Monitoring Results

o Real User Monitoring (RUM)


 10.1 Capturing User Interactions
 10.2 Analyzing User Behavior and Performance
 10.3 Improving User Experience

1. Assignment: Automating System Tasks

 Implement the monitoring for the Node and Kubernetes cluster created earlier using
Dynatrace.

- Datadog
o Introduction to Datadog
 Overview of Datadog
 Key Features and Capabilities
 Use Cases and Benefits
o Getting Started with Datadog
 Creating a Datadog Account
 Navigating the Datadog Web Interface
 Installation and Setup

o Basic Concepts in Datadog


 Datadog Agents and Integrations
 Metrics, Traces, and Logs
 Datadog Tags and Attributes

o Monitoring Infrastructure with Datadog


 Host and Server Monitoring
 Container Monitoring
 Cloud Platform Integration with GCP

o Application Performance Monitoring (APM)


 Tracing Applications with Datadog APM
 Instrumentation and Code Profiling
 Identifying Performance Bottlenecks

o Datadog Dashboards
 Creating Custom Dashboards
 Dashboard Widgets and Metrics
 Sharing and Collaborating on Dashboards

o Alerting and Notifications


 Setting Up Alerting Policies
 Integrating with Notification Channels
 Anomaly Detection and Thresholds

o Log Management with Datadog


 Configuring Log Collection
 Querying and Analyzing Logs
 Correlating Logs with Metrics and Traces

o Real User Monitoring (RUM)


 Capturing User Interactions
 Analyzing User Behavior and Performance
 Improving User Experience

1. Assignment: Automating System Tasks

 Implement the monitoring for the Node and Kubernetes cluster created earlier using
Datadog.

Log management
- ELK
o Introduction to ELK Stack
 Overview of ELK Stack
 Key Components: Elasticsearch, Logstash, Kibana
 Use Cases and Benefits
o Installing and Setting Up ELK Stack
 Installing Elasticsearch
 Installing Logstash
 Installing Kibana
 Configuring Basic Settings
o Understanding Elasticsearch
 Introduction to Elasticsearch
 Indexing and Searching Data
 Data Sharding and Replication
 Mapping and Analysis
o Logstash: Data Collection and Processing
 Logstash Overview
 Configuring Logstash Input
 Filter Plugins for Data Processing
 Output Plugins for Data Routing
o Kibana: Data Visualization and Exploration
 Introduction to Kibana
 Connecting Kibana to Elasticsearch
 Creating Index Patterns
 Building Visualizations and Dashboards
o Elasticsearch Query DSL
 Basics of Elasticsearch Query Language
 Querying and Filtering Data
 Aggregations and Metrics
o Advanced Elasticsearch Features
 Full-Text Search and Analyzers
 Highlighting and Fuzzy Search
 Geo-Location Queries
o Log Management and Parsing
 Parsing Log Files with Logstash
 Grok Patterns and Regular Expressions
 Enriching Log Data
o Beats: Lightweight Data Shippers
 Overview of Beats
 Configuring Filebeat for Log Shipping
 Metricbeat for System and Service Metrics

1. Assignment: Automating System Tasks

 Centralize the logging of the Kubernetes cluster on an ELK stack setup.

CI/CD
- Jenkins
o Introduction to Jenkins
 Overview of Jenkins
 Continuous Integration and Continuous Delivery (CI/CD)
 Key Features and Benefits
o Installing and Setting Up Jenkins
 Installing Jenkins
 Configuring Jenkins
 Jenkins Plugins and Integration
o Creating Your First Jenkins Job
 Introduction to Jenkins Jobs
 Setting Up Source Code Repositories
 Configuring Jenkins Freestyle Projects
o Pipeline as Code with Jenkinsfile
 Understanding Jenkinsfile
 Declarative vs. Scripted Pipelines
 Writing and Configuring Jenkins Pipelines
o Version Control Integration
 Integrating Jenkins with Git
 Configuring Jenkins with GitHub
o Build Tools and Build Environments
 Integration with Build Tools (e.g., Maven)
 Configuring Build Environments
 Artifact Management
o Automated Testing with Jenkins
 Setting Up Automated Tests
 Running Unit Tests
 Integration Testing and Code Quality Checks
o Continuous Deployment with Jenkins
 Introduction to Continuous Deployment
 Deploying Applications with Jenkins
 Implementation of CI/CD with Jenkins
o Jenkins Distributed Builds
 Configuring Jenkins Agents
 Master-Slave Architecture
 Cloud-Based Build Agents
o Security in Jenkins
 User Authentication and Authorization
 Role-Based Access Control (RBAC)
o Monitoring and Logging

 Jenkins Metrics and Monitoring
 Centralized Logging with Jenkins
 Health Checks and Notifications

1. Assignment: Automating System Tasks

 Implement CI/CD for the application orchestrated earlier using Jenkins.

- GitOps (GitHub Actions)

o Introduction to GitHub Actions


 Overview of GitHub Actions
 Key Concepts: Workflows, Jobs, and Steps
 Use Cases and Benefits
o Getting Started with GitHub Actions
 Enabling GitHub Actions in a Repository
 Workflow YAML Configuration
 GitHub Actions Triggers
o Creating Basic Workflows
 Simple Workflow Structure
 Running Jobs on Different Operating Systems
 Workflow Syntax and Expressions
o Building and Testing Applications
 Setting Up Build Jobs
 Running Tests with GitHub Actions
 Artifact and Cache Management
o Continuous Integration (CI) with GitHub Actions
 Configuring CI Workflows
 Matrix Builds for Multiple Environments
 Parallelism and Resource Allocation
o Deployments with GitHub Actions

 Deploying to Cloud Services (GCP)
 Custom Deployment Scenarios
o GitHub Actions for Automation
 Automating Code Reviews and Quality Checks
 Automated Release Workflows
 Scheduled Jobs and Cron Triggers

o Custom Actions
 Creating Custom Actions
 Sharing and Using Custom Actions
 Best Practices for Action Development

1. Assignment: Automating System Tasks

 Implement CI/CD for the application orchestrated earlier using GitHub Actions.

- Infrastructure provisioning tool


o Terraform
o Introduction to Infrastructure as Code (IaC) and Terraform
 Overview of Infrastructure as Code
 Introduction to Terraform
 Key Concepts: Providers, Resources, and State
o Setting Up Terraform for GCP
 Installing Terraform
 Configuring GCP Provider
 Authentication and Service Account Setup
o Terraform Configuration Language (HCL)
 Basics of HCL Syntax
 Variables and Data Types
 Modules and Code Organization
o Creating GCP Resources with Terraform
 Provisioning Virtual Machines (VMs)
 Configuring Networks and Firewalls
 Managing GCP Services (e.g., Cloud Storage)
o Managing Compute Instances and Infrastructure
 Creating and Configuring Compute Engine Instances
 Using Instance Templates and Managed Instance Groups
 Configuring Auto-Scaling with Terraform
o Networking in GCP with Terraform
 Creating VPCs and Subnets
 Configuring Load Balancers
 Networking Best Practices with Terraform
o Storage and Database Resources
 Managing Cloud Storage Buckets
 Configuring Cloud SQL Databases
 Terraform and Cloud Spanner
o Identity and Access Management (IAM) with Terraform
 Managing Service Accounts
 Configuring IAM Roles and Permissions
 Terraform Best Practices for Security
o Working with Google Kubernetes Engine (GKE)
 Provisioning GKE Clusters with Terraform
 Configuring Kubernetes Workloads
 Terraform and Helm Charts
o Managing Secrets and Encryption
 Handling Secrets in Terraform
 Encryption and Data Protection
 HashiCorp Vault Integration
o Remote State Management
 Storing Terraform State in GCS (Google Cloud Storage)
 Using Remote Backends for Collaboration
 State Locking and Concurrency

o Terraform Variables and Outputs


 Input Variables and Variable Files
 Output Variables and Data Export
 Dynamic Blocks and Expressions
o Terraform Modules
 Creating and Using Terraform Modules
 Module Input and Output Variables
 Best Practices for Module Design

1. Assignment: Automating System Tasks

Provision a Kubernetes stack using Terraform such that it can be reused in the dev,
testing, staging and production environments.

Configuration Management Tool


- Ansible
o Introduction to Ansible and GCP
 Overview of Ansible
 Key Concepts: Playbooks, Roles, and Modules
 Integrating Ansible with GCP

o Setting Up Ansible for GCP


 Installing Ansible
 Configuring Ansible for GCP
 Authenticating Ansible with GCP

o Ansible Playbooks for GCP


 Writing Your First Ansible Playbook
 Ansible Tasks and Handlers
 Variables and Facts in Ansible

o Dynamic Inventory and GCP


 Configuring Dynamic Inventory for GCP
 Ansible Groups and Hosts
 Customizing Dynamic Inventory Scripts

o Working with Google Kubernetes Engine (GKE)


 Managing GKE Clusters with Ansible
 Deploying Kubernetes Workloads
 Helm Charts and Ansible

o GCP Identity and Access Management (IAM)


 Managing Service Accounts
 Configuring IAM Roles with Ansible
 Ansible Best Practices for Security

o GCP Networking with Ansible


 Creating VPCs and Subnets
 Configuring Load Balancers
 Network Security with Ansible

o Managing GCP Storage Resources


 Working with Cloud Storage Buckets
 Ansible and Cloud SQL Databases
 Automating Data Pipelines with Ansible

o Working with Ansible Vault and GCP Secrets


 Securing Ansible Playbooks with Vault
 Managing Secrets in Ansible
 Encryption and Data Protection

1. Assignment: Automating System Tasks

Implement deployments using Ansible for the following update types:


- Rolling update
- Blue/green update
- Canary update

Python

o Introduction to Python and SRE


 Overview of Python Programming Language
 Role of Python in Site Reliability Engineering
 Python's Importance in Automation and Scripting

o Setting Up Your Python Environment


 Installing Python
 Understanding Virtual Environments
 Popular Python IDEs and Editors

o Python Basics
 Variables and Data Types
 Operators and Expressions
 Control Flow (if statements, loops)

o Functions in Python
 Defining Functions
 Function Parameters and Return Values
 Scope and Lifetime of Variables

o Data Structures in Python


 Lists, Tuples, and Sets
 Dictionaries and Mapping
 Working with Collections

o Error Handling and Exceptions


 Understanding Errors and Exceptions
 Using Try-Except Blocks
 Handling Multiple Exceptions

o File Handling in Python


 Reading and Writing to Files
 Working with Text and Binary Files
 File Handling Best Practices

o Working with Modules and Libraries

 Importing Modules and Packages


 Exploring Standard Python Libraries
 Installing and Using External Libraries with pip

o Introduction to Object-Oriented Programming (OOP)


 Classes and Objects
 Inheritance and Polymorphism
 Encapsulation and Abstraction

o Python for Automation


 Automating Tasks with Python
 Building Simple Scripts
 Scheduling Jobs with Python
1. Assignment: Automating System Tasks

 Collect basic server metrics such as CPU usage, memory usage, disk space, and
network statistics.

 Use external Python libraries such as psutil (for system monitoring) and requests (for
fetching external data).

 Generate a simple report that includes the collected metrics.


 Display information such as CPU percentage, memory usage, disk space, and network
activity.
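
One possible starting point for this assignment is sketched below. It assumes the third-party psutil package is installed (`pip install psutil`) and falls back to disk metrics from the standard library if it is not. The function names `collect_metrics` and `format_report` are hypothetical, and the `requests` part of the assignment (fetching external data) is left out.

```python
import shutil

try:
    import psutil  # third-party; assumed to be installed via pip
except ImportError:
    psutil = None  # fall back to standard-library metrics only


def collect_metrics(path: str = "/") -> dict:
    """Collect a snapshot of basic server metrics as a flat dictionary."""
    disk = shutil.disk_usage(path)
    metrics = {
        "disk_total_bytes": disk.total,
        "disk_used_bytes": disk.used,
        "disk_free_bytes": disk.free,
    }
    if psutil is not None:
        # Sample CPU usage over a short interval instead of since the last call.
        metrics["cpu_percent"] = psutil.cpu_percent(interval=0.1)
        metrics["memory_percent"] = psutil.virtual_memory().percent
        net = psutil.net_io_counters()
        if net is not None:  # can be None on hosts without network interfaces
            metrics["net_bytes_sent"] = net.bytes_sent
            metrics["net_bytes_recv"] = net.bytes_recv
    return metrics


def format_report(metrics: dict) -> str:
    """Render the metrics as a plain-text report, one metric per line."""
    lines = ["Server metrics report"]
    for name in sorted(metrics):
        lines.append(f"{name}: {metrics[name]}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report(collect_metrics()))
```

Keeping collection and formatting in separate functions makes it easy to later send the same dictionary to a file, a monitoring endpoint, or a scheduled job.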

SRE final wrap up

o SRE Practices and Processes


 Incident response and management
 Post-incident reviews (PIRs)
 What is Toil?
 What is Error Budget?
 Automating toil and reducing manual work
• Identify Toil
• Prioritize & Attack Toil
• Optimize incident management
• Distinguish Signal vs Noise
• Develop Runbook
• Optimize Runbook
• Optimize Alerts
 Service ownership and service level agreements (SLAs)
 Change management and continuous improvement
 Monitoring and alerting best practices
 Capacity planning and resource allocation
 Disaster recovery and business continuity planning

Optional final Assignment

Assignment 1: Create and Deploy a Web Application

Objective: Set up a basic web application on GCP using Google App Engine.

Steps:

1. Get/download a web application.
2. Set up a new project on GCP.
3. Deploy the web application to Google App Engine.
4. Configure a custom domain for your web application.
5. Explore monitoring and logging features in Google Cloud Console.

Assignment 2: Build a Scalable Storage Solution

Objective: Create a scalable and redundant storage solution using Google Cloud Storage.

Steps:

1. Create a new Google Cloud Storage bucket.


2. Upload various types of data (images, documents, etc.) to the bucket.
3. Enable versioning on the bucket.
4. Implement object lifecycle management.
5. Set up access controls and permissions for the bucket.

Assignment 3: Implement Virtual Machines and Networking

Objective: Deploy virtual machines and set up networking components on GCP.

Steps:

1. Create multiple virtual machines using Google Compute Engine.


2. Configure a Virtual Private Cloud (VPC) with multiple subnets.
3. Set up firewall rules to control incoming and outgoing traffic.
4. Test connectivity between virtual machines and external resources.

Assignment 4: Automate Infrastructure Deployment

Objective: Explore Infrastructure as Code (IaC) by automating the deployment of resources.

Steps:

1. Repeat the same using Terraform


2. Write the code so that you can easily destroy and recreate the same environment
across dev, stage, and production.

Assignment 5: Set Up a Kubernetes Cluster and Deploy an Application


Objective: Learn container orchestration by deploying an application on Google Kubernetes
Engine (GKE).

Steps:

1. Create a Kubernetes cluster on GKE.


2. Deploy a sample application (e.g., a microservice) to the cluster.
3. Configure autoscaling for the application.
4. Implement rolling updates for the deployed application.
5. Explore monitoring and logging features for the Kubernetes cluster using
1. GCP native monitoring
2. Prometheus and Grafana
3. Dynatrace
4. Datadog.
