FANG CI/CD DevSecOps Best Practices
Let's delve into the detailed configurations and actionable steps for each of the next steps you've outlined. These provide concrete settings and examples for implementing a robust IT infrastructure tailored to your projects and organizational needs.
1. Adapt Configurations
Action: Modify the provided Terraform scripts, Kubernetes manifests, and Helm charts to fit your specific project requirements and align with your organizational policies.
Tip: Use variables and modules in Terraform to create reusable and parameterized configurations.
File: `variables.tf`
hcl
variable "region" {
description = "AWS region to deploy resources"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Deployment environment"
type = string
default = "production"
}
variable "cluster_name" {
description = "EKS cluster name"
type = string
default = "my-eks-cluster"
}
variable "node_instance_type" {
description = "EC2 instance type for EKS nodes"
type = string
default = "t3.medium"
}
variable "desired_capacity" {
description = "Desired number of EKS nodes"
type = number
default = 3
}
variable "max_capacity" {
description = "Maximum number of EKS nodes"
type = number
default = 10
}
variable "min_capacity" {
description = "Minimum number of EKS nodes"
type = number
default = 1
}
Directory structure:
terraform/
├── main.tf
├── variables.tf
├── outputs.tf
└── modules/
└── eks/
├── main.tf
├── variables.tf
└── outputs.tf
File: `modules/eks/main.tf`
hcl
resource "aws_eks_cluster" "this" {
  name     = var.cluster_name
  role_arn = aws_iam_role.eks_cluster.arn

  vpc_config {
    subnet_ids = var.subnet_ids
  }

  tags = var.tags
}

resource "aws_eks_node_group" "this" {
  cluster_name    = aws_eks_cluster.this.name
  node_group_name = "${var.cluster_name}-nodes" # name is illustrative
  node_role_arn   = aws_iam_role.eks_node.arn
  subnet_ids      = var.subnet_ids

  scaling_config {
    desired_size = var.desired_capacity
    max_size     = var.max_capacity
    min_size     = var.min_capacity
  }

  instance_types = [var.node_instance_type]

  tags = var.tags
}

resource "aws_iam_role" "eks_cluster" {
  name = "${var.cluster_name}-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect    = "Allow",
      Principal = { Service = "eks.amazonaws.com" },
      Action    = "sts:AssumeRole"
    }]
  })

  tags = var.tags
}

resource "aws_iam_role" "eks_node" {
  name = "${var.cluster_name}-node-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect    = "Allow",
      Principal = { Service = "ec2.amazonaws.com" },
      Action    = "sts:AssumeRole"
    }]
  })

  tags = var.tags
}
File: `modules/eks/variables.tf`
hcl
variable "cluster_name" {
description = "Name of the EKS cluster"
type = string
}
variable "subnet_ids" {
description = "List of subnet IDs for the EKS cluster"
type = list(string)
}
variable "node_instance_type" {
description = "EC2 instance type for EKS nodes"
type = string
}
variable "desired_capacity" {
description = "Desired number of EKS nodes"
type = number
}
variable "max_capacity" {
description = "Maximum number of EKS nodes"
type = number
}
variable "min_capacity" {
description = "Minimum number of EKS nodes"
type = number
}
variable "tags" {
description = "Tags to apply to resources"
type = map(string)
default = {}
}
File: `modules/eks/outputs.tf`
hcl
output "cluster_endpoint" {
description = "Endpoint for the EKS cluster"
value = aws_eks_cluster.this.endpoint
}
output "cluster_certificate_authority_data" {
description = "Certificate authority data for the EKS cluster"
value = aws_eks_cluster.this.certificate_authority[0].data
}
output "node_group_role_arn" {
description = "IAM Role ARN for the EKS node group"
value = aws_iam_role.eks_node.arn
}
File: `main.tf`
hcl
provider "aws" {
region = var.region
}
module "eks" {
source = "./modules/eks"
cluster_name = var.cluster_name
subnet_ids = aws_subnet.private[*].id
node_instance_type = var.node_instance_type
desired_capacity = var.desired_capacity
max_capacity = var.max_capacity
min_capacity = var.min_capacity
tags = {
Environment = var.environment
Team = "DevOps"
}
}
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # CIDR is illustrative

  tags = {
    Name        = "main-vpc"
    Environment = var.environment
  }
}

resource "aws_subnet" "private" {
  count             = 2
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, count.index)
  availability_zone = element(["us-east-1a", "us-east-1b"], count.index)

  tags = {
    Name = "private-subnet-${count.index + 1}"
  }
}
output "cluster_endpoint" {
value = module.eks.cluster_endpoint
}
output "node_group_role_arn" {
value = module.eks.node_group_role_arn
}
Explanation: The root configuration wires the provider, the VPC and subnets, and the reusable EKS module together, then re-exports the cluster outputs.
File: `helm/myapp/values.yaml`
yaml
replicaCount: 3
image:
repository: myregistry.com/myapp
tag: "latest"
pullPolicy: IfNotPresent
service:
type: ClusterIP
port: 80
ingress:
enabled: true
annotations:
kubernetes.io/ingress.class: "nginx"
hosts:
- host: myapp.example.com
paths:
- path: /
pathType: Prefix
resources:
requests:
cpu: "500m"
memory: "512Mi"
limits:
cpu: "1"
memory: "1Gi"
nodeSelector: {}
tolerations: []
affinity: {}
File: `helm/myapp/templates/deployment.yaml`
(Same as previously provided, with Helm templating for variables.)
Tip: Utilize Helm's templating capabilities to inject variables and conditionally include resources based
on `values.yaml`.
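For instance, a trimmed `deployment.yaml` fragment might template the image reference and resource requests like this (a sketch only: it assumes the conventional `myapp.fullname` helper from `_helpers.tpl` and omits selectors and labels for brevity):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "myapp.fullname" . }}
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
```

Wrapping optional manifests such as `templates/ingress.yaml` in `{{- if .Values.ingress.enabled }} ... {{- end }}` gives you the conditional inclusion mentioned above.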
Creating reusable modules for networking components like VPCs and subnets.
File: `modules/networking/main.tf`
hcl
# Resource headers reconstructed; the cidrsubnet math is one common layout.
resource "aws_vpc" "this" {
  cidr_block = var.cidr_block
  tags       = var.tags
}

resource "aws_subnet" "public" {
  count                   = var.public_subnet_count
  vpc_id                  = aws_vpc.this.id
  cidr_block              = cidrsubnet(var.cidr_block, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true

  tags = {
    Name        = "${var.name_prefix}-public-subnet-${count.index + 1}"
    Environment = var.environment
  }
}

resource "aws_subnet" "private" {
  count             = var.private_subnet_count
  vpc_id            = aws_vpc.this.id
  cidr_block        = cidrsubnet(var.cidr_block, 8, count.index + var.public_subnet_count)
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name        = "${var.name_prefix}-private-subnet-${count.index + 1}"
    Environment = var.environment
  }
}
File: `modules/networking/variables.tf`
hcl
variable "cidr_block" {
description = "CIDR block for the VPC"
type = string
}
variable "public_subnet_count" {
description = "Number of public subnets"
type = number
default = 2
}
variable "private_subnet_count" {
description = "Number of private subnets"
type = number
default = 2
}
variable "availability_zones" {
description = "List of availability zones"
type = list(string)
}
variable "name_prefix" {
description = "Prefix for resource names"
type = string
}
variable "environment" {
description = "Deployment environment"
type = string
}
File: `modules/networking/outputs.tf`
hcl
output "vpc_id" {
value = aws_vpc.this.id
}
output "public_subnet_ids" {
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
value = aws_subnet.private[*].id
}
File: `main.tf`
hcl
module "networking" {
source = "./modules/networking"
cidr_block = "10.0.0.0/16"
public_subnet_count = 2
private_subnet_count = 2
availability_zones = ["us-east-1a", "us-east-1b"]
name_prefix = "main"
environment = var.environment
}
module "eks" {
source = "./modules/eks"
cluster_name = var.cluster_name
subnet_ids = module.networking.private_subnet_ids
node_instance_type = var.node_instance_type
desired_capacity = var.desired_capacity
max_capacity = var.max_capacity
min_capacity = var.min_capacity
tags = {
Environment = var.environment
Team = "DevOps"
}
}
Explanation:
Reusable Networking Module: Simplifies creating VPCs, public and private subnets.
Parameterization: Adjusts the number of subnets, availability zones, and naming conventions
based on variables.
2. Automate Deployments
Action: Utilize the provided CI/CD pipeline examples (GitHub Actions, Jenkins) to automate
infrastructure provisioning and application deployments.
Tip: Ensure that secrets and sensitive information are managed securely using secret management tools
or CI/CD secret features.
a. GitHub Actions Workflow
File: `.github/workflows/ci-cd-pipeline.yml`
yaml
name: CI/CD Pipeline
on:
  push:
    branches:
      - main
env:
  # AWS credentials come from GitHub Secrets, as described below
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
  AWS_DEFAULT_REGION: us-east-1
jobs:
  terraform:
    name: Terraform Apply
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v2
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v1
      - name: Terraform Init, Plan, and Apply
        run: |
          terraform init
          terraform plan -out=tfplan
          terraform apply tfplan
  deploy:
    name: Deploy to Kubernetes
    runs-on: ubuntu-latest
    needs: terraform
    steps:
      - name: Checkout Code
        uses: actions/checkout@v2
      - name: Update kubeconfig
        run: aws eks update-kubeconfig --name my-eks-cluster --region us-east-1
      - name: Deploy with Helm
        run: helm upgrade --install myapp ./helm/myapp --namespace production --create-namespace
Explanation:
Jobs:
Terraform Job: Initializes, plans, and applies Terraform configurations to provision or update
infrastructure.
Deploy Job: Deploys the application to Kubernetes using Helm after the infrastructure is
updated.
Secrets Management: AWS credentials are securely stored in GitHub Secrets to prevent exposure.
Dependencies: The `deploy` job depends on the successful completion of the `terraform` job.
b. Jenkins Pipeline
File: `Jenkinsfile`
groovy
pipeline {
agent any
environment {
AWS_ACCESS_KEY_ID = credentials('aws-access-key-id')
AWS_SECRET_ACCESS_KEY = credentials('aws-secret-access-key')
HELM_HOME = "${env.WORKSPACE}/.helm"
}
stages {
stage('Checkout') {
steps {
checkout scm
}
}
stage('Terraform Init') {
steps {
sh 'terraform init'
}
}
stage('Terraform Plan') {
steps {
sh 'terraform plan -out=tfplan'
}
}
stage('Terraform Apply') {
steps {
sh 'terraform apply tfplan'
}
}
stage('Setup kubectl') {
steps {
sh 'aws eks update-kubeconfig --name my-eks-cluster --region us-east-1'
}
}
stage('Deploy with Helm') {
steps {
sh 'helm upgrade --install myapp ./helm/myapp --namespace production --create-namespace --set image.tag=${BUILD_ID}'
}
}
}
}
Explanation:
Stages:
Checkout: Retrieves the latest code from the repository.
Terraform Stages: Initializes, plans, and applies Terraform configurations.
Kubeconfig Setup: Configures `kubectl` to interact with the EKS cluster.
Deploy with Helm: Deploys or updates the application in Kubernetes using Helm.
Credentials Management: Uses Jenkins credentials to securely handle AWS access keys.
Add the registry credentials as CI/CD secrets (for example, repository secrets in GitHub):
Name: `DOCKER_REGISTRY_USERNAME`
Value: Your Docker registry username.
Name: `DOCKER_REGISTRY_PASSWORD`
Value: Your Docker registry password.
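One way a workflow step might consume these secrets without echoing them, using the `docker/login-action` from the GitHub marketplace (the registry host matches the example used elsewhere in this guide):

```yaml
- name: Log in to Docker registry
  uses: docker/login-action@v2
  with:
    registry: myregistry.com
    username: ${{ secrets.DOCKER_REGISTRY_USERNAME }}
    password: ${{ secrets.DOCKER_REGISTRY_PASSWORD }}
```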
Explanation:
Secure Handling: Secrets are injected as environment variables and not exposed in logs.
Best Practices: Rotate secrets regularly and use least privilege principles.
3. Set Up Monitoring and Alerting
Action: Deploy Prometheus and Grafana using the provided configurations, and set up Alertmanager with integrations (e.g., Slack) to receive proactive alerts.
Tip: Create custom Grafana dashboards tailored to your application's metrics and performance
indicators.
Commands:
bash
# Add the chart repositories
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Install Prometheus
helm install prometheus prometheus-community/prometheus --namespace monitoring --create-namespace

# Install Grafana (the Grafana chart lives in the grafana repo, not prometheus-community)
helm install grafana grafana/grafana --namespace monitoring \
  --set adminPassword='YourSecurePassword' \
  --set service.type=LoadBalancer
Explanation: Both tools are installed into a dedicated monitoring namespace; Grafana is exposed through a LoadBalancer and protected with the admin password set above.
File: `alertmanager-config.yaml`
yaml
global:
resolve_timeout: 5m
receivers:
- name: 'slack-notifications'
slack_configs:
- api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'
channel: '#alerts'
send_resolved: true
route:
group_by: ['alertname']
group_wait: 10s
group_interval: 10m
repeat_interval: 1h
receiver: 'slack-notifications'
Apply Configuration:
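One way to load this file, assuming the bundled Alertmanager of older prometheus-community/prometheus chart versions (the values key is an assumption; check your chart version):

```bash
# Pass the file as the Alertmanager configuration via Helm values
helm upgrade prometheus prometheus-community/prometheus \
  --namespace monitoring \
  --set-file "alertmanagerFiles.alertmanager\.yml"=alertmanager-config.yaml
```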
Explanation: Alerts are grouped by alert name and routed to the Slack receiver; resolved alerts also send notifications.
Prometheus Scrape Configuration:
yaml
scrape_configs:
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
Explanation:
Scrape Configs: Define how Prometheus discovers and scrapes metrics from Kubernetes pods
based on annotations.
1. Access Grafana:
Obtain Grafana's external IP:
bash
kubectl get svc grafana --namespace monitoring
Open Grafana in your browser using the external IP and port 3000.
2. Login:
Username: `admin`
Password: As set during Helm installation (`YourSecurePassword`).
3. Create Dashboard:
Click on the `+` icon and select `Dashboard`.
Add panels by selecting metrics from Prometheus.
Customize visualizations (graphs, tables, etc.) based on your application's performance
indicators.
4. Export and Import Dashboards:
Export: Share dashboards by exporting JSON files.
Import: Reuse dashboards by importing JSON configurations.
Panel Query:
promql
sum(rate(container_cpu_usage_seconds_total{image!="",pod!=""}[5m])) by (pod)
Tip: Utilize Grafana's templating features to create dynamic and reusable dashboards.
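For example, with a dashboard variable named `namespace` (the variable name is illustrative), a panel query can be templated like this:

```promql
sum(rate(container_cpu_usage_seconds_total{namespace="$namespace", image!=""}[5m])) by (pod)
```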
4. Strengthen Security
Action: Implement RBAC, network policies, and secret management using Kubernetes, OPA Gatekeeper, and HashiCorp Vault.
Tip: Regularly audit your security policies and perform vulnerability assessments to identify and mitigate
potential threats.
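Two lightweight checks that can be folded into such audits (the user and image names are illustrative):

```bash
# Verify what a given user can actually do under the current RBAC rules
kubectl auth can-i delete pods --as developer-user --namespace production

# Scan an application image for known CVEs (assumes Trivy is installed)
trivy image myregistry.com/myapp:latest
```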
Role (`role-dev.yaml`):
yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: production
name: developer-role
rules:
- apiGroups: ["", "apps", "extensions"]
resources: ["deployments", "pods", "services"]
verbs: ["get", "watch", "list", "create", "update", "patch", "delete"]
RoleBinding (`rolebinding-dev.yaml`):
yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: developer-binding
namespace: production
subjects:
- kind: User
name: "developer-user"
apiGroup: rbac.authorization.k8s.io
roleRef:
kind: Role
name: developer-role
apiGroup: rbac.authorization.k8s.io
Apply Configurations:
bash
kubectl apply -f role-dev.yaml
kubectl apply -f rolebinding-dev.yaml
Explanation: The Role grants read/write access to deployments, pods, and services in the production namespace; the RoleBinding assigns it to `developer-user`.
Example: Restricting Backend Pods to Accept Traffic Only from Frontend Pods
NetworkPolicy (`networkpolicy-frontend-backend.yaml`):
yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: frontend-to-backend
namespace: production
spec:
podSelector:
matchLabels:
app: backend
ingress:
- from:
- podSelector:
matchLabels:
app: frontend
ports:
- protocol: TCP
port: 8080
policyTypes:
- Ingress
Apply Configuration:
bash
kubectl apply -f networkpolicy-frontend-backend.yaml
Explanation:
NetworkPolicy: Restricts backend pods to receive traffic only from frontend pods on port 8080.
Install OPA Gatekeeper:
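One way to install it, assuming the official Helm chart (check the Gatekeeper docs for the current release):

```bash
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm install gatekeeper gatekeeper/gatekeeper --namespace gatekeeper-system --create-namespace
```

ConstraintTemplate: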
yaml
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
name: k8sdenyprivilegedcontainers
spec:
crd:
spec:
names:
kind: K8sDenyPrivilegedContainers
validation:
openAPIV3Schema:
properties:
message:
type: string
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
        package k8sdenyprivilegedcontainers

        # Deny any container that requests privileged mode
        # (violation rule reconstructed following the standard Gatekeeper pattern)
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          container.securityContext.privileged
          msg := input.parameters.message
        }
Constraint (`deny-privileged-containers.yaml`):
yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDenyPrivilegedContainers
metadata:
name: deny-privileged-containers
spec:
match:
kinds:
- apiGroups: [""]
kinds: ["Pod"]
parameters:
message: "Privileged containers are not allowed."
Apply Configurations:
bash
kubectl apply -f k8sdenyprivilegedcontainers-template.yaml   # ConstraintTemplate (filename illustrative)
kubectl apply -f deny-privileged-containers.yaml
Explanation: Once both objects are admitted, Gatekeeper rejects any Pod whose containers request `securityContext.privileged`, returning the configured message.
Vault Policy (e.g. `myapp-policy.hcl`):
hcl
path "secret/data/myapp/*" {
capabilities = ["read"]
}
Load the policy and bind it to a Kubernetes auth role (the role and service account names match the deployment below):
bash
vault policy write myapp-policy myapp-policy.hcl
vault write auth/kubernetes/role/myapp-role \
    bound_service_account_names=myapp-serviceaccount \
    bound_service_account_namespaces=production \
    policies=myapp-policy \
    ttl=24h
Deployment with Vault Agent injection annotations:
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
annotations:
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/role: "myapp-role"
vault.hashicorp.com/agent-inject-secret-db-password: "secret/data/myapp/db-password"
spec:
serviceAccountName: myapp-serviceaccount
containers:
- name: myapp-container
image: myregistry.com/myapp:latest
ports:
- containerPort: 8080
env:
            - name: DB_PASSWORD
              # The Vault Agent sidecar writes the secret to this file path inside the pod;
              # the application reads the secret from the file rather than from the env value.
              value: "/vault/secrets/db-password"
Apply Deployment:
bash
kubectl apply -f myapp-deployment.yaml   # filename illustrative
Explanation: The Vault Agent injector authenticates using the pod's service account, reads `secret/data/myapp/db-password`, and mounts it at `/vault/secrets/db-password` for the container.
5. Optimize Costs
Action: Configure Kubernetes Cluster Autoscaler and utilize Spot Instances to manage resource usage
efficiently and reduce costs.
Tip: Monitor your AWS billing and resource utilization regularly to identify areas for further optimization.
Cluster Autoscaler Node Group Configuration:
hcl
module "eks" {
source = "terraform-aws-modules/eks/aws"
cluster_name = "my-eks-cluster"
cluster_version = "1.21"
subnets = aws_subnet.private[*].id
vpc_id = aws_vpc.main.id
node_groups = {
eks_nodes = {
desired_capacity = 3
max_capacity = 10
min_capacity = 1
instance_type = "t3.medium"
key_name = "my-key-pair"
additional_tags = {
Name = "eks-node"
}
tags = {
"k8s.io/cluster-autoscaler/enabled" = "true"
"k8s.io/cluster-autoscaler/my-eks-cluster" = "owned"
}
}
}
tags = {
Environment = "Production"
}
}
resource "aws_iam_policy" "cluster_autoscaler" {
  # Resource header reconstructed; the policy name is illustrative
  name   = "cluster-autoscaler-policy"
  policy = jsonencode({
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"autoscaling:DescribeAutoScalingGroups",
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:DescribeTags",
"autoscaling:SetDesiredCapacity",
"autoscaling:TerminateInstanceInAutoScalingGroup",
"ec2:DescribeLaunchTemplateVersions"
],
"Resource": "*"
}
]
})
}
Explanation:
Cluster Autoscaler Tags: Enables the Cluster Autoscaler by tagging the node group.
IAM Policy: Grants the necessary permissions for the Cluster Autoscaler to manage Auto Scaling
Groups.
Deploy Configuration:
bash
terraform init
terraform plan
terraform apply
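Provisioning the IAM policy only covers permissions; the autoscaler itself runs inside the cluster. A common way to deploy it is the community Helm chart (the values shown are assumptions to adapt):

```bash
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-eks-cluster \
  --set awsRegion=us-east-1
```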
Spot Instance Node Group Configuration:
hcl
module "eks" {
source = "terraform-aws-modules/eks/aws"
cluster_name = "my-eks-cluster"
cluster_version = "1.21"
subnets = aws_subnet.private[*].id
vpc_id = aws_vpc.main.id
node_groups = {
spot_nodes = {
desired_capacity = 5
max_capacity = 20
min_capacity = 2
instance_type = "m5.large"
key_name = "my-key-pair"
spot_price = "0.073"
additional_tags = {
Name = "spot-node"
}
tags = {
"k8s.io/cluster-autoscaler/enabled" = "true"
"k8s.io/cluster-autoscaler/my-eks-cluster" = "owned"
}
}
}
tags = {
Environment = "Production"
}
}
Explanation:
Spot Price: Sets the maximum price for Spot Instances to control spending.
Autoscaler Enabled: Allows the Cluster Autoscaler to manage Spot Instances effectively.
Deploy Configuration:
bash
terraform init
terraform plan
terraform apply
Prometheus: Already deployed; use it to monitor resource usage within the Kubernetes cluster.
Grafana Dashboards: Create dashboards to visualize cluster and node resource utilization.
CPU usage by namespace:
promql
sum(rate(container_cpu_usage_seconds_total{image!="",pod!=""}[5m])) by (namespace)
Memory usage by namespace:
promql
sum(container_memory_usage_bytes{image!="",pod!=""}) by (namespace)
Tip: Regularly review dashboards and set up alerts for unusual spikes in resource usage.
6. Continuous Improvement
Action: Regularly review and update your configurations, monitoring setups, and security policies to
adapt to evolving project requirements and threat landscapes.
Tip: Foster a culture of automation and continuous learning within your team to stay updated with the
latest best practices and tools.
1. Scheduled Audits:
Conduct periodic audits of Terraform scripts, Kubernetes manifests, and Helm charts.
Ensure configurations align with current project requirements and organizational policies.
2. Automated Tools:
Use tools like `terraform fmt` and `kubeval` to enforce syntax and schema compliance.
Implement pre-commit hooks to catch issues before code is committed.
File: `.pre-commit-config.yaml`
yaml
repos:
- repo: https://github.com/antonbabenko/pre-commit-terraform
rev: v1.57.0
hooks:
- id: terraform_fmt
- id: terraform_validate
- repo: https://github.com/yannh/kubeval
rev: v0.16.0
hooks:
- id: kubeval
Install Pre-commit:
bash
pip install pre-commit
pre-commit install
Explanation: The hooks run terraform fmt and terraform validate on staged Terraform files, and kubeval validates Kubernetes manifests, so broken configurations never reach the repository.
Infrastructure Testing with Terratest (`test/eks_test.go`):
go
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestEksCluster(t *testing.T) {
	terraformOptions := &terraform.Options{
		TerraformDir: "../",
		Vars: map[string]interface{}{
			"cluster_name": "test-eks-cluster",
			"environment":  "test",
		},
		NoColor: true,
	}

	// Destroy the infrastructure when the test finishes
	defer terraform.Destroy(t, terraformOptions)

	terraform.InitAndApply(t, terraformOptions)

	// The module should expose a non-empty cluster endpoint
	clusterEndpoint := terraform.Output(t, terraformOptions, "cluster_endpoint")
	assert.NotEmpty(t, clusterEndpoint)
}
Explanation: Terratest provisions the stack with test variables, verifies the `cluster_endpoint` output, and tears everything down via the deferred destroy.
1. Stay Updated:
Follow industry blogs, attend webinars, and participate in community forums.
Encourage team members to pursue certifications and training.
2. Experiment and Iterate:
Allocate time for experimenting with new tools and technologies.
Implement Proof of Concepts (PoCs) to evaluate their benefits.
3. Knowledge Sharing:
Conduct regular knowledge-sharing sessions within the team.
Document learnings and best practices in internal wikis or documentation platforms.
Tip: Encourage continuous learning by recognizing and rewarding team members who contribute to
knowledge sharing.
i. Version Control
Keep all Terraform configurations, Kubernetes manifests, and Helm charts in Git, and route every change through pull requests so reviews and automated checks (see the lint workflow below) run before anything is applied.
ii. Modularization
**File:** `modules/common/main.tf`
hcl
resource "aws_security_group" "allow_all" {
name = "${var.name_prefix}-allow-all"
description = "Allow all inbound and outbound traffic"
vpc_id = var.vpc_id
ingress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = var.tags
}
variable "name_prefix" {
description = "Prefix for resource names"
type = string
}
variable "vpc_id" {
description = "VPC ID"
type = string
}
variable "tags" {
description = "Tags to apply to resources"
type = map(string)
default = {}
}
File: `main.tf`
hcl
module "common_security_group" {
source = "./modules/common"
name_prefix = "main"
vpc_id = aws_vpc.main.id
tags = {
Environment = var.environment
Team = "DevOps"
}
}
Remote State Configuration:
hcl
terraform {
backend "s3" {
bucket = "my-terraform-state-bucket"
key = "eks-cluster/terraform.tfstate"
region = "us-east-1"
dynamodb_table = "terraform-lock-table"
encrypt = true
}
}
Explanation: Terraform state is stored encrypted in S3, and the DynamoDB table provides state locking so two concurrent runs cannot corrupt it.
Pull-Request Lint Workflow:
yaml
name: Terraform Lint
on:
  pull_request:
    branches:
      - main
jobs:
  terraform-lint:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v2
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v1
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
      - name: Terraform Validate
        run: |
          terraform init -backend=false
          terraform validate
Explanation: Formatting and validation run on every pull request against main, so configuration errors surface during review instead of at apply time.
i. Regular Backups
Using Velero for Kubernetes Backups:
Deploying Velero with AWS S3 Backup Location:
bash
velero install \
--provider aws \
--bucket my-backup-bucket \
--secret-file ./credentials-velero \
--use-restic \
--backup-location-config region=us-east-1 \
--snapshot-location-config region=us-east-1
Explanation: Velero writes cluster backups to the S3 bucket and uses restic for persistent volume data; backup and snapshot locations are pinned to us-east-1.
Create and schedule backups:
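Once Velero is installed, backups can be created on demand and on a schedule (names and the cron expression are illustrative):

```bash
# One-off backup of the production namespace
velero backup create production-backup --include-namespaces production

# Daily backup at 02:00
velero schedule create daily-production --schedule "0 2 * * *" --include-namespaces production
```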
ii. DR Drills
1. Simulate Outages:
Action: Manually delete critical deployments or nodes to simulate failures.
Objective: Test recovery procedures and validate backup integrity.
2. Automate DR Testing:
Action: Use scripts or tools to automate the execution of DR drills.
Objective: Ensure consistent and repeatable testing.
bash
# Simulate a node failure by draining it
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
Explanation:
Drain Node: Evicts all pods from the node, simulating a failure.
Recovery: Ensure pods are rescheduled on healthy nodes.
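To close the loop on a drill, you might restore from the Velero backup created earlier and confirm rescheduling (the backup name is assumed from the example above):

```bash
velero restore create --from-backup production-backup
kubectl get pods --namespace production -o wide   # verify pods land on healthy nodes
```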
Multi-Region Deployment (`main.tf`):
hcl
provider "aws" {
alias = "us-east-1"
region = "us-east-1"
}
provider "aws" {
alias = "us-west-2"
region = "us-west-2"
}
module "networking_us_east_1" {
source = "./modules/networking"
cidr_block = "10.0.0.0/16"
public_subnet_count = 2
private_subnet_count = 2
availability_zones = ["us-east-1a", "us-east-1b"]
name_prefix = "main"
environment = "production"
providers = {
aws = aws.us-east-1
}
}
module "networking_us_west_2" {
source = "./modules/networking"
cidr_block = "10.1.0.0/16"
public_subnet_count = 2
private_subnet_count = 2
availability_zones = ["us-west-2a", "us-west-2b"]
name_prefix = "main"
environment = "production"
providers = {
aws = aws.us-west-2
}
}
module "eks_us_east_1" {
source = "./modules/eks"
cluster_name = "my-eks-cluster-east"
subnet_ids = module.networking_us_east_1.private_subnet_ids
node_instance_type = "t3.medium"
desired_capacity = 3
max_capacity = 10
min_capacity = 1
tags = {
Environment = "production"
Team = "DevOps"
}
providers = {
aws = aws.us-east-1
}
}
module "eks_us_west_2" {
source = "./modules/eks"
cluster_name = "my-eks-cluster-west"
subnet_ids = module.networking_us_west_2.private_subnet_ids
node_instance_type = "t3.medium"
desired_capacity = 3
max_capacity = 10
min_capacity = 1
tags = {
Environment = "production"
Team = "DevOps"
}
providers = {
aws = aws.us-west-2
}
}
Explanation: Provider aliases let one configuration provision parallel networking stacks and EKS clusters in us-east-1 and us-west-2, with non-overlapping CIDR blocks.
Tip: Use DNS-based load balancing (e.g., AWS Route 53) to distribute traffic across regions, enhancing
availability and reducing latency.
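A sketch of latency-based routing with Terraform (the hosted zone ID and the per-region load balancer DNS names are assumed variables):

```hcl
resource "aws_route53_record" "app_us_east_1" {
  zone_id        = var.zone_id            # assumed hosted-zone variable
  name           = "myapp.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "us-east-1"
  records        = [var.us_east_1_lb_dns] # assumed regional load balancer DNS name
  latency_routing_policy {
    region = "us-east-1"
  }
}

resource "aws_route53_record" "app_us_west_2" {
  zone_id        = var.zone_id
  name           = "myapp.example.com"
  type           = "CNAME"
  ttl            = 60
  set_identifier = "us-west-2"
  records        = [var.us_west_2_lb_dns]
  latency_routing_policy {
    region = "us-west-2"
  }
}
```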
i. Comprehensive Documentation
File: `modules/eks/README.md`
markdown
# EKS Module

This Terraform module deploys an Amazon EKS cluster with a managed node group.

## Inputs

- `cluster_name`: Name of the EKS cluster.
- `subnet_ids`: Subnet IDs for the cluster and node group.
- `node_instance_type`: EC2 instance type for worker nodes.
- `desired_capacity`, `max_capacity`, `min_capacity`: Node group scaling bounds.
- `tags`: Tags applied to all resources.

## Outputs

- `cluster_endpoint`: Endpoint of the EKS control plane.
- `cluster_certificate_authority_data`: Cluster CA data for kubeconfig generation.
- `node_group_role_arn`: IAM role ARN of the node group.

## Usage

```hcl
module "eks" {
  source             = "./modules/eks"
  cluster_name       = "my-eks-cluster"
  subnet_ids         = module.networking.private_subnet_ids
  node_instance_type = "t3.medium"
  desired_capacity   = 3
  max_capacity       = 10
  min_capacity       = 1
  tags = {
    Environment = "production"
    Team        = "DevOps"
  }
}
```
iii. Training
1. Regular Workshops:
   - Conduct hands-on workshops for team members on Terraform, Kubernetes, Helm, etc.
2. Online Courses:
   - Encourage team members to take relevant online courses and certifications.
3. Internal Presentations:
   - Share learnings from external sources through internal presentations or brown-bag sessions.
Example Terraform workshop agenda:
1. Introduction to Terraform: basics and benefits.
2. Terraform Syntax and Commands: writing configurations, initializing, planning, applying.
3. Modules and Reusability: creating and using modules.
4. State Management: remote backends, locking.
5. Best Practices: version control, linting, testing.
6. Hands-On Lab: deploying a simple infrastructure using Terraform.
---
Summary
By following these detailed configurations and best practices, you can establish a robust, scalable, and secure IT infrastructure tailored to your projects and organizational needs. Here's a quick recap:
1. Adapt Configurations:
   - Use variables and modules in Terraform for reusable and parameterized configurations.
   - Customize Kubernetes manifests and Helm charts to align with project requirements.
2. Automate Deployments:
   - Implement CI/CD pipelines using GitHub Actions or Jenkins to automate infrastructure provisioning and application deployments.
   - Securely manage secrets using CI/CD secret features or external secret management tools.
3. Set Up Monitoring and Alerting:
   - Deploy Prometheus and Grafana, and route Alertmanager notifications to channels such as Slack.
   - Build custom dashboards around your application's key metrics.
4. Strengthen Security:
   - Enforce RBAC and network policies, apply OPA Gatekeeper constraints, and manage secrets with HashiCorp Vault.
5. Optimize Costs:
   - Configure Kubernetes Cluster Autoscaler to adjust node counts based on demand.
   - Utilize Spot Instances to reduce costs by leveraging unused EC2 capacity.
   - Monitor AWS billing and resource utilization to identify optimization opportunities.
6. Continuous Improvement:
   - Regularly review and update configurations, monitoring setups, and security policies.
   - Foster a culture of automation and continuous learning within your team.
Additional Resources:
- [Terraform Documentation](https://www.terraform.io/docs)
- [Kubernetes Official Documentation](https://kubernetes.io/docs/home/)
- [Helm Documentation](https://helm.sh/docs/)
- [Prometheus Documentation](https://prometheus.io/docs/introduction/overview/)
- [Grafana Documentation](https://grafana.com/docs/)
- [Istio Documentation](https://istio.io/latest/docs/)
- [HashiCorp Vault Documentation](https://www.vaultproject.io/docs)
- [Jenkins Documentation](https://www.jenkins.io/doc/)
- [GitHub Actions Documentation](https://docs.github.com/en/actions)
- [OPA Gatekeeper Documentation](https://open-policy-agent.github.io/gatekeeper/website/docs/)
- [Terratest Documentation](https://terratest.gruntwork.io/docs/getting-started/introduction/)
By meticulously implementing these configurations and adhering to best practices, your organization can build a resilient, secure, and efficient IT infrastructure that supports your development and operational needs effectively.
If you require further customization or have specific scenarios you'd like to address, feel free to ask!