I am a Production Support & Cloud Engineer with 3+ years of real-world experience supporting mission-critical healthcare and financial applications.
My daily work is not tutorials; it is:
- Handling P1/P2 incidents
- Debugging live production failures
- Ensuring high availability, security, and compliance
- Working closely with Dev, DevOps, QA, and Security teams
I focus on keeping systems running when things break.
- Respond to P1 / P2 incidents with business impact
- Troubleshoot API failures (4xx / 5xx / latency / timeouts)
- Analyze logs using CloudWatch, Kibana, Dynatrace
- Perform Root Cause Analysis (RCA) & preventive fixes
- Support secure payment & healthcare platforms
- Monitor systems using Grafana & PagerDuty
- Validate deployments and releases in CI/CD pipelines
- Work in ITIL environments (Incident, Problem, Change)
Security is part of my daily production work:
- Least-privilege access & IAM awareness
- Secure API troubleshooting (401 / 403 / auth issues)
- Audit-friendly logging & monitoring
- Compliance awareness: HIPAA, PCI-DSS
- Detecting abnormal behavior via monitoring & logs
I treat security and stability as one responsibility.
Cloud & Platforms
- AWS: EC2, S3, RDS, EKS, CloudWatch
- Azure (foundational exposure)
Monitoring & Reliability
- Grafana, Dynatrace, Kibana, PagerDuty
DevOps & CI/CD
- Docker, Kubernetes, Jenkins
OS & Debugging
- Linux (logs, processes, networking, services)
APIs & Data
- REST APIs, Postman, SQL (RDS)
ITSM & Process
- ServiceNow, Jira, Confluence
- ITIL: Incident, Problem, Change, RCA
Production Incident Simulator
Simulated real production outages with logs, RCA, fixes & prevention
AWS Production Troubleshooting Playbook
Step-by-step guides for EC2, RDS, EKS & monitoring issues
Secure API Monitoring & Debugging
API failures, auth issues, security signals & observability
Monitoring & Alerting Scenarios
Grafana, CloudWatch & PagerDuty examples
- Calm under pressure during incidents
- Clear communication with stakeholders
- Strong documentation (SOPs, RCAs, runbooks)
- Preventive mindset, not just firefighting
- Ownership of systems, not just tickets
I don't just deploy systems; I keep them alive.



