Troubleshooting and Workaround in Kubernetes
瑞嘉軟體 Jack Kuo

Speaker

Jack Kuo

瑞嘉軟體 CTO Department SRE Team

○ Senior SRE
○ Speaker

Current Focus Areas
○ Agile
○ DevOps
○ Cloud Native
Agenda
● About Us
● Case 1: Kubernetes Pod Restart
○ CrashLoopBackOff
○ ThreadPool
○ Warmup Setting
● Case 2: Client Get 504 Timeout
○ Network Troubleshooting
○ Prometheus Metrics
● Case 3: P99 Latency Is High
○ Resource Issue
○ Inconsistent Performance
● Takeaways

About Us
SRE Team area of responsibility
Infra Team area of responsibility

Not Only Troubleshooting

But Also Workarounds

https://static.learnk8s.io/cbe079ed7f6e7764445aa746b2c9f295.png

Case 1
Kubernetes Pod Restart

Kubernetes Pod Restart - CrashLoopBackOff

● Application Issue
○ Code Error
○ Environment Config Error
○ Dependent Applications Not Ready
● Resource Issue
○ CPU
○ Memory
● Health Check
○ Startup Probe
○ Liveness Probe
○ Readiness Probe

So, like I said... what about that... unknown state?

Kubernetes Pod Restart - ThreadPool

.NET ThreadPool

● ThreadPool.SetMaxThreads
○ Bigger is not necessarily better
○ Context Switching is expensive
● ThreadPool.SetMinThreads
○ The number of threads is not always at or above MinThreads
○ MinThreads is the number of threads the pool creates on demand without delay

Kubernetes Pod Restart - Warmup Setting
The Vicious Cycle of Latency, High CPU Consumption, and HPA Scaling

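minReadySeconds ≠ min Ready Seconds (it is how long a Ready Pod must stay Ready before it counts as available, not a delay before the Pod becomes Ready)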

Please Tell Me When You Are Ready

While warming up: return non-200 to Kubernetes
Once ready: return 200 to Kubernetes
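
The idea of answering the health check with a non-200 status until warmup is done can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the `/ready` path, port 8080, and the `warm_up` placeholder are all assumptions, and a readinessProbe would have to point at that path.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set once warmup has finished (assumed to be done by warm_up below).
warmed_up = threading.Event()


def readiness_status() -> int:
    """Status code the readiness endpoint should return right now."""
    return 200 if warmed_up.is_set() else 503


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # "/ready" is a hypothetical path; any other path gets a 404.
        status = readiness_status() if self.path == "/ready" else 404
        self.send_response(status)
        self.end_headers()


def warm_up() -> None:
    # Placeholder for real warmup work, e.g. calling hot endpoints a few times.
    warmed_up.set()


if __name__ == "__main__":
    threading.Thread(target=warm_up, daemon=True).start()
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Kubernetes treats HTTP probe responses from 200 up to (but not including) 400 as success, so the 503 keeps the Pod out of Service endpoints until the flag is set.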

Case 2
Client Get 504 Timeout

Client Get 504 Timeout
● Application Issue
○ Application is too busy to handle requests from Client
○ There is a problem with Dependent Applications
● Network Issue
○ CDN
○ Load Balancer
○ Kubernetes Ingress
○ Kubernetes Service
○ Kubernetes Application Pod

Troubleshoot bottom-up: from the Application Pod up to the CDN
A Chain Is Only As Strong As Its Weakest Link

Client Get 504 Timeout - Network Troubleshooting
Pod to Pod
[Diagram: Node 1, Node 2, Node 3]

Service to Pod
[Diagram: Node 1, Node 2, Node 3]

Ingress to Service
[Diagram: Node 1, Node 2, Node 3]

curl -v -H "Host: <Host>" "http://<Ingress-Nginx-Controller IP>:<Ingress-Nginx-Controller Port>/<Path>"

HAProxy to Ingress-Nginx-Controller
[Diagram: Node 1, Node 2, Node 3]

curl -v -H "Host: <Host>" "http://<HAProxy IP>:<HAProxy Port>/<Path>"
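
The hop-by-hop checks above could be scripted along these lines. This is only a sketch: the addresses in `HOPS`, the host name, and the `/healthz` path are made-up placeholders, and `build_probe` and `probe` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def build_probe(base_url: str, host: str, path: str = "/") -> urllib.request.Request:
    """Rough equivalent of: curl -H "Host: <host>" <base_url><path>"""
    return urllib.request.Request(base_url.rstrip("/") + path, headers={"Host": host})


def probe(req: urllib.request.Request, timeout: float = 5.0) -> str:
    """Return the HTTP status, or the error that stopped the request."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as exc:
        return str(exc.code)  # e.g. 504 from this hop or one below it
    except (urllib.error.URLError, OSError) as exc:
        return f"failed: {exc}"


# Hypothetical addresses, ordered bottom-up; replace with real ones.
HOPS = [
    ("pod", "http://10.0.0.11:8080"),
    ("service", "http://10.96.0.20:80"),
    ("ingress-nginx", "http://10.0.0.5:80"),
    ("haproxy", "http://10.0.0.2:80"),
]

if __name__ == "__main__":
    for name, base in HOPS:
        print(name, probe(build_probe(base, "app.example.com", "/healthz")))
```

The first hop whose result differs from the hop below it is a reasonable place to start looking.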

That seems a bit long-winded and tedious

Client Get 504 Timeout - Prometheus Metrics
Ingress Request Volume By Status

Ingress Percentile Response Time
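
The two ingress panels above could be backed by PromQL along these lines. The sketch assumes the cluster scrapes the standard ingress-nginx controller metrics (`nginx_ingress_controller_requests` and `nginx_ingress_controller_request_duration_seconds_bucket`); the helper names, the `ingress` label filter, and the 5-minute window are illustrative choices, not taken from the slides.

```python
def requests_by_status(ingress: str, window: str = "5m") -> str:
    """PromQL for request rate per HTTP status on one Ingress."""
    return (
        f"sum by (status) (rate(nginx_ingress_controller_requests"
        f'{{ingress="{ingress}"}}[{window}]))'
    )


def latency_quantile(ingress: str, quantile: float = 0.99, window: str = "5m") -> str:
    """PromQL for a response-time percentile on one Ingress."""
    return (
        f"histogram_quantile({quantile}, sum by (le) (rate("
        f"nginx_ingress_controller_request_duration_seconds_bucket"
        f'{{ingress="{ingress}"}}[{window}])))'
    )


if __name__ == "__main__":
    print(requests_by_status("web"))
    print(latency_quantile("web"))
```

A rise in the 504 series in the first query, next to the P99 from the second, narrows down which Ingress is affected without walking every hop by hand.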

Endpoint RPS

Requests Currently In Progress By Endpoint

Workaround
Separate Kubernetes Deployment By Ingress Host And Path

Case 3
P99 Latency Is High

P99 Latency Is A Leading Indicator Of Problems

P99 Latency Is High
● Application Issue
● Resource Issue
○ Resource Competition
○ Memory Leak
● Performance Inconsistency
○ Sticky Session Setting
○ VM Host Issue

P99 Latency Is High - Resource Issue
Resource Competition

Requests < Limits (on the worker node resource):
○ Requests: guaranteed resource for container
○ Limits: max resource container can use
○ Between Requests and Limits: available resource for container → Resource Competition!!!

Requests = Limits (on the worker node resource):
○ Guaranteed resource = available resource = max resource container can use
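
One rough way to see when this competition can occur is to add up the requests and limits of the containers on a node and compare them with what the node can offer. The sketch below is illustrative arithmetic only; the function name, the field names, and the use of CPU millicores are assumptions.

```python
def limit_overcommit(node_allocatable_mcpu: int, containers: list) -> dict:
    """Compare summed CPU requests/limits (millicores) with node capacity."""
    total_requests = sum(c["request_mcpu"] for c in containers)
    total_limits = sum(c["limit_mcpu"] for c in containers)
    return {
        # Requests are what the scheduler reserves, so they must fit.
        "requests_fit": total_requests <= node_allocatable_mcpu,
        # Anything the limits promise beyond node capacity can be contended.
        "contended_mcpu": max(0, total_limits - node_allocatable_mcpu),
    }


if __name__ == "__main__":
    burstable = [{"request_mcpu": 1000, "limit_mcpu": 2000}] * 3
    fixed = [{"request_mcpu": 1000, "limit_mcpu": 1000}] * 3
    print(limit_overcommit(4000, burstable))
    print(limit_overcommit(4000, fixed))
```

With Requests equal to Limits, the summed limits can never exceed what the scheduler already reserved, which is the second case in the diagram.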

P99 Latency Is High - Resource Issue
Pod Anti-Affinity

Memory Leak

Workaround
A CronJob To Detect Memory Leak

● Schedule: */2 * * * * (every 2 minutes)
● Query the Prometheus API and calculate Memory Usage
● If Memory Usage < Memory Target: do nothing
● If Memory Usage > Memory Target: Notify To Slack and Restart Deployment
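
The flow above might look roughly like this when written as the script the CronJob runs. It is a sketch under several assumptions: memory is read from the cAdvisor metric `container_memory_working_set_bytes`, Pods are matched by a `<deployment>-.*` name pattern, Slack is reached through an incoming webhook, and the restart uses `kubectl rollout restart`. All function names and parameters are hypothetical.

```python
import json
import subprocess
import urllib.parse
import urllib.request


def parse_instant_value(payload: dict) -> float:
    """Extract the first sample from a Prometheus instant-query response."""
    result = payload["data"]["result"]
    if not result:
        raise ValueError("query returned no samples")
    return float(result[0]["value"][1])  # value is [timestamp, "number as string"]


def should_restart(usage_bytes: float, target_bytes: float) -> bool:
    return usage_bytes > target_bytes


def query_prometheus(base_url: str, promql: str) -> dict:
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def notify_slack(webhook_url: str, text: str) -> None:
    body = json.dumps({"text": text}).encode()
    req = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=10).close()


def run_once(prom_url, webhook_url, deployment, namespace, target_bytes):
    # Highest working-set memory among the Deployment's containers (assumed query).
    promql = (
        f"max(container_memory_working_set_bytes"
        f'{{namespace="{namespace}", pod=~"{deployment}-.*", container!=""}})'
    )
    usage = parse_instant_value(query_prometheus(prom_url, promql))
    if should_restart(usage, target_bytes):
        notify_slack(webhook_url, f"{deployment}: memory {usage:.0f} B over target, restarting")
        subprocess.run(
            ["kubectl", "-n", namespace, "rollout", "restart", f"deployment/{deployment}"],
            check=True,
        )
```

Keeping the decision in `should_restart` separate from the side effects makes the threshold logic easy to check without a cluster.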

P99 Latency Is High - Inconsistent Performance
Pod RPS

Sticky Session Setting

User-Agent, X-Forwarded-For, X-Actual-IP

nginx.ingress.kubernetes.io/upstream-hash-by: "$http_x_actual_ip"

Inconsistent Pod Memory Resource Usage

.NET Runtime Bug On AMD Machines

Inconsistent Pod CPU Resource Usage

VM Host Issue

When CPU Usage Exceeds The Tipping Point, Performance Degrades

Workaround
Dummy Pod
[Diagram: a Dummy Pod occupying part of the worker node resource]

Workaround Doesn’t Mean That The Problem Is Solved

Takeaways
● Warmup Setting With Kubernetes Health Check
● Prometheus Metrics Are Helpful For Network Troubleshooting
● Same Pod But Inconsistent Performance In Kubernetes
● Workaround Doesn’t Mean That The Problem Is Solved
