Horizontal Pod Autoscaling (HPA) is a Kubernetes feature that dynamically increases or decreases the number of pod replicas based on resource utilization metrics.
The Horizontal Pod Autoscaling feature was introduced in Kubernetes v1.2. It lets users autoscale off of basic metrics like CPU, but requires a resource called metrics-server to run alongside your application. As of Kubernetes v1.6, it is possible to autoscale off of custom metrics, which are user defined and collected from within the cluster. As of Kubernetes v1.10, support for external metrics was introduced, so users can autoscale off of any metric from outside the cluster that is collected for them by Datadog.
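For example, a basic CPU-based HPA manifest looks like this (a minimal sketch; the my-app names are illustrative):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                       # Deployment to scale (illustrative name)
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80   # scale out when average CPU exceeds 80%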
The custom and external metric providers, as opposed to the metrics server, are resources that have to be implemented and registered by the user.
To register the External Metrics Provider resource against the API server, you need:
- Kubernetes v1.10+.
- The aggregation layer enabled; refer to the Kubernetes aggregation layer configuration documentation to learn how to enable it. If you are using EKS, this is already enabled for you.
This section of the workshop should be done after 207-cluster-monitoring-with-datadog, so you can benefit from the applications already in place to generate load and simulate the autoscaling.
Autoscaling over external metrics does not require the Node Agent to be running; you only need the metrics to be available in your Datadog account. Nevertheless, for this walkthrough, we autoscale an NGINX Deployment based on NGINX metrics collected by a Node Agent.
Before proceeding, please make sure you went through section 207 of this workshop, which means you have Node Agents running with the Autodiscovery process enabled and functional.
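If you want to double-check, something along these lines should list your Node Agents (the label selector is an assumption; adjust it to your DaemonSet's labels):

kubectl get pods -l app=datadog-agent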
In order to autoscale on these metrics in Kubernetes, you need to register a Custom/External Metrics Server; the Datadog Cluster Agent implements this feature.
Start by creating the appropriate RBAC rules, allowing the Cluster Agent to watch and parse Horizontal Pod Autoscalers as well as cluster-level metadata.
kubectl apply -f templates/cluster-agent/rbac/rbac-cluster-agent.yaml
clusterrole.rbac.authorization.k8s.io "dca" created
clusterrolebinding.rbac.authorization.k8s.io "dca" created
serviceaccount "dca" created
Add your <API_KEY> and <APP_KEY> in the Deployment manifest of the Datadog Cluster Agent. Then enable the HPA processing by setting the DD_EXTERNAL_METRICS_PROVIDER_ENABLED variable to true.
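The relevant excerpt of the Cluster Agent container spec should look roughly like this (a sketch; DD_API_KEY and DD_APP_KEY are the standard Datadog environment variables):

        env:
          - name: DD_API_KEY
            value: "<API_KEY>"
          - name: DD_APP_KEY
            value: "<APP_KEY>"
          - name: DD_EXTERNAL_METRICS_PROVIDER_ENABLED
            value: "true"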
Finally, spin up the resources:
kubectl apply -f templates/cluster-agent/datadog-cluster-agent_service.yaml
kubectl apply -f templates/cluster-agent/hpa-example/cluster-agent-hpa-svc.yaml
kubectl apply -f templates/cluster-agent/cluster-agent.yaml
Note that the first service is used for communication between the Node Agents and the Datadog Cluster Agent, while the second is used by Kubernetes to register the External Metrics Provider.
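For reference, the second service is shaped roughly like this (a sketch; the selector and target port are assumptions, check cluster-agent-hpa-svc.yaml for the exact values):

apiVersion: v1
kind: Service
metadata:
  name: datadog-custom-metrics-server
spec:
  selector:
    app: datadog-cluster-agent   # assumed pod label
  ports:
    - port: 443                  # port Kubernetes uses to reach the provider
      targetPort: 443            # assumed container port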
At this point you should have:
Pods:
NAMESPACE NAME READY STATUS RESTARTS AGE
default datadog-cluster-agent-7b7f6d5547-cmdtc 1/1 Running 0 28m
Services:
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default datadog-custom-metrics-server ClusterIP 192.168.254.87 <none> 443/TCP 28m
default datadog-cluster-agent ClusterIP 192.168.254.197 <none> 5005/TCP 28m
Once the Datadog Cluster Agent is up and running, register it as an External Metrics Provider via the service exposing port 443.
Apply the following RBAC rules:
kubectl apply -f templates/hpa-example/rbac-hpa.yaml
clusterrolebinding.rbac.authorization.k8s.io "system:auth-delegator" created
rolebinding.rbac.authorization.k8s.io "dca" created
apiservice.apiregistration.k8s.io "v1beta1.external.metrics.k8s.io" created
clusterrole.rbac.authorization.k8s.io "external-metrics-reader" created
clusterrolebinding.rbac.authorization.k8s.io "external-metrics-reader" created
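The APIService object created above is what points Kubernetes at that service; it looks roughly like this (a sketch; consult rbac-hpa.yaml for the exact definition):

apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io
  version: v1beta1
  service:
    name: datadog-custom-metrics-server
    namespace: default
  groupPriorityMinimum: 100
  versionPriority: 100
  insecureSkipTLSVerify: true    # assumed; use a caBundle instead in production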
You can confirm that the Cluster Agent is properly registered as an External Metrics Provider by running:
kubectl describe apiservice v1beta1.external.metrics.k8s.io
[...]
  Service:
    Name:            datadog-custom-metrics-server
    Namespace:       default
  Version:           v1beta1
  Version Priority:  100
[...]
Status:
  Conditions:
    Last Transition Time:  2018-09-28T16:19:34Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
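You can also query the external metrics API directly (piping through jq is optional, for readability):

kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1" | jq .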
Once you have the Datadog Cluster Agent running and the service registered, create an HPA manifest. The Datadog Cluster Agent will subsequently parse the manifest and pull metrics from Datadog.
At this point, you should see:
Pods:
NAMESPACE NAME READY STATUS RESTARTS AGE
default datadog-agent-4c5pp 1/1 Running 0 14m
default datadog-agent-ww2da 1/1 Running 0 14m
default datadog-agent-2qqd3 1/1 Running 0 14m
[…]
default datadog-cluster-agent-7b7f6d5547-cmdtc 1/1 Running 0 16m
Now it is time to create a Horizontal Pod Autoscaler manifest. If you take a look at the hpa-manifest.yaml file, you should see the following (a sketch of the full manifest follows this list):
- The HPA is configured to autoscale the Deployment called nginx.
- The maximum number of replicas created is 3 and the minimum is 1.
- The metric used is nginx.net.request_per_s and the scope is kube_container_name: nginx. Note that this metric format corresponds to the Datadog one.
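Put together, the manifest should be shaped roughly like this (a sketch using the autoscaling/v2beta1 external metrics format available as of v1.10; the 50 req/s average target is inferred from the HPA output shown below):

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: nginxext
spec:
  minReplicas: 1
  maxReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx                              # Deployment to autoscale
  metrics:
    - type: External
      external:
        metricName: nginx.net.request_per_s  # Datadog metric name
        metricSelector:
          matchLabels:
            kube_container_name: nginx       # scope of the Datadog query
        targetAverageValue: 50               # assumed target, per the HPA output below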
Every 30 seconds (this can be configured), Kubernetes queries the Datadog Cluster Agent to get the value of this metric and autoscales proportionally if necessary. For advanced use cases, it is possible to have several metrics in the same HPA; as described in the Kubernetes horizontal pod autoscaling documentation, the largest of the proposed values will be the one chosen.
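Concretely, the desired replica count is ceil(currentReplicas × currentMetricValue / targetValue): with 1 replica, a 50 req/s average target, and a measured 130 req/s, the HPA would request ceil(1 × 130 / 50) = 3 replicas.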
We will be relying on the NGINX Deployment used in section 207 of this workshop. Make sure that everything is still running:
kubectl get deploy,po -lrole=nginx
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.extensions/nginx 1 1 1 1 2h
NAME READY STATUS RESTARTS AGE
pod/nginx-69cb46b4db-6bbml 1/1 Running 0 2h
Then, apply the HPA manifest.
kubectl apply -f templates/cluster-agent/hpa-example/hpa-manifest.yaml
horizontalpodautoscaler.autoscaling "nginxext" created
You should see your NGINX pod running with the corresponding service:
Pods:
default nginx-6757dd8769-5xzp2 1/1 Running 0 3m
Services:
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default nginx ClusterIP 192.168.251.36 <none> 8090/TCP 3m
Horizontal Pod Autoscalers:
NAMESPACE NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
default nginxext Deployment/nginx 0/50 (avg) 1 3 1 3m
At this point, the setup is ready to be stressed. As a result of the stress, Kubernetes will autoscale the NGINX pods.
To do so, you can use the interface of the application spun up during step 207. For instance, you can trigger a cache stress, which will also stress the NGINX service by simulating requests.
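If you prefer the command line, a throwaway load generator along these lines works too (the service name and port are taken from the listing above; kubectl run flags vary slightly across versions):

kubectl run load-generator --rm -it --image=busybox -- /bin/sh -c "while true; do wget -q -O- http://nginx:8090 > /dev/null; done"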
Looking into your application, you should be able to correlate the requests per second on your NGINX pods with the autoscaling event and the creation of new replicas.
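You can also follow the scaling live from a terminal:

kubectl get hpa nginxext -w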
In the above screenshot, we triggered two simulations, one at 12:30 and another at 12:45. You can see that, as a result of the stress, an additional replica is spun up to serve the requests, and the average requests per second falls to ~27. Finally, after a cooldown period, the Deployment scales back down.
Once you are done, clean up the resources created in this section:

kubectl delete -f templates
You are now ready to continue on with the workshop!