Hello, Habr readers! This translation was prepared specifically for students of the “Infrastructure platform based on Kubernetes” course, whose classes start tomorrow. Let's get started.
Autoscaling in Kubernetes
Autoscaling allows you to automatically increase and decrease workloads depending on resource usage.
Kubernetes autoscaling has two dimensions:
- cluster autoscaling (Cluster Autoscaler), which is responsible for scaling nodes;
- Horizontal Pod Autoscaler (HPA), which automatically scales the number of pods in a deployment or replica set.
Cluster autoscaling can be used in conjunction with horizontal pod autoscaling to dynamically adjust computing resources and the degree of parallelism required to meet service level agreements (SLAs).
Cluster autoscaling depends heavily on the capabilities of the cloud infrastructure provider that hosts the cluster, while HPA can operate independently of the IaaS/PaaS provider.
HPA development
Horizontal pod autoscaling has undergone major changes since it first appeared in Kubernetes v1.1. The first version of HPA scaled pods based on measured CPU consumption and, later, on memory usage. Kubernetes 1.6 introduced a new API called Custom Metrics, which gave HPA access to arbitrary metrics. Kubernetes 1.7 added an aggregation layer that allows third-party applications to extend the Kubernetes API by registering as API add-ons.
Thanks to the Custom Metrics API and the aggregation layer, monitoring systems such as Prometheus can provide application-specific metrics to HPA.
Horizontal pod autoscaling is implemented as a control loop that periodically queries the Resource Metrics API for key metrics, such as CPU and memory usage, and the Custom Metrics API for application-specific metrics.
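Roughly speaking, on each iteration the controller compares the measured metric with the target value and computes the desired number of replicas from their ratio (this is how the HPA algorithm is documented):

desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)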
Below is a step-by-step guide to configuring HPA v2 for Kubernetes 1.9 and later.
- Install the Metrics Server add-on, which provides core metrics.
- Run a demo application to see how pod autoscaling works based on CPU and memory usage.
- Deploy Prometheus and a custom metrics API server, and register the API server with the aggregation layer.
- Configure HPA using the custom metrics provided by the demo application.
Before you start, you need Go version 1.8 (or later) installed and the k8s-prom-hpa repository cloned into your GOPATH:

cd $GOPATH
git clone https://github.com/stefanprodan/k8s-prom-hpa
1. Setting up a metrics server
The Kubernetes metrics server is an in-cluster aggregator of resource usage data that replaces Heapster. The metrics server collects CPU and memory usage information for nodes and pods from kubernetes.summary_api. The Summary API is a memory-efficient API for passing Kubelet/cAdvisor metrics to the metrics server.
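If you want to look at the raw data the metrics server aggregates, you can query the kubelet Summary API through the API server proxy (assuming your cluster and RBAC configuration allow proxying to the kubelet):

kubectl get --raw "/api/v1/nodes/<NODE_NAME>/proxy/stats/summary" | jq .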
The first version of HPA needed the Heapster aggregator to get CPU and memory metrics. In HPA v2 and Kubernetes 1.8, only the metrics server is required, with the horizontal-pod-autoscaler-use-rest-clients option enabled. This option is enabled by default in Kubernetes 1.9. GKE 1.9 ships with a pre-installed metrics server.
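If you run your own control plane, this option corresponds to a kube-controller-manager flag; a rough sketch (how exactly you set it depends on how your control plane is deployed):

# flag on kube-controller-manager (set via your control plane manifests or bootstrap tooling)
--horizontal-pod-autoscaler-use-rest-clients=true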
Deploy the metrics server in the kube-system namespace:

kubectl create -f ./metrics-server
After one minute, the metrics server will start reporting CPU and memory usage for nodes and pods.
View node metrics:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
View pod metrics:
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods" | jq .
2. Autoscale based on CPU and memory usage
You can use podinfo, a small web application written in Go, to test the Horizontal Pod Autoscaler (HPA).
Deploy podinfo in the default namespace:
kubectl create -f ./podinfo/podinfo-svc.yaml,./podinfo/podinfo-dep.yaml
Access podinfo via the NodePort service at http://<K8S_PUBLIC_IP>:31198.
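For example, you can check that the application responds (the exact output depends on the podinfo version):

curl http://<K8S_PUBLIC_IP>:31198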
Define an HPA that maintains at least two replicas and scales up to ten replicas if average CPU usage exceeds 80% or if memory consumption goes above 200 MiB:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 200Mi
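Note that the CPU target here is a utilization percentage, which HPA calculates relative to the CPU requests declared by the pod's containers, so the target deployment must specify CPU requests (the memory target, by contrast, is an absolute average value). A minimal, illustrative requests block; the actual values are whatever ./podinfo/podinfo-dep.yaml defines:

    resources:
      requests:
        cpu: 100m     # with this request, the 80% target means ~80m average CPU usage per pod
        memory: 64Mi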
Create an HPA:
kubectl create -f ./podinfo/podinfo-hpa.yaml
After a couple of seconds, the HPA controller will contact the metrics server and get information about CPU and memory usage:
kubectl get hpa

NAME      REFERENCE            TARGETS                      MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   2826240 / 200Mi, 15% / 80%   2         10        2          5m
To increase CPU utilization, run a load test with rakyll/hey:

#install hey
go get -u github.com/rakyll/hey

#do 10K requests
hey -n 10000 -q 10 -c 5 http://<K8S_PUBLIC_IP>:31198
You can monitor HPA events as follows:
$ kubectl describe hpa

Events:
  Type    Reason             Age  From                       Message
  ----    ------             ---- ----                       -------
  Normal  SuccessfulRescale  7m   horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  3m   horizontal-pod-autoscaler  New size: 8; reason: cpu resource utilization (percentage of request) above target
Temporarily remove podinfo (it will be deployed again in a later step of this guide):
kubectl delete -f ./podinfo/podinfo-hpa.yaml,./podinfo/podinfo-dep.yaml,./podinfo/podinfo-svc.yaml
3. Custom Metrics server setup
To scale based on custom metrics, you need two components. The first, the Prometheus time series database, collects application metrics and stores them. The second, k8s-prometheus-adapter, extends the Kubernetes Custom Metrics API with the metrics supplied by the collector.
A dedicated namespace is used to deploy Prometheus and the adapter.
Create the monitoring namespace:
kubectl create -f ./namespaces.yaml
Deploy Prometheus v2 in the monitoring namespace:
kubectl create -f ./prometheus
Generate the TLS certificates required for the Prometheus adapter:
make certs
Deploy the Prometheus Adapter for Custom Metrics API:
kubectl create -f ./custom-metrics-api
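To make sure the adapter has registered with the aggregation layer, you can check the corresponding APIService object (assuming the adapter registers under the usual custom.metrics.k8s.io/v1beta1 group and version):

kubectl get apiservice v1beta1.custom.metrics.k8s.io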
Get a list of the custom metrics provided by Prometheus:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
Then retrieve the file system usage data for all pods in the monitoring namespace:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitoring/pods/*/fs_usage_bytes" | jq .
4. Autoscale based on custom metrics
Create the podinfo NodePort service and deployment in the default namespace:
kubectl create -f ./podinfo/podinfo-svc.yaml,./podinfo/podinfo-dep.yaml
podinfo exposes a custom metric named http_requests_total. The Prometheus adapter strips the _total suffix and marks this metric as a counter.
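For reference, a counter in the Prometheus exposition format, as exposed on the application's /metrics endpoint, looks roughly like this (the labels and value are illustrative):

# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{status="200"} 42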
Get the total requests per second from the Custom Metrics API:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests" | jq . { "kind": "MetricValueList", "apiVersion": "custom.metrics.k8s.io/v1beta1", "metadata": { "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/http_requests" }, "items": [ { "describedObject": { "kind": "Pod", "namespace": "default", "name": "podinfo-6b86c8ccc9-kv5g9", "apiVersion": "/__internal" }, "metricName": "http_requests", "timestamp": "2018-01-10T16:49:07Z", "value": "901m" }, { "describedObject": { "kind": "Pod", "namespace": "default", "name": "podinfo-6b86c8ccc9-nm7bl", "apiVersion": "/__internal" }, "metricName": "http_requests", "timestamp": "2018-01-10T16:49:07Z", "value": "898m" } ] }
The m suffix denotes milli-units, so, for example, 901m means 901 milli-requests, i.e. roughly 0.9 requests per second.
Create an HPA that will scale up the podinfo deployment if the number of requests exceeds 10 per second:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: podinfo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: http_requests
      targetAverageValue: 10
Deploy the podinfo HPA in the default namespace:
kubectl create -f ./podinfo/podinfo-hpa-custom.yaml
After a few seconds, HPA will fetch the http_requests value from the Custom Metrics API:
kubectl get hpa

NAME      REFERENCE            TARGETS     MINPODS   MAXPODS   REPLICAS   AGE
podinfo   Deployment/podinfo   899m / 10   2         10        2          1m
Apply load to the podinfo service with 25 requests per second:
#install hey
go get -u github.com/rakyll/hey

#do 10K requests rate limited at 25 QPS
hey -n 10000 -q 5 -c 5 http://<K8S_PUBLIC_IP>:31198
After a few minutes, HPA will begin to scale the deployment:
kubectl describe hpa

Name:                       podinfo
Namespace:                  default
Reference:                  Deployment/podinfo
Metrics:                    ( current / target )
  "http_requests" on pods:  9059m / 10
Min replicas:               2
Max replicas:               10

Events:
  Type    Reason             Age  From                       Message
  ----    ------             ---- ----                       -------
  Normal  SuccessfulRescale  2m   horizontal-pod-autoscaler  New size: 3; reason: pods metric http_requests above target
At the current request rate, the deployment will never reach the maximum of 10 pods. Three replicas are enough to keep the number of requests per second below 10 for each pod.
When the load tests are complete, HPA will scale down the deployment to the original number of replicas:
Events:
  Type    Reason             Age  From                       Message
  ----    ------             ---- ----                       -------
  Normal  SuccessfulRescale  5m   horizontal-pod-autoscaler  New size: 3; reason: pods metric http_requests above target
  Normal  SuccessfulRescale  21s  horizontal-pod-autoscaler  New size: 2; reason: All metrics below target
You may have noticed that the autoscaler does not react to metric changes immediately. By default, metrics are synchronized every 30 seconds. In addition, scaling only happens if the workload has neither increased nor decreased within the last 3-5 minutes. This helps prevent conflicting decisions and leaves time for the Cluster Autoscaler to kick in.
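These intervals can be tuned with kube-controller-manager flags, provided you control the control plane; a sketch of the flags as they existed around Kubernetes 1.9 (they were deprecated and reworked in later releases):

--horizontal-pod-autoscaler-sync-period=30s
--horizontal-pod-autoscaler-upscale-delay=3m
--horizontal-pod-autoscaler-downscale-delay=5m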
Conclusion
Not all systems can meet their SLAs by relying on CPU or memory usage metrics alone (or on a combination of the two). Most web and mobile backends need to autoscale based on requests per second to handle traffic bursts.
For ETL (Extract, Transform, Load) applications, autoscaling can be triggered, for example, when the job queue length exceeds a specified threshold.
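A minimal sketch of what such an HPA could look like, assuming a hypothetical etl-worker deployment whose pods expose a hypothetical job_queue_length custom metric via Prometheus:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: etl-worker
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: etl-worker
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metricName: job_queue_length   # hypothetical custom metric
      targetAverageValue: 100        # scale up when the average queue length per pod exceeds 100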
In all cases, instrumenting your applications with Prometheus and exposing the right metrics for autoscaling lets you fine-tune your applications to better handle traffic bursts and ensure high availability of the infrastructure.
Ideas, questions, comments? Join the Slack discussion!
That's the article. We look forward to your comments and hope to see you on the course!