Sep 12, 2023
Kubernetes is a container orchestration system that helps you deploy, manage, and scale containerized applications. One of the key features of Kubernetes is its ability to automatically scale your applications up or down based on demand. This is done through a mechanism called horizontal pod autoscaling.
91% respondents reported using or evaluating Kubernetes in a 2021 survey by the Cloud Native Computing Foundation (CNCF).
Traditional scaling methods are often slow and inefficient, requiring manual intervention to increase or decrease the number of instances to match demand. This can be a challenge in the dynamic world of cloud-native applications, where demand can rapidly change.
Kubernetes HorizontalPodAutoscaler (HPA) automates this process, allowing you to set up rules that will automatically scale your application up or down based on CPU utilization or other metrics. This ensures that your application remains responsive while avoiding overprovisioning or underutilization of resources.
Horizontal pod autoscaling works by monitoring specific resource metrics, such as CPU utilization or memory consumption, and adjusting the number of pod replicas to maintain a target metric value. When resource usage crosses a predefined threshold, Kubernetes scales the deployment up or down, ensuring that the application’s performance remains within acceptable limits.
Kubernetes horizontal pod autoscaling offers numerous business benefits, including cost savings, improved application performance, high availability and reduced manual intervention. By automatically adjusting resources based on demand, HPA empowers your applications to efficiently handle varying workloads while maintaining a positive user experience.
The HPA works by creating a controller that monitors the metrics of the target workload. The controller then uses these metrics to calculate the desired number of pods for the workload. If the current number of pods is less than the desired number, the controller will create new pods. If the current number of pods is more than the desired number, the controller will delete pods.
Note: The HPA controller runs periodically, so it may take some time for the number of pods to change after the metrics have been updated.
Configuring HPA in your Kubernetes cluster involves several steps:
1. Enable metrics server: Ensure that you have a metrics server installed and running in your cluster. The metrics server collects and exposes resource utilization metrics.
2. Create HPA: Define an HPA resource in your Kubernetes configuration, specifying the target metric, target value, and scaling thresholds.
3. Deploy a metrics provider: If you want to use custom metrics, deploy a metrics provider like Prometheus to collect and expose these metrics to HPA.
4. Monitor and tweak: Continuously monitor your application’s performance and adjust your HPA configuration as needed to fine-tune scaling behavior.
To set up horizontal pod autoscaling, you need to create a horizontal pod autoscaler (HPA) object. The HPA object specifies the target workload, the metrics to monitor, and the desired number of pods.
You can create a HPA object using the kubectl command-line tool or the Kubernetes API.
Suppose a company has a website that experiences spikes in traffic during certain times of the day, such as during peak business hours or when there is a major news event.
This use case demonstrates how horizontal pod autoscaling can help ensure that a Kubernetes application is always running at its optimal performance level. By scaling the application up or down based on demand, the company can avoid overprovisioning resources and save money on their cloud bill. Let’s see the workings of this use case:
apiVersion: apps/v1
kind: Deployment
metadata:
name: sample-app
spec:
replicas: 3
selector:
matchLabels:
app: sample-app
template:
metadata:
labels:
app: sample-app
spec:
containers:
– name: sample-app-container
image: nginx
kubectl apply -f app-deployment.yaml
Create an HPA resource to specify the autoscaling behavior. Save this as app-hpa.yaml:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: sample-app-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: sample-app
minReplicas: 1
maxReplicas: 10
targetCPUUtilizationPercentage: 80
kubectl apply -f app-hpa.yaml
To test the autoscaling, you can simulate load on the deployment. One way to do this is by using a tool like hey, which generates HTTP requests. Install hey and send some requests to your sample app:
# Install hey (assuming you have Go installed)
go get -u github.com/rakyll/hey# Send requests to the app
hey -z 30s -c 20 http://
oscaleTargetRef: This specifies the target workload that the HPA will scale. This can be a deployment, a stateful set, or another scalable resource.
ominReplicas: This specifies the minimum number of pods that the target workload should have.
omaxReplicas: This specifies the maximum number of pods that the target workload should have.
Horizontal pod autoscaling is a powerful tool that can help you to ensure that your Kubernetes applications are always running at optimal performance level. By scaling your applications up or down based on demand, you can avoid overprovisioning resources and save money on your cloud bill.
Softweb Solutions has a team of DevOps experts who can help you with containerization by leveraging Kubernetes. Contact us to learn more about how HPA works and how you can use it to scale your Kubernetes applications.
Co-author: Anshita Solanki
Need Help ?
We are here for you