Create HPA¶

Suanova AI platform supports metric-based elastic scaling of Pod resources (Horizontal Pod Autoscaling, HPA). Users can dynamically adjust the number of Pod replicas by setting targets for CPU utilization, memory usage, or custom metrics. For example, after you set an auto scaling policy based on the CPU utilization metric for a workload, the workload controller automatically increases the number of Pod replicas when the Pods' CPU utilization exceeds the threshold you set, and decreases it when utilization falls below the threshold.

This page describes how to configure auto scaling based on built-in metrics and custom metrics for workloads.

Note

HPA is only applicable to Deployment and StatefulSet, and only one HPA can be created per workload.
If you create an HPA policy based on CPU utilization, you must set the resource limit (Limit) for the workload in advance; otherwise CPU utilization cannot be calculated.
If built-in metrics and multiple custom metrics are used at the same time, HPA calculates the required number of replicas for each metric and scales to the largest of those values, without exceeding the maximum number of replicas configured in the HPA policy.
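The multi-metric rule in the note above can be sketched as follows. This is a minimal illustration, not the platform's implementation: it assumes the standard Kubernetes HPA formula (desired replicas = ceil(current replicas × current metric value / target value)) and ignores details such as the tolerance band and stabilization window. The function names and sample numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_value, target_value):
    # Standard HPA ratio rule: scale the replica count by how far the
    # observed metric is from its target, rounding up.
    return math.ceil(current_replicas * current_value / target_value)


def scale_decision(current_replicas, metrics, min_replicas, max_replicas):
    # metrics is a list of (current_value, target_value) pairs, one per metric.
    proposals = [
        desired_replicas(current_replicas, current, target)
        for current, target in metrics
    ]
    # Take the largest proposal, then clamp it to the configured replica range.
    return max(min_replicas, min(max(proposals), max_replicas))


# Hypothetical example: 4 replicas, CPU at 90% against a 60% target,
# memory at 50% against an 80% target, replica range 1 to 10.
result = scale_decision(4, [(90, 60), (50, 80)], min_replicas=1, max_replicas=10)
print(result)
```

In this example the CPU metric proposes 6 replicas and the memory metric proposes 3, so the larger value wins and the workload scales to 6, which is still within the configured maximum.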

Built-in metric elastic scaling policy¶

The system provides two built-in elastic scaling metrics, CPU and memory, which cover users' basic business cases.

Prerequisites¶

Before configuring a built-in metric auto scaling policy for a workload, make sure the following prerequisites are met:

A Kubernetes cluster has been integrated or created, and you can access the cluster's UI.
A namespace and a Deployment or StatefulSet have been created.
You have NS Editor permissions or higher. For details, refer to Namespace Authorization.
The metrics-server plugin has been installed.

Steps¶

Follow these steps to configure a built-in metric auto scaling policy for a workload.

Click Clusters on the left navigation bar to enter the cluster list page. Click a cluster name to enter the Cluster Details page.
On the cluster details page, click Workload in the left navigation bar to enter the workload list, and then click a workload name to enter the Workload Details page.
Click the Auto Scaling tab to view the auto scaling configuration of the current cluster.
After confirming that the metrics-server plugin is installed in the cluster and running normally, click the New Scaling button.
Configure the parameters of the built-in metric auto scaling policy.
- Policy name: Enter the name of the auto scaling policy. The name can contain up to 63 characters, may only include lowercase letters, numbers, and the separator "-", and must start and end with a lowercase letter or number, for example hpa-my-dep.
- Namespace: The namespace where the workload resides.
- Workload: The workload object that performs auto scaling.
- Target CPU Utilization: The target CPU utilization of the Pods under the workload. Utilization is calculated as the actual CPU usage of the Pods under the workload divided by their request (request) values. When actual CPU utilization is higher than the target value, the system automatically increases the number of Pod replicas; when it is lower, the system decreases the number of replicas.
- Target Memory Usage: The target memory usage of the Pods under the workload. When actual memory usage is higher than the target value, the system automatically increases the number of Pod replicas; when it is lower, the system decreases the number of replicas.
- Replica range: the elastic scaling range of the number of Pod replicas. The default interval is 1 - 10.
After completing the parameter configuration, click the OK button to automatically return to the elastic scaling details page. Click ┇ on the right side of the list to edit, delete, and view related events.
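For readers who want to see how the form fields relate to a Kubernetes object, here is a rough sketch. It assumes the policy created through the UI corresponds to a standard `autoscaling/v2` HorizontalPodAutoscaler; the platform's actual output is not shown in this page and may differ. The helper names `is_valid_policy_name` and `build_hpa`, and the sample workload `my-dep`, are hypothetical.

```python
import re

# Naming rule from the Policy name field: at most 63 characters, only
# lowercase letters, digits, and "-", starting and ending with a letter or digit.
NAME_RE = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def is_valid_policy_name(name):
    return len(name) <= 63 and NAME_RE.fullmatch(name) is not None


def build_hpa(name, namespace, workload, kind, cpu_percent,
              min_replicas=1, max_replicas=10):
    # Assumption: the UI fields map onto an autoscaling/v2 HPA like this.
    if not is_valid_policy_name(name):
        raise ValueError(f"invalid policy name: {name!r}")
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Workload: the Deployment or StatefulSet to scale.
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": kind, "name": workload},
            # Replica range: defaults to 1 - 10, as in the form.
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Target CPU Utilization, as a percentage of the CPU request.
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": cpu_percent},
                },
            }],
        },
    }


hpa = build_hpa("hpa-my-dep", "default", "my-dep", "Deployment", cpu_percent=60)
print(hpa["metadata"]["name"], hpa["spec"]["minReplicas"], hpa["spec"]["maxReplicas"])
```

The name check rejects inputs such as `hpa- my-dep` (contains a space) or `-hpa` (starts with a separator) before any object is built, which mirrors the validation described for the Policy name field.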