
Prometheus Resource Planning

In practice, Prometheus's CPU, memory, and other resource usage is affected by the number of containers in the cluster and by whether Istio is enabled, and can exceed the configured resource limits.

To keep Prometheus running reliably in clusters of different sizes, its resources need to be adjusted according to the actual cluster size.

Reference Resource Planning

When the service mesh is not enabled, test statistics show that the series count generated by the system Jobs is proportional to the pod count: Series count = 800 × pod count.

When the service mesh is enabled, each pod additionally generates Istio-related metrics: Series count = 768 × pod count.
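The two formulas above can be combined into a small estimator. This is an illustrative sketch (the function name is an assumption, not part of any tool); note that the mesh-enabled table below rounds its series counts, so table values may differ slightly from the raw formula result.

```python
# Per-pod series counts, taken from the formulas above:
#   system Jobs:          800 series per pod
#   Istio (mesh enabled): 768 additional series per pod
SYSTEM_SERIES_PER_POD = 800
ISTIO_SERIES_PER_POD = 768

def estimate_series_count(pod_count: int, mesh_enabled: bool = False) -> int:
    """Rough series-count estimate for a cluster of `pod_count` pods.
    Hypothetical helper for illustration only."""
    series = SYSTEM_SERIES_PER_POD * pod_count
    if mesh_enabled:
        series += ISTIO_SERIES_PER_POD * pod_count
    return series

print(estimate_series_count(100))                      # 80000
print(estimate_series_count(100, mesh_enabled=True))   # 156800
```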

When the service mesh is not enabled

The following resource planning for Prometheus is recommended when the service mesh is not enabled:

| Cluster size (pod count) | Series count (service mesh disabled) | CPU (cores) | Memory |
| --- | --- | --- | --- |
| 100 | 80,000 | Request: 0.5 / Limit: 1 | Request: 2 GB / Limit: 4 GB |
| 200 | 160,000 | Request: 1 / Limit: 1.5 | Request: 3 GB / Limit: 6 GB |
| 300 | 240,000 | Request: 1 / Limit: 2 | Request: 3 GB / Limit: 6 GB |
| 400 | 320,000 | Request: 1 / Limit: 2 | Request: 4 GB / Limit: 8 GB |
| 500 | 400,000 | Request: 1.5 / Limit: 3 | Request: 5 GB / Limit: 10 GB |
| 800 | 640,000 | Request: 2 / Limit: 4 | Request: 8 GB / Limit: 16 GB |
| 1000 | 800,000 | Request: 2.5 / Limit: 5 | Request: 9 GB / Limit: 18 GB |
| 2000 | 1,600,000 | Request: 3.5 / Limit: 7 | Request: 20 GB / Limit: 40 GB |
| 3000 | 2,400,000 | Request: 4 / Limit: 8 | Request: 33 GB / Limit: 66 GB |
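For automation, the table above can be encoded as a lookup that rounds a pod count up to the nearest planned tier. The function and tier-list names below are illustrative assumptions, and the numbers are copied directly from the table:

```python
# Mesh-disabled sizing tiers from the table above:
# (max pod count, CPU request, CPU limit, memory request GB, memory limit GB)
NON_MESH_TIERS = [
    (100, 0.5, 1, 2, 4),
    (200, 1, 1.5, 3, 6),
    (300, 1, 2, 3, 6),
    (400, 1, 2, 4, 8),
    (500, 1.5, 3, 5, 10),
    (800, 2, 4, 8, 16),
    (1000, 2.5, 5, 9, 18),
    (2000, 3.5, 7, 20, 40),
    (3000, 4, 8, 33, 66),
]

def recommended_resources(pod_count: int):
    """Return (cpu_request, cpu_limit, mem_request_gb, mem_limit_gb)
    for the smallest tier that covers `pod_count` pods."""
    for max_pods, cpu_req, cpu_lim, mem_req, mem_lim in NON_MESH_TIERS:
        if pod_count <= max_pods:
            return cpu_req, cpu_lim, mem_req, mem_lim
    raise ValueError("cluster is larger than the largest planned tier")

print(recommended_resources(450))  # (1.5, 3, 5, 10)
```

A cluster of 450 pods falls between the 400- and 500-pod rows, so the lookup rounds up to the 500-pod recommendation.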

When the service mesh is enabled

The following resource planning for Prometheus is recommended when the service mesh is enabled:

| Cluster size (pod count) | Series count (service mesh enabled) | CPU (cores) | Memory |
| --- | --- | --- | --- |
| 100 | 150,000 | Request: 1 / Limit: 2 | Request: 3 GB / Limit: 6 GB |
| 200 | 310,000 | Request: 2 / Limit: 3 | Request: 5 GB / Limit: 10 GB |
| 300 | 460,000 | Request: 2 / Limit: 4 | Request: 6 GB / Limit: 12 GB |
| 400 | 620,000 | Request: 2 / Limit: 4 | Request: 8 GB / Limit: 16 GB |
| 500 | 780,000 | Request: 3 / Limit: 6 | Request: 10 GB / Limit: 20 GB |
| 800 | 1,250,000 | Request: 4 / Limit: 8 | Request: 15 GB / Limit: 30 GB |
| 1000 | 1,560,000 | Request: 5 / Limit: 10 | Request: 18 GB / Limit: 36 GB |
| 2000 | 3,120,000 | Request: 7 / Limit: 14 | Request: 40 GB / Limit: 80 GB |
| 3000 | 4,680,000 | Request: 8 / Limit: 16 | Request: 65 GB / Limit: 130 GB |

Note

  1. The pod count in the table refers to pods running stably in the cluster. If a large number of pods restart, the series count will spike for a short period, and resources need to be adjusted accordingly.
  2. Prometheus keeps roughly the most recent two hours of data in memory by default, and enabling Remote Write in the cluster consumes additional memory. A resource surge (headroom) ratio of 2 is recommended.
  3. The values in the table are recommendations for typical scenarios. If your environment has precise resource requirements, check the actual resource usage of the corresponding Prometheus instance after the cluster has been running for a while, and tune the configuration accordingly.
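Note 2's surge ratio can be applied on top of a planned table row. This is a minimal sketch, assuming the ratio simply scales the planned memory values; the helper name is an illustrative assumption:

```python
def with_surge(mem_request_gb: float, mem_limit_gb: float, ratio: float = 2.0):
    """Apply the recommended surge ratio (note 2) to a planned memory pair."""
    return mem_request_gb * ratio, mem_limit_gb * ratio

# 500-pod, mesh-disabled row: Request 5 GB / Limit 10 GB
print(with_surge(5, 10))  # (10.0, 20.0)
```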