I am running different versions of our application in different namespaces, and I have set up a Prometheus and Grafana stack to monitor them. I am using the PromQL query below to get the CPU usage of different pods (as a percentage of one core), and the values it returns match what I get from kubectl top pods -n namespace:
sum (rate (container_cpu_usage_seconds_total{id!="/",namespace=~"$Namespace",pod=~"^$Deployment.*$"}[1m])) by (pod) * 100
The problem is that I want the total CPU usage of all pods in a namespace, cluster-wide. I tried different queries, but the values they return do not match the total CPU usage I get from the above PromQL or from kubectl top pods -n namespace.
The PromQL queries that I tried:
sum (rate (container_cpu_usage_seconds_total{namespace=~"$Namespace",pod=~"^$Deployment.*$"}[1m])) by (namespace)
sum (rate (container_cpu_usage_seconds_total{namespace=~"$Namespace",pod=~"^$Deployment.*$"}[1m]))
I am using the Singlestat panel for this, and in the visualization's Value section I tried different show methods such as Average, Total, and Current, but none returned the correct value.
My question is: how can I get the total CPU usage of all the pods in a namespace, cluster-wide?
I did some research and found a few answers that could suit your needs:
To simply monitor CPU usage at the cluster level, use:
sum (rate (container_cpu_usage_seconds_total{id="/"}[1m])) / sum (machine_cpu_cores) * 100
If you want to see %CPU usage for a namespace, you first need to calculate the namespace's CPU usage and then divide it by the CPU available in the cluster. It would look like this:
sum (rate (container_cpu_usage_seconds_total{namespace="$Namespace"}[1m])) / sum (machine_cpu_cores) * 100
You can also use Prometheus' arbitrary labels in order to calculate CPU usage of a namespace. More details can be found here.
Finally, you can try a Prometheus exporter.
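One detail worth checking against the queries in the question: the per-pod query multiplies by 100, while the two namespace-level attempts do not, so their results are in cores rather than in percent of one core. As a minimal sketch, assuming the same Grafana variables ($Namespace, $Deployment) and the same label filters as the per-pod query, the namespace total could be expressed in either unit as shown here. I have not verified this against your cluster, and the series that cAdvisor exports (and therefore the exact totals) can differ between setups.

```
# Namespace total in percent of one core (same unit as the per-pod query):
# identical filters, but without "by (pod)" so all pods are summed together.
sum (rate (container_cpu_usage_seconds_total{id!="/",namespace=~"$Namespace",pod=~"^$Deployment.*$"}[1m])) * 100

# The same total as a share of all CPU cores in the cluster.
sum (rate (container_cpu_usage_seconds_total{id!="/",namespace=~"$Namespace",pod=~"^$Deployment.*$"}[1m])) / sum (machine_cpu_cores) * 100
```

The first form should be directly comparable to adding up the per-pod values by hand. The second answers a different question (how much of the cluster the namespace uses), so its numbers will not match kubectl top pods.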
Please let me know if that helped.