Metrics
Metrics are the foundation of effective monitoring in Kubernetes. They allow continuous observation and data-driven evaluation of the health and performance of clusters, nodes, pods, and applications.
To collect, store, and visualize metrics, we use the kube-prometheus-stack, which includes Prometheus along with key exporters like Node Exporter and kube-state-metrics. The collected data is analyzed and visualized through Grafana and can also be explored directly in the Prometheus UI.
Architecture Overview
- Prometheus scrapes metrics from defined targets (e.g., nodes, pods, services).
- Node Exporter provides system-level metrics (CPU, memory, disk I/O, etc.).
- kube-state-metrics exposes metrics about the state of Kubernetes resources (e.g., Deployments, CronJobs, StatefulSets).
- ServiceMonitors and PodMonitors define which services and pods Prometheus should scrape.
- Configuration is managed declaratively using Helm charts and GitOps (via Argo CD).
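To illustrate the difference between the two monitor types listed above: a PodMonitor selects pods directly instead of going through a Service, which can be useful for workloads that have no Service. The following is a minimal sketch, not taken from the kubara configuration. The name my-batch-job, the namespace, and the port name metrics are hypothetical, and the release label assumes that Prometheus selects monitors by that label.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-batch-job                  # hypothetical name
  labels:
    release: kube-prometheus-stack    # assumed selector label
spec:
  # Select pods by label, not Services
  selector:
    matchLabels:
      app: my-batch-job
  namespaceSelector:
    matchNames:
      - my-app-namespace
  # PodMonitors use podMetricsEndpoints instead of endpoints
  podMetricsEndpoints:
    - port: metrics                   # must match a named container port
      path: /metrics
      interval: 30s
```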
Accessing Prometheus
The Prometheus UI is accessible via Ingress at:
https://<customer-domain>/prometheus
It provides a functional interface to explore metrics, debug scraping targets, and manually execute PromQL queries.
Note: While Prometheus is excellent for direct queries and troubleshooting, Grafana is used as the primary interface for metric visualization, offering rich dashboards and user-friendly analytics. See more in the Dashboards section.
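How kubara configures this Ingress is not shown in this section. As a rough sketch, serving Prometheus under a /prometheus sub-path with the kube-prometheus-stack chart typically needs both an Ingress path and a matching route prefix, so that links and redirects in the UI resolve correctly. The values below are an assumption about one possible setup, with the domain left as a placeholder.

```yaml
prometheus:
  ingress:
    enabled: true
    hosts:
      - "<customer-domain>"           # placeholder, replace with the real host
    paths:
      - /prometheus
  prometheusSpec:
    # Serve the UI and API under the same sub-path the Ingress exposes
    routePrefix: /prometheus
    externalUrl: "https://<customer-domain>/prometheus"
```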
kubara Standardization
In kubara, ServiceMonitors are enabled by default for all deployed applications. This ensures that each app exposes Prometheus-compatible metrics and is automatically included in centralized monitoring.
We also apply consistent labels to every ServiceMonitor (for example, monitoring.instance) to simplify filtering and organization.
Example snippet from the Argo CD Helm chart values.yaml:
controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      additionalLabels:
        monitoring.instance: default
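A consistent label such as monitoring.instance can also be used on the Prometheus side to decide which ServiceMonitors an instance picks up. Whether kubara does this is not stated here, so the following is only a sketch under that assumption, using the serviceMonitorSelector field of the kube-prometheus-stack chart.

```yaml
prometheus:
  prometheusSpec:
    # Assumption: this Prometheus instance only picks up ServiceMonitors
    # that carry the label monitoring.instance=default
    serviceMonitorSelector:
      matchLabels:
        monitoring.instance: default
```

With a selector like this, a ServiceMonitor without the label would be ignored by this instance, which is one reason to apply the label consistently.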
Example: ServiceMonitor
A ServiceMonitor defines which services Prometheus should monitor. Here's a basic example:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: my-app
  namespaceSelector:
    matchNames:
      - my-app-namespace
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
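A ServiceMonitor matches Services by label and refers to the Service port by name, not by number. For the example above to find a target, a Service along the following lines would need to exist. This is an illustrative sketch: the port number 8080 and the pod selector are assumptions, while the label app: my-app, the namespace, and the port name http mirror the ServiceMonitor.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app-namespace         # listed in namespaceSelector.matchNames
  labels:
    app: my-app                       # matched by spec.selector.matchLabels
spec:
  selector:
    app: my-app                       # assumed pod label
  ports:
    - name: http                      # referenced by endpoints[].port
      port: 8080                      # assumed port number
      targetPort: 8080
```

If the port is unnamed or named differently, the ServiceMonitor is accepted by the API server but no target appears in Prometheus.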
Configuration via values.yaml
Prometheus settings are defined in the values.yaml file of the Helm chart. This includes scrape intervals, retention policies, and storage settings.
prometheus:
  prometheusSpec:
    scrapeInterval: "30s"
    evaluationInterval: "30s"
    retention: "15d"
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "gp2"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
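Time-based retention alone does not prevent the volume from filling up if the number of series grows. One option is to add a size-based limit next to the time-based one. The sketch below assumes the 50Gi volume from the example above; the 45GB value is a hypothetical choice that leaves some headroom, not a kubara default.

```yaml
prometheus:
  prometheusSpec:
    retention: "15d"          # drop data older than 15 days
    retentionSize: "45GB"     # hypothetical cap, kept below the 50Gi volume
```

When both are set, Prometheus removes old data as soon as either limit is reached.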
Prebuilt Dashboards
The kube-prometheus-stack comes with a wide set of preconfigured Grafana dashboards for:
- Kubernetes nodes and workloads
- Prometheus internals
- etcd, API server, scheduler
- kubelet performance
- resource usage and capacity planning
These dashboards are automatically imported when Grafana is deployed with the stack. You can find more about them in the Dashboards chapter.
Best Practices
- Use ServiceMonitors instead of static target definitions to keep deployments flexible and declarative.
- Apply labels for better metric organization and filtering (e.g., by namespace, app, or team).
- Set retention periods based on operational needs; long retention can impact performance.
- Monitor Prometheus itself: metrics like prometheus_tsdb_head_series and prometheus_engine_query_duration_seconds provide insight into system health and scaling requirements.
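As one way to act on the last point, the two metrics can be watched with a PrometheusRule. This is a minimal sketch rather than a recommended rule set: the rule names, thresholds, and durations are hypothetical and need tuning per cluster, and the release label assumes that Prometheus selects rules by that label.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-self-monitoring    # hypothetical name
  labels:
    release: kube-prometheus-stack    # assumed selector label
spec:
  groups:
    - name: prometheus-self
      rules:
        # Number of active series in the TSDB head; the threshold is a placeholder
        - alert: PrometheusHighSeriesCount
          expr: 'prometheus_tsdb_head_series > 2000000'
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: Prometheus is holding an unusually high number of active series.
        # 90th percentile of query evaluation time; the threshold is a placeholder
        - alert: PrometheusSlowQueries
          expr: 'prometheus_engine_query_duration_seconds{slice="inner_eval", quantile="0.9"} > 5'
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: Prometheus query evaluation is slow.
```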