Metrics

Metrics are the foundation of effective monitoring in Kubernetes. They allow continuous observation and data-driven evaluation of the health and performance of clusters, nodes, pods, and applications.

To collect, store, and visualize metrics, we use the kube-prometheus-stack, which includes Prometheus along with key exporters like Node Exporter and kube-state-metrics. The collected data is analyzed and visualized through Grafana and can also be explored directly in the Prometheus UI.

Architecture Overview

  • Prometheus scrapes metrics from defined targets (e.g., nodes, pods, services).
  • Node Exporter provides system-level metrics (CPU, memory, disk I/O, etc.).
  • kube-state-metrics exposes metrics about the state of Kubernetes resources (e.g., Deployments, CronJobs, StatefulSets).
  • ServiceMonitors and PodMonitors define which Services and Pods Prometheus should scrape.
  • Configuration is managed declaratively using Helm charts and GitOps (via Argo CD).
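For workloads that have no Service in front of them, a PodMonitor selects pods directly. The following is a minimal sketch, not a manifest taken from kubara: the app name, the pod label, and the port name are hypothetical, and the release label is an assumption about how the Prometheus instance selects monitors.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-batch-worker              # hypothetical name
  labels:
    release: kube-prometheus-stack   # assumed to match the Prometheus podMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-batch-worker           # hypothetical pod label
  namespaceSelector:
    matchNames:
      - my-app-namespace
  podMetricsEndpoints:
    - port: metrics                  # name of the container port, not its number
      path: /metrics
      interval: 30s
```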

Accessing Prometheus

The Prometheus UI is accessible via Ingress at:

https://<customer-domain>/prometheus

It provides a functional interface to explore metrics, debug scraping targets, and manually execute PromQL queries.

Note: While Prometheus is excellent for direct queries and troubleshooting, Grafana is used as the primary interface for metric visualization, offering rich dashboards and user-friendly analytics. See more in the Dashboards section.

kubara Standardization

In kubara, ServiceMonitors are enabled by default for all deployed applications. This ensures that the Prometheus-compatible metrics each app exposes are automatically included in centralized monitoring.

We also apply consistent labels to every ServiceMonitor (for example, monitoring.instance) to simplify filtering and organization.

Example snippet from the Argo CD Helm chart values.yaml:

controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      additionalLabels:
        monitoring.instance: default
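
A label like this only influences scraping if the Prometheus instance selects on it. As an illustrative sketch (an assumption, not confirmed as kubara's actual configuration), the kube-prometheus-stack values could restrict Prometheus to ServiceMonitors that carry the label:

```yaml
prometheus:
  prometheusSpec:
    # Only ServiceMonitors carrying this label are picked up by this Prometheus.
    serviceMonitorSelector:
      matchLabels:
        monitoring.instance: default   # assumed to mirror additionalLabels above
```

With a selector like this, several Prometheus instances could share a cluster and each scrape only the ServiceMonitors labeled for it.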

Example: ServiceMonitor

A ServiceMonitor defines which services Prometheus should monitor. Here's a basic example:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    release: kube-prometheus-stack
spec:
  selector:
    matchLabels:
      app: my-app
  namespaceSelector:
    matchNames:
      - my-app-namespace
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
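
The ServiceMonitor selector matches labels on the Service object (not on the pods), and endpoints[].port refers to a named Service port. A Service that the example above would pick up might look like the following sketch; the port number and the pod selector are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: my-app-namespace   # listed in namespaceSelector.matchNames
  labels:
    app: my-app                 # matched by spec.selector.matchLabels in the ServiceMonitor
spec:
  selector:
    app: my-app                 # assumed pod label
  ports:
    - name: http                # referenced by endpoints[].port in the ServiceMonitor
      port: 8080                # hypothetical port number
      targetPort: 8080
```

If the port has no name, or the name differs from the one in the ServiceMonitor, the target will not show up in the Prometheus UI.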

Configuration via values.yaml

Prometheus settings are defined in the values.yaml file of the Helm chart. This includes scrape intervals, retention policies, and storage settings.

prometheus:
  prometheusSpec:
    scrapeInterval: "30s"
    evaluationInterval: "30s"
    retention: "15d"
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: "gp2"
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
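
Retention can also be bounded by size so that the volume does not fill up before the time limit is reached. A hedged sketch, where the 45GB value is a hypothetical cap chosen to stay below the 50Gi request shown above:

```yaml
prometheus:
  prometheusSpec:
    retention: "15d"
    retentionSize: "45GB"   # hypothetical cap, kept below the 50Gi volume request
```

When both settings are present, Prometheus removes old data as soon as either limit is reached.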

Prebuilt Dashboards

The kube-prometheus-stack comes with a wide set of preconfigured Grafana dashboards for:

  • Kubernetes nodes and workloads
  • Prometheus internals
  • etcd, API server, scheduler
  • kubelet performance
  • resource usage and capacity planning

These dashboards are automatically imported when Grafana is deployed with the stack. You can find more about them in the Dashboards chapter.

Best Practices

  • Use ServiceMonitors instead of static target definitions to keep deployments flexible and declarative.
  • Apply labels for better metric organization and filtering (e.g., by namespace, app, or team).
  • Set retention periods based on operational needs; long retention increases storage use and can impact performance.
  • Monitor Prometheus itself: metrics like prometheus_tsdb_head_series and prometheus_engine_query_duration_seconds provide insight into system health and scaling requirements.
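
The self-monitoring metrics named in the last point can be turned into alerts with a PrometheusRule. This is a minimal sketch under stated assumptions: the rule name, alert names, thresholds, and durations are placeholders to be tuned per cluster, and the release label is assumed to match the Prometheus ruleSelector.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-self-monitoring     # hypothetical name
  labels:
    release: kube-prometheus-stack     # assumed to match the Prometheus ruleSelector
spec:
  groups:
    - name: prometheus-self
      rules:
        - alert: PrometheusHighSeriesCount
          # Placeholder threshold: number of active series in the TSDB head block.
          expr: prometheus_tsdb_head_series > 2e6
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: Prometheus head series count is unusually high
        - alert: PrometheusSlowQueries
          # Placeholder threshold: 90th percentile of query evaluation time in seconds.
          expr: 'prometheus_engine_query_duration_seconds{quantile="0.9", slice="inner_eval"} > 5'
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: 90th percentile query evaluation time exceeds 5s
```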