- Change panel type from gauge to stat
- Add * 100 to query for percentage
- Show each node's CPU usage horizontally
- Set thresholds at 50% (orange), 80% (red)
- Global CPU Usage: set interval="2m" on Real Linux/Windows targets
- CPU Usage: set interval="2m" on Real Linux/Windows targets
- Previously empty interval caused $__rate_interval mismatch
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Revert Overview panels to 2m (rate() needs sufficient data points)
- Change Cluster CPU Utilization targets to 2m for consistency
- All CPU panels now update at the same rate
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Change hardcoded "2m" interval to "$resolution" variable
- Affected panels: Global CPU Usage (id 77), CPU Usage (id 37)
- Ensures consistent refresh rate across all CPU metrics
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- certmanager.json: use Thanos datasource, fix variable regex
- argocd.json: use Thanos datasource via $datasource variable
- logs.json: update to use OTel labels (k8s_namespace_name, k8s_container_name)
- collector.yaml: add loki.resource.labels hint for proper Loki label mapping
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace timeseries with stat panel for OOM detection
- Show total count of OOMKilled pods instead of timeline
- Gauge metric not suitable for timeseries visualization
- Replace container_oom_events_total with kube_pod_container_status_last_terminated_reason
- Fix OOM events not showing after pod restart
- cAdvisor metric resets on pod restart, kube-state-metrics persists
- to JSON and use sidecar ConfigMaps
- Export 14 dashboards to JSON files
- Use kustomize configMapGenerator for dashboard ConfigMaps
- Enable Grafana sidecar to load dashboards from ConfigMaps
- Keep Longhorn and Traefik Official from grafana.com