Commit Graph

5 Commits

Author SHA1 Message Date
518b5c31ef fix: update dashboards and OTel collector for proper metrics/logs
- certmanager.json: use Thanos datasource, fix variable regex
- argocd.json: use Thanos datasource via $datasource variable
- logs.json: update to use OTel labels (k8s_namespace_name, k8s_container_name)
- collector.yaml: add loki.resource.labels hint for proper Loki label mapping

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 03:36:37 +09:00
01c5742d7a FIX(grafana): change OOM panel to stat type
- Replace timeseries with stat panel for OOM detection
- Show total count of OOMKilled pods instead of timeline
- Gauge metric not suitable for timeseries visualization
2026-01-09 21:42:35 +09:00
539f4be497 FIX(grafana): use kube-state-metrics for OOM detection
- Replace container_oom_events_total with kube_pod_container_status_last_terminated_reason
- Fix OOM events not showing after pod restart
- cAdvisor metric resets on pod restart, kube-state-metrics persists
2026-01-09 21:42:35 +09:00
c472035499 FEAT(grafana): add Grafana monitoring
- Add Grafana monitoring configuration
- Enable metrics collection
2026-01-05 00:40:01 +09:00
9583be9b46 FEAT(grafana): export dashboards
- to JSON and use sidecar ConfigMaps
- Export 14 dashboards to JSON files
- Use kustomize configMapGenerator for dashboard ConfigMaps
- Enable Grafana sidecar to load dashboards from ConfigMaps
- Keep Longhorn and Traefik Official from grafana.com
2026-01-05 00:40:01 +09:00