Commit Graph

19 Commits

Author SHA1 Message Date
da89c8dbf0 FIX(grafana): restore gauge design with percentage display
- Restore original gauge panel type
- Keep * 100 query and percent unit
- Set max to 100 for proper gauge range
2026-01-10 17:58:11 +09:00
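
The three settings listed in da89c8dbf0 can be pictured together as one panel definition. The following is a minimal sketch, not the actual dashboard JSON: the helper name `percent_gauge`, the `min` of 0, and the metric name `example_cpu_usage_ratio` are assumptions made for illustration, since the commit does not show the real query.

```python
import json


def percent_gauge(title: str, ratio_expr: str) -> dict:
    """Build a gauge panel that shows a 0-1 ratio on a 0-100 percent scale."""
    return {
        "type": "gauge",
        "title": title,
        # Multiply the ratio by 100 so it matches the "percent" unit.
        "targets": [{"expr": f"({ratio_expr}) * 100"}],
        "fieldConfig": {
            "defaults": {
                "unit": "percent",  # Grafana unit that expects values in 0-100
                "min": 0,           # assumed lower bound, not stated in the commit
                "max": 100,         # upper bound so the gauge arc has a fixed range
            }
        },
    }


# Hypothetical metric name, used only to exercise the helper.
panel = percent_gauge("CPU Usage", "example_cpu_usage_ratio")
print(json.dumps(panel, indent=2))
```

The earlier commit 20b796f9e4 takes the other consistent route: unit `percentunit` with max 1 and no multiplication. Either pairing works as long as the query scale, the unit, and the max agree.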
11f9457236 fix: increase CPU pressure threshold to 30%
2026-01-10 17:57:34 +09:00
7e375e20c6 FIX(grafana): show CPU Usage as percentage per node
- Change panel type from gauge to stat
- Add * 100 to query for percentage
- Show each node's CPU usage horizontally
- Set thresholds at 50% (orange), 80% (red)
2026-01-10 17:57:05 +09:00
b818a8c1fe fix: update CPU throttling panels to use PSI metrics with 10% threshold
2026-01-10 17:54:55 +09:00
2b1667e643 FIX(grafana): replace rate_interval with 5m in MinIO dashboard
- Change all $__rate_interval to 5m
- Fix No data issues in rate() queries
2026-01-10 17:50:47 +09:00
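
A substitution like the one in 2b1667e643 could be scripted along these lines. This is a sketch under stated assumptions rather than the way the change was actually made: the function name `pin_rate_interval` and the sample dashboard are hypothetical, and it assumes each query sits in the `expr` field of a panel target.

```python
import copy


def pin_rate_interval(dashboard: dict, window: str = "5m") -> dict:
    """Return a copy of the dashboard with $__rate_interval replaced by a fixed window."""
    result = copy.deepcopy(dashboard)

    def walk(panels: list) -> None:
        for panel in panels:
            for target in panel.get("targets", []):
                expr = target.get("expr")
                if isinstance(expr, str):
                    target["expr"] = expr.replace("$__rate_interval", window)
            # Row panels may contain nested panels, so recurse into them.
            walk(panel.get("panels", []))

    walk(result.get("panels", []))
    return result


# Hypothetical sample dashboard; the metric name is illustrative only.
sample = {
    "panels": [
        {"targets": [{"expr": "rate(example_requests_total[$__rate_interval])"}]}
    ]
}
fixed = pin_rate_interval(sample)
print(fixed["panels"][0]["targets"][0]["expr"])
```

The sketch only handles the plain `$__rate_interval` spelling; the braced form `${__rate_interval}` would need a second replacement.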
38e0c68ddb CHORE(grafana): rearrange Bucket Scans panels side by side
- Move Finished to left (x=0)
- Move Started next to Finished (x=12, same y)
2026-01-10 17:48:43 +09:00
4afdf04ef2 CHORE(grafana): remove KMS panels from MinIO dashboard
- Remove 5 KMS-related panels (KMS not configured)
- KMS Uptime, Request rates, Online/Offline status
2026-01-10 17:46:45 +09:00
20b796f9e4 FIX(grafana): fix MinIO CPU Usage panel query
- Hardcode job=minio and 5m interval
- Change unit from 's' to 'percentunit'
- Set max to 1 for proper gauge display
2026-01-10 17:33:54 +09:00
fa4c2ce8f6 FIX(grafana): set default value for MinIO dashboard variable
- Set scrape_jobs default to 'minio'
- Hide variable selector (only one option)
2026-01-10 17:32:23 +09:00
fc4f825b6d FIX(grafana): fix MinIO dashboard scrape_jobs variable
- Query only MinIO-related jobs
- Set includeAll and multi to false
2026-01-10 17:15:53 +09:00
823edfbd88 fix(grafana): restrict main dashboard datasource to Thanos only
- Set regex filter "/Thanos/" on datasource variable
- Set default value to "Thanos"

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 03:51:44 +09:00
dc8706fb02 fix(grafana): set explicit 2m interval on CPU query targets
- Global CPU Usage: set interval="2m" on Real Linux/Windows targets
- CPU Usage: set interval="2m" on Real Linux/Windows targets
- Previously empty interval caused $__rate_interval mismatch

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 03:50:44 +09:00
3516a860db fix(grafana): standardize CPU panel intervals to 2m
- Revert Overview panels to 2m (rate() needs sufficient data points)
- Change Cluster CPU Utilization targets to 2m for consistency
- All CPU panels now update at the same rate

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 03:48:21 +09:00
64e129128f fix(grafana): sync interval for CPU panels in main dashboard
- Change hardcoded "2m" interval to "$resolution" variable
- Affected panels: Global CPU Usage (id 77), CPU Usage (id 37)
- Ensures consistent refresh rate across all CPU metrics

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 03:46:15 +09:00
518b5c31ef fix: update dashboards and OTel collector for proper metrics/logs
- certmanager.json: use Thanos datasource, fix variable regex
- argocd.json: use Thanos datasource via $datasource variable
- logs.json: update to use OTel labels (k8s_namespace_name, k8s_container_name)
- collector.yaml: add loki.resource.labels hint for proper Loki label mapping

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 03:36:37 +09:00
01c5742d7a FIX(grafana): change OOM panel to stat type
- Replace timeseries with stat panel for OOM detection
- Show total count of OOMKilled pods instead of timeline
- Gauge metric not suitable for timeseries visualization
2026-01-09 21:42:35 +09:00
539f4be497 FIX(grafana): use kube-state-metrics for OOM detection
- Replace container_oom_events_total with kube_pod_container_status_last_terminated_reason
- Fix OOM events not showing after pod restart
- cAdvisor metric resets on pod restart, kube-state-metrics persists
2026-01-09 21:42:35 +09:00
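
Taken together, 539f4be497 and 01c5742d7a suggest a single-number query over the kube-state-metrics gauge. The sketch below shows one plausible shape for it. The exact expression used in the dashboard is not given in the commits, so the `sum(...)` aggregation, the optional namespace filter, and the helper name `oom_killed_count_expr` are assumptions.

```python
from typing import Optional


def oom_killed_count_expr(namespace: Optional[str] = None) -> str:
    """Build a PromQL expression counting containers last terminated by OOMKilled."""
    selector = 'reason="OOMKilled"'
    if namespace:
        selector += f', namespace="{namespace}"'
    # The metric is a gauge that is 1 for the matching reason, so summing it
    # gives a count that fits a stat panel.
    return f"sum(kube_pod_container_status_last_terminated_reason{{{selector}}})"


print(oom_killed_count_expr())
print(oom_killed_count_expr("monitoring"))
```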
c472035499 FEAT(grafana): add Grafana monitoring
- Add Grafana monitoring configuration
- Enable metrics collection
2026-01-05 00:40:01 +09:00
9583be9b46 FEAT(grafana): export dashboards to JSON and use sidecar ConfigMaps
- Export 14 dashboards to JSON files
- Use kustomize configMapGenerator for dashboard ConfigMaps
- Enable Grafana sidecar to load dashboards from ConfigMaps
- Keep Longhorn and Traefik Official from grafana.com
2026-01-05 00:40:01 +09:00
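
The layout described in 9583be9b46 (one ConfigMap per exported dashboard, picked up by the Grafana sidecar) might look roughly like the structure generated below. This is a sketch only: the `dashboards/` directory, the `grafana-dashboard-` name prefix, and the `grafana_dashboard: "1"` label are assumptions, and the label in particular must match whatever the sidecar is configured to watch. The output is printed as JSON for inspection; in a repository the same structure would normally be written in `kustomization.yaml`.

```python
import json
from pathlib import PurePosixPath


def dashboard_generators(json_files: list, label: str = "grafana_dashboard") -> dict:
    """Build configMapGenerator entries, one ConfigMap per dashboard file."""
    entries = []
    for path in json_files:
        stem = PurePosixPath(path).stem  # e.g. "argocd" from "dashboards/argocd.json"
        entries.append(
            {
                "name": f"grafana-dashboard-{stem}",
                "files": [path],
                # Label the sidecar is assumed to select on.
                "options": {"labels": {label: "1"}},
            }
        )
    return {"configMapGenerator": entries}


# File names taken from the log above; the directory is hypothetical.
kustomization = dashboard_generators(["dashboards/argocd.json", "dashboards/logs.json"])
print(json.dumps(kustomization, indent=2))
```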