observability

Author	SHA1	Message	Date
Mayne0213	7e375e20c6	FIX(grafana): show CPU Usage as percentage per node - Change panel type from gauge to stat - Add * 100 to query for percentage - Show each node's CPU usage horizontally - Set thresholds at 50% (orange), 80% (red)	2026-01-10 17:57:05 +09:00
Mayne0213	b818a8c1fe	fix: update CPU throttling panels to use PSI metrics with 10% threshold	2026-01-10 17:54:55 +09:00
Mayne0213	2b1667e643	FIX(grafana): replace $__rate_interval with 5m in MinIO dashboard - Change all $__rate_interval to 5m - Fix "No data" issues in rate() queries	2026-01-10 17:50:47 +09:00
Mayne0213	38e0c68ddb	CHORE(grafana): rearrange Bucket Scans panels side by side - Move Finished to left (x=0) - Move Started next to Finished (x=12, same y)	2026-01-10 17:48:43 +09:00
Mayne0213	4afdf04ef2	CHORE(grafana): remove KMS panels from MinIO dashboard - Remove 5 KMS-related panels (KMS not configured) - KMS Uptime, Request rates, Online/Offline status	2026-01-10 17:46:45 +09:00
Mayne0213	20b796f9e4	FIX(grafana): fix MinIO CPU Usage panel query - Hardcode job=minio and 5m interval - Change unit from 's' to 'percentunit' - Set max to 1 for proper gauge display	2026-01-10 17:33:54 +09:00
Mayne0213	fa4c2ce8f6	FIX(grafana): set default value for MinIO dashboard variable - Set scrape_jobs default to 'minio' - Hide variable selector (only one option)	2026-01-10 17:32:23 +09:00
Mayne0213	fc4f825b6d	FIX(grafana): fix MinIO dashboard scrape_jobs variable - Query only MinIO-related jobs - Set includeAll and multi to false	2026-01-10 17:15:53 +09:00
Mayne0213	823edfbd88	fix(grafana): restrict main dashboard datasource to Thanos only - Set regex filter "/Thanos/" on datasource variable - Set default value to "Thanos" Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-10 03:51:44 +09:00
Mayne0213	dc8706fb02	fix(grafana): set explicit 2m interval on CPU query targets - Global CPU Usage: set interval="2m" on Real Linux/Windows targets - CPU Usage: set interval="2m" on Real Linux/Windows targets - Previously empty interval caused $__rate_interval mismatch Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-10 03:50:44 +09:00
Mayne0213	3516a860db	fix(grafana): standardize CPU panel intervals to 2m - Revert Overview panels to 2m (rate() needs sufficient data points) - Change Cluster CPU Utilization targets to 2m for consistency - All CPU panels now update at the same rate Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-10 03:48:21 +09:00
Mayne0213	64e129128f	fix(grafana): sync interval for CPU panels in main dashboard - Change hardcoded "2m" interval to "$resolution" variable - Affected panels: Global CPU Usage (id 77), CPU Usage (id 37) - Ensures consistent refresh rate across all CPU metrics Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-10 03:46:15 +09:00
Mayne0213	518b5c31ef	fix: update dashboards and OTel collector for proper metrics/logs - certmanager.json: use Thanos datasource, fix variable regex - argocd.json: use Thanos datasource via $datasource variable - logs.json: update to use OTel labels (k8s_namespace_name, k8s_container_name) - collector.yaml: add loki.resource.labels hint for proper Loki label mapping Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-10 03:36:37 +09:00
Mayne0213	01c5742d7a	FIX(grafana): change OOM panel to stat type - Replace timeseries with stat panel for OOM detection - Show total count of OOMKilled pods instead of timeline - Gauge metric not suitable for timeseries visualization	2026-01-09 21:42:35 +09:00
Mayne0213	539f4be497	FIX(grafana): use kube-state-metrics for OOM detection - Replace container_oom_events_total with kube_pod_container_status_last_terminated_reason - Fix OOM events not showing after pod restart - cAdvisor metric resets on pod restart, kube-state-metrics persists	2026-01-09 21:42:35 +09:00
Mayne0213	c472035499	FEAT(grafana): add Grafana monitoring - Add Grafana monitoring configuration - Enable metrics collection	2026-01-05 00:40:01 +09:00
Mayne0213	9583be9b46	FEAT(grafana): export dashboards to JSON and use sidecar ConfigMaps - Export 14 dashboards to JSON files - Use kustomize configMapGenerator for dashboard ConfigMaps - Enable Grafana sidecar to load dashboards from ConfigMaps - Keep Longhorn and Traefik Official from grafana.com	2026-01-05 00:40:01 +09:00

17 Commits