7e61af372b
PERF(observability): remove CPU limits for stability
- Remove CPU limits from all observability components
- Prevents CPU throttling issues across monitoring stack
2026-01-12 02:10:54 +09:00
3b5bf20902
PERF(observability): optimize resources via VPA
- alertmanager: CPU 15m/15m, memory 100Mi/100Mi
- blackbox-exporter: CPU 15m/32m, memory 100Mi/100Mi
- goldilocks: controller 15m/25m, dashboard 15m/15m
- grafana: CPU 22m/24m, memory 144Mi/242Mi (upperBound)
- kube-state-metrics: CPU 15m/15m, memory 100Mi/100Mi
- loki: CPU 10m/69m, memory 225Mi/323Mi
- node-exporter: CPU 15m/15m, memory 100Mi/100Mi
- opentelemetry: CPU 34m/410m, memory 142Mi/1024Mi
- prometheus-operator: CPU 15m/15m, memory 100Mi/100Mi
- tempo: CPU 15m/15m, memory 100Mi/109Mi
- thanos: CPU 15m/15m, memory 100Mi/126Mi
- vpa: CPU 15m/15m, memory 100Mi/100Mi
2026-01-12 01:07:58 +09:00
a3003d597f
PERF(observability): adjust resources based on VPA
- Update blackbox-exporter cpu 15m→23m, memory 64Mi→100Mi
- Update grafana cpu 11m→23m, memory 425Mi→175Mi
- Update loki cpu 23m→63m, memory 462Mi→363Mi
- Update tempo cpu 50m→15m, memory 128Mi→100Mi
- Update thanos memory 128Mi→283Mi
- Update node-exporter memory 64Mi→100Mi
- Update kube-state-metrics memory 100Mi→105Mi
- Update opentelemetry-operator cpu 10m→11m, memory 256Mi→75Mi
- Update vpa memory 128Mi→100Mi
2026-01-10 14:33:40 +09:00
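
The adjustments in a3003d597f are described as based on VPA. A minimal sketch of a recommendation-only VerticalPodAutoscaler, assuming a Deployment named grafana in a namespace of the same name (both names are hypothetical, not taken from the repository):

```yaml
# Hypothetical example; object names and namespace are illustrative only.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: grafana
  namespace: grafana
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: grafana
  updatePolicy:
    updateMode: "Off"   # recommend only; pods are never evicted or resized
```

With updateMode "Off", the recommender still populates status.recommendation.containerRecommendations (lowerBound, target, upperBound), and those values can be copied into Helm values by hand.
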
5089e8607d
CHORE(resources): set memory limits equal to memory requests
Align memory limits with memory requests, as the Guaranteed QoS class requires
(Guaranteed also needs CPU limits equal to CPU requests).
- prometheus, thanos (query, storegateway, compactor)
- alertmanager, tempo, goldilocks (dashboard, controller)
- node-exporter, opentelemetry-collector, vpa, kube-state-metrics
2026-01-09 21:42:35 +09:00
4286296591
PERF(resources): remove CPU limits - keep memory limits only
- CPU throttling prevents app startup, not crashes
- Memory OOM is the real cascading failure cause
- CPU request ensures fair scheduling
2026-01-07 23:48:35 +09:00
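
The policy in 4286296591 (CPU request kept, CPU limit dropped, memory limit kept) could look like this in a container spec. This is a sketch only; the values are illustrative and not taken from the repository:

```yaml
# Hypothetical container resources block; values are illustrative only.
resources:
  requests:
    cpu: 15m        # used for scheduling and for the CPU share under contention
    memory: 100Mi
  limits:
    memory: 100Mi   # memory stays capped; no cpu key, so no CPU quota throttling
```
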
2ee651b98d
FEAT(node-exporter): add toleration
- for all taints to node-exporter
- Allows node-exporter to run on master node with NoExecute taint
- Enables metrics collection from all nodes including master
2026-01-05 00:40:01 +09:00
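
A toleration for all taints, as described in 2ee651b98d, can be sketched as below. Where this fragment sits in the node-exporter Helm values is an assumption; the commit does not show the actual file:

```yaml
# Hypothetical values fragment; the exact placement in helm-values.yaml is assumed.
tolerations:
  - operator: Exists   # no key and no effect: tolerates every taint, including NoExecute
```

An empty key with operator Exists matches all taint keys, values, and effects, which is what lets the DaemonSet pod stay on the master node.
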
251b16ee1f
FIX(node-exporter): node-exporter ServiceMonitor
- to select node-expor...
2026-01-04 23:38:05 +09:00
ac2abde8b5
FIX(prometheus): servicemonitor namespace
- from monitoring to prometheus
2026-01-04 23:38:05 +09:00
5c4676ca9a
REFACTOR(repo): restructure monitoring folder
- and add namespace resou...
- Remove argocd/, helm-values/, ingress/ subdirectories
- Move files to parent directory (argocd.yaml, helm-values.yaml,
ingress.yaml)
- Update helm valueFiles paths in ArgoCD Applications
- Add namespace.yaml to all applications with Goldilocks labels
- Update destination namespaces to match folder names
- Update kustomization.yaml files to reference new structure
2026-01-04 23:38:05 +09:00
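
The flattened layout from 5c4676ca9a might look roughly like the two files below for one application. This is a sketch under assumptions: the folder and namespace name (prometheus) and the list of kustomize resources are hypothetical; only the file names and the use of Goldilocks labels come from the commit message:

```yaml
# namespace.yaml (hypothetical content; namespace name assumed to match the folder name)
apiVersion: v1
kind: Namespace
metadata:
  name: prometheus
  labels:
    goldilocks.fairwinds.com/enabled: "true"   # opts the namespace in to Goldilocks
---
# kustomization.yaml (hypothetical content; the actual resource list is not shown in the commit)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - namespace.yaml
  - ingress.yaml
```
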