|
|
4a4a43ed82
|
FIX(prometheus): increase memory to 768Mi
- Prometheus was OOMKilled with 512Mi limit
- Set both requests and limits to 768Mi
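
A minimal sketch of what this change might look like in Helm values. The `prometheus.prometheusSpec` path assumes the kube-prometheus-stack chart layout, which the commit itself does not confirm:

```yaml
# Assumed kube-prometheus-stack values layout (not shown in the commit).
prometheus:
  prometheusSpec:
    resources:
      requests:
        memory: 768Mi   # raised from 512Mi after the OOMKill
      limits:
        memory: 768Mi   # kept equal to the request
```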
|
2026-01-09 21:42:35 +09:00 |
|
|
|
5089e8607d
|
CHORE(resources): set memory limits equal to memory requests
Align memory limits with memory requests so memory is never overcommitted (full Guaranteed QoS would also require CPU limits equal to CPU requests).
- prometheus, thanos (query, storegateway, compactor)
- alertmanager, tempo, goldilocks (dashboard, controller)
- node-exporter, opentelemetry-collector, vpa, kube-state-metrics
|
2026-01-09 21:42:35 +09:00 |
|
|
|
7139f3e5a2
|
FIX(prometheus): correct ArgoCD metrics service names
- Update controller target to argocd-application-controller-metrics
- Update repo-server target to argocd-repo-server-metrics
|
2026-01-09 21:41:52 +09:00 |
|
|
|
ea4d7d4ecf
|
PERF(prometheus): reduce CPU request from 200m to 50m
- Actual usage is ~17m, 200m was over-provisioned
- Fixes "Insufficient cpu" scheduling error for replica 2
|
2026-01-09 21:41:52 +09:00 |
|
|
|
6b576d6a16
|
FEAT(thanos): add Thanos for Prometheus HA and long-term storage
- Add Thanos Query, Store Gateway, Compactor
- Enable Prometheus Sidecar with S3 (MinIO) storage
- Configure Prometheus replicas: 2 with pod anti-affinity
- Add ExternalSecrets for MinIO credentials
- Retention: raw 7d, 5m downsampled 30d, 1h downsampled 90d
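
The retention tiers listed above map onto the Thanos Compactor's per-resolution retention flags. A sketch of the container arguments, assuming the compactor is configured through a plain `args` list (the surrounding manifest structure is hypothetical):

```yaml
# Hypothetical args for the Thanos Compactor container.
args:
  - compact
  - --retention.resolution-raw=7d    # raw samples
  - --retention.resolution-5m=30d    # 5m downsampled blocks
  - --retention.resolution-1h=90d    # 1h downsampled blocks
```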
|
2026-01-09 21:41:52 +09:00 |
|
|
|
30f028fae4
|
CHORE(prometheus): disable CPU/Memory overcommit alerts
- Disable KubeCPUOvercommit and KubeMemoryOvercommit alerts
- Cluster uses replica=2 with pod anti-affinity for HA
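
One way to disable individual default alerts, assuming the kube-prometheus-stack chart and a chart version that supports the `defaultRules.disabled` map (both are assumptions, not stated in the commit):

```yaml
# Assumed kube-prometheus-stack values layout.
defaultRules:
  disabled:
    KubeCPUOvercommit: true
    KubeMemoryOvercommit: true
```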
|
2026-01-09 21:41:52 +09:00 |
|
|
|
4286296591
|
PERF(resources): remove CPU limits - keep memory limits only
- CPU throttling prevents app startup, not crashes
- Memory OOM is the real cascading failure cause
- CPU request ensures fair scheduling
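
The resulting resource pattern can be sketched as a container `resources` block. The numbers are illustrative placeholders, not values taken from the repository:

```yaml
# Illustrative values only.
resources:
  requests:
    cpu: 100m        # used for scheduling and CPU shares under contention
    memory: 256Mi
  limits:
    memory: 256Mi    # memory stays capped; no cpu key, so no CFS throttling
```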
|
2026-01-07 23:48:35 +09:00 |
|
|
|
864c2c45d8
|
REFACTOR(alertmanager): change storageClass
- Update storageClass to local-path-retain
- Change storage backend configuration
|
2026-01-05 00:40:01 +09:00 |
|
|
|
e4b477a510
|
REFACTOR(longhorn): migrate to local-path
- alertmanager, grafana, loki, prometheus: storageClass -> local-path-retain
- Change storage backend configuration
|
2026-01-05 00:40:01 +09:00 |
|
|
|
60dfa5cf7b
|
CHORE(resources): disable apiserver/etcd metrics
- Disable kubeApiServer ServiceMonitor (~37k series)
- Disable kubeEtcd ServiceMonitor (~26k series)
- Expected memory reduction: ~30-40%
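
A sketch of the corresponding Helm values, assuming the kube-prometheus-stack chart, where each control-plane component has a top-level `enabled` switch:

```yaml
# Assumed kube-prometheus-stack values layout.
kubeApiServer:
  enabled: false   # drops the API server ServiceMonitor (~37k series)
kubeEtcd:
  enabled: false   # drops the etcd ServiceMonitor (~26k series)
```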
|
2026-01-05 00:40:01 +09:00 |
|
|
|
d8360c10a1
|
FEAT(repo): add cAdvisor metrics_path relabel
- Add relabeling for cAdvisor metrics
- Support recording rules
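
A relabeling of this kind usually copies the internal `__metrics_path__` label into a regular `metrics_path` label so that recording rules can select cAdvisor series. A sketch, assuming the kube-prometheus-stack `kubelet.serviceMonitor` values path (the commit does not show where the rule was actually added):

```yaml
# Assumed kube-prometheus-stack values layout.
kubelet:
  serviceMonitor:
    cAdvisorRelabelings:
      - action: replace
        sourceLabels: [__metrics_path__]
        targetLabel: metrics_path
```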
|
2026-01-05 00:40:01 +09:00 |
|
|
|
1befeb68c4
|
FEAT(prometheus): add ServerSideApply
- Enable ServerSideApply for CRD annotation handling
- Fix resource management
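
In an ArgoCD Application, Server-Side Apply is enabled through a sync option, which avoids the oversized `last-applied-configuration` annotation on large CRDs. A fragment of the Application spec, with everything outside `syncPolicy` omitted:

```yaml
# Fragment of an ArgoCD Application manifest.
spec:
  syncPolicy:
    syncOptions:
      - ServerSideApply=true
```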
|
2026-01-05 00:40:01 +09:00 |
|
|
|
cd575d94a6
|
PERF(prometheus): optimize prometheus memory usage
- Increase scrapeInterval: 30s → 60s
- Increase evaluationInterval: 30s → 60s
- Reduce retention: 7d → 3d
- Add memory limit: 1Gi (prevent unlimited growth)
- Increase memory request: 256Mi → 512Mi (reflect actual usage)
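
Taken together, the settings above might look like this in Helm values. The `prometheus.prometheusSpec` path assumes the kube-prometheus-stack chart, which the commit does not confirm:

```yaml
# Assumed kube-prometheus-stack values layout.
prometheus:
  prometheusSpec:
    scrapeInterval: 60s       # was 30s
    evaluationInterval: 60s   # was 30s
    retention: 3d             # was 7d
    resources:
      requests:
        memory: 512Mi         # was 256Mi
      limits:
        memory: 1Gi           # new cap to prevent unbounded growth
```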
|
2026-01-05 00:40:01 +09:00 |
|
|
|
2ec87ca7a5
|
PERF(prometheus): increase Prometheus CPU request from 50m to 200m
- Increase CPU request based on actual usage
- Optimize resource allocation
|
2026-01-05 00:40:01 +09:00 |
|
|
|
b3ad6338ac
|
FIX(prometheus): grafana prometheus datasource
- url with full namespace
|
2026-01-04 23:38:05 +09:00 |
|
|
|
340c6fea11
|
FIX(alertmanager): prometheus alertingendpoints
- to connect to alertma...
|
2026-01-04 23:38:05 +09:00 |
|
|
|
79b34aaca6
|
FEAT(prometheus): add ServerSideApply
- Enable ServerSideApply for CRD annotation handling
- Fix resource management
|
2026-01-04 23:38:05 +09:00 |
|
|
|
ac2abde8b5
|
FIX(prometheus): servicemonitor namespace
- from monitoring to prometheus
|
2026-01-04 23:38:05 +09:00 |
|
|
|
5c4676ca9a
|
REFACTOR(repo): restructure monitoring folder
- and add namespace resou...
- Remove argocd/, helm-values/, ingress/ subdirectories
- Move files to parent directory (argocd.yaml, helm-values.yaml, ingress.yaml)
- Update helm valueFiles paths in ArgoCD Applications
- Add namespace.yaml to all applications with Goldilocks labels
- Update destination namespaces to match folder names
- Update kustomization.yaml files to reference new structure
|
2026-01-04 23:38:05 +09:00 |
|