7e61af372b
PERF(observability): remove CPU limits for stability
...
- Remove CPU limits from all observability components
- Prevents CPU throttling issues across monitoring stack
2026-01-12 02:10:54 +09:00
3b5bf20902
PERF(observability): optimize resources via VPA
...
- alertmanager: CPU 15m/15m, memory 100Mi/100Mi
- blackbox-exporter: CPU 15m/32m, memory 100Mi/100Mi
- goldilocks: controller 15m/25m, dashboard 15m/15m
- grafana: CPU 22m/24m, memory 144Mi/242Mi (upperBound)
- kube-state-metrics: CPU 15m/15m, memory 100Mi/100Mi
- loki: CPU 10m/69m, memory 225Mi/323Mi
- node-exporter: CPU 15m/15m, memory 100Mi/100Mi
- opentelemetry: CPU 34m/410m, memory 142Mi/1024Mi
- prometheus-operator: CPU 15m/15m, memory 100Mi/100Mi
- tempo: CPU 15m/15m, memory 100Mi/109Mi
- thanos: CPU 15m/15m, memory 100Mi/126Mi
- vpa: CPU 15m/15m, memory 100Mi/100Mi
2026-01-12 01:07:58 +09:00
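Note: the figures above come from VPA recommendations. As a rough sketch, a recommendation-only VerticalPodAutoscaler of the kind that produces such numbers might look like the following. The object name and target Deployment are illustrative assumptions, not taken from this repo.

```yaml
# Hypothetical example: a VPA that only records recommendations.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: grafana            # assumed name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: grafana          # assumed workload name
  updatePolicy:
    updateMode: "Off"      # recommend only; never evict or resize pods
```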
15d5e58d6c
migrate: change repoURLs from GitHub to Gitea
...
Update all ArgoCD Application references to use Gitea (github0213.com)
instead of GitHub for K3S-HOME/observability repository.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 20:43:29 +09:00
203a8debac
REFACTOR(repo): remove control-plane scheduling
...
- Remove nodeSelector for control-plane node
- Remove tolerations for control-plane taint
- Allow pods to schedule on any available node
2026-01-10 18:35:15 +09:00
a3003d597f
PERF(observability): adjust resources based on VPA
...
- Update blackbox-exporter cpu 15m→23m, memory 64Mi→100Mi
- Update grafana cpu 11m→23m, memory 425Mi→175Mi
- Update loki cpu 23m→63m, memory 462Mi→363Mi
- Update tempo cpu 50m→15m, memory 128Mi→100Mi
- Update thanos memory 128Mi→283Mi
- Update node-exporter memory 64Mi→100Mi
- Update kube-state-metrics memory 100Mi→105Mi
- Update opentelemetry-operator cpu 10m→11m, memory 256Mi→75Mi
- Update vpa memory 128Mi→100Mi
2026-01-10 14:33:40 +09:00
5089e8607d
CHORE(resources): set memory limits equal to memory requests
...
Align memory limits with memory requests, as required for the guaranteed
QoS class (which also needs CPU limits equal to CPU requests).
- prometheus, thanos (query, storegateway, compactor)
- alertmanager, tempo, goldilocks (dashboard, controller)
- node-exporter, opentelemetry-collector, vpa, kube-state-metrics
2026-01-09 21:42:35 +09:00
735166fc9c
REFACTOR(repo): standardize taint to control-plane
...
- Change node-role.kubernetes.io/master to control-plane
- Update vpa, goldilocks, kube-state-metrics tolerations
- Remove deprecated master taint from promtail
2026-01-09 21:41:52 +09:00
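Note: a minimal sketch of a toleration using the standardized key. Leaving out `effect` is a choice made here for illustration, so that the toleration matches the taint whatever its effect. The actual values files may differ.

```yaml
# Illustrative pod-spec fragment; not copied from the repo.
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists       # no effect given, so any effect on this key is tolerated
```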
4511fd5b2e
FIX(repo): correct nodeSelector label value
...
- Change master label value from "" to "true"
- Fix pod scheduling failure due to label mismatch
2026-01-09 21:41:52 +09:00
1c6a9dc491
PERF(repo): move system pods to master node
...
- Add nodeSelector for master node placement
- Add tolerations for NoExecute taint
- kube-state-metrics: schedule on master
- goldilocks-controller: schedule on master, reduce to 1 replica
- vpa-recommender: schedule on master, remove anti-affinity
- Free worker node resources for applications
2026-01-09 21:41:52 +09:00
4515ea0b33
FEAT(observability): enable HA with replica 2 and soft anti-affinity
...
- Add replicaCount: 2 to goldilocks, vpa, alertmanager
- Add replicas: 2 to loki singleBinary
- Add soft pod anti-affinity for node distribution
- Keep kube-state-metrics at replica 1 to prevent duplicate metrics
FIX(loki): revert to replica 1 for Single Binary mode
- Single Binary mode cannot run more than 1 replica without object storage
- Remove affinity configuration for single replica
- Keep filesystem storage backend
2026-01-09 21:41:51 +09:00
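Note: "soft" anti-affinity here presumably means the preferred (not required) form. A sketch of the pod-spec stanza follows. The label selector and weight are assumptions, and the Helm value key that carries this stanza varies by chart.

```yaml
# Illustrative pod-spec fragment: prefer spreading replicas across nodes.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: alertmanager   # assumed label
```

Because the rule is only a preference, both replicas can still land on the same node when no other node is available.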
4286296591
PERF(resources): remove CPU limits - keep memory limits only
...
- CPU throttling blocks app startup; it does not prevent crashes
- Memory OOM is the real cascading failure cause
- CPU request ensures fair scheduling
2026-01-07 23:48:35 +09:00
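Note: the resulting shape of a container's resources block can be sketched as below. The quantities are placeholders, not the values used in this repo.

```yaml
# Illustrative container resources: CPU request only, memory request and limit.
resources:
  requests:
    cpu: 15m          # used for scheduling and CPU shares; no CPU limit set
    memory: 100Mi
  limits:
    memory: 100Mi     # container is OOM-killed if it exceeds this
```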
28ba50d1a3
REFACTOR(repo): observability repo structure
...
- Add application.yaml for ArgoCD app-of-apps
- Add kustomization.yaml with observability components
- Add renovate.json for automated updates
- Update all component argocd.yaml repoURLs to observability repo
Components: prometheus, alertmanager, grafana, loki, promtail,
node-exporter, kube-state-metrics, goldilocks, uptime-kuma, vpa
2026-01-05 00:40:01 +09:00
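Note: a rough sketch of what an app-of-apps `application.yaml` could look like. The Application name, project, namespace, repoURL, and sync policy are placeholders chosen for illustration.

```yaml
# Hypothetical parent Application that renders the repo's root kustomization.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: observability            # assumed name
  namespace: argocd              # assumed Argo CD namespace
spec:
  project: default
  source:
    repoURL: https://example.com/K3S-HOME/observability.git   # placeholder URL
    targetRevision: HEAD
    path: .
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```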
200c6e97ae
REFACTOR(repo): migrate repoURL to K3S-HOME
...
- Update repository URL to K3S-HOME organization
- Change from personal to organization repo
2026-01-05 00:40:01 +09:00
ea4152a0d6
REFACTOR(gitea): migrate repoURL from Gitea
...
- to GitHub
2026-01-04 23:38:05 +09:00
5ec1a3323d
REFACTOR(goldilocks): use managedNamespaceMetad...
...
- Remove namespace.yaml files
- Add managedNamespaceMetadata with Goldilocks label
- Set CreateNamespace=true in syncOptions
- Update kustomization.yaml to remove namespace.yaml references
2026-01-04 23:38:05 +09:00
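Note: the relevant part of an ArgoCD Application after this change might look roughly like this. The label key shown is the one Goldilocks documents for enabling a namespace; whether the repo uses exactly this label is an assumption.

```yaml
# Illustrative Application fragment: let ArgoCD create and label the namespace.
spec:
  syncPolicy:
    managedNamespaceMetadata:
      labels:
        goldilocks.fairwinds.com/enabled: "true"
    syncOptions:
      - CreateNamespace=true   # needed for managedNamespaceMetadata to apply
```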
ac2abde8b5
FIX(prometheus): servicemonitor namespace
...
- from monitoring to prometheus
2026-01-04 23:38:05 +09:00
bbf6fa5001
CHORE(repo): clean kustomization files
...
- Remove unused entries from kustomization
- Clean up configuration
2026-01-04 23:38:05 +09:00
2309254fc9
FIX(repo): circular reference in app kustomizations
...
- Comment out argocd.yaml in all app kustomization.yaml files
- Prevents circular reference when apps have 'path:' source (grafana,
prometheus)
- ArgoCD Applications are managed manually, not via kustomize
2026-01-04 23:38:05 +09:00
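Note: a sketch of a per-app `kustomization.yaml` with the Application manifest commented out. The other resource names are assumed from the file names mentioned elsewhere in this log.

```yaml
# Illustrative per-app kustomization; argocd.yaml is applied manually instead.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # - argocd.yaml     # excluded: the Application would otherwise manage itself
  - namespace.yaml
  - ingress.yaml
```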
b4ec13618a
REFACTOR(repo): to independent app management
...
- pattern
- monitoring/kustomization.yaml now only manages application.yaml (App
of Apps)
- Each app independently manages its own ArgoCD Application via
kustomization.yaml
- Apps are fully self-contained: argocd.yaml, namespace.yaml, and
app-specific resources
- Cleaner separation: no central app list to maintain
2026-01-04 23:38:05 +09:00
078850f77a
FIX(argocd): sharedresourcewarning by referencing
...
- argocd.yaml files d...
- Change monitoring/kustomization.yaml to reference argocd.yaml files
instead of folders
- Comment out argocd.yaml in each app's kustomization.yaml
- Matches applications folder pattern to avoid resource conflicts
2026-01-04 23:38:05 +09:00
6dec7e0a46
REFACTOR(argocd): monitoring apps
...
- to self-manage ArgoCD Applications
- Each app now includes its own argocd.yaml in kustomization.yaml
- Main monitoring/kustomization.yaml references app folders instead of
individual argocd.yaml files
- Better separation of concerns - each app is self-contained and
independently managed
2026-01-04 23:38:05 +09:00
5c4676ca9a
REFACTOR(repo): restructure monitoring folder
...
- and add namespace resou...
- Remove argocd/, helm-values/, ingress/ subdirectories
- Move files to parent directory (argocd.yaml, helm-values.yaml,
ingress.yaml)
- Update helm valueFiles paths in ArgoCD Applications
- Add namespace.yaml to all applications with Goldilocks labels
- Update destination namespaces to match folder names
- Update kustomization.yaml files to reference new structure
2026-01-04 23:38:05 +09:00
beab899dee
FEAT(kube-state-metrics): add pod and container
...
- label relabeling for ...
Copy exported_pod and exported_container to pod and container labels
for Grafana dashboard query compatibility. This fixes CNPG dashboard
queries that filter by container and pod names.
2026-01-04 23:38:05 +09:00
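Note: one way to express this copy is with Prometheus Operator `metricRelabelings`, sketched below. Placing it on the kube-state-metrics ServiceMonitor, and the `(.+)` guard, are assumptions rather than the commit's actual configuration.

```yaml
# Illustrative ServiceMonitor endpoint fragment.
metricRelabelings:
  - sourceLabels: [exported_pod]
    regex: (.+)              # only copy when the source label is non-empty
    targetLabel: pod
    action: replace
  - sourceLabels: [exported_container]
    regex: (.+)
    targetLabel: container
    action: replace
```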
a04fff40a3
FIX(repo): fix namespaces not being shown
...
- Fix namespace display issue
- Correct ArgoCD configuration
2025-12-20 15:01:58 +09:00
a11a9ab329
CHORE(argocd): update ArgoCD applications
...
- to point to monitoring repo...
2025-12-17 15:12:56 +09:00
baee94b69d
INIT(repo): monitoring stack setup
2025-12-17 15:06:58 +09:00