Commit Graph

20 Commits

Author SHA1 Message Date
15d5e58d6c migrate: change repoURLs from GitHub to Gitea
Update all ArgoCD Application references to use Gitea (github0213.com)
instead of GitHub for K3S-HOME/observability repository.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 20:43:29 +09:00
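
For illustration, the kind of change this commit describes might look like the following ArgoCD Application. This is a minimal sketch, not the repository's actual manifest: the application name, namespace, branch, path, and the exact URL form (scheme and `.git` suffix) are assumptions; only the host `github0213.com` and the `K3S-HOME/observability` repository name come from the commit message.

```yaml
# Hypothetical Application manifest; names and paths are illustrative.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: observability            # assumed name
  namespace: argocd              # assumed ArgoCD namespace
spec:
  project: default
  source:
    # Assumed URL form for the Gitea-hosted repository.
    repoURL: https://github0213.com/K3S-HOME/observability.git
    targetRevision: main         # assumed branch
    path: .                      # assumed path
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
```
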
203a8debac REFACTOR(repo): remove control-plane scheduling
- Remove nodeSelector for control-plane node
- Remove tolerations for control-plane taint
- Allow pods to schedule on any available node
2026-01-10 18:35:15 +09:00
a3003d597f PERF(observability): adjust resources based on VPA
- Update blackbox-exporter cpu 15m→23m, memory 64Mi→100Mi
- Update grafana cpu 11m→23m, memory 425Mi→175Mi
- Update loki cpu 23m→63m, memory 462Mi→363Mi
- Update tempo cpu 50m→15m, memory 128Mi→100Mi
- Update thanos memory 128Mi→283Mi
- Update node-exporter memory 64Mi→100Mi
- Update kube-state-metrics memory 100Mi→105Mi
- Update opentelemetry-operator cpu 10m→11m, memory 256Mi→75Mi
- Update vpa memory 128Mi→100Mi
2026-01-10 14:33:40 +09:00
5089e8607d CHORE(resources): set memory limits equal to memory requests
Align memory limits with memory requests for guaranteed QoS class.
- prometheus, thanos (query, storegateway, compactor)
- alertmanager, tempo, goldilocks (dashboard, controller)
- node-exporter, opentelemetry-collector, vpa, kube-state-metrics
2026-01-09 21:42:35 +09:00
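
A sketch of what aligning memory limits with requests could look like in a Helm values fragment. The numbers are placeholders, and the location of the `resources` key varies by chart. Note that Kubernetes assigns the Guaranteed QoS class only when CPU requests and limits also match for every container, so a pod with no CPU limit (as commit 4286296591 describes) would be classed as Burstable even with matching memory values.

```yaml
# Illustrative values only; the real figures live in each component's values file.
resources:
  requests:
    cpu: 15m          # placeholder CPU request
    memory: 100Mi     # placeholder memory request
  limits:
    memory: 100Mi     # set equal to the memory request; no CPU limit is set
```
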
735166fc9c REFACTOR(repo): standardize taint to control-plane
- Change node-role.kubernetes.io/master to control-plane
- Update vpa, goldilocks, kube-state-metrics tolerations
- Remove deprecated master taint from promtail
2026-01-09 21:41:52 +09:00
4511fd5b2e FIX(repo): correct nodeSelector label value
- Change master label value from "" to "true"
- Fix pod scheduling failure due to label mismatch
2026-01-09 21:41:52 +09:00
1c6a9dc491 PERF(repo): move system pods to master node
- Add nodeSelector for master node placement
- Add tolerations for NoExecute taint
- kube-state-metrics: schedule on master
- goldilocks-controller: schedule on master, reduce to 1 replica
- vpa-recommender: schedule on master, remove anti-affinity
- Free worker node resources for applications
2026-01-09 21:41:52 +09:00
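
The placement described in this commit can be sketched as a pod scheduling fragment. The label and taint keys below are assumptions (the commit names only a "master" label and a NoExecute taint), and the label value follows the later fix in 4511fd5b2e.

```yaml
# Hypothetical Helm values fragment; key names depend on the chart.
nodeSelector:
  node-role.kubernetes.io/master: "true"   # assumed label key
tolerations:
  - key: node-role.kubernetes.io/master    # assumed taint key
    operator: Exists
    effect: NoExecute                      # effect named in the commit
```
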
6a8e1f5a47 PERF(vpa): fix config and reduce CPU request
- Merge duplicate recommender sections
- Reduce CPU: 50m → 15m
- Change replicas: 2 → 1 (single recommender sufficient)
2026-01-09 21:41:52 +09:00
4515ea0b33 FEAT(observability): enable HA with replica 2 and soft anti-affinity
- Add replicaCount: 2 to goldilocks, vpa, alertmanager
- Add replicas: 2 to loki singleBinary
- Add soft pod anti-affinity for node distribution
- Keep kube-state-metrics at replica 1 to prevent duplicate metrics

FIX(loki): revert to replica 1 for Single Binary mode

- Single Binary mode cannot run more than 1 replica without object storage
- Remove affinity configuration for single replica
- Keep filesystem storage backend
2026-01-09 21:41:51 +09:00
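
"Soft" anti-affinity in Kubernetes is usually expressed with `preferredDuringSchedulingIgnoredDuringExecution`, which asks the scheduler to spread replicas across nodes without blocking scheduling when it cannot. A minimal sketch, assuming an alertmanager-style chart and a hypothetical pod label:

```yaml
# Illustrative values fragment; the label selector is an assumption.
replicaCount: 2
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname       # spread across nodes
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: alertmanager  # assumed pod label
```
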
4286296591 PERF(resources): remove CPU limits - keep memory limits only
- CPU throttling prevents app startup, not crashes
- Memory OOM is the real cascading failure cause
- CPU request ensures fair scheduling
2026-01-07 23:48:35 +09:00
28ba50d1a3 REFACTOR(repo): observability repo structure
- Add application.yaml for ArgoCD app-of-apps
- Add kustomization.yaml with observability components
- Add renovate.json for automated updates
- Update all component argocd.yaml repoURLs to observability repo

Components: prometheus, alertmanager, grafana, loki, promtail,
node-exporter, kube-state-metrics, goldilocks, uptime-kuma, vpa
2026-01-05 00:40:01 +09:00
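
One plausible shape for the kustomization.yaml this commit adds, listing one ArgoCD Application file per component. The directory layout is an assumption inferred from the component names above and the `argocd.yaml` naming used elsewhere in this log.

```yaml
# Hypothetical layout: one directory per component, each with an argocd.yaml.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - prometheus/argocd.yaml
  - alertmanager/argocd.yaml
  - grafana/argocd.yaml
  - loki/argocd.yaml
  - promtail/argocd.yaml
  # remaining components follow the same pattern
```
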
200c6e97ae REFACTOR(repo): migrate repoURL to K3S-HOME
- Update repository URL to K3S-HOME organization
- Change from personal to organization repo
2026-01-05 00:40:01 +09:00
0ce1f99fb4 CHORE(goldilocks): disable goldilocks and cancel trivy installation
- Comment out goldilocks/argocd.yaml from kustomization
- Comment out trivy/argocd.yaml from kustomization
- Disable autoSync in both applications
- Server overload mitigation
2026-01-05 00:40:01 +09:00
ea4152a0d6 REFACTOR(gitea): migrate repoURL from Gitea to GitHub
2026-01-04 23:38:05 +09:00
5ec1a3323d REFACTOR(goldilocks): use managedNamespaceMetad...
- Remove namespace.yaml files
- Add managedNamespaceMetadata with Goldilocks label
- Set CreateNamespace=true in syncOptions
- Update kustomization.yaml to remove namespace.yaml references
2026-01-04 23:38:05 +09:00
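
A sketch of the pattern this commit describes: instead of shipping a namespace.yaml, the ArgoCD Application creates the namespace and labels it. The fragment below shows only the relevant part of an Application spec. The label key is the one Goldilocks conventionally uses to opt a namespace in; treat it as an assumption about this repository.

```yaml
# Fragment of an ArgoCD Application spec (other fields omitted).
spec:
  syncPolicy:
    managedNamespaceMetadata:
      labels:
        goldilocks.fairwinds.com/enabled: "true"   # assumed Goldilocks opt-in label
    syncOptions:
      - CreateNamespace=true   # managedNamespaceMetadata takes effect with this option
```
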
bbf6fa5001 CHORE(repo): clean kustomization files
- Remove unused entries from kustomization
- Clean up configuration
2026-01-04 23:38:05 +09:00
3245bbbda1 FIX(argocd): helm valueFiles paths in ArgoCD Applications
- Update valueFiles paths from helm-values/<app>.yaml to helm-values.yaml
- Fixes ComparisonError after folder restructuring

Applications fixed:
- cert-manager
- cnpg
- external-secrets
- vault
- vpa
- velero
2026-01-04 23:38:05 +09:00
d922eadb48 REFACTOR(repo): restructure infra folder structure
- Remove argocd/, helm-values/, ingress/ subdirectories
- Move files to parent directory with standardized names
- Add namespace.yaml to all apps with Goldilocks labels
- Preserve vault/ subdirectories (falco, velero)
- Update main kustomization.yaml to reference argocd.yaml files directly
- Comment out argocd.yaml in each app's kustomization.yaml to prevent
  circular reference

Applications restructured:
- cert-manager (2 ArgoCD apps)
- external-secrets
- reloader
- vault (2 ArgoCD apps)
- velero (2 ArgoCD apps)
- falco
- cnpg
- haproxy
- metallb
- vpa
- argocd
2026-01-04 23:38:05 +09:00
c34775735f FIX(vpa): vpa outofsync by ignoring cpu limits differences
- Add ignoreDifferences for Deployment CPU limits
- Prevents ArgoCD from detecting drift when CPU limits are null
- Add RespectIgnoreDifferences syncOption
2026-01-04 23:38:05 +09:00
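
The fix described here can be sketched as the following Application fragment. The jq path is an assumption about which field drifted; the commit only says the ignored difference concerns Deployment CPU limits.

```yaml
# Fragment of an ArgoCD Application spec (other fields omitted).
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jqPathExpressions:
        # Assumed path: CPU limits on every container in the pod template.
        - .spec.template.spec.containers[].resources.limits.cpu
  syncPolicy:
    syncOptions:
      - RespectIgnoreDifferences=true   # also honor the ignore rules during sync
```
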
27479af5aa FEAT(goldilocks): add vpa and goldilocks for resource optimization
- Add VPA (Vertical Pod Autoscaler) for automatic resource
  recommendation
- Add Goldilocks dashboard for visualizing VPA recommendations
- Update kustomization.yaml to include both applications
2026-01-04 23:38:05 +09:00
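
For context, a VPA object in recommendation-only mode looks roughly like this. Goldilocks normally creates such objects itself for workloads in labeled namespaces, so this is a sketch of the resulting resource rather than a file from the repository; the target name and namespace are hypothetical.

```yaml
# Illustrative VerticalPodAutoscaler; target workload is hypothetical.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: grafana                # hypothetical
  namespace: observability     # hypothetical
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: grafana
  updatePolicy:
    updateMode: "Off"          # only compute recommendations, never evict pods
```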