Commit Graph

156 Commits

Author SHA1 Message Date
60dfa5cf7b CHORE(resources): disable apiserver/etcd metrics
- Disable kubeApiServer ServiceMonitor (~37k series)
- Disable kubeEtcd ServiceMonitor (~26k series)
- Expected memory reduction: ~30-40%
2026-01-05 00:40:01 +09:00
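A rough sketch of what 60dfa5cf7b could look like in Helm values. This assumes the stack is deployed with the kube-prometheus-stack chart; the log only names the kubeApiServer and kubeEtcd ServiceMonitors, so the chart and file name are assumptions.

```yaml
# helm-values.yaml (assumed file name; assumed chart: kube-prometheus-stack)
kubeApiServer:
  enabled: false   # stop creating the apiserver ServiceMonitor
kubeEtcd:
  enabled: false   # stop creating the etcd ServiceMonitor
```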
823b2ba495 REFACTOR(repo): remove global panel from Grafana
- Remove global panel configuration
- Clean up dashboard settings
2026-01-05 00:40:01 +09:00
f6ceb50503 REFACTOR(grafana): remove dashboard 15757
- Remove Windows-specific queries dashboard
- Clean up unused dashboards
2026-01-05 00:40:01 +09:00
658a81b4c1 REFACTOR(repo): remove ServerSideApply
- Remove ServerSideApply configuration
- Add RespectIgnoreDifferences syncOption
2026-01-05 00:40:01 +09:00
2c841c2b6e FEAT(vault): add ignoreDiff for ES/SM
- Add ignoreDifferences for ExternalSecret
- Prevent ArgoCD sync drift
2026-01-05 00:40:01 +09:00
d8360c10a1 FEAT(repo): add cAdvisor metrics_path relabel
- Add relabeling for cAdvisor metrics
- Support recording rules
2026-01-05 00:40:01 +09:00
0617611d22 FIX(grafana): restore dashboard 15757
- Restore Kubernetes Global with CPU Real dashboard
- Re-enable monitoring visualization
2026-01-05 00:40:01 +09:00
1befeb68c4 FEAT(prometheus): add ServerSideApply
- Enable ServerSideApply for CRD annotation handling
- Fix resource management
2026-01-05 00:40:01 +09:00
685563b92c REFACTOR(grafana): remove duplicated Dashboard
- Remove duplicate Grafana dashboard
- Clean up configuration
2026-01-05 00:40:01 +09:00
cd575d94a6 PERF(prometheus): optimize prometheus memory usage
- Increase scrapeInterval: 30s → 60s
- Increase evaluationInterval: 30s → 60s
- Reduce retention: 7d → 3d
- Add memory limit: 1Gi (prevent unlimited growth)
- Increase memory request: 256Mi → 512Mi (reflect actual usage)
2026-01-05 00:40:01 +09:00
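The settings listed in cd575d94a6 might map onto values like the following. This is a sketch assuming the kube-prometheus-stack chart layout (prometheus.prometheusSpec), which the log does not confirm.

```yaml
# Assumed chart: kube-prometheus-stack
prometheus:
  prometheusSpec:
    scrapeInterval: 60s        # was 30s
    evaluationInterval: 60s    # was 30s
    retention: 3d              # was 7d
    resources:
      requests:
        memory: 512Mi          # was 256Mi
      limits:
        memory: 1Gi            # new limit to cap growth
```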
2ee651b98d FEAT(node-exporter): add toleration for all taints to node-exporter
- Allows node-exporter to run on master node with NoExecute taint
- Enables metrics collection from all nodes including master
2026-01-05 00:40:01 +09:00
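A toleration that matches every taint, as 2ee651b98d describes, can be written with a bare Exists operator. The top-level key below assumes node-exporter is installed as the prometheus-node-exporter subchart; that placement is an assumption.

```yaml
# Assumed subchart key: prometheus-node-exporter
prometheus-node-exporter:
  tolerations:
    # No key and no effect: tolerates all taints, including NoExecute
    - operator: Exists
```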
ebc5af24ef FEAT(repo): add Grafana Global panel
- Add global panel to Grafana dashboard
- Enable overview visualization
2026-01-05 00:40:01 +09:00
2c8095a1db FIX(alertmanager): fix Alertmanager SMTP auth by loading config from secret
- Add ExternalSecret to generate alertmanager.yml with SMTP password
  from Vault
- Disable helm chart config (ConfigMap) and use extraSecretMounts
  instead
- Fixes "535 5.7.8 Error: authentication failed" SMTP error
2026-01-05 00:40:01 +09:00
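The approach in 2c8095a1db (rendering alertmanager.yml into a Secret from Vault) could be sketched with an ExternalSecret template as below. All names, the store, the Vault path, and the SMTP addresses are hypothetical placeholders, not values from the repository.

```yaml
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: alertmanager-config          # hypothetical name
  namespace: alertmanager            # hypothetical namespace
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend              # hypothetical store name
  target:
    name: alertmanager-config        # Secret later mounted via extraSecretMounts
    template:
      engineVersion: v2
      data:
        alertmanager.yml: |
          global:
            smtp_smarthost: "smtp.example.com:587"
            smtp_from: "alerts@example.com"
            smtp_auth_username: "alerts@example.com"
            smtp_auth_password: "{{ .smtpPassword }}"
          route:
            receiver: email
          receivers:
            - name: email
              email_configs:
                - to: "ops@example.com"
  data:
    - secretKey: smtpPassword        # referenced by the template above
      remoteRef:
        key: alertmanager/smtp       # hypothetical Vault path
        property: password
```

Keeping the password out of the chart's ConfigMap means the rendered file only ever exists as a Secret.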
2ec87ca7a5 PERF(prometheus): increase Prometheus CPU request from 50m to 200m
- Increase CPU request based on actual usage
- Optimize resource allocation
2026-01-05 00:40:01 +09:00
0ce1f99fb4 CHORE(goldilocks): disable goldilocks and cancel trivy installation
- Comment out goldilocks/argocd.yaml from kustomization
- Comment out trivy/argocd.yaml from kustomization
- Disable autoSync in both applications
- Server overload mitigation
2026-01-05 00:40:01 +09:00
d0fc55d403 FEAT(grafana): add uid to Grafana datasources for dashboard compatibi...
2026-01-05 00:40:01 +09:00
912b3aa38f REFACTOR(minio): remove MinIO dashboard in favor of the manually imported one
2026-01-04 23:38:05 +09:00
8e964afe42 FEAT(grafana): add Grafana dashboards for cluster monitoring
2026-01-04 23:38:05 +09:00
251b16ee1f FIX(node-exporter): node-exporter ServiceMonitor to select node-expor...
2026-01-04 23:38:05 +09:00
b3ad6338ac FIX(prometheus): Grafana Prometheus datasource URL with full namespace
2026-01-04 23:38:05 +09:00
340c6fea11 FIX(alertmanager): Prometheus alertingEndpoints to connect to alertma...
2026-01-04 23:38:05 +09:00
5002d352fb FEAT(alertmanager): add Karma to Alertmanager
- Add Karma dashboard for alert aggregation
- Enable alert visualization
2026-01-04 23:38:05 +09:00
bc1cf0d223 REFACTOR(argocd): remove ServerSideApply from ArgoCD applications
2026-01-04 23:38:05 +09:00
a30dbf138f REFACTOR(traefik): switch ingress to Traefik
- Update ingressClassName from haproxy to traefik
- Remove haproxy.org annotations
2026-01-04 23:38:05 +09:00
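For a30dbf138f, the relevant part of an Ingress after the switch might look like this. The resource name, host, and backend service are hypothetical; only the ingressClassName change and the removal of haproxy.org annotations come from the log.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana                  # hypothetical
  namespace: grafana             # hypothetical
  # haproxy.org/* annotations removed
spec:
  ingressClassName: traefik      # was: haproxy
  rules:
    - host: grafana.example.com  # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: grafana    # hypothetical service
                port:
                  number: 80
```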
005da570e6 FEAT(goldilocks): add kustomize source to Goldilocks
- Add kustomize source for ingress deployment
- Update ArgoCD application configuration
2026-01-04 23:38:05 +09:00
79b34aaca6 FEAT(prometheus): add ServerSideApply
- Enable ServerSideApply for CRD annotation handling
- Fix resource management
2026-01-04 23:38:05 +09:00
0cb7438d79 CHORE(external-secrets): update ESO API version from v1beta1 to v1
- Update ExternalSecret API version
- Migrate to stable API
2026-01-04 23:38:05 +09:00
c75798065f CHORE(postgresql): update PostgreSQL namespace reference
- Update namespace reference for PostgreSQL
- Fix service discovery
2026-01-04 23:38:05 +09:00
ea4152a0d6 REFACTOR(gitea): migrate repoURL from Gitea to GitHub
2026-01-04 23:38:05 +09:00
5ec1a3323d REFACTOR(goldilocks): use managedNamespaceMetad...
- Remove namespace.yaml files
- Add managedNamespaceMetadata with Goldilocks label
- Set CreateNamespace=true in syncOptions
- Update kustomization.yaml to remove namespace.yaml references
2026-01-04 23:38:05 +09:00
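The pattern in 5ec1a3323d lets Argo CD create and label the namespace itself. Below is a fragment of an Application spec as a sketch; the label key is the one Goldilocks conventionally uses and is assumed here, since the log only says "Goldilocks label".

```yaml
# Fragment of an Argo CD Application
spec:
  syncPolicy:
    managedNamespaceMetadata:
      labels:
        goldilocks.fairwinds.com/enabled: "true"   # assumed label key
    syncOptions:
      - CreateNamespace=true   # required for managedNamespaceMetadata to apply
```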
ac2abde8b5 FIX(prometheus): ServiceMonitor namespace from monitoring to prometheus
2026-01-04 23:38:05 +09:00
bbf6fa5001 CHORE(repo): clean kustomization files
- Remove unused entries from kustomization
- Clean up configuration
2026-01-04 23:38:05 +09:00
3245bbbda1 FIX(argocd): helm valueFiles paths in ArgoCD Applications
- Update valueFiles paths from helm-values/<app>.yaml to helm-values.yaml
- Fixes ComparisonError after folder restructuring

Applications fixed:
- cert-manager
- cnpg
- external-secrets
- vault
- vpa
- velero
2026-01-04 23:38:05 +09:00
7d4f5ff86c REFACTOR(grafana): change logging path in Grafana
- Update logging path configuration
- Fix log collection settings
2026-01-04 23:38:05 +09:00
d922eadb48 REFACTOR(repo): restructure infra folder structure
- Remove argocd/, helm-values/, ingress/ subdirectories
- Move files to parent directory with standardized names
- Add namespace.yaml to all apps with Goldilocks labels
- Preserve vault/ subdirectories (falco, velero)
- Update main kustomization.yaml to reference argocd.yaml files directly
- Comment out argocd.yaml in each app's kustomization.yaml to prevent
  circular reference

Applications restructured:
- cert-manager (2 ArgoCD apps)
- external-secrets
- reloader
- vault (2 ArgoCD apps)
- velero (2 ArgoCD apps)
- falco
- cnpg
- haproxy
- metallb
- vpa
- argocd
2026-01-04 23:38:05 +09:00
d9dc296674 FIX(promtail): promtail Loki endpoint to use loki namespace
- Change Loki URL from loki.logging to loki.loki namespace
- Fixes DNS lookup error and readiness probe failures
- Resolves Healthy <-> Progressing oscillation in ArgoCD
2026-01-04 23:38:05 +09:00
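The endpoint fix in d9dc296674 could look like this in Promtail Helm values. The config.clients layout assumes the grafana/promtail chart, and port 3100 with the push path is Loki's usual default rather than something stated in the log.

```yaml
# Assumed chart: grafana/promtail
config:
  clients:
    # was http://loki.logging... ; the Service "loki" lives in namespace "loki"
    - url: http://loki.loki.svc.cluster.local:3100/loki/api/v1/push
```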
a2e0ef4477 REFACTOR(prometheus): remove path sources from grafana and prometheus...
- Remove 'path: grafana' source from grafana Application
- Remove 'path: prometheus' source from prometheus Application
- ExternalSecret and Ingress will be managed manually via kubectl apply -k
- Fixes circular dependency issue causing Progressing state
2026-01-04 23:38:05 +09:00
4e0f27192f FIX(prometheus): namespaces in grafana and prometheus resources
- Update grafana ExternalSecret namespace: monitoring → grafana
- Update grafana Ingress namespace: monitoring → grafana
- Update prometheus ExternalSecret namespace: monitoring → prometheus
- Aligns with per-app namespace strategy
2026-01-04 23:38:05 +09:00
2309254fc9 FIX(repo): circular reference in app kustomizes
- Comment out argocd.yaml in all app kustomization.yaml files
- Prevents circular reference when apps have 'path:' source (grafana,
  prometheus)
- ArgoCD Applications are managed manually, not via kustomize
2026-01-04 23:38:05 +09:00
b4ec13618a REFACTOR(repo): move to independent app management pattern
- monitoring/kustomization.yaml now only manages application.yaml (App
  of Apps)
- Each app independently manages its own ArgoCD Application via
  kustomization.yaml
- Apps are fully self-contained: argocd.yaml, namespace.yaml, and app-specific
  resources
- Cleaner separation: no central app list to maintain
2026-01-04 23:38:05 +09:00
078850f77a FIX(argocd): SharedResourceWarning by referencing argocd.yaml files d...
- Change monitoring/kustomization.yaml to reference argocd.yaml files
  instead of folders
- Comment out argocd.yaml in each app's kustomization.yaml
- Matches applications folder pattern to avoid resource conflicts
2026-01-04 23:38:05 +09:00
6dec7e0a46 REFACTOR(argocd): monitoring apps to self-manage ArgoCD Applications
- Each app now includes its own argocd.yaml in kustomization.yaml
- Main monitoring/kustomization.yaml references app folders instead of
  individual argocd.yaml files
- Better separation of concerns - each app is self-contained and
  independently managed
2026-01-04 23:38:05 +09:00
5c4676ca9a REFACTOR(repo): restructure monitoring folder and add namespace resou...
- Remove argocd/, helm-values/, ingress/ subdirectories
- Move files to parent directory (argocd.yaml, helm-values.yaml,
  ingress.yaml)
- Update helm valueFiles paths in ArgoCD Applications
- Add namespace.yaml to all applications with Goldilocks labels
- Update destination namespaces to match folder names
- Update kustomization.yaml files to reference new structure
2026-01-04 23:38:05 +09:00
01c10141a9 FEAT(goldilocks): add goldilocks from infra
- Move Goldilocks to monitoring repository
- Goldilocks provides VPA recommendations dashboard
- Update repoURL to monitoring.git
- Includes HAProxy ingress for goldilocks0213.kro.kr
2026-01-04 23:38:05 +09:00
c34775735f FIX(vpa): VPA OutOfSync by ignoring CPU limits differences
- Add ignoreDifferences for Deployment CPU limits
- Prevents ArgoCD from detecting drift when CPU limits are null
- Add RespectIgnoreDifferences syncOption
2026-01-04 23:38:05 +09:00
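A sketch of the Application fragment c34775735f describes. The jq path is one plausible way to target container CPU limits and is an assumption; the commit does not show the exact expression used.

```yaml
# Fragment of an Argo CD Application
spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jqPathExpressions:
        # assumed path: CPU limit on every container in the pod template
        - .spec.template.spec.containers[].resources.limits.cpu
  syncPolicy:
    syncOptions:
      - RespectIgnoreDifferences=true   # also skip these fields during sync
```

Without RespectIgnoreDifferences, the ignored fields affect only the diff view and would still be overwritten on sync.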
27479af5aa FEAT(goldilocks): add vpa and goldilocks for resource optimization
- Add VPA (Vertical Pod Autoscaler) for automatic resource
  recommendation
- Add Goldilocks dashboard for visualizing VPA recommendations
- Update kustomization.yaml to include both applications
2026-01-04 23:38:05 +09:00
beab899dee FEAT(kube-state-metrics): add pod and container label relabeling for ...
Copy exported_pod and exported_container to pod and container labels
for Grafana dashboard query compatibility. This fixes CNPG dashboard
queries that filter by container and pod names.
2026-01-04 23:38:05 +09:00
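The label copy described in beab899dee can be expressed as metric relabeling rules. This is a minimal sketch of an endpoint-level fragment of a ServiceMonitor; where exactly it sits in the repository's values is not shown in the log.

```yaml
# Fragment: goes under a ServiceMonitor endpoint (placement assumed)
metricRelabelings:
  # Copy exported_pod into pod when exported_pod is non-empty
  - sourceLabels: [exported_pod]
    regex: (.+)
    targetLabel: pod
    action: replace
  # Copy exported_container into container when it is non-empty
  - sourceLabels: [exported_container]
    regex: (.+)
    targetLabel: container
    action: replace
```

The (.+) regex keeps series that lack the exported_* labels untouched, because a non-matching replace rule is a no-op.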
5df6fc7b52 FIX(postgresql): pg-password ExternalSecret to use databases/...
2026-01-04 23:38:05 +09:00
1bf40d431b REVERT(grafana): revert Grafana to local-path StorageClass
Due to storage constraints, reverting from longhorn to local-path.
Only Loki, Alertmanager, and Gitea remain on longhorn.
2026-01-04 23:38:05 +09:00
1a2f15c468 REFACTOR(longhorn): migrate monitoring PVCs from local-path to Longhorn
- Grafana: 2Gi (replica=3)
- Loki: 10Gi (replica=3)
- Alertmanager: 1Gi (replica=3)
- Prometheus: 5Gi (replica=3)
- Use dedicated 50GB Longhorn storage on each node
2026-01-04 23:38:05 +09:00