Commit Graph

9 Commits

Author SHA1 Message Date
a15cb1510f PERF(grafana): optimize cpu requests based on
- actual usage from grafa...
- external-secrets: 20m → 5m (actual: 1m)
- external-secrets-webhook: 10m → 2m (actual: 1m)
- external-secrets-cert: 10m → 2m (actual: 1m)
- cnpg: 100m → 5m (actual: 2m)
- haproxy-ingress: 100m → 15m (actual: 9-10m)
2026-01-04 23:47:13 +09:00
b59c5618ea REFACTOR(resources): remove cpu limits
- to prevent throttling
Removed CPU limits from all infrastructure components while keeping
memory limits for protection:

- cnpg: removed 500m CPU limit
- external-secrets: removed 200m, 100m CPU limits (operator, webhook,
  certController)
- falco: removed 500m CPU limit (falcosidekick webui)
- vault: removed 500m CPU limit
- velero: removed 500m, 1000m CPU limits (server, node-agent)

Benefits:
-  Prevents CPU throttling
-  Better performance and lower latency
-  More efficient resource utilization
-  Simpler management (only requests to tune)

Memory limits are kept to prevent memory leaks and OOM issues.
2026-01-04 23:47:13 +09:00
ecb04fc14a FEAT(velero): configure minio
- for selective velero backup
Added pod annotation to exclude PVC data from Velero backups while
preserving MinIO resource definitions:
- backup.velero.io/backup-volumes-excludes: export

This prevents circular backup of the velero-backups bucket while
still backing up MinIO StatefulSet, Services, and configuration.

Note: MinIO bucket data (bucket, bucket-dev, velero-backups) will
NOT be backed up. Consider separate backup strategy for critical
bucket data if needed.
2026-01-04 23:47:13 +09:00
656d3fa5a3 PERF(velero): optimize velero node-agent
- resources and prevent circul...
- Reduce node-agent CPU request from 100m to 50m
  - Fixes scheduling issue on mayne-worker-2 (was at 99% CPU)
  - Enables node-agent to run on all 3 nodes for complete backup
coverage
- Exclude minio namespace from backups
  - Prevents circular backup (backing up the backup storage)
  - Minio config is in Git and can be recreated
  - Saves significant storage space
2026-01-04 23:47:13 +09:00
b0cd9274b1 FEAT(velero): configure velero
- for full k3s cluster backup
- Enable node-agent for PV file-system backups
- Add defaultVolumesToFsBackup configuration
- Optimize backup schedule (daily, 7-day retention)
- Exclude non-essential namespaces (postgresql-dev, harbor)
- Update Velero to v1.17.1
- Update velero-plugin-for-aws to v1.13.1

Full cluster disaster recovery backup now active.
2026-01-04 23:47:13 +09:00
4ef5497fd5 FEAT(velero): activate https in falco, update
- velero version
2026-01-04 23:47:13 +09:00
3767a6edea CHORE(traefik): split centralized ingress
- management to per-applicati...
- Moved ArgoCD ingress to argocd/ingress/
- Moved Velero ingress to velero/ingress/
- Removed centralized ingress/ingresses.yaml (single point of failure)
- Updated root kustomization.yaml to reference argocd and velero
  directories
- Each application now manages its own ingress independently
2026-01-04 23:47:13 +09:00
311e8a1cc1 FEAT(velero): Add Velero UI
- with HAProxy Ingress at velero0213.kro.kr
2026-01-04 23:47:13 +09:00
3366a6b5b8 FEAT(velero): Add Velero, Falco,
- and CNPG infrastructure components
Add three critical infrastructure components via GitOps:

- Velero: Backup and disaster recovery solution
  - Configured with Minio S3 backend
  - Daily full cluster backups (30-day retention)
  - Hourly backups for critical namespaces (7-day retention)
  - Credentials managed via External Secrets from Vault

- Falco: Runtime security monitoring
  - eBPF-based threat detection
  - Custom rules for container security
  - Falcosidekick for alert forwarding
  - Prometheus metrics enabled

- CNPG (CloudNativePG): PostgreSQL operator
  - Kubernetes-native PostgreSQL management
  - Automated failover and backups
  - Will replace Bitnami PostgreSQL

All components follow existing GitOps patterns:
- Helm charts deployed via ArgoCD
- Values managed in Git
- Automated sync with selfHeal enabled
2026-01-04 23:47:13 +09:00