Commit Graph

54 Commits

Author SHA1 Message Date
d38634bbb7 migrate: change repoURLs from GitHub to Gitea
Update all ArgoCD Application references to use Gitea (github0213.com)
instead of GitHub for K3S-HOME/storage repository.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-10 20:43:32 +09:00
70b3491072 FIX(velero): increase velero-ui memory limit
- Increase memory from 128Mi to 256Mi
- Fix OOMKilled (exit code 137) issue
2026-01-10 19:00:18 +09:00
39ad7757f8 REFACTOR(repo): remove control-plane scheduling
- Remove nodeSelector for control-plane node
- Remove tolerations for control-plane taint
- Allow pods to schedule on any available node
2026-01-10 18:35:15 +09:00
40e02a4ac4 PERF(velero): disable backup-sync to reduce MinIO load
- Set backupSyncPeriod to 0s
- User only uses manual backups
2026-01-10 17:02:15 +09:00
485b588a7e PERF(storage): adjust resources based on VPA
- Update minio memory 263Mi→175Mi
- Update cnpg memory 128Mi→121Mi
- Update zot memory 128Mi→121Mi
- Update velero memory 128Mi→75Mi
- Update velero nodeAgent memory 256Mi→100Mi
2026-01-10 14:34:29 +09:00
c2165b8a8d FIX(velero): set velero-ui memory limits equal to requests
- Align memory limits with requests for guaranteed QoS
2026-01-09 21:46:58 +09:00
6cc795c3ef CHORE(resources): set memory limits equal to memory requests
Align memory limits with memory requests for guaranteed QoS class.
- velero: main, nodeAgent
- postgresql: cluster
- minio: console
- zot, cnpg, pgweb
2026-01-09 21:46:58 +09:00
88c334a7c3 FIX(velero): fix UI and backup visibility
- Enable backup-sync to show backups in UI
- Disable prune to preserve dynamic resources
- Move velero-ui to master node with single replica
2026-01-09 21:46:40 +09:00
b1ddea2b26 FEAT(velero): add control-plane toleration to node-agent
- Enable node-agent on all nodes including master
- Ensure backup coverage for all pods
2026-01-09 21:46:40 +09:00
613ef5984e REFACTOR(repo): standardize taint to control-plane
- Change node-role.kubernetes.io/master to control-plane
- Update velero, zot, postgresql, minio tolerations
- Change effect from NoExecute to NoSchedule (K3s standard)
2026-01-09 21:46:40 +09:00
a30d9005d9 REFACTOR(velero): move to master node for stability
- Set replicaCount to 1 (Velero doesn't support multiple replicas)
- Add nodeSelector for master node
- Add toleration for master NoExecute taint
- Remove podAntiAffinity (not needed with single replica)
- Ensures backup availability even if worker nodes fail
2026-01-09 21:46:40 +09:00
60d81ac73b PERF(velero): reduce CPU requests based on VPA
- velero: 50m → 11m
- node-agent: 30m → 15m
- velero-ui: 30m → 15m
2026-01-08 17:50:56 +09:00
7c0db6b458 FEAT(velero): enable HA with replica 2 and soft anti-affinity
- Add replicaCount: 2 to velero deployment
- Add soft pod anti-affinity for node distribution
- Configure affinity for velero controller
2026-01-08 13:21:00 +09:00
7487b477a7 FEAT(storage): enable HA with replica 2 and soft anti-affinity
- Add replicaCount: 2 to cnpg, pgweb, velero-ui, minio-console
- Add soft pod anti-affinity for node distribution
- Configure affinity for all storage components
2026-01-08 13:16:43 +09:00
35df7aa64e PERF(resources): remove CPU limits - keep memory limits only
- CPU throttling prevents app startup, not crashes
- Memory OOM is the real cascading failure cause
- CPU request ensures fair scheduling
2026-01-07 23:48:47 +09:00
9c0fddb0ef REFACTOR(secrets): flatten Vault paths
- Change secret paths from <category>/<app> to <app>
- databases/postgresql → postgresql
- databases/minio → minio
- databases/pgweb → pgweb
- cluster-infrastructure/velero → velero
2026-01-06 16:52:54 +09:00
b5f93b3812 REFACTOR(repo): move vault/ to manifests/
- Move ExternalSecret files from vault/ to manifests/secret.yaml
- Merge multiple secrets with --- separator (postgresql)
- Update kustomization.yaml references
- Remove vault/ folders

Apps: postgresql, postgresql-dev, pgweb, minio, velero
2026-01-06 16:42:24 +09:00
6a13a52924 REFACTOR(storage): integrate ingress in values
- longhorn: move ingress to helm-values, nodes to manifests
- velero: move ingress to velero-ui inline values
2026-01-06 01:56:50 +09:00
44f773b827 REFACTOR(storage): storage repo structure
- Add application.yaml for ArgoCD app-of-apps
- Add kustomization.yaml with storage components
- Add renovate.json for automated updates
- Update all component argocd.yaml repoURLs to storage repo

Components: longhorn, minio, postgresql, postgresql-dev, pgweb, cnpg,
velero
2026-01-05 00:39:12 +09:00
c2cda8ee36 REFACTOR(repo): migrate repoURL to K3S-HOME
- Update repository URL to K3S-HOME organization
- Change from personal to organization repo
2026-01-05 00:39:12 +09:00
ad0be20dd9 CHORE(velero): disable BSL validation
- Set storeValidationFrequency to 0 (disabled)
- Prevents ArgoCD refresh every 24 seconds
- Manual backups still work normally
2026-01-05 00:39:12 +09:00
11adb91e19 CHORE(authelia): disable velero-ui basic auth
- use Authelia SSO
2026-01-05 00:39:12 +09:00
4c8b55cc9e CHORE(authelia): disable velero-ui Basic Auth
- use Authelia SSO only
2026-01-05 00:39:12 +09:00
00f8b62dd9 REFACTOR(authelia): remove kanidm
- and restore authelia
- Delete kanidm folder
- Remove oauth2-proxy from velero
- Restore velero ingress to use authelia middleware
- Update kustomization.yaml to use authelia instead of kanidm
2026-01-05 00:39:12 +09:00
8545e1984b FEAT(velero): add oauth2-proxy
- for velero with Kanidm OIDC
- Replace authelia middleware with oauth2-proxy
- Configure OIDC authentication via Kanidm
- Update ingress to route through oauth2-proxy
2026-01-05 00:39:12 +09:00
efa56d156e FEAT(authelia): add authelia sso to velero ingress 2026-01-05 00:39:12 +09:00
962767dfb2 REFACTOR(authentik): remove authentik
- migrating to kanidm
2026-01-05 00:39:12 +09:00
a466073a6f REFACTOR(velero): remove velero weekly backup
- schedule (manual daily ...
2026-01-05 00:39:12 +09:00
a978a4a10e FEAT(authentik): add authentik sso
- with traefik forwardauth
- Add Authentik helm chart and ArgoCD application
- Configure Traefik ForwardAuth middleware for SSO
- Add External Secrets for Vault integration
- Apply SSO middleware to Velero UI as test
2026-01-05 00:39:12 +09:00
4a4dbb7937 REFACTOR(argocd): remove serversideapply
- from argocd applications
- Fixes OutOfSync issues caused by operator-added default values
- ServerSideApply causes stricter field management that conflicts with
  CRD defaults
2026-01-05 00:39:12 +09:00
e47760e680 REFACTOR(traefik): switch from HAProxy
- to Traefik ingress controller
- Update all ingress files to use ingressClassName: traefik
- Update cert-manager ClusterIssuer to use traefik class
- Remove haproxy.org annotations from ingress files
- Update vault helm-values to use traefik
2026-01-05 00:39:12 +09:00
ab80e14e0a CHORE(external-secrets): update ESO API version from v1beta1 to v1
- Update ExternalSecret API version
- Migrate to stable API
2026-01-05 00:39:12 +09:00
a586febc4c REFACTOR(gitea): migrate repoURL from Gitea to GitHub
- Update repository URL to GitHub
- Change source control provider
2026-01-05 00:39:12 +09:00
9abcdfa98d REFACTOR(goldilocks): use managedNamespaceMetadata for namespace labels
- Remove namespace.yaml files
- Add managedNamespaceMetadata with Goldilocks label
- Set CreateNamespace=true in syncOptions
- Update kustomization.yaml to remove namespace.yaml references
2026-01-05 00:39:12 +09:00
66890d8f66 FEAT(velero): add kustomize source
- to velero for ingress deployment
2026-01-04 23:47:13 +09:00
6b4cd0dce8 REFACTOR(velero): simplify vault
- and velero configs
- vault: Fix CreateNamespace conflict (set to false)
- velero: Consolidate ExternalSecrets into vault/velero-secrets.yaml
- velero: Clean up kustomization.yaml
2026-01-04 23:47:13 +09:00
f7610c9a3e FEAT(cert-manager): integrate cert-manager,
- vault, velero
2026-01-04 23:47:13 +09:00
ad12f641a2 FIX(argocd): helm valueFiles paths in ArgoCD
- Applications
- Update valueFiles paths from helm-values/<app>.yaml to helm-
  values.yaml
- Fixes ComparisonError after folder restructuring

Applications fixed:
- cert-manager
- cnpg
- external-secrets
- vault
- vpa
- velero
2026-01-04 23:47:13 +09:00
55380edbd4 REFACTOR(repo): restructure infra folder structure
- Remove argocd/, helm-values/, ingress/ subdirectories
- Move files to parent directory with standardized names
- Add namespace.yaml to all apps with Goldilocks labels
- Preserve vault/ subdirectories (falco, velero)
- Update main kustomization.yaml to reference argocd.yaml files directly
- Comment out argocd.yaml in each app's kustomization.yaml to prevent
  circular reference

Applications restructured:
- cert-manager (2 ArgoCD apps)
- external-secrets
- reloader
- vault (2 ArgoCD apps)
- velero (2 ArgoCD apps)
- falco
- cnpg
- haproxy
- metallb
- vpa
- argocd
2026-01-04 23:47:13 +09:00
cfb6e9db5b CHORE(velero): clean up velero configuration
Remove unused repository maintenance job configuration.
2026-01-04 23:47:13 +09:00
a76543660b FEAT(repo): add repositoryMaintenanceJob
- auto-cleanup: keep only late...
2026-01-04 23:47:13 +09:00
3b2768c9f0 FIX(velero): velero-ui auth: use explicit env
- instead of en...
2026-01-04 23:47:13 +09:00
044cae85e3 FEAT(velero): add velero and falco UI auth
- secrets from Vault
2026-01-04 23:47:13 +09:00
cd9e2822f4 FIX(velero): velero-s3-credentials ExternalSecret
- to use databases/minio
2026-01-04 23:47:13 +09:00
628d168e96 PERF(cnpg): reduce cpu requests
- to allow cnpg join pod scheduling
- Falco: 40m → 30m
- Falcosidekick Web UI: 50m → 30m
- Velero UI: 50m → 30m

This frees up ~40m CPU on worker nodes to allow CNPG join pods
(which request 500m) to be scheduled successfully.
2026-01-04 23:47:13 +09:00
a15cb1510f PERF(grafana): optimize cpu requests based on
- actual usage from grafa...
- external-secrets: 20m → 5m (actual: 1m)
- external-secrets-webhook: 10m → 2m (actual: 1m)
- external-secrets-cert: 10m → 2m (actual: 1m)
- cnpg: 100m → 5m (actual: 2m)
- haproxy-ingress: 100m → 15m (actual: 9-10m)
2026-01-04 23:47:13 +09:00
b59c5618ea REFACTOR(resources): remove cpu limits
- to prevent throttling
Removed CPU limits from all infrastructure components while keeping
memory limits for protection:

- cnpg: removed 500m CPU limit
- external-secrets: removed 200m, 100m CPU limits (operator, webhook,
  certController)
- falco: removed 500m CPU limit (falcosidekick webui)
- vault: removed 500m CPU limit
- velero: removed 500m, 1000m CPU limits (server, node-agent)

Benefits:
-  Prevents CPU throttling
-  Better performance and lower latency
-  More efficient resource utilization
-  Simpler management (only requests to tune)

Memory limits are kept to prevent memory leaks and OOM issues.
2026-01-04 23:47:13 +09:00
ecb04fc14a FEAT(velero): configure minio
- for selective velero backup
Added pod annotation to exclude PVC data from Velero backups while
preserving MinIO resource definitions:
- backup.velero.io/backup-volumes-excludes: export

This prevents circular backup of the velero-backups bucket while
still backing up MinIO StatefulSet, Services, and configuration.

Note: MinIO bucket data (bucket, bucket-dev, velero-backups) will
NOT be backed up. Consider separate backup strategy for critical
bucket data if needed.
2026-01-04 23:47:13 +09:00
656d3fa5a3 PERF(velero): optimize velero node-agent
- resources and prevent circul...
- Reduce node-agent CPU request from 100m to 50m
  - Fixes scheduling issue on mayne-worker-2 (was at 99% CPU)
  - Enables node-agent to run on all 3 nodes for complete backup
coverage
- Exclude minio namespace from backups
  - Prevents circular backup (backing up the backup storage)
  - Minio config is in Git and can be recreated
  - Saves significant storage space
2026-01-04 23:47:13 +09:00
b0cd9274b1 FEAT(velero): configure velero
- for full k3s cluster backup
- Enable node-agent for PV file-system backups
- Add defaultVolumesToFsBackup configuration
- Optimize backup schedule (daily, 7-day retention)
- Exclude non-essential namespaces (postgresql-dev, harbor)
- Update Velero to v1.17.1
- Update velero-plugin-for-aws to v1.13.1

Full cluster disaster recovery backup now active.
2026-01-04 23:47:13 +09:00