- Set CPU limits to null for manager container
- Set CPU limits to null for kube-rbac-proxy container
- Disable chart default CPU limits to prevent throttling
- Update opentelemetry-operator manager memory from 64Mi to 256Mi
- Update opentelemetry-operator kube-rbac-proxy memory from 32Mi to 64Mi
- Update opentelemetry-collector memory request from 256Mi to 512Mi
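A minimal values sketch of the resource changes above, assuming the opentelemetry-operator chart's `manager`/`kubeRBACProxy` keys and the opentelemetry-collector chart's top-level `resources` key; whether the new memory sizes apply to requests, limits, or both is an assumption here:

```yaml
# opentelemetry-operator values (key paths assumed; check the chart's defaults)
manager:
  resources:
    requests:
      memory: 256Mi   # was 64Mi
    limits:
      cpu: null       # drop the chart's default CPU limit to avoid throttling
      memory: 256Mi
kubeRBACProxy:
  resources:
    requests:
      memory: 64Mi    # was 32Mi
    limits:
      cpu: null
      memory: 64Mi
---
# opentelemetry-collector values
resources:
  requests:
    memory: 512Mi     # was 256Mi
```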
- Revert to simpler architecture where Prometheus scrapes metrics directly via ServiceMonitors
- OTel Collector only handles logs (filelog) and traces (otlp)
- Remove Target Allocator and metrics-related config
- This reduces complexity and resource usage for home cluster
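A rough sketch of the simplified collector pipelines, assuming the standard filelog and otlp receivers; the exporter names and endpoints below are placeholders, not the actual backends:

```yaml
receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]
  otlp:
    protocols:
      grpc: {}
      http: {}

exporters:
  otlphttp/loki:
    endpoint: http://loki-gateway/otlp   # placeholder log backend
  otlp/tempo:
    endpoint: tempo:4317                 # placeholder trace backend
    tls:
      insecure: true

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [otlphttp/loki]
    traces:
      receivers: [otlp]
      exporters: [otlp/tempo]
```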
- Create otel-logs (DaemonSet) for log and trace collection
- Create otel-metrics (Deployment + Target Allocator) for metrics collection
- Use consistent-hashing strategy for full target coverage
- Remove old unified collector.yaml
- Set outOfOrderTimeWindow to 5m for TSDB
- Allow slightly out-of-order samples from distributed collectors
- Prevents data loss from timing differences
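A sketch of where this setting lands if Prometheus is managed by the Prometheus Operator; the kube-prometheus-stack key path is an assumption:

```yaml
prometheus:
  prometheusSpec:
    tsdb:
      outOfOrderTimeWindow: 5m   # accept samples up to 5 minutes out of order
```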
- Enable Target Allocator with consistent-hashing strategy
- Configure prometheus receiver to use Target Allocator
- Add RBAC permissions for secrets and events
- Use prometheusCR for ServiceMonitor/PodMonitor discovery
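A hedged sketch of the Target Allocator setup described above; the resource name, allocator endpoint, `mode`, and remote-write exporter are assumptions, and the extra RBAC for secrets/events is granted to the collector ServiceAccount (not shown):

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-metrics                                        # name assumed
spec:
  mode: deployment
  targetAllocator:
    enabled: true
    allocationStrategy: consistent-hashing
    prometheusCR:
      enabled: true                                         # discover ServiceMonitors/PodMonitors
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs: []                                # populated by the Target Allocator
        target_allocator:
          endpoint: http://otel-metrics-targetallocator:80  # endpoint/name assumed
          interval: 30s
          collector_id: ${POD_NAME}
    exporters:
      prometheusremotewrite:
        endpoint: http://prometheus:9090/api/v1/write       # placeholder backend
    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          exporters: [prometheusremotewrite]
```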
- Add sizeLimit 2Gi to loki emptyDir
- Add sizeLimit 2Gi to tempo emptyDir
- Change prometheus from PVC to emptyDir 5Gi
- Change alertmanager from PVC to emptyDir 500Mi
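A sketch of the storage changes, assuming kube-prometheus-stack-style keys for Prometheus/Alertmanager and a plain emptyDir volume for the Loki and Tempo data paths:

```yaml
# kube-prometheus-stack values (key paths assumed)
prometheus:
  prometheusSpec:
    storageSpec:
      emptyDir:
        sizeLimit: 5Gi
alertmanager:
  alertmanagerSpec:
    storage:
      emptyDir:
        sizeLimit: 500Mi
---
# emptyDir pattern used for the Loki/Tempo data volumes
volumes:
  - name: storage
    emptyDir:
      sizeLimit: 2Gi
```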
- Move extraEnv from top-level to tempo section where chart expects it
- Move extraVolumeMounts under tempo section for proper WAL mounting
- Fixes Access Denied error when connecting to MinIO
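A sketch of the corrected placement in the tempo chart values; the env var names, secret keys, and volume name are illustrative, only the minio-s3-credentials secret comes from the changes below:

```yaml
tempo:
  extraEnv:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: minio-s3-credentials
          key: access-key          # key name assumed
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: minio-s3-credentials
          key: secret-key          # key name assumed
  extraVolumeMounts:
    - name: tempo-wal              # volume name assumed
      mountPath: /var/tempo
```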
- Loki: disable PVC, use emptyDir for /var/loki
- Tempo: switch backend from local to s3 (MinIO)
- Tempo: disable PVC, use emptyDir for /var/tempo
- Both services no longer use boot volume (/dev/sda1)
- WAL data is temporary; persistent data is stored in MinIO
- Change storage type from filesystem to s3
- Configure MinIO endpoint and bucket settings
- Add S3 credentials from minio-s3-credentials secret
- Update schema config to use s3 object_store
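A sketch of the Loki values for this change, following the grafana/loki chart layout; bucket names, the MinIO endpoint, and schema dates are placeholders:

```yaml
loki:
  storage:
    type: s3
    bucketNames:
      chunks: loki-chunks                     # bucket names assumed
      ruler: loki-ruler
      admin: loki-admin
    s3:
      endpoint: http://minio.minio.svc:9000   # placeholder MinIO endpoint
      accessKeyId: ${AWS_ACCESS_KEY_ID}       # injected from the minio-s3-credentials secret
      secretAccessKey: ${AWS_SECRET_ACCESS_KEY}
      s3ForcePathStyle: true
      insecure: true
  schemaConfig:
    configs:
      - from: "2024-01-01"                    # date placeholder
        store: tsdb
        schema: v13
        object_store: s3
        index:
          prefix: index_
          period: 24h
```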
- Loki: switch from s3 backend to filesystem with a local-path PVC
- Tempo: switch from s3 backend to local backend with a local-path PVC
- Remove MinIO/S3 credentials and configuration
- Disable Store Gateway and Compactor
- Remove Sidecar objectStorageConfig
- Keep Thanos Query + Sidecar for HA query
- 3-day local retention is sufficient
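A sketch of the Thanos chart values after this revert, assuming Bitnami-style keys; the sidecar itself stays configured on the Prometheus side, now without objectStorageConfig:

```yaml
query:
  enabled: true        # keep Query for deduplicated reads across HA Prometheus
storegateway:
  enabled: false       # no object storage reads; 3-day local retention is enough
compactor:
  enabled: false
```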
- Add http_auth module accepting 401/403 status codes
- Apply http_auth to grafana, code-server, pgweb, velero-ui
- These services return 401 when accessed without authentication
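Assuming these probes run through the Prometheus Blackbox Exporter, the new module would look roughly like this:

```yaml
modules:
  http_auth:
    prober: http
    timeout: 5s
    http:
      valid_status_codes: [200, 401, 403]   # auth-protected endpoints still count as up
      follow_redirects: true
```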
- Replace timeseries with stat panel for OOM detection
- Show total count of OOMKilled pods instead of timeline
- Gauge metric not suitable for timeseries visualization
- Replace container_oom_events_total with kube_pod_container_status_last_terminated_reason
- Fix OOM events not showing after pod restart
- The cAdvisor metric resets on pod restart; the kube-state-metrics one persists (see the queries below)
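The panel change boils down to swapping the query, roughly:

```promql
# before: cAdvisor counter, resets to 0 when the pod restarts
sum(container_oom_events_total)

# after: kube-state-metrics, persists across restarts
sum(kube_pod_container_status_last_terminated_reason{reason="OOMKilled"})
```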
- Add Thanos Query as default Prometheus datasource
- Keep original Prometheus datasource as backup
- Thanos provides deduplicated metrics from HA Prometheus
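A sketch of the Grafana datasource provisioning for this change, with placeholder service URLs:

```yaml
apiVersion: 1
datasources:
  - name: Thanos Query
    type: prometheus
    url: http://thanos-query:9090           # placeholder URL
    isDefault: true
  - name: Prometheus
    type: prometheus
    url: http://prometheus-operated:9090    # placeholder URL; kept as a backup
    isDefault: false
```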
REFACTOR(thanos): move all components to master node
- Add tolerations for control-plane:NoSchedule
- Add nodeSelector for control-plane node
- Affects: query, storegateway, compactor
- PVC will be recreated on master node (data in S3)
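A sketch of the scheduling values applied to query, storegateway and compactor; the node label key is an assumption, and the value follows the later nodeSelector fix:

```yaml
tolerations:
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
nodeSelector:
  node-role.kubernetes.io/control-plane: "true"   # must be the quoted string "true"
```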
FIX(thanos): allow non-Bitnami images (quay.io/thanos)
FIX(thanos): correct nodeSelector value to 'true'