FEAT(grafana): add Thanos as default datasource

- Add Thanos Query as default Prometheus datasource
- Keep original Prometheus datasource as backup
- Thanos Query deduplicates metrics from the HA Prometheus replicas
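The Grafana file is not part of the hunks shown on this page. A minimal sketch of how the two datasources could be provisioned with Grafana's datasource provisioning format follows. The datasource names, service DNS names, namespace and ports are assumptions, not values taken from this commit. Thanos Query exposes the Prometheus HTTP API, so it is registered with the `prometheus` type, and only one datasource per organization may carry `isDefault: true`.

```yaml
# Sketch of a Grafana datasource provisioning file (apiVersion 1 format).
# Names, service URLs, namespace and ports are hypothetical placeholders.
apiVersion: 1
datasources:
  # Thanos Query speaks the Prometheus HTTP API, so it uses the prometheus type
  - name: Thanos
    type: prometheus
    access: proxy
    url: http://thanos-query.monitoring.svc.cluster.local:9090
    isDefault: true
  # Original Prometheus kept as a non-default backup datasource
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus-operated.monitoring.svc.cluster.local:9090
    isDefault: false
```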

REFACTOR(thanos): move all components to master node

- Add tolerations for the node-role.kubernetes.io/control-plane:NoSchedule taint
- Add nodeSelector for the control-plane node
- Affects: query, storegateway, compactor
- PVCs will be recreated on the master node (block data is stored in S3)

FIX(thanos): allow non-Bitnami images (quay.io/thanos)

FIX(thanos): correct nodeSelector value to 'true'
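The label value matters because distributions label control-plane nodes differently: k3s sets `node-role.kubernetes.io/control-plane=true` on server nodes, while kubeadm sets the same key with an empty value, so a nodeSelector of `"true"` only matches the former. A possible distribution-agnostic alternative, sketched under the assumption that the chart's per-component `affinity` value is used instead of `nodeSelector`, matches on the label key alone. If both were set, a pod would have to satisfy both, so the `nodeSelector` would need to be dropped.

```yaml
# Sketch only: assumes the chart exposes query.affinity (repeat the same
# block for storegateway and compactor) and that nodeSelector is removed,
# because nodeSelector and nodeAffinity must both be satisfied when set together.
query:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              # Exists ignores the value, so "" (kubeadm) and "true" (k3s) both match
              - key: node-role.kubernetes.io/control-plane
                operator: Exists
  # The toleration is still needed so the NoSchedule taint does not block the pod
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
```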
2026-01-08 21:27:24 +09:00
parent 9b052b49cf
commit aecb15031d
2 changed files with 38 additions and 1 deletion

@@ -7,6 +7,11 @@
# - Store Gateway: reads historical data from MinIO
# - Compactor: compacts and downsamples data in MinIO
# Allow non-Bitnami images (quay.io/thanos/thanos)
global:
  security:
    allowInsecureImages: true
# Use quay.io image to avoid Docker Hub rate limits
image:
  registry: quay.io
@@ -24,6 +29,14 @@ query:
  enabled: true
  replicaCount: 1
  # Run on master node for stability
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  nodeSelector:
    node-role.kubernetes.io/control-plane: "true"
  # Deduplicate metrics from multiple Prometheus replicas
  dnsDiscovery:
    enabled: true
@@ -58,6 +71,14 @@ storegateway:
  enabled: true
  replicaCount: 1
  # Run on master node for stability
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  nodeSelector:
    node-role.kubernetes.io/control-plane: "true"
  resources:
    requests:
      cpu: 15m
@@ -76,6 +97,14 @@ storegateway:
compactor:
  enabled: true
  # Run on master node for stability
  tolerations:
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  nodeSelector:
    node-role.kubernetes.io/control-plane: "true"
  # Retention settings
  retentionResolutionRaw: 7d # Keep raw data for 7 days
  retentionResolution5m: 30d # Keep 5m downsampled for 30 days
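The same toleration and nodeSelector block is repeated verbatim for query, storegateway and compactor. One way to keep the three copies in sync is standard YAML anchors and aliases, which Helm's YAML parser resolves within a single values file. The sketch below defines the anchors on the first occurrence (under `query`) so no extra top-level key is introduced. The anchor names `cpTolerations` and `cpNodeSelector` are arbitrary, hypothetical labels.

```yaml
# Sketch: define control-plane placement once under query, reuse it via aliases.
# Anchors must appear before their aliases in the file; query precedes the others here.
query:
  tolerations: &cpTolerations
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  nodeSelector: &cpNodeSelector
    node-role.kubernetes.io/control-plane: "true"

storegateway:
  tolerations: *cpTolerations
  nodeSelector: *cpNodeSelector

compactor:
  tolerations: *cpTolerations
  nodeSelector: *cpNodeSelector
```

The trade-off is readability: aliases hide the effective values from someone reading only the storegateway or compactor section, so `helm template` output is the easier place to confirm what each pod actually receives.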