Commit Graph

  • b145881fa2 (main) PERF(prometheus): increase memory limit to 1Gi Mayne0213 2026-01-12 03:16:40 +09:00
  • 7e61af372b PERF(observability): remove CPU limits for stability Mayne0213 2026-01-12 02:10:54 +09:00
  • 3b5bf20902 PERF(observability): optimize resources via VPA Mayne0213 2026-01-12 01:07:58 +09:00
  • a70403d1ae FEAT(grafana): add Tempo datasource Mayne0213 2026-01-12 00:34:50 +09:00
  • 7cbc0c810e FIX(tempo): move resources to correct helm path Mayne0213 2026-01-12 00:21:12 +09:00
  • 904cc3cab6 PERF(grafana): increase memory limits Mayne0213 2026-01-11 23:32:09 +09:00
  • c1214029a2 refactor: update Vault secret paths to new categorized structure Mayne0213 2026-01-11 22:36:22 +09:00
  • 4aa7e37f76 PERF(otel): reduce resources based on VPA recommendation Mayne0213 2026-01-11 21:33:58 +09:00
  • 4bdcaf8fcd REFACTOR(otel): rename folder to opentelemetry Mayne0213 2026-01-11 21:27:54 +09:00
  • 43cf7e9de7 REFACTOR(otel): migrate collector from Operator to Helm Mayne0213 2026-01-11 21:22:39 +09:00
  • 15d5e58d6c migrate: change repoURLs from GitHub to Gitea Mayne0213 2026-01-10 20:43:29 +09:00
  • 7d0c8aa5f3 FIX(opentelemetry-operator): remove cpu null values Mayne0213 2026-01-10 18:55:23 +09:00
  • 9c00c42946 CHORE(opentelemetry-operator): upgrade chart to 0.102.0 Mayne0213 2026-01-10 18:53:34 +09:00
  • a08d989fc3 FIX(opentelemetry-operator): remove invalid serviceMonitor Mayne0213 2026-01-10 18:42:02 +09:00
  • 203a8debac REFACTOR(repo): remove control-plane scheduling Mayne0213 2026-01-10 18:35:15 +09:00
  • c128ece672 FIX(opentelemetry-operator): disable serviceMonitor Mayne0213 2026-01-10 18:28:12 +09:00
  • bcf60b2428 fix: set CPU pressure threshold to 10% Mayne0213 2026-01-10 18:00:06 +09:00
  • da89c8dbf0 FIX(grafana): restore gauge design with percentage display Mayne0213 2026-01-10 17:58:11 +09:00
  • 11f9457236 fix: increase CPU pressure threshold to 30% Mayne0213 2026-01-10 17:57:34 +09:00
  • 7e375e20c6 FIX(grafana): show CPU Usage as percentage per node Mayne0213 2026-01-10 17:57:05 +09:00
  • b818a8c1fe fix: update CPU throttling panels to use PSI metrics with 10% threshold Mayne0213 2026-01-10 17:54:55 +09:00
  • 2b1667e643 FIX(grafana): replace rate_interval with 5m in MinIO dashboard Mayne0213 2026-01-10 17:50:47 +09:00
  • 38e0c68ddb CHORE(grafana): rearrange Bucket Scans panels side by side Mayne0213 2026-01-10 17:48:43 +09:00
  • 4afdf04ef2 CHORE(grafana): remove KMS panels from MinIO dashboard Mayne0213 2026-01-10 17:46:45 +09:00
  • 20b796f9e4 FIX(grafana): fix MinIO CPU Usage panel query Mayne0213 2026-01-10 17:33:54 +09:00
  • fa4c2ce8f6 FIX(grafana): set default value for MinIO dashboard variable Mayne0213 2026-01-10 17:32:23 +09:00
  • fc4f825b6d FIX(grafana): fix MinIO dashboard scrape_jobs variable Mayne0213 2026-01-10 17:15:53 +09:00
  • 7d5780cb97 PERF(tempo): switch from MinIO to local filesystem storage Mayne0213 2026-01-10 15:58:34 +09:00
  • eea6420544 PERF(loki): switch from MinIO to local filesystem storage Mayne0213 2026-01-10 15:57:50 +09:00
  • 001aa9253d PERF(loki): disable canary to reduce MinIO load Mayne0213 2026-01-10 15:43:19 +09:00
  • ef7c7c2593 PERF(loki,tempo): reduce replicas to 1 Mayne0213 2026-01-10 15:32:07 +09:00
  • b4b48c6e89 FIX(opentelemetry-operator): restore memory to 256Mi Mayne0213 2026-01-10 14:52:24 +09:00
  • a3003d597f PERF(observability): adjust resources based on VPA Mayne0213 2026-01-10 14:33:40 +09:00
  • c3084225b7 PERF(observability): add HA for Loki and Tempo Mayne0213 2026-01-10 13:46:02 +09:00
  • 395c79ad9e PERF(alertmanager): reduce karma replicas to 1 Mayne0213 2026-01-10 13:36:14 +09:00
  • 67db06cf8b PERF(observability): reduce replicas to 1 Mayne0213 2026-01-10 13:31:39 +09:00
  • 9e218a8adc PERF(observability): reduce replicas, add priority Mayne0213 2026-01-10 13:15:03 +09:00
  • c34f56945a feat(prometheus): enable container CPU throttling metrics collection Mayne0213 2026-01-10 03:55:36 +09:00
  • 823edfbd88 fix(grafana): restrict main dashboard datasource to Thanos only Mayne0213 2026-01-10 03:51:44 +09:00
  • dc8706fb02 fix(grafana): set explicit 2m interval on CPU query targets Mayne0213 2026-01-10 03:50:44 +09:00
  • 3516a860db fix(grafana): standardize CPU panel intervals to 2m Mayne0213 2026-01-10 03:48:21 +09:00
  • 64e129128f fix(grafana): sync interval for CPU panels in main dashboard Mayne0213 2026-01-10 03:46:15 +09:00
  • 518b5c31ef fix: update dashboards and OTel collector for proper metrics/logs Mayne0213 2026-01-10 03:36:37 +09:00
  • de81ca68c9 FIX(opentelemetry-operator): fix ServiceMonitor config path Mayne0213 2026-01-10 02:38:53 +09:00
  • dac5fc7bcf FIX(opentelemetry-operator): disable ServiceMonitor creation Mayne0213 2026-01-10 02:36:12 +09:00
  • 8a050dd303 CHORE(opentelemetry-operator): disable CPU limits Mayne0213 2026-01-10 02:32:53 +09:00
  • 466ec6210c CHORE(observability): align memory requests with limits Mayne0213 2026-01-10 02:31:19 +09:00
  • 507395aca7 CHORE(otel-operator): schedule on master node Mayne0213 2026-01-10 00:55:46 +09:00
  • 9e87e6fbcb REVERT(otel): remove metrics collection, keep logs/traces only Mayne0213 2026-01-10 00:33:10 +09:00
  • a506ca3f58 FIX(prometheus): reduce replicas to 1 due to resource constraints Mayne0213 2026-01-10 00:22:57 +09:00
  • 328d952cc1 FIX(otel): increase metrics collector memory to 1Gi Mayne0213 2026-01-10 00:18:15 +09:00
  • 5bc0caa324 FIX(prometheus): increase memory limit to 1536Mi to resolve OOMKilled Mayne0213 2026-01-10 00:13:29 +09:00
  • 8ce6f95d92 FIX(otel): use statefulset mode for metrics collector Mayne0213 2026-01-10 00:01:22 +09:00
  • 5b70f19b12 REFACTOR(otel): split collector into logs and metrics Mayne0213 2026-01-09 23:50:21 +09:00
  • 12ee5b61c0 FIX(prometheus): enable out-of-order time window Mayne0213 2026-01-09 23:43:01 +09:00
  • a3c5a8dbcf CHORE(prometheus): disable direct scraping Mayne0213 2026-01-09 23:39:30 +09:00
  • 31f15e230d FIX(otel): add scrape_configs for Target Allocator Mayne0213 2026-01-09 23:36:55 +09:00
  • 254687225c FIX(otel): use per-node strategy for DaemonSet mode Mayne0213 2026-01-09 23:32:56 +09:00
  • 1fdbb5e1dd FEAT(otel): enable Target Allocator for metrics Mayne0213 2026-01-09 23:30:41 +09:00
  • 02faf93555 FEAT(otel): add OTel Collector for logs and traces Mayne0213 2026-01-09 23:23:51 +09:00
  • ad9573e998 FIX(alertmanager): remove duplicate volume config Mayne0213 2026-01-09 19:21:19 +09:00
  • 470a08f78a CHORE(repo): switch to emptyDir with sizeLimit Mayne0213 2026-01-09 19:09:30 +09:00
  • fa4d97eede REFACTOR(tempo): remove redundant ExternalSecret, use ClusterExternalSecret Mayne0213 2026-01-09 18:53:49 +09:00
  • b378c6ec06 FIX(tempo): move extraEnv under tempo section for S3 credentials Mayne0213 2026-01-09 18:42:13 +09:00
  • 8ac76d17f3 FEAT(loki,tempo): use MinIO with emptyDir for WAL Mayne0213 2026-01-09 18:23:20 +09:00
  • 2e6b4cecbf FEAT(loki): switch storage backend to MinIO S3 Mayne0213 2026-01-09 18:01:54 +09:00
  • 24747b98cf REFACTOR(loki,tempo): switch from MinIO to local-path storage Mayne0213 2026-01-09 17:13:38 +09:00
  • 94af545120 REFACTOR(thanos): remove S3 storage integration Mayne0213 2026-01-09 16:27:17 +09:00
  • ffed27419a REFACTOR(blackbox-exporter): revert to http_2xx module Mayne0213 2026-01-09 15:50:19 +09:00
  • 37c216c433 FIX(blackbox-exporter): handle Authelia-protected endpoints Mayne0213 2026-01-09 15:48:12 +09:00
  • 884a38d8ad FEAT(blackbox-exporter): add external endpoint monitoring Mayne0213 2026-01-09 15:44:39 +09:00
  • 01c5742d7a FIX(grafana): change OOM panel to stat type Mayne0213 2026-01-09 15:24:48 +09:00
  • 7cd778313a FIX(prometheus): disable PrometheusDuplicateTimestamps alert Mayne0213 2026-01-09 15:16:01 +09:00
  • bb8b1c193e FIX(alertmanager): improve OOMKilled alert detection Mayne0213 2026-01-09 15:13:44 +09:00
  • e3c615b5c1 FEAT(alertmanager): add OOMKilled alert rule Mayne0213 2026-01-09 15:09:43 +09:00
  • 539f4be497 FIX(grafana): use kube-state-metrics for OOM detection Mayne0213 2026-01-09 15:04:24 +09:00
  • 14bd244b98 FIX(thanos): increase compactor memory to 256Mi Mayne0213 2026-01-09 15:03:29 +09:00
  • 4a4a43ed82 FIX(prometheus): increase memory to 768Mi Mayne0213 2026-01-09 14:59:37 +09:00
  • 8c2a9badf8 FIX(alertmanager): set karma memory limits equal to requests Mayne0213 2026-01-09 14:46:06 +09:00
  • 5089e8607d CHORE(resources): set memory limits equal to memory requests Mayne0213 2026-01-09 14:05:54 +09:00
  • fd6c1952ad FIX(tempo): enable env var expansion in config Mayne0213 2026-01-09 13:30:41 +09:00
  • 5f926cb6cf FEAT(tempo): configure S3 storage with MinIO Mayne0213 2026-01-09 13:22:16 +09:00
  • 7139f3e5a2 FIX(prometheus): correct ArgoCD metrics service names Mayne0213 2026-01-09 12:46:13 +09:00
  • 034a5f32a2 CHORE(repo): remove application.yaml reference Mayne0213 2026-01-09 02:26:16 +09:00
  • 87420d842d CHORE(repo): remove self-referencing application.yaml Mayne0213 2026-01-09 02:20:12 +09:00
  • 445cabb900 FIX(prometheus): add ExternalSecret default values to fix OutOfSync Mayne0213 2026-01-08 22:26:44 +09:00
  • aecb15031d FEAT(grafana): add Thanos as default datasource Mayne0213 2026-01-08 21:27:24 +09:00
  • 9b052b49cf FEAT(thanos): add Thanos for Prometheus HA Mayne0213 2026-01-08 20:47:49 +09:00
  • ea4d7d4ecf PERF(prometheus): reduce CPU request from 200m to 50m Mayne0213 2026-01-08 20:47:16 +09:00
  • 6b576d6a16 FEAT(thanos): add Thanos for Prometheus HA and long-term storage Mayne0213 2026-01-08 20:21:37 +09:00
  • 9f3b768cd9 FIX(loki): fix lokiCanary config path Mayne0213 2026-01-08 20:03:51 +09:00
  • a1c347e4ff FEAT(loki): enable loki-canary with control-plane toleration Mayne0213 2026-01-08 19:59:03 +09:00
  • 30f028fae4 CHORE(prometheus): disable CPU/Memory overcommit alerts Mayne0213 2026-01-08 19:47:55 +09:00
  • 6da4eba1dc CHORE(grafana): remove admin login secret for SSO Mayne0213 2026-01-08 19:30:06 +09:00
  • 735166fc9c REFACTOR(repo): standardize taint to control-plane Mayne0213 2026-01-08 19:17:34 +09:00
  • 7ed4d69c51 PERF(alertmanager): add HA with 2 replicas Mayne0213 2026-01-08 18:48:09 +09:00
  • 4511fd5b2e FIX(repo): correct nodeSelector label value Mayne0213 2026-01-08 18:44:33 +09:00
  • 1c6a9dc491 PERF(repo): move system pods to master node Mayne0213 2026-01-08 18:43:18 +09:00
  • bbdd908b27 CHORE(uptime-kuma): remove uptime-kuma application Mayne0213 2026-01-08 18:09:38 +09:00
  • 6a8e1f5a47 PERF(vpa): fix config and reduce CPU request Mayne0213 2026-01-08 17:50:59 +09:00