feat: add Grafana Mimir metrics infrastructure #41

Closed
gsanchietti wants to merge 5 commits into main from feature/mimir-metrics-integration

gsanchietti commented Feb 18, 2026

📋 Description

Integrates Grafana Mimir (metrics storage) and Grafana (dashboards) into the MY platform, enabling NethServer/NethSecurity systems to push Prometheus metrics via remote_write with per-organization isolation.

Architecture

NethServer  →  POST /mimir/api/v1/push  →  nginx  →  backend /api/mimir/*
                Basic Auth: system_key:system_secret          │
                                                              │ X-Scope-OrgID = organization_id
                                                              ▼
                                                           Mimir (pserv, 2-node cluster)

Browser  →  /grafana/  →  nginx  →  Grafana  →  backend /api/mimir  →  Mimir

Changes

Backend:

  • backend/methods/mimir.go — wildcard proxy handler (ANY /api/mimir/*): HTTP Basic Auth (system_key + system_secret), Argon2id secret verification, X-Scope-OrgID injection, streaming reverse proxy with load balancing
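
The authentication and header-injection steps listed for the handler can be sketched as follows. This is a minimal illustration in Python, not the Go code in backend/methods/mimir.go. The names `parse_basic_auth`, `build_upstream_headers`, `lookup_org` and `verify_secret` are hypothetical, the Argon2id check is delegated to a caller-supplied function, and dropping the client-supplied Authorization and X-Scope-OrgID headers is an assumption about sensible behaviour rather than something the PR states.

```python
import base64
from typing import Callable, Mapping, Optional, Tuple


def parse_basic_auth(header: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (system_key, system_secret) from an Authorization header, or None."""
    if not header:
        return None
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers invalid base64 and invalid UTF-8
        return None
    key, sep, secret = decoded.partition(":")
    if not sep or not key:
        return None
    return key, secret


def build_upstream_headers(
    incoming: Mapping[str, str],
    lookup_org: Callable[[str], Optional[Tuple[str, str]]],
    verify_secret: Callable[[str, str], bool],
) -> Optional[dict]:
    """Authenticate the request and return the headers to forward to Mimir.

    lookup_org(system_key) is assumed to return (organization_id, secret_hash).
    verify_secret(secret_hash, secret) stands in for the Argon2id verification.
    Returns None when the request must be rejected.
    """
    lowered = {k.lower(): v for k, v in incoming.items()}
    creds = parse_basic_auth(lowered.get("authorization"))
    if creds is None:
        return None
    system_key, system_secret = creds
    record = lookup_org(system_key)
    if record is None:
        return None
    organization_id, secret_hash = record
    if not verify_secret(secret_hash, system_secret):
        return None
    # Assumption: never trust a tenant header sent by the client.
    forwarded = {
        k: v
        for k, v in incoming.items()
        if k.lower() not in ("authorization", "x-scope-orgid")
    }
    forwarded["X-Scope-OrgID"] = organization_id
    return forwarded
```

Setting the tenant header only from the looked-up organization is what gives per-organization isolation: a system cannot write into another tenant by sending its own X-Scope-OrgID.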

Metrics stack:

  • dedicated metrics VM: mimir1 + mimir2 (cluster)
  • dedicated VM for grafana
  • data, ruler, and alertmanager configuration are stored in remote S3; only the per-tenant limits are deployed locally on the Mimir instances
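
The PR does not say how the proxy balances load across mimir1 and mimir2. One simple option is round-robin selection, sketched below. The class name `RoundRobinUpstreams` and the upstream URLs in the usage note are hypothetical; only port 9009 comes from the MIMIR_URL default.

```python
import itertools
import threading
from typing import Iterable


class RoundRobinUpstreams:
    """Hand out upstream base URLs in rotation; safe to call from several threads."""

    def __init__(self, urls: Iterable[str]) -> None:
        url_list = list(urls)
        if not url_list:
            raise ValueError("at least one upstream URL is required")
        self._cycle = itertools.cycle(url_list)
        self._lock = threading.Lock()

    def next_url(self) -> str:
        # The lock keeps concurrent callers from advancing the iterator at once.
        with self._lock:
            return next(self._cycle)
```

For example, `RoundRobinUpstreams(["http://mimir1:9009", "http://mimir2:9009"])` alternates between the two nodes on successive `next_url()` calls.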

Configuration (already in main):

  • proxy/nginx.conf + nginx.conf.local: /mimir/ (→ backend, proxy_request_buffering off) and /grafana/ location blocks

Testing

Check for readiness:

curl -v -X POST -u NETH-xxx:yyy https://my-collect-qa-pr-41.onrender.com/api/services/mimir/ready

Write:

curl -v -X POST -u NETH-xxx:yyy https://my-collect-qa-pr-41.onrender.com/api/services/mimir/api/v1/push

Query:

curl -v -u NETH-xxx:yyy 'https://my-collect-qa-pr-41.onrender.com/api/services/mimir/prometheus/api/v1/label/__name__/values?limit=40000&start=1771484100&end=1771487700'
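
The query URL above can also be built programmatically, which avoids quoting mistakes in the shell. This is a small sketch: `label_values_url` is a hypothetical helper, and the base URL in the usage note is a placeholder. The `start` and `end` values in the curl example are 3600 seconds apart, so that query covers a one-hour window.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def label_values_url(
    base_url: str,
    label: str,
    start: int,
    end: int,
    limit: Optional[int] = None,
) -> str:
    """Build a Prometheus label-values query URL under the given proxy base URL."""
    params = []
    if limit is not None:
        params.append(("limit", limit))
    params.extend([("start", start), ("end", end)])
    path = f"/prometheus/api/v1/label/{quote(label, safe='')}/values"
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"
```

Calling `label_values_url("https://example.invalid/api/services/mimir", "__name__", 1771484100, 1771487700, limit=40000)` gives the same path and query string as the curl command.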

Grafana dashboard: https://my-grafana-qa.onrender.com/grafana

TODO

Possible improvements:

  • choose a set of labels that identify each host in queries; proposed:
    • hostname
    • system_key
  • prepare common alert rules
  • make sure hosts send only a small set of metrics
  • evaluate Grafana OAuth authentication with provisioned organizations (each organization can see only its own tenant datasource)
  • expose the Mimir UI behind authentication to check cluster status, and expose metrics to other Prometheus instances
  • switch hosts that send data via remote write to the modern Prometheus data format
  • pin the Mimir and Grafana versions so the deployed VMs are reproducible

NethServer 8 integration

Changes for ns8-metrics:

diff -u ../bin/provision-prometheus.ori ../bin/provision-prometheus
--- ../bin/provision-prometheus.ori	2026-02-17 08:03:26.111372197 +0000
+++ ../bin/provision-prometheus	2026-02-18 16:23:52.234003166 +0000
@@ -63,6 +63,19 @@
         fp.write('        - localhost:9093\n')
         fp.write('rule_files:\n')
         fp.write('  - "/prometheus/rules.d/*.yml"\n')
+        fp.write('remote_write:\n')
+        fp.write('  - url: https://my-collect-qa-pr-41.onrender.com/api/services/mimir/api/v1/push\n')
+        fp.write('    basic_auth:\n')
+        fp.write('      username: NETH-xxx\n')
+        fp.write('      password: my_xxx\n')
+        fp.write('    write_relabel_configs:\n')
+        fp.write('      - action: replace\n')
+        fp.write('        target_label: system_key\n')
+        fp.write('        replacement: aa-bb-cc\n')
+        fp.write('      - action: replace\n')
+        fp.write('        target_label: hostname\n')
+        fp.write('        replacement: demo-heron2\n')
 
 def validate_and_generate_provider_configs(redis_client):
     for nkey in redis_client.scan_iter("node/*/vpn"):
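
The patch above hardcodes the URL, credentials and label values. A parameterized variant could look like the sketch below. `render_remote_write` is a hypothetical helper that is not part of ns8-metrics. It emits the same keys as the patch and double-quotes every value with `json.dumps`, relying on JSON strings being valid YAML scalars.

```python
import json


def render_remote_write(url: str, username: str, password: str,
                        system_key: str, hostname: str) -> str:
    """Return a Prometheus remote_write YAML fragment as text."""
    lines = [
        "remote_write:",
        f"  - url: {json.dumps(url)}",
        "    basic_auth:",
        f"      username: {json.dumps(username)}",
        f"      password: {json.dumps(password)}",
        "    write_relabel_configs:",
    ]
    # One relabel rule per identifying label, mirroring the patch.
    for label, value in (("system_key", system_key), ("hostname", hostname)):
        lines += [
            "      - action: replace",
            f"        target_label: {label}",
            f"        replacement: {json.dumps(value)}",
        ]
    return "\n".join(lines) + "\n"
```

Inside provision-prometheus the result could be written with a single `fp.write(render_remote_write(...))`, with the arguments read from wherever the module stores its settings.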

NethSecurity integration

Requires a Telegraf binary that includes the http output plugin and the prometheusremotewrite serializer.

Config for Telegraf:

[global_tags]

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"

[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
  core_tags = false

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

[[inputs.diskio]]

[[inputs.kernel]]

[[inputs.mem]]

[[inputs.processes]]

[[inputs.system]]

[[outputs.http]]
  url = "http://mimir.gs.nethserver.net:9009/api/v1/push"
  data_format = "prometheusremotewrite"
  [outputs.http.headers]
     Content-Type = "application/x-protobuf"
     Content-Encoding = "snappy"
     X-Prometheus-Remote-Write-Version = "0.1.0"
     X-Scope-OrgID = "11"

🚀 Testing Environment

To trigger a fresh deployment of all services in the PR preview environment, comment:

update deploy

Automatic PR environments:

✅ Merge Checklist

Code Quality:

  • Backend Tests
  • Collect Tests
  • Sync Tests
  • Frontend Tests

Builds:

  • Backend Build
  • Collect Build
  • Sync Build
  • Frontend Build

@github-actions

🔗 Redirect URIs Added to Logto

The following redirect URIs have been automatically added to the Logto application configuration:

Redirect URIs:

  • https://my-frontend-qa-pr-41.onrender.com/login-redirect
  • https://my-proxy-qa-pr-41.onrender.com/login-redirect

Post-logout redirect URIs:

  • https://my-frontend-qa-pr-41.onrender.com/login
  • https://my-proxy-qa-pr-41.onrender.com/login

These will be automatically removed when the PR is closed or merged.

github-actions bot commented Feb 18, 2026

🤖 My API structural change detected

Preview documentation

Structural change details

Added (4)

  • DELETE /api/services/mimir/{path}
  • GET /api/services/mimir/{path}
  • POST /api/services/mimir/{path}
  • PUT /api/services/mimir/{path}
Powered by Bump.sh

@edospadoni edospadoni force-pushed the feature/mimir-metrics-integration branch from ac6a0c7 to 8fc771b Compare February 18, 2026 13:07
@edospadoni edospadoni force-pushed the feature/mimir-metrics-integration branch from 8fc771b to 9d4c71a Compare February 18, 2026 13:16
@edospadoni edospadoni force-pushed the feature/mimir-metrics-integration branch from 9d4c71a to 377ad48 Compare February 18, 2026 13:37
@edospadoni edospadoni force-pushed the feature/mimir-metrics-integration branch from 377ad48 to d4d191b Compare February 18, 2026 13:43
@edospadoni edospadoni force-pushed the feature/mimir-metrics-integration branch from d4d191b to 7d47600 Compare February 18, 2026 13:49
@edospadoni edospadoni force-pushed the feature/mimir-metrics-integration branch from 7d47600 to 55956c8 Compare February 18, 2026 13:54
gsanchietti and others added 2 commits February 20, 2026 11:13
- Add Mimir 2-node cluster (memberlist gossip, replication_factor:2)
- Add Grafana dashboard service with sub-path routing (/grafana/)
- Add authenticated metrics proxy endpoint (ANY /api/mimir/*)
  - HTTP Basic Auth: system_key + system_secret (no JWT)
  - Adds X-Scope-OrgID = organization_id for multi-tenancy
  - Streaming proxy (no buffering) for large metric payloads
- Add MIMIR_URL config field (default: http://localhost:9009)
- Update nginx to route /mimir/ -> backend and /grafana/ -> Grafana
- Add mimir/docker-compose.yml for dedicated metrics VM (Server B)
- Remove mimir/grafana from main docker-compose (moved to mimir/)
- Update render.yaml: pserv mimir1+mimir2 (prod+qa), Grafana web service
- Add OpenAPI documentation for /api/mimir/{path} endpoint
- Add user documentation (EN+IT) and operator README

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@edospadoni edospadoni temporarily deployed to feature/mimir-metrics-integration - my-collect-qa PR #41 February 20, 2026 10:41 — with Render Destroyed
@edospadoni edospadoni had a problem deploying to feature/mimir-metrics-integration - my-mimir-qa PR #41 February 20, 2026 10:41 — with Render Failure
@edospadoni edospadoni temporarily deployed to feature/mimir-metrics-integration - my-mimir-qa PR #41 February 20, 2026 10:43 — with Render Destroyed
@gsanchietti

Replaced by #42

@github-actions

🗑️ Redirect URIs Removed from Logto

The following redirect URIs have been automatically removed from the Logto application configuration:

Redirect URIs:

  • https://my-frontend-qa-pr-41.onrender.com/login-redirect
  • https://my-proxy-qa-pr-41.onrender.com/login-redirect

Post-logout redirect URIs:

  • https://my-frontend-qa-pr-41.onrender.com/login
  • https://my-proxy-qa-pr-41.onrender.com/login

Cleanup completed for PR #41.
