Nico Schottelius, 12/18/2023 11:33 AM


The ungleich monitoring infrastructure 2024 (WIP)

Intro

This is a work-in-progress update of The_ungleich_monitoring_infrastructure. The infrastructure is still based on Prometheus and the blackbox exporter, but now also makes use of Kubernetes-native objects.

Monitoring definition

External primary router/link monitoring

  • Objective: find out from an external point of view whether the lines are functioning
  • Implementation:
    • Collecting/alerting with prometheus on place12
    • blackbox on place12
    • blackbox on place11
  • Targets
    • ipv6/router1.place10/snr
    • ipv4/router1.place10/snr
    • ipv6/server12X.place10/snr
    • ipv4/server12X.place10/snr
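
The setup above (one Prometheus probing the same targets through two blackbox exporters) can be sketched as a generated scrape configuration. This is only a minimal sketch, not the actual ungleich configuration: the exporter addresses, the `.example` host names, the job naming scheme and the module names `icmp6`/`icmp4` are all assumptions, and the meaning of the `/snr` suffix in the target list is not interpreted here. The relabeling steps follow the usual blackbox exporter pattern; the output is JSON, which Prometheus accepts because JSON is valid YAML.

```python
import json


def blackbox_job(job_name, exporter, module, targets):
    """Build one Prometheus scrape job that probes `targets` through `exporter`."""
    return {
        "job_name": job_name,
        "metrics_path": "/probe",
        "params": {"module": [module]},
        "static_configs": [{"targets": list(targets)}],
        "relabel_configs": [
            # Hand the original target to the exporter as ?target=...
            {"source_labels": ["__address__"], "target_label": "__param_target"},
            # Keep the probed host visible as the instance label.
            {"source_labels": ["__param_target"], "target_label": "instance"},
            # Send the actual scrape to the blackbox exporter itself.
            {"target_label": "__address__", "replacement": exporter},
        ],
    }


# Hypothetical addresses and names, for illustration only.
EXPORTERS = {
    "place12": "blackbox.place12.example:9115",
    "place11": "blackbox.place11.example:9115",
}
TARGETS = ["router1.place10.example", "server12X.place10.example"]
# Assumed module names; they would have to exist in the exporter's own config.
MODULES = {"ipv6": "icmp6", "ipv4": "icmp4"}

scrape_configs = [
    blackbox_job(f"links-{family}-via-{place}", exporter, module, TARGETS)
    for place, exporter in EXPORTERS.items()
    for family, module in MODULES.items()
]

if __name__ == "__main__":
    print(json.dumps({"scrape_configs": scrape_configs}, indent=2))
```

Probing each target from both places and over both address families yields four jobs, so a failure seen from only one exporter can be told apart from a target that is down for everyone.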

External primary router

  • Objective: find out whether a router is reachable via any path
  • Implementation:
    • Collecting/alerting with prometheus on place12
    • blackbox on place12
    • blackbox on place11

Test external monitoring

  • Objective: find out whether the external monitoring is alive
  • Implementation:
    • Collecting/alerting with prometheus on place10
  • Targets
    • ipv6/emonitor1.place12/prometheus
    • ipv6/emonitor1.place12/blackbox
    • ipv6/emonitor1.place12/alertmanager
    • ipv6/vm1.place11/blackbox
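
For a manual cross-check of the targets above, the health endpoints of the three services can be queried directly. The following is a rough sketch under assumptions: the `.example` host names are hypothetical, the ports are the upstream defaults (9090, 9115, 9093) and may differ in this deployment, and plain HTTP is assumed. Prometheus, Alertmanager and the blackbox exporter each expose a `/-/healthy` endpoint.

```python
import urllib.request

# Upstream default ports; the real deployment may use different ones.
DEFAULT_PORTS = {"prometheus": 9090, "blackbox": 9115, "alertmanager": 9093}

# Hypothetical host names derived from the target list above.
EXTERNAL_TARGETS = [
    ("emonitor1.place12.example", "prometheus"),
    ("emonitor1.place12.example", "blackbox"),
    ("emonitor1.place12.example", "alertmanager"),
    ("vm1.place11.example", "blackbox"),
]


def health_url(host, service, port=None):
    """Return the health-check URL of `service` running on `host`."""
    if port is None:
        port = DEFAULT_PORTS[service]
    return f"http://{host}:{port}/-/healthy"


def is_alive(url, timeout=5.0):
    """Return True if the endpoint answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, timeouts and HTTP error responses.
        return False


if __name__ == "__main__":
    for host, service in EXTERNAL_TARGETS:
        url = health_url(host, service)
        print(url, "alive" if is_alive(url) else "DOWN")
```

In the setup described here the checking itself is done by the Prometheus on place10; the script only mirrors that check for debugging from a shell.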

Test per place monitoring infrastructure (blackbox exporter, prometheus)

Each place should provide a blackbox exporter suitable for monitoring onsite targets.
We need to ensure that these blackbox exporters all function and that prometheus instances are up.

  • Objective: find out whether the onsite monitoring is alive
  • Implementation:
    • Collecting/alerting with prometheus on place12
  • Targets
    • blackbox-exporter + prometheus/place5
    • blackbox-exporter + prometheus/place6
    • blackbox-exporter + prometheus/place10
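
One possible way to alert on the per-place components is a rule on the `up` metric, which Prometheus sets to 0 when a scrape fails. The sketch below generates such a rule group; the alert name, the `place` label, the job names, the severity and the 5 minute delay are assumptions made for illustration and are not taken from the actual setup.

```python
import json

PLACES = ["place5", "place6", "place10"]
# Assumed scrape job names for the two components per place.
JOBS = ["blackbox-exporter", "prometheus"]


def down_alert(job, place, delay="5m"):
    """Build an alerting rule that fires when `job` at `place` fails its scrapes."""
    return {
        "alert": "MonitoringComponentDown",
        # `place` is a hypothetical label attached to each scrape target.
        "expr": f'up{{job="{job}", place="{place}"}} == 0',
        "for": delay,
        "labels": {"severity": "critical"},
        "annotations": {"summary": f"{job} at {place} is not answering scrapes"},
    }


rule_file = {
    "groups": [
        {
            "name": "per-place-monitoring",
            "rules": [down_alert(job, place) for place in PLACES for job in JOBS],
        }
    ]
}

if __name__ == "__main__":
    # JSON is valid YAML, so the output can be loaded as a Prometheus rule file.
    print(json.dumps(rule_file, indent=2))
```

Because these rules are evaluated on place12, an outage of a whole place still produces an alert as long as the external Prometheus keeps running.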

Internal router monitoring (TBD)

Monitor the internal routers of each place.

  • Objective: find out whether the internal routers are reachable
  • Implementation:
    • Collecting/alerting with prometheus on ...
  • Targets
    • ipv6/apu-router1.place6 (via place10/blackbox)
