The ungleich monitoring infrastructure 2024 (WIP)

Intro

This is a work-in-progress update of The_ungleich_monitoring_infrastructure. The infrastructure is still based on prometheus and the blackbox exporter, but now also makes use of Kubernetes-native objects.

Monitoring definition

External primary router/link monitoring

  • Objective: find out from an external point of view whether the lines are functioning
  • Implementation:
    • Collecting/alerting with prometheus on place12
    • blackbox on place12
    • blackbox on place11
  • Targets:
    • ipv6/router1.place10/snr
    • ipv4/router1.place10/snr
    • ipv6/server12X.place10/snr
    • ipv4/server12X.place10/snr
    • ipv4/fiberstream/place5
    • ipv4/fiberstream/place6
    • ipv4/fiberstream/place7
    • ipv4/fiberstream/place10
  • Status: TBD
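
Since this check is still TBD, the following is only a sketch of how each target could be paired with both external blackbox exporters. The exporter and target host names are hypothetical, and the module name `icmp` is an assumption that has to match whatever is defined in the exporters' own configuration. The `/probe` endpoint with `target` and `module` parameters and the default port 9115 are those of the blackbox exporter.

```python
from urllib.parse import urlencode

# Hypothetical exporter addresses (assumption); 9115 is the blackbox
# exporter's default port.
BLACKBOX_EXPORTERS = {
    "place12": "http://blackbox.place12.example:9115",
    "place11": "http://blackbox.place11.example:9115",
}


def probe_url(exporter_base, target, module="icmp"):
    """Return the blackbox exporter /probe URL for one target.

    The module name "icmp" is an assumption; it must exist in the
    exporter's configuration.
    """
    return f"{exporter_base}/probe?" + urlencode({"target": target, "module": module})


def probe_matrix(targets, exporters=BLACKBOX_EXPORTERS):
    """Pair every target with every external vantage point."""
    return {
        (place, target): probe_url(base, target)
        for place, base in exporters.items()
        for target in targets
    }


if __name__ == "__main__":
    # Hypothetical target name standing in for ipv6/router1.place10.
    for key, url in probe_matrix(["router1.place10.example"]).items():
        print(key, url)
```

Probing every target from two places makes it possible to tell a failing line apart from a failing vantage point.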

Main DNS servers

  • Objective: ensure all 3 DNS servers are running and answering queries
  • Implementation:
    • Collecting/alerting with prometheus on place12
    • blackbox on place12
    • blackbox on place11
  • Targets:
    • dns1.ungleich.ch
    • dns2.ungleich.ch
    • dns3.ungleich.ch
  • Status: TBD
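
To illustrate what "running and answering queries" could mean in practice, here is a minimal sketch of a DNS check using only the Python standard library. It is not the blackbox exporter's implementation. The queried name `ungleich.ch`, the record type and the fixed query id are assumptions chosen for the example.

```python
import socket
import struct

DNS_SERVERS = ["dns1.ungleich.ch", "dns2.ungleich.ch", "dns3.ungleich.ch"]


def build_query(name, qtype=1, query_id=0x1234):
    """Encode a single-question DNS query (qtype 1 = A, 28 = AAAA)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # The question ends with the root label, then QTYPE and QCLASS (1 = IN).
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def response_ok(response, query_id=0x1234):
    """True if the reply matches our id, is a response, has RCODE 0 and an answer."""
    if len(response) < 12:
        return False
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return rid == query_id and is_response and rcode == 0 and ancount > 0


def check_server(server, name="ungleich.ch", timeout=3.0):
    """Send one UDP query to the server; any network error counts as failure."""
    try:
        family, socktype, proto, _, addr = socket.getaddrinfo(
            server, 53, type=socket.SOCK_DGRAM
        )[0]
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.sendto(build_query(name), addr)
            response, _ = sock.recvfrom(512)
    except OSError:
        return False
    return response_ok(response)


if __name__ == "__main__":
    for server in DNS_SERVERS:
        print(server, "ok" if check_server(server) else "FAILED")
```

Requiring at least one answer record, not just a reply, catches a server that is up but no longer serves the zone.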

External primary router

  • Objective: find out whether a router is reachable via any path
  • Implementation:
    • Collecting/alerting with prometheus on place12
    • blackbox on place12
    • blackbox on place11
  • Targets:
    • genauso/r2
    • genauso/r3
    • p5/server137
    • p5/server138
    • p10/router1
    • p10/server122
    • p10/server123
    • p15/server120
    • p15/server121
  • Status: TBD
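
"Reachable via any path" can be read as: a router counts as up as long as at least one probe succeeds, whichever blackbox exporter or address family it came from. The sketch below shows that aggregation on already collected results. The data layout is an assumption made for the example; the 1/0 values mirror the blackbox exporter's `probe_success` metric.

```python
def reachable_via_any_path(probe_results):
    """Return True if at least one probe of a router succeeded.

    probe_results maps (vantage_point, address_family) to 1 or 0.
    A router without any results is treated as unreachable.
    """
    return any(value == 1 for value in probe_results.values())


def unreachable_routers(results_by_router):
    """Return the sorted names of routers that no probe could reach."""
    return sorted(
        name
        for name, results in results_by_router.items()
        if not reachable_via_any_path(results)
    )


if __name__ == "__main__":
    # Made-up sample values for two of the targets listed above.
    sample = {
        "p10/router1": {("place12", "ipv6"): 0, ("place11", "ipv4"): 1},
        "genauso/r2": {("place12", "ipv6"): 0, ("place11", "ipv6"): 0},
    }
    print(unreachable_routers(sample))
```

This differs from the link monitoring above, where a single failing path is already worth an alert.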

Test external monitoring

  • Objective: find out whether the external monitoring is alive
  • Implementation:
    • Collecting/alerting with prometheus on place10
  • Targets:
    • ipv6/emonitor1.place12/prometheus
    • ipv6/emonitor1.place12/blackbox
    • ipv6/emonitor1.place12/alertmanager
    • ipv6/vm1.place11/blackbox
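
A liveness check for these components could look roughly like the sketch below. The host names are hypothetical, and the ports are the upstream defaults (9090 prometheus, 9093 alertmanager, 9115 blackbox exporter), which may differ in the real setup. prometheus and alertmanager expose `/-/healthy`; for the blackbox exporter the sketch simply fetches `/metrics`.

```python
import http.client
import urllib.request

# Hypothetical host names and default ports (assumptions).
ENDPOINTS = {
    "emonitor1.place12/prometheus": "http://emonitor1.place12.example:9090/-/healthy",
    "emonitor1.place12/blackbox": "http://emonitor1.place12.example:9115/metrics",
    "emonitor1.place12/alertmanager": "http://emonitor1.place12.example:9093/-/healthy",
    "vm1.place11/blackbox": "http://vm1.place11.example:9115/metrics",
}


def is_alive(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure and HTTP errors all mean "not alive".
        return False


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        print(name, "alive" if is_alive(url) else "DOWN")
```

The `opener` parameter exists only so the function can be exercised without network access.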

Test per-place monitoring infrastructure (blackbox exporter, prometheus)

Each place should provide a blackbox exporter suitable for monitoring onsite targets.
We need to ensure that these blackbox exporters all function and that prometheus instances are up.

  • Objective: find out whether the onsite monitoring is alive
  • Implementation:
    • Collecting/alerting with prometheus on place12
  • Targets:
    • blackbox-exporter + prometheus/place5
    • blackbox-exporter + prometheus/place6
    • blackbox-exporter + prometheus/place10

Internal router monitoring (TBD)

Monitor the internal routers of each place.

  • Objective: find out whether the internal routers are alive
  • Implementation:
    • Collecting/alerting with prometheus on ...
  • Targets:
    • ipv6/apu-router1.place6 (via place10/blackbox)

Internal network device monitoring

  • Objective: find out whether all production switches are alive
  • Implementation:
    • Dedicated blackbox_exporter on a router or similar (needs to be secured)
  • Targets:
    • All Arista switches in each place
    • All Mikrotik switches in each place

Updated by Nico Schottelius about 1 year ago · 8 revisions