The ungleich monitoring infrastructure 2024 (WIP)
Intro
This is a work-in-progress update of The_ungleich_monitoring_infrastructure. The infrastructure is still based on prometheus and the blackbox exporter, but now also makes use of Kubernetes-native objects.
Monitoring definition
External primary router/link monitoring
- Objective: find out from an external point of view (PoV) whether the lines are functioning
- Implementation:
  - Collecting/alerting with prometheus on place12
  - blackbox on place12
  - blackbox on place11
- Targets
  - ipv6/router1.place10/snr
  - ipv4/router1.place10/snr
  - ipv6/server12X.place10/snr
  - ipv4/server12X.place10/snr
  - ipv4/fiberstream/place5
  - ipv4/fiberstream/place6
  - ipv4/fiberstream/place7
  - ipv4/fiberstream/place10
- Status: TBD
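The ipv4/ and ipv6/ prefixes in the target names suggest that each check should exercise exactly one address family. A minimal sketch, assuming ICMP probes are used for these link checks, of blackbox exporter modules that enforce this. The module names icmp_ipv4/icmp_ipv6 and the 5s timeout are hypothetical choices, not taken from the actual setup.

```python
import json

# Target name prefix -> value for the blackbox exporter's preferred_ip_protocol.
FAMILIES = {"ipv4": "ip4", "ipv6": "ip6"}


def icmp_module(family):
    """Blackbox exporter ICMP module that only ever uses one address family."""
    return {
        "prober": "icmp",
        "timeout": "5s",  # hypothetical timeout
        "icmp": {
            "preferred_ip_protocol": FAMILIES[family],
            # Without this, a name lacking the AAAA (or A) record would be
            # probed over the other family instead of failing the check.
            "ip_protocol_fallback": False,
        },
    }


def blackbox_config():
    """Contents for blackbox.yml: one hypothetical module per address family."""
    return {"modules": {f"icmp_{family}": icmp_module(family) for family in FAMILIES}}


def module_for(target):
    """Map a target name such as "ipv6/router1.place10/snr" to its module."""
    family = target.split("/", 1)[0]
    if family not in FAMILIES:
        raise ValueError(f"unknown address family in {target!r}")
    return f"icmp_{family}"


if __name__ == "__main__":
    # JSON is valid YAML, so this output can be saved as blackbox.yml.
    print(json.dumps(blackbox_config(), indent=2))
```

Setting ip_protocol_fallback to false is the relevant design choice: with the default (true), a host that lost its AAAA record would quietly be probed over IPv4 and the ipv6 check would still look green.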
Main DNS servers
- Objective: ensure that all 3 DNS servers are running and answering queries
- Implementation:
  - Collecting/alerting with prometheus on place12
  - blackbox on place12
  - blackbox on place11
- Targets
  - dns1.ungleich.ch
  - dns2.ungleich.ch
  - dns3.ungleich.ch
- Status: TBD
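To check that the servers actually answer, the blackbox DNS prober can send a real query and verify the response code. A sketch, assuming the three servers are authoritative for ungleich.ch so that an SOA query is a meaningful test; the zone, query type, timeout and module names are assumptions. It generates one module per address family and transport.

```python
import itertools
import json

SERVERS = ["dns1.ungleich.ch", "dns2.ungleich.ch", "dns3.ungleich.ch"]


def dns_modules(zone="ungleich.ch"):
    """Blackbox DNS modules asking for the SOA of `zone` over every
    combination of address family (ip4/ip6) and transport (udp/tcp)."""
    modules = {}
    for family, transport in itertools.product(("ip4", "ip6"), ("udp", "tcp")):
        modules[f"dns_soa_{family}_{transport}"] = {
            "prober": "dns",
            "timeout": "5s",
            "dns": {
                "query_name": zone,
                "query_type": "SOA",
                "transport_protocol": transport,
                "preferred_ip_protocol": family,
                "ip_protocol_fallback": False,
                # Only a NOERROR answer counts as "answering queries".
                "valid_rcodes": ["NOERROR"],
            },
        }
    return modules


def probe_matrix(servers=SERVERS, zone="ungleich.ch"):
    """Every (server, module) pair a single blackbox exporter would probe."""
    return [(server, module) for server in servers for module in sorted(dns_modules(zone))]


if __name__ == "__main__":
    print(json.dumps({"modules": dns_modules()}, indent=2))
```

With 3 servers and 4 modules, each of the two blackbox exporters (place11 and place12) would run 12 probes.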
External primary router
- Objective: find out whether a router is reachable via any path
- Implementation:
  - Collecting/alerting with prometheus on place12
  - blackbox on place12
  - blackbox on place11
- Targets
  - genauso/r2
  - genauso/r3
  - p5/server137
  - p5/server138
  - p10/router1
  - p10/server122
  - p10/server123
  - p15/server120
  - p15/server121
- Status: TBD
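"Reachable via any path" can be read as: a router only counts as down when every blackbox exporter (place12 and place11) fails to reach it. In prometheus this would typically be an alert expression along the lines of `max by (instance) (probe_success) == 0`, restricted to the relevant jobs and assuming both exporters label results with the router as instance. The Python below is a hypothetical sketch of the same aggregation, handy for reasoning about or testing that rule.

```python
from collections import defaultdict


def unreachable(samples):
    """Return the targets that no blackbox exporter could reach.

    samples: iterable of (blackbox, target, probe_success) tuples, where
    probe_success is the 0/1 value reported by that blackbox exporter.
    """
    best = defaultdict(float)  # target -> best probe_success seen so far
    for _blackbox, target, success in samples:
        best[target] = max(best[target], float(success))
    # A single successful path (value 1) is enough to consider the target up.
    return sorted(target for target, value in best.items() if value == 0)
```

Taking the maximum over all exporters is what turns "reachable via any path" into one number per router; a minimum would instead mean "reachable via every path".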
Test external monitoring
- Objective: find out whether the external monitoring is alive
- Implementation:
  - Collecting/alerting with prometheus on place10
- Targets
  - ipv6/emonitor1.place12/prometheus
  - ipv6/emonitor1.place12/blackbox
  - ipv6/emonitor1.place12/alertmanager
  - ipv6/vm1.place11/blackbox
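Prometheus, Alertmanager and the blackbox exporter all expose /metrics, so the place10 prometheus can scrape them directly and alert on `up == 0`. A sketch generating such a scrape job and rule file; the host names are copied from the target list and may need a domain suffix, the ports are the components' upstream defaults (9090, 9093, 9115), and the job name, alert name and 5m delay are hypothetical. Whether the scrape actually goes over IPv6 depends on name resolution and is not enforced here.

```python
import json

# Default listen ports of the three components (assumed unchanged).
DEFAULT_PORTS = {"prometheus": 9090, "alertmanager": 9093, "blackbox": 9115}

EXTERNAL_COMPONENTS = [
    ("emonitor1.place12", "prometheus"),
    ("emonitor1.place12", "blackbox"),
    ("emonitor1.place12", "alertmanager"),
    ("vm1.place11", "blackbox"),
]


def meta_monitoring(job="external-monitoring", components=EXTERNAL_COMPONENTS):
    """Scrape job and alert rule file for the place10 prometheus."""
    scrape_job = {
        "job_name": job,
        "static_configs": [
            {
                "targets": [f"{host}:{DEFAULT_PORTS[component]}"],
                "labels": {"component": component},
            }
            for host, component in components
        ],
    }
    rule_file = {
        "groups": [{
            "name": "external-monitoring",
            "rules": [{
                "alert": "ExternalMonitoringDown",
                # `up` is 0 whenever prometheus fails to scrape a target.
                "expr": f'up{{job="{job}"}} == 0',
                "for": "5m",
                "annotations": {
                    "summary": "{{ $labels.component }} on {{ $labels.instance }} is down",
                },
            }],
        }],
    }
    return scrape_job, rule_file


if __name__ == "__main__":
    scrape_job, rule_file = meta_monitoring()
    print(json.dumps({"scrape_configs": [scrape_job]}, indent=2))
    print(json.dumps(rule_file, indent=2))
```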
Test per-place monitoring infrastructure (blackbox exporter, prometheus)
Each place should provide a blackbox exporter suitable for monitoring onsite targets.
We need to ensure that all of these blackbox exporters work and that the prometheus instances are up.
- Objective: find out whether the onsite monitoring is alive
- Implementation:
  - Collecting/alerting with prometheus on place12
- Targets
  - blackbox-exporter + prometheus/place5
  - blackbox-exporter + prometheus/place6
  - blackbox-exporter + prometheus/place10
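A blackbox exporter that answers on /metrics is not necessarily able to probe anything. A hedged sketch of a stronger per-place check: ask prometheus' /-/ready endpoint, then let the place's blackbox exporter probe a canary target and read probe_success from the response. The host:port arguments, the icmp module name and the choice of canary are assumptions; the `get` parameter lets the HTTP call be replaced, e.g. in tests.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def parse_probe_success(body):
    """Return True if a blackbox /probe response reports probe_success 1."""
    for line in body.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        fields = line.split()
        if fields[0].split("{", 1)[0] == "probe_success":
            return float(fields[1]) == 1.0
    raise ValueError("response contains no probe_success sample")


def http_get(url, timeout=10):
    """Return (status, body); status is None if no HTTP response arrived."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except HTTPError as err:  # non-2xx responses raise HTTPError
        return err.code, ""
    except OSError:  # URLError, connection refused, timeouts
        return None, ""


def place_ok(prometheus, blackbox, canary, module="icmp", get=http_get):
    """Check one place: prometheus is ready and its blackbox can probe `canary`.

    prometheus and blackbox are hypothetical "host:port" strings; canary is
    a host the blackbox exporter should always be able to reach.
    """
    status, _ = get(f"http://{prometheus}/-/ready")
    if status != 200:
        return False
    query = urlencode({"module": module, "target": canary})
    status, body = get(f"http://{blackbox}/probe?{query}")
    # /probe answers 200 even for failed probes, so probe_success decides.
    return status == 200 and parse_probe_success(body)
```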
Internal router monitoring (TBD)
Monitor the internal routers of each place.
- Objective: find out whether the internal monitoring is alive
- Implementation:
  - Collecting/alerting with prometheus on ...
- Targets
  - ipv6/apu-router1.place6 (via place10/blackbox)
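Probing a place6 router "via place10/blackbox" fits the usual multi-target exporter pattern: prometheus scrapes the place10 blackbox exporter's /probe endpoint and passes the router as the target parameter. A sketch of such a scrape job; the job name, the exporter address and the module name are hypothetical.

```python
import json


def remote_probe_job(job_name, blackbox, module, targets):
    """Scrape job that probes `targets` through a blackbox exporter elsewhere.

    blackbox is the "host:port" of the exporter doing the probing (for
    example the one in place10); the targets only need to be reachable
    from that exporter, not from prometheus itself.
    """
    return {
        "job_name": job_name,
        "metrics_path": "/probe",
        "params": {"module": [module]},
        "static_configs": [{"targets": list(targets)}],
        "relabel_configs": [
            # Pass the configured target to the exporter as ?target=...
            {"source_labels": ["__address__"], "target_label": "__param_target"},
            # Keep the probed host, not the exporter, as the instance label.
            {"source_labels": ["__param_target"], "target_label": "instance"},
            # Send the actual HTTP scrape to the blackbox exporter.
            {"target_label": "__address__", "replacement": blackbox},
        ],
    }


if __name__ == "__main__":
    job = remote_probe_job(
        "internal-routers-place6",       # hypothetical job name
        "blackbox.place10.example:9115",  # placeholder exporter address
        "icmp_ipv6",                      # hypothetical IPv6-only ICMP module
        ["apu-router1.place6.example"],   # placeholder router host name
    )
    print(json.dumps({"scrape_configs": [job]}, indent=2))
```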
Internal network device monitoring
- Objective: find out whether all production switches are alive
- Implementation:
  - Dedicated blackbox_exporter on a router or similar (needs to be secured)
- Targets
  - All Arista switches in each place
  - All Mikrotik switches in each place
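One way to secure a blackbox exporter running on a router is the web configuration file supported by prometheus exporters (passed via --web.config.file, assuming a blackbox exporter version that supports it), which enables TLS and basic auth, combined with matching https/basic_auth settings in the prometheus scrape job. A sketch under that assumption; file paths, the user name and the vendor grouping are hypothetical, and the bcrypt password hash has to be generated outside this code.

```python
import json


def exporter_web_config(cert_file, key_file, user, bcrypt_hash):
    """web-config.yml for the blackbox exporter on the router.

    Enables TLS and HTTP basic auth. bcrypt_hash must be a bcrypt hash of
    the password, generated separately (bcrypt is not in the stdlib).
    """
    return {
        "tls_server_config": {"cert_file": cert_file, "key_file": key_file},
        "basic_auth_users": {user: bcrypt_hash},
    }


def secured_scrape_settings(ca_file, user, password_file):
    """Fields to merge into the prometheus job that scrapes that exporter."""
    return {
        "scheme": "https",
        "tls_config": {"ca_file": ca_file},
        # password_file keeps the secret out of prometheus.yml itself.
        "basic_auth": {"username": user, "password_file": password_file},
    }


def switch_targets(inventory):
    """Group switch host names by vendor into prometheus static_configs.

    inventory: mapping of vendor (e.g. "arista", "mikrotik") to host names.
    """
    return [
        {"targets": sorted(hosts), "labels": {"vendor": vendor}}
        for vendor, hosts in sorted(inventory.items())
    ]


if __name__ == "__main__":
    # All paths and names below are placeholders.
    print(json.dumps(exporter_web_config(
        "/etc/blackbox/tls.crt", "/etc/blackbox/tls.key", "prometheus", "<bcrypt hash>"), indent=2))
    print(json.dumps(secured_scrape_settings(
        "/etc/prometheus/ca.crt", "prometheus", "/etc/prometheus/blackbox.pass"), indent=2))
```

The vendor label makes it possible to alert or build dashboards separately for Arista and Mikrotik devices while probing both through the same secured exporter.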
Updated by Nico Schottelius about 1 year ago · 8 revisions