Nico Schottelius, 12/21/2023 01:17 PM
The ungleich monitoring infrastructure 2024 (WIP)
Intro
This is a work-in-progress update of The_ungleich_monitoring_infrastructure. The infrastructure is still based on prometheus and the blackbox exporter, but now also makes use of kubernetes-native objects.
Monitoring definition
External primary router/link monitoring
- Objective: find out from an external point of view whether the lines are functioning
- Implementation:
  - Collecting/alerting with prometheus on place12
  - blackbox on place12
  - blackbox on place11
- Targets:
  - ipv6/router1.place10/snr
  - ipv4/router1.place10/snr
  - ipv6/server12X.place10/snr
  - ipv4/server12X.place10/snr
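The exporter and target lists above could be turned into entries for prometheus file-based service discovery, which reads JSON files containing target groups. The following is a minimal sketch and not the actual ungleich configuration. It assumes that the target notation reads as address family / host / link name (the meaning of the third component is a guess). The helper `target_groups` and the label names `family`, `link` and `probed_from` are hypothetical.

```python
import json

# Blackbox exporter locations and targets as listed above.
EXPORTER_PLACES = ["place12", "place11"]
TARGETS = [
    "ipv6/router1.place10/snr",
    "ipv4/router1.place10/snr",
    "ipv6/server12X.place10/snr",
    "ipv4/server12X.place10/snr",
]


def target_groups(targets, exporter_places):
    """Build one file_sd target group per (exporter, target) pair.

    Assumption: each target is "<address family>/<host>/<link>".
    """
    groups = []
    for place in exporter_places:
        for spec in targets:
            family, host, link = spec.split("/")
            groups.append({
                "targets": [host],
                # Hypothetical label names; relabeling rules would have to
                # map them to a blackbox module and an exporter address.
                "labels": {"family": family, "link": link, "probed_from": place},
            })
    return groups


if __name__ == "__main__":
    # Print the target groups in the JSON layout that file_sd expects.
    print(json.dumps(target_groups(TARGETS, EXPORTER_PLACES), indent=2))
```

Probing every target from both exporters gives two independent external views, so a failure seen from only one of them points at the probing side rather than at the line itself.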
External primary router
- Objective: find out whether a router is reachable via any path
- Implementation:
  - Collecting/alerting with prometheus on place12
  - blackbox on place12
  - blackbox on place11
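In contrast to the per-link check, "reachable via any path" means that a router only counts as down when every probe towards it fails. A minimal sketch of that aggregation follows. The function name, the tuple layout and the path identifiers are hypothetical. In PromQL the same idea would roughly be `max by (router) (probe_success)`, assuming the probes carry a `router` label.

```python
def reachable_via_any_path(probe_results):
    """Map each router to True if at least one probe succeeded.

    probe_results: iterable of (router, path, success) tuples, where
    path is a hypothetical identifier such as "place12/ipv6".
    """
    reachable = {}
    for router, _path, success in probe_results:
        # One successful probe is enough to mark the router as reachable.
        reachable[router] = reachable.get(router, False) or bool(success)
    return reachable


if __name__ == "__main__":
    # Made-up sample: one probe fails, another one succeeds.
    sample = [
        ("router1.place10", "place12/ipv6", False),
        ("router1.place10", "place11/ipv4", True),
    ]
    print(reachable_via_any_path(sample))
```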
Test external monitoring
- Objective: find out whether the external monitoring is alive
- Implementation:
  - Collecting/alerting with prometheus on place10
- Targets:
  - ipv6/emonitor1.place12/prometheus
  - ipv6/emonitor1.place12/blackbox
  - ipv6/emonitor1.place12/alertmanager
  - ipv6/vm1.place11/blackbox
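A liveness check for the targets above could look like the following sketch. It is an illustration only: it assumes that the services listen on their upstream default ports (9090 for prometheus, 9115 for the blackbox exporter, 9093 for alertmanager), that each of them answers on `/-/healthy`, and that the short host names resolve as written. The helpers `health_url` and `is_alive` are hypothetical. In the setup described here, prometheus on place10 would do the collecting instead.

```python
import urllib.request

# Upstream default listen ports; the real setup may differ.
DEFAULT_PORTS = {"prometheus": 9090, "blackbox": 9115, "alertmanager": 9093}

TARGETS = [
    "ipv6/emonitor1.place12/prometheus",
    "ipv6/emonitor1.place12/blackbox",
    "ipv6/emonitor1.place12/alertmanager",
    "ipv6/vm1.place11/blackbox",
]


def health_url(spec):
    """Turn "<family>/<host>/<service>" into a health-check URL."""
    _family, host, service = spec.split("/")
    return f"http://{host}:{DEFAULT_PORTS[service]}/-/healthy"


def is_alive(url, timeout=5.0):
    """Return True if the endpoint answers with HTTP 200.

    Connection errors, HTTP errors and timeouts count as not alive.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        return False


if __name__ == "__main__":
    for spec in TARGETS:
        url = health_url(spec)
        print(spec, url, is_alive(url))
```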
Test per-place monitoring infrastructure (blackbox exporter, prometheus)
Each place should provide a blackbox exporter suitable for monitoring onsite targets.
We need to ensure that these blackbox exporters all function and that prometheus instances are up.
- Objective: find out whether the onsite monitoring is alive
- Implementation:
  - Collecting/alerting with prometheus on place12
- Targets:
  - blackbox-exporter + prometheus/place5
  - blackbox-exporter + prometheus/place6
  - blackbox-exporter + prometheus/place10
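Ensuring that all of these components function also means noticing when one of them reports nothing at all. The sketch below compares the expected place/component pairs with observed values and treats a missing sample like a failed one. The function name and the input layout are hypothetical. The values are assumed to come from something like the prometheus `up` metric.

```python
PLACES = ["place5", "place6", "place10"]
COMPONENTS = ["blackbox-exporter", "prometheus"]


def failing_components(up_values):
    """Return expected (place, component) pairs that are down or missing.

    up_values: dict mapping (place, component) to 1 (up) or 0 (down).
    """
    failing = []
    for place in PLACES:
        for component in COMPONENTS:
            # A pair without any sample is treated like a failed one.
            if up_values.get((place, component), 0) != 1:
                failing.append((place, component))
    return failing


if __name__ == "__main__":
    # Made-up sample: prometheus on place6 reports nothing at all.
    observed = {(p, c): 1 for p in PLACES for c in COMPONENTS}
    del observed[("place6", "prometheus")]
    print(failing_components(observed))
```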
Internal router monitoring (TBD)
Monitor the internal routers of each place.
- Objective: find out whether the internal routers are alive
- Implementation:
  - Collecting/alerting with prometheus on ...
- Targets:
  - ipv6/apu-router1.place6 (via place10/blackbox)
Internal network device monitoring
- Objective: find out whether all production switches are alive
- Implementation:
  - Dedicated blackbox_exporter on a router or similar (needs to be secured)
- Targets:
  - All Arista switches in each place
  - All Mikrotik switches in each place
Updated by Nico Schottelius 11 months ago · 5 revisions