The ungleich monitoring infrastructure 2024 » History » Version 5

Nico Schottelius, 12/21/2023 01:17 PM

h1. The ungleich monitoring infrastructure 2024 (WIP)

{{toc}}

h2. Intro

This is a work-in-progress update of [[The_ungleich_monitoring_infrastructure]]. The infrastructure is still based on Prometheus and the blackbox exporter, but now also makes use of Kubernetes-native objects.

h2. Monitoring definition

h3. External primary router/link monitoring

* Objective: find out from an external point of view whether the lines are functioning
* Implementation:
** Collecting/alerting with Prometheus on place12
** blackbox on place12
** blackbox on place11
* Targets
** ipv6/router1.place10/snr
** ipv4/router1.place10/snr
** ipv6/server12X.place10/snr
** ipv4/server12X.place10/snr
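
The pairing of exporters and targets listed above can be sketched as follows. This is a minimal illustration, not the deployed configuration. It assumes that a target entry splits into an address family, a host and a trailing label, that each blackbox exporter is queried through its @/probe@ endpoint with the @module@ and @target@ query parameters, and that it listens on the exporter's default port 9115. The exporter hostnames, the @icmp_ipv4@ / @icmp_ipv6@ module names and the helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlencode

# Hypothetical exporter addresses; the real hostnames are not given on this page.
EXPORTERS = ["blackbox.place12.example.com", "blackbox.place11.example.com"]

TARGETS = [
    "ipv6/router1.place10/snr",
    "ipv4/router1.place10/snr",
    "ipv6/server12X.place10/snr",
    "ipv4/server12X.place10/snr",
]


@dataclass(frozen=True)
class Target:
    family: str  # "ipv4" or "ipv6"
    host: str    # e.g. "router1.place10"
    label: str   # trailing component, treated as an opaque label here


def parse_target(entry: str) -> Target:
    """Split 'family/host/label' into its parts (assumed layout)."""
    family, host, label = entry.split("/")
    if family not in ("ipv4", "ipv6"):
        raise ValueError(f"unknown address family: {family}")
    return Target(family, host, label)


def probe_url(exporter: str, target: Target) -> str:
    """Build the /probe URL for one exporter/target pair."""
    # Module names live in the exporter's own config; these names are assumed.
    query = urlencode({"module": f"icmp_{target.family}", "target": target.host})
    return f"http://{exporter}:9115/probe?{query}"


def all_probes() -> List[str]:
    """Probe every target from every exporter (vantage point)."""
    return [probe_url(e, parse_target(t)) for e in EXPORTERS for t in TARGETS]
```

Probing each target from both exporters gives one result per vantage point and address family, so a failure can be attributed to a single line rather than to the router as a whole.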

h3. External primary router

* Objective: find out whether a router is reachable via any path
* Implementation:
** Collecting/alerting with Prometheus on place12
** blackbox on place12
** blackbox on place11
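
The "any path" objective can be read as: a router counts as unreachable only when every vantage point fails to reach it. A minimal sketch of that rule, assuming each blackbox exporter reports one success flag per router (as the exporter's @probe_success@ metric does). The function name and the sample data are illustrative only.

```python
from typing import Dict


def reachable_via_any_path(results: Dict[str, bool]) -> bool:
    """Return True if at least one vantage point reached the router.

    `results` maps a vantage point name to the outcome of its probe.
    An empty mapping (no data at all) is treated as unreachable.
    """
    return any(results.values())


# Sample data, not real measurements.
one_path_down = {"place12": False, "place11": True}
all_paths_down = {"place12": False, "place11": False}
```

In PromQL the same idea would roughly be an alert on @max by (instance) (probe_success) == 0@, where the grouping label depends on the relabelling in use.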

h3. Test external monitoring

* Objective: find out whether the external monitoring is alive
* Implementation:
** Collecting/alerting with Prometheus on place10
* Targets
** ipv6/emonitor1.place12/prometheus
** ipv6/emonitor1.place12/blackbox
** ipv6/emonitor1.place12/alertmanager
** ipv6/vm1.place11/blackbox
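
A sketch of how the targets above could be turned into liveness checks. It assumes that the last component of each entry names the service, that the services listen on their upstream default ports (Prometheus 9090, Alertmanager 9093, blackbox exporter 9115) and that each exposes the @/-/healthy@ endpoint. The @DOMAIN@ suffix and the helper names are hypothetical.

```python
from typing import List

# Upstream default ports; the actual deployment may use different ones.
DEFAULT_PORTS = {"prometheus": 9090, "alertmanager": 9093, "blackbox": 9115}

# Hypothetical DNS suffix; the page only gives short host names.
DOMAIN = "example.com"

TARGETS = [
    "ipv6/emonitor1.place12/prometheus",
    "ipv6/emonitor1.place12/blackbox",
    "ipv6/emonitor1.place12/alertmanager",
    "ipv6/vm1.place11/blackbox",
]


def health_url(entry: str) -> str:
    """Map 'family/host/service' to the service's health endpoint."""
    # The family part ("ipv6") would select the address family of the check;
    # it does not change the URL itself.
    _family, host, service = entry.split("/")
    port = DEFAULT_PORTS[service]
    return f"http://{host}.{DOMAIN}:{port}/-/healthy"


def health_urls() -> List[str]:
    """Health endpoints for all external monitoring components."""
    return [health_url(t) for t in TARGETS]
```

Because these checks run from place10, they cover the case where the external monitoring itself has failed and would otherwise stay silent.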

h3. Test per place monitoring infrastructure (blackbox exporter, prometheus)

Each place should provide a blackbox exporter suitable for monitoring onsite targets. We need to ensure that all of these blackbox exporters function and that the Prometheus instances are up.

* Objective: find out whether the onsite monitoring is alive
* Implementation:
** Collecting/alerting with Prometheus on place12
* Targets
** blackbox-exporter + prometheus/place5
** blackbox-exporter + prometheus/place6
** blackbox-exporter + prometheus/place10
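
One way to express "all exporters function and all Prometheus instances are up" is an alerting rule per component and place based on Prometheus' built-in @up@ metric. The sketch below generates such rules in the structure of a Prometheus rule file. The job names, the @place@ label, the alert name, the severity and the 5 minute delay are assumptions and not taken from the actual setup.

```python
from typing import Dict, List

PLACES = ["place5", "place6", "place10"]
COMPONENTS = ["blackbox-exporter", "prometheus"]


def down_rule(component: str, place: str) -> Dict[str, object]:
    """Alert when the scrape of one component in one place fails."""
    return {
        "alert": "MonitoringComponentDown",  # assumed alert name
        # `up` is 0 when Prometheus cannot scrape the target.
        "expr": f'up{{job="{component}",place="{place}"}} == 0',
        "for": "5m",  # assumed delay before firing
        "labels": {"severity": "critical", "place": place},
    }


def build_rule_group() -> Dict[str, List[Dict[str, object]]]:
    """One rule per component/place pair, wrapped in a single group."""
    rules = [down_rule(c, p) for p in PLACES for c in COMPONENTS]
    return {"groups": [{"name": "per-place-monitoring", "rules": rules}]}
```

The result is a plain dictionary, so it can be serialised with @json.dumps(build_rule_group(), indent=2)@ for inspection before it is turned into a real rule file.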

h3. Internal router monitoring (TBD)

Monitor the internal routers of each place.

* Objective: find out whether the internal routers are alive
* Implementation:
** Collecting/alerting with Prometheus on ...
* Targets
** ipv6/apu-router1.place6 (via place10/blackbox)

h3. Internal network device monitoring

* Objective: find out whether all production switches are alive
* Implementation:
** Dedicated blackbox_exporter on a router or similar (needs to be secured)
* Targets
** All Arista devices in each place
** All MikroTik devices in each place