The ungleich monitoring infrastructure 2024 » History » Version 6

Nico Schottelius, 12/25/2023 12:30 PM

h1. The ungleich monitoring infrastructure 2024 (WIP)

{{toc}}

h2. Intro

This is a work-in-progress update of [[The_ungleich_monitoring_infrastructure]]. The infrastructure is still based on Prometheus and the blackbox exporter, but now also makes use of Kubernetes-native objects.

h2. Monitoring definition

h3. External primary router/link monitoring

* Objective: find out from an external point of view whether the links are functioning
* Implementation:
** Collecting/alerting with Prometheus on place12
** blackbox on place12
** blackbox on place11
* Targets:
** ipv6/router1.place10/snr
** ipv4/router1.place10/snr
** ipv6/server12X.place10/snr
** ipv4/server12X.place10/snr
** ipv4/fiberstream/place5
** ipv4/fiberstream/place6
** ipv4/fiberstream/place7
** ipv4/fiberstream/place10
* Status: TBD
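The external checks above amount to asking each blackbox exporter to probe each target. As a rough sketch, assuming the exporters are reachable over HTTP on the blackbox exporter's default port 9115 and that an @icmp@ module is configured (both are assumptions, as are the @example.com@ host names, which are placeholders and not the real ungleich names), the probe URLs and the result check could look like this:

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical base URLs of the two external blackbox exporters.
EXPORTERS = {
    "place11": "http://blackbox.place11.example.com:9115",
    "place12": "http://blackbox.place12.example.com:9115",
}

# Hypothetical probe targets standing in for the routers and servers listed above.
TARGETS = ["router1.place10.example.com", "2001:db8::1"]


def probe_url(exporter_base, target, module="icmp"):
    """Build the /probe URL that asks one exporter to check one target."""
    query = urlencode({"module": module, "target": target})
    return f"{exporter_base}/probe?{query}"


def probe_succeeded(metrics_text):
    """Return True if the exporter's response contains 'probe_success 1'."""
    for line in metrics_text.splitlines():
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return False  # metric missing: treat the probe as failed


def all_probe_urls():
    """One URL per (exporter, target) pair, keyed by (place, target)."""
    return {
        (place, target): probe_url(base, target)
        for (place, base), target in product(EXPORTERS.items(), TARGETS)
    }
```

In Prometheus itself this pairing is normally expressed through scrape jobs with relabeling rather than a script; the sketch only makes the fan-out of two exporters over all targets explicit.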

h3. External primary router

* Objective: find out whether a router is reachable via any path
* Implementation:
** Collecting/alerting with Prometheus on place12
** blackbox on place12
** blackbox on place11
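"Reachable via any path" can be read as: the router counts as up as long as at least one probe succeeds, regardless of which blackbox exporter or address family it came from. A minimal sketch of that aggregation, assuming results are keyed by probing place and address family (the key layout and the function names are illustrative and not part of the actual setup):

```python
from typing import Dict, List, Tuple

# Key: (probing place, address family); value: whether that probe succeeded.
PathResults = Dict[Tuple[str, str], bool]


def reachable_via_any_path(results: PathResults) -> bool:
    """A router counts as reachable if at least one probe path succeeded."""
    return any(results.values())


def failed_paths(results: PathResults) -> List[Tuple[str, str]]:
    """List the paths whose probe failed, sorted for stable output."""
    return sorted(path for path, ok in results.items() if not ok)
```

With this reading, a single failed path points at a link problem, while a router is only considered unreachable when every path fails.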

h3. Test external monitoring

* Objective: find out whether the external monitoring is alive
* Implementation:
** Collecting/alerting with Prometheus on place10
* Targets:
** ipv6/emonitor1.place12/prometheus
** ipv6/emonitor1.place12/blackbox
** ipv6/emonitor1.place12/alertmanager
** ipv6/vm1.place11/blackbox

h3. Test per-place monitoring infrastructure (blackbox exporter, Prometheus)

Each place should provide a blackbox exporter suitable for monitoring onsite targets.
We need to ensure that all of these blackbox exporters are functioning and that the Prometheus instances are up.

* Objective: find out whether the onsite monitoring is alive
* Implementation:
** Collecting/alerting with Prometheus on place12
* Targets:
** blackbox-exporter + prometheus/place5
** blackbox-exporter + prometheus/place6
** blackbox-exporter + prometheus/place10

h3. Internal router monitoring (TBD)

Monitor the internal routers of each place.

* Objective: find out whether the internal routers are alive
* Implementation:
** Collecting/alerting with Prometheus on ...
* Targets:
** ipv6/apu-router1.place6 (via place10/blackbox)

h3. Internal network device monitoring

* Objective: find out whether all production switches are alive
* Implementation:
** Dedicated blackbox_exporter on a router or similar (needs to be secured)
* Targets:
** All Arista devices in each place
** All MikroTik devices in each place