The ungleich monitoring infrastructure 2024 » History » Version 8
Nico Schottelius, 12/25/2023 01:51 PM
h1. The ungleich monitoring infrastructure 2024 (WIP)

{{toc}}

h2. Intro

This is a work-in-progress update of [[The_ungleich_monitoring_infrastructure]]. The infrastructure is still based on prometheus and the blackbox exporter, but now also uses native kubernetes objects.

h2. Monitoring definition

h3. External primary router/link monitoring

* Objective: find out from an external point of view whether the lines are functioning
* Implementation:
** Collecting/alerting with prometheus on place12
** blackbox on place12
** blackbox on place11
* Targets
** ipv6/router1.place10/snr
** ipv4/router1.place10/snr
** ipv6/server12X.place10/snr
** ipv4/server12X.place10/snr
** ipv4/fiberstream/place5
** ipv4/fiberstream/place6
** ipv4/fiberstream/place7
** ipv4/fiberstream/place10
* Status: TBD

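The setup above probes every target from two vantage points (blackbox on place11 and on place12). A minimal sketch of how the resulting probe requests could be constructed follows. The exporter hostnames, the target hostname and the @icmp@ module name are assumptions made for illustration; only the @/probe@ endpoint with its @target@ and @module@ parameters and the default port 9115 come from the blackbox exporter itself.

```python
from urllib.parse import urlencode

# Hypothetical exporter addresses (not taken from this page);
# 9115 is the blackbox exporter's default port.
EXPORTERS = {
    "place11": "http://blackbox.place11.example.com:9115",
    "place12": "http://blackbox.place12.example.com:9115",
}


def probe_urls(target, module="icmp"):
    """Return one /probe URL per vantage point for a single target.

    The module name must match a module defined in the exporter's own
    configuration; "icmp" is an assumption here.
    """
    query = urlencode({"target": target, "module": module})
    return {place: f"{base}/probe?{query}" for place, base in EXPORTERS.items()}


# Hypothetical hostname standing in for the ipv6/router1.place10 target.
urls = probe_urls("router1.place10.example.com")
for place, url in sorted(urls.items()):
    print(place, url)
```

In a real deployment prometheus would issue these requests itself, based on its scrape configuration; the sketch only shows what each vantage point ends up being asked.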
h3. Main DNS servers

* Objective: ensure all 3 DNS servers are running and returning queries
* Implementation:
** Collecting/alerting with prometheus on place12
** blackbox on place12
** blackbox on place11
* Targets
** dns1.ungleich.ch
** dns2.ungleich.ch
** dns3.ungleich.ch
* Status: TBD

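To make "running and returning queries" concrete, the following is a rough, standalone sketch of such a check using only the Python standard library. It is not how the blackbox exporter implements its DNS prober. The queried name, the record type and the rule "response code 0 with at least one answer" are assumptions chosen for illustration.

```python
import secrets
import socket
import struct
from typing import Optional


def build_query(name: str, qtype: int = 28, txid: Optional[int] = None) -> bytes:
    """Build a minimal DNS query (RFC 1035). qtype 28 = AAAA, 1 = A."""
    if txid is None:
        txid = secrets.randbits(16)
    # Header: id, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    # Question: encoded name, root label, QTYPE, QCLASS=IN.
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def response_ok(query: bytes, response: bytes) -> bool:
    """True if the response matches the query id, has rcode 0 and answers."""
    if len(response) < 12:
        return False
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    same_id = rid == struct.unpack("!H", query[:2])[0]
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return same_id and is_response and rcode == 0 and ancount > 0


def check_server(server: str, name: str, timeout: float = 3.0) -> bool:
    """Send one UDP query to `server` and report whether it answered."""
    query = build_query(name)
    try:
        family, socktype, proto, _canon, addr = socket.getaddrinfo(
            server, 53, type=socket.SOCK_DGRAM
        )[0]
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, addr)
            response, _peer = sock.recvfrom(4096)
    except OSError:  # covers timeouts and resolution failures
        return False
    return response_ok(query, response)
```

A manual run could then look like @check_server("dns1.ungleich.ch", "ungleich.ch")@, repeated for dns2 and dns3; which name to query is a choice this page leaves open.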
h3. External primary router

* Objective: find out whether a router is reachable via any path
* Implementation:
** Collecting/alerting with prometheus on place12
** blackbox on place12
** blackbox on place11
* Targets
** genauso/r2
** genauso/r3
** p5/server137
** p5/server138
** p10/router1
** p10/server122
** p10/server123
** p15/server120
** p15/server121
* Status: TBD

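"Reachable via any path" implies that a router should only count as down when no vantage point can reach it. A small sketch of that aggregation follows; the sample tuples are made-up data, and the function name is hypothetical.

```python
from collections import defaultdict


def unreachable_targets(samples):
    """Return targets that no vantage point could reach.

    samples: iterable of (target, vantage_point, success) tuples, where
    success is truthy if the probe from that vantage point succeeded.
    """
    reached = defaultdict(bool)
    for target, _vantage_point, success in samples:
        # One successful probe from anywhere is enough.
        reached[target] = reached[target] or bool(success)
    return sorted(target for target, ok in reached.items() if not ok)


# Made-up probe results for illustration only.
samples = [
    ("p10/router1", "place11", 0),
    ("p10/router1", "place12", 1),
    ("p5/server137", "place11", 0),
    ("p5/server137", "place12", 0),
]
print(unreachable_targets(samples))
```

In prometheus this would presumably be expressed by taking the maximum of the blackbox exporter's @probe_success@ metric per target and alerting when it is 0; the exact rule is not defined on this page.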
h3. Test external monitoring

* Objective: find out whether the external monitoring is alive
* Implementation:
** Collecting/alerting with prometheus on place10
* Targets
** ipv6/emonitor1.place12/prometheus
** ipv6/emonitor1.place12/blackbox
** ipv6/emonitor1.place12/alertmanager
** ipv6/vm1.place11/blackbox

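One way to test whether these components are alive is their built-in health endpoint: prometheus, alertmanager and the blackbox exporter each serve @/-/healthy@. The sketch below assumes the upstream default ports (9090, 9093, 9115), plain HTTP, and hypothetical hostnames; none of these are stated on this page.

```python
import urllib.request

HEALTH_PATH = "/-/healthy"
# Upstream default ports; the actual deployment may differ.
DEFAULT_PORTS = {"prometheus": 9090, "alertmanager": 9093, "blackbox": 9115}

# (host, service) pairs mirroring the target list; hostnames are hypothetical.
TARGETS = [
    ("emonitor1.place12.example.com", "prometheus"),
    ("emonitor1.place12.example.com", "blackbox"),
    ("emonitor1.place12.example.com", "alertmanager"),
    ("vm1.place11.example.com", "blackbox"),
]


def health_url(host, service):
    """Build the health-check URL for one service on one host."""
    return f"http://{host}:{DEFAULT_PORTS[service]}{HEALTH_PATH}"


def is_alive(url, timeout=5.0):
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


health_urls = [health_url(host, service) for host, service in TARGETS]
```

Calling @is_alive(url)@ for each entry in @health_urls@ gives a quick manual check; the monitoring itself would instead let the prometheus on place10 scrape or probe these endpoints.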
h3. Test per place monitoring infrastructure (blackbox exporter, prometheus)

Each place should provide a blackbox exporter suitable for monitoring onsite targets.
We need to ensure that these blackbox exporters all function and that prometheus instances are up.

* Objective: find out whether the onsite monitoring is alive
* Implementation:
** Collecting/alerting with prometheus on place12
* Targets
** blackbox-exporter + prometheus/place5
** blackbox-exporter + prometheus/place6
** blackbox-exporter + prometheus/place10

h3. Internal router monitoring (TBD)

Monitor the internal routers of each place.

* Objective: find out whether the internal monitoring is alive
* Implementation:
** Collecting/alerting with prometheus on ...
* Targets
** ipv6/apu-router1.place6 (via place10/blackbox)

h3. Internal network device monitoring

* Objective: find out whether all production switches are alive
* Implementation:
** Dedicated blackbox_exporter on a router or similar (needs to be secured)
* Targets
** All Arista in each place
** All Mikrotik in each place