
h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual

{{toc}}

h2. Status

This document is **pre-production**.
It is intended to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.

h2. k8s clusters

| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
| [[p6-cow.k8s.ooo]] | production        |            | server134 server135 server136 | "argo":https://argocd-server.argocd.svc.p6in10.k8s.ooo | ?             |    2023-05-17 |
| [[p10.k8s.ooo]]    | production        |            | server131 server132 server133 | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
| [[r1r2p15k8sooo|r1.p15.k8s.ooo]] | production | Nico | server120 | | | 2022-10-30 |
| [[r1r2p15k8sooo|r2.p15.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
| [[r1r2p10k8sooo|r1.p10.k8s.ooo]] | production | Nico | server122 | | | 2022-10-30 |
| [[r1r2p10k8sooo|r2.p10.k8s.ooo]] | production | Nico | server123 | | | 2022-10-15 |
| [[r1r2p5k8sooo|r1.p5.k8s.ooo]] | production | Nico | server137 | | | 2022-10-30 |
| [[r1r2p5k8sooo|r2.p5.k8s.ooo]] | production | Nico | server138 | | | 2022-10-30 |
| [[r1r2p6k8sooo|r1.p6.k8s.ooo]] | production | Nico | server139 | | | 2022-10-30 |
| [[r1r2p6k8sooo|r2.p6.k8s.ooo]] | production | Nico | server140 | | | 2022-10-30 |

h2. General architecture and components overview

* All k8s clusters are IPv6 only
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
** Private configurations are found in the **k8s-config** repository

h3. Cluster types

| **Type/Feature**            | **Development**                | **Production**         |
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
| Separation of control plane | optional                       | recommended            |
| Persistent storage          | required                       | required               |
| Number of storage monitors  | 3                              | 5                      |

h2. General k8s operations

h3. Cheat sheet / great external references

* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/

h3. Allowing to schedule work on the control plane / removing node taints

* Mostly for single node / test / development clusters
* Just remove the master taint as follows

<pre>
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</pre>

You can check the node taints using @kubectl describe node ...@
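
A quick way to list the taints of all nodes at once (a minimal sketch using kubectl's jsonpath output):

<pre>
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'
</pre>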

h3. Get the cluster admin.conf

* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
* To be able to administrate the cluster you can copy the admin.conf to your local machine
* Multi cluster debugging becomes very easy if you name the config ~/cX-admin.conf (see the example below)

<pre>
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
% export KUBECONFIG=~/c2-admin.conf
% kubectl get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
server47   Ready                      control-plane,master   82d   v1.22.0
server48   Ready                      control-plane,master   82d   v1.22.0
server49   Ready                      <none>                 82d   v1.22.0
server50   Ready                      <none>                 82d   v1.22.0
server59   Ready                      control-plane,master   82d   v1.22.0
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
server61   Ready                      <none>                 82d   v1.22.0
server62   Ready                      <none>                 82d   v1.22.0
</pre>
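
To work with several clusters in parallel, KUBECONFIG can also point to a colon-separated list of config files (a sketch; @~/p6-admin.conf@ is just a hypothetical second config):

<pre>
export KUBECONFIG=~/c2-admin.conf:~/p6-admin.conf
kubectl config get-contexts
kubectl config use-context <contextname>
</pre>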

h3. Installing a new k8s cluster

* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
** Using pXX.k8s.ooo for production clusters of placeXX
* Use cdist to configure the nodes with requirements like crio
* Decide between single or multi node control plane setups (see below)
** Single control plane suitable for development clusters

Typical init procedure:

h4. Single control plane:

<pre>
kubeadm init --config bootstrap/XXX/kubeadm.yaml
</pre>

h4. Multi control plane (HA):

<pre>
kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs
</pre>
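
After the init has finished, a quick sanity check can be done on the new control plane node (a minimal sketch, nothing cluster-specific assumed):

<pre>
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes
kubectl -n kube-system get pods
</pre>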

h3. Deleting a pod that is hanging in terminating state

<pre>
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
</pre>

(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)

h3. Listing nodes of a cluster

<pre>
[15:05] bridge:~% kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
server22   Ready    <none>                 52d   v1.22.0
server23   Ready    <none>                 52d   v1.22.2
server24   Ready    <none>                 52d   v1.22.0
server25   Ready    <none>                 52d   v1.22.0
server26   Ready    <none>                 52d   v1.22.0
server27   Ready    <none>                 52d   v1.22.0
server63   Ready    control-plane,master   52d   v1.22.0
server64   Ready    <none>                 52d   v1.22.0
server65   Ready    control-plane,master   52d   v1.22.0
server66   Ready    <none>                 52d   v1.22.0
server83   Ready    control-plane,master   52d   v1.22.0
server84   Ready    <none>                 52d   v1.22.0
server85   Ready    <none>                 52d   v1.22.0
server86   Ready    <none>                 52d   v1.22.0
</pre>

h3. Removing / draining a node

Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:

<pre>
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
</pre>

h3. Readding a node after draining

<pre>
kubectl uncordon serverXX
</pre>

h3. (Re-)joining worker nodes after creating the cluster

* We need to have an up-to-date token
* We use different join commands for the workers and control plane nodes

Generating the join command on an existing control plane node:

<pre>
kubeadm token create --print-join-command
</pre>

h3. (Re-)joining control plane nodes after creating the cluster

* We generate the token again
* We upload the certificates
* We need to combine/create the join command for the control plane node

Example session:

<pre>
% kubeadm token create --print-join-command
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash

% kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
CERTKEY

# Then we use these two outputs on the joining node:

kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
</pre>

Commands to be used on a control plane node:

<pre>
kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs
</pre>

Commands to be used on the joining node:

<pre>
JOINCOMMAND --control-plane --certificate-key CERTKEY
</pre>

SEE ALSO

* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/

h3. How to fix etcd does not start when rejoining a kubernetes cluster as a control plane

If during the above step etcd does not come up, @kubeadm join@ can hang as follows:

<pre>
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:37
8a]:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
</pre>

Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster and then the join works.

To fix this we do:

* Find a working etcd pod
* Find the etcd members / member list
* Remove the etcd member that we want to re-join the cluster

<pre>
# Find the etcd pods
kubectl -n kube-system get pods -l component=etcd,tier=control-plane

# Get the list of etcd servers with the member id
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# Remove the member
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
</pre>

Sample session:

<pre>
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
NAME            READY   STATUS    RESTARTS     AGE
etcd-server63   1/1     Running   0            3m11s
etcd-server65   1/1     Running   3            7d2h
etcd-server83   1/1     Running   8 (6d ago)   7d2h
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
</pre>

SEE ALSO

* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster

h3. Node labels (adding, showing, removing)

Listing the labels:

<pre>
kubectl get nodes --show-labels
</pre>

Adding labels:

<pre>
kubectl label nodes LIST-OF-NODES label1=value1
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype=router
</pre>

Selecting nodes in pods:

<pre>
apiVersion: v1
kind: Pod
...
spec:
  nodeSelector:
    hosttype: router
</pre>

Removing labels by adding a minus at the end of the label name:

<pre>
kubectl label node <nodename> <labelname>-
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype-
</pre>

SEE ALSO

* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
* https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api

h3. Listing all pods on a node

<pre>
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=serverXX
</pre>

Found on https://stackoverflow.com/questions/62000559/how-to-list-all-the-pods-running-in-a-particular-worker-node-by-executing-a-comm

h3. Hardware Maintenance using ungleich-hardware

Use the following manifest and replace the HOST with the actual host:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-HOST
spec:
  containers:
  - name: ungleich-hardware
    image: ungleich/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "HOST"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>

Also see: [[The_ungleich_hardware_maintenance_guide]]
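
A typical usage pattern, assuming the manifest above was saved as @ungleich-hardware-serverXX.yaml@ (hypothetical filename):

<pre>
kubectl apply -f ungleich-hardware-serverXX.yaml
kubectl exec -ti ungleich-hardware-serverXX -- /bin/sh
# ... run the hardware tools inside the container ...
kubectl delete pod ungleich-hardware-serverXX
</pre>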

h3. Triggering a cronjob / creating a job from a cronjob

To test a cronjob, we can create a job from a cronjob:

<pre>
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
</pre>

This creates a job @volume2-manual@ based on the cronjob @volume2-daily-backup@.
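
To follow the manually created job (a small sketch, job name taken from the example above):

<pre>
kubectl get jobs
kubectl logs -f job/volume2-manual
</pre>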

h3. su-ing into a user that has nologin shell set

Many times users have nologin set as their shell inside the container. To be able to execute maintenance commands within the container, we can use @su -s /bin/sh@ like this:

<pre>
su -s /bin/sh -c '/path/to/your/script' testuser
</pre>

Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell

h3. How to print a secret value

Assuming you want the "password" item from a secret, use:

<pre>
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>
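
If you want to see all keys of a secret decoded at once, kubectl's go-template output can do that (a sketch; SECRETNAME is a placeholder):

<pre>
kubectl get secret SECRETNAME -o go-template='{{range $k, $v := .data}}{{$k}}={{$v | base64decode}}{{"\n"}}{{end}}'
</pre>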

h3. How to upgrade a kubernetes cluster

h4. General

* Should be done every X months to stay up-to-date
** X probably something like 3-6
* kubeadm based clusters
* Needs specific kubeadm versions for upgrade
* Follow instructions on https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
* Finding releases: https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG

h4. Getting a specific kubeadm or kubelet version

<pre>
RELEASE=v1.22.17
RELEASE=v1.23.17
RELEASE=v1.24.9
RELEASE=v1.25.9
RELEASE=v1.26.6
RELEASE=v1.27.2

ARCH=amd64

curl -L --remote-name-all https://dl.k8s.io/release/${RELEASE}/bin/linux/${ARCH}/{kubeadm,kubelet}
chmod u+x kubeadm kubelet
</pre>

h4. Steps

* kubeadm upgrade plan
** On one control plane node
* kubeadm upgrade apply vXX.YY.ZZ
** On one control plane node
* kubeadm upgrade node
** On all other control plane nodes
** On all worker nodes afterwards

Repeat this for all control plane nodes, then upgrade the kubelet on all other nodes via the package manager. A per-node sketch is shown below.
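
A minimal per-node sequence, assuming the new kubeadm/kubelet binaries are already in place (serverXX is a placeholder; the kubelet restart command depends on the init system, e.g. OpenRC on Alpine):

<pre>
kubectl drain serverXX --ignore-daemonsets --delete-emptydir-data

# on serverXX itself
kubeadm upgrade node
rc-service kubelet restart   # or: systemctl restart kubelet

kubectl uncordon serverXX
</pre>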

h4. Upgrading to 1.22.17

* https://v1-22.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
* Need to create a kubeadm config map
** f.i. using the following
** @/usr/local/bin/kubeadm-v1.22.17   upgrade --config kubeadm.yaml --ignore-preflight-errors=CoreDNSUnsupportedPlugins,CoreDNSMigration apply -y v1.22.17@
* Done for p6 on 2023-10-04

h4. Upgrading to 1.23.17

* https://v1-23.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
* No special notes
* Done for p6 on 2023-10-04

h4. Upgrading to 1.24.17

* https://v1-24.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
* No special notes
* Done for p6 on 2023-10-04

h4. Upgrading to 1.25.14

* https://v1-25.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
* No special notes
* Done for p6 on 2023-10-04

h4. Upgrading to 1.26.9

* https://v1-26.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
* No special notes
* Done for p6 on 2023-10-04

h4. Upgrading to 1.27

* https://v1-27.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
* kubelet will not start anymore
* reason: @"command failed" err="failed to parse kubelet flag: unknown flag: --container-runtime"@
* /var/lib/kubelet/kubeadm-flags.env contains that parameter
* remove it, start kubelet

h4. Upgrading to 1.28

* https://v1-28.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

h4. Upgrade to crio 1.27: missing crun

Error message:

<pre>
level=fatal msg="validating runtime config: runtime validation: \"crun\" not found in $PATH: exec: \"crun\": executable file not found in $PATH"
</pre>

Fix:

<pre>
apk add crun
</pre>

h2. Reference CNI

* Mainly "stupid", but effective plugins
* Main documentation on https://www.cni.dev/plugins/current/
* Plugins
** bridge
*** Can create the bridge on the host
*** But seems not to be able to add host interfaces to it as well
*** Has support for vlan tags
*** See the example configuration below
** vlan
*** creates vlan tagged sub interface on the host
*** "It's a 1:1 mapping (i.e. no bridge in between)":https://github.com/k8snetworkplumbingwg/multus-cni/issues/569
** host-device
*** moves the interface from the host into the container
*** very easy for physical connections to containers
** ipvlan
*** "virtualisation" of a host device
*** routing based on IP
*** Same MAC for everyone
*** Cannot reach the master interface
** macvlan
*** With mac addresses
*** Supports various modes (to be checked)
** ptp ("point to point")
*** Creates a host device and connects it to the container
** win*
*** Windows implementations
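
As an illustration of how these plugins are configured, a minimal bridge plugin configuration could look like this (a sketch only; the bridge name, subnet and file name are made up and not taken from our infrastructure):

<pre>
cat > /etc/cni/net.d/10-mybridge.conf <<'EOF'
{
  "cniVersion": "0.4.0",
  "name": "mybridge",
  "type": "bridge",
  "bridge": "br0",
  "isGateway": true,
  "ipMasq": false,
  "ipam": {
    "type": "host-local",
    "ranges": [
      [ { "subnet": "2001:db8:42::/64" } ]
    ]
  }
}
EOF
</pre>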

h2. Calico CNI

h3. Calico Installation

* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* This has the following advantages:
** Easy to upgrade
** Does not require the OS to configure IPv6/dual stack settings, as the tigera operator figures out things on its own

Usually plain calico can be installed directly using:

<pre>
VERSION=v3.25.0

helm repo add projectcalico https://docs.projectcalico.org/charts
helm repo update
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
</pre>

* Check the tags on https://github.com/projectcalico/calico/tags for the latest release
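
A quick check that the operator and calico pods came up (a generic sketch, independent of the exact namespaces the chart creates):

<pre>
kubectl get pods -A | grep -i -e tigera -e calico
</pre>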

h3. Installing calicoctl

* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install

To be able to manage and configure calico, we need to "install calicoctl (we choose the version as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod

<pre>
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
</pre>

Or version specific:

<pre>
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml

# For 3.22
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
</pre>

And making it easier to access via an alias:

<pre>
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
</pre>

h3. Calico configuration

By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp with an upstream router to propagate podcidr and servicecidr.

Default settings in our infrastructure:

* We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
* We use private ASNs for k8s clusters
* We do *not* use any overlay

After installing calico and calicoctl the last step of the installation is usually:

<pre>
calicoctl create -f - < calico-bgp.yaml
</pre>

A sample BGP configuration:

<pre>
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65534
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:10:3::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:10:3::/108
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: router1-place10
spec:
  peerIP: 2a0a:e5c0:10:1::50
  asNumber: 213081
  keepOriginalNextHop: true
</pre>
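
To verify that the BGP sessions are actually established, calicoctl can show the node status (a sketch; note that @calicoctl node status@ reads the local BIRD state, so depending on the setup it may need to be run on the node itself rather than through the pod alias above):

<pre>
calicoctl node status
calicoctl get bgpconfiguration -o yaml
calicoctl get bgppeer -o yaml
</pre>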

h2. Cilium CNI (experimental)

h3. Status

*NO WORKING CILIUM CONFIGURATION FOR IPV6 only modes*

h3. Latest error

It seems cilium does not run on IPv6 only hosts:

<pre>
level=info msg="Validating configured node address ranges" subsys=daemon
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
level=info msg="Starting IP identity watcher" subsys=ipcache
</pre>

It crashes after that log entry.

h3. BGP configuration

* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
* Creating the bgp config beforehand as a configmap is thus required.

The error one gets without the configmap present:

Pods are hanging with:

<pre>
cilium-bpqm6                       0/1     Init:0/4            0             9s
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
</pre>

The error message in the cilium-operator is:

<pre>
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
</pre>

A correct bgp config looks like this:

<pre>
apiVersion: v1
kind: ConfigMap
metadata:
  name: bgp-config
  namespace: kube-system
data:
  config.yaml: |
    peers:
      - peer-address: 2a0a:e5c0::46
        peer-asn: 209898
        my-asn: 65533
      - peer-address: 2a0a:e5c0::47
        peer-asn: 209898
        my-asn: 65533
    address-pools:
      - name: default
        protocol: bgp
        addresses:
          - 2a0a:e5c0:0:14::/64
</pre>

h3. Installation

Adding the repo:

<pre>
helm repo add cilium https://helm.cilium.io/
helm repo update
</pre>

Installing + configuring cilium:

<pre>
ipv6pool=2a0a:e5c0:0:14::/112

version=1.12.2

helm upgrade --install cilium cilium/cilium --version $version \
  --namespace kube-system \
  --set ipv4.enabled=false \
  --set ipv6.enabled=true \
  --set enableIPv6Masquerade=false \
  --set bgpControlPlane.enabled=true

#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool

# Old style bgp?
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \

# Show possible configuration options
helm show values cilium/cilium
</pre>

Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:

<pre>
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
</pre>

See also https://github.com/cilium/cilium/issues/20756

Seems a /112 is actually working.

h3. Kernel modules

Cilium requires the following modules to be loaded on the host (not loaded by default):

<pre>
modprobe  ip6table_raw
modprobe  ip6table_filter
</pre>
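
To make the modules persistent across reboots (a sketch; on Alpine-based hosts this typically goes into @/etc/modules@, other distributions may use @/etc/modules-load.d/@):

<pre>
echo ip6table_raw >> /etc/modules
echo ip6table_filter >> /etc/modules
</pre>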

h3. Interesting helm flags

* autoDirectNodeRoutes
* bgpControlPlane.enabled = true

h3. SEE ALSO

* https://docs.cilium.io/en/v1.12/helm-reference/

h2. Multus

* https://github.com/k8snetworkplumbingwg/multus-cni
* Installing a deployment w/ CRDs

<pre>
VERSION=v4.0.1

kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/${VERSION}/deployments/multus-daemonset-crio.yml
</pre>

h2. ArgoCD

h3. Argocd Installation

* See https://argo-cd.readthedocs.io/en/stable/

As there is no configuration management present yet, argocd is installed using:

<pre>
kubectl create namespace argocd

# OR: latest stable
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# OR Specific Version
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml
</pre>

h3. Get the argocd credentials

<pre>
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h3. Accessing argocd

In regular IPv6 clusters:

* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN

In legacy IPv4 clusters:

<pre>
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
</pre>

* Navigate to https://localhost:8080

h3. Using the argocd webhook to trigger changes

* To trigger changes, POST json to https://argocd.example.com/api/webhook

h3. Deploying an application

* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
* Always include the *redmine-url* pointing to the (customer) ticket
** Also add the support-url if it exists

Application sample:

<pre>
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitea-CUSTOMER
  namespace: argocd
spec:
  destination:
    namespace: default
    server: 'https://kubernetes.default.svc'
  source:
    path: apps/prod/gitea
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
    targetRevision: HEAD
    helm:
      parameters:
        - name: storage.data.storageClass
          value: rook-ceph-block-hdd
        - name: storage.data.size
          value: 200Gi
        - name: storage.db.storageClass
          value: rook-ceph-block-ssd
        - name: storage.db.size
          value: 10Gi
        - name: storage.letsencrypt.storageClass
          value: rook-ceph-block-hdd
        - name: storage.letsencrypt.size
          value: 50Mi
        - name: letsencryptStaging
          value: 'no'
        - name: fqdn
          value: 'code.verua.online'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  info:
    - name: 'redmine-url'
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
    - name: 'support-url'
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
</pre>
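
One way to register such an application with argocd and inspect it afterwards (a sketch; the file name is hypothetical):

<pre>
kubectl apply -f gitea-CUSTOMER.yaml
kubectl -n argocd get applications.argoproj.io
</pre>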

h2. Helm related operations and conventions

We use helm charts extensively.

* In production, they are managed via argocd
* In development, helm charts can be developed and deployed manually using the helm utility.

h3. Installing a helm chart

One can use the usual pattern of

<pre>
helm install <releasename> <chartdirectory>
</pre>

However often you want to reinstall/update when testing helm charts. The following pattern is "better", because it allows you to reinstall, if it is already installed:

<pre>
helm upgrade --install <releasename> <chartdirectory>
</pre>
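
In practice this is often combined with a values file and a target namespace (a sketch with hypothetical names):

<pre>
helm upgrade --install myrelease ./mychart \
  --namespace myns --create-namespace \
  -f values.yaml
</pre>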

h3. Naming services and deployments in helm charts [Application labels]

* We always have {{ .Release.Name }} to identify the current "instance"
* Deployments:
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ...
* See more about standard labels on
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
** https://helm.sh/docs/chart_best_practices/labels/

h3. Show all versions of a helm chart

<pre>
helm search repo -l repo/chart
</pre>

For example:

<pre>
% helm search repo -l projectcalico/tigera-operator
NAME                         	CHART VERSION	APP VERSION	DESCRIPTION
projectcalico/tigera-operator	v3.23.3      	v3.23.3    	Installs the Tigera operator for Calico
projectcalico/tigera-operator	v3.23.2      	v3.23.2    	Installs the Tigera operator for Calico
....
</pre>

h3. Show possible values of a chart

<pre>
helm show values <repo/chart>
</pre>

Example:

<pre>
helm show values ingress-nginx/ingress-nginx
</pre>

h3. Show all possible charts in a repo

<pre>
helm search repo REPO
</pre>

h3. Download a chart

For instance for checking it out locally, use:

<pre>
helm pull <repo/chart>
</pre>

h2. Rook + Ceph

h3. Installation

* Usually directly via argocd

h3. Executing ceph commands

Using the ceph-tools pod as follows:

<pre>
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
</pre>
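
If ceph commands are needed more often, a small alias in the spirit of the calicoctl alias above can help (a sketch, assuming the tools deployment is named @rook-ceph-tools@, matching the label above):

<pre>
alias rook-ceph="kubectl exec -n rook-ceph -ti deploy/rook-ceph-tools -- ceph"
rook-ceph -s
rook-ceph osd tree
</pre>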

h3. Inspecting the logs of a specific server

<pre>
# Get the related pods
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
...

# Inspect the logs of a specific pod
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
</pre>

h3. Inspecting the logs of the rook-ceph-operator

<pre>
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
</pre>

h3. (Temporarily) Disabling the rook-operator

* first disable the sync in argocd
* then scale it down

<pre>
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=0
</pre>

When done with the work/maintenance, re-enable sync in argocd.
The following command is thus strictly speaking not required, as argocd will fix it on its own:

<pre>
kubectl -n rook-ceph scale deployment rook-ceph-operator --replicas=1
</pre>

h3. Restarting the rook operator

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

h3. Triggering server prepare / adding new osds

The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re scan", simply delete that pod:

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.

h3. Removing an OSD

* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
* Then delete the related deployment

Set the OSD id in the osd-purge.yaml and apply it. The OSD should be down before.

<pre>
apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-purge-osd
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-purge-osd
spec:
  template:
    metadata:
      labels:
        app: rook-ceph-purge-osd
    spec:
      serviceAccountName: rook-ceph-purge-osd
      containers:
        - name: osd-removal
          image: rook/ceph:master
          # TODO: Insert the OSD ID in the last parameter that is to be removed
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
          #
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
          # removal could lead to data loss.
          args:
            - "ceph"
            - "osd"
            - "remove"
            - "--preserve-pvc"
            - "false"
            - "--force-osd-removal"
            - "false"
            - "--osd-ids"
            - "SETTHEOSDIDHERE"
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_MON_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  key: data
                  name: rook-ceph-mon-endpoints
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  key: ceph-username
                  name: rook-ceph-mon
            - name: ROOK_CEPH_SECRET
              valueFrom:
                secretKeyRef:
                  key: ceph-secret
                  name: rook-ceph-mon
            - name: ROOK_CONFIG_DIR
              value: /var/lib/rook
            - name: ROOK_CEPH_CONFIG_OVERRIDE
              value: /etc/rook/config/override.conf
            - name: ROOK_FSID
              valueFrom:
                secretKeyRef:
                  key: fsid
                  name: rook-ceph-mon
            - name: ROOK_LOG_LEVEL
              value: DEBUG
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-conf-emptydir
            - mountPath: /var/lib/rook
              name: rook-config
      volumes:
        - emptyDir: {}
          name: ceph-conf-emptydir
        - emptyDir: {}
          name: rook-config
      restartPolicy: Never
</pre>

Deleting the deployment:

<pre>
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted
</pre>

h3. Placement of mons/osds/etc.

See https://rook.io/docs/rook/v1.11/CRDs/Cluster/ceph-cluster-crd/#placement-configuration-settings

h2. Ingress + Cert Manager

* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
* We deploy "cert-manager":https://cert-manager.io/ to handle certificates
* We independently deploy @ClusterIssuer@ to allow the cert-manager app to deploy and the issuer to be created once the CRDs from cert manager are in place

h3. IPv4 reachability

The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.

Steps:

h4. Get the ingress IPv6 address

Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@

Example:

<pre>
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
2a0a:e5c0:10:1b::ce11
</pre>

h4. Add NAT64 mapping

* Update the __dcl_jool_siit cdist type
* Record the two IPs (IPv6 and IPv4)
* Configure all routers

h4. Add DNS record

To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:

<pre>
; k8s ingress for dev
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
dev-ingress                 A 147.78.194.23
</pre>

h4. Add supporting wildcard DNS

If you plan to add various sites under a specific domain, a wildcard DNS entry can be added, such as *.k8s-dev.django-hosting.ch:

<pre>
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
</pre>

h2. Harbor

* We use "Harbor":https://goharbor.io/ as an image registry for our own images. Internal app reference: apps/prod/harbor.
* The admin password is in the password store, it is Harbor12345 by default
* At the moment harbor only authenticates against the internal ldap tree

h3. LDAP configuration

* The url needs to be ldaps://...
* uid = uid
* rest standard

h2. Monitoring / Prometheus

* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/

Access via ...

* http://prometheus-k8s.monitoring.svc:9090
* http://grafana.monitoring.svc:3000
* http://alertmanager.monitoring.svc:9093
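
These are in-cluster service URLs; from a workstation they can be reached via port-forwarding (a sketch, assuming the kube-prometheus defaults of namespace @monitoring@ and the service names above):

<pre>
kubectl -n monitoring port-forward svc/grafana 3000:3000
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
</pre>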

h3. Prometheus Options

* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
** Includes dashboards and co.
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
** Includes dashboards and co.
* "Prometheus Operator (mainly CRD manifests)":https://github.com/prometheus-operator/prometheus-operator

h3. Grafana default password

* If not changed: @prom-operator@

h2. Nextcloud

h3. How to get the nextcloud credentials

* The initial username is set to "nextcloud"
* The password is autogenerated and saved in a kubernetes secret

<pre>
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo ""
</pre>

h3. How to fix "Access through untrusted domain"

* Nextcloud stores the initial domain configuration
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
* To fix, edit /var/www/html/config/config.php and correct the domain
* Then delete the pods
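
A possible way to do the edit directly in the running pod (a sketch; pod name and FQDNs are placeholders, verify the change before deleting the pod):

<pre>
kubectl exec -ti NEXTCLOUDPOD -- sed -i 's/old.example.com/new.example.com/' /var/www/html/config/config.php
kubectl exec -ti NEXTCLOUDPOD -- grep -A3 trusted_domains /var/www/html/config/config.php
kubectl delete pod NEXTCLOUDPOD
</pre>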

h3. Running occ commands inside the nextcloud container

* Find the pod in the right namespace

Exec:

<pre>
su www-data -s /bin/sh -c ./occ
</pre>

* -s /bin/sh is needed as the default shell is set to /bin/false

h4. Rescanning files

* If files have been added without nextcloud's knowledge

<pre>
su www-data -s /bin/sh -c "./occ files:scan --all"
</pre>

h2. Sealed Secrets

* Install kubeseal

<pre>
KUBESEAL_VERSION='0.23.0'
wget "https://github.com/bitnami-labs/sealed-secrets/releases/download/v${KUBESEAL_VERSION:?}/kubeseal-${KUBESEAL_VERSION:?}-linux-amd64.tar.gz"
tar -xvzf kubeseal-${KUBESEAL_VERSION:?}-linux-amd64.tar.gz kubeseal
sudo install -m 755 kubeseal /usr/local/bin/kubeseal
</pre>

* Fetch the public certificate for sealed-secrets

<pre>
kubeseal --fetch-cert > /tmp/public-key-cert.pem
</pre>

* Create the secret

<pre>
# example
apiVersion: v1
kind: Secret
metadata:
  name: Release.Name-postgres-config
  annotations:
    secret-generator.v1.mittwald.de/autogenerate: POSTGRES_PASSWORD
    hosting: Release.Name
  labels:
    app.kubernetes.io/instance: Release.Name
    app.kubernetes.io/component: postgres
stringData:
  POSTGRES_USER: postgresUser
  POSTGRES_DB: postgresDBName
  POSTGRES_INITDB_ARGS: "--no-locale --encoding=UTF8"
</pre>

* Convert secret.yaml to sealed-secret.yaml

<pre>
kubeseal -n <namespace> --cert=/tmp/public-key-cert.pem --format=yaml < ./secret.yaml  > ./sealed-secret.yaml
</pre>

* Use the sealed-secret.yaml in the helm chart directory

* Reference tickets: #11989, #12120

h2. Infrastructure versions

h3. ungleich kubernetes infrastructure v5 (2021-10)

Clusters are configured / set up in this order:

* Bootstrap via kubeadm
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
** "rook for storage via argocd":https://rook.io/
** haproxy for the in-IPv6-cluster IPv4-to-IPv6 proxy via argocd
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot

h3. ungleich kubernetes infrastructure v4 (2021-09)

* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
* The rook operator is still being installed via helm

h3. ungleich kubernetes infrastructure v3 (2021-07)

* rook is now installed via helm via argocd instead of directly via manifests

h3. ungleich kubernetes infrastructure v2 (2021-05)

* Replaced fluxv2 from ungleich k8s v1 with argocd
** argocd can apply helm templates directly without needing to go through Chart releases
* We are also using argoflow for build flows
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building

h3. ungleich kubernetes infrastructure v1 (2021-01)

We are using the following components:

* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
** Needed for basic networking
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
** Needed so that secrets are not stored in the git repository, but only in the cluster
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
** Needed to get letsencrypt certificates for services
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
** rbd for almost everything, *ReadWriteOnce*
** cephfs for smaller things, multi access *ReadWriteMany*
** Needed for providing persistent storage
* "flux v2":https://fluxcd.io/
** Needed to manage resources automatically