h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual
2 1 Nico Schottelius
3 3 Nico Schottelius
{{toc}}
4
5 1 Nico Schottelius
h2. Status
6
7 28 Nico Schottelius
This document is **pre-production**.
8
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.
9 1 Nico Schottelius
10 10 Nico Schottelius
h2. k8s clusters
11
12 123 Nico Schottelius
| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
| [[p6-cow.k8s.ooo]] | production        |            | server134 server135 server136 | "argo":https://argocd-server.argocd.svc.p6in10.k8s.ooo | ?             |    2023-05-17 |
| [[p10.k8s.ooo]]    | production        |            | server131 server132 server133 | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
| [[r1r2p15k8sooo|r1.p15.k8s.ooo]] | production | Nico | server120 | | | 2022-10-30 |
| [[r1r2p15k8sooo|r2.p15.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
| [[r1r2p10k8sooo|r1.p10.k8s.ooo]] | production | Nico | server122 | | | 2022-10-30 |
| [[r1r2p10k8sooo|r2.p10.k8s.ooo]] | production | Nico | server123 | | | 2022-10-15 |
| [[r1r2p5k8sooo|r1.p5.k8s.ooo]] | production | Nico | server137 | | | 2022-10-30 |
| [[r1r2p5k8sooo|r2.p5.k8s.ooo]] | production | Nico | server138 | | | 2022-10-30 |
| [[r1r2p6k8sooo|r1.p6.k8s.ooo]] | production | Nico | server139 | | | 2022-10-30 |
| [[r1r2p6k8sooo|r2.p6.k8s.ooo]] | production | Nico | server140 | | | 2022-10-30 |
35 21 Nico Schottelius
36 1 Nico Schottelius
h2. General architecture and components overview
37
38
* All k8s clusters are IPv6 only
39
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
40
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
41 18 Nico Schottelius
** Private configurations are found in the **k8s-config** repository
42 1 Nico Schottelius
43
h3. Cluster types
44
45 28 Nico Schottelius
| **Type/Feature**            | **Development**                | **Production**         |
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
| Separation of control plane | optional                       | recommended            |
| Persistent storage          | required                       | required               |
| Number of storage monitors  | 3                              | 5                      |
51 1 Nico Schottelius
52 43 Nico Schottelius
h2. General k8s operations
53 1 Nico Schottelius
54 46 Nico Schottelius
h3. Cheat sheet / external great references
55
56
* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/
57
58 117 Nico Schottelius
h3. Allowing to schedule work on the control plane / removing node taints
59 69 Nico Schottelius
60
* Mostly for single node / test / development clusters
61
* Just remove the master/control-plane taints as follows
62
63
<pre>
64
kubectl taint nodes --all node-role.kubernetes.io/master-
65 118 Nico Schottelius
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
66 69 Nico Schottelius
</pre>
67 1 Nico Schottelius
68 117 Nico Schottelius
You can check the node taints using @kubectl describe node ...@
69 69 Nico Schottelius
70 44 Nico Schottelius
h3. Get the cluster admin.conf
71
72
* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
73
* To administer the cluster you can copy the admin.conf to your local machine
* Multi-cluster debugging becomes much easier if you name the config ~/cX-admin.conf (see example below)
75
76
<pre>
77
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
78
% export KUBECONFIG=~/c2-admin.conf    
79
% kubectl get nodes
80
NAME       STATUS                     ROLES                  AGE   VERSION
81
server47   Ready                      control-plane,master   82d   v1.22.0
82
server48   Ready                      control-plane,master   82d   v1.22.0
83
server49   Ready                      <none>                 82d   v1.22.0
84
server50   Ready                      <none>                 82d   v1.22.0
85
server59   Ready                      control-plane,master   82d   v1.22.0
86
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
87
server61   Ready                      <none>                 82d   v1.22.0
88
server62   Ready                      <none>                 82d   v1.22.0               
89
</pre>
90
91 18 Nico Schottelius
h3. Installing a new k8s cluster
92 8 Nico Schottelius
93 9 Nico Schottelius
* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
94 28 Nico Schottelius
** Using pXX.k8s.ooo for production clusters of placeXX
95 9 Nico Schottelius
* Use cdist to configure the nodes with requirements like crio
96
* Decide between single or multi node control plane setups (see below)
97 28 Nico Schottelius
** Single control plane suitable for development clusters
98 9 Nico Schottelius
99 28 Nico Schottelius
Typical init procedure:
100 9 Nico Schottelius
101 28 Nico Schottelius
* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
102
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@
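
The contents of @bootstrap/XXX/kubeadm.yaml@ live in the private k8s-config repository. Purely as an illustration, a minimal kubeadm config for an IPv6-only cluster could look roughly like this (cluster name, API endpoint and CIDRs are placeholders, not our real values):

<pre>
# Hypothetical sketch only -- the real bootstrap files are in the k8s-config repository
cat > kubeadm.yaml <<EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
clusterName: cX.k8s.ooo                      # placeholder cluster name
controlPlaneEndpoint: "cX-api.k8s.ooo:6443"  # placeholder API endpoint
networking:
  podSubnet: "2001:db8:1::/56"               # placeholder pod cidr
  serviceSubnet: "2001:db8:2::/108"          # placeholder service cidr
EOF

kubeadm init --config kubeadm.yaml
</pre>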
103 10 Nico Schottelius
104 29 Nico Schottelius
h3. Deleting a pod that is hanging in terminating state
105
106
<pre>
107
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
108
</pre>
109
110
(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)
111
112 42 Nico Schottelius
h3. Listing nodes of a cluster
113
114
<pre>
115
[15:05] bridge:~% kubectl get nodes
116
NAME       STATUS   ROLES                  AGE   VERSION
117
server22   Ready    <none>                 52d   v1.22.0
118
server23   Ready    <none>                 52d   v1.22.2
119
server24   Ready    <none>                 52d   v1.22.0
120
server25   Ready    <none>                 52d   v1.22.0
121
server26   Ready    <none>                 52d   v1.22.0
122
server27   Ready    <none>                 52d   v1.22.0
123
server63   Ready    control-plane,master   52d   v1.22.0
124
server64   Ready    <none>                 52d   v1.22.0
125
server65   Ready    control-plane,master   52d   v1.22.0
126
server66   Ready    <none>                 52d   v1.22.0
127
server83   Ready    control-plane,master   52d   v1.22.0
128
server84   Ready    <none>                 52d   v1.22.0
129
server85   Ready    <none>                 52d   v1.22.0
130
server86   Ready    <none>                 52d   v1.22.0
131
</pre>
132
133 41 Nico Schottelius
h3. Removing / draining a node
134
135
Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:
136
137 1 Nico Schottelius
<pre>
138 103 Nico Schottelius
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
139 42 Nico Schottelius
</pre>
140
141
h3. Readding a node after draining
142
143
<pre>
144
kubectl uncordon serverXX
145 1 Nico Schottelius
</pre>
146 43 Nico Schottelius
147 50 Nico Schottelius
h3. (Re-)joining worker nodes after creating the cluster
148 49 Nico Schottelius
149
* We need to have an up-to-date token
150
* We use different join commands for the workers and control plane nodes
151
152
Generating the join command on an existing control plane node:
153
154
<pre>
155
kubeadm token create --print-join-command
156
</pre>
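
On the worker node, the printed command is then executed as-is; it looks roughly like this (endpoint, token and hash below are placeholders):

<pre>
kubeadm join p10-api.k8s.ooo:6443 --token TOKEN.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:LONGHASH
</pre>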
157
158 50 Nico Schottelius
h3. (Re-)joining control plane nodes after creating the cluster
159 1 Nico Schottelius
160 50 Nico Schottelius
* We generate the token again
161
* We upload the certificates
162
* We need to combine/create the join command for the control plane node
163
164
Example session:
165
166
<pre>
167
% kubeadm token create --print-join-command
168
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash 
169
170
% kubeadm init phase upload-certs --upload-certs
171
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
172
[upload-certs] Using certificate key:
173
CERTKEY
174
175
# Then we use these two outputs on the joining node:
176
177
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
178
</pre>
179
180
Commands to be used on a control plane node:
181
182
<pre>
183
kubeadm token create --print-join-command
184
kubeadm init phase upload-certs --upload-certs
185
</pre>
186
187
Commands to be used on the joining node:
188
189
<pre>
190
JOINCOMMAND --control-plane --certificate-key CERTKEY
191
</pre>
192 49 Nico Schottelius
193 51 Nico Schottelius
SEE ALSO
194
195
* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
196
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/
197
198 53 Nico Schottelius
h3. How to fix etcd does not start when rejoining a kubernetes cluster as a control plane
199 52 Nico Schottelius
200
If during the above step etcd does not come up, @kubeadm join@ can hang as follows:
201
202
<pre>
203
[control-plane] Creating static Pod manifest for "kube-apiserver"                                                              
204
[control-plane] Creating static Pod manifest for "kube-controller-manager"                                                     
205
[control-plane] Creating static Pod manifest for "kube-scheduler"                                                              
206
[check-etcd] Checking that the etcd cluster is healthy                                                                         
207
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:37
208
8a]:2379 with maintenance client: context deadline exceeded                                                                    
209
To see the stack trace of this error execute with --v=5 or higher         
210
</pre>
211
212
In that case the node is most likely still registered as a member of the etcd cluster. We first need to remove it from the etcd cluster; afterwards the join works.
213
214
To fix this we do:
215
216
* Find a working etcd pod
217
* Find the etcd members / member list
218
* Remove the etcd member that we want to re-join the cluster
219
220
221
<pre>
222
# Find the etcd pods
223
kubectl -n kube-system get pods -l component=etcd,tier=control-plane
224
225
# Get the list of etcd servers with the member id 
226
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
227
228
# Remove the member
229
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
230
</pre>
231
232
Sample session:
233
234
<pre>
235
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
236
NAME            READY   STATUS    RESTARTS     AGE
237
etcd-server63   1/1     Running   0            3m11s
238
etcd-server65   1/1     Running   3            7d2h
239
etcd-server83   1/1     Running   8 (6d ago)   7d2h
240
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
241
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
242
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
243
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false
244
245
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
246
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
247 1 Nico Schottelius
248
</pre>
249
250
SEE ALSO
251
252
* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster
253 56 Nico Schottelius
254 147 Nico Schottelius
h3. Node labels (adding, showing, removing)
255
256
Listing the labels:
257
258
<pre>
259
kubectl get nodes --show-labels
260
</pre>
261
262
Adding labels:
263
264
<pre>
265
kubectl label nodes LIST-OF-NODES label1=value1 
266
267
</pre>
268
269
For instance:
270
271
<pre>
272
kubectl label nodes router2 router3 hosttype=router 
273
</pre>
274
275
Selecting nodes in pods:
276
277
<pre>
278
apiVersion: v1
279
kind: Pod
280
...
281
spec:
282
  nodeSelector:
283
    hosttype: router
284
</pre>
285
286 148 Nico Schottelius
Removing labels (by appending a minus to the label name):
287
288
<pre>
289
kubectl label node <nodename> <labelname>-
290
</pre>
291
292
For instance:
293
294
<pre>
295
kubectl label nodes router2 router3 hosttype- 
296
</pre>
297
298 147 Nico Schottelius
SEE ALSO
299 1 Nico Schottelius
300 148 Nico Schottelius
* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
301
* https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api
302 147 Nico Schottelius
303 101 Nico Schottelius
h3. Hardware Maintenance using ungleich-hardware
304
305
Use the following manifest and replace the HOST with the actual host:
306
307
<pre>
308
apiVersion: v1
309
kind: Pod
310
metadata:
311
  name: ungleich-hardware-HOST
312
spec:
313
  containers:
314
  - name: ungleich-hardware
315
    image: ungleich/ungleich-hardware:0.0.5
316
    args:
317
    - sleep
318
    - "1000000"
319
    volumeMounts:
320
      - mountPath: /dev
321
        name: dev
322
    securityContext:
323
      privileged: true
324
  nodeSelector:
325
    kubernetes.io/hostname: "HOST"
326
327
  volumes:
328
    - name: dev
329
      hostPath:
330
        path: /dev
331
</pre>
332
333 102 Nico Schottelius
Also see: [[The_ungleich_hardware_maintenance_guide]]
334
335 105 Nico Schottelius
h3. Triggering a cronjob / creating a job from a cronjob
336 104 Nico Schottelius
337
To test a cronjob, we can create a job from a cronjob:
338
339
<pre>
340
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
341
</pre>
342
343
This creates a job @volume2-manual@ based on the cronjob @volume2-daily-backup@.
344
345 112 Nico Schottelius
h3. su-ing into a user that has nologin shell set
346
347
Often users have nologin set as their shell inside the container. To execute maintenance commands within the container anyway, we can use @su -s /bin/sh@ like this:
349
350
<pre>
351
su -s /bin/sh -c '/path/to/your/script' testuser
352
</pre>
353
354
Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell
355
356 113 Nico Schottelius
h3. How to print a secret value
357
358
Assuming you want the "password" item from a secret, use:
359
360
<pre>
361
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo "" 
362
</pre>
363
364 173 Nico Schottelius
h3. How to upgrade a kubernetes cluster
365 172 Nico Schottelius
366
h4. General
367
368
* Should be done regularly, roughly every 3-6 months, to stay up-to-date
370
* kubeadm based clusters
371
* Needs specific kubeadm versions for upgrade
372
* Follow instructions on https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
373 190 Nico Schottelius
* Finding releases: https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG
374 172 Nico Schottelius
375
h4. Getting a specific kubeadm or kubelet version
376
377
<pre>
378 190 Nico Schottelius
RELEASE=v1.22.17
379
RELEASE=v1.23.17
380 181 Nico Schottelius
RELEASE=v1.24.9
381 1 Nico Schottelius
RELEASE=v1.25.9
382
RELEASE=v1.26.6
383 190 Nico Schottelius
RELEASE=v1.27.2
384
385 187 Nico Schottelius
ARCH=amd64
386 172 Nico Schottelius
387
curl -L --remote-name-all https://dl.k8s.io/release/${RELEASE}/bin/linux/${ARCH}/{kubeadm,kubelet}
388 182 Nico Schottelius
chmod u+x kubeadm kubelet
389 172 Nico Schottelius
</pre>
390
391
h4. Steps
392
393
* kubeadm upgrade plan
394
** On one control plane node
395
* kubeadm upgrade apply vXX.YY.ZZ
396
** On one control plane node
397 189 Nico Schottelius
* kubeadm upgrade node
398
** On all other control plane nodes
399
** On all worker nodes afterwards
400
401 172 Nico Schottelius
402 173 Nico Schottelius
Repeat this for all control plane nodes. Then upgrade the kubelet on all remaining nodes via the package manager.
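
A rough per-node sketch of the whole sequence (version, node name and the service command are placeholders/assumptions; on our Alpine-based nodes kubelet is assumed to run under OpenRC):

<pre>
# Sketch only; follow the upstream kubeadm upgrade documentation for details.
# Use the kubeadm binary matching the target version (e.g. the one downloaded above).

# On the first control plane node:
kubeadm upgrade plan
kubeadm upgrade apply vX.Y.Z

# On every other control plane node (and afterwards on every worker):
kubeadm upgrade node

# Per node: drain, replace the kubelet binary/package, restart, uncordon
kubectl drain serverXX --ignore-daemonsets --delete-emptydir-data
# ... install the new kubelet on serverXX ...
rc-service kubelet restart   # assuming OpenRC; adapt to your init system
kubectl uncordon serverXX
</pre>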
403 172 Nico Schottelius
404 193 Nico Schottelius
h4. Upgrading to 1.22.17
405 1 Nico Schottelius
406 193 Nico Schottelius
* https://v1-22.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
407 194 Nico Schottelius
* Need to create a kubeadm config map
408 198 Nico Schottelius
** f.i. using the following
409
** @/usr/local/bin/kubeadm-v1.22.17   upgrade --config kubeadm.yaml --ignore-preflight-errors=CoreDNSUnsupportedPlugins,CoreDNSMigration apply -y v1.22.17@
410 193 Nico Schottelius
* Done for p6 on 2023-10-04
411
412
h4. Upgrading to 1.23.17
413
414
* https://v1-23.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
415
* No special notes
416
* Done for p6 on 2023-10-04
417
418
h4. Upgrading to 1.24.17
419
420
* https://v1-24.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
421
* No special notes
422
* Done for p6 on 2023-10-04
423
424
h4. Upgrading to 1.25.14
425
426
* https://v1-25.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
427
* No special notes
428
* Done for p6 on 2023-10-04
429
430
h4. Upgrading to 1.26.9
431
432 1 Nico Schottelius
* https://v1-26.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
433 193 Nico Schottelius
* No special notes
434
* Done for p6 on 2023-10-04
435 188 Nico Schottelius
436 196 Nico Schottelius
h4. Upgrading to 1.27
437 186 Nico Schottelius
438 192 Nico Schottelius
* https://v1-27.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
439 186 Nico Schottelius
* After the upgrade, kubelet will not start anymore
* Reason: @"command failed" err="failed to parse kubelet flag: unknown flag: --container-runtime"@
* @/var/lib/kubelet/kubeadm-flags.env@ still contains that flag
* Remove it and start kubelet again (see the sketch below)
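
A minimal sketch of the fix (assuming the flag appears as @--container-runtime=remote@ in the file and that kubelet runs under OpenRC):

<pre>
sed -i 's/ *--container-runtime=remote//' /var/lib/kubelet/kubeadm-flags.env
rc-service kubelet restart   # adapt to your init system
</pre>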
443 192 Nico Schottelius
444 197 Nico Schottelius
h4. Upgrading to 1.28
445 192 Nico Schottelius
446
* https://v1-28.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
447 186 Nico Schottelius
448
h4. Upgrade to crio 1.27: missing crun
449
450
Error message
451
452
<pre>
453
level=fatal msg="validating runtime config: runtime validation: \"crun\" not found in $PATH: exec: \"crun\": executable file not found in $PATH"
454
</pre>
455
456
Fix:
457
458
<pre>
459
apk add crun
460
</pre>
461
462 157 Nico Schottelius
h2. Reference CNI
463
464
* Mainly "stupid", but effective plugins
465
* Main documentation on https://www.cni.dev/plugins/current/
466 158 Nico Schottelius
* Plugins
467
** bridge
468
*** Can create the bridge on the host
469
*** But it does not seem to be able to also add existing host interfaces to the bridge
470
*** Has support for vlan tags
471
** vlan
472
*** creates vlan tagged sub interface on the host
473 160 Nico Schottelius
*** "It's a 1:1 mapping (i.e. no bridge in between)":https://github.com/k8snetworkplumbingwg/multus-cni/issues/569
474 158 Nico Schottelius
** host-device
475
*** moves the interface from the host into the container
476
*** very easy for physical connections to containers
477 159 Nico Schottelius
** ipvlan
478
*** "virtualisation" of a host device
479
*** routing based on IP
480
*** Same MAC for everyone
481
*** Cannot reach the master interface
482
** macvlan
483
*** With mac addresses
484
*** Supports various modes (to be checked)
485
** ptp ("point to point")
486
*** Creates a host device and connects it to the container
487
** win*
488 158 Nico Schottelius
*** Windows implementations
489 157 Nico Schottelius
490 62 Nico Schottelius
h2. Calico CNI
491
492
h3. Calico Installation
493
494
* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
495
* This has the following advantages:
496
** Easy to upgrade
497
** Does not require us to configure IPv6/dual-stack settings, as the tigera operator figures things out on its own
498
499
Usually plain calico can be installed directly using:
500
501
<pre>
502 174 Nico Schottelius
VERSION=v3.25.0
503 149 Nico Schottelius
504 1 Nico Schottelius
helm repo add projectcalico https://docs.projectcalico.org/charts
505 167 Nico Schottelius
helm repo update
506 124 Nico Schottelius
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
507 1 Nico Schottelius
</pre>
508 92 Nico Schottelius
509
* Check the tags on https://github.com/projectcalico/calico/tags for the latest release
510 62 Nico Schottelius
511
h3. Installing calicoctl
512
513 115 Nico Schottelius
* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install
514
515 62 Nico Schottelius
To be able to manage and configure calico, we need to "install calicoctl (we chose the variant that runs as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod
517
518
<pre>
519
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
520
</pre>
521
522 93 Nico Schottelius
Or version specific:
523
524
<pre>
525
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml
526 97 Nico Schottelius
527
# For 3.22
528
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
529 93 Nico Schottelius
</pre>
530
531 70 Nico Schottelius
And make it more easily accessible via an alias:
532
533
<pre>
534
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
535
</pre>
536
537 62 Nico Schottelius
h3. Calico configuration
538
539 63 Nico Schottelius
By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
540
with an upstream router to propagate podcidr and servicecidr.
541 62 Nico Schottelius
542
Default settings in our infrastructure:
543
544
* We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
545
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
546 1 Nico Schottelius
* We use private ASNs for k8s clusters
547 63 Nico Schottelius
* We do *not* use any overlay
548 62 Nico Schottelius
549
After installing calico and calicoctl the last step of the installation is usually:
550
551 1 Nico Schottelius
<pre>
552 79 Nico Schottelius
calicoctl create -f - < calico-bgp.yaml
553 62 Nico Schottelius
</pre>
554
555
556
A sample BGP configuration:
557
558
<pre>
559
---
560
apiVersion: projectcalico.org/v3
561
kind: BGPConfiguration
562
metadata:
563
  name: default
564
spec:
565
  logSeverityScreen: Info
566
  nodeToNodeMeshEnabled: true
567
  asNumber: 65534
568
  serviceClusterIPs:
569
  - cidr: 2a0a:e5c0:10:3::/108
570
  serviceExternalIPs:
571
  - cidr: 2a0a:e5c0:10:3::/108
572
---
573
apiVersion: projectcalico.org/v3
574
kind: BGPPeer
575
metadata:
576
  name: router1-place10
577
spec:
578
  peerIP: 2a0a:e5c0:10:1::50
579
  asNumber: 213081
580
  keepOriginalNextHop: true
581
</pre>
582
583 126 Nico Schottelius
h2. Cilium CNI (experimental)
584
585 137 Nico Schottelius
h3. Status
586
587 138 Nico Schottelius
*NO WORKING CILIUM CONFIGURATION FOR IPV6 only modes*
588 137 Nico Schottelius
589 146 Nico Schottelius
h3. Latest error
590
591
It seems cilium does not run on IPv6 only hosts:
592
593
<pre>
594
level=info msg="Validating configured node address ranges" subsys=daemon
595
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
596
level=info msg="Starting IP identity watcher" subsys=ipcache
597
</pre>
598
599
It crashes after that log entry
600
601 128 Nico Schottelius
h3. BGP configuration
602
603
* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
604
* Creating the bgp config beforehand as a configmap is thus required.
605
606
The error one gets without the configmap present:
607
608
Pods are hanging with:
609
610
<pre>
611
cilium-bpqm6                       0/1     Init:0/4            0             9s
612
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
613
</pre>
614
615
The error message in the cilium-operator is:
616
617
<pre>
618
Events:
619
  Type     Reason       Age                From               Message
620
  ----     ------       ----               ----               -------
621
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
622
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
623
</pre>
624
625
A correct bgp config looks like this:
626
627
<pre>
628
apiVersion: v1
629
kind: ConfigMap
630
metadata:
631
  name: bgp-config
632
  namespace: kube-system
633
data:
634
  config.yaml: |
635
    peers:
636
      - peer-address: 2a0a:e5c0::46
637
        peer-asn: 209898
638
        my-asn: 65533
639
      - peer-address: 2a0a:e5c0::47
640
        peer-asn: 209898
641
        my-asn: 65533
642
    address-pools:
643
      - name: default
644
        protocol: bgp
645
        addresses:
646
          - 2a0a:e5c0:0:14::/64
647
</pre>
648 127 Nico Schottelius
649
h3. Installation
650 130 Nico Schottelius
651 127 Nico Schottelius
Adding the repo
652 1 Nico Schottelius
<pre>
653 127 Nico Schottelius
654 129 Nico Schottelius
helm repo add cilium https://helm.cilium.io/
655 130 Nico Schottelius
helm repo update
656
</pre>
657 129 Nico Schottelius
658 135 Nico Schottelius
Installing + configuring cilium
659 129 Nico Schottelius
<pre>
660 130 Nico Schottelius
ipv6pool=2a0a:e5c0:0:14::/112
661 1 Nico Schottelius
662 146 Nico Schottelius
version=1.12.2
663 129 Nico Schottelius
664
helm upgrade --install cilium cilium/cilium --version $version \
665 1 Nico Schottelius
  --namespace kube-system \
666
  --set ipv4.enabled=false \
667
  --set ipv6.enabled=true \
668 146 Nico Schottelius
  --set enableIPv6Masquerade=false \
669
  --set bgpControlPlane.enabled=true 
670 1 Nico Schottelius
671 146 Nico Schottelius
#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool
672
673
# Old style bgp?
674 136 Nico Schottelius
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \
675 127 Nico Schottelius
676
# Show possible configuration options
677
helm show values cilium/cilium
678
679 1 Nico Schottelius
</pre>
680 132 Nico Schottelius
681
Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:
682
683
<pre>
684
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
685
</pre>
686
687 126 Nico Schottelius
688 1 Nico Schottelius
See also https://github.com/cilium/cilium/issues/20756
689 135 Nico Schottelius
690
A /112, however, does seem to work.
691
692
h3. Kernel modules
693
694
Cilium requires the following modules to be loaded on the host (not loaded by default):
695
696
<pre>
697 1 Nico Schottelius
modprobe  ip6table_raw
698
modprobe  ip6table_filter
699
</pre>
700 146 Nico Schottelius
701
h3. Interesting helm flags
702
703
* autoDirectNodeRoutes
704
* bgpControlPlane.enabled = true
705
706
h3. SEE ALSO
707
708
* https://docs.cilium.io/en/v1.12/helm-reference/
709 133 Nico Schottelius
710 179 Nico Schottelius
h2. Multus
711 168 Nico Schottelius
712
* https://github.com/k8snetworkplumbingwg/multus-cni
713
* Installing a deployment w/ CRDs
714 150 Nico Schottelius
715 169 Nico Schottelius
<pre>
716 176 Nico Schottelius
VERSION=v4.0.1
717 169 Nico Schottelius
718 170 Nico Schottelius
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/${VERSION}/deployments/multus-daemonset-crio.yml
719
</pre>
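
Additional networks are then described as @NetworkAttachmentDefinition@ objects and attached to pods via an annotation. A hypothetical example using the vlan plugin (names, master interface, VLAN id and subnet are made up):

<pre>
cat <<EOF | kubectl apply -f -
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: example-vlan
  namespace: default
spec:
  config: '{
    "cniVersion": "0.4.0",
    "type": "vlan",
    "master": "eth0",
    "vlanId": 100,
    "ipam": { "type": "host-local", "ranges": [ [ { "subnet": "2001:db8:43::/64" } ] ] }
  }'
EOF
</pre>

Pods then reference it with the annotation @k8s.v1.cni.cncf.io/networks: example-vlan@.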
720 169 Nico Schottelius
721 191 Nico Schottelius
h2. ArgoCD
722 56 Nico Schottelius
723 60 Nico Schottelius
h3. Argocd Installation
724 1 Nico Schottelius
725 116 Nico Schottelius
* See https://argo-cd.readthedocs.io/en/stable/
726
727 60 Nico Schottelius
As there is no configuration management present yet, argocd is installed using
728
729 1 Nico Schottelius
<pre>
730 60 Nico Schottelius
kubectl create namespace argocd
731 1 Nico Schottelius
732
# OR: latest stable
733
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
734
735 191 Nico Schottelius
# OR Specific Version
736
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml
737 56 Nico Schottelius
738 191 Nico Schottelius
739
</pre>
740 1 Nico Schottelius
741 60 Nico Schottelius
h3. Get the argocd credentials
742
743
<pre>
744
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
745
</pre>
746 52 Nico Schottelius
747 87 Nico Schottelius
h3. Accessing argocd
748
749
In regular IPv6 clusters:
750
751
* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN
752
753
In legacy IPv4 clusters
754
755
<pre>
756
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
757
</pre>
758
759 88 Nico Schottelius
* Navigate to https://localhost:8080
760
761 68 Nico Schottelius
h3. Using the argocd webhook to trigger changes
762 67 Nico Schottelius
763
* To trigger changes, POST JSON to https://argocd.example.com/api/webhook (see the example below)
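
A sketch of a manual trigger with curl, assuming a GitHub-style push payload (URL, branch and repository are placeholders); normally the git server sends the webhook itself:

<pre>
curl -X POST https://argocd.example.com/api/webhook \
  -H 'Content-Type: application/json' \
  -H 'X-GitHub-Event: push' \
  -d '{"ref": "refs/heads/master", "repository": {"html_url": "https://code.ungleich.ch/ungleich-intern/k8s-config"}}'
</pre>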
764
765 72 Nico Schottelius
h3. Deploying an application
766
767
* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
768 73 Nico Schottelius
* Always include the *redmine-url* pointing to the (customer) ticket
769
** Also add the support-url if it exists
770 72 Nico Schottelius
771
Application sample
772
773
<pre>
774
apiVersion: argoproj.io/v1alpha1
775
kind: Application
776
metadata:
777
  name: gitea-CUSTOMER
778
  namespace: argocd
779
spec:
780
  destination:
781
    namespace: default
782
    server: 'https://kubernetes.default.svc'
783
  source:
784
    path: apps/prod/gitea
785
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
786
    targetRevision: HEAD
787
    helm:
788
      parameters:
789
        - name: storage.data.storageClass
790
          value: rook-ceph-block-hdd
791
        - name: storage.data.size
792
          value: 200Gi
793
        - name: storage.db.storageClass
794
          value: rook-ceph-block-ssd
795
        - name: storage.db.size
796
          value: 10Gi
797
        - name: storage.letsencrypt.storageClass
798
          value: rook-ceph-block-hdd
799
        - name: storage.letsencrypt.size
800
          value: 50Mi
801
        - name: letsencryptStaging
802
          value: 'no'
803
        - name: fqdn
804
          value: 'code.verua.online'
805
  project: default
806
  syncPolicy:
807
    automated:
808
      prune: true
809
      selfHeal: true
810
  info:
811
    - name: 'redmine-url'
812
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
813
    - name: 'support-url'
814
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
815
</pre>
816
817 80 Nico Schottelius
h2. Helm related operations and conventions
818 55 Nico Schottelius
819 61 Nico Schottelius
We use helm charts extensively.
820
821
* In production, they are managed via argocd
822
* In development, helm chart can de developed and deployed manually using the helm utility.
823
824 55 Nico Schottelius
h3. Installing a helm chart
825
826
One can use the usual pattern of
827
828
<pre>
829
helm install <releasename> <chartdirectory>
830
</pre>
831
832
However, when testing helm charts you often want to reinstall/update. The following pattern is "better", because it also works if the release is already installed:
833
834
<pre>
835
helm upgrade --install <releasename> <chartdirectory>
836 1 Nico Schottelius
</pre>
837 80 Nico Schottelius
838
h3. Naming services and deployments in helm charts [Application labels]
839
840
* We always have {{ .Release.Name }} to identify the current "instance"
841
* Deployments:
842
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ...
843 81 Nico Schottelius
* See more about standard labels on
844
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
845
** https://helm.sh/docs/chart_best_practices/labels/
846 55 Nico Schottelius
847 151 Nico Schottelius
h3. Show all versions of a helm chart
848
849
<pre>
850
helm search repo -l repo/chart
851
</pre>
852
853
For example:
854
855
<pre>
856
% helm search repo -l projectcalico/tigera-operator 
857
NAME                         	CHART VERSION	APP VERSION	DESCRIPTION                            
858
projectcalico/tigera-operator	v3.23.3      	v3.23.3    	Installs the Tigera operator for Calico
859
projectcalico/tigera-operator	v3.23.2      	v3.23.2    	Installs the Tigera operator for Calico
860
....
861
</pre>
862
863 152 Nico Schottelius
h3. Show possible values of a chart
864
865
<pre>
866
helm show values <repo/chart>
867
</pre>
868
869
Example:
870
871
<pre>
872
helm show values ingress-nginx/ingress-nginx
873
</pre>
874
875 178 Nico Schottelius
h3. Download a chart
876
877
For instance, to check it out locally, use:
878
879
<pre>
880
helm pull <repo/chart>
881
</pre>
882 152 Nico Schottelius
883 139 Nico Schottelius
h2. Rook + Ceph
884
885
h3. Installation
886
887
* Usually directly via argocd
888
889 71 Nico Schottelius
h3. Executing ceph commands
890
891
Using the ceph-tools pod as follows:
892
893
<pre>
894
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
895
</pre>
896
897 43 Nico Schottelius
h3. Inspecting the logs of a specific server
898
899
<pre>
900
# Get the related pods
901
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare 
902
...
903
904
# Inspect the logs of a specific pod
905
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
906
907 71 Nico Schottelius
</pre>
908
909
h3. Inspecting the logs of the rook-ceph-operator
910
911
<pre>
912
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
913 43 Nico Schottelius
</pre>
914
915 121 Nico Schottelius
h3. Restarting the rook operator
916
917
<pre>
918
kubectl -n rook-ceph delete pods  -l app=rook-ceph-operator
919
</pre>
920
921 43 Nico Schottelius
h3. Triggering server prepare / adding new osds
922
923
The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re-scan", simply delete that pod:
924
925
<pre>
926
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
927
</pre>
928
929
This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.
930
931
h3. Removing an OSD
932
933
* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
934 77 Nico Schottelius
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
935 99 Nico Schottelius
* Then delete the related deployment
936 41 Nico Schottelius
937 98 Nico Schottelius
Set the OSD id in osd-purge.yaml and apply it. The OSD should be down before it is purged.
938
939
<pre>
940
apiVersion: batch/v1
941
kind: Job
942
metadata:
943
  name: rook-ceph-purge-osd
944
  namespace: rook-ceph # namespace:cluster
945
  labels:
946
    app: rook-ceph-purge-osd
947
spec:
948
  template:
949
    metadata:
950
      labels:
951
        app: rook-ceph-purge-osd
952
    spec:
953
      serviceAccountName: rook-ceph-purge-osd
954
      containers:
955
        - name: osd-removal
956
          image: rook/ceph:master
957
          # TODO: Insert the OSD ID in the last parameter that is to be removed
958
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
959
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
960
          #
961
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
962
          # removal could lead to data loss.
963
          args:
964
            - "ceph"
965
            - "osd"
966
            - "remove"
967
            - "--preserve-pvc"
968
            - "false"
969
            - "--force-osd-removal"
970
            - "false"
971
            - "--osd-ids"
972
            - "SETTHEOSDIDHERE"
973
          env:
974
            - name: POD_NAMESPACE
975
              valueFrom:
976
                fieldRef:
977
                  fieldPath: metadata.namespace
978
            - name: ROOK_MON_ENDPOINTS
979
              valueFrom:
980
                configMapKeyRef:
981
                  key: data
982
                  name: rook-ceph-mon-endpoints
983
            - name: ROOK_CEPH_USERNAME
984
              valueFrom:
985
                secretKeyRef:
986
                  key: ceph-username
987
                  name: rook-ceph-mon
988
            - name: ROOK_CEPH_SECRET
989
              valueFrom:
990
                secretKeyRef:
991
                  key: ceph-secret
992
                  name: rook-ceph-mon
993
            - name: ROOK_CONFIG_DIR
994
              value: /var/lib/rook
995
            - name: ROOK_CEPH_CONFIG_OVERRIDE
996
              value: /etc/rook/config/override.conf
997
            - name: ROOK_FSID
998
              valueFrom:
999
                secretKeyRef:
1000
                  key: fsid
1001
                  name: rook-ceph-mon
1002
            - name: ROOK_LOG_LEVEL
1003
              value: DEBUG
1004
          volumeMounts:
1005
            - mountPath: /etc/ceph
1006
              name: ceph-conf-emptydir
1007
            - mountPath: /var/lib/rook
1008
              name: rook-config
1009
      volumes:
1010
        - emptyDir: {}
1011
          name: ceph-conf-emptydir
1012
        - emptyDir: {}
1013
          name: rook-config
1014
      restartPolicy: Never
1015
1016
1017 99 Nico Schottelius
</pre>
1018
1019 1 Nico Schottelius
Deleting the deployment:
1020
1021
<pre>
1022
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
1023 99 Nico Schottelius
deployment.apps "rook-ceph-osd-6" deleted
1024
</pre>
1025 185 Nico Schottelius
1026
h3. Placement of mons/osds/etc.
1027
1028
See https://rook.io/docs/rook/v1.11/CRDs/Cluster/ceph-cluster-crd/#placement-configuration-settings
1029 98 Nico Schottelius
1030 145 Nico Schottelius
h2. Ingress + Cert Manager
1031
1032
* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
1033
* we deploy "cert-manager":https://cert-manager.io/ to handle certificates
1034
* We independently deploy @ClusterIssuer@ to allow the cert-manager app to deploy and the issuer to be created once the CRDs from cert manager are in place
1035
1036
h3. IPv4 reachability 
1037
1038
The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.
1039
1040
Steps:
1041
1042
h4. Get the ingress IPv6 address
1043
1044
Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@
1045
1046
Example:
1047
1048
<pre>
1049
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
1050
2a0a:e5c0:10:1b::ce11
1051
</pre>
1052
1053
h4. Add NAT64 mapping
1054
1055
* Update the __dcl_jool_siit cdist type
1056
* Record the two IPs (IPv6 and IPv4)
1057
* Configure all routers
1058
1059
1060
h4. Add DNS record
1061
1062
To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:
1063
1064
<pre>
1065
; k8s ingress for dev
1066
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
1067
dev-ingress                 A 147.78.194.23
1068
1069
</pre> 
1070
1071
h4. Add supporting wildcard DNS
1072
1073
If you plan to add various sites under a specific domain, add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:
1074
1075
<pre>
1076
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
1077
</pre>
1078
1079 76 Nico Schottelius
h2. Harbor
1080
1081 175 Nico Schottelius
* We user "Harbor":https://goharbor.io/ as an image registry for our own images. Internal app reference: apps/prod/harbor.
1082
* The admin password is in the password store, it is Harbor12345 by default
1083 76 Nico Schottelius
* At the moment harbor only authenticates against the internal ldap tree
1084
1085
h3. LDAP configuration
1086
1087
* The url needs to be ldaps://...
1088
* uid = uid
1089
* the rest is standard
1090 75 Nico Schottelius
1091 89 Nico Schottelius
h2. Monitoring / Prometheus
1092
1093 90 Nico Schottelius
* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/
1094 89 Nico Schottelius
1095 91 Nico Schottelius
Access via the following cluster-internal URLs (see the port-forward sketch below for access from a workstation):
1096
1097
* http://prometheus-k8s.monitoring.svc:9090
1098
* http://grafana.monitoring.svc:3000
1099
* http://alertmanager.monitoring.svc:9093
1100
1101
1102 100 Nico Schottelius
h3. Prometheus Options
1103
1104
* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
1105
** Includes dashboards and co.
1106
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
1107
** Includes dashboards and co.
1108
* "Prometheus Operator (mainly CRD manifest":https://github.com/prometheus-operator/prometheus-operator
1109
1110 171 Nico Schottelius
h3. Grafana default password
1111
1112
* If not changed: @prom-operator@
1113
1114 82 Nico Schottelius
h2. Nextcloud
1115
1116 85 Nico Schottelius
h3. How to get the nextcloud credentials 
1117 84 Nico Schottelius
1118
* The initial username is set to "nextcloud"
1119
* The password is autogenerated and saved in a kubernetes secret
1120
1121
<pre>
1122 85 Nico Schottelius
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo "" 
1123 84 Nico Schottelius
</pre>
1124
1125 83 Nico Schottelius
h3. How to fix "Access through untrusted domain"
1126
1127 82 Nico Schottelius
* Nextcloud stores the initial domain configuration
1128 1 Nico Schottelius
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
1129 82 Nico Schottelius
* To fix, edit /var/www/html/config/config.php and correct the domain
1130 1 Nico Schottelius
* Then delete the pods so they pick up the corrected config (see the sketch below)
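
A minimal sketch of both steps via occ instead of editing config.php by hand (the label selector, the trusted_domains index @1@ and the domain are assumptions that need to be adapted to the actual release):

<pre>
# Find a nextcloud pod (label assumed from the helm chart)
pod=$(kubectl get pods -l app.kubernetes.io/name=nextcloud -o jsonpath='{.items[0].metadata.name}')

# Set the corrected domain as a trusted domain
kubectl exec -ti $pod -- su www-data -s /bin/sh -c \
  "cd /var/www/html && ./occ config:system:set trusted_domains 1 --value=nextcloud.example.com"

# Restart the pods so all of them pick up the change
kubectl delete pods -l app.kubernetes.io/name=nextcloud
</pre>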
1131 165 Nico Schottelius
1132
h3. Running occ commands inside the nextcloud container
1133
1134
* Find the pod in the right namespace
1135
1136
Exec:
1137
1138
<pre>
1139
su www-data -s /bin/sh -c ./occ
1140
</pre>
1141
1142
* -s /bin/sh is needed as the default shell is set to /bin/false
1143
1144 166 Nico Schottelius
h4. Rescanning files
1145 165 Nico Schottelius
1146 166 Nico Schottelius
* If files have been added without nextcloud's knowledge
1147
1148
<pre>
1149
su www-data -s /bin/sh -c "./occ files:scan --all"
1150
</pre>
1151 82 Nico Schottelius
1152 1 Nico Schottelius
h2. Infrastructure versions
1153 35 Nico Schottelius
1154 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v5 (2021-10)
1155 1 Nico Schottelius
1156 57 Nico Schottelius
Clusters are configured / setup in this order:
1157
1158
* Bootstrap via kubeadm
1159 59 Nico Schottelius
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
1160
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
1161
** "rook for storage via argocd":https://rook.io/
1162 58 Nico Schottelius
** haproxy as IPv4-to-IPv6 proxy into the IPv6-only cluster, via argocd
1163
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
1164
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
1165
1166 57 Nico Schottelius
1167
h3. ungleich kubernetes infrastructure v4 (2021-09)
1168
1169 54 Nico Schottelius
* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
1170 1 Nico Schottelius
* The rook operator is still being installed via helm
1171 35 Nico Schottelius
1172 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v3 (2021-07)
1173 1 Nico Schottelius
1174 10 Nico Schottelius
* rook is now installed via helm via argocd instead of directly via manifests
1175 28 Nico Schottelius
1176 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v2 (2021-05)
1177 28 Nico Schottelius
1178
* Replaced fluxv2 from ungleich k8s v1 with argocd
1179 1 Nico Schottelius
** argocd can apply helm templates directly without needing to go through Chart releases
1180 28 Nico Schottelius
* We are also using argoflow for build flows
1181
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building
1182
1183 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v1 (2021-01)
1184 28 Nico Schottelius
1185
We are using the following components:
1186
1187
* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
1188
** Needed for basic networking
1189
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
1190
** Needed so that secrets are not stored in the git repository, but only in the cluster
1191
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
1192
** Needed to get letsencrypt certificates for services
1193
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
1194
** rbd for almost everything, *ReadWriteOnce*
1195
** cephfs for smaller things, multi access *ReadWriteMany*
1196
** Needed for providing persistent storage
1197
* "flux v2":https://fluxcd.io/
1198
** Needed to manage resources automatically