h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual

{{toc}}

h2. Status

This document is **pre-production**.
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.

h2. k8s clusters

| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
| [[p6-cow.k8s.ooo]] | production        |            | server134 server135 server136 | "argo":https://argocd-server.argocd.svc.p6in10.k8s.ooo | ?             |    2023-05-17 |
| [[p10.k8s.ooo]]    | production        |            | server131 server132 server133 | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
| [[r1r2p15k8sooo|r1.p15.k8s.ooo]] | production | Nico | server120 | | | 2022-10-30 |
| [[r1r2p15k8sooo|r2.p15.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
| [[r1r2p10k8sooo|r1.p10.k8s.ooo]] | production | Nico | server122 | | | 2022-10-30 |
| [[r1r2p10k8sooo|r2.p10.k8s.ooo]] | production | Nico | server123 | | | 2022-10-15 |
| [[r1r2p5k8sooo|r1.p5.k8s.ooo]] | production | Nico | server137 | | | 2022-10-30 |
| [[r1r2p5k8sooo|r2.p5.k8s.ooo]] | production | Nico | server138 | | | 2022-10-30 |
| [[r1r2p6k8sooo|r1.p6.k8s.ooo]] | production | Nico | server139 | | | 2022-10-30 |
| [[r1r2p6k8sooo|r2.p6.k8s.ooo]] | production | Nico | server140 | | | 2022-10-30 |

h2. General architecture and components overview

* All k8s clusters are IPv6 only
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
** Private configurations are found in the **k8s-config** repository

h3. Cluster types

| **Type/Feature**            | **Development**                | **Production**         |
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
| Separation of control plane | optional                       | recommended            |
| Persistent storage          | required                       | required               |
| Number of storage monitors  | 3                              | 5                      |

h2. General k8s operations

h3. Cheat sheet / great external references

* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/

h3. Allowing to schedule work on the control plane / removing node taints

* Mostly for single node / test / development clusters
* Just remove the master taint as follows

<pre>
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</pre>

You can check the node taints using @kubectl describe node ...@

h3. Get the cluster admin.conf

* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
* To be able to administer the cluster you can copy the admin.conf to your local machine
* Multi-cluster debugging becomes much easier if you name the config ~/cX-admin.conf (see example below)

<pre>
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
% export KUBECONFIG=~/c2-admin.conf
% kubectl get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
server47   Ready                      control-plane,master   82d   v1.22.0
server48   Ready                      control-plane,master   82d   v1.22.0
server49   Ready                      <none>                 82d   v1.22.0
server50   Ready                      <none>                 82d   v1.22.0
server59   Ready                      control-plane,master   82d   v1.22.0
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
server61   Ready                      <none>                 82d   v1.22.0
server62   Ready                      <none>                 82d   v1.22.0
</pre>

h3. Installing a new k8s cluster

* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
** Using pXX.k8s.ooo for production clusters of placeXX
* Use cdist to configure the nodes with requirements like crio
* Decide between single or multi node control plane setups (see below)
** Single control plane suitable for development clusters

Typical init procedure (see the sketch below for what such a kubeadm.yaml can contain):

* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@

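The actual @kubeadm.yaml@ files live in the private bootstrap directories and are not reproduced here; the following is only a rough, generic sketch of what such a file can look like. All values (API endpoint, subnets, version) are placeholders, not the repository's real configuration:

<pre>
# Illustrative sketch only -- not the contents of bootstrap/XXX/kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.26.6
controlPlaneEndpoint: "cX-api.k8s.ooo:6443"   # placeholder API endpoint
networking:
  podSubnet: "2a0a:e5c0:XXXX::/64"            # placeholder IPv6 pod cidr
  serviceSubnet: "2a0a:e5c0:YYYY::/108"       # placeholder IPv6 service cidr
</pre>
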
h3. Deleting a pod that is hanging in terminating state

<pre>
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
</pre>

(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)

h3. Listing nodes of a cluster

<pre>
[15:05] bridge:~% kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
server22   Ready    <none>                 52d   v1.22.0
server23   Ready    <none>                 52d   v1.22.2
server24   Ready    <none>                 52d   v1.22.0
server25   Ready    <none>                 52d   v1.22.0
server26   Ready    <none>                 52d   v1.22.0
server27   Ready    <none>                 52d   v1.22.0
server63   Ready    control-plane,master   52d   v1.22.0
server64   Ready    <none>                 52d   v1.22.0
server65   Ready    control-plane,master   52d   v1.22.0
server66   Ready    <none>                 52d   v1.22.0
server83   Ready    control-plane,master   52d   v1.22.0
server84   Ready    <none>                 52d   v1.22.0
server85   Ready    <none>                 52d   v1.22.0
server86   Ready    <none>                 52d   v1.22.0
</pre>

h3. Removing / draining a node

Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:

<pre>
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
</pre>

h3. Re-adding a node after draining

<pre>
kubectl uncordon serverXX
</pre>

h3. (Re-)joining worker nodes after creating the cluster

* We need to have an up-to-date token
* We use different join commands for the workers and control plane nodes

Generating the join command on an existing control plane node:

<pre>
kubeadm token create --print-join-command
</pre>

h3. (Re-)joining control plane nodes after creating the cluster

* We generate the token again
* We upload the certificates
* We need to combine/create the join command for the control plane node

Example session:

<pre>
% kubeadm token create --print-join-command
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash

% kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
CERTKEY

# Then we use these two outputs on the joining node:

kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
</pre>

Commands to be used on a control plane node:

<pre>
kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs
</pre>

Commands to be used on the joining node:

<pre>
JOINCOMMAND --control-plane --certificate-key CERTKEY
</pre>

SEE ALSO

* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/

h3. How to fix: etcd does not start when rejoining a kubernetes cluster as a control plane

If during the above step etcd does not come up, @kubeadm join@ can hang as follows:

<pre>
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
</pre>

Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster and then the join works.

To fix this we do:

* Find a working etcd pod
* Find the etcd members / member list
* Remove the etcd member that we want to re-join the cluster

<pre>
# Find the etcd pods
kubectl -n kube-system get pods -l component=etcd,tier=control-plane

# Get the list of etcd servers with the member id
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# Remove the member
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
</pre>

Sample session:

<pre>
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
NAME            READY   STATUS    RESTARTS     AGE
etcd-server63   1/1     Running   0            3m11s
etcd-server65   1/1     Running   3            7d2h
etcd-server83   1/1     Running   8 (6d ago)   7d2h
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
</pre>

SEE ALSO

* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster

h3. Node labels (adding, showing, removing)

Listing the labels:

<pre>
kubectl get nodes --show-labels
</pre>

Adding labels:

<pre>
kubectl label nodes LIST-OF-NODES label1=value1
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype=router
</pre>

Selecting nodes in pods:

<pre>
apiVersion: v1
kind: Pod
...
spec:
  nodeSelector:
    hosttype: router
</pre>

Removing labels by adding a minus at the end of the label name:

<pre>
kubectl label node <nodename> <labelname>-
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype-
</pre>

SEE ALSO

* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
* https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api

h3. Hardware Maintenance using ungleich-hardware

Use the following manifest and replace the HOST with the actual host:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-HOST
spec:
  containers:
  - name: ungleich-hardware
    image: ungleich/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "HOST"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>

Also see: [[The_ungleich_hardware_maintenance_guide]]

h3. Triggering a cronjob / creating a job from a cronjob

To test a cronjob, we can create a job from a cronjob:

<pre>
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
</pre>

This creates a job @volume2-manual@ based on the cronjob @volume2-daily-backup@.

h3. su-ing into a user that has nologin shell set

Users often have nologin set as their shell inside the container. To be able to execute maintenance commands within the container, we can use @su -s /bin/sh@ like this:

<pre>
su -s /bin/sh -c '/path/to/your/script' testuser
</pre>

Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell

h3. How to print a secret value

Assuming you want the "password" item from a secret, use:

<pre>
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h3. How to upgrade a kubernetes cluster

h4. General

* Should be done every X months to stay up-to-date
** X probably something like 3-6
* kubeadm based clusters
* Needs specific kubeadm versions for upgrade
* Follow instructions on https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
* Finding releases: https://github.com/kubernetes/kubernetes/tree/master/CHANGELOG

h4. Getting a specific kubeadm or kubelet version

<pre>
RELEASE=v1.22.17
RELEASE=v1.23.17
RELEASE=v1.24.9
RELEASE=v1.25.9
RELEASE=v1.26.6
RELEASE=v1.27.2

ARCH=amd64

curl -L --remote-name-all https://dl.k8s.io/release/${RELEASE}/bin/linux/${ARCH}/{kubeadm,kubelet}
chmod u+x kubeadm kubelet
</pre>

h4. Steps

* kubeadm upgrade plan
** On one control plane node
* kubeadm upgrade apply vXX.YY.ZZ
** On one control plane node
* kubeadm upgrade node
** On all other control plane nodes
** On all worker nodes afterwards

Repeat for all control plane nodes. Then upgrade the kubelet on all other nodes via the package manager. A condensed example run is sketched below.

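A condensed sketch of one upgrade round; the version and node names are placeholders and the kubelet upgrade step depends on how the binary/package is installed on the node:

<pre>
# on the first control plane node
kubeadm upgrade plan
kubeadm upgrade apply v1.27.2

# on every other control plane node, then on each worker node
kubeadm upgrade node

# per node: drain, replace the kubelet, restart it, uncordon
kubectl drain serverXX --ignore-daemonsets --delete-emptydir-data
# ... install the matching kubelet binary/package and restart the kubelet service ...
kubectl uncordon serverXX
</pre>
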
h4. Upgrading to 1.26.6

* https://v1-26.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

h4. Upgrade to kubernetes 1.27

* https://v1-27.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
* kubelet will not start anymore
* reason: @"command failed" err="failed to parse kubelet flag: unknown flag: --container-runtime"@
* /var/lib/kubelet/kubeadm-flags.env contains that parameter
* remove it and start kubelet again (see the sketch below)

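A minimal sketch of that fix, assuming an Alpine/OpenRC node (use @systemctl restart kubelet@ on systemd based hosts); the exact flag value found in the file may differ:

<pre>
# remove the no longer supported flag from the kubelet flags file
sed -i 's/--container-runtime=remote *//' /var/lib/kubelet/kubeadm-flags.env
rc-service kubelet restart
</pre>
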
h4. Upgrade to kubernetes 1.28

* https://v1-28.docs.kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

h4. Upgrade to crio 1.27: missing crun

Error message:

<pre>
level=fatal msg="validating runtime config: runtime validation: \"crun\" not found in $PATH: exec: \"crun\": executable file not found in $PATH"
</pre>

Fix:

<pre>
apk add crun
</pre>

h2. Reference CNI

* Mainly "stupid", but effective plugins
* Main documentation on https://www.cni.dev/plugins/current/
* Plugins
** bridge
*** Can create the bridge on the host
*** But seems not to be able to add host interfaces to it as well
*** Has support for vlan tags
** vlan
*** creates a vlan-tagged sub-interface on the host
*** "It's a 1:1 mapping (i.e. no bridge in between)":https://github.com/k8snetworkplumbingwg/multus-cni/issues/569
** host-device
*** moves the interface from the host into the container
*** very easy for physical connections to containers
** ipvlan
*** "virtualisation" of a host device
*** routing based on IP
*** Same MAC for everyone
*** Cannot reach the master interface
** macvlan
*** With mac addresses
*** Supports various modes (to be checked)
** ptp ("point to point")
*** Creates a host device and connects it to the container
** win*
*** Windows implementations

h2. Calico CNI

h3. Calico Installation

* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* This has the following advantages:
** Easy to upgrade
** Does not require the OS to configure IPv6/dual-stack settings, as the tigera operator figures things out on its own

Usually plain calico can be installed directly using:

<pre>
VERSION=v3.25.0

helm repo add projectcalico https://docs.projectcalico.org/charts
helm repo update
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
</pre>

* Check the tags on https://github.com/projectcalico/calico/tags for the latest release

h3. Installing calicoctl

* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install

To be able to manage and configure calico, we need to
"install calicoctl (we choose to run it as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod

<pre>
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
</pre>

Or version specific:

<pre>
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml

# For 3.22
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
</pre>

And making it more easily accessible via an alias:

<pre>
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
</pre>

h3. Calico configuration

By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
with an upstream router to propagate podcidr and servicecidr.

Default settings in our infrastructure:

* We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
* We use private ASNs for k8s clusters
* We do *not* use any overlay

After installing calico and calicoctl the last step of the installation is usually:

<pre>
calicoctl create -f - < calico-bgp.yaml
</pre>

A sample BGP configuration:

<pre>
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65534
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:10:3::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:10:3::/108
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: router1-place10
spec:
  peerIP: 2a0a:e5c0:10:1::50
  asNumber: 213081
  keepOriginalNextHop: true
</pre>

h2. Cilium CNI (experimental)

h3. Status

*NO WORKING CILIUM CONFIGURATION FOR IPV6-ONLY MODES*

h3. Latest error

It seems cilium does not run on IPv6-only hosts:

<pre>
level=info msg="Validating configured node address ranges" subsys=daemon
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
level=info msg="Starting IP identity watcher" subsys=ipcache
</pre>

It crashes after that log entry.

h3. BGP configuration

* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
* Creating the bgp config beforehand as a configmap is thus required.

The error one gets without the configmap present:

Pods are hanging with:

<pre>
cilium-bpqm6                       0/1     Init:0/4            0             9s
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
</pre>

The error message in the cilium-operator is:

<pre>
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
</pre>

A correct bgp config looks like this:

<pre>
apiVersion: v1
kind: ConfigMap
metadata:
  name: bgp-config
  namespace: kube-system
data:
  config.yaml: |
    peers:
      - peer-address: 2a0a:e5c0::46
        peer-asn: 209898
        my-asn: 65533
      - peer-address: 2a0a:e5c0::47
        peer-asn: 209898
        my-asn: 65533
    address-pools:
      - name: default
        protocol: bgp
        addresses:
          - 2a0a:e5c0:0:14::/64
</pre>

h3. Installation

Adding the repo:

<pre>
helm repo add cilium https://helm.cilium.io/
helm repo update
</pre>

Installing + configuring cilium:

<pre>
ipv6pool=2a0a:e5c0:0:14::/112

version=1.12.2

helm upgrade --install cilium cilium/cilium --version $version \
  --namespace kube-system \
  --set ipv4.enabled=false \
  --set ipv6.enabled=true \
  --set enableIPv6Masquerade=false \
  --set bgpControlPlane.enabled=true

#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool

# Old style bgp?
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \

# Show possible configuration options
helm show values cilium/cilium
</pre>

Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:

<pre>
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
</pre>

See also https://github.com/cilium/cilium/issues/20756

Seems a /112 is actually working.

h3. Kernel modules

Cilium requires the following modules to be loaded on the host (not loaded by default):

<pre>
modprobe ip6table_raw
modprobe ip6table_filter
</pre>

h3. Interesting helm flags

* autoDirectNodeRoutes
* bgpControlPlane.enabled = true

h3. SEE ALSO

* https://docs.cilium.io/en/v1.12/helm-reference/

h2. Multus

* https://github.com/k8snetworkplumbingwg/multus-cni
* Installing a deployment w/ CRDs

<pre>
VERSION=v4.0.1

kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/${VERSION}/deployments/multus-daemonset-crio.yml
</pre>

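Once multus runs, additional networks are defined as @NetworkAttachmentDefinition@ objects embedding a CNI config. A minimal sketch using the vlan plugin listed above; the name, master interface, VLAN id and IPAM type are placeholders:

<pre>
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vlan100-example
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "vlan",
      "master": "eth0",
      "vlanId": 100,
      "ipam": { "type": "static" }
    }'
</pre>

A pod can then reference it via the @k8s.v1.cni.cncf.io/networks: vlan100-example@ annotation.
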
h2. ArgoCD

h3. Argocd Installation

* See https://argo-cd.readthedocs.io/en/stable/

As there is no configuration management present yet, argocd is installed using:

<pre>
kubectl create namespace argocd

# OR: latest stable
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# OR Specific Version
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml
</pre>

h3. Get the argocd credentials

<pre>
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h3. Accessing argocd

In regular IPv6 clusters:

* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN

In legacy IPv4 clusters:

<pre>
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
</pre>

* Navigate to https://localhost:8080

h3. Using the argocd webhook to trigger changes

* To trigger changes, POST JSON (a git provider push event) to https://argocd.example.com/api/webhook, as sketched below

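A minimal sketch of such a call, assuming a GitHub/Gitea style push payload (the header and payload fields shown here are illustrative; the git provider sends the real ones):

<pre>
curl -X POST https://argocd.example.com/api/webhook \
  -H 'Content-Type: application/json' \
  -H 'X-GitHub-Event: push' \
  -d '{
        "ref": "refs/heads/master",
        "repository": {
          "html_url": "https://code.ungleich.ch/ungleich-intern/k8s-config"
        }
      }'
</pre>
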
h3. Deploying an application

* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
* Always include the *redmine-url* pointing to the (customer) ticket
** Also add the support-url if it exists

Application sample

<pre>
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitea-CUSTOMER
  namespace: argocd
spec:
  destination:
    namespace: default
    server: 'https://kubernetes.default.svc'
  source:
    path: apps/prod/gitea
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
    targetRevision: HEAD
    helm:
      parameters:
        - name: storage.data.storageClass
          value: rook-ceph-block-hdd
        - name: storage.data.size
          value: 200Gi
        - name: storage.db.storageClass
          value: rook-ceph-block-ssd
        - name: storage.db.size
          value: 10Gi
        - name: storage.letsencrypt.storageClass
          value: rook-ceph-block-hdd
        - name: storage.letsencrypt.size
          value: 50Mi
        - name: letsencryptStaging
          value: 'no'
        - name: fqdn
          value: 'code.verua.online'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  info:
    - name: 'redmine-url'
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
    - name: 'support-url'
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
</pre>

h2. Helm related operations and conventions

We use helm charts extensively.

* In production, they are managed via argocd
* In development, helm charts can be developed and deployed manually using the helm utility.

h3. Installing a helm chart

One can use the usual pattern of

<pre>
helm install <releasename> <chartdirectory>
</pre>

However, you often want to reinstall/update when testing helm charts. The following pattern is "better", because it allows you to reinstall even if the release is already installed:

<pre>
helm upgrade --install <releasename> <chartdirectory>
</pre>

h3. Naming services and deployments in helm charts [Application labels]

* We always have {{ .Release.Name }} to identify the current "instance"
* Deployments:
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ... (see the sketch below)
* See more about standard labels on
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
** https://helm.sh/docs/chart_best_practices/labels/

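A minimal sketch of that naming/labelling convention in a chart template (the nginx deployment is only an example):

<pre>
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
</pre>
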
h3. Show all versions of a helm chart

<pre>
helm search repo -l repo/chart
</pre>

For example:

<pre>
% helm search repo -l projectcalico/tigera-operator
NAME                         	CHART VERSION	APP VERSION	DESCRIPTION
projectcalico/tigera-operator	v3.23.3      	v3.23.3    	Installs the Tigera operator for Calico
projectcalico/tigera-operator	v3.23.2      	v3.23.2    	Installs the Tigera operator for Calico
....
</pre>

h3. Show possible values of a chart

<pre>
helm show values <repo/chart>
</pre>

Example:

<pre>
helm show values ingress-nginx/ingress-nginx
</pre>

h3. Download a chart

For instance for checking it out locally, use:

<pre>
helm pull <repo/chart>
</pre>

h2. Rook + Ceph

h3. Installation

* Usually directly via argocd

h3. Executing ceph commands

Using the ceph-tools pod as follows:

<pre>
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
</pre>

h3. Inspecting the logs of a specific server

<pre>
# Get the related pods
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
...

# Inspect the logs of a specific pod
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
</pre>

h3. Inspecting the logs of the rook-ceph-operator

<pre>
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
</pre>

h3. Restarting the rook operator

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

h3. Triggering server prepare / adding new osds

The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re scan", simply delete that pod:

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.

h3. Removing an OSD

* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
* Then delete the related deployment

Set the OSD id in the osd-purge.yaml and apply it. The OSD should be down beforehand.

<pre>
apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-purge-osd
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-purge-osd
spec:
  template:
    metadata:
      labels:
        app: rook-ceph-purge-osd
    spec:
      serviceAccountName: rook-ceph-purge-osd
      containers:
        - name: osd-removal
          image: rook/ceph:master
          # TODO: Insert the OSD ID in the last parameter that is to be removed
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
          #
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
          # removal could lead to data loss.
          args:
            - "ceph"
            - "osd"
            - "remove"
            - "--preserve-pvc"
            - "false"
            - "--force-osd-removal"
            - "false"
            - "--osd-ids"
            - "SETTHEOSDIDHERE"
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_MON_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  key: data
                  name: rook-ceph-mon-endpoints
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  key: ceph-username
                  name: rook-ceph-mon
            - name: ROOK_CEPH_SECRET
              valueFrom:
                secretKeyRef:
                  key: ceph-secret
                  name: rook-ceph-mon
            - name: ROOK_CONFIG_DIR
              value: /var/lib/rook
            - name: ROOK_CEPH_CONFIG_OVERRIDE
              value: /etc/rook/config/override.conf
            - name: ROOK_FSID
              valueFrom:
                secretKeyRef:
                  key: fsid
                  name: rook-ceph-mon
            - name: ROOK_LOG_LEVEL
              value: DEBUG
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-conf-emptydir
            - mountPath: /var/lib/rook
              name: rook-config
      volumes:
        - emptyDir: {}
          name: ceph-conf-emptydir
        - emptyDir: {}
          name: rook-config
      restartPolicy: Never

</pre>

Deleting the deployment:

<pre>
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted
</pre>

h3. Placement of mons/osds/etc.

See https://rook.io/docs/rook/v1.11/CRDs/Cluster/ceph-cluster-crd/#placement-configuration-settings. A rough sketch is shown below.

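A rough sketch of such a placement stanza inside the CephCluster resource, assuming nodes carry a hypothetical @role=storage-node@ label:

<pre>
spec:
  placement:
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
            - matchExpressions:
                - key: role
                  operator: In
                  values:
                    - storage-node
</pre>
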
h2. Ingress + Cert Manager

* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
* We deploy "cert-manager":https://cert-manager.io/ to handle certificates
* We independently deploy a @ClusterIssuer@ so that the cert-manager app can deploy first and the issuer is created once the CRDs from cert-manager are in place (see the sketch below)

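A minimal @ClusterIssuer@ sketch using the ACME http01 solver; the issuer name, email and ingress class are placeholders:

<pre>
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    email: admin@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
      - http01:
          ingress:
            class: nginx
</pre>
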
h3. IPv4 reachability 

The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.

Steps:

h4. Get the ingress IPv6 address

Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@

Example:

<pre>
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
2a0a:e5c0:10:1b::ce11
</pre>

h4. Add NAT64 mapping

* Update the __dcl_jool_siit cdist type
* Record the two IPs (IPv6 and IPv4)
* Configure all routers

h4. Add DNS record

To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:

<pre>
; k8s ingress for dev
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
dev-ingress                 A 147.78.194.23
</pre>

h4. Add supporting wildcard DNS

If you plan to add various sites under a specific domain, you can add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:

<pre>
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
</pre>

h2. Harbor

* We use "Harbor":https://goharbor.io/ as an image registry for our own images. Internal app reference: apps/prod/harbor.
* The admin password is in the password store; it is Harbor12345 by default
* At the moment harbor only authenticates against the internal ldap tree

h3. LDAP configuration

* The url needs to be ldaps://...
* uid = uid
* The rest remains at the standard settings

h2. Monitoring / Prometheus

* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/

Access via the in-cluster service URLs (see the port-forward sketch below for access from a workstation):

* http://prometheus-k8s.monitoring.svc:9090
* http://grafana.monitoring.svc:3000
* http://alertmanager.monitoring.svc:9093

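For access from a workstation, a minimal port-forward sketch (service names assumed to be the kube-prometheus defaults in the @monitoring@ namespace):

<pre>
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
kubectl -n monitoring port-forward svc/grafana 3000:3000
kubectl -n monitoring port-forward svc/alertmanager-main 9093:9093
</pre>
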
h3. Prometheus Options

* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
** Includes dashboards and co.
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
** Includes dashboards and co.
* "Prometheus Operator (mainly CRD manifests)":https://github.com/prometheus-operator/prometheus-operator

h3. Grafana default password

* If not changed: @prom-operator@

h2. Nextcloud

h3. How to get the nextcloud credentials

* The initial username is set to "nextcloud"
* The password is autogenerated and saved in a kubernetes secret

<pre>
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo ""
</pre>

h3. How to fix "Access through untrusted domain"

* Nextcloud stores the initial domain configuration
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
* To fix, edit /var/www/html/config/config.php and correct the domain
* Then delete the pods (a sketch of both steps follows below)

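A minimal sketch of those two steps; namespace, pod and domain names are placeholders:

<pre>
# correct the trusted domain inside the running pod
kubectl -n NAMESPACE exec -ti NEXTCLOUD-POD -- \
  sed -i "s/old.example.com/new.example.com/" /var/www/html/config/config.php

# then delete the pod(s) so the deployment recreates them
kubectl -n NAMESPACE delete pod NEXTCLOUD-POD
</pre>
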
h3. Running occ commands inside the nextcloud container

* Find the pod in the right namespace

Exec:

<pre>
su www-data -s /bin/sh -c ./occ
</pre>

* -s /bin/sh is needed as the default shell is set to /bin/false

h4. Rescanning files

* If files have been added without nextcloud's knowledge

<pre>
su www-data -s /bin/sh -c "./occ files:scan --all"
</pre>

h2. Infrastructure versions

h3. ungleich kubernetes infrastructure v5 (2021-10)

Clusters are configured / set up in this order:

* Bootstrap via kubeadm
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
** "rook for storage via argocd":https://rook.io/
** haproxy for IPv4-to-IPv6 proxying into the IPv6-only cluster, via argocd
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot

h3. ungleich kubernetes infrastructure v4 (2021-09)

* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
* The rook operator is still being installed via helm

h3. ungleich kubernetes infrastructure v3 (2021-07)

* rook is now installed via helm via argocd instead of directly via manifests

h3. ungleich kubernetes infrastructure v2 (2021-05)

* Replaced fluxv2 from ungleich k8s v1 with argocd
** argocd can apply helm templates directly without needing to go through Chart releases
* We are also using argoflow for build flows
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building

h3. ungleich kubernetes infrastructure v1 (2021-01)

We are using the following components:

* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
** Needed for basic networking
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
** Needed so that secrets are not stored in the git repository, but only in the cluster
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
** Needed to get letsencrypt certificates for services
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
** rbd for almost everything, *ReadWriteOnce*
** cephfs for smaller things, multi access *ReadWriteMany*
** Needed for providing persistent storage
* "flux v2":https://fluxcd.io/
** Needed to manage resources automatically