
1 22 Nico Schottelius
h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual
2 1 Nico Schottelius
3 3 Nico Schottelius
{{toc}}
4
5 1 Nico Schottelius
h2. Status
6
7 28 Nico Schottelius
This document is **pre-production**.
8
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.
9 1 Nico Schottelius
10 10 Nico Schottelius
h2. k8s clusters
11
12 123 Nico Schottelius
| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
13
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
14
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
15
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
16
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
17
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
18
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
19
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
20
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
21
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
22
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
23 184 Nico Schottelius
| [[p6-cow.k8s.ooo]] | production        |            | server134 server135 server136 | "argo":https://argocd-server.argocd.svc.p6in10.k8s.ooo | ?             |    2023-05-17 |
24 177 Nico Schottelius
| [[p10.k8s.ooo]]    | production        |            | server131 server132 server133 | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
25 123 Nico Schottelius
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
26
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
27 164 Nico Schottelius
| [[r1r2p15k8sooo|r1.p15.k8s.ooo]] | production | Nico | server120 | | | 2022-10-30 |
28
| [[r1r2p15k8sooo|r2.p15.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
29 162 Nico Schottelius
| [[r1r2p10k8sooo|r1.p10.k8s.ooo]] | production | Nico | server122 | | | 2022-10-30 |
30
| [[r1r2p10k8sooo|r2.p10.k8s.ooo]] | production | Nico | server123 | | | 2022-10-15 |
31
| [[r1r2p5k8sooo|r1.p5.k8s.ooo]] | production | Nico | server137 | | | 2022-10-30 |
32
| [[r1r2p5k8sooo|r2.p5.k8s.ooo]] | production | Nico | server138 | | | 2022-10-30 |
33
| [[r1r2p6k8sooo|r1.p6.k8s.ooo]] | production | Nico | server139 | | | 2022-10-30 |
34
| [[r1r2p6k8sooo|r2.p6.k8s.ooo]] | production | Nico | server140 | | | 2022-10-30 |
35 21 Nico Schottelius
36 1 Nico Schottelius
h2. General architecture and components overview
37
38
* All k8s clusters are IPv6 only
39
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
40
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
41 18 Nico Schottelius
** Private configurations are found in the **k8s-config** repository
42 1 Nico Schottelius
43
h3. Cluster types
44
45 28 Nico Schottelius
| **Type/Feature**            | **Development**                | **Production**         |
46
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
47
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
48
| Separation of control plane | optional                       | recommended            |
49
| Persistent storage          | required                       | required               |
50
| Number of storage monitors  | 3                              | 5                      |
51 1 Nico Schottelius
52 43 Nico Schottelius
h2. General k8s operations
53 1 Nico Schottelius
54 46 Nico Schottelius
h3. Cheat sheet / external great references
55
56
* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/
57
58 117 Nico Schottelius
h3. Allowing workloads to be scheduled on the control plane / removing node taints
59 69 Nico Schottelius
60
* Mostly for single node / test / development clusters
61
* Just remove the master taint as follows
62
63
<pre>
64
kubectl taint nodes --all node-role.kubernetes.io/master-
65 118 Nico Schottelius
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
66 69 Nico Schottelius
</pre>
67 1 Nico Schottelius
68 117 Nico Schottelius
You can check the node taints using @kubectl describe node ...@
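For instance, to quickly check whether a node still carries a control plane taint (server47 is just an example node name):

<pre>
kubectl describe node server47 | grep -i taint
</pre>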
69 69 Nico Schottelius
70 44 Nico Schottelius
h3. Get the cluster admin.conf
71
72
* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
73
* To administer the cluster you can copy the admin.conf to your local machine
74
* Multi-cluster debugging becomes much easier if you name the config ~/cX-admin.conf (see example below)
75
76
<pre>
77
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
78
% export KUBECONFIG=~/c2-admin.conf    
79
% kubectl get nodes
80
NAME       STATUS                     ROLES                  AGE   VERSION
81
server47   Ready                      control-plane,master   82d   v1.22.0
82
server48   Ready                      control-plane,master   82d   v1.22.0
83
server49   Ready                      <none>                 82d   v1.22.0
84
server50   Ready                      <none>                 82d   v1.22.0
85
server59   Ready                      control-plane,master   82d   v1.22.0
86
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
87
server61   Ready                      <none>                 82d   v1.22.0
88
server62   Ready                      <none>                 82d   v1.22.0               
89
</pre>
90
91 18 Nico Schottelius
h3. Installing a new k8s cluster
92 8 Nico Schottelius
93 9 Nico Schottelius
* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
94 28 Nico Schottelius
** Using pXX.k8s.ooo for production clusters of placeXX
95 9 Nico Schottelius
* Use cdist to configure the nodes with requirements like crio
96
* Decide between single or multi node control plane setups (see below)
97 28 Nico Schottelius
** A single control plane is suitable for development clusters
98 9 Nico Schottelius
99 28 Nico Schottelius
Typical init procedure:
100 9 Nico Schottelius
101 28 Nico Schottelius
* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
102
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@
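A minimal sketch of what such a @bootstrap/XXX/kubeadm.yaml@ could look like; all values (cluster name, API endpoint, CIDRs, kubernetes version) are placeholders, not our actual configuration:

<pre>
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.26.6
clusterName: cX.k8s.ooo
controlPlaneEndpoint: "cX-api.k8s.ooo:6443"
networking:
  # placeholder IPv6 podcidr / servicecidr
  podSubnet: "2a0a:e5c0:XXXX:YYYY::/64"
  serviceSubnet: "2a0a:e5c0:XXXX:ZZZZ::/108"
</pre>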
103 10 Nico Schottelius
104 29 Nico Schottelius
h3. Deleting a pod that is hanging in terminating state
105
106
<pre>
107
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
108
</pre>
109
110
(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)
111
112 42 Nico Schottelius
h3. Listing nodes of a cluster
113
114
<pre>
115
[15:05] bridge:~% kubectl get nodes
116
NAME       STATUS   ROLES                  AGE   VERSION
117
server22   Ready    <none>                 52d   v1.22.0
118
server23   Ready    <none>                 52d   v1.22.2
119
server24   Ready    <none>                 52d   v1.22.0
120
server25   Ready    <none>                 52d   v1.22.0
121
server26   Ready    <none>                 52d   v1.22.0
122
server27   Ready    <none>                 52d   v1.22.0
123
server63   Ready    control-plane,master   52d   v1.22.0
124
server64   Ready    <none>                 52d   v1.22.0
125
server65   Ready    control-plane,master   52d   v1.22.0
126
server66   Ready    <none>                 52d   v1.22.0
127
server83   Ready    control-plane,master   52d   v1.22.0
128
server84   Ready    <none>                 52d   v1.22.0
129
server85   Ready    <none>                 52d   v1.22.0
130
server86   Ready    <none>                 52d   v1.22.0
131
</pre>
132
133 41 Nico Schottelius
h3. Removing / draining a node
134
135
Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:
136
137 1 Nico Schottelius
<pre>
138 103 Nico Schottelius
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
139 42 Nico Schottelius
</pre>
140
141
h3. Readding a node after draining
142
143
<pre>
144
kubectl uncordon serverXX
145 1 Nico Schottelius
</pre>
146 43 Nico Schottelius
147 50 Nico Schottelius
h3. (Re-)joining worker nodes after creating the cluster
148 49 Nico Schottelius
149
* We need to have an up-to-date token
150
* We use different join commands for the workers and control plane nodes
151
152
Generating the join command on an existing control plane node:
153
154
<pre>
155
kubeadm token create --print-join-command
156
</pre>
157
158 50 Nico Schottelius
h3. (Re-)joining control plane nodes after creating the cluster
159 1 Nico Schottelius
160 50 Nico Schottelius
* We generate the token again
161
* We upload the certificates
162
* We need to combine/create the join command for the control plane node
163
164
Example session:
165
166
<pre>
167
% kubeadm token create --print-join-command
168
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash 
169
170
% kubeadm init phase upload-certs --upload-certs
171
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
172
[upload-certs] Using certificate key:
173
CERTKEY
174
175
# Then we use these two outputs on the joining node:
176
177
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
178
</pre>
179
180
Commands to be used on a control plane node:
181
182
<pre>
183
kubeadm token create --print-join-command
184
kubeadm init phase upload-certs --upload-certs
185
</pre>
186
187
Commands to be used on the joining node:
188
189
<pre>
190
JOINCOMMAND --control-plane --certificate-key CERTKEY
191
</pre>
192 49 Nico Schottelius
193 51 Nico Schottelius
SEE ALSO
194
195
* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
196
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/
197
198 53 Nico Schottelius
h3. How to fix etcd does not start when rejoining a kubernetes cluster as a control plane
199 52 Nico Schottelius
200
If during the above step etcd does not come up, @kubeadm join@ can hang as follows:
201
202
<pre>
203
[control-plane] Creating static Pod manifest for "kube-apiserver"                                                              
204
[control-plane] Creating static Pod manifest for "kube-controller-manager"                                                     
205
[control-plane] Creating static Pod manifest for "kube-scheduler"                                                              
206
[check-etcd] Checking that the etcd cluster is healthy                                                                         
207
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:37
208
8a]:2379 with maintenance client: context deadline exceeded                                                                    
209
To see the stack trace of this error execute with --v=5 or higher         
210
</pre>
211
212
In that case the node is likely still registered as a member of the etcd cluster. We first need to remove it from the etcd cluster; afterwards the join works.
213
214
To fix this we do:
215
216
* Find a working etcd pod
217
* Find the etcd members / member list
218
* Remove the etcd member that we want to re-join the cluster
219
220
221
<pre>
222
# Find the etcd pods
223
kubectl -n kube-system get pods -l component=etcd,tier=control-plane
224
225
# Get the list of etcd servers with the member id 
226
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
227
228
# Remove the member
229
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
230
</pre>
231
232
Sample session:
233
234
<pre>
235
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
236
NAME            READY   STATUS    RESTARTS     AGE
237
etcd-server63   1/1     Running   0            3m11s
238
etcd-server65   1/1     Running   3            7d2h
239
etcd-server83   1/1     Running   8 (6d ago)   7d2h
240
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
241
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
242
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
243
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false
244
245
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
246
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
247 1 Nico Schottelius
248
</pre>
249
250
SEE ALSO
251
252
* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster
253 56 Nico Schottelius
254 147 Nico Schottelius
h3. Node labels (adding, showing, removing)
255
256
Listing the labels:
257
258
<pre>
259
kubectl get nodes --show-labels
260
</pre>
261
262
Adding labels:
263
264
<pre>
265
kubectl label nodes LIST-OF-NODES label1=value1 
266
267
</pre>
268
269
For instance:
270
271
<pre>
272
kubectl label nodes router2 router3 hosttype=router 
273
</pre>
274
275
Selecting nodes in pods:
276
277
<pre>
278
apiVersion: v1
279
kind: Pod
280
...
281
spec:
282
  nodeSelector:
283
    hosttype: router
284
</pre>
285
286 148 Nico Schottelius
Remove a label by appending a minus to the label name:
287
288
<pre>
289
kubectl label node <nodename> <labelname>-
290
</pre>
291
292
For instance:
293
294
<pre>
295
kubectl label nodes router2 router3 hosttype- 
296
</pre>
297
298 147 Nico Schottelius
SEE ALSO
299 1 Nico Schottelius
300 148 Nico Schottelius
* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
301
* https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api
302 147 Nico Schottelius
303 101 Nico Schottelius
h3. Hardware Maintenance using ungleich-hardware
304
305
Use the following manifest and replace the HOST with the actual host:
306
307
<pre>
308
apiVersion: v1
309
kind: Pod
310
metadata:
311
  name: ungleich-hardware-HOST
312
spec:
313
  containers:
314
  - name: ungleich-hardware
315
    image: ungleich/ungleich-hardware:0.0.5
316
    args:
317
    - sleep
318
    - "1000000"
319
    volumeMounts:
320
      - mountPath: /dev
321
        name: dev
322
    securityContext:
323
      privileged: true
324
  nodeSelector:
325
    kubernetes.io/hostname: "HOST"
326
327
  volumes:
328
    - name: dev
329
      hostPath:
330
        path: /dev
331
</pre>
332
333 102 Nico Schottelius
Also see: [[The_ungleich_hardware_maintenance_guide]]
334
335 105 Nico Schottelius
h3. Triggering a cronjob / creating a job from a cronjob
336 104 Nico Schottelius
337
To test a cronjob, we can create a job from a cronjob:
338
339
<pre>
340
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
341
</pre>
342
343
This creates a job @volume2-manual@ based on the cronjob @volume2-daily-backup@.
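Afterwards the job and its pod can be inspected like any other job, for instance:

<pre>
kubectl get jobs
kubectl logs job/volume2-manual
</pre>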
344
345 112 Nico Schottelius
h3. su-ing into a user that has nologin shell set
346
347
Often users have @nologin@ set as their shell inside the container. To be able to execute maintenance commands within the
348
container, we can use @su -s /bin/sh@ like this:
349
350
<pre>
351
su -s /bin/sh -c '/path/to/your/script' testuser
352
</pre>
353
354
Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell
355
356 113 Nico Schottelius
h3. How to print a secret value
357
358
Assuming you want the "password" item from a secret, use:
359
360
<pre>
361
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo "" 
362
</pre>
363
364 173 Nico Schottelius
h3. How to upgrade a kubernetes cluster
365 172 Nico Schottelius
366
h4. General
367
368
* Should be done every X months to stay up-to-date
369
** X probably something like 3-6
370
* kubeadm based clusters
371
* Needs specific kubeadm versions for upgrade
372
* Follow instructions on https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
373
374
h4. Getting a specific kubeadm or kubelet version
375
376
<pre>
377
RELEASE=v1.24.9
378 181 Nico Schottelius
RELEASE=v1.25.9
379 1 Nico Schottelius
RELEASE=v1.23.17
380 187 Nico Schottelius
RELEASE=v1.26.6
381
ARCH=amd64
382 172 Nico Schottelius
383
curl -L --remote-name-all https://dl.k8s.io/release/${RELEASE}/bin/linux/${ARCH}/{kubeadm,kubelet}
384 182 Nico Schottelius
chmod u+x kubeadm kubelet
385 172 Nico Schottelius
</pre>
386
387
h4. Steps
388
389
* kubeadm upgrade plan
390
** On one control plane node
391
* kubeadm upgrade apply vXX.YY.ZZ
392
** On one control plane node
393
394 173 Nico Schottelius
Repeat for all control plane nodes. Then upgrade the kubelet on all other nodes via the package manager (see the sketch below).
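A sketch of upgrading the kubelet on a worker node, assuming an Alpine node where the kubelet comes as a distribution package and runs under OpenRC (package and service names are assumptions; adapt to the actual setup):

<pre>
# on the control machine: move workloads away
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX

# on serverXX: upgrade and restart the kubelet
apk add --upgrade kubelet
rc-service kubelet restart

# on the control machine: re-enable scheduling and verify the version
kubectl uncordon serverXX
kubectl get nodes
</pre>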
395 172 Nico Schottelius
396 186 Nico Schottelius
h4. Upgrade to kubernetes 1.27
397
398
* kubelet will not start anymore
399
* reason: @"command failed" err="failed to parse kubelet flag: unknown flag: --container-runtime"@
400
* /var/lib/kubelet/kubeadm-flags.env contains that parameter
401
* remove it, then start kubelet (see the sketch below)
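A sketch of the fix, assuming the flag appears as @--container-runtime=remote@ in the file (verify the file content first) and kubelet runs under OpenRC:

<pre>
sed -i 's/--container-runtime=remote //' /var/lib/kubelet/kubeadm-flags.env
rc-service kubelet restart
</pre>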
402
403
h4. Upgrade to crio 1.27: missing crun
404
405
Error message:
406
407
<pre>
408
level=fatal msg="validating runtime config: runtime validation: \"crun\" not found in $PATH: exec: \"crun\": executable file not found in $PATH"
409
</pre>
410
411
Fix:
412
413
<pre>
414
apk add crun
415
</pre>
416
417
418 157 Nico Schottelius
h2. Reference CNI
419
420
* Mainly "stupid", but effective plugins (see the sample configuration after this list)
421
* Main documentation on https://www.cni.dev/plugins/current/
422 158 Nico Schottelius
* Plugins
423
** bridge
424
*** Can create the bridge on the host
425
*** But it does not seem to be able to also add host interfaces to it
426
*** Has support for vlan tags
427
** vlan
428
*** creates vlan tagged sub interface on the host
429 160 Nico Schottelius
*** "It's a 1:1 mapping (i.e. no bridge in between)":https://github.com/k8snetworkplumbingwg/multus-cni/issues/569
430 158 Nico Schottelius
** host-device
431
*** moves the interface from the host into the container
432
*** very easy for physical connections to containers
433 159 Nico Schottelius
** ipvlan
434
*** "virtualisation" of a host device
435
*** routing based on IP
436
*** Same MAC for everyone
437
*** Cannot reach the master interface
438
** macvlan
439
*** With mac addresses
440
*** Supports various modes (to be checked)
441
** ptp ("point to point")
442
*** Creates a host device and connects it to the container
443
** win*
444 158 Nico Schottelius
*** Windows implementations
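A minimal sketch of a vlan plugin configuration, as it could be dropped into @/etc/cni/net.d/@ or referenced from multus; the interface name, VLAN id and address are placeholders:

<pre>
{
  "cniVersion": "0.3.1",
  "name": "vlan100-example",
  "type": "vlan",
  "master": "eth0",
  "vlanId": 100,
  "ipam": {
    "type": "static",
    "addresses": [
      { "address": "2a0a:e5c0:XXXX::10/64" }
    ]
  }
}
</pre>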
445 157 Nico Schottelius
446 62 Nico Schottelius
h2. Calico CNI
447
448
h3. Calico Installation
449
450
* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
451
* This has the following advantages:
452
** Easy to upgrade
453
** Does not require us to configure IPv6/dual-stack settings, as the tigera operator figures things out on its own
454
455
Usually plain calico can be installed directly using:
456
457
<pre>
458 174 Nico Schottelius
VERSION=v3.25.0
459 149 Nico Schottelius
460 1 Nico Schottelius
helm repo add projectcalico https://docs.projectcalico.org/charts
461 167 Nico Schottelius
helm repo update
462 124 Nico Schottelius
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
463 1 Nico Schottelius
</pre>
464 92 Nico Schottelius
465
* Check the tags on https://github.com/projectcalico/calico/tags for the latest release
466 62 Nico Schottelius
467
h3. Installing calicoctl
468
469 115 Nico Schottelius
* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install
470
471 62 Nico Schottelius
To be able to manage and configure calico, we need to
472
"install calicoctl (we choose the variant that runs as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod
473
474
<pre>
475
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
476
</pre>
477
478 93 Nico Schottelius
Or version specific:
479
480
<pre>
481
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml
482 97 Nico Schottelius
483
# For 3.22
484
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
485 93 Nico Schottelius
</pre>
486
487 70 Nico Schottelius
And making it easier to access via an alias:
488
489
<pre>
490
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
491
</pre>
492
493 62 Nico Schottelius
h3. Calico configuration
494
495 63 Nico Schottelius
By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
496
with an upstream router to propagate podcidr and servicecidr.
497 62 Nico Schottelius
498
Default settings in our infrastructure:
499
500
* We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
501
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
502 1 Nico Schottelius
* We use private ASNs for k8s clusters
503 63 Nico Schottelius
* We do *not* use any overlay
504 62 Nico Schottelius
505
After installing calico and calicoctl the last step of the installation is usually:
506
507 1 Nico Schottelius
<pre>
508 79 Nico Schottelius
calicoctl create -f - < calico-bgp.yaml
509 62 Nico Schottelius
</pre>
510
511
512
A sample BGP configuration:
513
514
<pre>
515
---
516
apiVersion: projectcalico.org/v3
517
kind: BGPConfiguration
518
metadata:
519
  name: default
520
spec:
521
  logSeverityScreen: Info
522
  nodeToNodeMeshEnabled: true
523
  asNumber: 65534
524
  serviceClusterIPs:
525
  - cidr: 2a0a:e5c0:10:3::/108
526
  serviceExternalIPs:
527
  - cidr: 2a0a:e5c0:10:3::/108
528
---
529
apiVersion: projectcalico.org/v3
530
kind: BGPPeer
531
metadata:
532
  name: router1-place10
533
spec:
534
  peerIP: 2a0a:e5c0:10:1::50
535
  asNumber: 213081
536
  keepOriginalNextHop: true
537
</pre>
538
539 126 Nico Schottelius
h2. Cilium CNI (experimental)
540
541 137 Nico Schottelius
h3. Status
542
543 138 Nico Schottelius
*NO WORKING CILIUM CONFIGURATION FOR IPv6-ONLY MODE*
544 137 Nico Schottelius
545 146 Nico Schottelius
h3. Latest error
546
547
It seems cilium does not run on IPv6 only hosts:
548
549
<pre>
550
level=info msg="Validating configured node address ranges" subsys=daemon
551
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
552
level=info msg="Starting IP identity watcher" subsys=ipcache
553
</pre>
554
555
It crashes after that log entry.
556
557 128 Nico Schottelius
h3. BGP configuration
558
559
* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
560
* Creating the bgp config beforehand as a configmap is thus required.
561
562
The error one gets without the configmap present:
563
564
Pods are hanging with:
565
566
<pre>
567
cilium-bpqm6                       0/1     Init:0/4            0             9s
568
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
569
</pre>
570
571
The error message in the cilium-operator is:
572
573
<pre>
574
Events:
575
  Type     Reason       Age                From               Message
576
  ----     ------       ----               ----               -------
577
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
578
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
579
</pre>
580
581
A correct bgp config looks like this:
582
583
<pre>
584
apiVersion: v1
585
kind: ConfigMap
586
metadata:
587
  name: bgp-config
588
  namespace: kube-system
589
data:
590
  config.yaml: |
591
    peers:
592
      - peer-address: 2a0a:e5c0::46
593
        peer-asn: 209898
594
        my-asn: 65533
595
      - peer-address: 2a0a:e5c0::47
596
        peer-asn: 209898
597
        my-asn: 65533
598
    address-pools:
599
      - name: default
600
        protocol: bgp
601
        addresses:
602
          - 2a0a:e5c0:0:14::/64
603
</pre>
604 127 Nico Schottelius
605
h3. Installation
606 130 Nico Schottelius
607 127 Nico Schottelius
Adding the repo
608 1 Nico Schottelius
<pre>
609 127 Nico Schottelius
610 129 Nico Schottelius
helm repo add cilium https://helm.cilium.io/
611 130 Nico Schottelius
helm repo update
612
</pre>
613 129 Nico Schottelius
614 135 Nico Schottelius
Installing + configuring cilium
615 129 Nico Schottelius
<pre>
616 130 Nico Schottelius
ipv6pool=2a0a:e5c0:0:14::/112
617 1 Nico Schottelius
618 146 Nico Schottelius
version=1.12.2
619 129 Nico Schottelius
620
helm upgrade --install cilium cilium/cilium --version $version \
621 1 Nico Schottelius
  --namespace kube-system \
622
  --set ipv4.enabled=false \
623
  --set ipv6.enabled=true \
624 146 Nico Schottelius
  --set enableIPv6Masquerade=false \
625
  --set bgpControlPlane.enabled=true 
626 1 Nico Schottelius
627 146 Nico Schottelius
#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool
628
629
# Old style bgp?
630 136 Nico Schottelius
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \
631 127 Nico Schottelius
632
# Show possible configuration options
633
helm show values cilium/cilium
634
635 1 Nico Schottelius
</pre>
636 132 Nico Schottelius
637
Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:
638
639
<pre>
640
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
641
</pre>
642
643 126 Nico Schottelius
644 1 Nico Schottelius
See also https://github.com/cilium/cilium/issues/20756
645 135 Nico Schottelius
646
A /112 seems to actually work.
647
648
h3. Kernel modules
649
650
Cilium requires the following modules to be loaded on the host (not loaded by default):
651
652
<pre>
653 1 Nico Schottelius
modprobe  ip6table_raw
654
modprobe  ip6table_filter
655
</pre>
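To have them loaded at boot as well, a sketch assuming an @/etc/modules@ based system such as Alpine:

<pre>
cat >> /etc/modules << EOF
ip6table_raw
ip6table_filter
EOF
</pre>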
656 146 Nico Schottelius
657
h3. Interesting helm flags
658
659
* autoDirectNodeRoutes
660
* bgpControlPlane.enabled = true
661
662
h3. SEE ALSO
663
664
* https://docs.cilium.io/en/v1.12/helm-reference/
665 133 Nico Schottelius
666 179 Nico Schottelius
h2. Multus
667 168 Nico Schottelius
668
* https://github.com/k8snetworkplumbingwg/multus-cni
669
* Installing a deployment w/ CRDs
670 150 Nico Schottelius
671 169 Nico Schottelius
<pre>
672 176 Nico Schottelius
VERSION=v4.0.1
673 169 Nico Schottelius
674 170 Nico Schottelius
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/${VERSION}/deployments/multus-daemonset-crio.yml
675
</pre>
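Once multus is running, additional networks are described as @NetworkAttachmentDefinition@ objects and referenced from pods via an annotation. A sketch using the macvlan plugin; name, interface and addressing are placeholders:

<pre>
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-example
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": { "type": "static" }
    }'
</pre>

A pod then selects this network via the annotation @k8s.v1.cni.cncf.io/networks: macvlan-example@.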
676 169 Nico Schottelius
677 122 Nico Schottelius
h2. ArgoCD 
678 56 Nico Schottelius
679 60 Nico Schottelius
h3. Argocd Installation
680 1 Nico Schottelius
681 116 Nico Schottelius
* See https://argo-cd.readthedocs.io/en/stable/
682
683 60 Nico Schottelius
As there is no configuration management present yet, argocd is installed using
684
685 1 Nico Schottelius
<pre>
686 60 Nico Schottelius
kubectl create namespace argocd
687 86 Nico Schottelius
688 96 Nico Schottelius
# Specific Version
689
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml
690 86 Nico Schottelius
691
# OR: latest stable
692 60 Nico Schottelius
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
693 56 Nico Schottelius
</pre>
694 1 Nico Schottelius
695 116 Nico Schottelius
696 1 Nico Schottelius
697 60 Nico Schottelius
h3. Get the argocd credentials
698
699
<pre>
700
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
701
</pre>
702 52 Nico Schottelius
703 87 Nico Schottelius
h3. Accessing argocd
704
705
In regular IPv6 clusters:
706
707
* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN
708
709
In legacy IPv4 clusters:
710
711
<pre>
712
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
713
</pre>
714
715 88 Nico Schottelius
* Navigate to https://localhost:8080
716
717 68 Nico Schottelius
h3. Using the argocd webhook to trigger changes
718 67 Nico Schottelius
719
* To trigger changes post json https://argocd.example.com/api/webhook
720
721 72 Nico Schottelius
h3. Deploying an application
722
723
* Applications are deployed by pushing them via git to gitea (code.ungleich.ch); argo then pulls them
724 73 Nico Schottelius
* Always include the *redmine-url* pointing to the (customer) ticket
725
** Also add the support-url if it exists
726 72 Nico Schottelius
727
Application sample
728
729
<pre>
730
apiVersion: argoproj.io/v1alpha1
731
kind: Application
732
metadata:
733
  name: gitea-CUSTOMER
734
  namespace: argocd
735
spec:
736
  destination:
737
    namespace: default
738
    server: 'https://kubernetes.default.svc'
739
  source:
740
    path: apps/prod/gitea
741
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
742
    targetRevision: HEAD
743
    helm:
744
      parameters:
745
        - name: storage.data.storageClass
746
          value: rook-ceph-block-hdd
747
        - name: storage.data.size
748
          value: 200Gi
749
        - name: storage.db.storageClass
750
          value: rook-ceph-block-ssd
751
        - name: storage.db.size
752
          value: 10Gi
753
        - name: storage.letsencrypt.storageClass
754
          value: rook-ceph-block-hdd
755
        - name: storage.letsencrypt.size
756
          value: 50Mi
757
        - name: letsencryptStaging
758
          value: 'no'
759
        - name: fqdn
760
          value: 'code.verua.online'
761
  project: default
762
  syncPolicy:
763
    automated:
764
      prune: true
765
      selfHeal: true
766
  info:
767
    - name: 'redmine-url'
768
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
769
    - name: 'support-url'
770
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
771
</pre>
772
773 80 Nico Schottelius
h2. Helm related operations and conventions
774 55 Nico Schottelius
775 61 Nico Schottelius
We use helm charts extensively.
776
777
* In production, they are managed via argocd
778
* In development, helm charts can be developed and deployed manually using the helm utility.
779
780 55 Nico Schottelius
h3. Installing a helm chart
781
782
One can use the usual pattern of
783
784
<pre>
785
helm install <releasename> <chartdirectory>
786
</pre>
787
788
However, when testing helm charts you often want to reinstall/update. The following pattern is "better", because it also works if the release is already installed:
789
790
<pre>
791
helm upgrade --install <releasename> <chartdirectory>
792 1 Nico Schottelius
</pre>
793 80 Nico Schottelius
794
h3. Naming services and deployments in helm charts [Application labels]
795
796
* We always have {{ .Release.Name }} to identify the current "instance"
797
* Deployments:
798
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ...
799 81 Nico Schottelius
* See more about standard labels on
800
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
801
** https://helm.sh/docs/chart_best_practices/labels/
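Putting these conventions together, a deployment in a chart could be labeled like this (a sketch; image and names are placeholders, and the @release@ label key is an assumption used here to carry {{ .Release.Name }}):

<pre>
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-nginx
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: nginx
        release: {{ .Release.Name }}
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
</pre>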
802 55 Nico Schottelius
803 151 Nico Schottelius
h3. Show all versions of a helm chart
804
805
<pre>
806
helm search repo -l repo/chart
807
</pre>
808
809
For example:
810
811
<pre>
812
% helm search repo -l projectcalico/tigera-operator 
813
NAME                         	CHART VERSION	APP VERSION	DESCRIPTION                            
814
projectcalico/tigera-operator	v3.23.3      	v3.23.3    	Installs the Tigera operator for Calico
815
projectcalico/tigera-operator	v3.23.2      	v3.23.2    	Installs the Tigera operator for Calico
816
....
817
</pre>
818
819 152 Nico Schottelius
h3. Show possible values of a chart
820
821
<pre>
822
helm show values <repo/chart>
823
</pre>
824
825
Example:
826
827
<pre>
828
helm show values ingress-nginx/ingress-nginx
829
</pre>
830
831 178 Nico Schottelius
h3. Download a chart
832
833
For instance, to check it out locally, use:
834
835
<pre>
836
helm pull <repo/chart>
837
</pre>
838 152 Nico Schottelius
839 139 Nico Schottelius
h2. Rook + Ceph
840
841
h3. Installation
842
843
* Usually directly via argocd
844
845 71 Nico Schottelius
h3. Executing ceph commands
846
847
Using the ceph-tools pod as follows:
848
849
<pre>
850
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
851
</pre>
852
853 43 Nico Schottelius
h3. Inspecting the logs of a specific server
854
855
<pre>
856
# Get the related pods
857
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare 
858
...
859
860
# Inspect the logs of a specific pod
861
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
862
863 71 Nico Schottelius
</pre>
864
865
h3. Inspecting the logs of the rook-ceph-operator
866
867
<pre>
868
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
869 43 Nico Schottelius
</pre>
870
871 121 Nico Schottelius
h3. Restarting the rook operator
872
873
<pre>
874
kubectl -n rook-ceph delete pods  -l app=rook-ceph-operator
875
</pre>
876
877 43 Nico Schottelius
h3. Triggering server prepare / adding new osds
878
879
The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re-scan", simply delete the operator pod:
880
881
<pre>
882
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
883
</pre>
884
885
This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.
886
887
h3. Removing an OSD
888
889
* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
890 77 Nico Schottelius
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
891 99 Nico Schottelius
* Then delete the related deployment
892 41 Nico Schottelius
893 98 Nico Schottelius
Set the OSD id in the osd-purge.yaml and apply it. The OSD should be down before doing this.
894
895
<pre>
896
apiVersion: batch/v1
897
kind: Job
898
metadata:
899
  name: rook-ceph-purge-osd
900
  namespace: rook-ceph # namespace:cluster
901
  labels:
902
    app: rook-ceph-purge-osd
903
spec:
904
  template:
905
    metadata:
906
      labels:
907
        app: rook-ceph-purge-osd
908
    spec:
909
      serviceAccountName: rook-ceph-purge-osd
910
      containers:
911
        - name: osd-removal
912
          image: rook/ceph:master
913
          # TODO: Insert the OSD ID in the last parameter that is to be removed
914
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
915
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
916
          #
917
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
918
          # removal could lead to data loss.
919
          args:
920
            - "ceph"
921
            - "osd"
922
            - "remove"
923
            - "--preserve-pvc"
924
            - "false"
925
            - "--force-osd-removal"
926
            - "false"
927
            - "--osd-ids"
928
            - "SETTHEOSDIDHERE"
929
          env:
930
            - name: POD_NAMESPACE
931
              valueFrom:
932
                fieldRef:
933
                  fieldPath: metadata.namespace
934
            - name: ROOK_MON_ENDPOINTS
935
              valueFrom:
936
                configMapKeyRef:
937
                  key: data
938
                  name: rook-ceph-mon-endpoints
939
            - name: ROOK_CEPH_USERNAME
940
              valueFrom:
941
                secretKeyRef:
942
                  key: ceph-username
943
                  name: rook-ceph-mon
944
            - name: ROOK_CEPH_SECRET
945
              valueFrom:
946
                secretKeyRef:
947
                  key: ceph-secret
948
                  name: rook-ceph-mon
949
            - name: ROOK_CONFIG_DIR
950
              value: /var/lib/rook
951
            - name: ROOK_CEPH_CONFIG_OVERRIDE
952
              value: /etc/rook/config/override.conf
953
            - name: ROOK_FSID
954
              valueFrom:
955
                secretKeyRef:
956
                  key: fsid
957
                  name: rook-ceph-mon
958
            - name: ROOK_LOG_LEVEL
959
              value: DEBUG
960
          volumeMounts:
961
            - mountPath: /etc/ceph
962
              name: ceph-conf-emptydir
963
            - mountPath: /var/lib/rook
964
              name: rook-config
965
      volumes:
966
        - emptyDir: {}
967
          name: ceph-conf-emptydir
968
        - emptyDir: {}
969
          name: rook-config
970
      restartPolicy: Never
971
972
973 99 Nico Schottelius
</pre>
974
975 1 Nico Schottelius
Deleting the deployment:
976
977
<pre>
978
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
979 99 Nico Schottelius
deployment.apps "rook-ceph-osd-6" deleted
980
</pre>
981 185 Nico Schottelius
982
h3. Placement of mons/osds/etc.
983
984
See https://rook.io/docs/rook/v1.11/CRDs/Cluster/ceph-cluster-crd/#placement-configuration-settings
985 98 Nico Schottelius
986 145 Nico Schottelius
h2. Ingress + Cert Manager
987
988
* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
989
* we deploy "cert-manager":https://cert-manager.io/ to handle certificates
990
* We deploy the @ClusterIssuer@ independently, so that the cert-manager app can deploy first and the issuer is only created once the CRDs from cert-manager are in place (a sketch follows below)
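A sketch of such a @ClusterIssuer@ using letsencrypt with the http01 solver; the name, email and ingress class are placeholders and may differ from our actual manifests:

<pre>
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: EMAIL
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
      - http01:
          ingress:
            class: nginx
</pre>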
991
992
h3. IPv4 reachability 
993
994
The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.
995
996
Steps:
997
998
h4. Get the ingress IPv6 address
999
1000
Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@
1001
1002
Example:
1003
1004
<pre>
1005
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
1006
2a0a:e5c0:10:1b::ce11
1007
</pre>
1008
1009
h4. Add NAT64 mapping
1010
1011
* Update the __dcl_jool_siit cdist type
1012
* Record the two IPs (IPv6 and IPv4)
1013
* Configure all routers
1014
1015
1016
h4. Add DNS record
1017
1018
To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:
1019
1020
<pre>
1021
; k8s ingress for dev
1022
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
1023
dev-ingress                 A 147.78.194.23
1024
1025
</pre> 
1026
1027
h4. Add supporting wildcard DNS
1028
1029
If you plan to add various sites under a specific domain, add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:
1030
1031
<pre>
1032
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
1033
</pre>
1034
1035 76 Nico Schottelius
h2. Harbor
1036
1037 175 Nico Schottelius
* We use "Harbor":https://goharbor.io/ as an image registry for our own images. Internal app reference: apps/prod/harbor.
1038
* The admin password is in the password store; it is Harbor12345 by default
1039 76 Nico Schottelius
* At the moment harbor only authenticates against the internal ldap tree
1040
1041
h3. LDAP configuration
1042
1043
* The url needs to be ldaps://...
1044
* uid = uid
1045
* the rest is standard
1046 75 Nico Schottelius
1047 89 Nico Schottelius
h2. Monitoring / Prometheus
1048
1049 90 Nico Schottelius
* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/
1050 89 Nico Schottelius
1051 91 Nico Schottelius
Access via ...
1052
1053
* http://prometheus-k8s.monitoring.svc:9090
1054
* http://grafana.monitoring.svc:3000
1055
* http://alertmanager.monitoring.svc:9093
1056
1057
1058 100 Nico Schottelius
h3. Prometheus Options
1059
1060
* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
1061
** Includes dashboards and co.
1062
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
1063
** Includes dashboards and co.
1064
* "Prometheus Operator (mainly CRD manifest":https://github.com/prometheus-operator/prometheus-operator
1065
1066 171 Nico Schottelius
h3. Grafana default password
1067
1068
* If not changed: @prom-operator@
1069
1070 82 Nico Schottelius
h2. Nextcloud
1071
1072 85 Nico Schottelius
h3. How to get the nextcloud credentials 
1073 84 Nico Schottelius
1074
* The initial username is set to "nextcloud"
1075
* The password is autogenerated and saved in a kubernetes secret
1076
1077
<pre>
1078 85 Nico Schottelius
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo "" 
1079 84 Nico Schottelius
</pre>
1080
1081 83 Nico Schottelius
h3. How to fix "Access through untrusted domain"
1082
1083 82 Nico Schottelius
* Nextcloud stores the initial domain configuration
1084 1 Nico Schottelius
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
1085 82 Nico Schottelius
* To fix, edit /var/www/html/config/config.php and correct the domain
1086 1 Nico Schottelius
* Then delete the pods
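A sketch of the procedure; the pod name and the domains are placeholders:

<pre>
# find the nextcloud pod
kubectl get pods | grep nextcloud

# correct the domain inside the pod (or edit the file manually)
kubectl exec -ti PODNAME -- sed -i 's/old.example.org/new.example.org/' /var/www/html/config/config.php

# recreate the pod
kubectl delete pod PODNAME
</pre>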
1087 165 Nico Schottelius
1088
h3. Running occ commands inside the nextcloud container
1089
1090
* Find the pod in the right namespace
1091
1092
Exec:
1093
1094
<pre>
1095
su www-data -s /bin/sh -c ./occ
1096
</pre>
1097
1098
* -s /bin/sh is needed as the default shell is set to /bin/false
1099
1100 166 Nico Schottelius
h4. Rescanning files
1101 165 Nico Schottelius
1102 166 Nico Schottelius
* If files have been added without nextcloud's knowledge
1103
1104
<pre>
1105
su www-data -s /bin/sh -c "./occ files:scan --all"
1106
</pre>
1107 82 Nico Schottelius
1108 1 Nico Schottelius
h2. Infrastructure versions
1109 35 Nico Schottelius
1110 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v5 (2021-10)
1111 1 Nico Schottelius
1112 57 Nico Schottelius
Clusters are configured / setup in this order:
1113
1114
* Bootstrap via kubeadm
1115 59 Nico Schottelius
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
1116
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
1117
** "rook for storage via argocd":https://rook.io/
1118 58 Nico Schottelius
** haproxy as an IPv4-to-IPv6 proxy inside the IPv6-only cluster, via argocd
1119
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
1120
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
1121
1122 57 Nico Schottelius
1123
h3. ungleich kubernetes infrastructure v4 (2021-09)
1124
1125 54 Nico Schottelius
* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
1126 1 Nico Schottelius
* The rook operator is still being installed via helm
1127 35 Nico Schottelius
1128 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v3 (2021-07)
1129 1 Nico Schottelius
1130 10 Nico Schottelius
* rook is now installed via helm via argocd instead of directly via manifests
1131 28 Nico Schottelius
1132 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v2 (2021-05)
1133 28 Nico Schottelius
1134
* Replaced fluxv2 from ungleich k8s v1 with argocd
1135 1 Nico Schottelius
** argocd can apply helm templates directly without needing to go through Chart releases
1136 28 Nico Schottelius
* We are also using argoflow for build flows
1137
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building
1138
1139 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v1 (2021-01)
1140 28 Nico Schottelius
1141
We are using the following components:
1142
1143
* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
1144
** Needed for basic networking
1145
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
1146
** Needed so that secrets are not stored in the git repository, but only in the cluster
1147
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
1148
** Needed to get letsencrypt certificates for services
1149
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
1150
** rbd for almost everything, *ReadWriteOnce*
1151
** cephfs for smaller things, multi access *ReadWriteMany*
1152
** Needed for providing persistent storage
1153
* "flux v2":https://fluxcd.io/
1154
** Needed to manage resources automatically