1 22 Nico Schottelius
h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual
2 1 Nico Schottelius
3 3 Nico Schottelius
{{toc}}
4
5 1 Nico Schottelius
h2. Status
6
7 28 Nico Schottelius
This document is **pre-production**.
8
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.
9 1 Nico Schottelius
10 10 Nico Schottelius
h2. k8s clusters
11
12 123 Nico Schottelius
| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
13
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
14
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
15
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
16
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
17
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
18
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
19
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
20
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
21
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
22
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
23 177 Nico Schottelius
| [[p10.k8s.ooo]]    | production        |            | server131 server132 server133 | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
24 123 Nico Schottelius
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
25
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
26 164 Nico Schottelius
| [[r1r2p15k8sooo|r1.p15.k8s.ooo]] | production | Nico | server120 | | | 2022-10-30 |
27
| [[r1r2p15k8sooo|r2.p15.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
28 162 Nico Schottelius
| [[r1r2p10k8sooo|r1.p10.k8s.ooo]] | production | Nico | server122 | | | 2022-10-30 |
29
| [[r1r2p10k8sooo|r2.p10.k8s.ooo]] | production | Nico | server123 | | | 2022-10-15 |
30
| [[r1r2p5k8sooo|r1.p5.k8s.ooo]] | production | Nico | server137 | | | 2022-10-30 |
31
| [[r1r2p5k8sooo|r2.p5.k8s.ooo]] | production | Nico | server138 | | | 2022-10-30 |
32
| [[r1r2p6k8sooo|r1.p6.k8s.ooo]] | production | Nico | server139 | | | 2022-10-30 |
33
| [[r1r2p6k8sooo|r2.p6.k8s.ooo]] | production | Nico | server140 | | | 2022-10-30 |
34 21 Nico Schottelius
35 1 Nico Schottelius
h2. General architecture and components overview
36
37
* All k8s clusters are IPv6 only
38
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
39
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
40 18 Nico Schottelius
** Private configurations are found in the **k8s-config** repository
41 1 Nico Schottelius
42
h3. Cluster types
43
44 28 Nico Schottelius
| **Type/Feature**            | **Development**                | **Production**         |
45
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
46
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
47
| Separation of control plane | optional                       | recommended            |
48
| Persistent storage          | required                       | required               |
49
| Number of storage monitors  | 3                              | 5                      |
50 1 Nico Schottelius
51 43 Nico Schottelius
h2. General k8s operations
52 1 Nico Schottelius
53 46 Nico Schottelius
h3. Cheat sheet / external great references
54
55
* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/
56
57 117 Nico Schottelius
h3. Allowing to schedule work on the control plane / removing node taints
58 69 Nico Schottelius
59
* Mostly for single node / test / development clusters
60
* Just remove the master taint as follows
61
62
<pre>
63
kubectl taint nodes --all node-role.kubernetes.io/master-
64 118 Nico Schottelius
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
65 69 Nico Schottelius
</pre>
66 1 Nico Schottelius
67 117 Nico Schottelius
You can check the node taints using @kubectl describe node ...@
68 69 Nico Schottelius
69 44 Nico Schottelius
h3. Get the cluster admin.conf
70
71
* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
72
* To be able to administrate the cluster you can copy the admin.conf to your local machine
73
* Multi-cluster debugging becomes very easy if you name the config ~/cX-admin.conf (see example below)
74
75
<pre>
76
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
77
% export KUBECONFIG=~/c2-admin.conf    
78
% kubectl get nodes
79
NAME       STATUS                     ROLES                  AGE   VERSION
80
server47   Ready                      control-plane,master   82d   v1.22.0
81
server48   Ready                      control-plane,master   82d   v1.22.0
82
server49   Ready                      <none>                 82d   v1.22.0
83
server50   Ready                      <none>                 82d   v1.22.0
84
server59   Ready                      control-plane,master   82d   v1.22.0
85
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
86
server61   Ready                      <none>                 82d   v1.22.0
87
server62   Ready                      <none>                 82d   v1.22.0               
88
</pre>
89
90 18 Nico Schottelius
h3. Installing a new k8s cluster
91 8 Nico Schottelius
92 9 Nico Schottelius
* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
93 28 Nico Schottelius
** Using pXX.k8s.ooo for production clusters of placeXX
94 9 Nico Schottelius
* Use cdist to configure the nodes with requirements like crio
95
* Decide between single or multi node control plane setups (see below)
96 28 Nico Schottelius
** Single control plane suitable for development clusters
97 9 Nico Schottelius
98 28 Nico Schottelius
Typical init procedure:
99 9 Nico Schottelius
100 28 Nico Schottelius
* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
101
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@
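The referenced @bootstrap/XXX/kubeadm.yaml@ files live in the private **k8s-config** repository and are cluster specific. A minimal sketch of such a file, assuming an IPv6-only cluster with made-up name, endpoint and CIDRs (the real values differ per cluster):

<pre>
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
clusterName: cX.k8s.ooo
controlPlaneEndpoint: "cX-api.k8s.ooo:6443"
networking:
  # example CIDRs only -- take the real ones from k8s-config
  podSubnet: 2a0a:e5c0:XXXX::/64
  serviceSubnet: 2a0a:e5c0:YYYY::/108
</pre>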
102 10 Nico Schottelius
103 29 Nico Schottelius
h3. Deleting a pod that is hanging in terminating state
104
105
<pre>
106
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
107
</pre>
108
109
(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)
110
111 42 Nico Schottelius
h3. Listing nodes of a cluster
112
113
<pre>
114
[15:05] bridge:~% kubectl get nodes
115
NAME       STATUS   ROLES                  AGE   VERSION
116
server22   Ready    <none>                 52d   v1.22.0
117
server23   Ready    <none>                 52d   v1.22.2
118
server24   Ready    <none>                 52d   v1.22.0
119
server25   Ready    <none>                 52d   v1.22.0
120
server26   Ready    <none>                 52d   v1.22.0
121
server27   Ready    <none>                 52d   v1.22.0
122
server63   Ready    control-plane,master   52d   v1.22.0
123
server64   Ready    <none>                 52d   v1.22.0
124
server65   Ready    control-plane,master   52d   v1.22.0
125
server66   Ready    <none>                 52d   v1.22.0
126
server83   Ready    control-plane,master   52d   v1.22.0
127
server84   Ready    <none>                 52d   v1.22.0
128
server85   Ready    <none>                 52d   v1.22.0
129
server86   Ready    <none>                 52d   v1.22.0
130
</pre>
131
132 41 Nico Schottelius
h3. Removing / draining a node
133
134
Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:
135
136 1 Nico Schottelius
<pre>
137 103 Nico Schottelius
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
138 42 Nico Schottelius
</pre>
139
140
h3. Re-adding a node after draining
141
142
<pre>
143
kubectl uncordon serverXX
144 1 Nico Schottelius
</pre>
145 43 Nico Schottelius
146 50 Nico Schottelius
h3. (Re-)joining worker nodes after creating the cluster
147 49 Nico Schottelius
148
* We need to have an up-to-date token
149
* We use different join commands for the workers and control plane nodes
150
151
Generating the join command on an existing control plane node:
152
153
<pre>
154
kubeadm token create --print-join-command
155
</pre>
156
157 50 Nico Schottelius
h3. (Re-)joining control plane nodes after creating the cluster
158 1 Nico Schottelius
159 50 Nico Schottelius
* We generate the token again
160
* We upload the certificates
161
* We need to combine/create the join command for the control plane node
162
163
Example session:
164
165
<pre>
166
% kubeadm token create --print-join-command
167
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash 
168
169
% kubeadm init phase upload-certs --upload-certs
170
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
171
[upload-certs] Using certificate key:
172
CERTKEY
173
174
# Then we use these two outputs on the joining node:
175
176
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
177
</pre>
178
179
Commands to be used on a control plane node:
180
181
<pre>
182
kubeadm token create --print-join-command
183
kubeadm init phase upload-certs --upload-certs
184
</pre>
185
186
Commands to be used on the joining node:
187
188
<pre>
189
JOINCOMMAND --control-plane --certificate-key CERTKEY
190
</pre>
191 49 Nico Schottelius
192 51 Nico Schottelius
SEE ALSO
193
194
* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
195
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/
196
197 53 Nico Schottelius
h3. How to fix etcd not starting when rejoining a kubernetes cluster as a control plane
198 52 Nico Schottelius
199
If during the above step etcd does not come up, @kubeadm join@ can hang as follows:
200
201
<pre>
202
[control-plane] Creating static Pod manifest for "kube-apiserver"                                                              
203
[control-plane] Creating static Pod manifest for "kube-controller-manager"                                                     
204
[control-plane] Creating static Pod manifest for "kube-scheduler"                                                              
205
[check-etcd] Checking that the etcd cluster is healthy                                                                         
206
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:37
207
8a]:2379 with maintenance client: context deadline exceeded                                                                    
208
To see the stack trace of this error execute with --v=5 or higher         
209
</pre>
210
211
Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster and then the join works.
212
213
To fix this we do:
214
215
* Find a working etcd pod
216
* Find the etcd members / member list
217
* Remove the etcd member that we want to re-join the cluster
218
219
220
<pre>
221
# Find the etcd pods
222
kubectl -n kube-system get pods -l component=etcd,tier=control-plane
223
224
# Get the list of etcd servers with the member id 
225
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
226
227
# Remove the member
228
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
229
</pre>
230
231
Sample session:
232
233
<pre>
234
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
235
NAME            READY   STATUS    RESTARTS     AGE
236
etcd-server63   1/1     Running   0            3m11s
237
etcd-server65   1/1     Running   3            7d2h
238
etcd-server83   1/1     Running   8 (6d ago)   7d2h
239
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
240
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
241
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
242
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false
243
244
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
245
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
246 1 Nico Schottelius
247
</pre>
248
249
SEE ALSO
250
251
* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster
252 56 Nico Schottelius
253 147 Nico Schottelius
h3. Node labels (adding, showing, removing)
254
255
Listing the labels:
256
257
<pre>
258
kubectl get nodes --show-labels
259
</pre>
260
261
Adding labels:
262
263
<pre>
264
kubectl label nodes LIST-OF-NODES label1=value1 
265
266
</pre>
267
268
For instance:
269
270
<pre>
271
kubectl label nodes router2 router3 hosttype=router 
272
</pre>
273
274
Selecting nodes in pods:
275
276
<pre>
277
apiVersion: v1
278
kind: Pod
279
...
280
spec:
281
  nodeSelector:
282
    hosttype: router
283
</pre>
284
285 148 Nico Schottelius
Removing labels by adding a minus at the end of the label name:
286
287
<pre>
288
kubectl label node <nodename> <labelname>-
289
</pre>
290
291
For instance:
292
293
<pre>
294
kubectl label nodes router2 router3 hosttype- 
295
</pre>
296
297 147 Nico Schottelius
SEE ALSO
298 1 Nico Schottelius
299 148 Nico Schottelius
* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
300
* https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api
301 147 Nico Schottelius
302 101 Nico Schottelius
h3. Hardware Maintenance using ungleich-hardware
303
304
Use the following manifest and replace the HOST with the actual host:
305
306
<pre>
307
apiVersion: v1
308
kind: Pod
309
metadata:
310
  name: ungleich-hardware-HOST
311
spec:
312
  containers:
313
  - name: ungleich-hardware
314
    image: ungleich/ungleich-hardware:0.0.5
315
    args:
316
    - sleep
317
    - "1000000"
318
    volumeMounts:
319
      - mountPath: /dev
320
        name: dev
321
    securityContext:
322
      privileged: true
323
  nodeSelector:
324
    kubernetes.io/hostname: "HOST"
325
326
  volumes:
327
    - name: dev
328
      hostPath:
329
        path: /dev
330
</pre>
331
332 102 Nico Schottelius
Also see: [[The_ungleich_hardware_maintenance_guide]]
333
334 105 Nico Schottelius
h3. Triggering a cronjob / creating a job from a cronjob
335 104 Nico Schottelius
336
To test a cronjob, we can create a job from a cronjob:
337
338
<pre>
339
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
340
</pre>
341
342
This creates a job @volume2-manual@ based on the cronjob @volume2-daily-backup@.
343
344 112 Nico Schottelius
h3. su-ing into a user that has nologin shell set
345
346
Often, users have nologin set as their shell inside the container. To be able to execute maintenance commands within the
347
container, we can use @su -s /bin/sh@ like this:
348
349
<pre>
350
su -s /bin/sh -c '/path/to/your/script' testuser
351
</pre>
352
353
Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell
354
355 113 Nico Schottelius
h3. How to print a secret value
356
357
Assuming you want the "password" item from a secret, use:
358
359
<pre>
360
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo "" 
361
</pre>
362
363 173 Nico Schottelius
h3. How to upgrade a kubernetes cluster
364 172 Nico Schottelius
365
h4. General
366
367
* Should be done every X months to stay up-to-date
368
** X probably something like 3-6
369
* kubeadm based clusters
370
* Needs specific kubeadm versions for upgrade
371
* Follow instructions on https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
372
373
h4. Getting a specific kubeadm or kubelet version
374
375
<pre>
376
ARCH=amd64
377
RELEASE=v1.24.9
378
RELEASE=v1.25.5
379
380
curl -L --remote-name-all https://dl.k8s.io/release/${RELEASE}/bin/linux/${ARCH}/{kubeadm,kubelet}
381
</pre>
382
383
h4. Steps
384
385
* kubeadm upgrade plan
386
** On one control plane node
387
* kubeadm upgrade apply vXX.YY.ZZ
388
** On one control plane node
389
390 173 Nico Schottelius
Repeat for all control plane nodes. Then upgrade the kubelet on all other nodes via the package manager.
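A hedged sketch of the overall sequence (the version is an example, and the kubelet step depends on how it was installed on the node):

<pre>
# on the first control plane node
kubeadm upgrade plan
kubeadm upgrade apply v1.25.5

# on each remaining control plane node
kubeadm upgrade node

# on every node afterwards: upgrade the kubelet binary/package and restart it
</pre>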
391 172 Nico Schottelius
392 157 Nico Schottelius
h2. Reference CNI
393
394
* Mainly "stupid", but effective plugins
395
* Main documentation on https://www.cni.dev/plugins/current/
396 158 Nico Schottelius
* Plugins
397
** bridge
398
*** Can create the bridge on the host
399
*** But seems not to be able to add host interfaces to it as well
400
*** Has support for vlan tags
401
** vlan
402
*** creates vlan tagged sub interface on the host
403 160 Nico Schottelius
*** "It's a 1:1 mapping (i.e. no bridge in between)":https://github.com/k8snetworkplumbingwg/multus-cni/issues/569
404 158 Nico Schottelius
** host-device
405
*** moves the interface from the host into the container
406
*** very easy for physical connections to containers
407 159 Nico Schottelius
** ipvlan
408
*** "virtualisation" of a host device
409
*** routing based on IP
410
*** Same MAC for everyone
411
*** Cannot reach the master interface
412
** macvlan
413
*** With mac addresses
414
*** Supports various modes (to be checked)
415
** ptp ("point to point")
416
*** Creates a host device and connects it to the container
417
** win*
418 158 Nico Schottelius
*** Windows implementations
419 157 Nico Schottelius
420 62 Nico Schottelius
h2. Calico CNI
421
422
h3. Calico Installation
423
424
* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
425
* This has the following advantages:
426
** Easy to upgrade
427
** No need to manually configure IPv6/dual stack settings, as the tigera operator figures things out on its own
428
429
Usually plain calico can be installed directly using:
430
431
<pre>
432 174 Nico Schottelius
VERSION=v3.25.0
433 149 Nico Schottelius
434 1 Nico Schottelius
helm repo add projectcalico https://docs.projectcalico.org/charts
435 167 Nico Schottelius
helm repo update
436 124 Nico Schottelius
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
437 1 Nico Schottelius
</pre>
438 92 Nico Schottelius
439
* Check the tags on https://github.com/projectcalico/calico/tags for the latest release
440 62 Nico Schottelius
441
h3. Installing calicoctl
442
443 115 Nico Schottelius
* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install
444
445 62 Nico Schottelius
To be able to manage and configure calico, we need to 
446
"install calicoctl (we choose the version as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod
447
448
<pre>
449
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
450
</pre>
451
452 93 Nico Schottelius
Or version specific:
453
454
<pre>
455
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml
456 97 Nico Schottelius
457
# For 3.22
458
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
459 93 Nico Schottelius
</pre>
460
461 70 Nico Schottelius
And making it more easily accessible via an alias:
462
463
<pre>
464
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
465
</pre>
466
467 62 Nico Schottelius
h3. Calico configuration
468
469 63 Nico Schottelius
By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
470
with an upstream router to propagate podcidr and servicecidr.
471 62 Nico Schottelius
472
Default settings in our infrastructure:
473
474
* We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
475
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
476 1 Nico Schottelius
* We use private ASNs for k8s clusters
477 63 Nico Schottelius
* We do *not* use any overlay
478 62 Nico Schottelius
479
After installing calico and calicoctl the last step of the installation is usually:
480
481 1 Nico Schottelius
<pre>
482 79 Nico Schottelius
calicoctl create -f - < calico-bgp.yaml
483 62 Nico Schottelius
</pre>
484
485
486
A sample BGP configuration:
487
488
<pre>
489
---
490
apiVersion: projectcalico.org/v3
491
kind: BGPConfiguration
492
metadata:
493
  name: default
494
spec:
495
  logSeverityScreen: Info
496
  nodeToNodeMeshEnabled: true
497
  asNumber: 65534
498
  serviceClusterIPs:
499
  - cidr: 2a0a:e5c0:10:3::/108
500
  serviceExternalIPs:
501
  - cidr: 2a0a:e5c0:10:3::/108
502
---
503
apiVersion: projectcalico.org/v3
504
kind: BGPPeer
505
metadata:
506
  name: router1-place10
507
spec:
508
  peerIP: 2a0a:e5c0:10:1::50
509
  asNumber: 213081
510
  keepOriginalNextHop: true
511
</pre>
512
513 126 Nico Schottelius
h2. Cilium CNI (experimental)
514
515 137 Nico Schottelius
h3. Status
516
517 138 Nico Schottelius
*NO WORKING CILIUM CONFIGURATION FOR IPV6 only modes*
518 137 Nico Schottelius
519 146 Nico Schottelius
h3. Latest error
520
521
It seems cilium does not run on IPv6 only hosts:
522
523
<pre>
524
level=info msg="Validating configured node address ranges" subsys=daemon
525
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
526
level=info msg="Starting IP identity watcher" subsys=ipcache
527
</pre>
528
529
It crashes after that log entry.
530
531 128 Nico Schottelius
h3. BGP configuration
532
533
* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
534
* Creating the bgp config beforehand as a configmap is thus required.
535
536
The error one gets without the configmap present:
537
538
Pods are hanging with:
539
540
<pre>
541
cilium-bpqm6                       0/1     Init:0/4            0             9s
542
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
543
</pre>
544
545
The error message in the cilium-operator is:
546
547
<pre>
548
Events:
549
  Type     Reason       Age                From               Message
550
  ----     ------       ----               ----               -------
551
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
552
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
553
</pre>
554
555
A correct bgp config looks like this:
556
557
<pre>
558
apiVersion: v1
559
kind: ConfigMap
560
metadata:
561
  name: bgp-config
562
  namespace: kube-system
563
data:
564
  config.yaml: |
565
    peers:
566
      - peer-address: 2a0a:e5c0::46
567
        peer-asn: 209898
568
        my-asn: 65533
569
      - peer-address: 2a0a:e5c0::47
570
        peer-asn: 209898
571
        my-asn: 65533
572
    address-pools:
573
      - name: default
574
        protocol: bgp
575
        addresses:
576
          - 2a0a:e5c0:0:14::/64
577
</pre>
578 127 Nico Schottelius
579
h3. Installation
580 130 Nico Schottelius
581 127 Nico Schottelius
Adding the repo
582 1 Nico Schottelius
<pre>
583 127 Nico Schottelius
584 129 Nico Schottelius
helm repo add cilium https://helm.cilium.io/
585 130 Nico Schottelius
helm repo update
586
</pre>
587 129 Nico Schottelius
588 135 Nico Schottelius
Installing + configuring cilium
589 129 Nico Schottelius
<pre>
590 130 Nico Schottelius
ipv6pool=2a0a:e5c0:0:14::/112
591 1 Nico Schottelius
592 146 Nico Schottelius
version=1.12.2
593 129 Nico Schottelius
594
helm upgrade --install cilium cilium/cilium --version $version \
595 1 Nico Schottelius
  --namespace kube-system \
596
  --set ipv4.enabled=false \
597
  --set ipv6.enabled=true \
598 146 Nico Schottelius
  --set enableIPv6Masquerade=false \
599
  --set bgpControlPlane.enabled=true 
600 1 Nico Schottelius
601 146 Nico Schottelius
#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool
602
603
# Old style bgp?
604 136 Nico Schottelius
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \
605 127 Nico Schottelius
606
# Show possible configuration options
607
helm show values cilium/cilium
608
609 1 Nico Schottelius
</pre>
610 132 Nico Schottelius
611
Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:
612
613
<pre>
614
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
615
</pre>
616
617 126 Nico Schottelius
618 1 Nico Schottelius
See also https://github.com/cilium/cilium/issues/20756
619 135 Nico Schottelius
620
A /112, however, seems to work.
621
622
h3. Kernel modules
623
624
Cilium requires the following modules to be loaded on the host (not loaded by default):
625
626
<pre>
627 1 Nico Schottelius
modprobe  ip6table_raw
628
modprobe  ip6table_filter
629
</pre>
630 146 Nico Schottelius
631
h3. Interesting helm flags
632
633
* autoDirectNodeRoutes
634
* bgpControlPlane.enabled = true
635
636
h3. SEE ALSO
637
638
* https://docs.cilium.io/en/v1.12/helm-reference/
639 133 Nico Schottelius
640 168 Nico Schottelius
h2. Multus (incomplete/experimental/WIP)
641 1 Nico Schottelius
642 168 Nico Schottelius
643
* https://github.com/k8snetworkplumbingwg/multus-cni
644
* Installing a deployment w/ CRDs
645 150 Nico Schottelius
646 169 Nico Schottelius
<pre>
647 176 Nico Schottelius
VERSION=v4.0.1
648 169 Nico Schottelius
649 170 Nico Schottelius
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/${VERSION}/deployments/multus-daemonset-crio.yml
650 1 Nico Schottelius
</pre>
651 170 Nico Schottelius
652
* The crio-based deployment fails on Alpine Linux due to:
653
654
655
<pre>
656
[22:07] nb3:~% kubectl logs -n kube-system kube-multus-ds-2g9d5         
657
2022-12-26T21:05:21+00:00 Generating Multus configuration file using files in /host/etc/cni/net.d...
658
2022-12-26T21:05:21+00:00 Using MASTER_PLUGIN: 10-calico.conflist
659
2022-12-26T21:05:25+00:00 Nested capabilities string: "capabilities": {"bandwidth": true, "portMappings": true},
660
2022-12-26T21:05:25+00:00 Using /host/etc/cni/net.d/10-calico.conflist as a source to generate the Multus configuration
661
2022-12-26T21:05:26+00:00 Config file created @ /host/etc/cni/net.d/00-multus.conf
662
{ "cniVersion": "0.3.1", "name": "multus-cni-network", "type": "multus", "capabilities": {"bandwidth": true, "portMappings": true}, "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig", "delegates": [ { "name": "k8s-pod-network", "cniVersion": "0.3.1", "plugins": [ { "type": "calico", "datastore_type": "kubernetes", "mtu": 0, "nodename_file_optional": false, "log_level": "Info", "log_file_path": "/var/log/calico/cni/cni.log", "ipam": { "type": "calico-ipam", "assign_ipv4" : "false", "assign_ipv6" : "true"}, "container_settings": { "allow_ip_forwarding": false }, "policy": { "type": "k8s" }, "kubernetes": { "k8s_api_root":"https://[2a0a:e5c0:43:bb::1]:443", "kubeconfig": "/etc/cni/net.d/calico-kubeconfig" } }, { "type": "bandwidth", "capabilities": {"bandwidth": true} }, {"type": "portmap", "snat": true, "capabilities": {"portMappings": true}} ] } ] }
663
2022-12-26T21:05:26+00:00 Restarting crio
664
/entrypoint.sh: line 434: systemctl: command not found
665
</pre>
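Once multus itself is running, additional networks are defined as NetworkAttachmentDefinition objects and referenced from pods via the @k8s.v1.cni.cncf.io/networks@ annotation. A hedged sketch using the host-device reference plugin (the interface name is an example):

<pre>
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: example-hostdev
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "host-device",
      "device": "eth1"
    }'
</pre>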
666 169 Nico Schottelius
667 122 Nico Schottelius
h2. ArgoCD 
668 56 Nico Schottelius
669 60 Nico Schottelius
h3. Argocd Installation
670 1 Nico Schottelius
671 116 Nico Schottelius
* See https://argo-cd.readthedocs.io/en/stable/
672
673 60 Nico Schottelius
As there is no configuration management present yet, argocd is installed using
674
675 1 Nico Schottelius
<pre>
676 60 Nico Schottelius
kubectl create namespace argocd
677 86 Nico Schottelius
678 96 Nico Schottelius
# Specific Version
679
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml
680 86 Nico Schottelius
681
# OR: latest stable
682 60 Nico Schottelius
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
683 56 Nico Schottelius
</pre>
684 1 Nico Schottelius
685 116 Nico Schottelius
686 1 Nico Schottelius
687 60 Nico Schottelius
h3. Get the argocd credentials
688
689
<pre>
690
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
691
</pre>
692 52 Nico Schottelius
693 87 Nico Schottelius
h3. Accessing argocd
694
695
In regular IPv6 clusters:
696
697
* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN
698
699
In legacy IPv4 clusters
700
701
<pre>
702
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
703
</pre>
704
705 88 Nico Schottelius
* Navigate to https://localhost:8080
706
707 68 Nico Schottelius
h3. Using the argocd webhook to trigger changes
708 67 Nico Schottelius
709
* To trigger changes, POST a JSON payload to https://argocd.example.com/api/webhook (see the sketch below)
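Normally the git server (gitea) calls this endpoint itself on push. For manual testing, a hedged sketch using curl and a GitHub-style push payload (the headers, payload fields and repository URL are assumptions; adjust to your setup and Argo CD version):

<pre>
curl -X POST https://argocd.example.com/api/webhook \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: push" \
  -d '{"ref": "refs/heads/master", "repository": {"html_url": "https://code.ungleich.ch/ungleich-intern/k8s-config"}}'
</pre>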
710
711 72 Nico Schottelius
h3. Deploying an application
712
713
* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
714 73 Nico Schottelius
* Always include the *redmine-url* pointing to the (customer) ticket
715
** Also add the support-url if it exists
716 72 Nico Schottelius
717
Application sample
718
719
<pre>
720
apiVersion: argoproj.io/v1alpha1
721
kind: Application
722
metadata:
723
  name: gitea-CUSTOMER
724
  namespace: argocd
725
spec:
726
  destination:
727
    namespace: default
728
    server: 'https://kubernetes.default.svc'
729
  source:
730
    path: apps/prod/gitea
731
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
732
    targetRevision: HEAD
733
    helm:
734
      parameters:
735
        - name: storage.data.storageClass
736
          value: rook-ceph-block-hdd
737
        - name: storage.data.size
738
          value: 200Gi
739
        - name: storage.db.storageClass
740
          value: rook-ceph-block-ssd
741
        - name: storage.db.size
742
          value: 10Gi
743
        - name: storage.letsencrypt.storageClass
744
          value: rook-ceph-block-hdd
745
        - name: storage.letsencrypt.size
746
          value: 50Mi
747
        - name: letsencryptStaging
748
          value: 'no'
749
        - name: fqdn
750
          value: 'code.verua.online'
751
  project: default
752
  syncPolicy:
753
    automated:
754
      prune: true
755
      selfHeal: true
756
  info:
757
    - name: 'redmine-url'
758
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
759
    - name: 'support-url'
760
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
761
</pre>
762
763 80 Nico Schottelius
h2. Helm related operations and conventions
764 55 Nico Schottelius
765 61 Nico Schottelius
We use helm charts extensively.
766
767
* In production, they are managed via argocd
768
* In development, helm charts can be developed and deployed manually using the helm utility.
769
770 55 Nico Schottelius
h3. Installing a helm chart
771
772
One can use the usual pattern of
773
774
<pre>
775
helm install <releasename> <chartdirectory>
776
</pre>
777
778
However, when testing helm charts you often want to reinstall/update. The following pattern is better, because it also works if the release is already installed:
779
780
<pre>
781
helm upgrade --install <releasename> <chartdirectory>
782 1 Nico Schottelius
</pre>
783 80 Nico Schottelius
784
h3. Naming services and deployments in helm charts [Application labels]
785
786
* We always have {{ .Release.Name }} to identify the current "instance"
787
* Deployments:
788
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ...
789 81 Nico Schottelius
* See more about standard labels on
790
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
791
** https://helm.sh/docs/chart_best_practices/labels/
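Putting these conventions together, the metadata of a deployment in a chart could look roughly like this (sketch only; the additional @release@ label is one common option, not a hard rule):

<pre>
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-nginx
  labels:
    app: nginx
    release: {{ .Release.Name }}
spec:
  selector:
    matchLabels:
      app: nginx
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: nginx
        release: {{ .Release.Name }}
...
</pre>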
792 55 Nico Schottelius
793 151 Nico Schottelius
h3. Show all versions of a helm chart
794
795
<pre>
796
helm search repo -l repo/chart
797
</pre>
798
799
For example:
800
801
<pre>
802
% helm search repo -l projectcalico/tigera-operator 
803
NAME                         	CHART VERSION	APP VERSION	DESCRIPTION                            
804
projectcalico/tigera-operator	v3.23.3      	v3.23.3    	Installs the Tigera operator for Calico
805
projectcalico/tigera-operator	v3.23.2      	v3.23.2    	Installs the Tigera operator for Calico
806
....
807
</pre>
808
809 152 Nico Schottelius
h3. Show possible values of a chart
810
811
<pre>
812
helm show values <repo/chart>
813
</pre>
814
815
Example:
816
817
<pre>
818
helm show values ingress-nginx/ingress-nginx
819
</pre>
820
821
822 139 Nico Schottelius
h2. Rook + Ceph
823
824
h3. Installation
825
826
* Usually directly via argocd
827
828
Manual steps:
829
830
<pre>
831
832
</pre>
833 43 Nico Schottelius
834 71 Nico Schottelius
h3. Executing ceph commands
835
836
Using the ceph-tools pod as follows:
837
838
<pre>
839
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
840
</pre>
841
842 43 Nico Schottelius
h3. Inspecting the logs of a specific server
843
844
<pre>
845
# Get the related pods
846
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare 
847
...
848
849
# Inspect the logs of a specific pod
850
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
851
852 71 Nico Schottelius
</pre>
853
854
h3. Inspecting the logs of the rook-ceph-operator
855
856
<pre>
857
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
858 43 Nico Schottelius
</pre>
859
860 121 Nico Schottelius
h3. Restarting the rook operator
861
862
<pre>
863
kubectl -n rook-ceph delete pods  -l app=rook-ceph-operator
864
</pre>
865
866 43 Nico Schottelius
h3. Triggering server prepare / adding new osds
867
868
The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full re-scan, simply delete that pod:
869
870
<pre>
871
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
872
</pre>
873
874
This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.
875
876
h3. Removing an OSD
877
878
* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
879 77 Nico Schottelius
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
880 99 Nico Schottelius
* Then delete the related deployment
881 41 Nico Schottelius
882 98 Nico Schottelius
Set the OSD id in the osd-purge.yaml and apply it. The OSD should be down before removal.
883
884
<pre>
885
apiVersion: batch/v1
886
kind: Job
887
metadata:
888
  name: rook-ceph-purge-osd
889
  namespace: rook-ceph # namespace:cluster
890
  labels:
891
    app: rook-ceph-purge-osd
892
spec:
893
  template:
894
    metadata:
895
      labels:
896
        app: rook-ceph-purge-osd
897
    spec:
898
      serviceAccountName: rook-ceph-purge-osd
899
      containers:
900
        - name: osd-removal
901
          image: rook/ceph:master
902
          # TODO: Insert the OSD ID in the last parameter that is to be removed
903
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
904
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
905
          #
906
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
907
          # removal could lead to data loss.
908
          args:
909
            - "ceph"
910
            - "osd"
911
            - "remove"
912
            - "--preserve-pvc"
913
            - "false"
914
            - "--force-osd-removal"
915
            - "false"
916
            - "--osd-ids"
917
            - "SETTHEOSDIDHERE"
918
          env:
919
            - name: POD_NAMESPACE
920
              valueFrom:
921
                fieldRef:
922
                  fieldPath: metadata.namespace
923
            - name: ROOK_MON_ENDPOINTS
924
              valueFrom:
925
                configMapKeyRef:
926
                  key: data
927
                  name: rook-ceph-mon-endpoints
928
            - name: ROOK_CEPH_USERNAME
929
              valueFrom:
930
                secretKeyRef:
931
                  key: ceph-username
932
                  name: rook-ceph-mon
933
            - name: ROOK_CEPH_SECRET
934
              valueFrom:
935
                secretKeyRef:
936
                  key: ceph-secret
937
                  name: rook-ceph-mon
938
            - name: ROOK_CONFIG_DIR
939
              value: /var/lib/rook
940
            - name: ROOK_CEPH_CONFIG_OVERRIDE
941
              value: /etc/rook/config/override.conf
942
            - name: ROOK_FSID
943
              valueFrom:
944
                secretKeyRef:
945
                  key: fsid
946
                  name: rook-ceph-mon
947
            - name: ROOK_LOG_LEVEL
948
              value: DEBUG
949
          volumeMounts:
950
            - mountPath: /etc/ceph
951
              name: ceph-conf-emptydir
952
            - mountPath: /var/lib/rook
953
              name: rook-config
954
      volumes:
955
        - emptyDir: {}
956
          name: ceph-conf-emptydir
957
        - emptyDir: {}
958
          name: rook-config
959
      restartPolicy: Never
960
961
962 99 Nico Schottelius
</pre>
963
964
Deleting the deployment:
965
966
<pre>
967
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
968
deployment.apps "rook-ceph-osd-6" deleted
969 98 Nico Schottelius
</pre>
970
971 145 Nico Schottelius
h2. Ingress + Cert Manager
972
973
* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
974
* we deploy "cert-manager":https://cert-manager.io/ to handle certificates
975
* We independently deploy @ClusterIssuer@ to allow the cert-manager app to deploy and the issuer to be created once the CRDs from cert manager are in place
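A hedged sketch of such a @ClusterIssuer@ (ACME / Let's Encrypt with an HTTP01 solver through the nginx ingress class; the name and email are placeholders):

<pre>
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
      - http01:
          ingress:
            class: nginx
</pre>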
976
977
h3. IPv4 reachability 
978
979
The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.
980
981
Steps:
982
983
h4. Get the ingress IPv6 address
984
985
Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@
986
987
Example:
988
989
<pre>
990
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
991
2a0a:e5c0:10:1b::ce11
992
</pre>
993
994
h4. Add NAT64 mapping
995
996
* Update the __dcl_jool_siit cdist type
997
* Record the two IPs (IPv6 and IPv4)
998
* Configure all routers
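The mapping itself is managed by the @__dcl_jool_siit@ cdist type; the equivalent manual command on a router running Jool SIIT would roughly be the following (using the example addresses from this page; verify the exact syntax against the Jool documentation):

<pre>
# map the public IPv4 address to the ingress IPv6 service address (EAM table)
jool_siit eamt add 2a0a:e5c0:10:1b::ce11 147.78.194.23
</pre>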
999
1000
1001
h4. Add DNS record
1002
1003
To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:
1004
1005
<pre>
1006
; k8s ingress for dev
1007
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
1008
dev-ingress                 A 147.78.194.23
1009
1010
</pre> 
1011
1012
h4. Add supporting wildcard DNS
1013
1014
If you plan to add various sites under a specific domain, add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:
1015
1016
<pre>
1017
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
1018
</pre>
1019
1020 76 Nico Schottelius
h2. Harbor
1021
1022 175 Nico Schottelius
* We user "Harbor":https://goharbor.io/ as an image registry for our own images. Internal app reference: apps/prod/harbor.
1023
* The admin password is in the password store; it is Harbor12345 by default
1024 76 Nico Schottelius
* At the moment harbor only authenticates against the internal ldap tree
1025
1026
h3. LDAP configuration
1027
1028
* The url needs to be ldaps://...
1029
* uid = uid
1030
* the rest is standard
1031 75 Nico Schottelius
1032 89 Nico Schottelius
h2. Monitoring / Prometheus
1033
1034 90 Nico Schottelius
* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/
1035 89 Nico Schottelius
1036 91 Nico Schottelius
Access via ...
1037
1038
* http://prometheus-k8s.monitoring.svc:9090
1039
* http://grafana.monitoring.svc:3000
1040
* http://alertmanager.monitoring.svc:9093
1041
1042
1043 100 Nico Schottelius
h3. Prometheus Options
1044
1045
* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
1046
** Includes dashboards and co.
1047
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
1048
** Includes dashboards and co.
1049
* "Prometheus Operator (mainly CRD manifest":https://github.com/prometheus-operator/prometheus-operator
1050
1051 171 Nico Schottelius
h3. Grafana default password
1052
1053
* If not changed: @prom-operator@
1054
1055 82 Nico Schottelius
h2. Nextcloud
1056
1057 85 Nico Schottelius
h3. How to get the nextcloud credentials 
1058 84 Nico Schottelius
1059
* The initial username is set to "nextcloud"
1060
* The password is autogenerated and saved in a kubernetes secret
1061
1062
<pre>
1063 85 Nico Schottelius
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo "" 
1064 84 Nico Schottelius
</pre>
1065
1066 83 Nico Schottelius
h3. How to fix "Access through untrusted domain"
1067
1068 82 Nico Schottelius
* Nextcloud stores the initial domain configuration
1069 1 Nico Schottelius
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
1070 82 Nico Schottelius
* To fix, edit /var/www/html/config/config.php and correct the domain (see the sketch below)
1071 1 Nico Schottelius
* Then delete the pods
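The relevant part of @config.php@ is the @trusted_domains@ array; a sketch (the real file contains many more settings):

<pre>
'trusted_domains' =>
  array (
    0 => 'nextcloud.example.com',
  ),
</pre>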
1072 165 Nico Schottelius
1073
h3. Running occ commands inside the nextcloud container
1074
1075
* Find the pod in the right namespace
1076
1077
Exec:
1078
1079
<pre>
1080
su www-data -s /bin/sh -c ./occ
1081
</pre>
1082
1083
* -s /bin/sh is needed as the default shell is set to /bin/false
1084
1085 166 Nico Schottelius
h4. Rescanning files
1086 165 Nico Schottelius
1087 166 Nico Schottelius
* If files have been added without nextcloud's knowledge
1088
1089
<pre>
1090
su www-data -s /bin/sh -c "./occ files:scan --all"
1091
</pre>
1092 82 Nico Schottelius
1093 1 Nico Schottelius
h2. Infrastructure versions
1094 35 Nico Schottelius
1095 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v5 (2021-10)
1096 1 Nico Schottelius
1097 57 Nico Schottelius
Clusters are configured / setup in this order:
1098
1099
* Bootstrap via kubeadm
1100 59 Nico Schottelius
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
1101
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
1102
** "rook for storage via argocd":https://rook.io/
1103 58 Nico Schottelius
** haproxy as IPv4-to-IPv6 proxy into the IPv6-only cluster, via argocd
1104
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
1105
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
1106
1107 57 Nico Schottelius
1108
h3. ungleich kubernetes infrastructure v4 (2021-09)
1109
1110 54 Nico Schottelius
* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
1111 1 Nico Schottelius
* The rook operator is still being installed via helm
1112 35 Nico Schottelius
1113 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v3 (2021-07)
1114 1 Nico Schottelius
1115 10 Nico Schottelius
* rook is now installed via helm via argocd instead of directly via manifests
1116 28 Nico Schottelius
1117 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v2 (2021-05)
1118 28 Nico Schottelius
1119
* Replaced fluxv2 from ungleich k8s v1 with argocd
1120 1 Nico Schottelius
** argocd can apply helm templates directly without needing to go through Chart releases
1121 28 Nico Schottelius
* We are also using argoflow for build flows
1122
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building
1123
1124 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v1 (2021-01)
1125 28 Nico Schottelius
1126
We are using the following components:
1127
1128
* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
1129
** Needed for basic networking
1130
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
1131
** Needed so that secrets are not stored in the git repository, but only in the cluster
1132
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
1133
** Needed to get letsencrypt certificates for services
1134
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
1135
** rbd for almost everything, *ReadWriteOnce*
1136
** cephfs for smaller things, multi access *ReadWriteMany*
1137
** Needed for providing persistent storage
1138
* "flux v2":https://fluxcd.io/
1139
** Needed to manage resources automatically