
h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual

{{toc}}

h2. Status

This document is **pre-production**.
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.

h2. k8s clusters

| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
| [[p10.k8s.ooo]]    | production        |            | server63 server65 server83    | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
| [[server121.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
| [[server122-123.k8s.ooo|server122.k8s.ooo]] | production | Nico | server122 | | | 2022-10-30 |
| [[server122-123.k8s.ooo|server123.k8s.ooo]] | production | Nico | server123 | | | 2022-10-15 |
| [[p5-r1-r2.k8s.ooo|r1.p5.k8s.ooo]] | production | Nico | server137 | | | 2022-10-30 |
| [[p5-r1-r2.k8s.ooo|r2.p5.k8s.ooo]] | production | Nico | server138 | | | 2022-10-30 |

h2. General architecture and components overview

* All k8s clusters are IPv6 only
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
** Private configurations are found in the **k8s-config** repository

h3. Cluster types

| **Type/Feature**            | **Development**                | **Production**         |
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
| Separation of control plane | optional                       | recommended            |
| Persistent storage          | required                       | required               |
| Number of storage monitors  | 3                              | 5                      |

h2. General k8s operations

h3. Cheat sheet / external great references

* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/

h3. Allowing to schedule work on the control plane / removing node taints

* Mostly for single node / test / development clusters
* Just remove the master taint as follows

<pre>
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</pre>

You can check the node taints using @kubectl describe node ...@
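
For example, to quickly check whether a node still carries control plane taints (serverXX being a placeholder node name):

<pre>
kubectl describe node serverXX | grep Taints
</pre>
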
h3. Get the cluster admin.conf

* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
* To be able to administrate the cluster you can copy the admin.conf to your local machine
* Multi cluster debugging becomes very easy if you name the config ~/cX-admin.conf (see example below)

<pre>
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
% export KUBECONFIG=~/c2-admin.conf
% kubectl get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
server47   Ready                      control-plane,master   82d   v1.22.0
server48   Ready                      control-plane,master   82d   v1.22.0
server49   Ready                      <none>                 82d   v1.22.0
server50   Ready                      <none>                 82d   v1.22.0
server59   Ready                      control-plane,master   82d   v1.22.0
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
server61   Ready                      <none>                 82d   v1.22.0
server62   Ready                      <none>                 82d   v1.22.0
</pre>

h3. Installing a new k8s cluster

* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
** Using pXX.k8s.ooo for production clusters of placeXX
* Use cdist to configure the nodes with requirements like crio
* Decide between single or multi node control plane setups (see below)
** Single control plane suitable for development clusters

Typical init procedure:

* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@
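
The referenced @bootstrap/XXX/kubeadm.yaml@ is cluster specific. As a rough sketch (the endpoint, version and CIDRs below are placeholders, not values from our clusters), such a file could look like:

<pre>
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.22.0
controlPlaneEndpoint: "cX-api.k8s.ooo:6443"
networking:
  # IPv6 only: pod and service CIDRs are IPv6 ranges (placeholders below)
  podSubnet: 2001:db8:1::/64
  serviceSubnet: 2001:db8:2::/108
</pre>
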
h3. Deleting a pod that is hanging in terminating state

<pre>
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
</pre>

(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)

h3. Listing nodes of a cluster

<pre>
[15:05] bridge:~% kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
server22   Ready    <none>                 52d   v1.22.0
server23   Ready    <none>                 52d   v1.22.2
server24   Ready    <none>                 52d   v1.22.0
server25   Ready    <none>                 52d   v1.22.0
server26   Ready    <none>                 52d   v1.22.0
server27   Ready    <none>                 52d   v1.22.0
server63   Ready    control-plane,master   52d   v1.22.0
server64   Ready    <none>                 52d   v1.22.0
server65   Ready    control-plane,master   52d   v1.22.0
server66   Ready    <none>                 52d   v1.22.0
server83   Ready    control-plane,master   52d   v1.22.0
server84   Ready    <none>                 52d   v1.22.0
server85   Ready    <none>                 52d   v1.22.0
server86   Ready    <none>                 52d   v1.22.0
</pre>

h3. Removing / draining a node

Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:

<pre>
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
</pre>

h3. Re-adding a node after draining

<pre>
kubectl uncordon serverXX
</pre>

h3. (Re-)joining worker nodes after creating the cluster

* We need to have an up-to-date token
* We use different join commands for the workers and control plane nodes

Generating the join command on an existing control plane node:

<pre>
kubeadm token create --print-join-command
</pre>

h3. (Re-)joining control plane nodes after creating the cluster

* We generate the token again
* We upload the certificates
* We need to combine/create the join command for the control plane node

Example session:

<pre>
% kubeadm token create --print-join-command
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash

% kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
CERTKEY

# Then we use these two outputs on the joining node:

kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
</pre>

Commands to be used on a control plane node:

<pre>
kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs
</pre>

Commands to be used on the joining node:

<pre>
JOINCOMMAND --control-plane --certificate-key CERTKEY
</pre>

SEE ALSO

* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/

h3. How to fix etcd does not start when rejoining a kubernetes cluster as a control plane

If during the above step etcd does not come up, @kubeadm join@ can hang as follows:

<pre>
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
</pre>

Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster and then the join works.

To fix this we do:

* Find a working etcd pod
* Find the etcd members / member list
* Remove the etcd member that we want to re-join the cluster

<pre>
# Find the etcd pods
kubectl -n kube-system get pods -l component=etcd,tier=control-plane

# Get the list of etcd servers with the member id
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# Remove the member
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
</pre>

Sample session:

<pre>
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
NAME            READY   STATUS    RESTARTS     AGE
etcd-server63   1/1     Running   0            3m11s
etcd-server65   1/1     Running   3            7d2h
etcd-server83   1/1     Running   8 (6d ago)   7d2h
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
</pre>

SEE ALSO

* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster

h3. Node labels (adding, showing, removing)

Listing the labels:

<pre>
kubectl get nodes --show-labels
</pre>

Adding labels:

<pre>
kubectl label nodes LIST-OF-NODES label1=value1
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype=router
</pre>

Selecting nodes in pods:

<pre>
apiVersion: v1
kind: Pod
...
spec:
  nodeSelector:
    hosttype: router
</pre>

Removing labels by adding a minus at the end of the label name:

<pre>
kubectl label node <nodename> <labelname>-
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype-
</pre>

SEE ALSO

* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
* https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api

h3. Hardware Maintenance using ungleich-hardware

Use the following manifest and replace the HOST with the actual host:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-HOST
spec:
  containers:
  - name: ungleich-hardware
    image: ungleich/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "HOST"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>

Also see: [[The_ungleich_hardware_maintenance_guide]]
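
Once the pod is running, a shell inside it can be opened, for example (pod name as defined in the manifest above):

<pre>
kubectl exec -ti ungleich-hardware-HOST -- /bin/sh
</pre>
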
h3. Triggering a cronjob / creating a job from a cronjob

To test a cronjob, we can create a job from a cronjob:

<pre>
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
</pre>

This creates a job volume2-manual based on the cronjob volume2-daily-backup.

h3. su-ing into a user that has nologin shell set

Often users have nologin set as their shell inside the container. To be able to execute maintenance commands within the
container, we can use @su -s /bin/sh@ like this:

<pre>
su -s /bin/sh -c '/path/to/your/script' testuser
</pre>

Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell

h3. How to print a secret value

Assuming you want the "password" item from a secret, use:

<pre>
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h2. Reference CNI

* Mainly "stupid", but effective plugins
* Main documentation on https://www.cni.dev/plugins/current/
* Plugins
** bridge (see the example config after this list)
*** Can create the bridge on the host
*** But seems not to be able to add host interfaces to it as well
*** Has support for vlan tags
** vlan
*** creates vlan tagged sub interface on the host
*** "It's a 1:1 mapping (i.e. no bridge in between)":https://github.com/k8snetworkplumbingwg/multus-cni/issues/569
** host-device
*** moves the interface from the host into the container
*** very easy for physical connections to containers
** ipvlan
*** "virtualisation" of a host device
*** routing based on IP
*** Same MAC for everyone
*** Cannot reach the master interface
** macvlan
*** With mac addresses
*** Supports various modes (to be checked)
** ptp ("point to point")
*** Creates a host device and connects it to the container
** win*
*** Windows implementations
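
As an illustration of how these plugins are configured, a minimal bridge plugin configuration with a vlan tag and host-local IPAM might look like the following (bridge name, network name and subnet are made-up placeholders; see cni.dev for the full option list):

<pre>
{
  "cniVersion": "1.0.0",
  "name": "example-net",
  "type": "bridge",
  "bridge": "br0",
  "vlan": 10,
  "ipam": {
    "type": "host-local",
    "ranges": [
      [ { "subnet": "2001:db8:42::/64" } ]
    ]
  }
}
</pre>
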

h2. Calico CNI

h3. Calico Installation

* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* This has the following advantages:
** Easy to upgrade
** Does not require us to configure IPv6/dual stack settings as the tigera operator figures out things on its own

Usually plain calico can be installed directly using:

<pre>
VERSION=v3.24.1

helm repo add projectcalico https://docs.projectcalico.org/charts
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
</pre>

* Check the tags on https://github.com/projectcalico/calico/tags for the latest release

h3. Installing calicoctl

* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install

To be able to manage and configure calico, we need to
"install calicoctl (we choose the version as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod

<pre>
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
</pre>

Or version specific:

<pre>
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml

# For 3.22
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
</pre>

And making it more easily accessible via an alias:

<pre>
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
</pre>

h3. Calico configuration

By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
with an upstream router to propagate podcidr and servicecidr.

Default settings in our infrastructure:

* We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
* We use private ASNs for k8s clusters
* We do *not* use any overlay

After installing calico and calicoctl the last step of the installation is usually:

<pre>
calicoctl create -f - < calico-bgp.yaml
</pre>

A sample BGP configuration:

<pre>
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65534
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:10:3::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:10:3::/108
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: router1-place10
spec:
  peerIP: 2a0a:e5c0:10:1::50
  asNumber: 213081
  keepOriginalNextHop: true
</pre>

h2. Cilium CNI (experimental)

h3. Status

*NO WORKING CILIUM CONFIGURATION FOR IPV6 only modes*

h3. Latest error

It seems cilium does not run on IPv6 only hosts:

<pre>
level=info msg="Validating configured node address ranges" subsys=daemon
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
level=info msg="Starting IP identity watcher" subsys=ipcache
</pre>

It crashes after that log entry.

h3. BGP configuration

* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
* Creating the bgp config beforehand as a configmap is thus required.

Without the configmap present, the pods hang with:

<pre>
cilium-bpqm6                       0/1     Init:0/4            0             9s
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
</pre>

The error message in the cilium-operator is:

<pre>
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
</pre>

A correct bgp config looks like this:

<pre>
apiVersion: v1
kind: ConfigMap
metadata:
  name: bgp-config
  namespace: kube-system
data:
  config.yaml: |
    peers:
      - peer-address: 2a0a:e5c0::46
        peer-asn: 209898
        my-asn: 65533
      - peer-address: 2a0a:e5c0::47
        peer-asn: 209898
        my-asn: 65533
    address-pools:
      - name: default
        protocol: bgp
        addresses:
          - 2a0a:e5c0:0:14::/64
</pre>

h3. Installation

Adding the repo:

<pre>
helm repo add cilium https://helm.cilium.io/
helm repo update
</pre>

Installing + configuring cilium:

<pre>
ipv6pool=2a0a:e5c0:0:14::/112

version=1.12.2

helm upgrade --install cilium cilium/cilium --version $version \
  --namespace kube-system \
  --set ipv4.enabled=false \
  --set ipv6.enabled=true \
  --set enableIPv6Masquerade=false \
  --set bgpControlPlane.enabled=true

#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool

# Old style bgp?
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \

# Show possible configuration options
helm show values cilium/cilium
</pre>

Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:

<pre>
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
</pre>

See also https://github.com/cilium/cilium/issues/20756

A /112 seems to actually work.

h3. Kernel modules

Cilium requires the following modules to be loaded on the host (not loaded by default):

<pre>
modprobe ip6table_raw
modprobe ip6table_filter
</pre>

h3. Interesting helm flags

* autoDirectNodeRoutes
* bgpControlPlane.enabled = true

h3. SEE ALSO

* https://docs.cilium.io/en/v1.12/helm-reference/

h2. Multus (incomplete/experimental)

(TBD)

h2. ArgoCD

h3. Argocd Installation

* See https://argo-cd.readthedocs.io/en/stable/

As there is no configuration management present yet, argocd is installed using:

<pre>
kubectl create namespace argocd

# Specific Version
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml

# OR: latest stable
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
</pre>

h3. Get the argocd credentials

<pre>
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h3. Accessing argocd

In regular IPv6 clusters:

* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN

In legacy IPv4 clusters:

<pre>
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
</pre>

* Navigate to https://localhost:8080

h3. Using the argocd webhook to trigger changes

* To trigger changes, post JSON to https://argocd.example.com/api/webhook
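
A sketch of triggering a refresh with curl, mimicking a GitHub-style push event (the repository URL and branch are assumptions to adapt; the exact payload fields depend on the git provider configured in argocd):

<pre>
curl -X POST https://argocd.example.com/api/webhook \
  -H 'Content-Type: application/json' \
  -H 'X-GitHub-Event: push' \
  -d '{"ref": "refs/heads/master", "repository": {"html_url": "https://code.ungleich.ch/ungleich-intern/k8s-config"}}'
</pre>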

h3. Deploying an application

* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
* Always include the *redmine-url* pointing to the (customer) ticket
** Also add the support-url if it exists

Application sample

<pre>
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitea-CUSTOMER
  namespace: argocd
spec:
  destination:
    namespace: default
    server: 'https://kubernetes.default.svc'
  source:
    path: apps/prod/gitea
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
    targetRevision: HEAD
    helm:
      parameters:
        - name: storage.data.storageClass
          value: rook-ceph-block-hdd
        - name: storage.data.size
          value: 200Gi
        - name: storage.db.storageClass
          value: rook-ceph-block-ssd
        - name: storage.db.size
          value: 10Gi
        - name: storage.letsencrypt.storageClass
          value: rook-ceph-block-hdd
        - name: storage.letsencrypt.size
          value: 50Mi
        - name: letsencryptStaging
          value: 'no'
        - name: fqdn
          value: 'code.verua.online'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  info:
    - name: 'redmine-url'
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
    - name: 'support-url'
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
</pre>

h2. Helm related operations and conventions

We use helm charts extensively.

* In production, they are managed via argocd
* In development, helm charts can be developed and deployed manually using the helm utility.

h3. Installing a helm chart

One can use the usual pattern of:

<pre>
helm install <releasename> <chartdirectory>
</pre>

However often you want to reinstall/update when testing helm charts. The following pattern is "better", because it allows you to reinstall, if it is already installed:

<pre>
helm upgrade --install <releasename> <chartdirectory>
</pre>

h3. Naming services and deployments in helm charts [Application labels]

* We always have {{ .Release.Name }} to identify the current "instance"
* Deployments:
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ... (see the sketch after this list)
* See more about standard labels on
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
** https://helm.sh/docs/chart_best_practices/labels/
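
A minimal sketch of how this looks in a deployment template (the label set and names are illustrative only, not a prescribed scheme):

<pre>
apiVersion: apps/v1
kind: Deployment
metadata:
  # {{ .Release.Name }} identifies the instance, app identifies what it is
  name: {{ .Release.Name }}-nginx
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:stable
</pre>
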
h3. Show all versions of a helm chart

<pre>
helm search repo -l repo/chart
</pre>

For example:

<pre>
% helm search repo -l projectcalico/tigera-operator
NAME                         	CHART VERSION	APP VERSION	DESCRIPTION
projectcalico/tigera-operator	v3.23.3      	v3.23.3    	Installs the Tigera operator for Calico
projectcalico/tigera-operator	v3.23.2      	v3.23.2    	Installs the Tigera operator for Calico
....
</pre>

h3. Show possible values of a chart

<pre>
helm show values <repo/chart>
</pre>

Example:

<pre>
helm show values ingress-nginx/ingress-nginx
</pre>

h2. Rook + Ceph

h3. Installation

* Usually directly via argocd

Manual steps:

<pre>

</pre>

h3. Executing ceph commands

Using the ceph-tools pod as follows:

<pre>
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
</pre>
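
The same pattern works for other ceph commands, for example:

<pre>
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph osd tree
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph health detail
</pre>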

h3. Inspecting the logs of a specific server

<pre>
# Get the related pods
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
...

# Inspect the logs of a specific pod
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
</pre>

h3. Inspecting the logs of the rook-ceph-operator

<pre>
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
</pre>

h3. Restarting the rook operator

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

h3. Triggering server prepare / adding new osds

The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re scan", simply delete that pod:

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.

h3. Removing an OSD

* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
* Then delete the related deployment

Set the OSD id in the osd-purge.yaml and apply it. The OSD should be down before.

<pre>
apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-purge-osd
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-purge-osd
spec:
  template:
    metadata:
      labels:
        app: rook-ceph-purge-osd
    spec:
      serviceAccountName: rook-ceph-purge-osd
      containers:
        - name: osd-removal
          image: rook/ceph:master
          # TODO: Insert the OSD ID in the last parameter that is to be removed
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
          #
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
          # removal could lead to data loss.
          args:
            - "ceph"
            - "osd"
            - "remove"
            - "--preserve-pvc"
            - "false"
            - "--force-osd-removal"
            - "false"
            - "--osd-ids"
            - "SETTHEOSDIDHERE"
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_MON_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  key: data
                  name: rook-ceph-mon-endpoints
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  key: ceph-username
                  name: rook-ceph-mon
            - name: ROOK_CEPH_SECRET
              valueFrom:
                secretKeyRef:
                  key: ceph-secret
                  name: rook-ceph-mon
            - name: ROOK_CONFIG_DIR
              value: /var/lib/rook
            - name: ROOK_CEPH_CONFIG_OVERRIDE
              value: /etc/rook/config/override.conf
            - name: ROOK_FSID
              valueFrom:
                secretKeyRef:
                  key: fsid
                  name: rook-ceph-mon
            - name: ROOK_LOG_LEVEL
              value: DEBUG
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-conf-emptydir
            - mountPath: /var/lib/rook
              name: rook-config
      volumes:
        - emptyDir: {}
          name: ceph-conf-emptydir
        - emptyDir: {}
          name: rook-config
      restartPolicy: Never
</pre>

Deleting the deployment:

<pre>
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted
</pre>

h2. Ingress + Cert Manager

* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
* We deploy "cert-manager":https://cert-manager.io/ to handle certificates
* We independently deploy @ClusterIssuer@ to allow the cert-manager app to deploy and the issuer to be created once the CRDs from cert manager are in place

h3. IPv4 reachability

The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.

Steps:

h4. Get the ingress IPv6 address

Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@

Example:

<pre>
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
2a0a:e5c0:10:1b::ce11
</pre>

h4. Add NAT64 mapping

* Update the __dcl_jool_siit cdist type (see the sketch after this list)
* Record the two IPs (IPv6 and IPv4)
* Configure all routers

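Conceptually, what the cdist type pushes to the routers is an SIIT/EAM entry pairing the ingress' IPv6 address with the chosen public IPv4 address, roughly like the following (addresses reused from the examples in this section; the exact invocation depends on the Jool setup):

<pre>
jool_siit eamt add 2a0a:e5c0:10:1b::ce11 147.78.194.23
</pre>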

h4. Add DNS record

To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:

<pre>
; k8s ingress for dev
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
dev-ingress                 A 147.78.194.23
</pre>

h4. Add supporting wildcard DNS

If you plan to add various sites under a specific domain, you can add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:

<pre>
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
</pre>

h2. Harbor

* We use "Harbor":https://goharbor.io/ for caching and as an image registry. Internal app reference: apps/prod/harbor.
* The admin password is in the password store, auto generated per cluster
* At the moment harbor only authenticates against the internal ldap tree

h3. LDAP configuration

* The url needs to be ldaps://...
* uid = uid
* rest standard

h2. Monitoring / Prometheus

* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/

Access via ...

* http://prometheus-k8s.monitoring.svc:9090
* http://grafana.monitoring.svc:3000
* http://alertmanager.monitoring.svc:9093

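These addresses only resolve inside the cluster; from a workstation, one way to reach them is port-forwarding the respective services, e.g. (service names derived from the URLs above, adjust if they differ in your cluster):

<pre>
kubectl -n monitoring port-forward svc/grafana 3000:3000
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
</pre>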

h3. Prometheus Options

* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
** Includes dashboards and co.
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
** Includes dashboards and co.
* "Prometheus Operator (mainly CRD manifests)":https://github.com/prometheus-operator/prometheus-operator

h2. Nextcloud

h3. How to get the nextcloud credentials

* The initial username is set to "nextcloud"
* The password is autogenerated and saved in a kubernetes secret

<pre>
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo ""
</pre>

h3. How to fix "Access through untrusted domain"

* Nextcloud stores the initial domain configuration
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
* To fix, edit /var/www/html/config/config.php and correct the domain (see the sketch after this list)
* Then delete the pods
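
A sketch of the relevant part of @config.php@, assuming the new FQDN is nextcloud.example.com (the rest of the file stays untouched):

<pre>
'trusted_domains' =>
  array (
    0 => 'nextcloud.example.com',
  ),
</pre>
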
h2. Infrastructure versions

h3. ungleich kubernetes infrastructure v5 (2021-10)

Clusters are configured / set up in this order:

* Bootstrap via kubeadm
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
** "rook for storage via argocd":https://rook.io/
** haproxy as an IPv4-to-IPv6 proxy into the IPv6-only cluster, via argocd
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot

h3. ungleich kubernetes infrastructure v4 (2021-09)

* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
* The rook operator is still being installed via helm

h3. ungleich kubernetes infrastructure v3 (2021-07)

* rook is now installed via helm via argocd instead of directly via manifests

h3. ungleich kubernetes infrastructure v2 (2021-05)

* Replaced fluxv2 from ungleich k8s v1 with argocd
** argocd can apply helm templates directly without needing to go through Chart releases
* We are also using argoflow for build flows
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building

h3. ungleich kubernetes infrastructure v1 (2021-01)

We are using the following components:

* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
** Needed for basic networking
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
** Needed so that secrets are not stored in the git repository, but only in the cluster
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
** Needed to get letsencrypt certificates for services
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
** rbd for almost everything, *ReadWriteOnce*
** cephfs for smaller things, multi access *ReadWriteMany*
** Needed for providing persistent storage
* "flux v2":https://fluxcd.io/
** Needed to manage resources automatically