
1 22 Nico Schottelius
h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual
2 1 Nico Schottelius
3 3 Nico Schottelius
{{toc}}
4
5 1 Nico Schottelius
h2. Status
6
7 28 Nico Schottelius
This document is **pre-production**.
8
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.
9 1 Nico Schottelius
10 10 Nico Schottelius
h2. k8s clusters
11
12 123 Nico Schottelius
| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
13
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
14
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
15
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
16
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
17
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
18
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
19
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
20
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
21
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
22
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
23
| [[p10.k8s.ooo]]    | production        |            | server63 server65 server83    | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
24
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
25
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
26 142 Nico Schottelius
| [[server121.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
27 155 Nico Schottelius
| [[server122-123.k8s.ooo|server122.k8s.ooo]] | production | Nico | server122 | | | 2022-10-30 |
28 156 Nico Schottelius
| [[server122-123.k8s.ooo|server123.k8s.ooo]] | production | Nico | server123 | | | 2022-10-15 |
29 21 Nico Schottelius
30 1 Nico Schottelius
h2. General architecture and components overview
31
32
* All k8s clusters are IPv6 only
33
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
34
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
35 18 Nico Schottelius
** Private configurations are found in the **k8s-config** repository
36 1 Nico Schottelius
37
h3. Cluster types
38
39 28 Nico Schottelius
| **Type/Feature**            | **Development**                | **Production**         |
40
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
41
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
42
| Separation of control plane | optional                       | recommended            |
43
| Persistent storage          | required                       | required               |
44
| Number of storage monitors  | 3                              | 5                      |
45 1 Nico Schottelius
46 43 Nico Schottelius
h2. General k8s operations
47 1 Nico Schottelius
48 46 Nico Schottelius
h3. Cheat sheet / external great references
49
50
* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/
51
52 117 Nico Schottelius
h3. Allowing to schedule work on the control plane / removing node taints
53 69 Nico Schottelius
54
* Mostly for single node / test / development clusters
55
* Just remove the master taint as follows
56
57
<pre>
58
kubectl taint nodes --all node-role.kubernetes.io/master-
59 118 Nico Schottelius
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
60 69 Nico Schottelius
</pre>
61 1 Nico Schottelius
62 117 Nico Schottelius
You can check the node taints using @kubectl describe node ...@
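To quickly list the taints of all nodes at once, a minimal jsonpath sketch:

<pre>
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'
</pre>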
63 69 Nico Schottelius
64 44 Nico Schottelius
h3. Get the cluster admin.conf
65
66
* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
67
* To be able to administrate the cluster you can copy the admin.conf to your local machine
68
* Multi cluster debugging becomes very easy if you name the config ~/cX-admin.conf (see example below)
69
70
<pre>
71
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
72
% export KUBECONFIG=~/c2-admin.conf    
73
% kubectl get nodes
74
NAME       STATUS                     ROLES                  AGE   VERSION
75
server47   Ready                      control-plane,master   82d   v1.22.0
76
server48   Ready                      control-plane,master   82d   v1.22.0
77
server49   Ready                      <none>                 82d   v1.22.0
78
server50   Ready                      <none>                 82d   v1.22.0
79
server59   Ready                      control-plane,master   82d   v1.22.0
80
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
81
server61   Ready                      <none>                 82d   v1.22.0
82
server62   Ready                      <none>                 82d   v1.22.0               
83
</pre>
84
85 18 Nico Schottelius
h3. Installing a new k8s cluster
86 8 Nico Schottelius
87 9 Nico Schottelius
* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
88 28 Nico Schottelius
** Using pXX.k8s.ooo for production clusters of placeXX
89 9 Nico Schottelius
* Use cdist to configure the nodes with requirements like crio
90
* Decide between single or multi node control plane setups (see below)
91 28 Nico Schottelius
** Single control plane suitable for development clusters
92 9 Nico Schottelius
93 28 Nico Schottelius
Typical init procedure:
94 9 Nico Schottelius
95 28 Nico Schottelius
* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
96
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@
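The referenced @bootstrap/XXX/kubeadm.yaml@ is cluster specific. As an illustration only (all values below are placeholders, not actual cluster settings), such a config roughly looks like:

<pre>
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.22.0
controlPlaneEndpoint: "cX-api.k8s.ooo:6443"
networking:
  podSubnet: "2a0a:e5c0:XXXX::/64"
  serviceSubnet: "2a0a:e5c0:XXXX::/108"
</pre>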
97 10 Nico Schottelius
98 29 Nico Schottelius
h3. Deleting a pod that is hanging in terminating state
99
100
<pre>
101
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
102
</pre>
103
104
(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)
105
106 42 Nico Schottelius
h3. Listing nodes of a cluster
107
108
<pre>
109
[15:05] bridge:~% kubectl get nodes
110
NAME       STATUS   ROLES                  AGE   VERSION
111
server22   Ready    <none>                 52d   v1.22.0
112
server23   Ready    <none>                 52d   v1.22.2
113
server24   Ready    <none>                 52d   v1.22.0
114
server25   Ready    <none>                 52d   v1.22.0
115
server26   Ready    <none>                 52d   v1.22.0
116
server27   Ready    <none>                 52d   v1.22.0
117
server63   Ready    control-plane,master   52d   v1.22.0
118
server64   Ready    <none>                 52d   v1.22.0
119
server65   Ready    control-plane,master   52d   v1.22.0
120
server66   Ready    <none>                 52d   v1.22.0
121
server83   Ready    control-plane,master   52d   v1.22.0
122
server84   Ready    <none>                 52d   v1.22.0
123
server85   Ready    <none>                 52d   v1.22.0
124
server86   Ready    <none>                 52d   v1.22.0
125
</pre>
126
127 41 Nico Schottelius
h3. Removing / draining a node
128
129
Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:
130
131 1 Nico Schottelius
<pre>
132 103 Nico Schottelius
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
133 42 Nico Schottelius
</pre>
134
135
h3. Readding a node after draining
136
137
<pre>
138
kubectl uncordon serverXX
139 1 Nico Schottelius
</pre>
140 43 Nico Schottelius
141 50 Nico Schottelius
h3. (Re-)joining worker nodes after creating the cluster
142 49 Nico Schottelius
143
* We need to have an up-to-date token
144
* We use different join commands for the workers and control plane nodes
145
146
Generating the join command on an existing control plane node:
147
148
<pre>
149
kubeadm token create --print-join-command
150
</pre>
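The printed command is then executed on the (re-)joining worker node; schematically (token and hash are placeholders taken from the output above):

<pre>
kubeadm join p10-api.k8s.ooo:6443 --token TOKEN --discovery-token-ca-cert-hash sha256:HASH
</pre>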
151
152 50 Nico Schottelius
h3. (Re-)joining control plane nodes after creating the cluster
153 1 Nico Schottelius
154 50 Nico Schottelius
* We generate the token again
155
* We upload the certificates
156
* We need to combine/create the join command for the control plane node
157
158
Example session:
159
160
<pre>
161
% kubeadm token create --print-join-command
162
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash 
163
164
% kubeadm init phase upload-certs --upload-certs
165
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
166
[upload-certs] Using certificate key:
167
CERTKEY
168
169
# Then we use these two outputs on the joining node:
170
171
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
172
</pre>
173
174
Commands to be used on a control plane node:
175
176
<pre>
177
kubeadm token create --print-join-command
178
kubeadm init phase upload-certs --upload-certs
179
</pre>
180
181
Commands to be used on the joining node:
182
183
<pre>
184
JOINCOMMAND --control-plane --certificate-key CERTKEY
185
</pre>
186 49 Nico Schottelius
187 51 Nico Schottelius
SEE ALSO
188
189
* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
190
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/
191
192 53 Nico Schottelius
h3. How to fix etcd does not start when rejoining a kubernetes cluster as a control plane
193 52 Nico Schottelius
194
If during the above step etcd does not come up, @kubeadm join@ can hang as follows:
195
196
<pre>
197
[control-plane] Creating static Pod manifest for "kube-apiserver"                                                              
198
[control-plane] Creating static Pod manifest for "kube-controller-manager"                                                     
199
[control-plane] Creating static Pod manifest for "kube-scheduler"                                                              
200
[check-etcd] Checking that the etcd cluster is healthy                                                                         
201
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:37
202
8a]:2379 with maintenance client: context deadline exceeded                                                                    
203
To see the stack trace of this error execute with --v=5 or higher         
204
</pre>
205
206
Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster and then the join works.
207
208
To fix this we do:
209
210
* Find a working etcd pod
211
* Find the etcd members / member list
212
* Remove the etcd member that we want to re-join the cluster
213
214
215
<pre>
216
# Find the etcd pods
217
kubectl -n kube-system get pods -l component=etcd,tier=control-plane
218
219
# Get the list of etcd servers with the member id 
220
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
221
222
# Remove the member
223
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
224
</pre>
225
226
Sample session:
227
228
<pre>
229
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
230
NAME            READY   STATUS    RESTARTS     AGE
231
etcd-server63   1/1     Running   0            3m11s
232
etcd-server65   1/1     Running   3            7d2h
233
etcd-server83   1/1     Running   8 (6d ago)   7d2h
234
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
235
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
236
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
237
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false
238
239
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
240
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
241 1 Nico Schottelius
242
</pre>
243
244
SEE ALSO
245
246
* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster
247 56 Nico Schottelius
248 147 Nico Schottelius
h3. Node labels (adding, showing, removing)
249
250
Listing the labels:
251
252
<pre>
253
kubectl get nodes --show-labels
254
</pre>
255
256
Adding labels:
257
258
<pre>
259
kubectl label nodes LIST-OF-NODES label1=value1 
260
261
</pre>
262
263
For instance:
264
265
<pre>
266
kubectl label nodes router2 router3 hosttype=router 
267
</pre>
268
269
Selecting nodes in pods:
270
271
<pre>
272
apiVersion: v1
273
kind: Pod
274
...
275
spec:
276
  nodeSelector:
277
    hosttype: router
278
</pre>
279
280 148 Nico Schottelius
Removing labels by adding a minus at the end of the label name:
281
282
<pre>
283
kubectl label node <nodename> <labelname>-
284
</pre>
285
286
For instance:
287
288
<pre>
289
kubectl label nodes router2 router3 hosttype- 
290
</pre>
291
292 147 Nico Schottelius
SEE ALSO
293 1 Nico Schottelius
294 148 Nico Schottelius
* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
295
* https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api
296 147 Nico Schottelius
297 101 Nico Schottelius
h3. Hardware Maintenance using ungleich-hardware
298
299
Use the following manifest and replace the HOST with the actual host:
300
301
<pre>
302
apiVersion: v1
303
kind: Pod
304
metadata:
305
  name: ungleich-hardware-HOST
306
spec:
307
  containers:
308
  - name: ungleich-hardware
309
    image: ungleich/ungleich-hardware:0.0.5
310
    args:
311
    - sleep
312
    - "1000000"
313
    volumeMounts:
314
      - mountPath: /dev
315
        name: dev
316
    securityContext:
317
      privileged: true
318
  nodeSelector:
319
    kubernetes.io/hostname: "HOST"
320
321
  volumes:
322
    - name: dev
323
      hostPath:
324
        path: /dev
325
</pre>
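After replacing HOST, the pod can be applied and entered, for example (the filename is arbitrary and the image is assumed to ship a shell):

<pre>
kubectl apply -f ungleich-hardware-serverXX.yaml
kubectl exec -ti ungleich-hardware-serverXX -- /bin/sh
</pre>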
326
327 102 Nico Schottelius
Also see: [[The_ungleich_hardware_maintenance_guide]]
328
329 105 Nico Schottelius
h3. Triggering a cronjob / creating a job from a cronjob
330 104 Nico Schottelius
331
To test a cronjob, we can create a job from a cronjob:
332
333
<pre>
334
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
335
</pre>
336
337
This creates a job @volume2-manual@ based on the cronjob @volume2-daily-backup@.
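The job run can then be followed with:

<pre>
kubectl get jobs
kubectl logs -f job/volume2-manual
</pre>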
338
339 112 Nico Schottelius
h3. su-ing into a user that has nologin shell set
340
341
Users often have @nologin@ set as their shell inside the container. To still be able to execute maintenance commands within the
container, we can use @su -s /bin/sh@ like this:
343
344
<pre>
345
su -s /bin/sh -c '/path/to/your/script' testuser
346
</pre>
347
348
Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell
349
350 113 Nico Schottelius
h3. How to print a secret value
351
352
Assuming you want the "password" item from a secret, use:
353
354
<pre>
355
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo "" 
356
</pre>
357
358 157 Nico Schottelius
h2. Reference CNI
359
360
* Mainly "stupid", but effective plugins
361
* Main documentation on https://www.cni.dev/plugins/current/
362 158 Nico Schottelius
* Plugins
363
** bridge
364
*** Can create the bridge on the host
365
*** But seems not to be able to add host interfaces to it as well
366
*** Has support for vlan tags
367
** vlan
368
*** creates vlan tagged sub interface on the host
369
*** Reads like a 1:1 mapping (i.e. no bridge in between)
370
** host-device
371
*** moves the interface from the host into the container
372
*** very easy for physical connections to containers
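As an illustration of the bridge plugin mentioned above, a minimal config sketch along the lines of the upstream documentation (name, VLAN id and subnet are placeholders):

<pre>
{
  "cniVersion": "0.4.0",
  "name": "hostbridge",
  "type": "bridge",
  "bridge": "br0",
  "vlan": 100,
  "ipam": {
    "type": "host-local",
    "ranges": [
      [ { "subnet": "2a0a:e5c0:XXXX::/64" } ]
    ]
  }
}
</pre>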
373
374 157 Nico Schottelius
375 62 Nico Schottelius
h2. Calico CNI
376
377
h3. Calico Installation
378
379
* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
380
* This has the following advantages:
381
** Easy to upgrade
382
** Does not require us to configure IPv6/dual-stack settings, as the tigera operator figures things out on its own
383
384
Usually plain calico can be installed directly using:
385
386
<pre>
387 149 Nico Schottelius
VERSION=v3.24.1
388
389 120 Nico Schottelius
helm repo add projectcalico https://docs.projectcalico.org/charts
390 124 Nico Schottelius
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
391 1 Nico Schottelius
</pre>
392 92 Nico Schottelius
393
* Check the tags on https://github.com/projectcalico/calico/tags for the latest release
394 62 Nico Schottelius
395
h3. Installing calicoctl
396
397 115 Nico Schottelius
* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install
398
399 62 Nico Schottelius
To be able to manage and configure calico, we need to 
400
"install calicoctl (we choose the version as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod
401
402
<pre>
403
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
404
</pre>
405
406 93 Nico Schottelius
Or version specific:
407
408
<pre>
409
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.20.4/manifests/calicoctl.yaml
410 97 Nico Schottelius
411
# For 3.22
412
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
413 93 Nico Schottelius
</pre>
414
415 70 Nico Schottelius
And making it more easily accessible via an alias:
416
417
<pre>
418
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
419
</pre>
420
421 62 Nico Schottelius
h3. Calico configuration
422
423 63 Nico Schottelius
By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
424
with an upstream router to propagate podcidr and servicecidr.
425 62 Nico Schottelius
426
Default settings in our infrastructure:
427
428
* We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
429
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
430 1 Nico Schottelius
* We use private ASNs for k8s clusters
431 63 Nico Schottelius
* We do *not* use any overlay
432 62 Nico Schottelius
433
After installing calico and calicoctl the last step of the installation is usually:
434
435 1 Nico Schottelius
<pre>
436 79 Nico Schottelius
calicoctl create -f - < calico-bgp.yaml
437 62 Nico Schottelius
</pre>
438
439
440
A sample BGP configuration:
441
442
<pre>
443
---
444
apiVersion: projectcalico.org/v3
445
kind: BGPConfiguration
446
metadata:
447
  name: default
448
spec:
449
  logSeverityScreen: Info
450
  nodeToNodeMeshEnabled: true
451
  asNumber: 65534
452
  serviceClusterIPs:
453
  - cidr: 2a0a:e5c0:10:3::/108
454
  serviceExternalIPs:
455
  - cidr: 2a0a:e5c0:10:3::/108
456
---
457
apiVersion: projectcalico.org/v3
458
kind: BGPPeer
459
metadata:
460
  name: router1-place10
461
spec:
462
  peerIP: 2a0a:e5c0:10:1::50
463
  asNumber: 213081
464
  keepOriginalNextHop: true
465
</pre>
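To verify what has been applied (using the calicoctl alias/pod installed above):

<pre>
calicoctl get bgpconfig default -o yaml
calicoctl get bgppeer -o wide
</pre>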
466
467 126 Nico Schottelius
h2. Cilium CNI (experimental)
468
469 137 Nico Schottelius
h3. Status
470
471 138 Nico Schottelius
*NO WORKING CILIUM CONFIGURATION FOR IPV6 only modes*
472 137 Nico Schottelius
473 146 Nico Schottelius
h3. Latest error
474
475
It seems cilium does not run on IPv6 only hosts:
476
477
<pre>
478
level=info msg="Validating configured node address ranges" subsys=daemon
479
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
480
level=info msg="Starting IP identity watcher" subsys=ipcache
481
</pre>
482
483
It crashes after that log entry
484
485 128 Nico Schottelius
h3. BGP configuration
486
487
* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
488
* Creating the bgp config beforehand as a configmap is thus required.
489
490
The error one gets without the configmap present:
491
492
Pods are hanging with:
493
494
<pre>
495
cilium-bpqm6                       0/1     Init:0/4            0             9s
496
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
497
</pre>
498
499
The error message in the cilium-operator is:
500
501
<pre>
502
Events:
503
  Type     Reason       Age                From               Message
504
  ----     ------       ----               ----               -------
505
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
506
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
507
</pre>
508
509
A correct bgp config looks like this:
510
511
<pre>
512
apiVersion: v1
513
kind: ConfigMap
514
metadata:
515
  name: bgp-config
516
  namespace: kube-system
517
data:
518
  config.yaml: |
519
    peers:
520
      - peer-address: 2a0a:e5c0::46
521
        peer-asn: 209898
522
        my-asn: 65533
523
      - peer-address: 2a0a:e5c0::47
524
        peer-asn: 209898
525
        my-asn: 65533
526
    address-pools:
527
      - name: default
528
        protocol: bgp
529
        addresses:
530
          - 2a0a:e5c0:0:14::/64
531
</pre>
532 127 Nico Schottelius
533
h3. Installation
534 130 Nico Schottelius
535 127 Nico Schottelius
Adding the repo
536 1 Nico Schottelius
<pre>
537 127 Nico Schottelius
538 129 Nico Schottelius
helm repo add cilium https://helm.cilium.io/
539 130 Nico Schottelius
helm repo update
540
</pre>
541 129 Nico Schottelius
542 135 Nico Schottelius
Installing + configuring cilium
543 129 Nico Schottelius
<pre>
544 130 Nico Schottelius
ipv6pool=2a0a:e5c0:0:14::/112
545 1 Nico Schottelius
546 146 Nico Schottelius
version=1.12.2
547 129 Nico Schottelius
548
helm upgrade --install cilium cilium/cilium --version $version \
549 1 Nico Schottelius
  --namespace kube-system \
550
  --set ipv4.enabled=false \
551
  --set ipv6.enabled=true \
552 146 Nico Schottelius
  --set enableIPv6Masquerade=false \
553
  --set bgpControlPlane.enabled=true 
554 1 Nico Schottelius
555 146 Nico Schottelius
#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool
556
557
# Old style bgp?
558 136 Nico Schottelius
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \
559 127 Nico Schottelius
560
# Show possible configuration options
561
helm show values cilium/cilium
562
563 1 Nico Schottelius
</pre>
564 132 Nico Schottelius
565
Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:
566
567
<pre>
568
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
569
</pre>
570
571 126 Nico Schottelius
572 1 Nico Schottelius
See also https://github.com/cilium/cilium/issues/20756
573 135 Nico Schottelius
574
Seems a /112 is actually working.
575
576
h3. Kernel modules
577
578
Cilium requires the following modules to be loaded on the host (not loaded by default):
579
580
<pre>
581 1 Nico Schottelius
modprobe  ip6table_raw
582
modprobe  ip6table_filter
583
</pre>
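To have them loaded again after a reboot, they can additionally be added to @/etc/modules@ (assuming the hosts load modules from there):

<pre>
echo ip6table_raw >> /etc/modules
echo ip6table_filter >> /etc/modules
</pre>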
584 146 Nico Schottelius
585
h3. Interesting helm flags
586
587
* autoDirectNodeRoutes
588
* bgpControlPlane.enabled = true
589
590
h3. SEE ALSO
591
592
* https://docs.cilium.io/en/v1.12/helm-reference/
593 133 Nico Schottelius
594 150 Nico Schottelius
h2. Multus (incomplete/experimental)
595
596
(TBD)
597
598 122 Nico Schottelius
h2. ArgoCD 
599 56 Nico Schottelius
600 60 Nico Schottelius
h3. Argocd Installation
601 1 Nico Schottelius
602 116 Nico Schottelius
* See https://argo-cd.readthedocs.io/en/stable/
603
604 60 Nico Schottelius
As there is no configuration management present yet, argocd is installed using
605
606 1 Nico Schottelius
<pre>
607 60 Nico Schottelius
kubectl create namespace argocd
608 86 Nico Schottelius
609 96 Nico Schottelius
# Specific Version
610
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml
611 86 Nico Schottelius
612
# OR: latest stable
613 60 Nico Schottelius
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
614 56 Nico Schottelius
</pre>
615 1 Nico Schottelius
616 116 Nico Schottelius
617 1 Nico Schottelius
618 60 Nico Schottelius
h3. Get the argocd credentials
619
620
<pre>
621
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
622
</pre>
623 52 Nico Schottelius
624 87 Nico Schottelius
h3. Accessing argocd
625
626
In regular IPv6 clusters:
627
628
* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN
629
630
In legacy IPv4 clusters
631
632
<pre>
633
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
634
</pre>
635
636 88 Nico Schottelius
* Navigate to https://localhost:8080
637
638 68 Nico Schottelius
h3. Using the argocd webhook to trigger changes
639 67 Nico Schottelius
640
* To trigger changes post json https://argocd.example.com/api/webhook
641
642 72 Nico Schottelius
h3. Deploying an application
643
644
* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
645 73 Nico Schottelius
* Always include the *redmine-url* pointing to the (customer) ticket
646
** Also add the support-url if it exists
647 72 Nico Schottelius
648
Application sample
649
650
<pre>
651
apiVersion: argoproj.io/v1alpha1
652
kind: Application
653
metadata:
654
  name: gitea-CUSTOMER
655
  namespace: argocd
656
spec:
657
  destination:
658
    namespace: default
659
    server: 'https://kubernetes.default.svc'
660
  source:
661
    path: apps/prod/gitea
662
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
663
    targetRevision: HEAD
664
    helm:
665
      parameters:
666
        - name: storage.data.storageClass
667
          value: rook-ceph-block-hdd
668
        - name: storage.data.size
669
          value: 200Gi
670
        - name: storage.db.storageClass
671
          value: rook-ceph-block-ssd
672
        - name: storage.db.size
673
          value: 10Gi
674
        - name: storage.letsencrypt.storageClass
675
          value: rook-ceph-block-hdd
676
        - name: storage.letsencrypt.size
677
          value: 50Mi
678
        - name: letsencryptStaging
679
          value: 'no'
680
        - name: fqdn
681
          value: 'code.verua.online'
682
  project: default
683
  syncPolicy:
684
    automated:
685
      prune: true
686
      selfHeal: true
687
  info:
688
    - name: 'redmine-url'
689
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
690
    - name: 'support-url'
691
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
692
</pre>
693
694 80 Nico Schottelius
h2. Helm related operations and conventions
695 55 Nico Schottelius
696 61 Nico Schottelius
We use helm charts extensively.
697
698
* In production, they are managed via argocd
699
* In development, helm charts can be developed and deployed manually using the helm utility.
700
701 55 Nico Schottelius
h3. Installing a helm chart
702
703
One can use the usual pattern of
704
705
<pre>
706
helm install <releasename> <chartdirectory>
707
</pre>
708
709
However, when testing helm charts you often want to reinstall/update. The following pattern is "better", because it also works if the release is already installed:
710
711
<pre>
712
helm upgrade --install <releasename> <chartdirectory>
713 1 Nico Schottelius
</pre>
714 80 Nico Schottelius
715
h3. Naming services and deployments in helm charts [Application labels]
716
717
* We always have {{ .Release.Name }} to identify the current "instance"
718
* Deployments:
719
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ...
720 81 Nico Schottelius
* See more about standard labels on
721
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
722
** https://helm.sh/docs/chart_best_practices/labels/
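A minimal sketch of how these conventions look inside a chart template (illustrative only, not one of our actual charts):

<pre>
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.23
</pre>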
723 55 Nico Schottelius
724 151 Nico Schottelius
h3. Show all versions of a helm chart
725
726
<pre>
727
helm search repo -l repo/chart
728
</pre>
729
730
For example:
731
732
<pre>
733
% helm search repo -l projectcalico/tigera-operator 
734
NAME                         	CHART VERSION	APP VERSION	DESCRIPTION                            
735
projectcalico/tigera-operator	v3.23.3      	v3.23.3    	Installs the Tigera operator for Calico
736
projectcalico/tigera-operator	v3.23.2      	v3.23.2    	Installs the Tigera operator for Calico
737
....
738
</pre>
739
740 152 Nico Schottelius
h3. Show possible values of a chart
741
742
<pre>
743
helm show values <repo/chart>
744
</pre>
745
746
Example:
747
748
<pre>
749
helm show values ingress-nginx/ingress-nginx
750
</pre>
751
752
753 139 Nico Schottelius
h2. Rook + Ceph
754
755
h3. Installation
756
757
* Usually directly via argocd
758
759
Manual steps:
760
761
<pre>
762
763
</pre>
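The manual steps are not documented in detail yet; a sketch of the usual operator installation via the upstream helm chart (repository URL and chart name are the upstream defaults, not necessarily our setup):

<pre>
helm repo add rook-release https://charts.rook.io/release
helm upgrade --install --namespace rook-ceph rook-ceph rook-release/rook-ceph --create-namespace
</pre>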
764 43 Nico Schottelius
765 71 Nico Schottelius
h3. Executing ceph commands
766
767
Using the ceph-tools pod as follows:
768
769
<pre>
770
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
771
</pre>
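Analogous to the calicoctl alias above, this can be shortened with an alias (assuming the standard @rook-ceph-tools@ deployment name):

<pre>
alias rookceph="kubectl exec -n rook-ceph -ti deploy/rook-ceph-tools -- ceph"
rookceph -s
</pre>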
772
773 43 Nico Schottelius
h3. Inspecting the logs of a specific server
774
775
<pre>
776
# Get the related pods
777
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare 
778
...
779
780
# Inspect the logs of a specific pod
781
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
782
783 71 Nico Schottelius
</pre>
784
785
h3. Inspecting the logs of the rook-ceph-operator
786
787
<pre>
788
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
789 43 Nico Schottelius
</pre>
790
791 121 Nico Schottelius
h3. Restarting the rook operator
792
793
<pre>
794
kubectl -n rook-ceph delete pods  -l app=rook-ceph-operator
795
</pre>
796
797 43 Nico Schottelius
h3. Triggering server prepare / adding new osds
798
799
The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re scan", simply delete that pod:
800
801
<pre>
802
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
803
</pre>
804
805
This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.
806
807
h3. Removing an OSD
808
809
* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
810 77 Nico Schottelius
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
811 99 Nico Schottelius
* Then delete the related deployment
812 41 Nico Schottelius
813 98 Nico Schottelius
Set the OSD id in the osd-purge.yaml and apply it. The OSD should be down beforehand.
814
815
<pre>
816
apiVersion: batch/v1
817
kind: Job
818
metadata:
819
  name: rook-ceph-purge-osd
820
  namespace: rook-ceph # namespace:cluster
821
  labels:
822
    app: rook-ceph-purge-osd
823
spec:
824
  template:
825
    metadata:
826
      labels:
827
        app: rook-ceph-purge-osd
828
    spec:
829
      serviceAccountName: rook-ceph-purge-osd
830
      containers:
831
        - name: osd-removal
832
          image: rook/ceph:master
833
          # TODO: Insert the OSD ID in the last parameter that is to be removed
834
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
835
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
836
          #
837
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
838
          # removal could lead to data loss.
839
          args:
840
            - "ceph"
841
            - "osd"
842
            - "remove"
843
            - "--preserve-pvc"
844
            - "false"
845
            - "--force-osd-removal"
846
            - "false"
847
            - "--osd-ids"
848
            - "SETTHEOSDIDHERE"
849
          env:
850
            - name: POD_NAMESPACE
851
              valueFrom:
852
                fieldRef:
853
                  fieldPath: metadata.namespace
854
            - name: ROOK_MON_ENDPOINTS
855
              valueFrom:
856
                configMapKeyRef:
857
                  key: data
858
                  name: rook-ceph-mon-endpoints
859
            - name: ROOK_CEPH_USERNAME
860
              valueFrom:
861
                secretKeyRef:
862
                  key: ceph-username
863
                  name: rook-ceph-mon
864
            - name: ROOK_CEPH_SECRET
865
              valueFrom:
866
                secretKeyRef:
867
                  key: ceph-secret
868
                  name: rook-ceph-mon
869
            - name: ROOK_CONFIG_DIR
870
              value: /var/lib/rook
871
            - name: ROOK_CEPH_CONFIG_OVERRIDE
872
              value: /etc/rook/config/override.conf
873
            - name: ROOK_FSID
874
              valueFrom:
875
                secretKeyRef:
876
                  key: fsid
877
                  name: rook-ceph-mon
878
            - name: ROOK_LOG_LEVEL
879
              value: DEBUG
880
          volumeMounts:
881
            - mountPath: /etc/ceph
882
              name: ceph-conf-emptydir
883
            - mountPath: /var/lib/rook
884
              name: rook-config
885
      volumes:
886
        - emptyDir: {}
887
          name: ceph-conf-emptydir
888
        - emptyDir: {}
889
          name: rook-config
890
      restartPolicy: Never
891
892
893 99 Nico Schottelius
</pre>
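Applying and following the purge job (a sketch, using the filename from above):

<pre>
kubectl apply -f osd-purge.yaml
kubectl -n rook-ceph logs -f job/rook-ceph-purge-osd
</pre>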
894
895
Deleting the deployment:
896
897
<pre>
898
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
899
deployment.apps "rook-ceph-osd-6" deleted
900 98 Nico Schottelius
</pre>
901
902 145 Nico Schottelius
h2. Ingress + Cert Manager
903
904
* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
905
* we deploy "cert-manager":https://cert-manager.io/ to handle certificates
906
* We independently deploy @ClusterIssuer@ to allow the cert-manager app to deploy and the issuer to be created once the CRDs from cert manager are in place
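A minimal sketch of such a @ClusterIssuer@ (issuer name, e-mail and ingress class below are placeholders, not our actual configuration):

<pre>
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: EMAIL@example.com
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
      - http01:
          ingress:
            class: nginx
</pre>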
907
908
h3. IPv4 reachability 
909
910
The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.
911
912
Steps:
913
914
h4. Get the ingress IPv6 address
915
916
Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@
917
918
Example:
919
920
<pre>
921
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
922
2a0a:e5c0:10:1b::ce11
923
</pre>
924
925
h4. Add NAT64 mapping
926
927
* Update the __dcl_jool_siit cdist type
928
* Record the two IPs (IPv6 and IPv4)
929
* Configure all routers
930
931
932
h4. Add DNS record
933
934
To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:
935
936
<pre>
937
; k8s ingress for dev
938
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
939
dev-ingress                 A 147.78.194.23
940
941
</pre> 
942
943
h4. Add supporting wildcard DNS
944
945
If you plan to add various sites under a specific domain, we can add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:
946
947
<pre>
948
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
949
</pre>
950
951 76 Nico Schottelius
h2. Harbor
952
953
* We use "Harbor":https://goharbor.io/ for caching and as an image registry. Internal app reference: apps/prod/harbor.
954
* The admin password is in the password store, auto generated per cluster
955
* At the moment harbor only authenticates against the internal ldap tree
956
957
h3. LDAP configuration
958
959
* The url needs to be ldaps://...
960
* uid = uid
961
* the rest of the settings are standard
962 75 Nico Schottelius
963 89 Nico Schottelius
h2. Monitoring / Prometheus
964
965 90 Nico Schottelius
* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/
966 89 Nico Schottelius
967 91 Nico Schottelius
Access via ...
968
969
* http://prometheus-k8s.monitoring.svc:9090
970
* http://grafana.monitoring.svc:3000
971
* http://alertmanager.monitoring.svc:9093
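From outside the cluster they can also be reached via port-forwarding, for example:

<pre>
kubectl -n monitoring port-forward svc/grafana 3000:3000
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
</pre>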
972
973
974 100 Nico Schottelius
h3. Prometheus Options
975
976
* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
977
** Includes dashboards and co.
978
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
979
** Includes dashboards and co.
980
* "Prometheus Operator (mainly CRD manifest":https://github.com/prometheus-operator/prometheus-operator
981
982 91 Nico Schottelius
983 82 Nico Schottelius
h2. Nextcloud
984
985 85 Nico Schottelius
h3. How to get the nextcloud credentials 
986 84 Nico Schottelius
987
* The initial username is set to "nextcloud"
988
* The password is autogenerated and saved in a kubernetes secret
989
990
<pre>
991 85 Nico Schottelius
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo "" 
992 84 Nico Schottelius
</pre>
993
994 83 Nico Schottelius
h3. How to fix "Access through untrusted domain"
995
996 82 Nico Schottelius
* Nextcloud stores the initial domain configuration
997 1 Nico Schottelius
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
998 82 Nico Schottelius
* To fix, edit /var/www/html/config/config.php and correct the domain
999 83 Nico Schottelius
* Then delete the pods
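A sketch of the fix from the command line (the deployment name and the label selector are assumptions and depend on the chart):

<pre>
# Correct the trusted domain in the running container
kubectl exec -ti deploy/RELEASENAME-nextcloud -- sed -i 's/old.example.com/new.example.com/' /var/www/html/config/config.php

# Then restart the pods
kubectl delete pods -l app.kubernetes.io/name=nextcloud
</pre>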
1000 82 Nico Schottelius
1001 1 Nico Schottelius
h2. Infrastructure versions
1002 35 Nico Schottelius
1003 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v5 (2021-10)
1004 1 Nico Schottelius
1005 57 Nico Schottelius
Clusters are configured / setup in this order:
1006
1007
* Bootstrap via kubeadm
1008 59 Nico Schottelius
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
1009
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
1010
** "rook for storage via argocd":https://rook.io/
1011 58 Nico Schottelius
** haproxy for IPv4-to-IPv6 proxying into the IPv6-only cluster, via argocd
1012
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
1013
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
1014
1015 57 Nico Schottelius
1016
h3. ungleich kubernetes infrastructure v4 (2021-09)
1017
1018 54 Nico Schottelius
* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
1019 1 Nico Schottelius
* The rook operator is still being installed via helm
1020 35 Nico Schottelius
1021 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v3 (2021-07)
1022 1 Nico Schottelius
1023 10 Nico Schottelius
* rook is now installed via helm via argocd instead of directly via manifests
1024 28 Nico Schottelius
1025 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v2 (2021-05)
1026 28 Nico Schottelius
1027
* Replaced fluxv2 from ungleich k8s v1 with argocd
1028 1 Nico Schottelius
** argocd can apply helm templates directly without needing to go through Chart releases
1029 28 Nico Schottelius
* We are also using argoflow for build flows
1030
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building
1031
1032 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v1 (2021-01)
1033 28 Nico Schottelius
1034
We are using the following components:
1035
1036
* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
1037
** Needed for basic networking
1038
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
1039
** Needed so that secrets are not stored in the git repository, but only in the cluster
1040
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
1041
** Needed to get letsencrypt certificates for services
1042
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
1043
** rbd for almost everything, *ReadWriteOnce*
1044
** cephfs for smaller things, multi access *ReadWriteMany*
1045
** Needed for providing persistent storage
1046
* "flux v2":https://fluxcd.io/
1047
** Needed to manage resources automatically