h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual

{{toc}}

h2. Status

This document is **pre-production**.

This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.

h2. k8s clusters

| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
| [[p10.k8s.ooo]]    | production        |            | server63 server65 server83    | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
| [[server121.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
| [[server123.k8s.ooo]] | production | Nico | server123 | | | 2022-10-15 |

h2. General architecture and components overview

* All k8s clusters are IPv6 only
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
** Private configurations are found in the **k8s-config** repository

h3. Cluster types

| **Type/Feature**            | **Development**                | **Production**         |
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
| Separation of control plane | optional                       | recommended            |
| Persistent storage          | required                       | required               |
| Number of storage monitors  | 3                              | 5                      |

h2. General k8s operations

h3. Cheat sheet / great external references

* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/

h3. Allowing to schedule work on the control plane / removing node taints

* Mostly for single node / test / development clusters
* Just remove the master taint as follows

<pre>
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</pre>

You can check the node taints using @kubectl describe node ...@

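To restore the taint later (for example before a node goes back to control-plane-only duty), a sketch (serverXX is a placeholder):

<pre>
# re-add the NoSchedule taint on a single node
kubectl taint nodes serverXX node-role.kubernetes.io/control-plane=:NoSchedule
</pre>
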
h3. Get the cluster admin.conf

* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
* To be able to administrate the cluster you can copy the admin.conf to your local machine
* Multi cluster debugging becomes very easy if you name the config ~/cX-admin.conf (see example below)

<pre>
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
% export KUBECONFIG=~/c2-admin.conf    
% kubectl get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
server47   Ready                      control-plane,master   82d   v1.22.0
server48   Ready                      control-plane,master   82d   v1.22.0
server49   Ready                      <none>                 82d   v1.22.0
server50   Ready                      <none>                 82d   v1.22.0
server59   Ready                      control-plane,master   82d   v1.22.0
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
server61   Ready                      <none>                 82d   v1.22.0
server62   Ready                      <none>                 82d   v1.22.0               
</pre>

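For one-off commands against a specific cluster, the config can also be passed per invocation instead of exporting it (file names follow the cX-admin.conf convention above; @p6-admin.conf@ is just an illustrative example):

<pre>
kubectl --kubeconfig ~/c2-admin.conf get nodes
kubectl --kubeconfig ~/p6-admin.conf get pods -A
</pre>
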
h3. Installing a new k8s cluster

* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
** Using pXX.k8s.ooo for production clusters of placeXX
* Use cdist to configure the nodes with requirements like crio
* Decide between single or multi node control plane setups (see below)
** Single control plane suitable for development clusters

Typical init procedure (see the config sketch below):

* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@

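For orientation, a minimal sketch of what such a @kubeadm.yaml@ might contain for an IPv6-only cluster. All values below are placeholders; the authoritative configs live in the **k8s-config** repository:

<pre>
# sketch only -- real files: bootstrap/XXX/kubeadm.yaml in k8s-config
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.22.0
controlPlaneEndpoint: "pXX-api.k8s.ooo:6443"
networking:
  dnsDomain: pXX.k8s.ooo
  podSubnet: 2a0a:e5c0:XXXX:YYYY::/64
  serviceSubnet: 2a0a:e5c0:XXXX:ZZZZ::/108
</pre>
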
h3. Deleting a pod that is hanging in terminating state

<pre>
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
</pre>

(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)

h3. Listing nodes of a cluster

<pre>
[15:05] bridge:~% kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
server22   Ready    <none>                 52d   v1.22.0
server23   Ready    <none>                 52d   v1.22.2
server24   Ready    <none>                 52d   v1.22.0
server25   Ready    <none>                 52d   v1.22.0
server26   Ready    <none>                 52d   v1.22.0
server27   Ready    <none>                 52d   v1.22.0
server63   Ready    control-plane,master   52d   v1.22.0
server64   Ready    <none>                 52d   v1.22.0
server65   Ready    control-plane,master   52d   v1.22.0
server66   Ready    <none>                 52d   v1.22.0
server83   Ready    control-plane,master   52d   v1.22.0
server84   Ready    <none>                 52d   v1.22.0
server85   Ready    <none>                 52d   v1.22.0
server86   Ready    <none>                 52d   v1.22.0
</pre>

h3. Removing / draining a node

Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:

<pre>
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
</pre>

h3. Re-adding a node after draining

<pre>
kubectl uncordon serverXX
</pre>

h3. (Re-)joining worker nodes after creating the cluster

* We need to have an up-to-date token
* We use different join commands for the workers and control plane nodes

Generating the join command on an existing control plane node:

<pre>
kubeadm token create --print-join-command
</pre>

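The printed command is then run on the worker node that should (re-)join. It has roughly the following shape (token and hash are cluster specific; the endpoint matches the control plane example below):

<pre>
kubeadm join p10-api.k8s.ooo:6443 --token TOKEN --discovery-token-ca-cert-hash sha256:HASH
</pre>
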
h3. (Re-)joining control plane nodes after creating the cluster

* We generate the token again
* We upload the certificates
* We need to combine/create the join command for the control plane node

Example session:

<pre>
% kubeadm token create --print-join-command
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash 

% kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
CERTKEY

# Then we use these two outputs on the joining node:

kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
</pre>

Commands to be used on a control plane node:

<pre>
kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs
</pre>

Commands to be used on the joining node:

<pre>
JOINCOMMAND --control-plane --certificate-key CERTKEY
</pre>

SEE ALSO

* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/

h3. How to fix etcd not starting when rejoining a kubernetes cluster as a control plane

If during the above step etcd does not come up, @kubeadm join@ can hang as follows:

<pre>
[control-plane] Creating static Pod manifest for "kube-apiserver"                                                              
[control-plane] Creating static Pod manifest for "kube-controller-manager"                                                     
[control-plane] Creating static Pod manifest for "kube-scheduler"                                                              
[check-etcd] Checking that the etcd cluster is healthy                                                                         
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:37
8a]:2379 with maintenance client: context deadline exceeded                                                                    
To see the stack trace of this error execute with --v=5 or higher         
</pre>

In this case the rejoining server is most likely still registered as a member of the etcd cluster. We first need to remove it from the etcd cluster, then the join works.

To fix this we do:

* Find a working etcd pod
* Find the etcd members / member list
* Remove the etcd member that we want to re-join the cluster

<pre>
# Find the etcd pods
kubectl -n kube-system get pods -l component=etcd,tier=control-plane

# Get the list of etcd servers with the member id 
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# Remove the member
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
</pre>

Sample session:

<pre>
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
NAME            READY   STATUS    RESTARTS     AGE
etcd-server63   1/1     Running   0            3m11s
etcd-server65   1/1     Running   3            7d2h
etcd-server83   1/1     Running   8 (6d ago)   7d2h
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
</pre>

SEE ALSO

* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster

h3. Node labels (adding, showing, removing)

Listing the labels:

<pre>
kubectl get nodes --show-labels
</pre>

Adding labels:

<pre>
kubectl label nodes LIST-OF-NODES label1=value1
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype=router
</pre>

Selecting nodes in pods:

<pre>
apiVersion: v1
kind: Pod
...
spec:
  nodeSelector:
    hosttype: router
</pre>

Removing labels by adding a minus at the end of the label name:

<pre>
kubectl label node <nodename> <labelname>-
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype-
</pre>

SEE ALSO

* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
* https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api

h3. Hardware Maintenance using ungleich-hardware

Use the following manifest and replace the HOST with the actual host:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-HOST
spec:
  containers:
  - name: ungleich-hardware
    image: ungleich/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "HOST"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>

Also see: [[The_ungleich_hardware_maintenance_guide]]

h3. Triggering a cronjob / creating a job from a cronjob

To test a cronjob, we can create a job from a cronjob:

<pre>
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
</pre>

This creates a job volume2-manual based on the cronjob volume2-daily-backup.

h3. su-ing into a user that has nologin shell set

Often users have nologin set as their shell inside the container. To be able to execute maintenance commands within the
container, we can use @su -s /bin/sh@ like this:

<pre>
su -s /bin/sh -c '/path/to/your/script' testuser
</pre>

Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell

h3. How to print a secret value

Assuming you want the "password" item from a secret, use:

<pre>
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h2. Calico CNI

h3. Calico Installation

* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* This has the following advantages:
** Easy to upgrade
** Does not require us to configure IPv6/dual stack settings, as the tigera operator figures out things on its own

Usually plain calico can be installed directly using:

<pre>
VERSION=v3.24.1

helm repo add projectcalico https://docs.projectcalico.org/charts
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
</pre>

* Check the tags on https://github.com/projectcalico/calico/tags for the latest release

h3. Installing calicoctl

* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install

To be able to manage and configure calico, we need to
"install calicoctl (we choose to run it as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod

<pre>
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
</pre>

Or version specific:

<pre>
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml

# For 3.22
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
</pre>

And making it easier to access via an alias:

<pre>
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
</pre>

h3. Calico configuration

By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
with an upstream router to propagate podcidr and servicecidr.

Default settings in our infrastructure:

* We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
* We use private ASNs for k8s clusters
* We do *not* use any overlay

After installing calico and calicoctl the last step of the installation is usually:

<pre>
calicoctl create -f - < calico-bgp.yaml
</pre>

A sample BGP configuration:

<pre>
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65534
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:10:3::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:10:3::/108
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: router1-place10
spec:
  peerIP: 2a0a:e5c0:10:1::50
  asNumber: 213081
  keepOriginalNextHop: true
</pre>

h2. Cilium CNI (experimental)

h3. Status

*NO WORKING CILIUM CONFIGURATION FOR IPv6-ONLY MODE*

h3. Latest error

It seems cilium does not run on IPv6-only hosts:

<pre>
level=info msg="Validating configured node address ranges" subsys=daemon
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
level=info msg="Starting IP identity watcher" subsys=ipcache
</pre>

It crashes after that log entry.

h3. BGP configuration

* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
* Creating the bgp config beforehand as a configmap is thus required.

The error one gets without the configmap present:

Pods are hanging with:

<pre>
cilium-bpqm6                       0/1     Init:0/4            0             9s
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
</pre>

The error message in the cilium-operator is:

<pre>
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
</pre>

A correct bgp config looks like this:

<pre>
apiVersion: v1
kind: ConfigMap
metadata:
  name: bgp-config
  namespace: kube-system
data:
  config.yaml: |
    peers:
      - peer-address: 2a0a:e5c0::46
        peer-asn: 209898
        my-asn: 65533
      - peer-address: 2a0a:e5c0::47
        peer-asn: 209898
        my-asn: 65533
    address-pools:
      - name: default
        protocol: bgp
        addresses:
          - 2a0a:e5c0:0:14::/64
</pre>

h3. Installation

Adding the repo:

<pre>
helm repo add cilium https://helm.cilium.io/
helm repo update
</pre>

Installing + configuring cilium:

<pre>
ipv6pool=2a0a:e5c0:0:14::/112

version=1.12.2

helm upgrade --install cilium cilium/cilium --version $version \
  --namespace kube-system \
  --set ipv4.enabled=false \
  --set ipv6.enabled=true \
  --set enableIPv6Masquerade=false \
  --set bgpControlPlane.enabled=true

#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool

# Old style bgp?
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \

# Show possible configuration options
helm show values cilium/cilium
</pre>

Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:

<pre>
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
</pre>

See also https://github.com/cilium/cilium/issues/20756

A /112, however, seems to work.

h3. Kernel modules

Cilium requires the following modules to be loaded on the host (not loaded by default):

<pre>
modprobe  ip6table_raw
modprobe  ip6table_filter
</pre>

h3. Interesting helm flags

* autoDirectNodeRoutes
* bgpControlPlane.enabled = true

h3. SEE ALSO

* https://docs.cilium.io/en/v1.12/helm-reference/

h2. Multus (incomplete/experimental)

(TBD)

h2. ArgoCD

h3. ArgoCD Installation

* See https://argo-cd.readthedocs.io/en/stable/

As there is no configuration management present yet, argocd is installed using

<pre>
kubectl create namespace argocd

# Specific Version
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml

# OR: latest stable
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
</pre>

h3. Get the argocd credentials

<pre>
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h3. Accessing argocd

In regular IPv6 clusters:

* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN

In legacy IPv4 clusters:

<pre>
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
</pre>

* Navigate to https://localhost:8080

h3. Using the argocd webhook to trigger changes

* To trigger changes, POST a JSON payload to https://argocd.example.com/api/webhook (see the sketch below)

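A rough sketch of such a call, assuming a GitHub/Gitea style push payload (the header name, the exact fields and the matching rules are assumptions; the repository URL has to correspond to the Application's @repoURL@ for argocd to refresh it):

<pre>
curl -X POST https://argocd.example.com/api/webhook \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: push" \
  -d '{
        "ref": "refs/heads/master",
        "repository": {
          "html_url": "https://code.ungleich.ch/ungleich-intern/k8s-config",
          "default_branch": "master"
        }
      }'
</pre>
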
h3. Deploying an application

* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
* Always include the *redmine-url* pointing to the (customer) ticket
** Also add the support-url if it exists

Application sample

<pre>
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitea-CUSTOMER
  namespace: argocd
spec:
  destination:
    namespace: default
    server: 'https://kubernetes.default.svc'
  source:
    path: apps/prod/gitea
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
    targetRevision: HEAD
    helm:
      parameters:
        - name: storage.data.storageClass
          value: rook-ceph-block-hdd
        - name: storage.data.size
          value: 200Gi
        - name: storage.db.storageClass
          value: rook-ceph-block-ssd
        - name: storage.db.size
          value: 10Gi
        - name: storage.letsencrypt.storageClass
          value: rook-ceph-block-hdd
        - name: storage.letsencrypt.size
          value: 50Mi
        - name: letsencryptStaging
          value: 'no'
        - name: fqdn
          value: 'code.verua.online'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  info:
    - name: 'redmine-url'
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
    - name: 'support-url'
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
</pre>

h2. Helm related operations and conventions

We use helm charts extensively.

* In production, they are managed via argocd
* In development, helm charts can be developed and deployed manually using the helm utility.

h3. Installing a helm chart

One can use the usual pattern of

<pre>
helm install <releasename> <chartdirectory>
</pre>

However, when testing helm charts you often want to reinstall/update. The following pattern is "better", because it also works if the release is already installed:

<pre>
helm upgrade --install <releasename> <chartdirectory>
</pre>

h3. Naming services and deployments in helm charts [Application labels]

* We always have {{ .Release.Name }} to identify the current "instance"
* Deployments:
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ... (see the sketch below)
* See more about standard labels on
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
** https://helm.sh/docs/chart_best_practices/labels/

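A minimal deployment sketch following these conventions (purely illustrative, not an existing chart):

<pre>
# templates/deployment.yaml (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-nginx
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.23
</pre>
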
h3. Show all versions of a helm chart

<pre>
helm search repo -l repo/chart
</pre>

For example:

<pre>
% helm search repo -l projectcalico/tigera-operator 
NAME                         	CHART VERSION	APP VERSION	DESCRIPTION                            
projectcalico/tigera-operator	v3.23.3      	v3.23.3    	Installs the Tigera operator for Calico
projectcalico/tigera-operator	v3.23.2      	v3.23.2    	Installs the Tigera operator for Calico
....
</pre>

h3. Show possible values of a chart

<pre>
helm show values <repo/chart>
</pre>

Example:

<pre>
helm show values ingress-nginx/ingress-nginx
</pre>

h2. Rook + Ceph

h3. Installation

* Usually directly via argocd

Manual steps:

<pre>

</pre>

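For reference, a sketch of installing only the upstream rook operator by hand with helm (assumption: the chart from charts.rook.io; the CephCluster and related resources are managed via the argocd manifests):

<pre>
helm repo add rook-release https://charts.rook.io/release
helm repo update
helm upgrade --install --namespace rook-ceph --create-namespace rook-ceph rook-release/rook-ceph
</pre>
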
h3. Executing ceph commands

Using the ceph-tools pod as follows:

<pre>
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
</pre>

h3. Inspecting the logs of a specific server

<pre>
# Get the related pods
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare 
...

# Inspect the logs of a specific pod
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
</pre>

h3. Inspecting the logs of the rook-ceph-operator

<pre>
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
</pre>

h3. Restarting the rook operator

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

h3. Triggering server prepare / adding new osds

The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full re-scan, simply delete that pod:

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.

h3. Removing an OSD

* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
* Then delete the related deployment

Set the OSD id in the osd-purge.yaml and apply it. The OSD should be down before running the job.

<pre>
apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-purge-osd
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-purge-osd
spec:
  template:
    metadata:
      labels:
        app: rook-ceph-purge-osd
    spec:
      serviceAccountName: rook-ceph-purge-osd
      containers:
        - name: osd-removal
          image: rook/ceph:master
          # TODO: Insert the OSD ID in the last parameter that is to be removed
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
          #
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
          # removal could lead to data loss.
          args:
            - "ceph"
            - "osd"
            - "remove"
            - "--preserve-pvc"
            - "false"
            - "--force-osd-removal"
            - "false"
            - "--osd-ids"
            - "SETTHEOSDIDHERE"
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_MON_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  key: data
                  name: rook-ceph-mon-endpoints
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  key: ceph-username
                  name: rook-ceph-mon
            - name: ROOK_CEPH_SECRET
              valueFrom:
                secretKeyRef:
                  key: ceph-secret
                  name: rook-ceph-mon
            - name: ROOK_CONFIG_DIR
              value: /var/lib/rook
            - name: ROOK_CEPH_CONFIG_OVERRIDE
              value: /etc/rook/config/override.conf
            - name: ROOK_FSID
              valueFrom:
                secretKeyRef:
                  key: fsid
                  name: rook-ceph-mon
            - name: ROOK_LOG_LEVEL
              value: DEBUG
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-conf-emptydir
            - mountPath: /var/lib/rook
              name: rook-config
      volumes:
        - emptyDir: {}
          name: ceph-conf-emptydir
        - emptyDir: {}
          name: rook-config
      restartPolicy: Never
</pre>

Deleting the deployment:

<pre>
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted
</pre>

h2. Ingress + Cert Manager

* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
* We deploy "cert-manager":https://cert-manager.io/ to handle certificates
* We independently deploy a @ClusterIssuer@, so that the cert-manager app can deploy first and the issuer is created once the cert-manager CRDs are in place (see the sketch below)

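A minimal @ClusterIssuer@ sketch for orientation (issuer name and e-mail are placeholders; the real manifests live in the k8s-config repository):

<pre>
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    # placeholder e-mail, replace with the real operations contact
    email: admin@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
      - http01:
          ingress:
            class: nginx
</pre>
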
h3. IPv4 reachability

The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.

Steps:

h4. Get the ingress IPv6 address

Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@

Example:

<pre>
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
2a0a:e5c0:10:1b::ce11
</pre>

h4. Add NAT64 mapping

* Update the __dcl_jool_siit cdist type
* Record the two IPs (IPv6 and IPv4)
* Configure all routers

h4. Add DNS record

To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:

<pre>
; k8s ingress for dev
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
dev-ingress                 A 147.78.194.23
</pre>

h4. Add supporting wildcard DNS

If you plan to add various sites under a specific domain, we can add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:

<pre>
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
</pre>

h2. Harbor

* We use "Harbor":https://goharbor.io/ for caching and as an image registry. Internal app reference: apps/prod/harbor.
* The admin password is in the password store, auto generated per cluster
* At the moment harbor only authenticates against the internal ldap tree

h3. LDAP configuration

* The url needs to be ldaps://...
* uid = uid
* the rest is standard

h2. Monitoring / Prometheus

* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/

Access via the in-cluster service URLs (see the port-forward sketch below for access from outside the cluster):

* http://prometheus-k8s.monitoring.svc:9090
* http://grafana.monitoring.svc:3000
* http://alertmanager.monitoring.svc:9093

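A sketch for reaching the UIs from a workstation outside the cluster network, assuming the kube-prometheus default service names shown above:

<pre>
# Grafana on http://localhost:3000
kubectl -n monitoring port-forward svc/grafana 3000:3000

# Prometheus on http://localhost:9090
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
</pre>
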
h3. Prometheus Options

* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
** Includes dashboards and co.
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
** Includes dashboards and co.
* "Prometheus Operator (mainly CRD manifests)":https://github.com/prometheus-operator/prometheus-operator

h2. Nextcloud

h3. How to get the nextcloud credentials

* The initial username is set to "nextcloud"
* The password is autogenerated and saved in a kubernetes secret

<pre>
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo ""
</pre>

h3. How to fix "Access through untrusted domain"

* Nextcloud stores the initial domain configuration
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
* To fix, edit /var/www/html/config/config.php and correct the domain (see the sketch below)
* Then delete the pods

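A sketch of the procedure; the deployment name and label selector are assumptions based on the upstream nextcloud chart conventions and need to be adjusted to the actual release:

<pre>
# open a shell in the nextcloud container (deployment name assumed to be RELEASENAME-nextcloud)
kubectl exec -ti deploy/RELEASENAME-nextcloud -- /bin/sh

# inside the container, fix the domain in /var/www/html/config/config.php:
#   'trusted_domains' =>
#   array (
#     0 => 'nextcloud.example.com',
#   ),

# afterwards delete the pod(s) so nextcloud restarts with the corrected config
# (label selector assumed, not verified)
kubectl delete pods -l app.kubernetes.io/name=nextcloud
</pre>
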
h2. Infrastructure versions

h3. ungleich kubernetes infrastructure v5 (2021-10)

Clusters are configured / setup in this order:

* Bootstrap via kubeadm
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
** "rook for storage via argocd":https://rook.io/
** haproxy as an IPv4-to-IPv6 proxy into the IPv6-only cluster, via argocd
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot

h3. ungleich kubernetes infrastructure v4 (2021-09)

* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
* The rook operator is still being installed via helm

h3. ungleich kubernetes infrastructure v3 (2021-07)

* rook is now installed via helm via argocd instead of directly via manifests

h3. ungleich kubernetes infrastructure v2 (2021-05)

* Replaced fluxv2 from ungleich k8s v1 with argocd
** argocd can apply helm templates directly without needing to go through Chart releases
* We are also using argoflow for build flows
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building

h3. ungleich kubernetes infrastructure v1 (2021-01)

We are using the following components:

* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
** Needed for basic networking
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
** Needed so that secrets are not stored in the git repository, but only in the cluster
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
** Needed to get letsencrypt certificates for services
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
** rbd for almost everything, *ReadWriteOnce*
** cephfs for smaller things, multi access *ReadWriteMany*
** Needed for providing persistent storage
* "flux v2":https://fluxcd.io/
** Needed to manage resources automatically