h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual

{{toc}}

h2. Status

This document is **pre-production**.

This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.

h2. k8s clusters

| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
| [[p10.k8s.ooo]]    | production        |            | server63 server65 server83    | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
| [[server121.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |

h2. General architecture and components overview

* All k8s clusters are IPv6 only
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
** Private configurations are found in the **k8s-config** repository

h3. Cluster types

| **Type/Feature**            | **Development**                | **Production**         |
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
| Separation of control plane | optional                       | recommended            |
| Persistent storage          | required                       | required               |
| Number of storage monitors  | 3                              | 5                      |

h2. General k8s operations

h3. Cheat sheet / useful external references

* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/

h3. Allowing to schedule work on the control plane / removing node taints

* Mostly for single node / test / development clusters
* Just remove the master taint as follows:

<pre>
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</pre>

You can check the node taints using @kubectl describe node ...@

h3. Get the cluster admin.conf

* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
* To be able to administrate the cluster you can copy the admin.conf to your local machine
* Multi cluster debugging becomes very easy if you name the config ~/cX-admin.conf (see the example below)

<pre>
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
% export KUBECONFIG=~/c2-admin.conf
% kubectl get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
server47   Ready                      control-plane,master   82d   v1.22.0
server48   Ready                      control-plane,master   82d   v1.22.0
server49   Ready                      <none>                 82d   v1.22.0
server50   Ready                      <none>                 82d   v1.22.0
server59   Ready                      control-plane,master   82d   v1.22.0
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
server61   Ready                      <none>                 82d   v1.22.0
server62   Ready                      <none>                 82d   v1.22.0
</pre>

h3. Installing a new k8s cluster

* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
** Use pXX.k8s.ooo for production clusters of placeXX
* Use cdist to configure the nodes with requirements like crio
* Decide between single or multi node control plane setups (see below)
** A single control plane is suitable for development clusters

Typical init procedure (see the config sketch below):

* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@
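
The actual @bootstrap/XXX/kubeadm.yaml@ files presumably live in the private **k8s-config** repository. As a rough, hypothetical sketch of what such a file can contain (all values below are placeholders / assumptions, not the real production configuration):

<pre>
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.22.0
# Hypothetical API endpoint name for cluster cX
controlPlaneEndpoint: "cX-api.k8s.ooo:6443"
networking:
  # IPv6-only cluster: both CIDRs are IPv6 and announced via BGP
  podSubnet: "PODCIDR"          # e.g. an IPv6 /64
  serviceSubnet: "SERVICECIDR"  # e.g. an IPv6 /108
</pre>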

h3. Deleting a pod that is hanging in terminating state

<pre>
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
</pre>

(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)

h3. Listing nodes of a cluster

<pre>
[15:05] bridge:~% kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
server22   Ready    <none>                 52d   v1.22.0
server23   Ready    <none>                 52d   v1.22.2
server24   Ready    <none>                 52d   v1.22.0
server25   Ready    <none>                 52d   v1.22.0
server26   Ready    <none>                 52d   v1.22.0
server27   Ready    <none>                 52d   v1.22.0
server63   Ready    control-plane,master   52d   v1.22.0
server64   Ready    <none>                 52d   v1.22.0
server65   Ready    control-plane,master   52d   v1.22.0
server66   Ready    <none>                 52d   v1.22.0
server83   Ready    control-plane,master   52d   v1.22.0
server84   Ready    <none>                 52d   v1.22.0
server85   Ready    <none>                 52d   v1.22.0
server86   Ready    <none>                 52d   v1.22.0
</pre>

h3. Removing / draining a node

Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:

<pre>
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
</pre>

h3. Readding a node after draining

<pre>
kubectl uncordon serverXX
</pre>

h3. (Re-)joining worker nodes after creating the cluster

* We need to have an up-to-date token
* We use different join commands for the workers and control plane nodes

Generating the join command on an existing control plane node:

<pre>
kubeadm token create --print-join-command
</pre>
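
The printed command is then run as-is on the worker node. A sketch of what it looks like (TOKEN and HASH are placeholders; the real values come from the output above):

<pre>
kubeadm join p10-api.k8s.ooo:6443 --token TOKEN --discovery-token-ca-cert-hash sha256:HASH
</pre>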

h3. (Re-)joining control plane nodes after creating the cluster

* We generate the token again
* We upload the certificates
* We need to combine/create the join command for the control plane node

Example session:

<pre>
% kubeadm token create --print-join-command
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash

% kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
CERTKEY

# Then we use these two outputs on the joining node:

kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
</pre>

Commands to be used on a control plane node:

<pre>
kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs
</pre>

Commands to be used on the joining node:

<pre>
JOINCOMMAND --control-plane --certificate-key CERTKEY
</pre>

SEE ALSO

* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/

h3. How to fix etcd not starting when rejoining a kubernetes cluster as a control plane

If during the above step etcd does not come up, @kubeadm join@ can hang as follows:

<pre>
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
</pre>

Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster and then the join works.

To fix this we do:

* Find a working etcd pod
* Find the etcd members / member list
* Remove the etcd member that we want to re-join the cluster

<pre>
# Find the etcd pods
kubectl -n kube-system get pods -l component=etcd,tier=control-plane

# Get the list of etcd servers with the member id
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# Remove the member
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
</pre>

Sample session:

<pre>
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
NAME            READY   STATUS    RESTARTS     AGE
etcd-server63   1/1     Running   0            3m11s
etcd-server65   1/1     Running   3            7d2h
etcd-server83   1/1     Running   8 (6d ago)   7d2h
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
</pre>

SEE ALSO

* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster

h3. Node labels (adding, showing, removing)

Listing the labels:

<pre>
kubectl get nodes --show-labels
</pre>

Adding labels:

<pre>
kubectl label nodes LIST-OF-NODES label1=value1
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype=router
</pre>

Selecting nodes in pods:

<pre>
apiVersion: v1
kind: Pod
...
spec:
  nodeSelector:
    hosttype: router
</pre>

Removing labels by adding a minus at the end of the label name:

<pre>
kubectl label node <nodename> <labelname>-
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype-
</pre>

SEE ALSO

* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
* https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api

h3. Hardware Maintenance using ungleich-hardware

Use the following manifest and replace the HOST with the actual host:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-HOST
spec:
  containers:
  - name: ungleich-hardware
    image: ungleich/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "HOST"
  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>
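
Once the pod is running, a shell can be opened inside it for the actual maintenance work; a sketch (assuming the image ships a shell at /bin/sh):

<pre>
kubectl exec -ti ungleich-hardware-HOST -- /bin/sh
</pre>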

Also see: [[The_ungleich_hardware_maintenance_guide]]

h3. Triggering a cronjob / creating a job from a cronjob

To test a cronjob, we can create a job from a cronjob:

<pre>
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
</pre>

This creates a job @volume2-manual@ based on the cronjob @volume2-daily-backup@.
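
To verify that the manually created job actually ran, the usual job/log commands can be used; a sketch:

<pre>
kubectl get jobs
kubectl logs -f job/volume2-manual
</pre>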

h3. su-ing into a user that has nologin shell set

Often users have nologin set as their shell inside the container. To be able to execute maintenance commands within the
container, we can use @su -s /bin/sh@ like this:

<pre>
su -s /bin/sh -c '/path/to/your/script' testuser
</pre>

Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell

h3. How to print a secret value

Assuming you want the "password" item from a secret, use:

<pre>
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h2. Calico CNI

h3. Calico Installation

* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* This has the following advantages:
** Easy to upgrade
** Does not require the OS to configure IPv6/dual stack settings, as the tigera operator figures things out on its own

Usually plain calico can be installed directly using:

<pre>
VERSION=v3.24.1

helm repo add projectcalico https://docs.projectcalico.org/charts
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
</pre>

* Check the tags on https://github.com/projectcalico/calico/tags for the latest release

h3. Installing calicoctl

* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install

To be able to manage and configure calico, we need to
"install calicoctl (we choose the version as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod

<pre>
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
</pre>

Or version specific:

<pre>
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml

# For 3.22
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
</pre>

And making it more easily accessible via an alias:

<pre>
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
</pre>
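
With the alias in place, the usual calicoctl commands can be used; a short sketch:

<pre>
calicoctl get nodes
calicoctl get bgppeer
calicoctl get bgpconfiguration
</pre>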

h3. Calico configuration

By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
with an upstream router to propagate podcidr and servicecidr.

Default settings in our infrastructure:

* We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
* We use private ASNs for k8s clusters
* We do *not* use any overlay

After installing calico and calicoctl the last step of the installation is usually:

<pre>
calicoctl create -f - < calico-bgp.yaml
</pre>

A sample BGP configuration:

<pre>
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65534
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:10:3::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:10:3::/108
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: router1-place10
spec:
  peerIP: 2a0a:e5c0:10:1::50
  asNumber: 213081
  keepOriginalNextHop: true
</pre>

h2. Cilium CNI (experimental)

h3. Status

*NO WORKING CILIUM CONFIGURATION FOR IPv6-only MODE*

h3. Latest error

It seems cilium does not run on IPv6-only hosts:

<pre>
level=info msg="Validating configured node address ranges" subsys=daemon
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
level=info msg="Starting IP identity watcher" subsys=ipcache
</pre>

It crashes after that log entry.

h3. BGP configuration

* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
* Creating the bgp config beforehand as a configmap is thus required.

The error one gets without the configmap present:

Pods are hanging with:

<pre>
cilium-bpqm6                       0/1     Init:0/4            0             9s
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
</pre>

The error message in the cilium-operator is:

<pre>
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
</pre>

A correct bgp config looks like this:

<pre>
apiVersion: v1
kind: ConfigMap
metadata:
  name: bgp-config
  namespace: kube-system
data:
  config.yaml: |
    peers:
      - peer-address: 2a0a:e5c0::46
        peer-asn: 209898
        my-asn: 65533
      - peer-address: 2a0a:e5c0::47
        peer-asn: 209898
        my-asn: 65533
    address-pools:
      - name: default
        protocol: bgp
        addresses:
          - 2a0a:e5c0:0:14::/64
</pre>

h3. Installation

Adding the repo:

<pre>
helm repo add cilium https://helm.cilium.io/
helm repo update
</pre>

Installing + configuring cilium:

<pre>
ipv6pool=2a0a:e5c0:0:14::/112

version=1.12.2

helm upgrade --install cilium cilium/cilium --version $version \
  --namespace kube-system \
  --set ipv4.enabled=false \
  --set ipv6.enabled=true \
  --set enableIPv6Masquerade=false \
  --set bgpControlPlane.enabled=true

#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool

# Old style bgp?
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \

# Show possible configuration options
helm show values cilium/cilium
</pre>

Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:

<pre>
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
</pre>

See also https://github.com/cilium/cilium/issues/20756

A /112 seems to actually work.

h3. Kernel modules

Cilium requires the following modules to be loaded on the host (they are not loaded by default):

<pre>
modprobe ip6table_raw
modprobe ip6table_filter
</pre>
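
To have the modules loaded again after a reboot, they can be added to the host's module loading configuration. A sketch; the exact mechanism depends on the distribution (e.g. /etc/modules-load.d/ on systemd based systems, /etc/modules on Alpine):

<pre>
cat > /etc/modules-load.d/cilium.conf << EOF
ip6table_raw
ip6table_filter
EOF
</pre>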

h3. Interesting helm flags

* autoDirectNodeRoutes
* bgpControlPlane.enabled = true

h3. SEE ALSO

* https://docs.cilium.io/en/v1.12/helm-reference/

h2. Multus (incomplete/experimental)

(TBD)

h2. ArgoCD

h3. Argocd Installation

* See https://argo-cd.readthedocs.io/en/stable/

As there is no configuration management present yet, argocd is installed using:

<pre>
kubectl create namespace argocd

# Specific Version
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml

# OR: latest stable
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
</pre>

h3. Get the argocd credentials

<pre>
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h3. Accessing argocd

In regular IPv6 clusters:

* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN

In legacy IPv4 clusters:

<pre>
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
</pre>

* Navigate to https://localhost:8080

h3. Using the argocd webhook to trigger changes

* To trigger changes, post json to https://argocd.example.com/api/webhook (see the sketch below)
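
The payload has to be the regular push-event webhook payload of the git provider (gitea in our case); argocd then re-syncs the applications tracking that repository. A minimal sketch, assuming the payload has been saved to payload.json:

<pre>
curl -X POST -H "Content-Type: application/json" \
     -d @payload.json \
     https://argocd.example.com/api/webhook
</pre>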

h3. Deploying an application

* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
* Always include the *redmine-url* pointing to the (customer) ticket
** Also add the support-url if it exists

Application sample

<pre>
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitea-CUSTOMER
  namespace: argocd
spec:
  destination:
    namespace: default
    server: 'https://kubernetes.default.svc'
  source:
    path: apps/prod/gitea
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
    targetRevision: HEAD
    helm:
      parameters:
        - name: storage.data.storageClass
          value: rook-ceph-block-hdd
        - name: storage.data.size
          value: 200Gi
        - name: storage.db.storageClass
          value: rook-ceph-block-ssd
        - name: storage.db.size
          value: 10Gi
        - name: storage.letsencrypt.storageClass
          value: rook-ceph-block-hdd
        - name: storage.letsencrypt.size
          value: 50Mi
        - name: letsencryptStaging
          value: 'no'
        - name: fqdn
          value: 'code.verua.online'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  info:
    - name: 'redmine-url'
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
    - name: 'support-url'
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
</pre>

h2. Helm related operations and conventions

We use helm charts extensively.

* In production, they are managed via argocd
* In development, helm charts can be developed and deployed manually using the helm utility

h3. Installing a helm chart

One can use the usual pattern of

<pre>
helm install <releasename> <chartdirectory>
</pre>

However, when testing helm charts you often want to reinstall/update. The following pattern is "better", because it also works if the release is already installed:

<pre>
helm upgrade --install <releasename> <chartdirectory>
</pre>

h3. Naming services and deployments in helm charts [Application labels]

* We always have {{ .Release.Name }} to identify the current "instance"
* Deployments:
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ... (see the sketch below)
* See more about standard labels on
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
** https://helm.sh/docs/chart_best_practices/labels/
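
A minimal sketch of how the two conventions can combine in a chart's deployment template (names and labels below are illustrative assumptions, not taken from an actual chart):

<pre>
apiVersion: apps/v1
kind: Deployment
metadata:
  # the release name identifies the instance, app describes what it is
  name: {{ .Release.Name }}-nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: nginx
        release: {{ .Release.Name }}
    spec:
      containers:
        - name: nginx
          image: nginx:1.23
</pre>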

h3. Show all versions of a helm chart

<pre>
helm search repo -l repo/chart
</pre>

For example:

<pre>
% helm search repo -l projectcalico/tigera-operator
NAME                            CHART VERSION   APP VERSION   DESCRIPTION
projectcalico/tigera-operator   v3.23.3         v3.23.3       Installs the Tigera operator for Calico
projectcalico/tigera-operator   v3.23.2         v3.23.2       Installs the Tigera operator for Calico
....
</pre>

h3. Show possible values of a chart

<pre>
helm show values <repo/chart>
</pre>

Example:

<pre>
helm show values ingress-nginx/ingress-nginx
</pre>

h2. Rook + Ceph

h3. Installation

* Usually directly via argocd

Manual steps:

<pre>

</pre>

h3. Executing ceph commands

Using the ceph-tools pod as follows:

<pre>
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
</pre>
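
If ceph commands are needed more often, an alias in the spirit of the calicoctl alias above can save typing; a sketch:

<pre>
alias rookceph="kubectl exec -n rook-ceph -ti deploy/rook-ceph-tools -- ceph"
rookceph -s
rookceph osd tree
</pre>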

h3. Inspecting the logs of a specific server

<pre>
# Get the related pods
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
...

# Inspect the logs of a specific pod
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
</pre>

h3. Inspecting the logs of the rook-ceph-operator

<pre>
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
</pre>

h3. Restarting the rook operator

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

h3. Triggering server prepare / adding new osds

The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re scan", simply delete that pod:

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.

h3. Removing an OSD

* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
* Then delete the related deployment

Set the OSD id in the osd-purge.yaml and apply it. The OSD should be down beforehand.

<pre>
apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-purge-osd
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-purge-osd
spec:
  template:
    metadata:
      labels:
        app: rook-ceph-purge-osd
    spec:
      serviceAccountName: rook-ceph-purge-osd
      containers:
        - name: osd-removal
          image: rook/ceph:master
          # TODO: Insert the OSD ID in the last parameter that is to be removed
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
          #
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
          # removal could lead to data loss.
          args:
            - "ceph"
            - "osd"
            - "remove"
            - "--preserve-pvc"
            - "false"
            - "--force-osd-removal"
            - "false"
            - "--osd-ids"
            - "SETTHEOSDIDHERE"
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_MON_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  key: data
                  name: rook-ceph-mon-endpoints
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  key: ceph-username
                  name: rook-ceph-mon
            - name: ROOK_CEPH_SECRET
              valueFrom:
                secretKeyRef:
                  key: ceph-secret
                  name: rook-ceph-mon
            - name: ROOK_CONFIG_DIR
              value: /var/lib/rook
            - name: ROOK_CEPH_CONFIG_OVERRIDE
              value: /etc/rook/config/override.conf
            - name: ROOK_FSID
              valueFrom:
                secretKeyRef:
                  key: fsid
                  name: rook-ceph-mon
            - name: ROOK_LOG_LEVEL
              value: DEBUG
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-conf-emptydir
            - mountPath: /var/lib/rook
              name: rook-config
      volumes:
        - emptyDir: {}
          name: ceph-conf-emptydir
        - emptyDir: {}
          name: rook-config
      restartPolicy: Never
</pre>

Deleting the deployment:

<pre>
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted
</pre>

h2. Ingress + Cert Manager

* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
* We deploy "cert-manager":https://cert-manager.io/ to handle certificates
* We independently deploy @ClusterIssuer@ to allow the cert-manager app to deploy and the issuer to be created once the CRDs from cert manager are in place

h3. IPv4 reachability

The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.

Steps:

h4. Get the ingress IPv6 address

Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@

Example:

<pre>
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
2a0a:e5c0:10:1b::ce11
</pre>

h4. Add NAT64 mapping

* Update the __dcl_jool_siit cdist type
* Record the two IPs (IPv6 and IPv4)
* Configure all routers

h4. Add DNS record

To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:

<pre>
; k8s ingress for dev
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
dev-ingress                 A 147.78.194.23
</pre>

h4. Add supporting wildcard DNS

If you plan to add various sites under a specific domain, we can add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:

<pre>
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
</pre>

h2. Harbor

* We use "Harbor":https://goharbor.io/ for caching and as an image registry. Internal app reference: apps/prod/harbor.
* The admin password is in the password store, auto generated per cluster
* At the moment harbor only authenticates against the internal ldap tree

h3. LDAP configuration

* The url needs to be ldaps://...
* uid = uid
* The rest remains at the standard settings

h2. Monitoring / Prometheus

* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/

Access via the following in-cluster service URLs (see the port-forward sketch below for access from a workstation):

* http://prometheus-k8s.monitoring.svc:9090
* http://grafana.monitoring.svc:3000
* http://alertmanager.monitoring.svc:9093
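
From a workstation outside the cluster, the same services can be reached with a port-forward; a sketch (service names as listed above, namespace assumed to be @monitoring@):

<pre>
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
kubectl -n monitoring port-forward svc/grafana 3000:3000
kubectl -n monitoring port-forward svc/alertmanager 9093:9093
</pre>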

h3. Prometheus Options

* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
** Includes dashboards and co.
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
** Includes dashboards and co.
* "Prometheus Operator (mainly CRD manifests)":https://github.com/prometheus-operator/prometheus-operator

h2. Nextcloud

h3. How to get the nextcloud credentials

* The initial username is set to "nextcloud"
* The password is autogenerated and saved in a kubernetes secret

<pre>
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo ""
</pre>

h3. How to fix "Access through untrusted domain"

* Nextcloud stores the initial domain configuration
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
* To fix, edit /var/www/html/config/config.php and correct the domain
* Then delete the pods (see the sketch below)
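
A rough sketch of how these two steps can look with kubectl (the deployment name, label and the availability of an editor inside the container are assumptions depending on the helm release):

<pre>
# Edit the config inside the running pod
kubectl exec -ti deploy/RELEASENAME-nextcloud -- vi /var/www/html/config/config.php

# Recreate the pods so the corrected config is picked up
kubectl delete pods -l app.kubernetes.io/name=nextcloud
</pre>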

h2. Infrastructure versions

h3. ungleich kubernetes infrastructure v5 (2021-10)

Clusters are configured / setup in this order:

* Bootstrap via kubeadm
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
** "rook for storage via argocd":https://rook.io/
** haproxy as in-cluster IPv4-to-IPv6 proxy via argocd
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot

h3. ungleich kubernetes infrastructure v4 (2021-09)

* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
* The rook operator is still being installed via helm

h3. ungleich kubernetes infrastructure v3 (2021-07)

* rook is now installed via helm via argocd instead of directly via manifests

h3. ungleich kubernetes infrastructure v2 (2021-05)

* Replaced fluxv2 from ungleich k8s v1 with argocd
** argocd can apply helm templates directly without needing to go through Chart releases
* We are also using argoflow for build flows
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building

h3. ungleich kubernetes infrastructure v1 (2021-01)

We are using the following components:

* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
** Needed for basic networking
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
** Needed so that secrets are not stored in the git repository, but only in the cluster
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
** Needed to get letsencrypt certificates for services
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
** rbd for almost everything, *ReadWriteOnce*
** cephfs for smaller things, multi access *ReadWriteMany*
** Needed for providing persistent storage
* "flux v2":https://fluxcd.io/
** Needed to manage resources automatically