
h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual

{{toc}}

h2. Status

This document is **pre-production**.
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.

h2. k8s clusters

| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
| [[p10.k8s.ooo]]    | production        |            | server63 server65 server83    | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
| [[server121.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
| [[server122-123.k8s.ooo|server122.k8s.ooo]] | production | Nico | server122 | | | 2022-10-30 |
| [[server122-123.k8s.ooo|server123.k8s.ooo]] | production | Nico | server123 | | | 2022-10-15 |

h2. General architecture and components overview

* All k8s clusters are IPv6 only
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
** Private configurations are found in the **k8s-config** repository

h3. Cluster types

| **Type/Feature**            | **Development**                | **Production**         |
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
| Separation of control plane | optional                       | recommended            |
| Persistent storage          | required                       | required               |
| Number of storage monitors  | 3                              | 5                      |

h2. General k8s operations

h3. Cheat sheet / great external references

* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/

h3. Allowing to schedule work on the control plane / removing node taints

* Mostly for single node / test / development clusters
* Just remove the master taint as follows:

<pre>
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</pre>

You can check the node taints using @kubectl describe node ...@
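
For example (the node name is just an illustration):

<pre>
# Show the taint lines of a single node; an untainted node prints "Taints: <none>"
kubectl describe node server47 | grep -A2 Taints

# Or list the taints of all nodes at once
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
</pre>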

h3. Get the cluster admin.conf

* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
* To be able to administrate the cluster you can copy the admin.conf to your local machine
* Multi cluster debugging becomes very easy if you name the config ~/cX-admin.conf (see the example below)

<pre>
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
% export KUBECONFIG=~/c2-admin.conf
% kubectl get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
server47   Ready                      control-plane,master   82d   v1.22.0
server48   Ready                      control-plane,master   82d   v1.22.0
server49   Ready                      <none>                 82d   v1.22.0
server50   Ready                      <none>                 82d   v1.22.0
server59   Ready                      control-plane,master   82d   v1.22.0
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
server61   Ready                      <none>                 82d   v1.22.0
server62   Ready                      <none>                 82d   v1.22.0
</pre>

h3. Installing a new k8s cluster

* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
** Use pXX.k8s.ooo for production clusters of placeXX
* Use cdist to configure the nodes with requirements like crio
* Decide between single or multi node control plane setups (see below)
** A single control plane is suitable for development clusters

Typical init procedure (see the sketch below):

* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@
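
A minimal end-to-end sketch for a new single control plane cluster; the cluster name and bootstrap path below are hypothetical, the real kubeadm.yaml lives in our configuration repositories:

<pre>
# On the designated master (assumption: node already prepared via cdist, crio running)
kubeadm init --config bootstrap/c9.k8s.ooo/kubeadm.yaml

# Print a join command for the worker nodes
kubeadm token create --print-join-command

# Run the printed join command on each worker (see the (re-)joining sections below)
</pre>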

h3. Deleting a pod that is hanging in terminating state

<pre>
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
</pre>

(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)

h3. Listing nodes of a cluster

<pre>
[15:05] bridge:~% kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
server22   Ready    <none>                 52d   v1.22.0
server23   Ready    <none>                 52d   v1.22.2
server24   Ready    <none>                 52d   v1.22.0
server25   Ready    <none>                 52d   v1.22.0
server26   Ready    <none>                 52d   v1.22.0
server27   Ready    <none>                 52d   v1.22.0
server63   Ready    control-plane,master   52d   v1.22.0
server64   Ready    <none>                 52d   v1.22.0
server65   Ready    control-plane,master   52d   v1.22.0
server66   Ready    <none>                 52d   v1.22.0
server83   Ready    control-plane,master   52d   v1.22.0
server84   Ready    <none>                 52d   v1.22.0
server85   Ready    <none>                 52d   v1.22.0
server86   Ready    <none>                 52d   v1.22.0
</pre>

h3. Removing / draining a node

Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:

<pre>
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
</pre>

h3. Readding a node after draining

<pre>
kubectl uncordon serverXX
</pre>

h3. (Re-)joining worker nodes after creating the cluster

* We need to have an up-to-date token
* We use different join commands for the workers and control plane nodes

Generating the join command on an existing control plane node:

<pre>
kubeadm token create --print-join-command
</pre>

h3. (Re-)joining control plane nodes after creating the cluster

* We generate the token again
* We upload the certificates
* We need to combine/create the join command for the control plane node

Example session:

<pre>
% kubeadm token create --print-join-command
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash

% kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
CERTKEY

# Then we use these two outputs on the joining node:

kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
</pre>

Commands to be used on a control plane node:

<pre>
kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs
</pre>

Commands to be used on the joining node:

<pre>
JOINCOMMAND --control-plane --certificate-key CERTKEY
</pre>

SEE ALSO

* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/

h3. How to fix etcd does not start when rejoining a kubernetes cluster as a control plane

If during the above step etcd does not come up, @kubeadm join@ can hang as follows:

<pre>
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
</pre>

Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster and then the join works.

To fix this we do:

* Find a working etcd pod
* Find the etcd members / member list
* Remove the etcd member that we want to re-join the cluster

<pre>
# Find the etcd pods
kubectl -n kube-system get pods -l component=etcd,tier=control-plane

# Get the list of etcd servers with the member id
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# Remove the member
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
</pre>

Sample session:

<pre>
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
NAME            READY   STATUS    RESTARTS     AGE
etcd-server63   1/1     Running   0            3m11s
etcd-server65   1/1     Running   3            7d2h
etcd-server83   1/1     Running   8 (6d ago)   7d2h
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
</pre>

SEE ALSO

* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster

h3. Node labels (adding, showing, removing)

Listing the labels:

<pre>
kubectl get nodes --show-labels
</pre>

Adding labels:

<pre>
kubectl label nodes LIST-OF-NODES label1=value1
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype=router
</pre>

Selecting nodes in pods:

<pre>
apiVersion: v1
kind: Pod
...
spec:
  nodeSelector:
    hosttype: router
</pre>

Removing labels by adding a minus at the end of the label name:

<pre>
kubectl label node <nodename> <labelname>-
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype-
</pre>

SEE ALSO

* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
* https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api

h3. Hardware Maintenance using ungleich-hardware

Use the following manifest and replace the HOST with the actual host:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-HOST
spec:
  containers:
  - name: ungleich-hardware
    image: ungleich/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "HOST"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>

Also see: [[The_ungleich_hardware_maintenance_guide]]

h3. Triggering a cronjob / creating a job from a cronjob

To test a cronjob, we can create a job from a cronjob:

<pre>
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
</pre>

This creates a job @volume2-manual@ based on the cronjob @volume2-daily-backup@.

h3. su-ing into a user that has nologin shell set

Often users have @nologin@ set as their shell inside the container. To be able to execute maintenance commands within the
container, we can use @su -s /bin/sh@ like this:

<pre>
su -s /bin/sh -c '/path/to/your/script' testuser
</pre>

Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell

h3. How to print a secret value

Assuming you want the "password" item from a secret, use:

<pre>
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h2. Reference CNI

* Mainly "stupid", but effective plugins
* Main documentation on https://www.cni.dev/plugins/current/
* Plugins (a minimal bridge config sketch follows below)
** bridge
*** Can create the bridge on the host
*** But it does not seem to be able to also add host interfaces to it
*** Has support for vlan tags
** vlan
*** Creates a vlan tagged sub interface on the host
*** Reads like a 1:1 mapping (i.e. no bridge in between)
** host-device
*** Moves the interface from the host into the container
*** Very easy for physical connections to containers
** ipvlan
*** "virtualisation" of a host device
*** Routing based on IP
*** Same MAC for everyone
*** Cannot reach the master interface
** macvlan
*** With mac addresses
*** Supports various modes (to be checked)
** ptp ("point to point")
*** Creates a host device and connects it to the container
** win*
*** Windows implementations
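
Purely as an illustration, a minimal config for the bridge plugin could look like the following; the network name, bridge name and subnet are made up and not taken from our infrastructure:

<pre>
# Hypothetical example: /etc/cni/net.d/10-example-bridge.conf
{
  "cniVersion": "0.4.0",
  "name": "example-net",
  "type": "bridge",
  "bridge": "cni-example0",
  "isGateway": true,
  "ipam": {
    "type": "host-local",
    "ranges": [
      [ { "subnet": "2001:db8:42::/64" } ]
    ]
  }
}
</pre>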


h2. Calico CNI

h3. Calico Installation

* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* This has the following advantages:
** Easy to upgrade
** Does not require the OS to configure IPv6/dual-stack settings, as the tigera operator figures things out on its own

Usually plain calico can be installed directly using:

<pre>
VERSION=v3.24.1

helm repo add projectcalico https://docs.projectcalico.org/charts
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
</pre>

* Check the tags on https://github.com/projectcalico/calico/tags for the latest release

h3. Installing calicoctl

* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install

To be able to manage and configure calico, we need to
"install calicoctl (we choose the variant running as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod

<pre>
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
</pre>

Or version specific:

<pre>
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml

# For 3.22
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
</pre>

And making it more easily accessible via an alias:

<pre>
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
</pre>

h3. Calico configuration

By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
with an upstream router to propagate podcidr and servicecidr.

Default settings in our infrastructure:

* We use a full mesh using the @nodeToNodeMeshEnabled: true@ option
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ECMP)
* We use private ASNs for k8s clusters
* We do *not* use any overlay

After installing calico and calicoctl the last step of the installation is usually:

<pre>
calicoctl create -f - < calico-bgp.yaml
</pre>

A sample BGP configuration:

<pre>
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65534
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:10:3::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:10:3::/108
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: router1-place10
spec:
  peerIP: 2a0a:e5c0:10:1::50
  asNumber: 213081
  keepOriginalNextHop: true
</pre>

h2. Cilium CNI (experimental)

h3. Status

*NO WORKING CILIUM CONFIGURATION FOR IPv6-only modes*

h3. Latest error

It seems cilium does not run on IPv6-only hosts:

<pre>
level=info msg="Validating configured node address ranges" subsys=daemon
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
level=info msg="Starting IP identity watcher" subsys=ipcache
</pre>

It crashes after that log entry.

h3. BGP configuration

* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
* Creating the bgp config beforehand as a configmap is thus required.

The error one gets without the configmap present:

Pods are hanging with:

<pre>
cilium-bpqm6                       0/1     Init:0/4            0             9s
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
</pre>

The error message in the cilium-operator is:

<pre>
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
</pre>

A correct bgp config looks like this:

<pre>
apiVersion: v1
kind: ConfigMap
metadata:
  name: bgp-config
  namespace: kube-system
data:
  config.yaml: |
    peers:
      - peer-address: 2a0a:e5c0::46
        peer-asn: 209898
        my-asn: 65533
      - peer-address: 2a0a:e5c0::47
        peer-asn: 209898
        my-asn: 65533
    address-pools:
      - name: default
        protocol: bgp
        addresses:
          - 2a0a:e5c0:0:14::/64
</pre>

h3. Installation

Adding the repo:

<pre>
helm repo add cilium https://helm.cilium.io/
helm repo update
</pre>

Installing + configuring cilium:

<pre>
ipv6pool=2a0a:e5c0:0:14::/112

version=1.12.2

helm upgrade --install cilium cilium/cilium --version $version \
  --namespace kube-system \
  --set ipv4.enabled=false \
  --set ipv6.enabled=true \
  --set enableIPv6Masquerade=false \
  --set bgpControlPlane.enabled=true

#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool

# Old style bgp?
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \

# Show possible configuration options
helm show values cilium/cilium
</pre>

Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:

<pre>
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
</pre>

See also https://github.com/cilium/cilium/issues/20756

A /112 seems to actually work.

h3. Kernel modules

Cilium requires the following modules to be loaded on the host (not loaded by default):

<pre>
modprobe ip6table_raw
modprobe ip6table_filter
</pre>

h3. Interesting helm flags

* autoDirectNodeRoutes
* bgpControlPlane.enabled = true

h3. SEE ALSO

* https://docs.cilium.io/en/v1.12/helm-reference/

h2. Multus (incomplete/experimental)

(TBD)

h2. ArgoCD

h3. Argocd Installation

* See https://argo-cd.readthedocs.io/en/stable/

As there is no configuration management present yet, argocd is installed using:

<pre>
kubectl create namespace argocd

# Specific Version
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml

# OR: latest stable
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
</pre>

h3. Get the argocd credentials

<pre>
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h3. Accessing argocd

In regular IPv6 clusters:

* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN

In legacy IPv4 clusters:

<pre>
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
</pre>

* Navigate to https://localhost:8080

h3. Using the argocd webhook to trigger changes

* To trigger changes, POST JSON to https://argocd.example.com/api/webhook (see the sketch below)
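
A sketch of a manual trigger; in practice the POST is usually sent by the git server (e.g. gitea) with a proper push event payload, the empty body below is only a placeholder:

<pre>
# Hypothetical manual call; replace the host with the real argocd server
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{}' \
  https://argocd.example.com/api/webhook
</pre>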

h3. Deploying an application

* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
* Always include the *redmine-url* pointing to the (customer) ticket
** Also add the support-url if it exists

Application sample:

<pre>
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitea-CUSTOMER
  namespace: argocd
spec:
  destination:
    namespace: default
    server: 'https://kubernetes.default.svc'
  source:
    path: apps/prod/gitea
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
    targetRevision: HEAD
    helm:
      parameters:
        - name: storage.data.storageClass
          value: rook-ceph-block-hdd
        - name: storage.data.size
          value: 200Gi
        - name: storage.db.storageClass
          value: rook-ceph-block-ssd
        - name: storage.db.size
          value: 10Gi
        - name: storage.letsencrypt.storageClass
          value: rook-ceph-block-hdd
        - name: storage.letsencrypt.size
          value: 50Mi
        - name: letsencryptStaging
          value: 'no'
        - name: fqdn
          value: 'code.verua.online'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  info:
    - name: 'redmine-url'
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
    - name: 'support-url'
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
</pre>

h2. Helm related operations and conventions

We use helm charts extensively.

* In production, they are managed via argocd
* In development, helm charts can be developed and deployed manually using the helm utility.

h3. Installing a helm chart

One can use the usual pattern of

<pre>
helm install <releasename> <chartdirectory>
</pre>

However, often you want to reinstall/update when testing a helm chart. The following pattern is "better", because it allows you to reinstall even if the chart is already installed:

<pre>
helm upgrade --install <releasename> <chartdirectory>
</pre>

h3. Naming services and deployments in helm charts [Application labels]

* We always have {{ .Release.Name }} to identify the current "instance"
* Deployments:
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ... (see the sketch below)
* See more about standard labels on
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
** https://helm.sh/docs/chart_best_practices/labels/
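
A minimal sketch of how these conventions can look inside a chart template; the deployment name and image are made-up examples, not an existing ungleich chart:

<pre>
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: nginx
        release: {{ .Release.Name }}
    spec:
      containers:
        - name: nginx
          image: nginx:1.23
</pre>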

h3. Show all versions of a helm chart

<pre>
helm search repo -l repo/chart
</pre>

For example:

<pre>
% helm search repo -l projectcalico/tigera-operator
NAME                         	CHART VERSION	APP VERSION	DESCRIPTION
projectcalico/tigera-operator	v3.23.3      	v3.23.3    	Installs the Tigera operator for Calico
projectcalico/tigera-operator	v3.23.2      	v3.23.2    	Installs the Tigera operator for Calico
....
</pre>

h3. Show possible values of a chart

<pre>
helm show values <repo/chart>
</pre>

Example:

<pre>
helm show values ingress-nginx/ingress-nginx
</pre>


h2. Rook + Ceph

h3. Installation

* Usually directly via argocd

Manual steps:

<pre>

</pre>

h3. Executing ceph commands

Using the ceph-tools pod as follows:

<pre>
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
</pre>

h3. Inspecting the logs of a specific server

<pre>
# Get the related pods
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
...

# Inspect the logs of a specific pod
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
</pre>

h3. Inspecting the logs of the rook-ceph-operator

<pre>
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
</pre>

h3. Restarting the rook operator

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

h3. Triggering server prepare / adding new osds

The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re-scan", simply delete that pod:

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.

h3. Removing an OSD

* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
* Then delete the related deployment

Set the OSD id in the osd-purge.yaml and apply it. The OSD should be down before.

<pre>
apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-purge-osd
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-purge-osd
spec:
  template:
    metadata:
      labels:
        app: rook-ceph-purge-osd
    spec:
      serviceAccountName: rook-ceph-purge-osd
      containers:
        - name: osd-removal
          image: rook/ceph:master
          # TODO: Insert the OSD ID in the last parameter that is to be removed
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
          #
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
          # removal could lead to data loss.
          args:
            - "ceph"
            - "osd"
            - "remove"
            - "--preserve-pvc"
            - "false"
            - "--force-osd-removal"
            - "false"
            - "--osd-ids"
            - "SETTHEOSDIDHERE"
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_MON_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  key: data
                  name: rook-ceph-mon-endpoints
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  key: ceph-username
                  name: rook-ceph-mon
            - name: ROOK_CEPH_SECRET
              valueFrom:
                secretKeyRef:
                  key: ceph-secret
                  name: rook-ceph-mon
            - name: ROOK_CONFIG_DIR
              value: /var/lib/rook
            - name: ROOK_CEPH_CONFIG_OVERRIDE
              value: /etc/rook/config/override.conf
            - name: ROOK_FSID
              valueFrom:
                secretKeyRef:
                  key: fsid
                  name: rook-ceph-mon
            - name: ROOK_LOG_LEVEL
              value: DEBUG
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-conf-emptydir
            - mountPath: /var/lib/rook
              name: rook-config
      volumes:
        - emptyDir: {}
          name: ceph-conf-emptydir
        - emptyDir: {}
          name: rook-config
      restartPolicy: Never
</pre>

Deleting the deployment:

<pre>
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted
</pre>

h2. Ingress + Cert Manager

* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
* We deploy "cert-manager":https://cert-manager.io/ to handle certificates
* We deploy the @ClusterIssuer@ independently, so that the cert-manager app can come up first and the issuer is created once the CRDs from cert-manager are in place (see the sketch below)
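
A minimal @ClusterIssuer@ sketch for reference; the issuer name and the e-mail address are placeholders, the real manifest is kept in the k8s-config repository:

<pre>
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
      - http01:
          ingress:
            class: nginx
</pre>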

h3. IPv4 reachability

The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.

Steps:

h4. Get the ingress IPv6 address

Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@

Example:

<pre>
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
2a0a:e5c0:10:1b::ce11
</pre>

h4. Add NAT64 mapping

* Update the __dcl_jool_siit cdist type
* Record the two IPs (IPv6 and IPv4)
* Configure all routers (see the conceptual sketch below)
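
The mapping itself is handled by the cdist type above; purely as a conceptual illustration, an equivalent manual EAM entry on a router running Jool SIIT might look roughly like this (addresses taken from the DNS example below, exact invocation depends on the local Jool setup):

<pre>
# Conceptual only - normally done via the __dcl_jool_siit cdist type
jool_siit eamt add 2a0a:e5c0:10:1b::ce11 147.78.194.23
</pre>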


h4. Add DNS record

To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:

<pre>
; k8s ingress for dev
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
dev-ingress                 A 147.78.194.23
</pre>

h4. Add supporting wildcard DNS

If you plan to add various sites under a specific domain, add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:

<pre>
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
</pre>

h2. Harbor

* We use "Harbor":https://goharbor.io/ for caching and as an image registry. Internal app reference: apps/prod/harbor.
* The admin password is in the password store, auto generated per cluster
* At the moment harbor only authenticates against the internal ldap tree

h3. LDAP configuration

* The url needs to be ldaps://...
* uid = uid
* The rest of the settings are standard

h2. Monitoring / Prometheus

* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/

Access via the in-cluster service URLs (see the port-forward sketch below for access from a workstation):

* http://prometheus-k8s.monitoring.svc:9090
* http://grafana.monitoring.svc:3000
* http://alertmanager.monitoring.svc:9093
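
A sketch for reaching these from a workstation, assuming the service names match the URLs above and the namespace is @monitoring@:

<pre>
# Forward grafana to the local machine
kubectl -n monitoring port-forward svc/grafana 3000:3000
# then open http://localhost:3000; the same pattern works for prometheus and alertmanager
</pre>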


h3. Prometheus Options

* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
** Includes dashboards and co.
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
** Includes dashboards and co.
* "Prometheus Operator (mainly CRD manifests)":https://github.com/prometheus-operator/prometheus-operator


h2. Nextcloud

h3. How to get the nextcloud credentials

* The initial username is set to "nextcloud"
* The password is autogenerated and saved in a kubernetes secret

<pre>
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo ""
</pre>

h3. How to fix "Access through untrusted domain"

* Nextcloud stores the initial domain configuration
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
* To fix, edit /var/www/html/config/config.php and correct the domain
* Then delete the pods (see the sketch below)
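
A sketch of the last two steps; the pod label is an assumption based on common nextcloud helm chart conventions, so verify it with @kubectl get pods --show-labels@ first:

<pre>
# Fix the trusted domain in the running container (deployment name follows the secret name above,
# the domains are placeholders)
kubectl exec -ti deploy/RELEASENAME-nextcloud -- sed -i 's/old.example.com/new.example.com/' /var/www/html/config/config.php

# Recreate the pods so the change is picked up cleanly
kubectl delete pods -l app.kubernetes.io/name=nextcloud
</pre>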

h2. Infrastructure versions

h3. ungleich kubernetes infrastructure v5 (2021-10)

Clusters are configured / set up in this order:

* Bootstrap via kubeadm
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
** "rook for storage via argocd":https://rook.io/
** haproxy as an in-cluster IPv4-to-IPv6 proxy for the IPv6-only clusters, via argocd
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot


h3. ungleich kubernetes infrastructure v4 (2021-09)

* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
* The rook operator is still being installed via helm

h3. ungleich kubernetes infrastructure v3 (2021-07)

* rook is now installed via helm via argocd instead of directly via manifests

h3. ungleich kubernetes infrastructure v2 (2021-05)

* Replaced fluxv2 from ungleich k8s v1 with argocd
** argocd can apply helm templates directly without needing to go through Chart releases
* We are also using argoflow for build flows
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building

h3. ungleich kubernetes infrastructure v1 (2021-01)

We are using the following components:

* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
** Needed for basic networking
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
** Needed so that secrets are not stored in the git repository, but only in the cluster
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
** Needed to get letsencrypt certificates for services
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
** rbd for almost everything, *ReadWriteOnce*
** cephfs for smaller things, multi access *ReadWriteMany*
** Needed for providing persistent storage
* "flux v2":https://fluxcd.io/
** Needed to manage resources automatically