h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual

{{toc}}

h2. Status

This document is **pre-production**.
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.

h2. k8s clusters

| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
| [[p10.k8s.ooo]]    | production        |            | server63 server65 server83    | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
| [[server121.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
| [[server122-123.k8s.ooo|server122.k8s.ooo]] | production | Nico | server122 | | | 2022-10-30 |
| [[server122-123.k8s.ooo|server123.k8s.ooo]] | production | Nico | server123 | | | 2022-10-15 |

h2. General architecture and components overview

* All k8s clusters are IPv6 only
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
** Private configurations are found in the **k8s-config** repository

h3. Cluster types

| **Type/Feature**            | **Development**                | **Production**         |
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
| Separation of control plane | optional                       | recommended            |
| Persistent storage          | required                       | required               |
| Number of storage monitors  | 3                              | 5                      |

h2. General k8s operations

h3. Cheat sheet / external great references

* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/

h3. Allowing to schedule work on the control plane / removing node taints

* Mostly for single node / test / development clusters
* Just remove the master taint as follows

<pre>
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</pre>

You can check the node taints using @kubectl describe node ...@

h3. Get the cluster admin.conf

* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
* To be able to administrate the cluster you can copy the admin.conf to your local machine
* Multi cluster debugging becomes very easy if you name the config ~/cX-admin.conf (see example below)

<pre>
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
% export KUBECONFIG=~/c2-admin.conf
% kubectl get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
server47   Ready                      control-plane,master   82d   v1.22.0
server48   Ready                      control-plane,master   82d   v1.22.0
server49   Ready                      <none>                 82d   v1.22.0
server50   Ready                      <none>                 82d   v1.22.0
server59   Ready                      control-plane,master   82d   v1.22.0
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
server61   Ready                      <none>                 82d   v1.22.0
server62   Ready                      <none>                 82d   v1.22.0
</pre>

h3. Installing a new k8s cluster

* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
** Using pXX.k8s.ooo for production clusters of placeXX
* Use cdist to configure the nodes with requirements like crio
* Decide between single or multi node control plane setups (see below)
** A single control plane is suitable for development clusters

Typical init procedure:

* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@

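A minimal sketch of what such a @bootstrap/XXX/kubeadm.yaml@ could look like; the cluster name, the control plane endpoint and the CIDRs below are placeholders, the real values live in the **k8s-config** repository:

<pre>
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
clusterName: cX.k8s.ooo
# Shared endpoint for HA control planes (e.g. a DNS entry pointing to all masters)
controlPlaneEndpoint: "cX-api.k8s.ooo:6443"
networking:
  # IPv6 only pod and service networks, announced later via calico/BGP
  podSubnet: 2a0a:e5c0:XXXX:100::/56
  serviceSubnet: 2a0a:e5c0:XXXX:3::/108
</pre>
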
h3. Deleting a pod that is hanging in terminating state

<pre>
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
</pre>

(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)

h3. Listing nodes of a cluster

<pre>
[15:05] bridge:~% kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
server22   Ready    <none>                 52d   v1.22.0
server23   Ready    <none>                 52d   v1.22.2
server24   Ready    <none>                 52d   v1.22.0
server25   Ready    <none>                 52d   v1.22.0
server26   Ready    <none>                 52d   v1.22.0
server27   Ready    <none>                 52d   v1.22.0
server63   Ready    control-plane,master   52d   v1.22.0
server64   Ready    <none>                 52d   v1.22.0
server65   Ready    control-plane,master   52d   v1.22.0
server66   Ready    <none>                 52d   v1.22.0
server83   Ready    control-plane,master   52d   v1.22.0
server84   Ready    <none>                 52d   v1.22.0
server85   Ready    <none>                 52d   v1.22.0
server86   Ready    <none>                 52d   v1.22.0
</pre>

h3. Removing / draining a node

Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:

<pre>
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
</pre>

h3. Readding a node after draining

<pre>
kubectl uncordon serverXX
</pre>

h3. (Re-)joining worker nodes after creating the cluster

* We need to have an up-to-date token
* We use different join commands for the workers and control plane nodes

Generating the join command on an existing control plane node:

<pre>
kubeadm token create --print-join-command
</pre>

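The printed command is then run as-is on the worker that should (re-)join; the endpoint, token and hash below are only placeholders for the real output:

<pre>
# On the joining worker node, paste the output of the command above, e.g.:
kubeadm join p10-api.k8s.ooo:6443 --token TOKEN --discovery-token-ca-cert-hash sha256:HASH
</pre>
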
h3. (Re-)joining control plane nodes after creating the cluster

* We generate the token again
* We upload the certificates
* We need to combine/create the join command for the control plane node

Example session:

<pre>
% kubeadm token create --print-join-command
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash

% kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
CERTKEY

# Then we use these two outputs on the joining node:

kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
</pre>

Commands to be used on a control plane node:

<pre>
kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs
</pre>

Commands to be used on the joining node:

<pre>
JOINCOMMAND --control-plane --certificate-key CERTKEY
</pre>

SEE ALSO

* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/

h3. How to fix etcd does not start when rejoining a kubernetes cluster as a control plane

If during the above step etcd does not come up, @kubeadm join@ can hang as follows:

<pre>
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
</pre>

Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster and then the join works.

To fix this we do:

* Find a working etcd pod
* Find the etcd members / member list
* Remove the etcd member that we want to re-join the cluster

<pre>
# Find the etcd pods
kubectl -n kube-system get pods -l component=etcd,tier=control-plane

# Get the list of etcd servers with the member id
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# Remove the member
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
</pre>

Sample session:

<pre>
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
NAME            READY   STATUS    RESTARTS     AGE
etcd-server63   1/1     Running   0            3m11s
etcd-server65   1/1     Running   3            7d2h
etcd-server83   1/1     Running   8 (6d ago)   7d2h
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
</pre>

SEE ALSO

* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster

h3. Node labels (adding, showing, removing)

Listing the labels:

<pre>
kubectl get nodes --show-labels
</pre>

Adding labels:

<pre>
kubectl label nodes LIST-OF-NODES label1=value1
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype=router
</pre>

Selecting nodes in pods:

<pre>
apiVersion: v1
kind: Pod
...
spec:
  nodeSelector:
    hosttype: router
</pre>

Labels are removed by adding a minus at the end of the label name:

<pre>
kubectl label node <nodename> <labelname>-
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype-
</pre>

SEE ALSO

* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
* https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api

h3. Hardware Maintenance using ungleich-hardware

Use the following manifest and replace the HOST with the actual host:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-HOST
spec:
  containers:
  - name: ungleich-hardware
    image: ungleich/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "HOST"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>

Also see: [[The_ungleich_hardware_maintenance_guide]]

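A sketch of how to use it, assuming the manifest above was saved locally and that the image ships a shell at @/bin/sh@:

<pre>
# Apply the manifest (HOST already replaced)
kubectl apply -f ungleich-hardware.yaml

# Enter the pod for the maintenance work
kubectl exec -ti ungleich-hardware-HOST -- /bin/sh

# Remove the pod again when done
kubectl delete pod ungleich-hardware-HOST
</pre>
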
h3. Triggering a cronjob / creating a job from a cronjob

To test a cronjob, we can create a job from a cronjob:

<pre>
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
</pre>

This creates a job @volume2-manual@ based on the cronjob @volume2-daily-backup@.

h3. su-ing into a user that has nologin shell set

Often users have nologin set as their shell inside the container. To be able to execute maintenance commands within the
container, we can use @su -s /bin/sh@ like this:

<pre>
su -s /bin/sh -c '/path/to/your/script' testuser
</pre>

Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell

h3. How to print a secret value

Assuming you want the "password" item from a secret, use:

<pre>
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h2. Reference CNI

* Mainly "stupid", but effective plugins
* Main documentation on https://www.cni.dev/plugins/current/
* Plugins
** bridge
*** Can create the bridge on the host
*** But seems not to be able to add host interfaces to it as well
*** Has support for vlan tags
** vlan
*** creates vlan tagged sub interface on the host
*** "It's a 1:1 mapping (i.e. no bridge in between)":https://github.com/k8snetworkplumbingwg/multus-cni/issues/569
** host-device
*** moves the interface from the host into the container
*** very easy for physical connections to containers
** ipvlan
*** "virtualisation" of a host device
*** routing based on IP
*** Same MAC for everyone
*** Cannot reach the master interface
** macvlan
*** With mac addresses
*** Supports various modes (to be checked)
** ptp ("point to point")
*** Creates a host device and connects it to the container
** win*
*** Windows implementations

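These plugins are configured with plain CNI JSON. A minimal sketch for the host-device plugin; the network name and the interface are placeholders:

<pre>
{
  "cniVersion": "0.4.0",
  "name": "uplink",
  "type": "host-device",
  "device": "eth1"
}
</pre>
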
h2. Calico CNI

h3. Calico Installation

* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* This has the following advantages:
** Easy to upgrade
** Does not require us to configure IPv6/dual stack settings, as the tigera operator figures things out on its own

Usually plain calico can be installed directly using:

<pre>
VERSION=v3.24.1

helm repo add projectcalico https://docs.projectcalico.org/charts
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
</pre>

* Check the tags on https://github.com/projectcalico/calico/tags for the latest release

h3. Installing calicoctl

* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install

To be able to manage and configure calico, we need to
"install calicoctl (we chose to install it as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod

<pre>
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
</pre>

Or version specific:

<pre>
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml

# For 3.22
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
</pre>

And making it easier to access via an alias:

<pre>
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
</pre>

h3. Calico configuration

By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
with an upstream router to propagate podcidr and servicecidr.

Default settings in our infrastructure:

* We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
* We use private ASNs for k8s clusters
* We do *not* use any overlay

After installing calico and calicoctl the last step of the installation is usually:

<pre>
calicoctl create -f - < calico-bgp.yaml
</pre>

A sample BGP configuration:

<pre>
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65534
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:10:3::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:10:3::/108
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: router1-place10
spec:
  peerIP: 2a0a:e5c0:10:1::50
  asNumber: 213081
  keepOriginalNextHop: true
</pre>

h2. Cilium CNI (experimental)

h3. Status

*NO WORKING CILIUM CONFIGURATION FOR IPV6-ONLY MODE*

h3. Latest error

It seems cilium does not run on IPv6 only hosts:

<pre>
level=info msg="Validating configured node address ranges" subsys=daemon
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
level=info msg="Starting IP identity watcher" subsys=ipcache
</pre>

It crashes after that log entry.

h3. BGP configuration

* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
* Creating the bgp config beforehand as a configmap is thus required.

The error one gets without the configmap present:

Pods are hanging with:

<pre>
cilium-bpqm6                       0/1     Init:0/4            0             9s
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
</pre>

The error message in the cilium-operator is:

<pre>
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
</pre>

A correct bgp config looks like this:

<pre>
apiVersion: v1
kind: ConfigMap
metadata:
  name: bgp-config
  namespace: kube-system
data:
  config.yaml: |
    peers:
      - peer-address: 2a0a:e5c0::46
        peer-asn: 209898
        my-asn: 65533
      - peer-address: 2a0a:e5c0::47
        peer-asn: 209898
        my-asn: 65533
    address-pools:
      - name: default
        protocol: bgp
        addresses:
          - 2a0a:e5c0:0:14::/64
</pre>

h3. Installation

Adding the repo:

<pre>
helm repo add cilium https://helm.cilium.io/
helm repo update
</pre>

Installing + configuring cilium:

<pre>
ipv6pool=2a0a:e5c0:0:14::/112

version=1.12.2

helm upgrade --install cilium cilium/cilium --version $version \
  --namespace kube-system \
  --set ipv4.enabled=false \
  --set ipv6.enabled=true \
  --set enableIPv6Masquerade=false \
  --set bgpControlPlane.enabled=true

#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool

# Old style bgp?
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \

# Show possible configuration options
helm show values cilium/cilium
</pre>

Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:

<pre>
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
</pre>

See also https://github.com/cilium/cilium/issues/20756

A /112, however, seems to work.

h3. Kernel modules

Cilium requires the following modules to be loaded on the host (not loaded by default):

<pre>
modprobe ip6table_raw
modprobe ip6table_filter
</pre>

h3. Interesting helm flags

* autoDirectNodeRoutes
* bgpControlPlane.enabled = true

h3. SEE ALSO

* https://docs.cilium.io/en/v1.12/helm-reference/

h2. Multus (incomplete/experimental)

(TBD)

h2. ArgoCD

h3. Argocd Installation

* See https://argo-cd.readthedocs.io/en/stable/

As there is no configuration management present yet, argocd is installed using:

<pre>
kubectl create namespace argocd

# Specific Version
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml

# OR: latest stable
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
</pre>

h3. Get the argocd credentials

<pre>
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h3. Accessing argocd

In regular IPv6 clusters:

* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN

In legacy IPv4 clusters:

<pre>
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
</pre>

* Navigate to https://localhost:8080

h3. Using the argocd webhook to trigger changes

* To trigger changes, post JSON to https://argocd.example.com/api/webhook (see the example below)

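A sketch of such a trigger with curl, assuming a GitHub-compatible push payload (which gitea can also send); the repository URL and branch are placeholders and need to match the application's source:

<pre>
curl -X POST https://argocd.example.com/api/webhook \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: push" \
  -d '{ "ref": "refs/heads/master",
        "repository": { "html_url": "https://code.ungleich.ch/ungleich-intern/k8s-config",
                        "default_branch": "master" } }'
</pre>
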
h3. Deploying an application

* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
* Always include the *redmine-url* pointing to the (customer) ticket
** Also add the support-url if it exists

Application sample

<pre>
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitea-CUSTOMER
  namespace: argocd
spec:
  destination:
    namespace: default
    server: 'https://kubernetes.default.svc'
  source:
    path: apps/prod/gitea
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
    targetRevision: HEAD
    helm:
      parameters:
        - name: storage.data.storageClass
          value: rook-ceph-block-hdd
        - name: storage.data.size
          value: 200Gi
        - name: storage.db.storageClass
          value: rook-ceph-block-ssd
        - name: storage.db.size
          value: 10Gi
        - name: storage.letsencrypt.storageClass
          value: rook-ceph-block-hdd
        - name: storage.letsencrypt.size
          value: 50Mi
        - name: letsencryptStaging
          value: 'no'
        - name: fqdn
          value: 'code.verua.online'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  info:
    - name: 'redmine-url'
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
    - name: 'support-url'
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
</pre>

h2. Helm related operations and conventions

We use helm charts extensively.

* In production, they are managed via argocd
* In development, helm charts can be developed and deployed manually using the helm utility.

h3. Installing a helm chart

One can use the usual pattern of

<pre>
helm install <releasename> <chartdirectory>
</pre>

However, when testing helm charts you often want to reinstall/update. The following pattern is "better", because it also works if the chart is already installed:

<pre>
helm upgrade --install <releasename> <chartdirectory>
</pre>

h3. Naming services and deployments in helm charts [Application labels]

* We always have {{ .Release.Name }} to identify the current "instance"
* Deployments:
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ...
* See more about standard labels on
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
** https://helm.sh/docs/chart_best_practices/labels/

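A minimal sketch of how this looks in a chart template; the chart, image and service names are only illustrative:

<pre>
apiVersion: apps/v1
kind: Deployment
metadata:
  # {{ .Release.Name }} identifies the instance, app identifies what it is
  name: {{ .Release.Name }}-nginx
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.23
</pre>
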
h3. Show all versions of a helm chart

<pre>
helm search repo -l repo/chart
</pre>

For example:

<pre>
% helm search repo -l projectcalico/tigera-operator 
NAME                         	CHART VERSION	APP VERSION	DESCRIPTION                            
projectcalico/tigera-operator	v3.23.3      	v3.23.3    	Installs the Tigera operator for Calico
projectcalico/tigera-operator	v3.23.2      	v3.23.2    	Installs the Tigera operator for Calico
....
</pre>

h3. Show possible values of a chart

<pre>
helm show values <repo/chart>
</pre>

Example:

<pre>
helm show values ingress-nginx/ingress-nginx
</pre>

h2. Rook + Ceph

h3. Installation

* Usually directly via argocd

Manual steps:

<pre>

</pre>

h3. Executing ceph commands

Using the ceph-tools pod as follows:

<pre>
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
</pre>

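Similar to the calicoctl alias above, this can be shortened; a sketch, assuming the standard rook toolbox deployment name @rook-ceph-tools@:

<pre>
alias rookceph="kubectl exec -n rook-ceph -ti deploy/rook-ceph-tools -- ceph"

# Usage examples
rookceph -s
rookceph osd df
</pre>
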
h3. Inspecting the logs of a specific server

<pre>
# Get the related pods
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
...

# Inspect the logs of a specific pod
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
</pre>

h3. Inspecting the logs of the rook-ceph-operator

<pre>
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
</pre>

h3. Restarting the rook operator

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

h3. Triggering server prepare / adding new osds

The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re-scan", simply delete that pod:

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.

h3. Removing an OSD

* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
* Then delete the related deployment

Set the OSD id in the osd-purge.yaml and apply it. The OSD should already be down beforehand.

<pre>
apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-purge-osd
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-purge-osd
spec:
  template:
    metadata:
      labels:
        app: rook-ceph-purge-osd
    spec:
      serviceAccountName: rook-ceph-purge-osd
      containers:
        - name: osd-removal
          image: rook/ceph:master
          # TODO: Insert the OSD ID in the last parameter that is to be removed
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
          #
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
          # removal could lead to data loss.
          args:
            - "ceph"
            - "osd"
            - "remove"
            - "--preserve-pvc"
            - "false"
            - "--force-osd-removal"
            - "false"
            - "--osd-ids"
            - "SETTHEOSDIDHERE"
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_MON_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  key: data
                  name: rook-ceph-mon-endpoints
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  key: ceph-username
                  name: rook-ceph-mon
            - name: ROOK_CEPH_SECRET
              valueFrom:
                secretKeyRef:
                  key: ceph-secret
                  name: rook-ceph-mon
            - name: ROOK_CONFIG_DIR
              value: /var/lib/rook
            - name: ROOK_CEPH_CONFIG_OVERRIDE
              value: /etc/rook/config/override.conf
            - name: ROOK_FSID
              valueFrom:
                secretKeyRef:
                  key: fsid
                  name: rook-ceph-mon
            - name: ROOK_LOG_LEVEL
              value: DEBUG
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-conf-emptydir
            - mountPath: /var/lib/rook
              name: rook-config
      volumes:
        - emptyDir: {}
          name: ceph-conf-emptydir
        - emptyDir: {}
          name: rook-config
      restartPolicy: Never
</pre>

Deleting the deployment:

<pre>
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted
</pre>

h2. Ingress + Cert Manager

* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
* We deploy "cert-manager":https://cert-manager.io/ to handle certificates
* We independently deploy @ClusterIssuer@ to allow the cert-manager app to deploy and the issuer to be created once the CRDs from cert manager are in place (see the sketch below)

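A minimal sketch of such a @ClusterIssuer@, assuming an HTTP-01 solver through the nginx ingress class; the name and the email address are placeholders:

<pre>
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      # Secret used to store the ACME account key
      name: letsencrypt-production
    solvers:
      - http01:
          ingress:
            class: nginx
</pre>
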
h3. IPv4 reachability

The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.

Steps:

h4. Get the ingress IPv6 address

Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@

Example:

<pre>
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
2a0a:e5c0:10:1b::ce11
</pre>

h4. Add NAT64 mapping

* Update the __dcl_jool_siit cdist type
* Record the two IPs (IPv6 and IPv4)
* Configure all routers

h4. Add DNS record

To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:

<pre>
; k8s ingress for dev
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
dev-ingress                 A 147.78.194.23
</pre>

h4. Add supporting wildcard DNS

If you plan to add various sites under a specific domain, we can add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:

<pre>
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
</pre>

h2. Harbor

* We use "Harbor":https://goharbor.io/ for caching and as an image registry. Internal app reference: apps/prod/harbor.
* The admin password is in the password store, auto generated per cluster
* At the moment harbor only authenticates against the internal ldap tree

h3. LDAP configuration

* The url needs to be ldaps://...
* uid = uid
* the rest is standard

h2. Monitoring / Prometheus

* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/

Access via the following cluster internal URLs:

* http://prometheus-k8s.monitoring.svc:9090
* http://grafana.monitoring.svc:3000
* http://alertmanager.monitoring.svc:9093

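As these URLs only resolve inside the cluster, a quick way to reach them from a workstation is port-forwarding; a sketch, assuming the kube-prometheus default service names in the @monitoring@ namespace:

<pre>
# Grafana on http://localhost:3000
kubectl -n monitoring port-forward svc/grafana 3000:3000

# Prometheus on http://localhost:9090
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
</pre>
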
h3. Prometheus Options

* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
** Includes dashboards and co.
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
** Includes dashboards and co.
* "Prometheus Operator (mainly CRD manifests)":https://github.com/prometheus-operator/prometheus-operator

h2. Nextcloud

h3. How to get the nextcloud credentials

* The initial username is set to "nextcloud"
* The password is autogenerated and saved in a kubernetes secret

<pre>
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo ""
</pre>

h3. How to fix "Access through untrusted domain"

* Nextcloud stores the initial domain configuration
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
* To fix, edit /var/www/html/config/config.php and correct the domain
* Then delete the pods (see the example below)

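A sketch of the procedure; the pod name and the label selector depend on the actual release and chart and are only placeholders here:

<pre>
# Edit the trusted domains / overwrite.cli.url inside the running pod
kubectl exec -ti RELEASENAME-nextcloud-0 -- vi /var/www/html/config/config.php

# Recreate the pods afterwards
kubectl delete pods -l app.kubernetes.io/name=nextcloud
</pre>
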
h2. Infrastructure versions

h3. ungleich kubernetes infrastructure v5 (2021-10)

Clusters are configured / setup in this order:

* Bootstrap via kubeadm
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
** "rook for storage via argocd":https://rook.io/
** haproxy as an IPv4-to-IPv6 proxy inside the IPv6-only cluster, via argocd
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot

h3. ungleich kubernetes infrastructure v4 (2021-09)

* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
* The rook operator is still being installed via helm

h3. ungleich kubernetes infrastructure v3 (2021-07)

* rook is now installed via helm via argocd instead of directly via manifests

h3. ungleich kubernetes infrastructure v2 (2021-05)

* Replaced fluxv2 from ungleich k8s v1 with argocd
** argocd can apply helm templates directly without needing to go through Chart releases
* We are also using argoflow for build flows
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building

h3. ungleich kubernetes infrastructure v1 (2021-01)

We are using the following components:

* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
** Needed for basic networking
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
** Needed so that secrets are not stored in the git repository, but only in the cluster
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
** Needed to get letsencrypt certificates for services
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
** rbd for almost everything, *ReadWriteOnce*
** cephfs for smaller things, multi access *ReadWriteMany*
** Needed for providing persistent storage
* "flux v2":https://fluxcd.io/
** Needed to manage resources automatically