h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual

{{toc}}

h2. Status

This document is **pre-production**.
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.

h2. k8s clusters

| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | IPv4 HTTP proxy | last verified |
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
| [[p10.k8s.ooo]]    | production        |            | server63 server65 server83    | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
| [[server121.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
| [[r1r2p10k8sooo|r1.p10.k8s.ooo]] | production | Nico | server122 | | | 2022-10-30 |
| [[r1r2p10k8sooo|r2.p10.k8s.ooo]] | production | Nico | server123 | | | 2022-10-15 |
| [[r1r2p5k8sooo|r1.p5.k8s.ooo]] | production | Nico | server137 | | | 2022-10-30 |
| [[r1r2p5k8sooo|r2.p5.k8s.ooo]] | production | Nico | server138 | | | 2022-10-30 |
| [[r1r2p6k8sooo|r1.p6.k8s.ooo]] | production | Nico | server139 | | | 2022-10-30 |
| [[r1r2p6k8sooo|r2.p6.k8s.ooo]] | production | Nico | server140 | | | 2022-10-30 |

h2. General architecture and components overview

* All k8s clusters are IPv6 only
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
** Private configurations are found in the **k8s-config** repository

h3. Cluster types

| **Type/Feature**            | **Development**                | **Production**         |
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
| Separation of control plane | optional                       | recommended            |
| Persistent storage          | required                       | required               |
| Number of storage monitors  | 3                              | 5                      |

h2. General k8s operations

h3. Cheat sheet / great external references

* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/

h3. Allowing to schedule work on the control plane / removing node taints

* Mostly for single node / test / development clusters
* Just remove the master taint as follows

<pre>
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</pre>

You can check the node taints using @kubectl describe node ...@
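
To see the taints of all nodes at a glance, a jsonpath query works as well (a small sketch, not ungleich-specific):

<pre>
# list every node together with its taints
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
</pre>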

h3. Get the cluster admin.conf

* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
* To be able to administrate the cluster you can copy the admin.conf to your local machine
* Multi cluster debugging can be very easy if you name the config ~/cX-admin.conf (see example below)

<pre>
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
% export KUBECONFIG=~/c2-admin.conf
% kubectl get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
server47   Ready                      control-plane,master   82d   v1.22.0
server48   Ready                      control-plane,master   82d   v1.22.0
server49   Ready                      <none>                 82d   v1.22.0
server50   Ready                      <none>                 82d   v1.22.0
server59   Ready                      control-plane,master   82d   v1.22.0
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
server61   Ready                      <none>                 82d   v1.22.0
server62   Ready                      <none>                 82d   v1.22.0
</pre>

h3. Installing a new k8s cluster

* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
** Using pXX.k8s.ooo for production clusters of placeXX
* Use cdist to configure the nodes with requirements like crio
* Decide between single or multi node control plane setups (see below)
** Single control plane suitable for development clusters

Typical init procedure:

* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@
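
For a single control plane development cluster, a first session could look like this (a sketch combining the steps described on this page; the cluster name cX and the host names are placeholders):

<pre>
# on the new master
kubeadm init --config bootstrap/cX/kubeadm.yaml

# on your workstation: fetch the admin.conf (see "Get the cluster admin.conf" above)
scp root@masterX.placeX.ungleich.ch:/etc/kubernetes/admin.conf ~/cX-admin.conf
export KUBECONFIG=~/cX-admin.conf

# development clusters only: allow scheduling on the control plane
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</pre>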

h3. Deleting a pod that is hanging in terminating state

<pre>
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
</pre>

(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)

h3. Listing nodes of a cluster

<pre>
[15:05] bridge:~% kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
server22   Ready    <none>                 52d   v1.22.0
server23   Ready    <none>                 52d   v1.22.2
server24   Ready    <none>                 52d   v1.22.0
server25   Ready    <none>                 52d   v1.22.0
server26   Ready    <none>                 52d   v1.22.0
server27   Ready    <none>                 52d   v1.22.0
server63   Ready    control-plane,master   52d   v1.22.0
server64   Ready    <none>                 52d   v1.22.0
server65   Ready    control-plane,master   52d   v1.22.0
server66   Ready    <none>                 52d   v1.22.0
server83   Ready    control-plane,master   52d   v1.22.0
server84   Ready    <none>                 52d   v1.22.0
server85   Ready    <none>                 52d   v1.22.0
server86   Ready    <none>                 52d   v1.22.0
</pre>

h3. Removing / draining a node

Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:

<pre>
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
</pre>

h3. Re-adding a node after draining

<pre>
kubectl uncordon serverXX
</pre>

h3. (Re-)joining worker nodes after creating the cluster

* We need to have an up-to-date token
* We use different join commands for the workers and control plane nodes

Generating the join command on an existing control plane node:

<pre>
kubeadm token create --print-join-command
</pre>
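
The printed join command is then executed on the (re-)joining worker; a sketch with placeholder token/hash values:

<pre>
# on the worker node
kubeadm join p10-api.k8s.ooo:6443 --token TOKEN --discovery-token-ca-cert-hash sha256:HASH
</pre>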

h3. (Re-)joining control plane nodes after creating the cluster

* We generate the token again
* We upload the certificates
* We need to combine/create the join command for the control plane node

Example session:

<pre>
% kubeadm token create --print-join-command
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash

% kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
CERTKEY

# Then we use these two outputs on the joining node:

kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
</pre>

Commands to be used on a control plane node:

<pre>
kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs
</pre>

Commands to be used on the joining node:

<pre>
JOINCOMMAND --control-plane --certificate-key CERTKEY
</pre>

SEE ALSO

* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/

h3. How to fix etcd does not start when rejoining a kubernetes cluster as a control plane

If during the above step etcd does not come up, @kubeadm join@ can hang as follows:

<pre>
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
</pre>

Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster and then the join works.

To fix this we do:

* Find a working etcd pod
* Find the etcd members / member list
* Remove the etcd member that we want to re-join the cluster

<pre>
# Find the etcd pods
kubectl -n kube-system get pods -l component=etcd,tier=control-plane

# Get the list of etcd servers with the member id
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# Remove the member
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
</pre>

Sample session:

<pre>
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
NAME            READY   STATUS    RESTARTS     AGE
etcd-server63   1/1     Running   0            3m11s
etcd-server65   1/1     Running   3            7d2h
etcd-server83   1/1     Running   8 (6d ago)   7d2h
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77

</pre>

SEE ALSO

* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster

h3. Node labels (adding, showing, removing)

Listing the labels:

<pre>
kubectl get nodes --show-labels
</pre>

Adding labels:

<pre>
kubectl label nodes LIST-OF-NODES label1=value1
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype=router
</pre>

Selecting nodes in pods:

<pre>
apiVersion: v1
kind: Pod
...
spec:
  nodeSelector:
    hosttype: router
</pre>

Removing labels by adding a minus at the end of the label name:

<pre>
kubectl label node <nodename> <labelname>-
</pre>

For instance:

<pre>
kubectl label nodes router2 router3 hosttype-
</pre>

SEE ALSO

* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
* https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api

h3. Hardware Maintenance using ungleich-hardware

Use the following manifest and replace the HOST with the actual host:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-HOST
spec:
  containers:
  - name: ungleich-hardware
    image: ungleich/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "HOST"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>
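
After replacing HOST, the pod can be applied and entered as follows (a sketch; the file name is arbitrary):

<pre>
# assuming the manifest above was saved as ungleich-hardware-serverXX.yaml
kubectl apply -f ungleich-hardware-serverXX.yaml
kubectl exec -ti ungleich-hardware-serverXX -- /bin/sh

# clean up once the maintenance is done
kubectl delete pod ungleich-hardware-serverXX
</pre>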

Also see: [[The_ungleich_hardware_maintenance_guide]]

h3. Triggering a cronjob / creating a job from a cronjob

To test a cronjob, we can create a job from a cronjob:

<pre>
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
</pre>

This creates a job volume2-manual based on the cronjob volume2-daily-backup.
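
To verify the run, the job and its pod can be checked afterwards (kubernetes labels the pods it creates for a job with @job-name@):

<pre>
kubectl get jobs
kubectl logs -l job-name=volume2-manual
</pre>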

h3. su-ing into a user that has nologin shell set

Often users have nologin set as their shell inside the container. To be able to execute maintenance commands within the
container, we can use @su -s /bin/sh@ like this:

<pre>
su -s /bin/sh -c '/path/to/your/script' testuser
</pre>
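
Combined with @kubectl exec@ this also works for getting an interactive shell as such a user (a sketch; pod and user name are placeholders):

<pre>
kubectl exec -ti PODNAME -- su -s /bin/sh testuser
</pre>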

Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell

h3. How to print a secret value

Assuming you want the "password" item from a secret, use:

<pre>
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h2. Reference CNI

* Mainly "stupid", but effective plugins
* Main documentation on https://www.cni.dev/plugins/current/
* Plugins
** bridge (see the example config below)
*** Can create the bridge on the host
*** But seems not to be able to add host interfaces to it as well
*** Has support for vlan tags
** vlan
*** creates vlan tagged sub interface on the host
*** "It's a 1:1 mapping (i.e. no bridge in between)":https://github.com/k8snetworkplumbingwg/multus-cni/issues/569
** host-device
*** moves the interface from the host into the container
*** very easy for physical connections to containers
** ipvlan
*** "virtualisation" of a host device
*** routing based on IP
*** Same MAC for everyone
*** Cannot reach the master interface
** macvlan
*** With mac addresses
*** Supports various modes (to be checked)
** ptp ("point to point")
*** Creates a host device and connects it to the container
** win*
*** Windows implementations
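
For illustration, a minimal bridge plugin configuration might look like this (a sketch based on the upstream plugin documentation; bridge name, vlan tag and subnet are made up):

<pre>
cat > /etc/cni/net.d/10-br0.conflist <<'EOF'
{
  "cniVersion": "0.4.0",
  "name": "br0-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "br0",
      "isGateway": true,
      "vlan": 100,
      "ipam": {
        "type": "host-local",
        "ranges": [[{ "subnet": "2001:db8:42::/64" }]]
      }
    }
  ]
}
EOF
</pre>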

h2. Calico CNI

h3. Calico Installation

* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* This has the following advantages:
** Easy to upgrade
** Does not require us to configure IPv6/dual stack settings, as the tigera operator figures out things on its own

Usually plain calico can be installed directly using:

<pre>
VERSION=v3.24.1

helm repo add projectcalico https://docs.projectcalico.org/charts
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
</pre>

* Check the tags on https://github.com/projectcalico/calico/tags for the latest release

h3. Installing calicoctl

* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install

To be able to manage and configure calico, we need to
"install calicoctl (we choose the version as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod

<pre>
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
</pre>

Or version specific:

<pre>
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml

# For 3.22
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
</pre>

And making it more easily accessible via an alias:

<pre>
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
</pre>

h3. Calico configuration

By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
with an upstream router to propagate podcidr and servicecidr.

Default settings in our infrastructure:

* We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
* We use private ASNs for k8s clusters
* We do *not* use any overlay

After installing calico and calicoctl the last step of the installation is usually:

<pre>
calicoctl create -f - < calico-bgp.yaml
</pre>

A sample BGP configuration:

<pre>
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65534
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:10:3::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:10:3::/108
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: router1-place10
spec:
  peerIP: 2a0a:e5c0:10:1::50
  asNumber: 213081
  keepOriginalNextHop: true
</pre>
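
The resulting BGP resources can then be inspected with calicoctl (a sketch using the alias defined above):

<pre>
calicoctl get bgpconfiguration default -o yaml
calicoctl get bgppeer
calicoctl get nodes
</pre>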

h2. Cilium CNI (experimental)

h3. Status

*NO WORKING CILIUM CONFIGURATION FOR IPV6 only modes*

h3. Latest error

It seems cilium does not run on IPv6 only hosts:

<pre>
level=info msg="Validating configured node address ranges" subsys=daemon
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
level=info msg="Starting IP identity watcher" subsys=ipcache
</pre>

It crashes after that log entry.

h3. BGP configuration

* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
* Creating the bgp config beforehand as a configmap is thus required.

The error one gets without the configmap present:

Pods are hanging with:

<pre>
cilium-bpqm6                       0/1     Init:0/4            0             9s
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
</pre>

The error message in the cilium-operator is:

<pre>
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
</pre>

A correct bgp config looks like this:

<pre>
apiVersion: v1
kind: ConfigMap
metadata:
  name: bgp-config
  namespace: kube-system
data:
  config.yaml: |
    peers:
      - peer-address: 2a0a:e5c0::46
        peer-asn: 209898
        my-asn: 65533
      - peer-address: 2a0a:e5c0::47
        peer-asn: 209898
        my-asn: 65533
    address-pools:
      - name: default
        protocol: bgp
        addresses:
          - 2a0a:e5c0:0:14::/64
</pre>

h3. Installation

Adding the repo
<pre>

helm repo add cilium https://helm.cilium.io/
helm repo update
</pre>

Installing + configuring cilium
<pre>
ipv6pool=2a0a:e5c0:0:14::/112

version=1.12.2

helm upgrade --install cilium cilium/cilium --version $version \
  --namespace kube-system \
  --set ipv4.enabled=false \
  --set ipv6.enabled=true \
  --set enableIPv6Masquerade=false \
  --set bgpControlPlane.enabled=true

#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool

# Old style bgp?
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \

# Show possible configuration options
helm show values cilium/cilium

</pre>

Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:

<pre>
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
</pre>

See also https://github.com/cilium/cilium/issues/20756

Seems a /112 is actually working.

h3. Kernel modules

Cilium requires the following modules to be loaded on the host (not loaded by default):

<pre>
modprobe  ip6table_raw
modprobe  ip6table_filter
</pre>
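
To keep them loaded across reboots, something like the following can be used (a sketch; the exact mechanism depends on the host distribution):

<pre>
cat > /etc/modules-load.d/cilium.conf <<EOF
ip6table_raw
ip6table_filter
EOF
</pre>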

h3. Interesting helm flags

* autoDirectNodeRoutes
* bgpControlPlane.enabled = true

h3. SEE ALSO

* https://docs.cilium.io/en/v1.12/helm-reference/

h2. Multus (incomplete/experimental)

(TBD)

h2. ArgoCD

h3. Argocd Installation

* See https://argo-cd.readthedocs.io/en/stable/

As there is no configuration management present yet, argocd is installed using

<pre>
kubectl create namespace argocd

# Specific Version
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml

# OR: latest stable
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
</pre>

h3. Get the argocd credentials

<pre>
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h3. Accessing argocd

In regular IPv6 clusters:

* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN

In legacy IPv4 clusters:

<pre>
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
</pre>

* Navigate to https://localhost:8080

h3. Using the argocd webhook to trigger changes

* To trigger changes, POST JSON to https://argocd.example.com/api/webhook
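
A manual trigger via curl could look like this (a sketch only: argocd expects the JSON body of a push event from the git provider plus the matching event header, so the fields below are merely illustrative; if a webhook secret is configured, the provider's signature header is required as well):

<pre>
curl -s -X POST https://argocd.example.com/api/webhook \
  -H 'Content-Type: application/json' \
  -H 'X-GitHub-Event: push' \
  -d '{"ref": "refs/heads/master", "repository": {"html_url": "https://code.ungleich.ch/ungleich-intern/k8s-config"}}'
</pre>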

h3. Deploying an application

* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
* Always include the *redmine-url* pointing to the (customer) ticket
** Also add the support-url if it exists

Application sample

<pre>
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitea-CUSTOMER
  namespace: argocd
spec:
  destination:
    namespace: default
    server: 'https://kubernetes.default.svc'
  source:
    path: apps/prod/gitea
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
    targetRevision: HEAD
    helm:
      parameters:
        - name: storage.data.storageClass
          value: rook-ceph-block-hdd
        - name: storage.data.size
          value: 200Gi
        - name: storage.db.storageClass
          value: rook-ceph-block-ssd
        - name: storage.db.size
          value: 10Gi
        - name: storage.letsencrypt.storageClass
          value: rook-ceph-block-hdd
        - name: storage.letsencrypt.size
          value: 50Mi
        - name: letsencryptStaging
          value: 'no'
        - name: fqdn
          value: 'code.verua.online'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  info:
    - name: 'redmine-url'
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
    - name: 'support-url'
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
</pre>

h2. Helm related operations and conventions

We use helm charts extensively.

* In production, they are managed via argocd
* In development, helm charts can be developed and deployed manually using the helm utility.

h3. Installing a helm chart

One can use the usual pattern of

<pre>
helm install <releasename> <chartdirectory>
</pre>

However, when testing helm charts you often want to reinstall/update. The following pattern is "better", because it also works if the release is already installed:

<pre>
helm upgrade --install <releasename> <chartdirectory>
</pre>

h3. Naming services and deployments in helm charts [Application labels]

* We always have {{ .Release.Name }} to identify the current "instance"
* Deployments:
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ...
* See more about standard labels on
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
** https://helm.sh/docs/chart_best_practices/labels/

h3. Show all versions of a helm chart

<pre>
helm search repo -l repo/chart
</pre>

For example:

<pre>
% helm search repo -l projectcalico/tigera-operator
NAME                            CHART VERSION   APP VERSION   DESCRIPTION
projectcalico/tigera-operator   v3.23.3         v3.23.3       Installs the Tigera operator for Calico
projectcalico/tigera-operator   v3.23.2         v3.23.2       Installs the Tigera operator for Calico
....
</pre>

h3. Show possible values of a chart

<pre>
helm show values <repo/chart>
</pre>

Example:

<pre>
helm show values ingress-nginx/ingress-nginx
</pre>

h2. Rook + Ceph

h3. Installation

* Usually directly via argocd

Manual steps:

<pre>

</pre>

h3. Executing ceph commands

Using the ceph-tools pod as follows:

<pre>
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
</pre>
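
Similar to the calicoctl alias above, a shell alias keeps this manageable (a sketch; it assumes the toolbox deployment is named rook-ceph-tools, as created by the standard rook toolbox manifest):

<pre>
alias ceph='kubectl exec -n rook-ceph -ti deploy/rook-ceph-tools -- ceph'
ceph -s
ceph osd tree
</pre>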

h3. Inspecting the logs of a specific server

<pre>
# Get the related pods
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
...

# Inspect the logs of a specific pod
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx

</pre>

h3. Inspecting the logs of the rook-ceph-operator

<pre>
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
</pre>

h3. Restarting the rook operator

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

h3. Triggering server prepare / adding new osds

The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re scan", simply delete that pod:

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.

h3. Removing an OSD

* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
* Then delete the related deployment

Set the OSD id in the osd-purge.yaml and apply it. The OSD should be down before running the purge job.

<pre>
apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-purge-osd
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-purge-osd
spec:
  template:
    metadata:
      labels:
        app: rook-ceph-purge-osd
    spec:
      serviceAccountName: rook-ceph-purge-osd
      containers:
        - name: osd-removal
          image: rook/ceph:master
          # TODO: Insert the OSD ID in the last parameter that is to be removed
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
          #
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
          # removal could lead to data loss.
          args:
            - "ceph"
            - "osd"
            - "remove"
            - "--preserve-pvc"
            - "false"
            - "--force-osd-removal"
            - "false"
            - "--osd-ids"
            - "SETTHEOSDIDHERE"
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_MON_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  key: data
                  name: rook-ceph-mon-endpoints
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  key: ceph-username
                  name: rook-ceph-mon
            - name: ROOK_CEPH_SECRET
              valueFrom:
                secretKeyRef:
                  key: ceph-secret
                  name: rook-ceph-mon
            - name: ROOK_CONFIG_DIR
              value: /var/lib/rook
            - name: ROOK_CEPH_CONFIG_OVERRIDE
              value: /etc/rook/config/override.conf
            - name: ROOK_FSID
              valueFrom:
                secretKeyRef:
                  key: fsid
                  name: rook-ceph-mon
            - name: ROOK_LOG_LEVEL
              value: DEBUG
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-conf-emptydir
            - mountPath: /var/lib/rook
              name: rook-config
      volumes:
        - emptyDir: {}
          name: ceph-conf-emptydir
        - emptyDir: {}
          name: rook-config
      restartPolicy: Never

</pre>
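
Applying the purge job and following its output (a sketch, assuming the manifest above was saved as osd-purge.yaml):

<pre>
kubectl -n rook-ceph apply -f osd-purge.yaml
kubectl -n rook-ceph logs -f -l app=rook-ceph-purge-osd
</pre>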

Deleting the deployment:

<pre>
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted
</pre>

h2. Ingress + Cert Manager

* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
* We deploy "cert-manager":https://cert-manager.io/ to handle certificates
* We independently deploy @ClusterIssuer@ to allow the cert-manager app to deploy and the issuer to be created once the CRDs from cert-manager are in place

h3. IPv4 reachability

The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.

Steps:

h4. Get the ingress IPv6 address

Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@

Example:

<pre>
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
2a0a:e5c0:10:1b::ce11
</pre>

h4. Add NAT64 mapping

* Update the __dcl_jool_siit cdist type
* Record the two IPs (IPv6 and IPv4)
* Configure all routers

h4. Add DNS record

To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:

<pre>
; k8s ingress for dev
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
dev-ingress                 A 147.78.194.23

</pre>

h4. Add supporting wildcard DNS

If you plan to add various sites under a specific domain, you can add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:

<pre>
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
</pre>

h2. Harbor

* We use "Harbor":https://goharbor.io/ for caching and as an image registry. Internal app reference: apps/prod/harbor.
* The admin password is in the password store, auto generated per cluster
* At the moment harbor only authenticates against the internal ldap tree

h3. LDAP configuration

* The url needs to be ldaps://...
* uid = uid
* the rest is standard

h2. Monitoring / Prometheus

* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/

Access via ...

* http://prometheus-k8s.monitoring.svc:9090
* http://grafana.monitoring.svc:3000
* http://alertmanager.monitoring.svc:9093
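
If the service IPs are not directly reachable from your machine, a port-forward based on the service names above works as well (a sketch):

<pre>
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
kubectl -n monitoring port-forward svc/grafana 3000:3000
</pre>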

h3. Prometheus Options

* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
** Includes dashboards and co.
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
** Includes dashboards and co.
* "Prometheus Operator (mainly CRD manifests)":https://github.com/prometheus-operator/prometheus-operator

h2. Nextcloud

h3. How to get the nextcloud credentials

* The initial username is set to "nextcloud"
* The password is autogenerated and saved in a kubernetes secret

<pre>
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo ""
</pre>

h3. How to fix "Access through untrusted domain"

* Nextcloud stores the initial domain configuration
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
* To fix, edit /var/www/html/config/config.php and correct the domain (see the sketch below)
* Then delete the pods
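
A possible session for this (a sketch only; pod name, label and domains are placeholders and depend on the actual release):

<pre>
# find the nextcloud pod of the release
kubectl get pods -l app.kubernetes.io/name=nextcloud

# correct the trusted domain inside the pod
kubectl exec -ti NEXTCLOUDPOD -- sed -i 's/old.example.com/new.example.com/' /var/www/html/config/config.php

# recreate the pod
kubectl delete pod NEXTCLOUDPOD
</pre>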

h2. Infrastructure versions

h3. ungleich kubernetes infrastructure v5 (2021-10)

Clusters are configured / setup in this order:

* Bootstrap via kubeadm
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
** "rook for storage via argocd":https://rook.io/
** haproxy for IPv4-to-IPv6 proxying into the IPv6-only cluster, via argocd
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot

h3. ungleich kubernetes infrastructure v4 (2021-09)

* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
* The rook operator is still being installed via helm

h3. ungleich kubernetes infrastructure v3 (2021-07)

* rook is now installed via helm via argocd instead of directly via manifests

h3. ungleich kubernetes infrastructure v2 (2021-05)

* Replaced fluxv2 from ungleich k8s v1 with argocd
** argocd can apply helm templates directly without needing to go through Chart releases
* We are also using argoflow for build flows
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building

h3. ungleich kubernetes infrastructure v1 (2021-01)

We are using the following components:

* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
** Needed for basic networking
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
** Needed so that secrets are not stored in the git repository, but only in the cluster
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
** Needed to get letsencrypt certificates for services
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
** rbd for almost everything, *ReadWriteOnce*
** cephfs for smaller things, multi access *ReadWriteMany*
** Needed for providing persistent storage
* "flux v2":https://fluxcd.io/
** Needed to manage resources automatically