
1 22 Nico Schottelius
h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual
2 1 Nico Schottelius
3 3 Nico Schottelius
{{toc}}
4
5 1 Nico Schottelius
h2. Status
6
7 28 Nico Schottelius
This document is **pre-production**.
8
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.
9 1 Nico Schottelius
10 10 Nico Schottelius
h2. k8s clusters
11
12 123 Nico Schottelius
| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
13
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
14
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
15
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
16
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
17
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
18
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
19
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
20
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
21
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
22
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
23
| [[p10.k8s.ooo]]    | production        |            | server63 server65 server83    | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
24
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
25
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
26 142 Nico Schottelius
| [[server121.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
27 21 Nico Schottelius
28 1 Nico Schottelius
h2. General architecture and components overview
29
30
* All k8s clusters are IPv6 only
31
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
32
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
33 18 Nico Schottelius
** Private configurations are found in the **k8s-config** repository
34 1 Nico Schottelius
35
h3. Cluster types
36
37 28 Nico Schottelius
| **Type/Feature**            | **Development**                | **Production**         |
38
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
39
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
40
| Separation of control plane | optional                       | recommended            |
41
| Persistent storage          | required                       | required               |
42
| Number of storage monitors  | 3                              | 5                      |
43 1 Nico Schottelius
44 43 Nico Schottelius
h2. General k8s operations
45 1 Nico Schottelius
46 46 Nico Schottelius
h3. Cheat sheet / external great references
47
48
* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/
49
50 117 Nico Schottelius
h3. Allowing to schedule work on the control plane / removing node taints
51 69 Nico Schottelius
52
* Mostly for single node / test / development clusters
53
* Just remove the master taint as follows
54
55
<pre>
56
kubectl taint nodes --all node-role.kubernetes.io/master-
57 118 Nico Schottelius
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
58 69 Nico Schottelius
</pre>
59 1 Nico Schottelius
60 117 Nico Schottelius
You can check the node taints using @kubectl describe node ...@
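
A quick check, assuming a node name such as @server47@ (any name from @kubectl get nodes@ works):

<pre>
# show only the taints of a single node
kubectl describe node server47 | grep Taints

# or list the taints of all nodes at once
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
</pre>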
61 69 Nico Schottelius
62 44 Nico Schottelius
h3. Get the cluster admin.conf
63
64
* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
65
* To be able to administrate the cluster you can copy the admin.conf to your local machine
66
* Multi-cluster debugging becomes very easy if you name the config ~/cX-admin.conf (see the example below)
67
68
<pre>
69
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
70
% export KUBECONFIG=~/c2-admin.conf    
71
% kubectl get nodes
72
NAME       STATUS                     ROLES                  AGE   VERSION
73
server47   Ready                      control-plane,master   82d   v1.22.0
74
server48   Ready                      control-plane,master   82d   v1.22.0
75
server49   Ready                      <none>                 82d   v1.22.0
76
server50   Ready                      <none>                 82d   v1.22.0
77
server59   Ready                      control-plane,master   82d   v1.22.0
78
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
79
server61   Ready                      <none>                 82d   v1.22.0
80
server62   Ready                      <none>                 82d   v1.22.0               
81
</pre>
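
For working with several clusters at once, the admin configs can also be merged via a colon-separated @KUBECONFIG@. A sketch (context names depend on the kubeadm configuration of each cluster):

<pre>
# merge several admin configs and list the available contexts
export KUBECONFIG=~/c2-admin.conf:~/p6-admin.conf
kubectl config get-contexts

# switch to the cluster you want to work on
kubectl config use-context CONTEXTNAME
</pre>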
82
83 18 Nico Schottelius
h3. Installing a new k8s cluster
84 8 Nico Schottelius
85 9 Nico Schottelius
* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
86 28 Nico Schottelius
** Using pXX.k8s.ooo for production clusters of placeXX
87 9 Nico Schottelius
* Use cdist to configure the nodes with requirements like crio
88
* Decide between single or multi node control plane setups (see below)
89 28 Nico Schottelius
** A single control plane is suitable for development clusters
90 9 Nico Schottelius
91 28 Nico Schottelius
Typical init procedure:
92 9 Nico Schottelius
93 28 Nico Schottelius
* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
94
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@
95 10 Nico Schottelius
96 29 Nico Schottelius
h3. Deleting a pod that is hanging in terminating state
97
98
<pre>
99
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
100
</pre>
101
102
(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)
103
104 42 Nico Schottelius
h3. Listing nodes of a cluster
105
106
<pre>
107
[15:05] bridge:~% kubectl get nodes
108
NAME       STATUS   ROLES                  AGE   VERSION
109
server22   Ready    <none>                 52d   v1.22.0
110
server23   Ready    <none>                 52d   v1.22.2
111
server24   Ready    <none>                 52d   v1.22.0
112
server25   Ready    <none>                 52d   v1.22.0
113
server26   Ready    <none>                 52d   v1.22.0
114
server27   Ready    <none>                 52d   v1.22.0
115
server63   Ready    control-plane,master   52d   v1.22.0
116
server64   Ready    <none>                 52d   v1.22.0
117
server65   Ready    control-plane,master   52d   v1.22.0
118
server66   Ready    <none>                 52d   v1.22.0
119
server83   Ready    control-plane,master   52d   v1.22.0
120
server84   Ready    <none>                 52d   v1.22.0
121
server85   Ready    <none>                 52d   v1.22.0
122
server86   Ready    <none>                 52d   v1.22.0
123
</pre>
124
125 41 Nico Schottelius
h3. Removing / draining a node
126
127
Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:
128
129 1 Nico Schottelius
<pre>
130 103 Nico Schottelius
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
131 42 Nico Schottelius
</pre>
132
133
h3. Re-adding a node after draining
134
135
<pre>
136
kubectl uncordon serverXX
137 1 Nico Schottelius
</pre>
138 43 Nico Schottelius
139 50 Nico Schottelius
h3. (Re-)joining worker nodes after creating the cluster
140 49 Nico Schottelius
141
* We need to have an up-to-date token
142
* We use different join commands for the workers and control plane nodes
143
144
Generating the join command on an existing control plane node:
145
146
<pre>
147
kubeadm token create --print-join-command
148
</pre>
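
The printed command is then run on the (re-)joining worker node. A sketch with placeholder token and hash (endpoint as in the example of the next section):

<pre>
# on the worker node, run the output of the command above, e.g.:
kubeadm join p10-api.k8s.ooo:6443 --token TOKEN --discovery-token-ca-cert-hash sha256:HASH
</pre>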
149
150 50 Nico Schottelius
h3. (Re-)joining control plane nodes after creating the cluster
151 1 Nico Schottelius
152 50 Nico Schottelius
* We generate the token again
153
* We upload the certificates
154
* We need to combine/create the join command for the control plane node
155
156
Example session:
157
158
<pre>
159
% kubeadm token create --print-join-command
160
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash 
161
162
% kubeadm init phase upload-certs --upload-certs
163
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
164
[upload-certs] Using certificate key:
165
CERTKEY
166
167
# Then we use these two outputs on the joining node:
168
169
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
170
</pre>
171
172
Commands to be used on a control plane node:
173
174
<pre>
175
kubeadm token create --print-join-command
176
kubeadm init phase upload-certs --upload-certs
177
</pre>
178
179
Commands to be used on the joining node:
180
181
<pre>
182
JOINCOMMAND --control-plane --certificate-key CERTKEY
183
</pre>
184 49 Nico Schottelius
185 51 Nico Schottelius
SEE ALSO
186
187
* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
188
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/
189
190 53 Nico Schottelius
h3. How to fix etcd does not start when rejoining a kubernetes cluster as a control plane
191 52 Nico Schottelius
192
If during the above step etcd does not come up, @kubeadm join@ can hang as follows:
193
194
<pre>
195
[control-plane] Creating static Pod manifest for "kube-apiserver"                                                              
196
[control-plane] Creating static Pod manifest for "kube-controller-manager"                                                     
197
[control-plane] Creating static Pod manifest for "kube-scheduler"                                                              
198
[check-etcd] Checking that the etcd cluster is healthy                                                                         
199
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:37
200
8a]:2379 with maintenance client: context deadline exceeded                                                                    
201
To see the stack trace of this error execute with --v=5 or higher         
202
</pre>
203
204
Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster and then the join works.
205
206
To fix this we do:
207
208
* Find a working etcd pod
209
* Find the etcd members / member list
210
* Remove the etcd member that we want to re-join the cluster
211
212
213
<pre>
214
# Find the etcd pods
215
kubectl -n kube-system get pods -l component=etcd,tier=control-plane
216
217
# Get the list of etcd servers with the member id 
218
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
219
220
# Remove the member
221
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
222
</pre>
223
224
Sample session:
225
226
<pre>
227
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
228
NAME            READY   STATUS    RESTARTS     AGE
229
etcd-server63   1/1     Running   0            3m11s
230
etcd-server65   1/1     Running   3            7d2h
231
etcd-server83   1/1     Running   8 (6d ago)   7d2h
232
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
233
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
234
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
235
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false
236
237
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
238
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
239 1 Nico Schottelius
240
</pre>
241
242
SEE ALSO
243
244
* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster
245 56 Nico Schottelius
246 147 Nico Schottelius
h3. Node labels (adding, showing, removing)
247
248
Listing the labels:
249
250
<pre>
251
kubectl get nodes --show-labels
252
</pre>
253
254
Adding labels:
255
256
<pre>
257
kubectl label nodes LIST-OF-NODES label1=value1 
258
259
</pre>
260
261
For instance:
262
263
<pre>
264
kubectl label nodes router2 router3 hosttype=router 
265
</pre>
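
Removing labels works by appending a dash to the label key, for instance:

<pre>
kubectl label nodes router2 router3 hosttype-
</pre>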
266
267
Selecting nodes in pods:
268
269
<pre>
270
apiVersion: v1
271
kind: Pod
272
...
273
spec:
274
  nodeSelector:
275
    hosttype: router
276
</pre>
277
278
SEE ALSO
279
280
* kubectl get nodes --show-labels
281
282 101 Nico Schottelius
h3. Hardware Maintenance using ungleich-hardware
283
284
Use the following manifest and replace the HOST with the actual host:
285
286
<pre>
287
apiVersion: v1
288
kind: Pod
289
metadata:
290
  name: ungleich-hardware-HOST
291
spec:
292
  containers:
293
  - name: ungleich-hardware
294
    image: ungleich/ungleich-hardware:0.0.5
295
    args:
296
    - sleep
297
    - "1000000"
298
    volumeMounts:
299
      - mountPath: /dev
300
        name: dev
301
    securityContext:
302
      privileged: true
303
  nodeSelector:
304
    kubernetes.io/hostname: "HOST"
305
306
  volumes:
307
    - name: dev
308
      hostPath:
309
        path: /dev
310
</pre>
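
Once the pod is running, open a shell in it to use the hardware tools (a sketch, assuming HOST was replaced as above and the image ships /bin/sh):

<pre>
kubectl exec -ti ungleich-hardware-HOST -- /bin/sh
</pre>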
311
312 102 Nico Schottelius
Also see: [[The_ungleich_hardware_maintenance_guide]]
313
314 105 Nico Schottelius
h3. Triggering a cronjob / creating a job from a cronjob
315 104 Nico Schottelius
316
To test a cronjob, we can create a job from a cronjob:
317
318
<pre>
319
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
320
</pre>
321
322
This creates a job @volume2-manual@ based on the cronjob @volume2-daily-backup@.
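
To check the manually created job (names as in the example above):

<pre>
kubectl get jobs
kubectl logs job/volume2-manual
</pre>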
323
324 112 Nico Schottelius
h3. su-ing into a user that has nologin shell set
325
326
Users often have nologin set as their shell inside the container. To be able to execute maintenance commands within the
327
container, we can use @su -s /bin/sh@ like this:
328
329
<pre>
330
su -s /bin/sh -c '/path/to/your/script' testuser
331
</pre>
332
333
Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell
334
335 113 Nico Schottelius
h3. How to print a secret value
336
337
Assuming you want the "password" item from a secret, use:
338
339
<pre>
340
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo "" 
341
</pre>
342
343 62 Nico Schottelius
h2. Calico CNI
344
345
h3. Calico Installation
346
347
* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
348
* This has the following advantages:
349
** Easy to upgrade
350
** Does not require the OS to configure IPv6/dual-stack settings, as the tigera operator figures things out on its own
351
352
Usually plain calico can be installed directly using:
353
354
<pre>
355 125 Nico Schottelius
VERSION=v3.23.3
356 120 Nico Schottelius
helm repo add projectcalico https://docs.projectcalico.org/charts
357 124 Nico Schottelius
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
358 1 Nico Schottelius
</pre>
359 92 Nico Schottelius
360
* Check the tags on https://github.com/projectcalico/calico/tags for the latest release
361 62 Nico Schottelius
362
h3. Installing calicoctl
363
364 115 Nico Schottelius
* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install
365
366 62 Nico Schottelius
To be able to manage and configure calico, we need to 
367
"install calicoctl (we choose the version as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod
368
369
<pre>
370
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
371
</pre>
372
373 93 Nico Schottelius
Or version specific:
374
375
<pre>
376
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml
377 97 Nico Schottelius
378
# For 3.22
379
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
380 93 Nico Schottelius
</pre>
381
382 70 Nico Schottelius
And making it easily accessible via an alias:
383
384
<pre>
385
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
386
</pre>
387
388 62 Nico Schottelius
h3. Calico configuration
389
390 63 Nico Schottelius
By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
391
with an upstream router to propagate podcidr and servicecidr.
392 62 Nico Schottelius
393
Default settings in our infrastructure:
394
395
* We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
396
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
397 1 Nico Schottelius
* We use private ASNs for k8s clusters
398 63 Nico Schottelius
* We do *not* use any overlay
399 62 Nico Schottelius
400
After installing calico and calicoctl the last step of the installation is usually:
401
402 1 Nico Schottelius
<pre>
403 79 Nico Schottelius
calicoctl create -f - < calico-bgp.yaml
404 62 Nico Schottelius
</pre>
405
406
407
A sample BGP configuration:
408
409
<pre>
410
---
411
apiVersion: projectcalico.org/v3
412
kind: BGPConfiguration
413
metadata:
414
  name: default
415
spec:
416
  logSeverityScreen: Info
417
  nodeToNodeMeshEnabled: true
418
  asNumber: 65534
419
  serviceClusterIPs:
420
  - cidr: 2a0a:e5c0:10:3::/108
421
  serviceExternalIPs:
422
  - cidr: 2a0a:e5c0:10:3::/108
423
---
424
apiVersion: projectcalico.org/v3
425
kind: BGPPeer
426
metadata:
427
  name: router1-place10
428
spec:
429
  peerIP: 2a0a:e5c0:10:1::50
430
  asNumber: 213081
431
  keepOriginalNextHop: true
432
</pre>
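
The resulting BGP configuration can be inspected again via calicoctl (using the alias defined above), for instance:

<pre>
# show the configured BGP peers and the global BGP configuration
calicoctl get bgppeers
calicoctl get bgpconfiguration default -o yaml
</pre>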
433
434 126 Nico Schottelius
h2. Cilium CNI (experimental)
435
436 137 Nico Schottelius
h3. Status
437
438 138 Nico Schottelius
*NO WORKING CILIUM CONFIGURATION FOR IPv6-only MODES*
439 137 Nico Schottelius
440 146 Nico Schottelius
h3. Latest error
441
442
It seems cilium does not run on IPv6-only hosts:
443
444
<pre>
445
level=info msg="Validating configured node address ranges" subsys=daemon
446
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
447
level=info msg="Starting IP identity watcher" subsys=ipcache
448
</pre>
449
450
It crashes right after that log entry.
451
452 128 Nico Schottelius
h3. BGP configuration
453
454
* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
455
* Creating the bgp config beforehand as a configmap is thus required.
456
457
The error one gets without the configmap present:
458
459
Pods are hanging with:
460
461
<pre>
462
cilium-bpqm6                       0/1     Init:0/4            0             9s
463
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
464
</pre>
465
466
The error message in the cilium-operator is:
467
468
<pre>
469
Events:
470
  Type     Reason       Age                From               Message
471
  ----     ------       ----               ----               -------
472
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
473
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
474
</pre>
475
476
A correct bgp config looks like this:
477
478
<pre>
479
apiVersion: v1
480
kind: ConfigMap
481
metadata:
482
  name: bgp-config
483
  namespace: kube-system
484
data:
485
  config.yaml: |
486
    peers:
487
      - peer-address: 2a0a:e5c0::46
488
        peer-asn: 209898
489
        my-asn: 65533
490
      - peer-address: 2a0a:e5c0::47
491
        peer-asn: 209898
492
        my-asn: 65533
493
    address-pools:
494
      - name: default
495
        protocol: bgp
496
        addresses:
497
          - 2a0a:e5c0:0:14::/64
498
</pre>
499 127 Nico Schottelius
500
h3. Installation
501 130 Nico Schottelius
502 127 Nico Schottelius
Adding the repo
503 1 Nico Schottelius
<pre>
504 127 Nico Schottelius
505 129 Nico Schottelius
helm repo add cilium https://helm.cilium.io/
506 130 Nico Schottelius
helm repo update
507
</pre>
508 129 Nico Schottelius
509 135 Nico Schottelius
Installing + configuring cilium
510 129 Nico Schottelius
<pre>
511 130 Nico Schottelius
ipv6pool=2a0a:e5c0:0:14::/112
512 1 Nico Schottelius
513 146 Nico Schottelius
version=1.12.2
514 129 Nico Schottelius
515
helm upgrade --install cilium cilium/cilium --version $version \
516 1 Nico Schottelius
  --namespace kube-system \
517
  --set ipv4.enabled=false \
518
  --set ipv6.enabled=true \
519 146 Nico Schottelius
  --set enableIPv6Masquerade=false \
520
  --set bgpControlPlane.enabled=true 
521 1 Nico Schottelius
522 146 Nico Schottelius
#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool
523
524
# Old style bgp?
525 136 Nico Schottelius
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \
526 127 Nico Schottelius
527
# Show possible configuration options
528
helm show values cilium/cilium
529
530 1 Nico Schottelius
</pre>
531 132 Nico Schottelius
532
Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:
533
534
<pre>
535
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
536
</pre>
537
538 126 Nico Schottelius
539 1 Nico Schottelius
See also https://github.com/cilium/cilium/issues/20756
540 135 Nico Schottelius
541
A /112, however, seems to work.
542
543
h3. Kernel modules
544
545
Cilium requires the following modules to be loaded on the host (not loaded by default):
546
547
<pre>
548 1 Nico Schottelius
modprobe  ip6table_raw
549
modprobe  ip6table_filter
550
</pre>
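
To load them automatically at boot, a sketch (the path is an assumption, using the common modules-load.d convention; adjust to the host's distribution):

<pre>
cat > /etc/modules-load.d/cilium.conf <<EOF
ip6table_raw
ip6table_filter
EOF
</pre>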
551 146 Nico Schottelius
552
h3. Interesting helm flags
553
554
* autoDirectNodeRoutes
555
* bgpControlPlane.enabled = true
556
557
h3. SEE ALSO
558
559
* https://docs.cilium.io/en/v1.12/helm-reference/
560 133 Nico Schottelius
561 122 Nico Schottelius
h2. ArgoCD 
562 56 Nico Schottelius
563 60 Nico Schottelius
h3. Argocd Installation
564 1 Nico Schottelius
565 116 Nico Schottelius
* See https://argo-cd.readthedocs.io/en/stable/
566
567 60 Nico Schottelius
As there is no configuration management present yet, argocd is installed using
568
569 1 Nico Schottelius
<pre>
570 60 Nico Schottelius
kubectl create namespace argocd
571 86 Nico Schottelius
572 96 Nico Schottelius
# Specific Version
573
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml
574 86 Nico Schottelius
575
# OR: latest stable
576 60 Nico Schottelius
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
577 56 Nico Schottelius
</pre>
578 1 Nico Schottelius
579 116 Nico Schottelius
580 1 Nico Schottelius
581 60 Nico Schottelius
h3. Get the argocd credentials
582
583
<pre>
584
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
585
</pre>
586 52 Nico Schottelius
587 87 Nico Schottelius
h3. Accessing argocd
588
589
In regular IPv6 clusters:
590
591
* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN
592
593
In legacy IPv4 clusters
594
595
<pre>
596
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
597
</pre>
598
599 88 Nico Schottelius
* Navigate to https://localhost:8080
600
601 68 Nico Schottelius
h3. Using the argocd webhook to trigger changes
602 67 Nico Schottelius
603
* To trigger changes, POST JSON to https://argocd.example.com/api/webhook (see the sketch below)
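
A hypothetical sketch using curl; the payload file and the event header mimic a GitHub/Gitea-style push event and need to be adjusted to the actual git provider:

<pre>
curl -X POST \
     -H "Content-Type: application/json" \
     -H "X-GitHub-Event: push" \
     --data @payload.json \
     https://argocd.example.com/api/webhook
</pre>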
604
605 72 Nico Schottelius
h3. Deploying an application
606
607
* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
608 73 Nico Schottelius
* Always include the *redmine-url* pointing to the (customer) ticket
609
** Also add the support-url if it exists
610 72 Nico Schottelius
611
Application sample
612
613
<pre>
614
apiVersion: argoproj.io/v1alpha1
615
kind: Application
616
metadata:
617
  name: gitea-CUSTOMER
618
  namespace: argocd
619
spec:
620
  destination:
621
    namespace: default
622
    server: 'https://kubernetes.default.svc'
623
  source:
624
    path: apps/prod/gitea
625
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
626
    targetRevision: HEAD
627
    helm:
628
      parameters:
629
        - name: storage.data.storageClass
630
          value: rook-ceph-block-hdd
631
        - name: storage.data.size
632
          value: 200Gi
633
        - name: storage.db.storageClass
634
          value: rook-ceph-block-ssd
635
        - name: storage.db.size
636
          value: 10Gi
637
        - name: storage.letsencrypt.storageClass
638
          value: rook-ceph-block-hdd
639
        - name: storage.letsencrypt.size
640
          value: 50Mi
641
        - name: letsencryptStaging
642
          value: 'no'
643
        - name: fqdn
644
          value: 'code.verua.online'
645
  project: default
646
  syncPolicy:
647
    automated:
648
      prune: true
649
      selfHeal: true
650
  info:
651
    - name: 'redmine-url'
652
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
653
    - name: 'support-url'
654
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
655
</pre>
656
657 80 Nico Schottelius
h2. Helm related operations and conventions
658 55 Nico Schottelius
659 61 Nico Schottelius
We use helm charts extensively.
660
661
* In production, they are managed via argocd
662
* In development, helm charts can be developed and deployed manually using the helm utility.
663
664 55 Nico Schottelius
h3. Installing a helm chart
665
666
One can use the usual pattern of
667
668
<pre>
669
helm install <releasename> <chartdirectory>
670
</pre>
671
672
However, when testing helm charts you often want to reinstall/update. The following pattern is better, because it also works if the release is already installed:
673
674
<pre>
675
helm upgrade --install <releasename> <chartdirectory>
676 1 Nico Schottelius
</pre>
677 80 Nico Schottelius
678
h3. Naming services and deployments in helm charts [Application labels]
679
680
* We always have {{ .Release.Name }} to identify the current "instance" (see the sketch after this list)
681
* Deployments:
682
** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ...
683 81 Nico Schottelius
* See more about standard labels on
684
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
685
** https://helm.sh/docs/chart_best_practices/labels/
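
A minimal sketch of these conventions in a deployment template (all names are illustrative):

<pre>
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-nginx
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.23
</pre>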
686 55 Nico Schottelius
687 139 Nico Schottelius
h2. Rook + Ceph
688
689
h3. Installation
690
691
* Usually directly via argocd
692
693
Manual steps:
694
695
<pre>
696
697
</pre>
698 43 Nico Schottelius
699 71 Nico Schottelius
h3. Executing ceph commands
700
701
Using the ceph-tools pod as follows:
702
703
<pre>
704
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
705
</pre>
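
Other ceph commands follow the same pattern, for instance:

<pre>
# overall health
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph health detail

# OSD tree
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph osd tree
</pre>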
706
707 43 Nico Schottelius
h3. Inspecting the logs of a specific server
708
709
<pre>
710
# Get the related pods
711
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare 
712
...
713
714
# Inspect the logs of a specific pod
715
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
716
717 71 Nico Schottelius
</pre>
718
719
h3. Inspecting the logs of the rook-ceph-operator
720
721
<pre>
722
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
723 43 Nico Schottelius
</pre>
724
725 121 Nico Schottelius
h3. Restarting the rook operator
726
727
<pre>
728
kubectl -n rook-ceph delete pods  -l app=rook-ceph-operator
729
</pre>
730
731 43 Nico Schottelius
h3. Triggering server prepare / adding new osds
732
733
The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re-scan", simply delete that pod:
734
735
<pre>
736
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
737
</pre>
738
739
This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.
740
741
h3. Removing an OSD
742
743
* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
744 77 Nico Schottelius
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
745 99 Nico Schottelius
* Then delete the related deployment
746 41 Nico Schottelius
747 98 Nico Schottelius
Set the OSD id in osd-purge.yaml and apply it. The OSD should be down before running the job.
748
749
<pre>
750
apiVersion: batch/v1
751
kind: Job
752
metadata:
753
  name: rook-ceph-purge-osd
754
  namespace: rook-ceph # namespace:cluster
755
  labels:
756
    app: rook-ceph-purge-osd
757
spec:
758
  template:
759
    metadata:
760
      labels:
761
        app: rook-ceph-purge-osd
762
    spec:
763
      serviceAccountName: rook-ceph-purge-osd
764
      containers:
765
        - name: osd-removal
766
          image: rook/ceph:master
767
          # TODO: Insert the OSD ID in the last parameter that is to be removed
768
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
769
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
770
          #
771
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
772
          # removal could lead to data loss.
773
          args:
774
            - "ceph"
775
            - "osd"
776
            - "remove"
777
            - "--preserve-pvc"
778
            - "false"
779
            - "--force-osd-removal"
780
            - "false"
781
            - "--osd-ids"
782
            - "SETTHEOSDIDHERE"
783
          env:
784
            - name: POD_NAMESPACE
785
              valueFrom:
786
                fieldRef:
787
                  fieldPath: metadata.namespace
788
            - name: ROOK_MON_ENDPOINTS
789
              valueFrom:
790
                configMapKeyRef:
791
                  key: data
792
                  name: rook-ceph-mon-endpoints
793
            - name: ROOK_CEPH_USERNAME
794
              valueFrom:
795
                secretKeyRef:
796
                  key: ceph-username
797
                  name: rook-ceph-mon
798
            - name: ROOK_CEPH_SECRET
799
              valueFrom:
800
                secretKeyRef:
801
                  key: ceph-secret
802
                  name: rook-ceph-mon
803
            - name: ROOK_CONFIG_DIR
804
              value: /var/lib/rook
805
            - name: ROOK_CEPH_CONFIG_OVERRIDE
806
              value: /etc/rook/config/override.conf
807
            - name: ROOK_FSID
808
              valueFrom:
809
                secretKeyRef:
810
                  key: fsid
811
                  name: rook-ceph-mon
812
            - name: ROOK_LOG_LEVEL
813
              value: DEBUG
814
          volumeMounts:
815
            - mountPath: /etc/ceph
816
              name: ceph-conf-emptydir
817
            - mountPath: /var/lib/rook
818
              name: rook-config
819
      volumes:
820
        - emptyDir: {}
821
          name: ceph-conf-emptydir
822
        - emptyDir: {}
823
          name: rook-config
824
      restartPolicy: Never
825
826
827 99 Nico Schottelius
</pre>
828
829
Deleting the deployment:
830
831
<pre>
832
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
833
deployment.apps "rook-ceph-osd-6" deleted
834 98 Nico Schottelius
</pre>
835
836 145 Nico Schottelius
h2. Ingress + Cert Manager
837
838
* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
839
* we deploy "cert-manager":https://cert-manager.io/ to handle certificates
840
* We independently deploy a @ClusterIssuer@, so that the cert-manager app can deploy first and the issuer is created once the CRDs from cert-manager are in place (a sample is sketched below)
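
A sketch of such a @ClusterIssuer@; the name, email and the production ACME endpoint are assumptions and need to be adjusted:

<pre>
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-production
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: EMAIL
    privateKeySecretRef:
      name: letsencrypt-production
    solvers:
      - http01:
          ingress:
            class: nginx
</pre>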
841
842
h3. IPv4 reachability 
843
844
The ingress is by default IPv6-only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.
845
846
Steps:
847
848
h4. Get the ingress IPv6 address
849
850
Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@
851
852
Example:
853
854
<pre>
855
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
856
2a0a:e5c0:10:1b::ce11
857
</pre>
858
859
h4. Add NAT64 mapping
860
861
* Update the __dcl_jool_siit cdist type
862
* Record the two IPs (IPv6 and IPv4)
863
* Configure all routers
864
865
866
h4. Add DNS record
867
868
To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:
869
870
<pre>
871
; k8s ingress for dev
872
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
873
dev-ingress                 A 147.78.194.23
874
875
</pre> 
876
877
h4. Add supporting wildcard DNS
878
879
If you plan to add various sites under a specific domain, add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:
880
881
<pre>
882
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
883
</pre>
884
885 76 Nico Schottelius
h2. Harbor
886
887
* We user "Harbor":https://goharbor.io/ for caching and as an image registry. Internal app reference: apps/prod/harbor.
888
* The admin password is in the password store, auto-generated per cluster
889
* At the moment harbor only authenticates against the internal ldap tree
890
891
h3. LDAP configuration
892
893
* The url needs to be ldaps://...
894
* uid = uid
895
* the rest is standard
896 75 Nico Schottelius
897 89 Nico Schottelius
h2. Monitoring / Prometheus
898
899 90 Nico Schottelius
* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/
900 89 Nico Schottelius
901 91 Nico Schottelius
Access via ...
902
903
* http://prometheus-k8s.monitoring.svc:9090
904
* http://grafana.monitoring.svc:3000
905
* http://alertmanager.monitoring.svc:9093
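
These URLs are cluster internal. From a workstation they can be reached for instance via port-forwarding (a sketch; the service names are assumed from the URLs above):

<pre>
kubectl -n monitoring port-forward svc/prometheus-k8s 9090
kubectl -n monitoring port-forward svc/grafana 3000
kubectl -n monitoring port-forward svc/alertmanager 9093
</pre>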
906
907
908 100 Nico Schottelius
h3. Prometheus Options
909
910
* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
911
** Includes dashboards and co.
912
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
913
** Includes dashboards and co.
914
* "Prometheus Operator (mainly CRD manifest":https://github.com/prometheus-operator/prometheus-operator
915
916 91 Nico Schottelius
917 82 Nico Schottelius
h2. Nextcloud
918
919 85 Nico Schottelius
h3. How to get the nextcloud credentials 
920 84 Nico Schottelius
921
* The initial username is set to "nextcloud"
922
* The password is autogenerated and saved in a kubernetes secret
923
924
<pre>
925 85 Nico Schottelius
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo "" 
926 84 Nico Schottelius
</pre>
927
928 83 Nico Schottelius
h3. How to fix "Access through untrusted domain"
929
930 82 Nico Schottelius
* Nextcloud stores the initial domain configuration
931 1 Nico Schottelius
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
932 82 Nico Schottelius
* To fix, edit /var/www/html/config/config.php and correct the domain
933 83 Nico Schottelius
* Then delete the pods
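
A sketch for deleting the pods; the label selector is an assumption based on the upstream nextcloud helm chart and needs to be adjusted to the actual release:

<pre>
kubectl delete pods -l app.kubernetes.io/name=nextcloud
</pre>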
934 82 Nico Schottelius
935 1 Nico Schottelius
h2. Infrastructure versions
936 35 Nico Schottelius
937 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v5 (2021-10)
938 1 Nico Schottelius
939 57 Nico Schottelius
Clusters are configured / setup in this order:
940
941
* Bootstrap via kubeadm
942 59 Nico Schottelius
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
943
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
944
** "rook for storage via argocd":https://rook.io/
945 58 Nico Schottelius
** haproxy for in IPv6-cluster-IPv4-to-IPv6 proxy via argocd
946
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
947
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
948
949 57 Nico Schottelius
950
h3. ungleich kubernetes infrastructure v4 (2021-09)
951
952 54 Nico Schottelius
* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
953 1 Nico Schottelius
* The rook operator is still being installed via helm
954 35 Nico Schottelius
955 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v3 (2021-07)
956 1 Nico Schottelius
957 10 Nico Schottelius
* rook is now installed via helm via argocd instead of directly via manifests
958 28 Nico Schottelius
959 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v2 (2021-05)
960 28 Nico Schottelius
961
* Replaced fluxv2 from ungleich k8s v1 with argocd
962 1 Nico Schottelius
** argocd can apply helm templates directly without needing to go through Chart releases
963 28 Nico Schottelius
* We are also using argoflow for build flows
964
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building
965
966 57 Nico Schottelius
h3. ungleich kubernetes infrastructure v1 (2021-01)
967 28 Nico Schottelius
968
We are using the following components:
969
970
* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
971
** Needed for basic networking
972
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
973
** Needed so that secrets are not stored in the git repository, but only in the cluster
974
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
975
** Needed to get letsencrypt certificates for services
976
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
977
** rbd for almost everything, *ReadWriteOnce*
978
** cephfs for smaller things, multi access *ReadWriteMany*
979
** Needed for providing persistent storage
980
* "flux v2":https://fluxcd.io/
981
** Needed to manage resources automatically