
h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual

{{toc}}

h2. Status

This document is **pre-production**.
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.

h2. k8s clusters

| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
| c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
| c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
| c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
| c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
| c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
| c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
| [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
| [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
| [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
| [[p10.k8s.ooo]]    | production        |            | server63 server65 server83    | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
| [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
| [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
| [[server121.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |

h2. General architecture and components overview

* All k8s clusters are IPv6 only
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
** Private configurations are found in the **k8s-config** repository

h3. Cluster types

| **Type/Feature**            | **Development**                | **Production**         |
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
| Separation of control plane | optional                       | recommended            |
| Persistent storage          | required                       | required               |
| Number of storage monitors  | 3                              | 5                      |

h2. General k8s operations

h3. Cheat sheet / useful external references

* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/

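
A few plain kubectl commands that come up again and again in this manual (nothing ungleich specific, listed here for convenience):

<pre>
# Show all pods in all namespaces, including the node they run on
kubectl get pods -A -o wide

# Recent events, oldest first (useful when pods do not start)
kubectl get events -A --sort-by=.metadata.creationTimestamp

# Describe a node, including taints and allocated resources
kubectl describe node serverXX
</pre>
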

h3. Allowing to schedule work on the control plane / removing node taints

* Mostly for single node / test / development clusters
* Just remove the master taint as follows

<pre>
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</pre>

You can check the node taints using @kubectl describe node ...@

h3. Get the cluster admin.conf

* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
* To be able to administrate the cluster you can copy the admin.conf to your local machine
* Multi-cluster debugging becomes much easier if you name the config ~/cX-admin.conf (see the example below)

<pre>
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
% export KUBECONFIG=~/c2-admin.conf
% kubectl get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
server47   Ready                      control-plane,master   82d   v1.22.0
server48   Ready                      control-plane,master   82d   v1.22.0
server49   Ready                      <none>                 82d   v1.22.0
server50   Ready                      <none>                 82d   v1.22.0
server59   Ready                      control-plane,master   82d   v1.22.0
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
server61   Ready                      <none>                 82d   v1.22.0
server62   Ready                      <none>                 82d   v1.22.0
</pre>


h3. Installing a new k8s cluster

* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
** Use pXX.k8s.ooo for production clusters of placeXX
* Use cdist to configure the nodes with requirements like crio
* Decide between single or multi node control plane setups (see below)
** A single control plane is suitable for development clusters

Typical init procedure:

* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@

h3. Deleting a pod that is hanging in terminating state

<pre>
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
</pre>

(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)


h3. Listing nodes of a cluster

<pre>
[15:05] bridge:~% kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
server22   Ready    <none>                 52d   v1.22.0
server23   Ready    <none>                 52d   v1.22.2
server24   Ready    <none>                 52d   v1.22.0
server25   Ready    <none>                 52d   v1.22.0
server26   Ready    <none>                 52d   v1.22.0
server27   Ready    <none>                 52d   v1.22.0
server63   Ready    control-plane,master   52d   v1.22.0
server64   Ready    <none>                 52d   v1.22.0
server65   Ready    control-plane,master   52d   v1.22.0
server66   Ready    <none>                 52d   v1.22.0
server83   Ready    control-plane,master   52d   v1.22.0
server84   Ready    <none>                 52d   v1.22.0
server85   Ready    <none>                 52d   v1.22.0
server86   Ready    <none>                 52d   v1.22.0
</pre>

h3. Removing / draining a node

Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:

<pre>
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
</pre>

h3. Re-adding a node after draining

<pre>
kubectl uncordon serverXX
</pre>


h3. (Re-)joining worker nodes after creating the cluster

* We need to have an up-to-date token
* We use different join commands for the workers and control plane nodes

Generating the join command on an existing control plane node:

<pre>
kubeadm token create --print-join-command
</pre>

h3. (Re-)joining control plane nodes after creating the cluster

* We generate the token again
* We upload the certificates
* We need to combine/create the join command for the control plane node

Example session:

<pre>
% kubeadm token create --print-join-command
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash

% kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
CERTKEY

# Then we use these two outputs on the joining node:

kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
</pre>

Commands to be used on a control plane node:

<pre>
kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs
</pre>

Commands to be used on the joining node:

<pre>
JOINCOMMAND --control-plane --certificate-key CERTKEY
</pre>

SEE ALSO

* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/


h3. How to fix etcd not starting when rejoining a kubernetes cluster as a control plane node

If during the above step etcd does not come up, @kubeadm join@ can hang as follows:

<pre>
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
</pre>

Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster, then the join works.

To fix this we do:

* Find a working etcd pod
* Find the etcd members / member list
* Remove the etcd member that we want to re-join to the cluster

<pre>
# Find the etcd pods
kubectl -n kube-system get pods -l component=etcd,tier=control-plane

# Get the list of etcd servers with the member id
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# Remove the member
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
</pre>

Sample session:

<pre>
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
NAME            READY   STATUS    RESTARTS     AGE
etcd-server63   1/1     Running   0            3m11s
etcd-server65   1/1     Running   3            7d2h
etcd-server83   1/1     Running   8 (6d ago)   7d2h
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
</pre>

SEE ALSO

* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster


h3. Hardware Maintenance using ungleich-hardware

Use the following manifest and replace HOST with the actual host:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-HOST
spec:
  containers:
  - name: ungleich-hardware
    image: ungleich/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "HOST"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>

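
Once the pod is running, enter it to run the hardware tooling; a minimal sketch (the manifest file name and the shell inside the image are assumptions):

<pre>
# Apply the manifest (file name is illustrative) and enter the container
kubectl apply -f ungleich-hardware-serverXX.yaml
kubectl exec -ti ungleich-hardware-serverXX -- /bin/sh

# When the maintenance is done, remove the pod again
kubectl delete pod ungleich-hardware-serverXX
</pre>
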

Also see: [[The_ungleich_hardware_maintenance_guide]]

h3. Triggering a cronjob / creating a job from a cronjob

To test a cronjob, we can create a job from a cronjob:

<pre>
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
</pre>

This creates a job @volume2-manual@ based on the cronjob @volume2-daily-backup@.

h3. su-ing into a user that has nologin shell set

Users often have nologin set as their shell inside the container. To be able to execute maintenance commands within the container, we can use @su -s /bin/sh@ like this:

<pre>
su -s /bin/sh -c '/path/to/your/script' testuser
</pre>

Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell

h3. How to print a secret value

Assuming you want the "password" item from a secret, use:

<pre>
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>


h2. Calico CNI

h3. Calico Installation

* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* This has the following advantages:
** Easy to upgrade
** Does not require us to configure IPv6/dual-stack settings, as the tigera operator figures things out on its own

Usually plain calico can be installed directly using:

<pre>
VERSION=v3.23.3
helm repo add projectcalico https://docs.projectcalico.org/charts
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
</pre>

* Check the tags on https://github.com/projectcalico/calico/tags for the latest release

h3. Installing calicoctl

* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install

To be able to manage and configure calico, we need to
"install calicoctl (we chose to run it as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod

<pre>
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
</pre>

Or version specific:

<pre>
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml

# For 3.22
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
</pre>

And make it easier to access via an alias:

<pre>
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
</pre>


h3. Calico configuration

By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
with an upstream router to propagate podcidr and servicecidr.

Default settings in our infrastructure:

* We use a full mesh using the @nodeToNodeMeshEnabled: true@ option
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
* We use private ASNs for k8s clusters
* We do *not* use any overlay

After installing calico and calicoctl the last step of the installation is usually:

<pre>
calicoctl create -f - < calico-bgp.yaml
</pre>

A sample BGP configuration:

<pre>
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65534
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:10:3::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:10:3::/108
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: router1-place10
spec:
  peerIP: 2a0a:e5c0:10:1::50
  asNumber: 213081
  keepOriginalNextHop: true
</pre>


h2. Cilium CNI (experimental)

h3. Status

*NO WORKING CILIUM CONFIGURATION FOR IPv6-ONLY MODE*

h3. Latest error

It seems cilium does not run on IPv6-only hosts:

<pre>
level=info msg="Validating configured node address ranges" subsys=daemon
level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
level=info msg="Starting IP identity watcher" subsys=ipcache
</pre>

It crashes after that log entry.

h3. BGP configuration

* The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
* Creating the bgp config beforehand as a configmap is thus required.

The error one gets without the configmap present:

Pods are hanging with:

<pre>
cilium-bpqm6                       0/1     Init:0/4            0             9s
cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
</pre>

The error message in the cilium-operator is:

<pre>
Events:
  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
  Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
</pre>

A correct bgp config looks like this:

<pre>
apiVersion: v1
kind: ConfigMap
metadata:
  name: bgp-config
  namespace: kube-system
data:
  config.yaml: |
    peers:
      - peer-address: 2a0a:e5c0::46
        peer-asn: 209898
        my-asn: 65533
      - peer-address: 2a0a:e5c0::47
        peer-asn: 209898
        my-asn: 65533
    address-pools:
      - name: default
        protocol: bgp
        addresses:
          - 2a0a:e5c0:0:14::/64
</pre>


h3. Installation

Adding the repo:

<pre>
helm repo add cilium https://helm.cilium.io/
helm repo update
</pre>

Installing + configuring cilium:

<pre>
ipv6pool=2a0a:e5c0:0:14::/112

version=1.12.2

helm upgrade --install cilium cilium/cilium --version $version \
  --namespace kube-system \
  --set ipv4.enabled=false \
  --set ipv6.enabled=true \
  --set enableIPv6Masquerade=false \
  --set bgpControlPlane.enabled=true

#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool

# Old style bgp?
#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \

# Show possible configuration options
helm show values cilium/cilium
</pre>

Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:

<pre>
level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
</pre>

See also https://github.com/cilium/cilium/issues/20756

A /112, however, seems to work.

h3. Kernel modules

Cilium requires the following modules to be loaded on the host (not loaded by default):

<pre>
modprobe ip6table_raw
modprobe ip6table_filter
</pre>

h3. Interesting helm flags

* autoDirectNodeRoutes
* bgpControlPlane.enabled = true

h3. SEE ALSO

* https://docs.cilium.io/en/v1.12/helm-reference/


h2. ArgoCD

h3. Argocd Installation

* See https://argo-cd.readthedocs.io/en/stable/

As there is no configuration management present yet, argocd is installed using:

<pre>
kubectl create namespace argocd

# Specific Version
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml

# OR: latest stable
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
</pre>

h3. Get the argocd credentials

<pre>
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h3. Accessing argocd

In regular IPv6 clusters:

* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN

In legacy IPv4 clusters:

<pre>
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
</pre>

* Navigate to https://localhost:8080

h3. Using the argocd webhook to trigger changes

* To trigger changes, POST a JSON payload to https://argocd.example.com/api/webhook

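
A minimal sketch of such a trigger, assuming a GitHub-style push event payload (argocd also understands other git providers); the URL and repository values are illustrative:

<pre>
curl -X POST https://argocd.example.com/api/webhook \
  -H 'Content-Type: application/json' \
  -H 'X-GitHub-Event: push' \
  -d '{"ref": "refs/heads/master",
       "repository": {"html_url": "https://code.ungleich.ch/ungleich-intern/k8s-config",
                      "default_branch": "master"}}'
</pre>
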

h3. Deploying an application

* Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
* Always include the *redmine-url* pointing to the (customer) ticket
** Also add the support-url if it exists

Application sample:

<pre>
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitea-CUSTOMER
  namespace: argocd
spec:
  destination:
    namespace: default
    server: 'https://kubernetes.default.svc'
  source:
    path: apps/prod/gitea
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
    targetRevision: HEAD
    helm:
      parameters:
        - name: storage.data.storageClass
          value: rook-ceph-block-hdd
        - name: storage.data.size
          value: 200Gi
        - name: storage.db.storageClass
          value: rook-ceph-block-ssd
        - name: storage.db.size
          value: 10Gi
        - name: storage.letsencrypt.storageClass
          value: rook-ceph-block-hdd
        - name: storage.letsencrypt.size
          value: 50Mi
        - name: letsencryptStaging
          value: 'no'
        - name: fqdn
          value: 'code.verua.online'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  info:
    - name: 'redmine-url'
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
    - name: 'support-url'
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
</pre>


h2. Helm related operations and conventions

We use helm charts extensively.

* In production, they are managed via argocd
* In development, helm charts can be developed and deployed manually using the helm utility.

h3. Installing a helm chart

One can use the usual pattern of

<pre>
helm install <releasename> <chartdirectory>
</pre>

However, when testing helm charts you often want to reinstall/update. The following pattern is "better", because it also works if the release is already installed:

<pre>
helm upgrade --install <releasename> <chartdirectory>
</pre>

h3. Naming services and deployments in helm charts [Application labels]

* We always have {{ .Release.Name }} to identify the current "instance"
* Deployments:
** use @app: <what it is>@, e.g. @app: nginx@, @app: postgres@, ...
* See more about standard labels on
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
** https://helm.sh/docs/chart_best_practices/labels/


h2. Rook + Ceph

h3. Installation

* Usually directly via argocd

Manual steps:

<pre>

</pre>

h3. Executing ceph commands

Using the ceph-tools pod as follows:

<pre>
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
</pre>

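
For frequent use it can help to wrap this in an alias, similar to the calicoctl alias above; a sketch, assuming the standard @rook-ceph-tools@ deployment name:

<pre>
alias rookceph="kubectl exec -i -n rook-ceph deploy/rook-ceph-tools -- ceph"

# Example usage
rookceph -s
rookceph osd tree
</pre>
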

h3. Inspecting the logs of a specific server

<pre>
# Get the related pods
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
...

# Inspect the logs of a specific pod
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
</pre>

h3. Inspecting the logs of the rook-ceph-operator

<pre>
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
</pre>

h3. Restarting the rook operator

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

h3. Triggering server prepare / adding new osds

The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re-scan", simply delete that pod:

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.

h3. Removing an OSD

* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
* Then delete the related deployment


Set the OSD id in the osd-purge.yaml and apply it. The OSD should be down beforehand.

<pre>
apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-purge-osd
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-purge-osd
spec:
  template:
    metadata:
      labels:
        app: rook-ceph-purge-osd
    spec:
      serviceAccountName: rook-ceph-purge-osd
      containers:
        - name: osd-removal
          image: rook/ceph:master
          # TODO: Insert the OSD ID in the last parameter that is to be removed
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
          #
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
          # removal could lead to data loss.
          args:
            - "ceph"
            - "osd"
            - "remove"
            - "--preserve-pvc"
            - "false"
            - "--force-osd-removal"
            - "false"
            - "--osd-ids"
            - "SETTHEOSDIDHERE"
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_MON_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  key: data
                  name: rook-ceph-mon-endpoints
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  key: ceph-username
                  name: rook-ceph-mon
            - name: ROOK_CEPH_SECRET
              valueFrom:
                secretKeyRef:
                  key: ceph-secret
                  name: rook-ceph-mon
            - name: ROOK_CONFIG_DIR
              value: /var/lib/rook
            - name: ROOK_CEPH_CONFIG_OVERRIDE
              value: /etc/rook/config/override.conf
            - name: ROOK_FSID
              valueFrom:
                secretKeyRef:
                  key: fsid
                  name: rook-ceph-mon
            - name: ROOK_LOG_LEVEL
              value: DEBUG
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-conf-emptydir
            - mountPath: /var/lib/rook
              name: rook-config
      volumes:
        - emptyDir: {}
          name: ceph-conf-emptydir
        - emptyDir: {}
          name: rook-config
      restartPolicy: Never
</pre>

Deleting the deployment:

<pre>
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted
</pre>


h2. Ingress + Cert Manager

* We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
* We deploy "cert-manager":https://cert-manager.io/ to handle certificates
* We deploy the @ClusterIssuer@ independently, so that the cert-manager app can deploy first and the issuer is created once the CRDs from cert-manager are in place

h3. IPv4 reachability

The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.

Steps:

h4. Get the ingress IPv6 address

Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@

Example:

<pre>
kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
2a0a:e5c0:10:1b::ce11
</pre>

h4. Add NAT64 mapping

* Update the __dcl_jool_siit cdist type (the resulting mapping looks like the sketch below)
* Record the two IPs (IPv6 and IPv4)
* Configure all routers

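
What the cdist type ends up configuring on the routers is, in essence, an EAMT entry that maps the ingress IPv6 address to a public IPv4 address. A hand-run sketch only, using the addresses from the examples on this page; the actual configuration is always done via cdist:

<pre>
# Illustrative manual equivalent of the cdist-managed configuration
jool_siit eamt add 2a0a:e5c0:10:1b::ce11 147.78.194.23
jool_siit eamt display
</pre>
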

h4. Add DNS record

To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:

<pre>
; k8s ingress for dev
dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
dev-ingress                 A 147.78.194.23
</pre>

h4. Add supporting wildcard DNS

If you plan to add various sites under a specific domain, you can add a wildcard DNS entry, such as *.k8s-dev.django-hosting.ch:

<pre>
*.k8s-dev         CNAME dev-ingress.ungleich.ch.
</pre>

h2. Harbor

* We use "Harbor":https://goharbor.io/ for caching and as an image registry. Internal app reference: apps/prod/harbor.
* The admin password is in the password store, auto-generated per cluster
* At the moment harbor only authenticates against the internal ldap tree

h3. LDAP configuration

* The url needs to be ldaps://...
* uid = uid
* the rest is standard


h2. Monitoring / Prometheus

* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/

Access via:

* http://prometheus-k8s.monitoring.svc:9090
* http://grafana.monitoring.svc:3000
* http://alertmanager.monitoring.svc:9093

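
These URLs only resolve inside the cluster. From a workstation, a quick way in is port-forwarding the services; a sketch, assuming the service names from the URLs above:

<pre>
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090 &
kubectl -n monitoring port-forward svc/grafana 3000:3000 &
kubectl -n monitoring port-forward svc/alertmanager 9093:9093 &
</pre>
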

h3. Prometheus Options

* "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
** Includes dashboards and co.
* "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
** Includes dashboards and co.
* "Prometheus Operator (mainly CRD manifests)":https://github.com/prometheus-operator/prometheus-operator


h2. Nextcloud

h3. How to get the nextcloud credentials

* The initial username is set to "nextcloud"
* The password is autogenerated and saved in a kubernetes secret

<pre>
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo ""
</pre>

h3. How to fix "Access through untrusted domain"

* Nextcloud stores the initial domain configuration
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
* To fix, edit /var/www/html/config/config.php and correct the domain
* Then delete the pods (see the sketch below)

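
A sketch of these two steps from the command line; the deployment name, label selector and domains are assumptions based on the standard nextcloud helm chart, adjust them to your release:

<pre>
# Fix the trusted domain in place (old/new domains are placeholders)
kubectl exec -ti deploy/RELEASENAME-nextcloud -- \
  sed -i 's/old.example.com/new.example.com/g' /var/www/html/config/config.php

# Recreate the pods so all replicas pick up the change
kubectl delete pods -l app.kubernetes.io/name=nextcloud
</pre>
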

h2. Infrastructure versions

h3. ungleich kubernetes infrastructure v5 (2021-10)

Clusters are configured / set up in this order:

* Bootstrap via kubeadm
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
** "rook for storage via argocd":https://rook.io/
** haproxy as an IPv4-to-IPv6 proxy into the IPv6-only cluster, via argocd
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot

h3. ungleich kubernetes infrastructure v4 (2021-09)

* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
* The rook operator is still being installed via helm

h3. ungleich kubernetes infrastructure v3 (2021-07)

* rook is now installed via helm via argocd instead of directly via manifests

h3. ungleich kubernetes infrastructure v2 (2021-05)

* Replaced fluxv2 from ungleich k8s v1 with argocd
** argocd can apply helm templates directly without needing to go through Chart releases
* We are also using argoflow for build flows
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building

h3. ungleich kubernetes infrastructure v1 (2021-01)

We are using the following components:

* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
** Needed for basic networking
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
** Needed so that secrets are not stored in the git repository, but only in the cluster
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
** Needed to get letsencrypt certificates for services
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
** rbd for almost everything, *ReadWriteOnce*
** cephfs for smaller things, multi access *ReadWriteMany*
** Needed for providing persistent storage
* "flux v2":https://fluxcd.io/
** Needed to manage resources automatically
