h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual

{{toc}}

h2. Status

This document is **pre-production**.

This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.

h2. k8s clusters

| Cluster | Purpose/Setup | Maintainer | Master(s) | argo | rook | v4 http proxy | last verified |
| c0.k8s.ooo | Dev | - | UNUSED | | | | 2021-10-05 |
| c1.k8s.ooo | Dev p6 VM | Nico | 2a0a-e5c0-2-11-0-62ff-fe0b-1a3d.k8s-1.place6.ungleich.ch | | | | 2021-10-05 |
| c2.k8s.ooo | Dev p7 HW | Nico | server47 server53 server54 | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo | x | | 2021-10-05 |
| c3.k8s.ooo | Test p7 PI | - | UNUSED | | | | 2021-10-05 |
| c4.k8s.ooo | Dev2 p7 HW | Fran/Jin-Guk | server52 server53 server54 | | | | - |
| c5.k8s.ooo | Dev p6 VM Amal | Nico/Amal | 2a0a-e5c0-2-11-0-62ff-fe0b-1a46.k8s-1.place6.ungleich.ch | | | | |
| c6.k8s.ooo | Dev p6 VM Jin-Guk | Jin-Guk | | | | | |
| [[p5.k8s.ooo]] | production | | server34 server36 server38 | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo | | - | |
| [[p6.k8s.ooo]] | production | | server67 server69 server71 | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo | x | 147.78.194.13 | 2021-10-05 |
| [[p10.k8s.ooo]] | production | | server63 server65 server83 | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo | x | 147.78.194.12 | 2021-10-05 |

h2. General architecture and components overview

* All k8s clusters are IPv6 only
* We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
* The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
** Private configurations are found in the **k8s-config** repository

h3. Cluster types

| **Type/Feature**            | **Development**                | **Production**         |
| Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
| Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
| Separation of control plane | optional                       | recommended            |
| Persistent storage          | required                       | required               |
| Number of storage monitors  | 3                              | 5                      |

h2. General k8s operations

h3. Cheat sheet / great external references

* "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/

h3. Allowing to schedule work on the control plane

* Mostly for single node / test / development clusters
* Just remove the master taint as follows

<pre>
kubectl taint nodes --all node-role.kubernetes.io/master-
</pre>
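
Note (assumption, not yet verified on our clusters): newer Kubernetes releases rename the taint to @node-role.kubernetes.io/control-plane@, so on such versions the equivalent would be:

<pre>
# Assumed equivalent on newer Kubernetes versions (control-plane taint)
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
</pre>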

h3. Get the cluster admin.conf

* On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
* To be able to administrate the cluster you can copy the admin.conf to your local machine
* Multi-cluster debugging becomes very easy if you name the config ~/cX-admin.conf (see the example below)

<pre>
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
% export KUBECONFIG=~/c2-admin.conf
% kubectl get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
server47   Ready                      control-plane,master   82d   v1.22.0
server48   Ready                      control-plane,master   82d   v1.22.0
server49   Ready                      <none>                 82d   v1.22.0
server50   Ready                      <none>                 82d   v1.22.0
server59   Ready                      control-plane,master   82d   v1.22.0
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
server61   Ready                      <none>                 82d   v1.22.0
server62   Ready                      <none>                 82d   v1.22.0
</pre>

h3. Installing a new k8s cluster

* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
** Using pXX.k8s.ooo for production clusters of placeXX
* Use cdist to configure the nodes with requirements like crio
* Decide between single or multi node control plane setups (see below)
** A single control plane is suitable for development clusters

Typical init procedure (a sketch of a kubeadm.yaml follows below):

* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
* Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@
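
The referenced @bootstrap/XXX/kubeadm.yaml@ is not reproduced here. A minimal sketch of what such a config could look like, assuming kubeadm's @v1beta3@ API and crio as the container runtime; all values below are placeholders, not our real configuration:

<pre>
---
# Hypothetical minimal kubeadm config (placeholder values only)
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/crio/crio.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
clusterName: cX.k8s.ooo
controlPlaneEndpoint: "cX-api.k8s.ooo:6443"
networking:
  # IPv6-only cluster: placeholder CIDRs, replace with the real podcidr/servicecidr
  podSubnet: 2001:db8:1::/56
  serviceSubnet: 2001:db8:2::/108
</pre>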

h3. Deleting a pod that is hanging in terminating state

<pre>
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
</pre>

(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)

h3. Listing nodes of a cluster

<pre>
[15:05] bridge:~% kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
server22   Ready    <none>                 52d   v1.22.0
server23   Ready    <none>                 52d   v1.22.2
server24   Ready    <none>                 52d   v1.22.0
server25   Ready    <none>                 52d   v1.22.0
server26   Ready    <none>                 52d   v1.22.0
server27   Ready    <none>                 52d   v1.22.0
server63   Ready    control-plane,master   52d   v1.22.0
server64   Ready    <none>                 52d   v1.22.0
server65   Ready    control-plane,master   52d   v1.22.0
server66   Ready    <none>                 52d   v1.22.0
server83   Ready    control-plane,master   52d   v1.22.0
server84   Ready    <none>                 52d   v1.22.0
server85   Ready    <none>                 52d   v1.22.0
server86   Ready    <none>                 52d   v1.22.0
</pre>

h3. Removing / draining a node

Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:

<pre>
kubectl drain --delete-emptydir-data --ignore-daemonsets server23
</pre>

h3. Re-adding a node after draining

<pre>
kubectl uncordon serverXX
</pre>

h3. (Re-)joining worker nodes after creating the cluster

* We need to have an up-to-date token
* We use different join commands for the workers and control plane nodes

Generating the join command on an existing control plane node:

<pre>
kubeadm token create --print-join-command
</pre>
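
The printed command is then run as-is on the worker node that should (re-)join. A sketch, with token and hash as placeholders:

<pre>
# On the (re-)joining worker node, using the output of the command above
kubeadm join p10-api.k8s.ooo:6443 --token TOKEN --discovery-token-ca-cert-hash sha256:HASH
</pre>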

h3. (Re-)joining control plane nodes after creating the cluster

* We generate the token again
* We upload the certificates
* We need to combine/create the join command for the control plane node

Example session:

<pre>
% kubeadm token create --print-join-command
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash

% kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
CERTKEY

# Then we use these two outputs on the joining node:

kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
</pre>

Commands to be used on a control plane node:

<pre>
kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs
</pre>

Commands to be used on the joining node:

<pre>
JOINCOMMAND --control-plane --certificate-key CERTKEY
</pre>

SEE ALSO

* https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
* https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/

h3. How to fix etcd does not start when rejoining a kubernetes cluster as a control plane

If during the above step etcd does not come up, @kubeadm join@ can hang as follows:

<pre>
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
</pre>

In that case the problem is likely that the etcd server of the rejoining node is still registered as a member of the etcd cluster. We first need to remove it from the etcd cluster; then the join works.

To fix this we do:

* Find a working etcd pod
* Find the etcd members / member list
* Remove the etcd member that we want to re-join the cluster

<pre>
# Find the etcd pods
kubectl -n kube-system get pods -l component=etcd,tier=control-plane

# Get the list of etcd servers with the member id
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# Remove the member
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
</pre>

Sample session:

<pre>
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
NAME            READY   STATUS    RESTARTS     AGE
etcd-server63   1/1     Running   0            3m11s
etcd-server65   1/1     Running   3            7d2h
etcd-server83   1/1     Running   8 (6d ago)   7d2h
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
</pre>

SEE ALSO

* We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster

h2. Calico CNI

h3. Calico Installation

* We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* This has the following advantages:
** Easy to upgrade
** Does not require the OS to configure IPv6/dual-stack settings, as the tigera operator figures things out on its own

Usually plain calico can be installed directly using:

<pre>
helm repo add projectcalico https://docs.projectcalico.org/charts
helm install calico projectcalico/tigera-operator --version v3.20.4
</pre>

* Check the tags on https://github.com/projectcalico/calico/tags for the latest release

h3. Installing calicoctl

To be able to manage and configure calico, we need to "install calicoctl (we choose the variant that runs as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod

<pre>
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
</pre>

Or version specific:

<pre>
kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml
</pre>

And making it more easily accessible via an alias:

<pre>
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
</pre>

h3. Calico configuration

By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp with an upstream router to propagate the podcidr and servicecidr.

Default settings in our infrastructure:

* We use a full mesh, enabled via the @nodeToNodeMeshEnabled: true@ option
* We keep the original next hop so that *only* the server with the pod is announcing it (instead of ECMP)
* We use private ASNs for k8s clusters
* We do *not* use any overlay

After installing calico and calicoctl the last step of the installation is usually:

<pre>
calicoctl create -f - < calico-bgp.yaml
</pre>

A sample BGP configuration:

<pre>
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65534
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:10:3::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:10:3::/108
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: router1-place10
spec:
  peerIP: 2a0a:e5c0:10:1::50
  asNumber: 213081
  keepOriginalNextHop: true
</pre>
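
To verify that the configuration was applied and that the BGP peers are configured, the following calicoctl commands can be used (a quick check, assuming the @calicoctl@ alias from above):

<pre>
# Show the applied BGP configuration and peers
calicoctl get bgpConfiguration default -o yaml
calicoctl get bgpPeer

# Show the BGP session status of a node; this reads the local bird status,
# so it may need to be run with calicoctl directly on the node instead of via the pod alias
calicoctl node status
</pre>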

h2. ArgoCD / ArgoWorkFlow

h3. Argocd Installation

As there is no configuration management present yet, argocd is installed using:

<pre>
kubectl create namespace argocd

# Version 2.2.3
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.2.3/manifests/install.yaml

# OR: latest stable
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
</pre>

* See https://argo-cd.readthedocs.io/en/stable/

h3. Get the argocd credentials

<pre>
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
</pre>

h3. Accessing argocd

In regular IPv6 clusters:

* Navigate to https://argocd-server.argocd.CLUSTERDOMAIN

In legacy IPv4 clusters:

<pre>
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
</pre>

* Navigate to https://localhost:8080

h3. Using the argocd webhook to trigger changes

* To trigger changes, post JSON to https://argocd.example.com/api/webhook (see the sketch below)
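
Normally the webhook is configured on the gitea side, pointing at this URL. A hedged sketch of triggering it manually with curl, assuming argocd accepts a GitHub-style push payload; the hostname, repository URL and branch are placeholders:

<pre>
# Hypothetical manual webhook trigger (placeholder values)
curl -X POST https://argocd.example.com/api/webhook \
  -H "Content-Type: application/json" \
  -H "X-GitHub-Event: push" \
  -d '{"ref": "refs/heads/master", "repository": {"html_url": "https://code.ungleich.ch/ungleich-intern/k8s-config"}}'
</pre>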

h3. Deploying an application

* Applications are deployed by pushing to gitea (code.ungleich.ch) via git and are then pulled by argo
* Always include the *redmine-url* pointing to the (customer) ticket
** Also add the support-url if it exists

Application sample:

<pre>
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitea-CUSTOMER
  namespace: argocd
spec:
  destination:
    namespace: default
    server: 'https://kubernetes.default.svc'
  source:
    path: apps/prod/gitea
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
    targetRevision: HEAD
    helm:
      parameters:
        - name: storage.data.storageClass
          value: rook-ceph-block-hdd
        - name: storage.data.size
          value: 200Gi
        - name: storage.db.storageClass
          value: rook-ceph-block-ssd
        - name: storage.db.size
          value: 10Gi
        - name: storage.letsencrypt.storageClass
          value: rook-ceph-block-hdd
        - name: storage.letsencrypt.size
          value: 50Mi
        - name: letsencryptStaging
          value: 'no'
        - name: fqdn
          value: 'code.verua.online'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  info:
    - name: 'redmine-url'
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
    - name: 'support-url'
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
</pre>

h2. Helm related operations and conventions

We use helm charts extensively.

* In production, they are managed via argocd
* In development, helm charts can be developed and deployed manually using the helm utility

h3. Installing a helm chart

One can use the usual pattern of

<pre>
helm install <releasename> <chartdirectory>
</pre>

However, when testing helm charts you often want to reinstall/update. The following pattern is "better", because it also works if the release is already installed:

<pre>
helm upgrade --install <releasename> <chartdirectory>
</pre>

h3. Naming services and deployments in helm charts [Application labels]

* We always have {{ .Release.Name }} to identify the current "instance"
* Deployments:
** use @app: <what it is>@, e.g. @app: nginx@, @app: postgres@, ... (see the sketch below)
* See more about standard labels on
** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
** https://helm.sh/docs/chart_best_practices/labels/
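
A minimal sketch of how these conventions could look in a chart template; the names, image and ports are hypothetical placeholders, not an existing chart of ours:

<pre>
# templates/deployment.yaml (sketch, placeholder values)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
      release: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: nginx
        release: {{ .Release.Name }}
    spec:
      containers:
        - name: nginx
          image: nginx:1.21
          ports:
            - containerPort: 80
</pre>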

h2. Rook / Ceph Related Operations

h3. Executing ceph commands

Using the ceph-tools pod as follows:

<pre>
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
</pre>

h3. Inspecting the logs of a specific server

<pre>
# Get the related pods
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
...

# Inspect the logs of a specific pod
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
</pre>

h3. Inspecting the logs of the rook-ceph-operator

<pre>
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
</pre>

h3. Triggering server prepare / adding new osds

The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re-scan", simply delete that pod:

<pre>
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
</pre>

This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.

h3. Removing an OSD

* See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml (see the sketch below)
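
A hedged sketch of the workflow based on the upstream documentation linked above; the OSD id is a placeholder and the details should always be double checked against the rook docs for the running version:

<pre>
# Stop the OSD deployment of the OSD to be removed (placeholder id 7)
kubectl -n rook-ceph scale deployment rook-ceph-osd-7 --replicas=0

# Download osd-purge.yaml, set the OSD id(s) in the job spec, then run it
kubectl -n rook-ceph apply -f osd-purge.yaml

# Follow the purge job (job name as defined in the upstream osd-purge.yaml)
kubectl -n rook-ceph logs -f job/rook-ceph-purge-osd
</pre>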

h2. Harbor

* We use "Harbor":https://goharbor.io/ for caching and as an image registry. Internal app reference: apps/prod/harbor.
* The admin password is in the password store, auto generated per cluster
* At the moment harbor only authenticates against the internal ldap tree

h3. LDAP configuration

* The url needs to be ldaps://...
* The UID attribute is @uid@
* The rest of the settings are standard

h2. Monitoring / Prometheus

* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/

Access via the in-cluster service URLs (or via port-forward, see the sketch below):

* http://prometheus-k8s.monitoring.svc:9090
* http://grafana.monitoring.svc:3000
* http://alertmanager.monitoring.svc:9093
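
From outside the cluster the UIs can be reached via port-forwarding. A sketch assuming the standard kube-prometheus service names in the @monitoring@ namespace; depending on the kube-prometheus version the alertmanager service may be called @alertmanager-main@:

<pre>
kubectl -n monitoring port-forward svc/prometheus-k8s 9090:9090
kubectl -n monitoring port-forward svc/grafana 3000:3000
kubectl -n monitoring port-forward svc/alertmanager-main 9093:9093
</pre>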

h2. Nextcloud

h3. How to get the nextcloud credentials

* The initial username is set to "nextcloud"
* The password is autogenerated and saved in a kubernetes secret

<pre>
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo ""
</pre>

h3. How to fix "Access through untrusted domain"

* Nextcloud stores the initial domain configuration
* If the FQDN is changed, it will show the error message "Access through untrusted domain"
* To fix, edit /var/www/html/config/config.php and correct the domain (see the sketch below)
* Then delete the pods
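
A sketch of the relevant part of @config.php@ and of restarting nextcloud afterwards; the FQDN, namespace and pod label are placeholders and may differ per release:

<pre>
# Hypothetical excerpt of /var/www/html/config/config.php
'trusted_domains' =>
array (
  0 => 'nextcloud.example.com',
),

# Afterwards delete the pods so they restart with the corrected config, e.g.:
kubectl -n NAMESPACE delete pods -l app.kubernetes.io/name=nextcloud
</pre>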

h2. Infrastructure versions

h3. ungleich kubernetes infrastructure v5 (2021-10)

Clusters are configured / set up in this order:

* Bootstrap via kubeadm
* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
* "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
** "rook for storage via argocd":https://rook.io/
** haproxy as an IPv4-to-IPv6 proxy for IPv6-only clusters, via argocd
** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot

h3. ungleich kubernetes infrastructure v4 (2021-09)

* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
* The rook operator is still being installed via helm

h3. ungleich kubernetes infrastructure v3 (2021-07)

* rook is now installed via helm via argocd instead of directly via manifests

h3. ungleich kubernetes infrastructure v2 (2021-05)

* Replaced fluxv2 from ungleich k8s v1 with argocd
** argocd can apply helm templates directly without needing to go through Chart releases
* We are also using argoflow for build flows
* Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building

h3. ungleich kubernetes infrastructure v1 (2021-01)

We are using the following components:

* "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
** Needed for basic networking
* "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
** Needed so that secrets are not stored in the git repository, but only in the cluster
* "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
** Needed to get letsencrypt certificates for services
* "rook with ceph rbd + cephfs":https://rook.io/ for storage
** rbd for almost everything, *ReadWriteOnce*
** cephfs for smaller things, multi access *ReadWriteMany*
** Needed for providing persistent storage
* "flux v2":https://fluxcd.io/
** Needed to manage resources automatically