Version 174 - History - The ungleich kubernetes infrastructure - Open Infrastructure - ungleich redmine

Nico Schottelius, 02/12/2023 09:16 PM

+h1. The ungleich kubernetes infrastructure and ungleich kubernetes manual
+{{toc}}
+h2. Status
+This document is **pre-production**.
 This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.
+h2. k8s clusters
+| Cluster            | Purpose/Setup     | Maintainer | Master(s)                     | argo                                                   | v4 http proxy | last verified |
 | c0.k8s.ooo         | Dev               | -          | UNUSED                        |                                                        |               |    2021-10-05 |
 | c1.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
 | c2.k8s.ooo         | Dev p7 HW         | Nico       | server47 server53 server54    | "argo":https://argocd-server.argocd.svc.c2.k8s.ooo     |               |    2021-10-05 |
 | c3.k8s.ooo         | retired           | -          | -                             |                                                        |               |    2021-10-05 |
 | c4.k8s.ooo         | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54    |                                                        |               |             - |
 | c5.k8s.ooo         | retired           |            | -                             |                                                        |               |    2022-03-15 |
 | c6.k8s.ooo         | Dev p6 VM Jin-Guk | Jin-Guk    |                               |                                                        |               |               |
 | [[p5.k8s.ooo]]     | production        |            | server34 server36 server38    | "argo":https://argocd-server.argocd.svc.p5.k8s.ooo     | -             |               |
 | [[p5-cow.k8s.ooo]] | production        | Nico       | server47 server51 server55    | "argo":https://argocd-server.argocd.svc.p5-cow.k8s.ooo |               |    2022-08-27 |
 | [[p6.k8s.ooo]]     | production        |            | server67 server69 server71    | "argo":https://argocd-server.argocd.svc.p6.k8s.ooo     | 147.78.194.13 |    2021-10-05 |
 | [[p10.k8s.ooo]]    | production        |            | server63 server65 server83    | "argo":https://argocd-server.argocd.svc.p10.k8s.ooo    | 147.78.194.12 |    2021-10-05 |
 | [[k8s.ge.nau.so]]  | development       |            | server107 server108 server109 | "argo":https://argocd-server.argocd.svc.k8s.ge.nau.so  |               |               |
 | [[dev.k8s.ooo]]    | development       |            | server110 server111 server112 | "argo":https://argocd-server.argocd.svc.dev.k8s.ooo    | -             |    2022-07-08 |
+| [[r1r2p15k8sooo|r1.p15.k8s.ooo]] | production | Nico | server120 | | | 2022-10-30 |
 | [[r1r2p15k8sooo|r2.p15.k8s.ooo]] | production | Nico | server121 | | | 2022-09-06 |
+| [[r1r2p10k8sooo|r1.p10.k8s.ooo]] | production | Nico | server122 | | | 2022-10-30 |
 | [[r1r2p10k8sooo|r2.p10.k8s.ooo]] | production | Nico | server123 | | | 2022-10-15 |
 | [[r1r2p5k8sooo|r1.p5.k8s.ooo]] | production | Nico | server137 | | | 2022-10-30 |
 | [[r1r2p5k8sooo|r2.p5.k8s.ooo]] | production | Nico | server138 | | | 2022-10-30 |
 | [[r1r2p6k8sooo|r1.p6.k8s.ooo]] | production | Nico | server139 | | | 2022-10-30 |
 | [[r1r2p6k8sooo|r2.p6.k8s.ooo]] | production | Nico | server140 | | | 2022-10-30 |
+h2. General architecture and components overview
 * All k8s clusters are IPv6 only
 * We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
 * The main public testing repository is "ungleich-k8s":https://code.ungleich.ch/ungleich-public/ungleich-k8s
+** Private configurations are found in the **k8s-config** repository
 h3. Cluster types
+| **Type/Feature**            | **Development**                | **Production**         |
 | Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker) |
 | Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
 | Separation of control plane | optional                       | recommended            |
 | Persistent storage          | required                       | required               |
 | Number of storage monitors  | 3                              | 5                      |
+h2. General k8s operations
+h3. Cheat sheet / external great references
 * "kubectl cheatsheet":https://kubernetes.io/docs/reference/kubectl/cheatsheet/
+h3. Allowing to schedule work on the control plane / removing node taints
 * Mostly for single node / test / development clusters
 * Just remove the master taint as follows
 <pre>
 kubectl taint nodes --all node-role.kubernetes.io/master-
+kubectl taint nodes --all node-role.kubernetes.io/control-plane-
+</pre>
+You can check the node taints using @kubectl describe node ...@
+h3. Get the cluster admin.conf
 * On the masters of each cluster you can find the file @/etc/kubernetes/admin.conf@
 * To be able to administrate the cluster you can copy the admin.conf to your local machine
 * Multi cluster debugging can be very easy if you name the config ~/cX-admin.conf (see example below)
 <pre>
 % scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
 % export KUBECONFIG=~/c2-admin.conf
 % kubectl get nodes
 NAME       STATUS                     ROLES                  AGE   VERSION
 server47   Ready                      control-plane,master   82d   v1.22.0
 server48   Ready                      control-plane,master   82d   v1.22.0
 server49   Ready                      <none>                 82d   v1.22.0
 server50   Ready                      <none>                 82d   v1.22.0
 server59   Ready                      control-plane,master   82d   v1.22.0
 server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
 server61   Ready                      <none>                 82d   v1.22.0
 server62   Ready                      <none>                 82d   v1.22.0
 </pre>
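Switching between several clusters can be wrapped in a small shell helper. The following is a minimal sketch, assuming the @~/cX-admin.conf@ naming convention described above; the function names @kubeconfig_path@ and @kc@ are made up for this example and are not part of any ungleich tooling.

```shell
# Print the kubeconfig path for a cluster short name (e.g. c2, p10).
# Assumes the ~/<name>-admin.conf naming convention described above.
kubeconfig_path() {
    printf '%s/%s-admin.conf\n' "$HOME" "$1"
}

# Point kubectl at the given cluster for the current shell session.
kc() {
    conf=$(kubeconfig_path "$1")
    if [ ! -r "$conf" ]; then
        echo "kubeconfig $conf not found, copy it from a master first" >&2
        return 1
    fi
    export KUBECONFIG="$conf"
}
```

Usage: @kc c2 && kubectl get nodes@. The helper refuses to switch when the file is missing, so a typo in the cluster name does not silently leave the previous cluster selected.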
+h3. Installing a new k8s cluster
+* Decide on the cluster name (usually *cX.k8s.ooo*), X counting upwards
+** Using pXX.k8s.ooo for production clusters of placeXX
+* Use cdist to configure the nodes with requirements like crio
 * Decide between single or multi node control plane setups (see below)
+** Single control plane suitable for development clusters
+Typical init procedure:
+* Single control plane: @kubeadm init --config bootstrap/XXX/kubeadm.yaml@
 * Multi control plane (HA): @kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs@
+h3. Deleting a pod that is hanging in terminating state
 <pre>
 kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
 </pre>
 (from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)
+h3. Listing nodes of a cluster
 <pre>
 [15:05] bridge:~% kubectl get nodes
 NAME       STATUS   ROLES                  AGE   VERSION
 server22   Ready    <none>                 52d   v1.22.0
 server23   Ready    <none>                 52d   v1.22.2
 server24   Ready    <none>                 52d   v1.22.0
 server25   Ready    <none>                 52d   v1.22.0
 server26   Ready    <none>                 52d   v1.22.0
 server27   Ready    <none>                 52d   v1.22.0
 server63   Ready    control-plane,master   52d   v1.22.0
 server64   Ready    <none>                 52d   v1.22.0
 server65   Ready    control-plane,master   52d   v1.22.0
 server66   Ready    <none>                 52d   v1.22.0
 server83   Ready    control-plane,master   52d   v1.22.0
 server84   Ready    <none>                 52d   v1.22.0
 server85   Ready    <none>                 52d   v1.22.0
 server86   Ready    <none>                 52d   v1.22.0
 </pre>
+h3. Removing / draining a node
 Usually @kubectl drain server@ should do the job, but sometimes we need to be more aggressive:
+<pre>
+kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
+</pre>
 h3. Re-adding a node after draining
 <pre>
 kubectl uncordon serverXX
+</pre>
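Draining, doing the maintenance and uncordoning can be combined so that the node is always put back into service. This is a sketch only; the function name @maintain_node@ is hypothetical, and it uses the aggressive drain flags shown above.

```shell
# Drain a node, run a maintenance command, then uncordon the node again.
# $1: node name as shown by "kubectl get nodes"
# remaining arguments: the maintenance command to run while the node is drained
maintain_node() {
    node=$1; shift
    kubectl drain --delete-emptydir-data --ignore-daemonsets "$node" || return 1
    rc=0
    "$@" || rc=$?               # remember the result of the maintenance command
    kubectl uncordon "$node"    # uncordon even if the maintenance command failed
    return $rc
}
```

Usage: @maintain_node serverXX ssh root@serverXX reboot@ (the ssh target is an assumption; any command works). The return code is that of the maintenance command, so failures stay visible.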
+h3. (Re-)joining worker nodes after creating the cluster
 * We need to have an up-to-date token
 * We use different join commands for the workers and control plane nodes
 Generating the join command on an existing control plane node:
 <pre>
 kubeadm token create --print-join-command
 </pre>
+h3. (Re-)joining control plane nodes after creating the cluster
+* We generate the token again
 * We upload the certificates
 * We need to combine/create the join command for the control plane node
 Example session:
 <pre>
 % kubeadm token create --print-join-command
 kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash
 % kubeadm init phase upload-certs --upload-certs
 [upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
 [upload-certs] Using certificate key:
 CERTKEY
 # Then we use these two outputs on the joining node:
 kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
 </pre>
 Commands to be used on a control plane node:
 <pre>
 kubeadm token create --print-join-command
 kubeadm init phase upload-certs --upload-certs
 </pre>
 Commands to be used on the joining node:
 <pre>
 JOINCOMMAND --control-plane --certificate-key CERTKEY
 </pre>
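Combining the two outputs by hand is error prone, so the join command can also be assembled by a script on an existing control plane node. This is a sketch under one assumption taken from the sample session above: the certificate key is the last line printed by @kubeadm init phase upload-certs --upload-certs@. The function names are made up for this example.

```shell
# Build the control plane join command from the two outputs described above.
# $1: full output of "kubeadm token create --print-join-command"
# $2: certificate key printed by "kubeadm init phase upload-certs --upload-certs"
control_plane_join_command() {
    printf '%s --control-plane --certificate-key %s\n' "$1" "$2"
}

# Run both kubeadm commands on a control plane node and print the combined command.
print_control_plane_join() {
    joincmd=$(kubeadm token create --print-join-command) || return 1
    # Assumption: the certificate key is the last line of the upload-certs output
    certkey=$(kubeadm init phase upload-certs --upload-certs | tail -n 1) || return 1
    control_plane_join_command "$joincmd" "$certkey"
}
```

Running @print_control_plane_join@ on a control plane node prints a single line that can be pasted on the joining node.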
+SEE ALSO
 * https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
 * https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/
 h3. How to fix etcd not starting when rejoining a kubernetes cluster as a control plane node
 If during the above step etcd does not come up, @kubeadm join@ can hang as follows:
 <pre>
 [control-plane] Creating static Pod manifest for "kube-apiserver"
 [control-plane] Creating static Pod manifest for "kube-controller-manager"
 [control-plane] Creating static Pod manifest for "kube-scheduler"
 [check-etcd] Checking that the etcd cluster is healthy
 error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:37a]:2379 with maintenance client: context deadline exceeded
 To see the stack trace of this error execute with --v=5 or higher
 </pre>
 Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster and then the join works.
 To fix this we do:
 * Find a working etcd pod
 * Find the etcd members / member list
 * Remove the etcd member that we want to re-join the cluster
 <pre>
 # Find the etcd pods
 kubectl -n kube-system get pods -l component=etcd,tier=control-plane
 # Get the list of etcd servers with the member id
 kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
 # Remove the member
 kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
 </pre>
 Sample session:
 <pre>
 [10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
 NAME            READY   STATUS    RESTARTS     AGE
 etcd-server63   1/1     Running   0            3m11s
 etcd-server65   1/1     Running   3            7d2h
 etcd-server83   1/1     Running   8 (6d ago)   7d2h
 [10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
 cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
 b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
 bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false
 [10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
 Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
 </pre>
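Looking up the member id by host name can be scripted. The following sketch parses the @member list@ output, assuming the default comma separated format shown in the session above (id, status, name, peer URLs, client URLs, learner flag); the function name @etcd_member_id@ is hypothetical.

```shell
# Print the member ID for a given member name.
# Reads "etcdctl member list" output (default simple format) on stdin.
# $1: member name, e.g. server63
etcd_member_id() {
    awk -F', ' -v name="$1" '$3 == name { print $1 }'
}
```

Usage: pipe the @member list@ command from above into @etcd_member_id server63@ and pass the printed id to @member remove@. Printing nothing means the host is not a member (anymore), in which case no removal is needed.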
 SEE ALSO
 * We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster
+h3. Node labels (adding, showing, removing)
 Listing the labels:
 <pre>
 kubectl get nodes --show-labels
 </pre>
 Adding labels:
 <pre>
 kubectl label nodes LIST-OF-NODES label1=value1
 </pre>
 For instance:
 <pre>
 kubectl label nodes router2 router3 hosttype=router
 </pre>
 Selecting nodes in pods:
 <pre>
 apiVersion: v1
 kind: Pod
 ...
 spec:
   nodeSelector:
     hosttype: router
 </pre>
+Removing labels by adding a minus at the end of the label name:
 <pre>
 kubectl label node <nodename> <labelname>-
 </pre>
 For instance:
 <pre>
 kubectl label nodes router2 router3 hosttype-
 </pre>
+SEE ALSO
+* https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/
 * https://stackoverflow.com/questions/34067979/how-to-delete-a-node-label-by-command-and-api
+h3. Hardware Maintenance using ungleich-hardware
 Use the following manifest and replace the HOST with the actual host:
 <pre>
 apiVersion: v1
 kind: Pod
 metadata:
   name: ungleich-hardware-HOST
 spec:
   containers:
   - name: ungleich-hardware
     image: ungleich/ungleich-hardware:0.0.5
     args:
     - sleep
     - "1000000"
     volumeMounts:
       - mountPath: /dev
         name: dev
     securityContext:
       privileged: true
   nodeSelector:
     kubernetes.io/hostname: "HOST"
   volumes:
     - name: dev
       hostPath:
         path: /dev
 </pre>
+Also see: [[The_ungleich_hardware_maintenance_guide]]
+h3. Triggering a cronjob / creating a job from a cronjob
 To test a cronjob, we can create a job from a cronjob:
 <pre>
 kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
 </pre>
 This creates a job volume2-manual based on the cronjob volume2-daily-backup
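Creating a second job with the same name fails because job names must be unique within a namespace. A small wrapper can append a timestamp to avoid that. This is a sketch; the function name @trigger_cronjob@ and the @-manual-<timestamp>@ suffix are assumptions made for this example.

```shell
# Create a one-off job from a cronjob, using a unique name so it can be repeated.
# $1: cronjob name; further arguments (e.g. -n NAMESPACE) are passed on to kubectl.
trigger_cronjob() {
    cronjob=$1; shift
    kubectl create job --from="cronjob/$cronjob" "$cronjob-manual-$(date +%s)" "$@"
}
```

Usage: @trigger_cronjob volume2-daily-backup@. Note that kubernetes limits the length of job names, so very long cronjob names may need a shorter suffix.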
+h3. su-ing into a user that has nologin shell set
 Users inside a container often have nologin set as their shell. To be able to execute maintenance commands within the container, we can use @su -s /bin/sh@ like this:
 <pre>
 su -s /bin/sh -c '/path/to/your/script' testuser
 </pre>
 Found on https://serverfault.com/questions/351046/how-to-run-command-as-user-who-has-usr-sbin-nologin-as-shell
+h3. How to print a secret value
 Assuming you want the "password" item from a secret, use:
 <pre>
 kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo ""
 </pre>
+h3. How to upgrade a kubernetes cluster
 h4. General
 * Should be done every X months to stay up-to-date
 ** X probably something like 3-6
 * kubeadm based clusters
 * Needs specific kubeadm versions for upgrade
 * Follow instructions on https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
 h4. Getting a specific kubeadm or kubelet version
 <pre>
 ARCH=amd64
 # Pick the release to download, for instance v1.24.9 or v1.25.5
 RELEASE=v1.25.5
 curl -L --remote-name-all https://dl.k8s.io/release/${RELEASE}/bin/linux/${ARCH}/{kubeadm,kubelet}
 </pre>
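Downloaded binaries should be verified before they are installed. The sketch below builds the same download URL as above and checks the file against a @.sha256@ file next to it. That dl.k8s.io publishes such a checksum file containing only the hash for every binary is an assumption here, and the function names are made up for this example.

```shell
# Print the download URL of a Kubernetes release binary.
# $1: release (e.g. v1.25.5), $2: architecture (e.g. amd64), $3: binary name
k8s_binary_url() {
    printf 'https://dl.k8s.io/release/%s/bin/linux/%s/%s\n' "$1" "$2" "$3"
}

# Download a binary into the current directory and verify its checksum.
# Assumption: "<url>.sha256" exists and contains only the hex digest.
fetch_k8s_binary() {
    url=$(k8s_binary_url "$1" "$2" "$3")
    curl -fsSL -o "$3" "$url" || return 1
    expected=$(curl -fsSL "$url.sha256") || return 1
    echo "$expected  $3" | sha256sum -c -
}
```

Usage: @fetch_k8s_binary v1.25.5 amd64 kubeadm && chmod +x kubeadm@. The function returns non-zero if the download fails or the checksum does not match.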
 h4. Steps
 * kubeadm upgrade plan
 ** On one control plane node
 * kubeadm upgrade apply vXX.YY.ZZ
 ** On one control plane node
 Repeat for all control plane nodes. Then upgrade kubelet on all other nodes via the package manager.
+h2. Reference CNI
 * Mainly "stupid", but effective plugins
 * Main documentation on https://www.cni.dev/plugins/current/
+* Plugins
 ** bridge
 *** Can create the bridge on the host
 *** But seems not to be able to add host interfaces to it as well
 *** Has support for vlan tags
 ** vlan
 *** creates vlan tagged sub interface on the host
+*** "It's a 1:1 mapping (i.e. no bridge in between)":https://github.com/k8snetworkplumbingwg/multus-cni/issues/569
+** host-device
 *** moves the interface from the host into the container
 *** very easy for physical connections to containers
+** ipvlan
 *** "virtualisation" of a host device
 *** routing based on IP
 *** Same MAC for everyone
 *** Cannot reach the master interface
 ** macvlan
 *** With mac addresses
 *** Supports various modes (to be checked)
 ** ptp ("point to point")
 *** Creates a host device and connects it to the container
 ** win*
+*** Windows implementations
+h2. Calico CNI
 h3. Calico Installation
 * We install "calico using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
 * This has the following advantages:
 ** Easy to upgrade
 ** Does not require us to configure IPv6/dual stack settings as the tigera operator figures out things on its own
 Usually plain calico can be installed directly using:
 <pre>
+VERSION=v3.25.0
+helm repo add projectcalico https://docs.projectcalico.org/charts
+helm repo update
+helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
+</pre>
 * Check the tags on https://github.com/projectcalico/calico/tags for the latest release
 h3. Installing calicoctl
+* General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install
+To be able to manage and configure calico, we need to
 "install calicoctl (we choose the version as a pod)":https://docs.projectcalico.org/getting-started/clis/calicoctl/install#install-calicoctl-as-a-kubernetes-pod
 <pre>
 kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
 </pre>
+Or version specific:
 <pre>
 kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.20.4/manifests/calicoctl.yaml
 # For 3.22
 kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
+</pre>
 Making it more easily accessible via an alias:
 <pre>
 alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
 </pre>
+h3. Calico configuration
+By default our k8s clusters "BGP peer":https://docs.projectcalico.org/networking/bgp
 with an upstream router to propagate podcidr and servicecidr.
 Default settings in our infrastructure:
 * We use a full-mesh using the @nodeToNodeMeshEnabled: true@ option
 * We keep the original next hop so that *only* the server with the pod is announcing it (instead of ecmp)
+* We use private ASNs for k8s clusters
+* We do *not* use any overlay
 After installing calico and calicoctl the last step of the installation is usually:
+<pre>
+calicoctl create -f - < calico-bgp.yaml
+</pre>
 A sample BGP configuration:
 <pre>
 ---
 apiVersion: projectcalico.org/v3
 kind: BGPConfiguration
 metadata:
   name: default
 spec:
   logSeverityScreen: Info
   nodeToNodeMeshEnabled: true
   asNumber: 65534
   serviceClusterIPs:
   - cidr: 2a0a:e5c0:10:3::/108
   serviceExternalIPs:
   - cidr: 2a0a:e5c0:10:3::/108
 ---
 apiVersion: projectcalico.org/v3
 kind: BGPPeer
 metadata:
   name: router1-place10
 spec:
   peerIP: 2a0a:e5c0:10:1::50
   asNumber: 213081
   keepOriginalNextHop: true
 </pre>
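When a cluster peers with more than one upstream router, the BGPPeer part of the configuration is repeated with different values. The following sketch generates one BGPPeer manifest per call, using the same fields as the sample above; the function name @bgp_peer_yaml@ is hypothetical.

```shell
# Print a Calico BGPPeer manifest for one upstream router.
# $1: peer name, $2: peer IPv6 address, $3: peer AS number
bgp_peer_yaml() {
    cat <<EOF
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: $1
spec:
  peerIP: $2
  asNumber: $3
  keepOriginalNextHop: true
EOF
}
```

Usage with the values from the sample: @bgp_peer_yaml router1-place10 2a0a:e5c0:10:1::50 213081 | calicoctl create -f -@. This relies on calicoctl reading the manifest from stdin, as in the @calicoctl create -f - < calico-bgp.yaml@ step above.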
+h2. Cilium CNI (experimental)
+h3. Status
+*NO WORKING CILIUM CONFIGURATION FOR IPV6 only modes*
+h3. Latest error
 It seems cilium does not run on IPv6 only hosts:
 <pre>
 level=info msg="Validating configured node address ranges" subsys=daemon
 level=fatal msg="postinit failed" error="external IPv4 node address could not be derived, please configure via --ipv4-node" subsys=daemon
 level=info msg="Starting IP identity watcher" subsys=ipcache
 </pre>
 It crashes after that log entry
+h3. BGP configuration
 * The cilium-operator will not start without a correct configmap being present beforehand (see error message below)
 * Creating the bgp config beforehand as a configmap is thus required.
 The error one gets without the configmap present:
 Pods are hanging with:
 <pre>
 cilium-bpqm6                       0/1     Init:0/4            0             9s
 cilium-operator-5947d94f7f-5bmh2   0/1     ContainerCreating   0             9s
 </pre>
 The error message in the cilium-operator is:
 <pre>
 Events:
   Type     Reason       Age                From               Message
   ----     ------       ----               ----               -------
   Normal   Scheduled    80s                default-scheduler  Successfully assigned kube-system/cilium-operator-5947d94f7f-lqcsp to server56
   Warning  FailedMount  16s (x8 over 80s)  kubelet            MountVolume.SetUp failed for volume "bgp-config-path" : configmap "bgp-config" not found
 </pre>
 A correct bgp config looks like this:
 <pre>
 apiVersion: v1
 kind: ConfigMap
 metadata:
   name: bgp-config
   namespace: kube-system
 data:
   config.yaml: |
     peers:
       - peer-address: 2a0a:e5c0::46
         peer-asn: 209898
         my-asn: 65533
       - peer-address: 2a0a:e5c0::47
         peer-asn: 209898
         my-asn: 65533
     address-pools:
       - name: default
         protocol: bgp
         addresses:
           - 2a0a:e5c0:0:14::/64
 </pre>
 h3. Installation
+Adding the repo
+<pre>
+helm repo add cilium https://helm.cilium.io/
+helm repo update
 </pre>
+Installing + configuring cilium
+<pre>
+ipv6pool=2a0a:e5c0:0:14::/112
+version=1.12.2
 helm upgrade --install cilium cilium/cilium --version $version \
+  --namespace kube-system \
   --set ipv4.enabled=false \
   --set ipv6.enabled=true \
+  --set enableIPv6Masquerade=false \
   --set bgpControlPlane.enabled=true
+#  --set ipam.operator.clusterPoolIPv6PodCIDRList=$ipv6pool
 # Old style bgp?
+#   --set bgp.enabled=true --set bgp.announce.podCIDR=true \
 # Show possible configuration options
 helm show values cilium/cilium
+</pre>
 Using a /64 for ipam.operator.clusterPoolIPv6PodCIDRList fails with:
 <pre>
 level=fatal msg="Unable to init cluster-pool allocator" error="unable to initialize IPv6 allocator New CIDR set failed; the node CIDR size is too big" subsys=cilium-operator-generic
 </pre>
+See also https://github.com/cilium/cilium/issues/20756
 Seems a /112 is actually working.
 h3. Kernel modules
 Cilium requires the following modules to be loaded on the host (not loaded by default):
 <pre>
+modprobe  ip6table_raw
 modprobe  ip6table_filter
 </pre>
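Before installing cilium it can be checked which of the required modules are still missing on a host. The sketch below reads a module list in @/proc/modules@ format; the function name @missing_modules@ is made up for this example. Modules compiled directly into the kernel do not show up in @/proc/modules@ and would be reported as missing.

```shell
# Print every required module that is not in the loaded-module list on stdin.
# The input is expected in /proc/modules format (module name in the first column).
# Arguments: the names of the required modules.
missing_modules() {
    loaded=$(awk '{ print $1 }')
    for mod in "$@"; do
        echo "$loaded" | grep -qxF "$mod" || echo "$mod"
    done
}
```

Usage on a host: @missing_modules ip6table_raw ip6table_filter < /proc/modules@ prints the modules that still need a @modprobe@.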
 h3. Interesting helm flags
 * autoDirectNodeRoutes
 * bgpControlPlane.enabled = true
 h3. SEE ALSO
 * https://docs.cilium.io/en/v1.12/helm-reference/
+h2. Multus (incomplete/experimental/WIP)
 * https://github.com/k8snetworkplumbingwg/multus-cni
 * Installing a deployment w/ CRDs
+<pre>
 VERSION=v3.9.2
+kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/${VERSION}/deployments/multus-daemonset-crio.yml
+</pre>
 * crio based fails on alpine linux due to:
 <pre>
 [22:07] nb3:~% kubectl logs -n kube-system kube-multus-ds-2g9d5
 -12-26T21:05:21+00:00 Generating Multus configuration file using files in /host/etc/cni/net.d...
 -12-26T21:05:21+00:00 Using MASTER_PLUGIN: 10-calico.conflist
 -12-26T21:05:25+00:00 Nested capabilities string: "capabilities": {"bandwidth": true, "portMappings": true},
 -12-26T21:05:25+00:00 Using /host/etc/cni/net.d/10-calico.conflist as a source to generate the Multus configuration
 -12-26T21:05:26+00:00 Config file created @ /host/etc/cni/net.d/00-multus.conf
 { "cniVersion": "0.3.1", "name": "multus-cni-network", "type": "multus", "capabilities": {"bandwidth": true, "portMappings": true}, "kubeconfig": "/etc/cni/net.d/multus.d/multus.kubeconfig", "delegates": [ { "name": "k8s-pod-network", "cniVersion": "0.3.1", "plugins": [ { "type": "calico", "datastore_type": "kubernetes", "mtu": 0, "nodename_file_optional": false, "log_level": "Info", "log_file_path": "/var/log/calico/cni/cni.log", "ipam": { "type": "calico-ipam", "assign_ipv4" : "false", "assign_ipv6" : "true"}, "container_settings": { "allow_ip_forwarding": false }, "policy": { "type": "k8s" }, "kubernetes": { "k8s_api_root":"https://[2a0a:e5c0:43:bb::1]:443", "kubeconfig": "/etc/cni/net.d/calico-kubeconfig" } }, { "type": "bandwidth", "capabilities": {"bandwidth": true} }, {"type": "portmap", "snat": true, "capabilities": {"portMappings": true}} ] } ] }
 -12-26T21:05:26+00:00 Restarting crio
 /entrypoint.sh: line 434: systemctl: command not found
 </pre>
+h2. ArgoCD
+h3. Argocd Installation
+* See https://argo-cd.readthedocs.io/en/stable/
+As there is no configuration management present yet, argocd is installed using
+<pre>
+kubectl create namespace argocd
+# Specific Version
 kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml
 # OR: latest stable
+kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
+</pre>
+h3. Get the argocd credentials
 <pre>
 kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
 </pre>
+h3. Accessing argocd
 In regular IPv6 clusters:
 * Navigate to https://argocd-server.argocd.CLUSTERDOMAIN
 In legacy IPv4 clusters
 <pre>
 kubectl --namespace argocd port-forward svc/argocd-server 8080:80
 </pre>
+* Navigate to https://localhost:8080
+h3. Using the argocd webhook to trigger changes
 * To trigger changes, POST JSON to https://argocd.example.com/api/webhook
+h3. Deploying an application
 * Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
+* Always include the *redmine-url* pointing to the (customer) ticket
 ** Also add the support-url if it exists
 Application sample
 <pre>
 apiVersion: argoproj.io/v1alpha1
 kind: Application
 metadata:
   name: gitea-CUSTOMER
   namespace: argocd
 spec:
   destination:
     namespace: default
     server: 'https://kubernetes.default.svc'
   source:
     path: apps/prod/gitea
     repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
     targetRevision: HEAD
     helm:
       parameters:
         - name: storage.data.storageClass
           value: rook-ceph-block-hdd
         - name: storage.data.size
           value: 200Gi
         - name: storage.db.storageClass
           value: rook-ceph-block-ssd
         - name: storage.db.size
           value: 10Gi
         - name: storage.letsencrypt.storageClass
           value: rook-ceph-block-hdd
         - name: storage.letsencrypt.size
           value: 50Mi
         - name: letsencryptStaging
           value: 'no'
         - name: fqdn
           value: 'code.verua.online'
   project: default
   syncPolicy:
     automated:
       prune: true
       selfHeal: true
   info:
     - name: 'redmine-url'
       value: 'https://redmine.ungleich.ch/issues/ISSUEID'
     - name: 'support-url'
       value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
 </pre>
+h2. Helm related operations and conventions
+We use helm charts extensively.
 * In production, they are managed via argocd
 * In development, helm charts can be developed and deployed manually using the helm utility.
+h3. Installing a helm chart
 One can use the usual pattern of
 <pre>
 helm install <releasename> <chartdirectory>
 </pre>
 However, you often want to reinstall/update when testing helm charts. The following pattern is "better", because it also works if the release is already installed:
 <pre>
 helm upgrade --install <releasename> <chartdirectory>
+</pre>
 h3. Naming services and deployments in helm charts [Application labels]
 * We always have {{ .Release.Name }} to identify the current "instance"
 * Deployments:
 ** use @app: <what it is>@, f.i. @app: nginx@, @app: postgres@, ...
+* See more about standard labels on
 ** https://kubernetes.io/docs/concepts/overview/working-with-objects/common-labels/
 ** https://helm.sh/docs/chart_best_practices/labels/
+h3. Show all versions of a helm chart
 <pre>
 helm search repo -l repo/chart
 </pre>
 For example:
 <pre>
 % helm search repo -l projectcalico/tigera-operator
 NAME                         	CHART VERSION	APP VERSION	DESCRIPTION
 projectcalico/tigera-operator	v3.23.3      	v3.23.3    	Installs the Tigera operator for Calico
 projectcalico/tigera-operator	v3.23.2      	v3.23.2    	Installs the Tigera operator for Calico
 ....
 </pre>
+h3. Show possible values of a chart
 <pre>
 helm show values <repo/chart>
 </pre>
 Example:
 <pre>
 helm show values ingress-nginx/ingress-nginx
 </pre>
+h2. Rook + Ceph
 h3. Installation
 * Usually directly via argocd
 Manual steps:
 <pre>
 </pre>
+h3. Executing ceph commands
 Using the ceph-tools pod as follows:
 <pre>
 kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
 </pre>
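 If you run ceph commands often, the lookup of the tools pod can be wrapped in a shell function. This is a sketch under assumptions: the name @ceph_exec@ is made up for this example, and it picks the first pod matching the label, which assumes that at least one @rook-ceph-tools@ pod is running.
 <pre>
 # Hypothetical helper (assumption): run a ceph command in the first
 # pod labelled app=rook-ceph-tools in the rook-ceph namespace.
 # Usage: ceph_exec <ceph arguments...>
 ceph_exec() {
     pod=$(kubectl -n rook-ceph get pods -l app=rook-ceph-tools \
         -o jsonpath='{.items[0].metadata.name}')
     kubectl -n rook-ceph exec -ti "$pod" -- ceph "$@"
 }
 </pre>
 For example, @ceph_exec -s@ shows the cluster status and @ceph_exec osd tree@ lists the OSDs per host.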
+h3. Inspecting the logs of a specific server
 <pre>
 # Get the related pods
 kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
 ...
 # Inspect the logs of a specific pod
 kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
+</pre>
 h3. Inspecting the logs of the rook-ceph-operator
 <pre>
 kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
+</pre>
+h3. Restarting the rook operator
 <pre>
 kubectl -n rook-ceph delete pods  -l app=rook-ceph-operator
 </pre>
+h3. Triggering server prepare / adding new osds
 The rook-ceph-operator triggers, watches and creates the pods that maintain the hosts. To trigger a full rescan, simply delete the operator pod:
 <pre>
 kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
 </pre>
 This will cause all the @rook-ceph-osd-prepare-..@ jobs to be recreated and thus OSDs to be created, if new disks have been added.
 h3. Removing an OSD
 * See "Ceph OSD Management":https://rook.io/docs/rook/v1.7/ceph-osd-mgmt.html
+* More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
+* Then delete the related deployment
+Set the OSD id in osd-purge.yaml and apply it. The OSD should already be down before it is purged.
 <pre>
 apiVersion: batch/v1
 kind: Job
 metadata:
   name: rook-ceph-purge-osd
   namespace: rook-ceph # namespace:cluster
   labels:
     app: rook-ceph-purge-osd
 spec:
   template:
     metadata:
       labels:
         app: rook-ceph-purge-osd
     spec:
       serviceAccountName: rook-ceph-purge-osd
       containers:
         - name: osd-removal
           image: rook/ceph:master
           # TODO: Insert the OSD ID in the last parameter that is to be removed
           # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
           # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
+          #
           # A --force-osd-removal option is available if the OSD should be destroyed even though the
           # removal could lead to data loss.
           args:
             - "ceph"
             - "osd"
             - "remove"
             - "--preserve-pvc"
             - "false"
             - "--force-osd-removal"
             - "false"
             - "--osd-ids"
             - "SETTHEOSDIDHERE"
           env:
             - name: POD_NAMESPACE
               valueFrom:
                 fieldRef:
                   fieldPath: metadata.namespace
             - name: ROOK_MON_ENDPOINTS
               valueFrom:
                 configMapKeyRef:
                   key: data
                   name: rook-ceph-mon-endpoints
             - name: ROOK_CEPH_USERNAME
               valueFrom:
                 secretKeyRef:
                   key: ceph-username
                   name: rook-ceph-mon
             - name: ROOK_CEPH_SECRET
               valueFrom:
                 secretKeyRef:
                   key: ceph-secret
                   name: rook-ceph-mon
             - name: ROOK_CONFIG_DIR
               value: /var/lib/rook
             - name: ROOK_CEPH_CONFIG_OVERRIDE
               value: /etc/rook/config/override.conf
             - name: ROOK_FSID
               valueFrom:
                 secretKeyRef:
                   key: fsid
                   name: rook-ceph-mon
             - name: ROOK_LOG_LEVEL
               value: DEBUG
           volumeMounts:
             - mountPath: /etc/ceph
               name: ceph-conf-emptydir
             - mountPath: /var/lib/rook
               name: rook-config
       volumes:
         - emptyDir: {}
           name: ceph-conf-emptydir
         - emptyDir: {}
           name: rook-config
       restartPolicy: Never
+</pre>
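 Filling in the OSD id can also be scripted instead of editing the file by hand. The following is a sketch: the function name @render_osd_purge@ and the default file name @osd-purge.yaml@ in the current directory are assumptions, and it only works while the manifest still contains the @SETTHEOSDIDHERE@ placeholder.
 <pre>
 # Hypothetical helper (assumption): print the purge manifest with the
 # SETTHEOSDIDHERE placeholder replaced by the given OSD id(s).
 # Usage: render_osd_purge <osd-ids> [manifest]
 render_osd_purge() {
     osd_id="$1"
     manifest="${2:-osd-purge.yaml}"   # assumed local copy of the manifest
     sed "s/SETTHEOSDIDHERE/${osd_id}/" "$manifest"
 }
 </pre>
 The result can be reviewed first and then piped into kubectl, for instance @render_osd_purge 6 | kubectl apply -f -@.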
 Deleting the deployment:
 <pre>
 [18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
 deployment.apps "rook-ceph-osd-6" deleted
+</pre>
+h2. Ingress + Cert Manager
 * We deploy "nginx-ingress":https://docs.nginx.com/nginx-ingress-controller/ to get an ingress
 * We deploy "cert-manager":https://cert-manager.io/ to handle certificates
 * We deploy the @ClusterIssuer@ independently, so that the cert-manager app can be deployed first and the issuer is created once the CRDs from cert-manager are in place
 h3. IPv4 reachability
 The ingress is by default IPv6 only. To make it reachable from the IPv4 world, get its IPv6 address and configure a NAT64 mapping in Jool.
 Steps:
 h4. Get the ingress IPv6 address
 Use @kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''@
 Example:
 <pre>
 kubectl -n ingress-nginx get svc ingress-nginx-controller -o jsonpath='{.spec.clusterIP}'; echo ''
 2a0a:e5c0:10:1b::ce11
 </pre>
 h4. Add NAT64 mapping
 * Update the @__dcl_jool_siit@ cdist type
 * Record the two IPs (IPv6 and IPv4)
 * Configure all routers
 h4. Add DNS record
 To make the ingress usable as a CNAME destination, create an "ingress" DNS record, such as:
 <pre>
 ; k8s ingress for dev
 dev-ingress                 AAAA 2a0a:e5c0:10:1b::ce11
 dev-ingress                 A 147.78.194.23
 </pre>
 h4. Add supporting wildcard DNS
 If you plan to add various sites under a specific domain, you can add a wildcard DNS entry, such as @*.k8s-dev.django-hosting.ch@:
 <pre>
 *.k8s-dev         CNAME dev-ingress.ungleich.ch.
 </pre>
+h2. Harbor
 * We use "Harbor":https://goharbor.io/ for caching and as an image registry. Internal app reference: apps/prod/harbor.
 * The admin password is in the password store, auto generated per cluster
 * At the moment harbor only authenticates against the internal LDAP tree
 h3. LDAP configuration
 * The URL needs to be ldaps://...
 * uid = uid
 * All other settings remain at their defaults
+h2. Monitoring / Prometheus
+* Via "kube-prometheus":https://github.com/prometheus-operator/kube-prometheus/
+Access via ...
 * http://prometheus-k8s.monitoring.svc:9090
 * http://grafana.monitoring.svc:3000
 * http://alertmanager.monitoring.svc:9093
+h3. Prometheus Options
 * "helm/kube-prometheus-stack":https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack
 ** Includes dashboards and co.
 * "manifest based kube-prometheus":https://github.com/prometheus-operator/kube-prometheus
 ** Includes dashboards and co.
 * "Prometheus Operator (mainly CRD manifests)":https://github.com/prometheus-operator/prometheus-operator
+h3. Grafana default password
 * If not changed: @prom-operator@
+h2. Nextcloud
+h3. How to get the nextcloud credentials
 * The initial username is set to "nextcloud"
 * The password is autogenerated and saved in a kubernetes secret
 <pre>
+kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo ""
+</pre>
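 The @base64 -d@ step is needed because kubernetes stores secret values base64-encoded under @.data@. The following sketch shows the decoding on its own, without a cluster; the function name @decode_secret_value@ and the value @czNjcjN0@ are made up for this example and are not real credentials.
 <pre>
 # Hypothetical helper (assumption): decode one base64-encoded value
 # as found under .data of a kubernetes secret.
 decode_secret_value() {
     printf '%s' "$1" | base64 -d
 }
 # "czNjcjN0" is a made-up example value; it decodes to "s3cr3t".
 decode_secret_value "czNjcjN0"; echo ""
 </pre>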
+h3. How to fix "Access through untrusted domain"
+* Nextcloud stores the initial domain configuration
+* If the FQDN is changed, it will show the error message "Access through untrusted domain"
+* To fix, edit @/var/www/html/config/config.php@ and correct the domain in the @trusted_domains@ list
+* Then delete the pods
 h3. Running occ commands inside the nextcloud container
 * Find the pod in the right namespace
 Exec:
 <pre>
 su www-data -s /bin/sh -c ./occ
 </pre>
 * -s /bin/sh is needed as the default shell is set to /bin/false
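 Finding the pod and running occ can be combined into one call from outside the container. This is a sketch under assumptions: the function name @occ_exec@ is made up, namespace and pod name have to be supplied by you, and it relies on @./occ@ being reachable from the container's working directory as in the command above.
 <pre>
 # Hypothetical helper (assumption): run an occ command as www-data
 # inside a given nextcloud pod.
 # Usage: occ_exec <namespace> <pod> <occ arguments...>
 occ_exec() {
     namespace="$1"
     pod="$2"
     shift 2
     # -s /bin/sh is required because www-data's default shell is /bin/false
     kubectl -n "$namespace" exec -ti "$pod" -- \
         su www-data -s /bin/sh -c "./occ $*"
 }
 </pre>
 For example: @occ_exec NAMESPACE PODNAME status@, where NAMESPACE and PODNAME are placeholders.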
+h4. Rescanning files
+* If files have been added without nextcloud's knowledge
 <pre>
 su www-data -s /bin/sh -c "./occ files:scan --all"
 </pre>
+h2. Infrastructure versions
+h3. ungleich kubernetes infrastructure v5 (2021-10)
+Clusters are set up in this order:
 * Bootstrap via kubeadm
+* "Networking via calico + BGP (non ECMP) using helm":https://docs.projectcalico.org/getting-started/kubernetes/helm
 * "ArgoCD for CD":https://argo-cd.readthedocs.io/en/stable/
 ** "rook for storage via argocd":https://rook.io/
+** haproxy as IPv4-to-IPv6 proxy inside the IPv6-only cluster, via argocd
 ** "kubernetes-secret-generator for in cluster secrets":https://github.com/mittwald/kubernetes-secret-generator
 ** "ungleich-certbot managing certs and nginx":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
 h3. ungleich kubernetes infrastructure v4 (2021-09)
+* rook is configured via manifests instead of using the rook-ceph-cluster helm chart
+* The rook operator is still being installed via helm
+h3. ungleich kubernetes infrastructure v3 (2021-07)
+* rook is now installed via helm via argocd instead of directly via manifests
+h3. ungleich kubernetes infrastructure v2 (2021-05)
 * Replaced fluxv2 from ungleich k8s v1 with argocd
+** argocd can apply helm templates directly without needing to go through Chart releases
+* We are also using argoflow for build flows
 * Planned to add "kaniko":https://github.com/GoogleContainerTools/kaniko for image building
+h3. ungleich kubernetes infrastructure v1 (2021-01)
 We are using the following components:
 * "Calico as a CNI":https://www.projectcalico.org/ with BGP, IPv6 only, no encapsulation
 ** Needed for basic networking
 * "kubernetes-secret-generator":https://github.com/mittwald/kubernetes-secret-generator for creating secrets
 ** Needed so that secrets are not stored in the git repository, but only in the cluster
 * "ungleich-certbot":https://hub.docker.com/repository/docker/ungleich/ungleich-certbot
 ** Needed to get letsencrypt certificates for services
 * "rook with ceph rbd + cephfs":https://rook.io/ for storage
 ** rbd for almost everything, *ReadWriteOnce*
 ** cephfs for smaller things, multi access *ReadWriteMany*
 ** Needed for providing persistent storage
 * "flux v2":https://fluxcd.io/
 ** Needed to manage resources automatically
