The ungleich kubernetes infrastructure » History » Revision 118
Nico Schottelius, 07/08/2022 11:20 AM
The ungleich kubernetes infrastructure and ungleich kubernetes manual¶
- Table of contents
- The ungleich kubernetes infrastructure and ungleich kubernetes manual
- Status
- k8s clusters
- General architecture and components overview
- General k8s operations
- Cheat sheet / external great references
- Allowing to schedule work on the control plane / removing node taints
- Get the cluster admin.conf
- Installing a new k8s cluster
- Deleting a pod that is hanging in terminating state
- Listing nodes of a cluster
- Removing / draining a node
- Readding a node after draining
- (Re-)joining worker nodes after creating the cluster
- (Re-)joining control plane nodes after creating the cluster
- How to fix etcd does not start when rejoining a kubernetes cluster as a control plane
- Hardware Maintenance using ungleich-hardware
- Triggering a cronjob / creating a job from a cronjob
- su-ing into a user that has nologin shell set
- How to print a secret value
- Calico CNI
- ArgoCD / ArgoWorkFlow
- Helm related operations and conventions
- Rook / Ceph Related Operations
- Harbor
- Monitoring / Prometheus
- Nextcloud
- Infrastructure versions
Status¶
This document is pre-production.
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.
k8s clusters¶
Cluster | Purpose/Setup | Maintainer | Master(s) | argo | v4 http proxy | last verified |
c0.k8s.ooo | Dev | - | UNUSED | 2021-10-05 | ||
c1.k8s.ooo | retired | - | 2022-03-15 | |||
c2.k8s.ooo | Dev p7 HW | Nico | server47 server53 server54 | argo | 2021-10-05 | |
c3.k8s.ooo | retired | - | - | 2021-10-05 | ||
c4.k8s.ooo | Dev2 p7 HW | Jin-Guk | server52 server53 server54 | - | ||
c5.k8s.ooo | retired | - | 2022-03-15 | |||
c6.k8s.ooo | Dev p6 VM Jin-Guk | Jin-Guk | ||||
p5.k8s.ooo | production | server34 server36 server38 | argo | - | ||
p6.k8s.ooo | production | server67 server69 server71 | argo | 147.78.194.13 | 2021-10-05 | |
p10.k8s.ooo | production | server63 server65 server83 | argo | 147.78.194.12 | 2021-10-05 | |
k8s.ge.nau.so | development | server107 server108 server109 | argo |
General architecture and components overview¶
- All k8s clusters are IPv6 only
- We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
- The main public testing repository is ungleich-k8s
- Private configurations are found in the k8s-config repository
Cluster types¶
Type/Feature | Development | Production |
Min No. nodes | 3 (1 master, 3 worker) | 5 (3 master, 3 worker) |
Recommended minimum | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker) |
Separation of control plane | optional | recommended |
Persistent storage | required | required |
Number of storage monitors | 3 | 5 |
General k8s operations¶
Cheat sheet / external great references¶
Allowing to schedule work on the control plane / removing node taints¶
- Mostly for single node / test / development clusters
- Just remove the master taint as follows
kubectl taint nodes --all node-role.kubernetes.io/master-
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
You can check the node taints using kubectl describe node ...
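To see the taints of all nodes at once instead of describing each node, a small helper could look like the following sketch. The function name list_node_taints is hypothetical; it only wraps kubectl get nodes with custom columns.

```shell
# Hypothetical helper: print one line per node with the keys of its taints.
# Nodes without taints show <none> in the TAINTS column.
list_node_taints() {
    kubectl get nodes \
        -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Example: list_node_taints
```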
Get the cluster admin.conf¶
- On the masters of each cluster you can find the file /etc/kubernetes/admin.conf
- To be able to administrate the cluster you can copy the admin.conf to your local machine
- Multi cluster debugging can be very easy if you name the config ~/cX-admin.conf (see example below)
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
% export KUBECONFIG=~/c2-admin.conf
% kubectl get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
server47   Ready                      control-plane,master   82d   v1.22.0
server48   Ready                      control-plane,master   82d   v1.22.0
server49   Ready                      <none>                 82d   v1.22.0
server50   Ready                      <none>                 82d   v1.22.0
server59   Ready                      control-plane,master   82d   v1.22.0
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
server61   Ready                      <none>                 82d   v1.22.0
server62   Ready                      <none>                 82d   v1.22.0
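Switching between several clusters can be wrapped in a small shell function. The following is a minimal sketch, assuming every admin.conf was copied to ~/<name>-admin.conf as described above; the function name use_cluster is hypothetical.

```shell
# Hypothetical helper: select a cluster by its short name (c2, p6, p10, ...).
# Assumes the admin.conf of each cluster was copied to ~/<name>-admin.conf.
use_cluster() {
    if [ -z "${1:-}" ]; then
        echo "usage: use_cluster <cluster-short-name>" >&2
        return 1
    fi
    KUBECONFIG="$HOME/$1-admin.conf"
    export KUBECONFIG
    # Only warn: the variable is still set so the file can be copied afterwards.
    if [ ! -r "$KUBECONFIG" ]; then
        echo "warning: $KUBECONFIG does not exist or is not readable" >&2
    fi
}

# Example: use_cluster c2 && kubectl get nodes
```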
Installing a new k8s cluster¶
- Decide on the cluster name (usually cX.k8s.ooo), X counting upwards
- Using pXX.k8s.ooo for production clusters of placeXX
- Use cdist to configure the nodes with requirements like crio
- Decide between single or multi node control plane setups (see below)
- Single control plane suitable for development clusters
Typical init procedure:
- Single control plane:
kubeadm init --config bootstrap/XXX/kubeadm.yaml
- Multi control plane (HA):
kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs
Deleting a pod that is hanging in terminating state¶
kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>
(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)
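To find the pods that are stuck before force-deleting them, the output of kubectl get pods can be filtered. The sketch below only prints the delete commands for review and deletes nothing by itself; the function name is hypothetical and it assumes the column layout of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: read "kubectl get pods --all-namespaces" output on stdin
# and print a force-delete command for every pod whose STATUS column is
# Terminating. Nothing is deleted; the commands are only printed for review.
print_force_delete_commands() {
    awk '$4 == "Terminating" {
        printf "kubectl delete pod %s --grace-period=0 --force --namespace %s\n", $2, $1
    }'
}

# Example: kubectl get pods --all-namespaces | print_force_delete_commands
```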
Listing nodes of a cluster¶
[15:05] bridge:~% kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
server22   Ready    <none>                 52d   v1.22.0
server23   Ready    <none>                 52d   v1.22.2
server24   Ready    <none>                 52d   v1.22.0
server25   Ready    <none>                 52d   v1.22.0
server26   Ready    <none>                 52d   v1.22.0
server27   Ready    <none>                 52d   v1.22.0
server63   Ready    control-plane,master   52d   v1.22.0
server64   Ready    <none>                 52d   v1.22.0
server65   Ready    control-plane,master   52d   v1.22.0
server66   Ready    <none>                 52d   v1.22.0
server83   Ready    control-plane,master   52d   v1.22.0
server84   Ready    <none>                 52d   v1.22.0
server85   Ready    <none>                 52d   v1.22.0
server86   Ready    <none>                 52d   v1.22.0
Removing / draining a node¶
Usually kubectl drain serverXX should do the job, but sometimes we need to be more aggressive:
kubectl drain --delete-emptydir-data --ignore-daemonsets serverXX
Readding a node after draining¶
kubectl uncordon serverXX
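Draining and re-adding a node usually belong together, so both steps can be combined. The following is a rough sketch with a hypothetical function name; it uses the more aggressive drain variant from above and waits for a key press before uncordoning.

```shell
# Hypothetical helper: drain a node, wait until the maintenance is finished,
# then make the node schedulable again.
node_maintenance() {
    node="${1:-}"
    [ -n "$node" ] || { echo "usage: node_maintenance <node>" >&2; return 1; }

    # Stop here if draining fails, so the node is not uncordoned by accident.
    kubectl drain --delete-emptydir-data --ignore-daemonsets "$node" || return 1

    printf 'Node %s drained. Press enter once maintenance is done: ' "$node" >&2
    read -r reply || true

    kubectl uncordon "$node"
}

# Example: node_maintenance server60
```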
(Re-)joining worker nodes after creating the cluster¶
- We need to have an up-to-date token
- We use different join commands for the workers and control plane nodes
Generating the join command on an existing control plane node:
kubeadm token create --print-join-command
(Re-)joining control plane nodes after creating the cluster¶
- We generate the token again
- We upload the certificates
- We need to combine/create the join command for the control plane node
Example session:
% kubeadm token create --print-join-command
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash

% kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
CERTKEY

# Then we use these two outputs on the joining node:
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY
Commands to be used on a control plane node:
kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs
Commands to be used on the joining node:
JOINCOMMAND --control-plane --certificate-key CERTKEY
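Combining the two outputs by hand is error-prone, so the join command can also be assembled by a script on the control plane node. This is a minimal sketch with a hypothetical function name; it assumes that the certificate key is the last word of the last line printed by the upload-certs phase.

```shell
# Hypothetical helper, to be run on an existing control plane node.
# Prints the complete join command for an additional control plane node.
print_control_plane_join_command() {
    # Creates a fresh token and prints the plain (worker) join command.
    join_command=$(kubeadm token create --print-join-command) || return 1

    # Re-uploads the certificates. Assumption: the certificate key is the
    # last word of the last output line.
    certificate_key=$(kubeadm init phase upload-certs --upload-certs \
        | tail -n 1 | awk '{ print $NF }')
    [ -n "$certificate_key" ] || return 1

    printf '%s --control-plane --certificate-key %s\n' \
        "$join_command" "$certificate_key"
}

# Example: print_control_plane_join_command
```

The printed line can then be executed on the joining node.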
SEE ALSO
- https://stackoverflow.com/questions/63936268/how-to-generate-kubeadm-token-for-secondary-control-plane-nodes
- https://blog.scottlowe.org/2019/08/15/reconstructing-the-join-command-for-kubeadm/
How to fix etcd does not start when rejoining a kubernetes cluster as a control plane¶
If during the above step etcd does not come up, kubeadm join can hang as follows:
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
Then the problem is likely that the etcd server is still a member of the cluster. We first need to remove it from the etcd cluster and then the join works.
To fix this we do:
- Find a working etcd pod
- Find the etcd members / member list
- Remove the etcd member that we want to re-join the cluster
# Find the etcd pods
kubectl -n kube-system get pods -l component=etcd,tier=control-plane

# Get the list of etcd servers with the member id
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# Remove the member
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID
Sample session:
[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
NAME            READY   STATUS    RESTARTS     AGE
etcd-server63   1/1     Running   0            3m11s
etcd-server65   1/1     Running   3            7d2h
etcd-server83   1/1     Running   8 (6d ago)   7d2h

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77
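Picking the member ID out of the member list by eye is easy to get wrong. The following sketch extracts it by node name; the function name is hypothetical and it assumes the comma-separated format shown in the sample session (ID, status, name, peer URL, client URL, learner flag).

```shell
# Hypothetical helper: read "etcdctl member list" output on stdin and print
# the member ID (first field) of the member whose name (third field) matches.
etcd_member_id() {
    awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Example: run the member list command from above with -i instead of -ti
# (no terminal is needed when piping) and pipe it into:
#   etcd_member_id server63
```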
SEE ALSO
- We found the solution using https://stackoverflow.com/questions/67921552/re-installed-node-cannot-join-kubernetes-cluster
Hardware Maintenance using ungleich-hardware¶
Use the following manifest and replace the HOST with the actual host:
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-HOST
spec:
  containers:
  - name: ungleich-hardware
    image: ungleich/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "HOST"
  volumes:
    - name: dev
      hostPath:
        path: /dev
Also see: The_ungleich_hardware_maintenance_guide
Triggering a cronjob / creating a job from a cronjob¶
To test a cronjob, we can create a job from a cronjob:
kubectl create job --from=cronjob/volume2-daily-backup volume2-manual
This creates a job volume2-manual based on the cronjob volume2-daily-backup
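When a cronjob is tested repeatedly, a fixed job name collides with the previous run. The sketch below derives a unique job name, waits for completion and prints the logs; the function name, the name suffix and the 600 second timeout are assumptions for illustration.

```shell
# Hypothetical helper: create a uniquely named job from a cronjob, wait for
# it to complete and show its logs.
run_cronjob_now() {
    cronjob="$1"
    # Suffix with the current unix time so repeated test runs do not collide.
    job="${cronjob}-manual-$(date +%s)"

    kubectl create job --from="cronjob/${cronjob}" "$job" || return 1
    # Assumption: 600 seconds is long enough for the job under test.
    kubectl wait --for=condition=complete --timeout=600s "job/${job}" || return 1
    kubectl logs "job/${job}"
}

# Example: run_cronjob_now volume2-daily-backup
```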
su-ing into a user that has nologin shell set¶
Users often have nologin set as their shell inside the container. To be able to execute maintenance commands within the container, we can use su -s /bin/sh like this:
su -s /bin/sh -c '/path/to/your/script' testuser
How to print a secret value¶
Assuming you want the "password" item from a secret, use:
kubectl get secret SECRETNAME -o jsonpath="{.data.password}" | base64 -d; echo ""
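The same pattern works for any item of a secret. Below is a small sketch with a hypothetical function name that takes the secret name, the item and optionally the namespace as arguments.

```shell
# Hypothetical helper: print one decoded item of a secret.
# Usage: print_secret_value SECRETNAME KEY [NAMESPACE]
# Note: keys that contain dots (like tls.crt) need the dot escaped for jsonpath.
print_secret_value() {
    secret="$1"
    key="$2"
    namespace="${3:-default}"

    kubectl -n "$namespace" get secret "$secret" -o "jsonpath={.data.$key}" | base64 -d
    # The decoded value has no trailing newline, so add one for readability.
    echo ""
}

# Example: print_secret_value argocd-initial-admin-secret password argocd
```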
Calico CNI¶
Calico Installation¶
- We install calico using helm
- This has the following advantages:
- Easy to upgrade
- Does not require us to configure IPv6/dual stack settings, as the tigera operator figures things out on its own
Usually plain calico can be installed directly using:
helm repo add projectcalico https://docs.projectcalico.org/charts
helm install --namespace tigera calico projectcalico/tigera-operator --version v3.23.2 --create-namespace
- Check the tags on https://github.com/projectcalico/calico/tags for the latest release
Installing calicoctl¶
- General installation instructions, including binary download: https://projectcalico.docs.tigera.io/maintenance/clis/calicoctl/install
To be able to manage and configure calico, we need to install calicoctl:
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
Or version specific:
# For 3.20.4
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.20.4/manifests/calicoctl.yaml

# For 3.22
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml
To make it more easily accessible, define an alias:
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
Calico configuration¶
By default our k8s clusters BGP peer with an upstream router to propagate podcidr and servicecidr.
Default settings in our infrastructure:
- We use a full-mesh using the nodeToNodeMeshEnabled: true option
- We keep the original next hop so that only the server with the pod is announcing it (instead of ecmp)
- We use private ASNs for k8s clusters
- We do not use any overlay
After installing calico and calicoctl the last step of the installation is usually:
calicoctl create -f - < calico-bgp.yaml
A sample BGP configuration:
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65534
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:10:3::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:10:3::/108
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: router1-place10
spec:
  peerIP: 2a0a:e5c0:10:1::50
  asNumber: 213081
  keepOriginalNextHop: true
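After applying the BGP configuration it can be read back to verify that it was stored as intended. The following sketch uses a hypothetical function name and assumes the calicoctl pod in kube-system from the installation step above; it calls kubectl exec directly because shell aliases are not expanded inside scripts.

```shell
# Hypothetical helper: show the stored BGP configuration and the BGP peers.
# Assumes the calicoctl pod from the installation step runs in kube-system.
calico_bgp_show() {
    kubectl exec -i -n kube-system calicoctl -- /calicoctl get bgpConfiguration default -o yaml
    kubectl exec -i -n kube-system calicoctl -- /calicoctl get bgpPeer -o wide
}

# Example: calico_bgp_show
```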
ArgoCD / ArgoWorkFlow¶
Argocd Installation¶
As there is no configuration management present yet, argocd is installed using
kubectl create namespace argocd

# Specific Version
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml

# OR: latest stable
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
Get the argocd credentials¶
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo ""
Accessing argocd¶
In regular IPv6 clusters:
- Navigate to https://argocd-server.argocd.CLUSTERDOMAIN
In legacy IPv4 clusters:
kubectl --namespace argocd port-forward svc/argocd-server 8080:80
- Navigate to https://localhost:8080
Using the argocd webhook to trigger changes¶
- To trigger changes, POST JSON to https://argocd.example.com/api/webhook
Deploying an application¶
- Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
- Always include the redmine-url pointing to the (customer) ticket
- Also add the support-url if it exists
Application sample
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitea-CUSTOMER
  namespace: argocd
spec:
  destination:
    namespace: default
    server: 'https://kubernetes.default.svc'
  source:
    path: apps/prod/gitea
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
    targetRevision: HEAD
    helm:
      parameters:
        - name: storage.data.storageClass
          value: rook-ceph-block-hdd
        - name: storage.data.size
          value: 200Gi
        - name: storage.db.storageClass
          value: rook-ceph-block-ssd
        - name: storage.db.size
          value: 10Gi
        - name: storage.letsencrypt.storageClass
          value: rook-ceph-block-hdd
        - name: storage.letsencrypt.size
          value: 50Mi
        - name: letsencryptStaging
          value: 'no'
        - name: fqdn
          value: 'code.verua.online'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  info:
    - name: 'redmine-url'
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
    - name: 'support-url'
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'
Helm related operations and conventions¶
We use helm charts extensively.
- In production, they are managed via argocd
- In development, helm charts can be developed and deployed manually using the helm utility.
Installing a helm chart¶
One can use the usual pattern of
helm install <releasename> <chartdirectory>
However, when testing helm charts you often want to reinstall or update a release. The following pattern is better, because it installs the release if it is missing and upgrades it if it is already installed:
helm upgrade --install <releasename> <chartdirectory>
Naming services and deployments in helm charts [Application labels]¶
- We always have {{ .Release.Name }} to identify the current "instance"
- Deployments:
- use app: <what it is>, e.g. app: nginx, app: postgres, ...
- use
- See more about standard labels on
Rook / Ceph Related Operations¶
Executing ceph commands¶
Using the ceph-tools pod as follows:
kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
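Since most ceph commands are run this way, the lookup of the tools pod can be wrapped once. The following sketch uses a hypothetical function name and passes all arguments through to ceph; it picks the first pod matching the app=rook-ceph-tools label.

```shell
# Hypothetical helper: run any ceph command inside the rook-ceph-tools pod.
ceph_tools() {
    # Pick the first pod that carries the rook-ceph-tools label.
    tools_pod=$(kubectl -n rook-ceph get pods -l app=rook-ceph-tools \
        -o jsonpath='{.items[0].metadata.name}') || return 1
    [ -n "$tools_pod" ] || { echo "no rook-ceph-tools pod found" >&2; return 1; }

    kubectl exec -n rook-ceph -i "$tools_pod" -- ceph "$@"
}

# Examples: ceph_tools -s
#           ceph_tools osd tree
```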
Inspecting the logs of a specific server¶
# Get the related pods
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare
...

# Inspect the logs of a specific pod
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx
Inspecting the logs of the rook-ceph-operator¶
kubectl -n rook-ceph logs -f -l app=rook-ceph-operator
Triggering server prepare / adding new osds¶
The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re scan", simply delete that pod:
kubectl -n rook-ceph delete pods -l app=rook-ceph-operator
This will cause all the rook-ceph-osd-prepare-.. jobs to be recreated and thus OSDs to be created, if new disks have been added.
Removing an OSD¶
- See Ceph OSD Management
- More specifically: https://github.com/rook/rook/blob/release-1.7/cluster/examples/kubernetes/ceph/osd-purge.yaml
- Then delete the related deployment
Set the OSD ID in osd-purge.yaml and apply it. The OSD should already be down before the job runs.
apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-purge-osd
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-purge-osd
spec:
  template:
    metadata:
      labels:
        app: rook-ceph-purge-osd
    spec:
      serviceAccountName: rook-ceph-purge-osd
      containers:
        - name: osd-removal
          image: rook/ceph:master
          # TODO: Insert the OSD ID in the last parameter that is to be removed
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
          #
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
          # removal could lead to data loss.
          args:
            - "ceph"
            - "osd"
            - "remove"
            - "--preserve-pvc"
            - "false"
            - "--force-osd-removal"
            - "false"
            - "--osd-ids"
            - "SETTHEOSDIDHERE"
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_MON_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  key: data
                  name: rook-ceph-mon-endpoints
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  key: ceph-username
                  name: rook-ceph-mon
            - name: ROOK_CEPH_SECRET
              valueFrom:
                secretKeyRef:
                  key: ceph-secret
                  name: rook-ceph-mon
            - name: ROOK_CONFIG_DIR
              value: /var/lib/rook
            - name: ROOK_CEPH_CONFIG_OVERRIDE
              value: /etc/rook/config/override.conf
            - name: ROOK_FSID
              valueFrom:
                secretKeyRef:
                  key: fsid
                  name: rook-ceph-mon
            - name: ROOK_LOG_LEVEL
              value: DEBUG
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-conf-emptydir
            - mountPath: /var/lib/rook
              name: rook-config
      volumes:
        - emptyDir: {}
          name: ceph-conf-emptydir
        - emptyDir: {}
          name: rook-config
      restartPolicy: Never
Deleting the deployment:
[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted
Harbor¶
- We use Harbor for caching and as an image registry. Internal app reference: apps/prod/harbor.
- The admin password is in the password store, auto generated per cluster
- At the moment harbor only authenticates against the internal ldap tree
LDAP configuration¶
- The url needs to be ldaps://...
- uid = uid
- rest standard
Monitoring / Prometheus¶
- Via kube-prometheus
Access via ...
- http://prometheus-k8s.monitoring.svc:9090
- http://grafana.monitoring.svc:3000
- http://alertmanager.monitoring.svc:9093
Prometheus Options¶
- helm/kube-prometheus-stack
- Includes dashboards and co.
- manifest based kube-prometheus
- Includes dashboards and co.
- Prometheus Operator (mainly CRD manifest)
Nextcloud¶
How to get the nextcloud credentials¶
- The initial username is set to "nextcloud"
- The password is autogenerated and saved in a kubernetes secret
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo ""
How to fix "Access through untrusted domain"¶
- Nextcloud stores the initial domain configuration
- If the FQDN is changed, it will show the error message "Access through untrusted domain"
- To fix, edit /var/www/html/config/config.php and correct the domain
- Then delete the pods
Infrastructure versions¶
ungleich kubernetes infrastructure v5 (2021-10)¶
Clusters are configured / setup in this order:
- Bootstrap via kubeadm
- Networking via calico + BGP (non ECMP) using helm
- ArgoCD for CD
- rook for storage via argocd
- haproxy as the in-cluster IPv4-to-IPv6 proxy for the IPv6 only clusters, via argocd
- kubernetes-secret-generator for in-cluster secrets
- ungleich-certbot managing certs and nginx
ungleich kubernetes infrastructure v4 (2021-09)¶
- rook is configured via manifests instead of using the rook-ceph-cluster helm chart
- The rook operator is still being installed via helm
ungleich kubernetes infrastructure v3 (2021-07)¶
- rook is now installed via helm via argocd instead of directly via manifests
ungleich kubernetes infrastructure v2 (2021-05)¶
- Replaced fluxv2 from ungleich k8s v1 with argocd
- argocd can apply helm templates directly without needing to go through Chart releases
- We are also using argoflow for build flows
- Planned to add kaniko for image building
ungleich kubernetes infrastructure v1 (2021-01)¶
We are using the following components:
- Calico as a CNI with BGP, IPv6 only, no encapsulation
- Needed for basic networking
- kubernetes-secret-generator for creating secrets
- Needed so that secrets are not stored in the git repository, but only in the cluster
- ungleich-certbot
- Needed to get letsencrypt certificates for services
- rook with ceph rbd + cephfs for storage
- rbd for almost everything, ReadWriteOnce
- cephfs for smaller things, multi access ReadWriteMany
- Needed for providing persistent storage
- flux v2
- Needed to manage resources automatically
Updated by Nico Schottelius over 2 years ago · 118 revisions