The ungleich kubernetes infrastructure and ungleich kubernetes manual

Status

This document is pre-production.
This document is to become the ungleich kubernetes infrastructure overview as well as the ungleich kubernetes manual.

k8s clusters

Cluster     | Purpose/Setup     | Maintainer | Master(s)                  | argo | v4 http proxy | last verified
c0.k8s.ooo  | Dev               | -          | UNUSED                     |      |               | 2021-10-05
c1.k8s.ooo  | retired           |            | -                          |      |               | 2022-03-15
c2.k8s.ooo  | Dev p7 HW         | Nico       | server47 server53 server54 | argo |               | 2021-10-05
c3.k8s.ooo  | retired           | -          | -                          |      |               | 2021-10-05
c4.k8s.ooo  | Dev2 p7 HW        | Jin-Guk    | server52 server53 server54 |      |               | -
c5.k8s.ooo  | retired           |            | -                          |      |               | 2022-03-15
c6.k8s.ooo  | Dev p6 VM Jin-Guk | Jin-Guk    |                            |      |               |
p5.k8s.ooo  | production        |            | server34 server36 server38 | argo |               | -
p6.k8s.ooo  | production        |            | server67 server69 server71 | argo | 147.78.194.13 | 2021-10-05
p10.k8s.ooo | production        |            | server63 server65 server83 | argo | 147.78.194.12 | 2021-10-05
fnnf        | development       | Nico       | server75                   |      |               |

General architecture and components overview

  • All k8s clusters are IPv6 only
  • We use BGP peering to propagate podcidr and serviceCidr networks to our infrastructure
  • The main public testing repository is ungleich-k8s
    • Private configurations are found in the k8s-config repository

Cluster types

Type/Feature                | Development                    | Production
Min No. nodes               | 3 (1 master, 3 worker)         | 5 (3 master, 3 worker)
Recommended minimum         | 4 (dedicated master, 3 worker) | 8 (3 master, 5 worker)
Separation of control plane | optional                       | recommended
Persistent storage          | required                       | required
Number of storage monitors  | 3                              | 5

General k8s operations

Cheat sheet / great external references

Allowing work to be scheduled on the control plane

  • Mostly for single node / test / development clusters
  • Just remove the master taint as follows
kubectl taint nodes --all node-role.kubernetes.io/master-
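Note that on newer kubernetes/kubeadm versions the control plane taint is named node-role.kubernetes.io/control-plane instead of .../master; on such clusters the equivalent command is:

kubectl taint nodes --all node-role.kubernetes.io/control-plane-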

Get the cluster admin.conf

  • On the masters of each cluster you can find the file /etc/kubernetes/admin.conf
  • To be able to administer the cluster you can copy the admin.conf to your local machine
  • Multi-cluster debugging becomes very easy if you name the config ~/cX-admin.conf (see the example below)
% scp root@server47.place7.ungleich.ch:/etc/kubernetes/admin.conf ~/c2-admin.conf
% export KUBECONFIG=~/c2-admin.conf    
% kubectl get nodes
NAME       STATUS                     ROLES                  AGE   VERSION
server47   Ready                      control-plane,master   82d   v1.22.0
server48   Ready                      control-plane,master   82d   v1.22.0
server49   Ready                      <none>                 82d   v1.22.0
server50   Ready                      <none>                 82d   v1.22.0
server59   Ready                      control-plane,master   82d   v1.22.0
server60   Ready,SchedulingDisabled   <none>                 82d   v1.22.0
server61   Ready                      <none>                 82d   v1.22.0
server62   Ready                      <none>                 82d   v1.22.0               
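Instead of exporting KUBECONFIG globally, the config can also be passed per invocation, which helps to avoid accidentally operating on the wrong cluster (a sketch using the config fetched above):

% kubectl --kubeconfig ~/c2-admin.conf get nodes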

Installing a new k8s cluster

  • Decide on the cluster name (usually cX.k8s.ooo), X counting upwards
    • Using pXX.k8s.ooo for production clusters of placeXX
  • Use cdist to configure the nodes with requirements like crio
  • Decide between single or multi node control plane setups (see below)
    • Single control plane suitable for development clusters

Typical init procedure:

  • Single control plane: kubeadm init --config bootstrap/XXX/kubeadm.yaml
  • Multi control plane (HA): kubeadm init --config bootstrap/XXX/kubeadm.yaml --upload-certs
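A minimal sketch of what such a kubeadm.yaml could contain; the values below are illustrative assumptions and not the actual contents of the bootstrap/ files:

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  # crio is configured on the nodes via cdist (see above)
  criSocket: unix:///var/run/crio/crio.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# Stable endpoint in front of the (single or multi node) control plane
controlPlaneEndpoint: "cX-api.k8s.ooo:6443"
networking:
  # IPv6 only cluster; prefixes are placeholders
  podSubnet: 2a0a:e5c0:XXXX::/56
  serviceSubnet: 2a0a:e5c0:YYYY::/108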

Deleting a pod that is hanging in terminating state

kubectl delete pod <PODNAME> --grace-period=0 --force --namespace <NAMESPACE>

(from https://stackoverflow.com/questions/35453792/pods-stuck-in-terminating-status)

Listing nodes of a cluster

[15:05] bridge:~% kubectl get nodes
NAME       STATUS   ROLES                  AGE   VERSION
server22   Ready    <none>                 52d   v1.22.0
server23   Ready    <none>                 52d   v1.22.2
server24   Ready    <none>                 52d   v1.22.0
server25   Ready    <none>                 52d   v1.22.0
server26   Ready    <none>                 52d   v1.22.0
server27   Ready    <none>                 52d   v1.22.0
server63   Ready    control-plane,master   52d   v1.22.0
server64   Ready    <none>                 52d   v1.22.0
server65   Ready    control-plane,master   52d   v1.22.0
server66   Ready    <none>                 52d   v1.22.0
server83   Ready    control-plane,master   52d   v1.22.0
server84   Ready    <none>                 52d   v1.22.0
server85   Ready    <none>                 52d   v1.22.0
server86   Ready    <none>                 52d   v1.22.0

Removing / draining a node

Usually kubectl drain server should do the job, but sometimes we need to be more aggressive:

kubectl drain --delete-emptydir-data --ignore-daemonsets server23

Readding a node after draining

kubectl uncordon serverXX

(Re-)joining worker nodes after creating the cluster

  • We need to have an up-to-date token
  • We use different join commands for the workers and control plane nodes

Generating the join command on an existing control plane node:

kubeadm token create --print-join-command
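The printed command is then executed on the (re-)joining worker node and looks roughly like this (token and hash are placeholders):

kubeadm join p10-api.k8s.ooo:6443 --token TOKEN --discovery-token-ca-cert-hash sha256:HASH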

(Re-)joining control plane nodes after creating the cluster

  • We generate the token again
  • We upload the certificates
  • We need to combine/create the join command for the control plane node

Example session:

% kubeadm token create --print-join-command
kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash 

% kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
CERTKEY

# Then we use these two outputs on the joining node:

kubeadm join p10-api.k8s.ooo:6443 --token xmff4i.ABC --discovery-token-ca-cert-hash sha256:longhash --control-plane --certificate-key CERTKEY

Commands to be used on a control plane node:

kubeadm token create --print-join-command
kubeadm init phase upload-certs --upload-certs

Commands to be used on the joining node:

JOINCOMMAND --control-plane --certificate-key CERTKEY

SEE ALSO

How to fix etcd does not start when rejoining a kubernetes cluster as a control plane

If during the above step etcd does not come up, kubeadm join can hang as follows:

[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher

The problem is then likely that the node is still registered as a member of the etcd cluster. We first need to remove it from the etcd cluster; afterwards the join works.

To fix this we do:

  • Find a working etcd pod
  • Find the etcd members / member list
  • Remove the etcd member belonging to the node that we want to re-join to the cluster
# Find the etcd pods
kubectl -n kube-system get pods -l component=etcd,tier=control-plane

# Get the list of etcd servers with the member id 
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list

# Remove the member
kubectl exec -n kube-system -ti ETCDPODNAME -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove MEMBERID

Sample session:

[10:48] line:~% kubectl -n kube-system get pods -l component=etcd,tier=control-plane
NAME            READY   STATUS    RESTARTS     AGE
etcd-server63   1/1     Running   0            3m11s
etcd-server65   1/1     Running   3            7d2h
etcd-server83   1/1     Running   8 (6d ago)   7d2h
[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
356891cd676df6e4, started, server65, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:375c]:2379, false
371b8a07185dee7e, started, server63, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2380, https://[2a0a:e5c0:10:1:225:b3ff:fe20:378a]:2379, false
5942bc58307f8af9, started, server83, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2380, https://[2a0a:e5c0:10:1:3e4a:92ff:fe79:bb98]:2379, false

[10:48] line:~% kubectl exec -n kube-system -ti etcd-server65 -- etcdctl --endpoints '[::1]:2379' --cacert /etc/kubernetes/pki/etcd/ca.crt --cert  /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 371b8a07185dee7e
Member 371b8a07185dee7e removed from cluster e3c0805f592a8f77

SEE ALSO

Hardware Maintenance using ungleich-hardware

Use the following manifest and replace the HOST with the actual host:

apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-HOST
spec:
  containers:
  - name: ungleich-hardware
    image: ungleich/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000" 
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "HOST"
  volumes:
    - name: dev
      hostPath:
        path: /dev
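Once the manifest has been applied, the pod can be entered for the actual maintenance work (a sketch, assuming the manifest was saved as ungleich-hardware-HOST.yaml and the image provides /bin/sh):

kubectl apply -f ungleich-hardware-HOST.yaml
kubectl exec -ti ungleich-hardware-HOST -- /bin/sh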

Also see: The_ungleich_hardware_maintenance_guide

Calico CNI

Calico Installation

  • We install calico using helm
  • This has the following advantages:
    • Easy to upgrade
    • Does not require us to configure IPv6/dual-stack settings, as the tigera operator figures things out on its own

Usually plain calico can be installed directly using:

helm repo add projectcalico https://docs.projectcalico.org/charts
helm install calico projectcalico/tigera-operator --version v3.20.4
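Later chart upgrades follow the usual helm pattern (a sketch; replace the version with the one to upgrade to):

helm repo update
helm upgrade calico projectcalico/tigera-operator --version vX.Y.Z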

Installing calicoctl

To be able to manage and configure calico, we need to install calicoctl:

kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml

Or version specific:

kubectl apply -f https://github.com/projectcalico/calico/blob/v3.20.4/manifests/calicoctl.yaml

# For 3.22
kubectl apply -f https://projectcalico.docs.tigera.io/archive/v3.22/manifests/calicoctl.yaml

And making it more easily accessible via an alias:

alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl" 
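With the alias in place, a quick sanity check is listing the calico node objects (a sketch; the exact output depends on the cluster):

calicoctl get nodes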

Calico configuration

By default our k8s clusters BGP peer with an upstream router to propagate podcidr and servicecidr.

Default settings in our infrastructure:

  • We use a full-mesh using the nodeToNodeMeshEnabled: true option
  • We keep the original next hop so that only the server with the pod is announcing it (instead of ecmp)
  • We use private ASNs for k8s clusters
  • We do not use any overlay

After installing calico and calicoctl the last step of the installation is usually:

calicoctl create -f - < calico-bgp.yaml

A sample BGP configuration:

---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65534
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:10:3::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:10:3::/108
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: router1-place10
spec:
  peerIP: 2a0a:e5c0:10:1::50
  asNumber: 213081
  keepOriginalNextHop: true
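After applying it, the resulting resources can be inspected via calicoctl (a sketch using the alias from above):

calicoctl get bgpconfiguration default -o yaml
calicoctl get bgppeer -o wide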

ArgoCD / ArgoWorkFlow

Argocd Installation

As there is no configuration management present yet, argocd is installed using

kubectl create namespace argocd

# Specific Version
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.3.2/manifests/install.yaml

# OR: latest stable
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

Get the argocd credentials

kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d; echo "" 

Accessing argocd

In regular IPv6 clusters:

In legacy IPv4 clusters:

kubectl --namespace argocd port-forward svc/argocd-server 8080:80
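With the port-forward in place, the credentials from above can also be used with the argocd CLI (a sketch, assuming the CLI is installed locally):

argocd login localhost:8080 --username admin --insecure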

Using the argocd webhook to trigger changes

Deploying an application

  • Applications are deployed via git towards gitea (code.ungleich.ch) and then pulled by argo
  • Always include the redmine-url pointing to the (customer) ticket
    • Also add the support-url if it exists

Application sample

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: gitea-CUSTOMER
  namespace: argocd
spec:
  destination:
    namespace: default
    server: 'https://kubernetes.default.svc'
  source:
    path: apps/prod/gitea
    repoURL: 'https://code.ungleich.ch/ungleich-intern/k8s-config.git'
    targetRevision: HEAD
    helm:
      parameters:
        - name: storage.data.storageClass
          value: rook-ceph-block-hdd
        - name: storage.data.size
          value: 200Gi
        - name: storage.db.storageClass
          value: rook-ceph-block-ssd
        - name: storage.db.size
          value: 10Gi
        - name: storage.letsencrypt.storageClass
          value: rook-ceph-block-hdd
        - name: storage.letsencrypt.size
          value: 50Mi
        - name: letsencryptStaging
          value: 'no'
        - name: fqdn
          value: 'code.verua.online'
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
  info:
    - name: 'redmine-url'
      value: 'https://redmine.ungleich.ch/issues/ISSUEID'
    - name: 'support-url'
      value: 'https://support.ungleich.ch/Ticket/Display.html?id=TICKETID'

Helm related operations and conventions

We use helm charts extensively.

  • In production, they are managed via argocd
  • In development, helm charts can be developed and deployed manually using the helm utility.

Installing a helm chart

One can use the usual pattern of

helm install <releasename> <chartdirectory>

However, when testing helm charts you often want to reinstall/update them. The following pattern is "better", because it also works if the release is already installed:

helm upgrade --install <releasename> <chartdirectory>
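In practice this is often combined with an explicit namespace (release, chart and namespace are placeholders):

helm upgrade --install <releasename> <chartdirectory> --namespace <namespace> --create-namespace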

Naming services and deployments in helm charts [Application labels]

Rook / Ceph Related Operations

Executing ceph commands

Using the ceph-tools pod as follows:

kubectl exec -n rook-ceph -ti $(kubectl -n rook-ceph get pods -l app=rook-ceph-tools -o jsonpath='{.items[*].metadata.name}') -- ceph -s
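As the inner kubectl lookup is a bit unwieldy, it can be wrapped in a shell alias (a sketch, assuming the standard rook-ceph-tools deployment name and a kubectl version that accepts deploy/NAME for exec):

alias rookceph='kubectl exec -n rook-ceph -ti deploy/rook-ceph-tools -- ceph'

# For example:
rookceph -s
rookceph osd tree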

Inspecting the logs of a specific server

# Get the related pods
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare 
...

# Inspect the logs of a specific pod
kubectl -n rook-ceph logs -f rook-ceph-osd-prepare-server23--1-444qx

Inspecting the logs of the rook-ceph-operator

kubectl -n rook-ceph logs -f -l app=rook-ceph-operator

Triggering server prepare / adding new osds

The rook-ceph-operator triggers/watches/creates pods to maintain hosts. To trigger a full "re-scan", simply delete that pod:

kubectl -n rook-ceph delete pods -l app=rook-ceph-operator

This will cause all the rook-ceph-osd-prepare-.. jobs to be recreated and thus OSDs to be created, if new disks have been added.
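Progress can be followed by watching the prepare pods being recreated (same label as above):

kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare -w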

Removing an OSD

Set the OSD id in the osd-purge.yaml below and apply it. The OSD should be down beforehand.

apiVersion: batch/v1
kind: Job
metadata:
  name: rook-ceph-purge-osd
  namespace: rook-ceph # namespace:cluster
  labels:
    app: rook-ceph-purge-osd
spec:
  template:
    metadata:
      labels:
        app: rook-ceph-purge-osd
    spec:
      serviceAccountName: rook-ceph-purge-osd
      containers:
        - name: osd-removal
          image: rook/ceph:master
          # TODO: Insert the OSD ID in the last parameter that is to be removed
          # The OSD IDs are a comma-separated list. For example: "0" or "0,2".
          # If you want to preserve the OSD PVCs, set `--preserve-pvc true`.
          #
          # A --force-osd-removal option is available if the OSD should be destroyed even though the
          # removal could lead to data loss.
          args:
            - "ceph" 
            - "osd" 
            - "remove" 
            - "--preserve-pvc" 
            - "false" 
            - "--force-osd-removal" 
            - "false" 
            - "--osd-ids" 
            - "SETTHEOSDIDHERE" 
          env:
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ROOK_MON_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  key: data
                  name: rook-ceph-mon-endpoints
            - name: ROOK_CEPH_USERNAME
              valueFrom:
                secretKeyRef:
                  key: ceph-username
                  name: rook-ceph-mon
            - name: ROOK_CEPH_SECRET
              valueFrom:
                secretKeyRef:
                  key: ceph-secret
                  name: rook-ceph-mon
            - name: ROOK_CONFIG_DIR
              value: /var/lib/rook
            - name: ROOK_CEPH_CONFIG_OVERRIDE
              value: /etc/rook/config/override.conf
            - name: ROOK_FSID
              valueFrom:
                secretKeyRef:
                  key: fsid
                  name: rook-ceph-mon
            - name: ROOK_LOG_LEVEL
              value: DEBUG
          volumeMounts:
            - mountPath: /etc/ceph
              name: ceph-conf-emptydir
            - mountPath: /var/lib/rook
              name: rook-config
      volumes:
        - emptyDir: {}
          name: ceph-conf-emptydir
        - emptyDir: {}
          name: rook-config
      restartPolicy: Never

Deleting the deployment:

[18:05] bridge:~% kubectl -n rook-ceph delete deployment rook-ceph-osd-6
deployment.apps "rook-ceph-osd-6" deleted

Harbor

  • We use Harbor for caching and as an image registry. Internal app reference: apps/prod/harbor.
  • The admin password is in the password store, auto generated per cluster
  • At the moment harbor only authenticates against the internal ldap tree

LDAP configuration

  • The url needs to be ldaps://...
  • uid = uid
  • The rest is standard

Monitoring / Prometheus

Access via ...

Prometheus Options

Nextcloud

How to get the nextcloud credentials

  • The initial username is set to "nextcloud"
  • The password is autogenerated and saved in a kubernetes secret
kubectl get secret RELEASENAME-nextcloud -o jsonpath="{.data.PASSWORD}" | base64 -d; echo "" 

How to fix "Access through untrusted domain"

  • Nextcloud stores the initial domain configuration
  • If the FQDN is changed, it will show the error message "Access through untrusted domain"
  • To fix, edit /var/www/html/config/config.php and correct the domain
  • Then delete the pods
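A sketch of these last two steps; the deployment name and the label are assumptions based on a standard nextcloud helm chart and need to be adjusted to the actual release:

# Correct the trusted domain inside the running pod (OLD.FQDN / NEW.FQDN are placeholders)
kubectl exec -ti deploy/RELEASENAME-nextcloud -- sed -i 's/OLD.FQDN/NEW.FQDN/' /var/www/html/config/config.php

# Then delete the pods so they come back with the corrected config
kubectl delete pods -l app.kubernetes.io/name=nextcloud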

Infrastructure versions

ungleich kubernetes infrastructure v5 (2021-10)

Clusters are configured / set up in this order:

ungleich kubernetes infrastructure v4 (2021-09)

  • rook is configured via manifests instead of using the rook-ceph-cluster helm chart
  • The rook operator is still being installed via helm

ungleich kubernetes infrastructure v3 (2021-07)

  • rook is now installed via helm through argocd instead of directly via manifests

ungleich kubernetes infrastructure v2 (2021-05)

  • Replaced fluxv2 from ungleich k8s v1 with argocd
    • argocd can apply helm templates directly without needing to go through Chart releases
  • We are also using argoflow for build flows
  • Planned to add kaniko for image building

ungleich kubernetes infrastructure v1 (2021-01)

We are using the following components:

  • Calico as a CNI with BGP, IPv6 only, no encapsulation
    • Needed for basic networking
  • kubernetes-secret-generator for creating secrets
    • Needed so that secrets are not stored in the git repository, but only in the cluster
  • ungleich-certbot
    • Needed to get letsencrypt certificates for services
  • rook with ceph rbd + cephfs for storage
    • rbd for almost everything, ReadWriteOnce
    • cephfs for smaller things, multi access ReadWriteMany
    • Needed for providing persistent storage
  • flux v2
    • Needed to manage resources automatically
