Task #9487

Migrate our ceph clusters into rook

Added by Nico Schottelius 3 months ago. Updated 3 months ago.

Status: In Progress
Priority: Normal
Target version: -
Start date: 07/08/2021
Due date:
% Done: 0%
Estimated time:
PM Check date:

Description

Situation

  • At ungleich we are running multiple ceph clusters, usually one or more per location. Currently our ceph clusters run on Devuan (w/o systemd).
  • Alpine Linux, which we use in combination with Devuan, does not have proper ceph packages.
  • Ceph is moving away from packages in favor of containers managed by cephadm, which requires systemd on the host OS.
  • We are not inclined to switch to systemd, due to long term maintenance issues we have experienced.
  • The alternative to cephadm is using ceph with https://rook.io
  • We have successfully tested Alpine Linux + crio or docker + rook on test clusters

Objectives

  • Gain a proper operational understanding of rook (vs. "plain ceph") by running multiple test clusters and upgrading, failing, destroying and rebuilding them (handled in internal ticket #4608)
  • Develop a strategy to migrate our native Ceph clusters into rook without any downtime

High level Strategy per cluster

  • Create a k8s cluster
  • Adjust the rook settings to match the current cluster (fsid, monitors, networking, etc.)
  • Phase in services in rook in a controlled manner
  • Retire / replace native running services
  • Repeat for each cluster
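
The settings that rook has to match can be read from the existing cluster's configuration. The following is a minimal sketch, assuming the native cluster has a classic ceph.conf with fsid and mon_host in its [global] section. The sample values and the helper name read_cluster_identity are hypothetical and only serve as illustration; how the values are then fed into rook is not covered here.

```python
import configparser

# Hypothetical sample; a real cluster has its own fsid and monitor addresses.
SAMPLE_CEPH_CONF = """
[global]
fsid = 00000000-0000-0000-0000-000000000000
mon_host = 192.0.2.1, 192.0.2.2, 192.0.2.3
public_network = 192.0.2.0/24
"""


def read_cluster_identity(conf_text):
    """Extract the values a rook cluster would have to match."""
    # strict=False tolerates repeated options, interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    # ceph accepts "mon host" as well as "mon_host", so normalise the keys.
    options = {key.replace(" ", "_"): value for key, value in parser["global"].items()}
    monitors = options.get("mon_host", "").replace(",", " ").split()
    return {
        "fsid": options["fsid"],
        "mon_host": monitors,
        "public_network": options.get("public_network"),
    }


if __name__ == "__main__":
    print(read_cluster_identity(SAMPLE_CEPH_CONF))
```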

Operational steps / detailed plan

(TBC)

Rook layout / purpose

  • Rook usually uses CSI and is responsible for k8s PV/PVC provisioning
  • In the cases described in this ticket the primary consumers are external VMs
    • No PVCs need to be managed/created

(TBC)

Staged phase in

It might be possible to phase in rook as follows:

  • 1) additional managers (you can't have too many of them)
  • 2) remove outside managers (shutdown, delete monit jobs)
  • 3) additional monitors
    • Use an EVEN number of additional monitors (e.g. 2 or 4) so that the total stays odd
  • 4) When stable: remove 1 outside monitor and raise the monitor count in rook to an odd number (likely 5)
  • 5) When stable: switch to rook only monitors, remove all outside monitors
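
The reason for keeping the monitor count odd is the majority quorum that ceph monitors need. A small sketch of the arithmetic, assuming a plain majority rule; the monitor counts in the loop are examples, not our actual numbers:

```python
def quorum_size(monitors):
    """Smallest number of monitors that forms a majority."""
    return monitors // 2 + 1


def tolerated_failures(monitors):
    """How many monitors can fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    for total in range(3, 8):
        print(f"{total} monitors: quorum {quorum_size(total)}, "
              f"tolerates {tolerated_failures(total)} failure(s)")
```

Going from 3 to 4 monitors does not increase the number of tolerated failures, while going from 3 to 5 does, which is why monitors are added in pairs.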

At this point all monitors and managers are controlled by rook. Then we will migrate OSDs:

  • Add some new nodes with additional storage to the cluster, as k8s workers
    • specify these nodes in rook for device discovery
    • Ensure device classes are properly set in a test cluster beforehand
      • We use "hdd-big" for 3.5" HDDs and "ssd" for SSDs
        • Depending on the controller, the rotational kernel flag might not be exposed; we need to plan ahead how to tell rook which host serves which classes
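
One way to plan the device classes ahead is a per-host override table that takes precedence over the kernel's rotational flag. This is only a sketch under assumptions: the host name in HOST_CLASS_OVERRIDES and both helper names are hypothetical, and how the resulting class is handed to rook is left open. The flag itself is read from /sys/block/<device>/queue/rotational.

```python
from pathlib import Path

# Hypothetical overrides for hosts whose controller hides the rotational flag.
HOST_CLASS_OVERRIDES = {"server1": "hdd-big"}


def read_rotational_flag(device, sys_block=Path("/sys/block")):
    """Return the raw kernel flag ("1" = rotational, "0" = non-rotational)."""
    return (sys_block / device / "queue" / "rotational").read_text()


def device_class(host, rotational_flag, overrides=HOST_CLASS_OVERRIDES):
    """Pick "hdd-big" or "ssd"; an explicit per-host override always wins."""
    if host in overrides:
        return overrides[host]
    return "hdd-big" if rotational_flag.strip() == "1" else "ssd"
```

For example, device_class("server2", read_rotational_flag("sda")) would classify sda on a host without an override.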

If everything is good so far, we have additional storage nodes running inside rook. Finally, migrating the native nodes to rook:

  • Set cluster to noout, preventing rebalance
  • Shutdown/delete a native node (w/o purging the OSDs)
    • Verify disk format is compatible with rook -> we use plain disks, non LVM
      • If not compatible, maybe replace all OSDs with LVM based beforehand
        • Maybe 1-2 OSDs/day
  • Reboot with Alpine + k8s worker
    • rook should detect existing osds and re-add them
      • Need to test this behaviour on test clusters
  • Unset noout
  • Repeat for all native nodes
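
The per-node procedure above can be written down as a dry-run plan before touching a cluster. A minimal sketch: it only prints steps and executes nothing, the node names are hypothetical, and the manual steps stand in for work that still needs to be tested on test clusters. The only ceph commands used are "ceph osd set noout", "ceph status" and "ceph osd unset noout". The helper days_to_replace covers the optional replacement of OSDs at 1-2 per day.

```python
import math


def node_migration_steps(node):
    """Ordered steps for one native node; "run" = command, "manual" = operator task."""
    return [
        ("run", "ceph osd set noout"),
        ("manual", f"shut down {node} without purging its OSDs"),
        ("manual", f"reboot {node} with Alpine + k8s worker"),
        ("manual", f"wait until rook has re-added the OSDs of {node}"),
        ("run", "ceph status"),
        ("run", "ceph osd unset noout"),
    ]


def migration_plan(nodes):
    """Concatenate the per-node steps, one node after the other."""
    plan = []
    for node in nodes:
        plan.extend(node_migration_steps(node))
    return plan


def days_to_replace(osd_count, osds_per_day):
    """Days needed if OSDs have to be replaced beforehand at a fixed daily rate."""
    return math.ceil(osd_count / osds_per_day)


if __name__ == "__main__":
    for kind, text in migration_plan(["server1", "server2"]):
        print(f"[{kind}] {text}")
```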

