Version 1 - History - The ungleich ceph handbook - Open Infrastructure - ungleich redmine

Nico Schottelius

h1. The ungleich ceph handbook

h2. Status

This document is **WORK IN PROGRESS**.

h2. Introduction

This article describes the ungleich storage architecture, which is based on ceph, as well as the maintenance commands used to operate it. Required for

h2. Communication guide

Usually, when a disk fails, no customer communication is necessary, because ceph compensates and rebalances automatically. However, if multiple disks fail at the same time, I/O speed might be reduced and the customer experience impacted.

For this reason, communicate with customers whenever I/O recovery settings are temporarily tuned.

h2. Adding a new disk/ssd

h2. Moving a disk/ssd to another server

h2. Removing a disk/ssd

h2. Handling DOWN osds with filesystem errors

If an email arrives with the subject "monit alert -- Does not exist osd.XX-whoami", the filesystem of an OSD can no longer be read. It is very likely that the disk / ssd is broken. The following steps need to be done:

* Log in to any ceph monitor (cephX.placeY.ungleich.ch)
* Check **ceph -s** and find the host using **ceph osd tree**
* Log in to the affected host
* Run the following commands:
** ls /var/lib/ceph/osd/ceph-XX
** dmesg
* Create a new ticket in the datacenter light project
** Subject: "Replace broken OSD.XX on serverX.placeY.ungleich.ch"
** Add (partial) output of the above commands
** Use /opt/ungleich-tools/ceph-osd-stop-remove-permanently XX, where XX is the osd id, to remove the disk from the cluster
** Remove the physical disk from the host and check whether it is still under warranty. If yes:
*** Write a short letter to the vendor, including the technical details from above
*** Record when you sent it in
*** Put the ticket into status waiting
** If there is no warranty, dispose of it
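
The two diagnostic commands from the list above can be collected in a small helper that is run on the affected host. This is a minimal sketch and not part of ungleich-tools: the function names *osd_data_dir* and *osd_collect_diagnostics* are hypothetical, and keeping the last 50 lines of dmesg is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helpers; run on the host that carries the broken OSD.

# Print the data directory of the OSD with the given id.
osd_data_dir() {
    printf '/var/lib/ceph/osd/ceph-%s\n' "$1"
}

# Collect the output that is attached to the ticket.
osd_collect_diagnostics() {
    id="$1"
    dir="$(osd_data_dir "$id")"
    echo "== ls $dir =="
    ls "$dir" 2>&1              # errors are kept, they are the interesting part
    echo "== dmesg (last 50 lines) =="
    dmesg 2>&1 | tail -n 50
}

# Example: osd_collect_diagnostics 12 > osd-12-diagnostics.txt
```

The resulting file can be pasted (in part) into the ticket.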

h2. Change ceph speed for i/o recovery

By default, we keep I/O recovery traffic low so that it does not impact customer experience. However, when multiple disks fail at the same time, we might want to prioritise recovery for data safety over performance.

The default configuration on our servers contains:

<pre>
[osd]
osd max backfills = 1
osd recovery max active = 1
osd recovery op priority = 2
</pre>

The important settings are *osd max backfills* and *osd recovery max active*. The priority is always kept low so that regular I/O takes precedence.

To adjust the number of backfills *per osd* and the *number of active recovery requests* per osd, run the following on any node with the admin keyring:

<pre>
ceph tell osd.* injectargs '--osd-max-backfills Y'
ceph tell osd.* injectargs '--osd-recovery-max-active X'
</pre>

Here Y and X are the values that we want to use. Experience shows that X=5 and Y=5 doubles to triples recovery performance, whereas X=10 and Y=10 increases it fivefold.
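
Because the tuning is temporary, it helps to raise and restore both values with one call. The following is a minimal sketch, assuming the two *injectargs* commands shown above; the function names *run* and *set_recovery_speed* and the *DRY_RUN* variable are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the two injectargs calls.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: set_recovery_speed BACKFILLS ACTIVE
set_recovery_speed() {
    backfills="$1"
    active="$2"
    # 'osd.*' is quoted so that the shell hands it to ceph unexpanded.
    run ceph tell 'osd.*' injectargs "--osd-max-backfills $backfills"
    run ceph tell 'osd.*' injectargs "--osd-recovery-max-active $active"
}
```

For example, *set_recovery_speed 5 5* speeds up recovery, and *set_recovery_speed 1 1* returns to the defaults from the configuration above once the cluster is healthy again.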

h2. Debug scrub errors / inconsistent pg message

From time to time, disks don't save what they are told to save. Ceph scrubbing detects these errors, and the cluster switches to HEALTH_ERR. Use *ceph health detail* to find out which placement groups (*pgs*) are affected. Usually a *ceph pg repair <number>* fixes the problem.

If this does not help, consult https://ceph.com/geen-categorie/ceph-manually-repair-object/.
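
Finding the affected placement groups can be scripted. This is a minimal sketch, assuming that *ceph health detail* reports each damaged pg on a line of the form "pg <id> is ...inconsistent..."; the exact output differs between ceph releases, so check it against your cluster first. The function name *inconsistent_pgs* is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `ceph health detail` output on stdin and
# print the id of every placement group reported as inconsistent.
# Assumes lines that start with "pg <id>" and mention "inconsistent".
inconsistent_pgs() {
    awk '$1 == "pg" && /inconsistent/ { print $2 }'
}

# Example (run on a node with the admin keyring):
#   ceph health detail | inconsistent_pgs | while read -r pg; do
#       ceph pg repair "$pg"
#   done
```

Review the printed list before repairing, as repeated scrub errors on the same disk may point to failing hardware.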
