The ungleich hardware maintenance guide » History » Version 17

Nico Schottelius, 02/12/2022 01:17 PM

{{toc}}

h1. The ungleich hardware maintenance guide

This guide describes common operations on the hardware we use.

h2. Using the ungleich-hardware container in kubernetes and docker

To manage hardware on server1 in kubernetes, you can use the following privileged pod, which mounts the host's /dev:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "server1"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>

To use it with docker:

<pre>
docker run -v /dev:/dev --privileged -ti harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3
</pre>

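The pod definition above is pinned to server1. To run the same container on another node, a small shell function can print the manifest with the node name substituted. This is a sketch: the function name @hw_pod_manifest@ and the per-node pod name @ungleich-hardware-<node>@ are our own choices, not part of the original setup; the rest of the manifest is copied from above.

```shell
#!/bin/sh
# Print the ungleich-hardware pod manifest for the given node.
# hw_pod_manifest is a hypothetical helper; only the node name (and, to allow
# one pod per node, the pod name) differ from the manifest on this page.
hw_pod_manifest() {
    node="$1"
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-${node}
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "${node}"
  volumes:
    - name: dev
      hostPath:
        path: /dev
EOF
}
```

For example, @hw_pod_manifest server7 | kubectl apply -f -@ creates the pod, @kubectl exec -ti ungleich-hardware-server7 -- sh@ opens a shell in it (assuming the image ships a shell), and @kubectl delete pod ungleich-hardware-server7@ removes it when you are done.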
h2. APU BIOS Update

* Download the correct BIOS from https://pcengines.github.io/
** Check whether the board is an apu1, apu2, apu3 or apu4 before downloading
* Install flashrom
* "Flash the BIOS using flashrom":https://github.com/pcengines/apu2-documentation/blob/master/docs/firmware_flashing.md
** @flashrom -w THEROMFILE -p internal@

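Before overwriting the firmware, it is prudent to keep a copy of the currently installed image. The sketch below first reads it with @flashrom -r@ and then writes the new ROM with the same invocation as in the list above. The helper names @run@ and @flash_apu_bios@ and the @DRY_RUN@ switch are hypothetical additions for illustration.

```shell
#!/bin/sh
# Back up the current BIOS, then flash a new image on a PC Engines APU.
# flash_apu_bios and run are hypothetical helpers; with DRY_RUN=1 the
# flashrom commands are only printed, not executed.
set -eu

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

flash_apu_bios() {
    rom="$1"   # image downloaded from pcengines.github.io
    backup="${2:-bios-backup-$(date +%Y%m%d%H%M%S).rom}"
    [ -f "$rom" ] || { echo "ROM file not found: $rom" >&2; return 1; }
    # Read the currently installed firmware so it can be restored later.
    run flashrom -p internal -r "$backup"
    # Write the new image, same invocation as in the list above.
    run flashrom -w "$rom" -p internal
}
```

@DRY_RUN=1 flash_apu_bios THEROMFILE@ shows what would be executed; run it as root without @DRY_RUN@ to actually flash.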
h2. APU Serial and bootloader configuration

* Ensure that the kernel command line configured in the bootloader contains @console=ttyS0,115200@
* Ensure that a getty is running on the serial console
* Use grub-bios as the bootloader
** Install it using @grub-install /dev/sda@

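With GRUB 2, these serial console settings usually live in /etc/default/grub. The fragment below is a sketch assuming GRUB 2 and the first serial port (ttyS0, unit 0) at 115200 baud, 8N1; merge these lines into the existing file rather than replacing it. Besides the kernel console, it also makes the GRUB menu itself usable over serial.

```shell
# /etc/default/grub (fragment): serial console on ttyS0 at 115200 baud.
# Assumes GRUB 2; merge into the existing file, then regenerate grub.cfg.

# Kernel console, as required in the list above.
GRUB_CMDLINE_LINUX="console=ttyS0,115200"

# Show the GRUB menu on the serial line as well.
GRUB_TERMINAL="serial"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
```

Afterwards, regenerate the configuration with @update-grub@ (Debian-based systems) or @grub-mkconfig -o /boot/grub/grub.cfg@. On systemd-based systems, running systemctl enable --now serial-getty@ttyS0.service provides the getty. Other init systems, such as Alpine's, need a getty entry for ttyS0 in /etc/inittab instead.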
h2. Updating the PERC H800 SAS controller

* @wget 'https://dl.dell.com/FOLDER03292738M/3/SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN?uid=4b8a2506-f4d4-46a9-ab19-3c2a5008a782&fn=SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN' -O SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @chmod u+x SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @./SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@

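The three steps above can be combined into one fail-fast script, so a failed download never leads to executing a partial file and the long file name is typed only once. This is a sketch: the URL and file name are copied from the list above, while the helper names and the @DRY_RUN@ switch are hypothetical.

```shell
#!/bin/sh
# Download and run the PERC H800 firmware update in one go.
# update_perc_h800 and run are hypothetical helpers; DRY_RUN=1 only prints.
set -eu

FW_FILE="SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN"
FW_URL="https://dl.dell.com/FOLDER03292738M/3/${FW_FILE}?uid=4b8a2506-f4d4-46a9-ab19-3c2a5008a782&fn=${FW_FILE}"

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

update_perc_h800() {
    run wget "$FW_URL" -O "$FW_FILE"   # set -e stops here if the download fails
    run chmod u+x "$FW_FILE"
    run "./$FW_FILE"                   # interactive Dell update package
}
```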
h2. HP servers disk management

* See also https://www.thegeekstuff.com/2014/07/hpacucli-examples/

Required kernel modules:

<pre>
sg
cciss
</pre>

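To see which of the required modules still need to be loaded, the module names can be checked against /proc/modules. This is a sketch: @missing_modules@ and @MODULES_FILE@ are hypothetical names. Drivers built into the kernel do not appear in /proc/modules, so they are reported as missing; @modprobe@ on them is harmless. On newer kernels the cciss driver may no longer exist, because hpsa has taken over its controllers.

```shell
#!/bin/sh
# Print the given kernel modules that are not listed in /proc/modules.
# missing_modules is a hypothetical helper; MODULES_FILE overrides the
# file that is read (useful for testing).
missing_modules() {
    file="${MODULES_FILE:-/proc/modules}"
    for mod in "$@"; do
        # The first field of every /proc/modules line is the module name.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$file"; then
            echo "$mod"
        fi
    done
}
```

Usage, as root: @for m in $(missing_modules sg cciss); do modprobe "$m"; done@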
Show an overview of all controllers and drives:

<pre>
hpacucli ctrl all show config

hpacucli ctrl slot=0 pd all show
</pre>

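The config overview lists the drives that are not yet part of any array under an "unassigned" header. To use those IDs in scripts, they can be extracted with awk. This is a sketch: @unassigned_drives@ is a hypothetical helper, and it assumes the layout shown in the sample output of the cached article below (an "unassigned" line followed by indented "physicaldrive <id> (...)" lines, ended by the next non-empty header such as "SEP ...").

```shell
#!/bin/sh
# Read "hpacucli ctrl all show config" on stdin and print the IDs of all
# unassigned physical drives, one per line.
# unassigned_drives is a hypothetical helper.
unassigned_drives() {
    awk '
        /^ *unassigned *$/ { inside = 1; next }
        inside && /^ *physicaldrive / { print $2; next }
        # Any other non-empty line, e.g. "SEP" or "array B", ends the section.
        inside && NF > 0 { inside = 0 }
    '
}
```

Usage: @hpacucli ctrl all show config | unassigned_drives@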
Add a disk as a RAID 0 logical drive:

<pre>
hpacucli ctrl slot=0 create type=ld drives=1I:1:1 raid=0
</pre>

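The command above turns a single disk into a RAID 0 logical drive. When several disks should each become their own RAID 0 logical drive, so that the OS sees every disk individually, the commands can be generated per drive. This is a sketch: @raid0_commands@ is a hypothetical helper that only prints the commands, so they can be reviewed before piping them into @sh@.

```shell
#!/bin/sh
# Print one hpacucli "create ... raid=0" command per drive ID.
# Usage: raid0_commands SLOT DRIVE...
# raid0_commands is a hypothetical helper; it does not execute anything.
raid0_commands() {
    slot="$1"
    shift
    for drive in "$@"; do
        echo "hpacucli ctrl slot=${slot} create type=ld drives=${drive} raid=0"
    done
}
```

For example, @raid0_commands 0 1I:1:3 1I:1:4@ prints two commands; append @| sh@ once the output looks right.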
Copied from https://www.thegeekstuff.com/2014/07/hpacucli-examples/ (mainly to keep a cached copy):

<pre>
1. Two ways to execute the command

When you type the command hpacucli, it will display a “=>” prompt as shown below where you can enter all the hpacucli commands explained in the article.

# hpacucli
HP Array Configuration Utility CLI 9.20.9.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.
=> rescan

Or, if you don’t want to get to the hpacucli prompt, you can just enter the following directly in the Linux prompt. The following is exactly the same as the above.

# hpacucli rescan

2. Display Controller and Disk Status

To display the detailed status of the controller and the disk status, execute the following command.

# hpacucli
=> ctrl all show config

Smart Array P410i in Slot 0 (Embedded)    (sn: 50014380101D61C0)

   array A (SAS, Unused Space: 0  MB)

      logicaldrive 1 (136.7 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)

   unassigned

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250 (WWID: 50014380101D61CF)

In this example, as shown in the above output, we have a total of 7 physical drives. The first RAID group (RAID 1) contains 2 physical drives, and the remaining physical drives are not assigned to any logical drive.

3. View Controller Status

To display the status of just the controller, do the following. In this example, the controller is working perfectly without any issues.

=> ctrl all show status

Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK

4. View Drive Status

To display the status of the physical drives, do the following. In this example, we have two 146 GB physical drives and five 300 GB physical drives, and all are in perfect condition.

=> ctrl slot=0 pd all show status

   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 146 GB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 146 GB): OK
   physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 300 GB): OK
   physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 300 GB): OK
   physicaldrive 2I:1:6 (port 2I:box 1:bay 6, 300 GB): OK
   physicaldrive 2I:1:7 (port 2I:box 1:bay 7, 300 GB): OK
   physicaldrive 2I:1:8 (port 2I:box 1:bay 8, 300 GB): OK

5. View Individual Drive Status

To display the detailed status of a specific physical drive, do the following.

In this example, we'd like to know the status of the “pd” (physical disk) “2I:1:6” in slot 0, an ID we took from the output of the previous command.

As shown in the output below, this displays the serial number, make, model, size and firmware version of this specific disk. This can be very helpful during troubleshooting.

=> ctrl slot=0 pd 2I:1:6 show detail

Smart Array P410i in Slot 0 (Embedded)

   unassigned

      physicaldrive 2I:1:6
         Port: 2I
         Box: 1
         Bay: 6
         Status: OK
         Drive Type: Unassigned Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPD4
         Serial Number: EB01PC416C4C1214
         Model: HP      EG0300FBDSP
         Current Temperature (C): 38
         Maximum Temperature (C): 56
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown

6. View All Logical Drives

The following command displays all available logical drives on the system. As shown in the output below, we currently have only one logical drive, in RAID 1, with a total size of around 136 GB.

=> ctrl slot=0 ld all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      logicaldrive 1 (136.7 GB, RAID 1, OK)

7. Create New RAID 0 Logical Drive

Execute the following command to create a new logical drive using the RAID 0 option.

=> ctrl slot=0 create type=ld drives=1I:1:3 raid=0

The above command creates a logical drive from the physical drive 1I:1:3 in a RAID 0 configuration in slot 0.

8. Create New RAID 1 Logical Drive

Execute the following command to create a new logical drive using the RAID 1 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4 raid=1

The above command creates a logical drive from the two physical drives 1I:1:3 and 1I:1:4 in a RAID 1 configuration in slot 0.

9. Create New RAID 5 Logical Drive

Execute the following command to create a new logical drive using the RAID 5 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4,2I:1:6,2I:1:7,2I:1:8 raid=5

The above command creates a logical drive from the five physical drives in a RAID 5 configuration in slot 0.

Once these logical drives are created, you should see the disks in fdisk, and you can format them from there and start using them.

After you create a logical drive, execute the following command to verify that the LD was created. In this example, it shows that the RAID 5 logical drive was created successfully.

=> ctrl slot=0 ld all show status

   logicaldrive 1 (136.7 GB, RAID 1): OK
   logicaldrive 2 (1.1 TB, RAID 5): OK

10. Rescan for New Devices

If you’ve added new physical hard disks, they won’t show up automatically. You have to scan for new devices as shown below.

=> rescan

11. View Detailed Logical Drive Status

To display the detailed status of the logical drive, do the following:

=> ctrl slot=0 ld 2 show

Smart Array P410i in Slot 0 (Embedded)

   array B

      Logical Drive: 2
         Size: 1.1 TB
         Fault Tolerance: RAID 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 1024 KB
         Status: OK
         Caching:  Enabled
         Parity Initialization Status: In Progress
         Unique Identifier: 600508B1001031303144363143301000
         Disk Name: /dev/cciss/c0d1
         Mount Points: None
         Logical Drive Label: A4967E2950014380101D61C008BE
         Drive Type: Data

The above shows the RAID type, the disk name assigned to the logical drive, and other information about logical drive number 2.

12. Delete Logical Drive

To delete logical drive number 2, use the command below.

=> ctrl slot=0 ld 2 delete

Warning: Deleting an array can cause other array letters to become renamed.
         E.g. Deleting array A from arrays A,B,C will result in two remaining
         arrays A,B ... not B,C

Warning: Deleting the specified device(s) will result in data being lost.
         Continue? (y/n) y

13. Add New Physical Drive to Logical Volume

To add new drives to an existing logical volume, do the following.

=> ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7

In this example, we are adding the two drives specified above to logical volume number 2.

14. Add Spare Disks

To add spare disks to the arrays, to be used if a disk in one of the logical drives fails, do the following:

=> ctrl slot=0 array all add spares=2I:1:6,2I:1:7

In this example, we are adding two spare disks to the array.

15. Enable or Disable Cache

The commands below disable or enable the drive write cache (dwc) for the entire slot.

=> ctrl slot=0 modify dwc=disable

=> ctrl slot=0 modify dwc=enable

16. Erase Physical Drive

Execute the following command to erase a physical drive in array B on slot 0.

=> ctrl slot=0 pd 2I:1:6 modify erase

17. Blink Physical Disk LED

To blink the LEDs on the physical drives of logical drive 2, do the following. This makes the LEDs blink on all the physical drives that belong to logical drive 2.

=> ctrl slot=0 ld 2 modify led=on

Once you know which drives belong to logical drive 2, turn the LED blinking off as shown below.

=> ctrl slot=0 ld 2 modify led=off
</pre>

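For monitoring, for example from cron, the per-drive status output from section 4 of the cached article above can be filtered for drives that are not "OK". This is a sketch: @failed_drives@ is a hypothetical helper, and it assumes the "physicaldrive <id> (...): <status>" layout shown there. It prints the affected drives and exits with status 1 if any were found.

```shell
#!/bin/sh
# Read "hpacucli ctrl slot=0 pd all show status" on stdin, print every
# drive whose status is not OK, and exit 1 if there was at least one.
# failed_drives is a hypothetical helper.
failed_drives() {
    awk '
        /^ *physicaldrive / {
            status = $0
            sub(/.*\): */, "", status)   # keep only the text after "):"
            if (status != "OK") { print $2 ": " status; bad = 1 }
        }
        END { exit bad }
    '
}
```

Usage: @hpacucli ctrl slot=0 pd all show status | failed_drives || echo "check the disks"@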
h2. Dell servers disk management

Listing all disks:

<pre>
megacli -PDList -aALL
</pre>

Adding disks (each as a RAID 0 logical drive):

<pre>
megacli -CfgLdAdd -r0 [Enclosure Device ID:slot] -aX (X is the adapter number: 0 for the host's controller, 1 for the MD array)

# Sample call, if enclosure and slot are KNOWN (i.e. not N/A)
megacli -CfgLdAdd -r0 [32:0] -a0

# Sample call, if enclosure is N/A
megacli -CfgLdAdd -r0 [:0] -a0
</pre>

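The @[Enclosure Device ID:slot]@ argument can be derived from the @-PDList@ output. This is a sketch: @unconfigured_pds@ is a hypothetical helper that assumes the "Enclosure Device ID:", "Slot Number:" and "Firmware state:" fields appear in that order for every drive, and it prints the argument only for drives in state "Unconfigured(good)", using the @[:slot]@ form when the enclosure is N/A.

```shell
#!/bin/sh
# Read "megacli -PDList -aALL" on stdin and print "[enclosure:slot]" for
# every physical drive whose firmware state is Unconfigured(good).
# unconfigured_pds is a hypothetical helper.
unconfigured_pds() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }   # drop trailing whitespace, re-split fields
        /^Enclosure Device ID:/ { enc = ($2 == "N/A") ? "" : $2 }
        /^Slot Number:/         { slot = $2 }
        /^Firmware state:/ && $2 ~ /^Unconfigured\(good\)/ { print "[" enc ":" slot "]" }
    '
}
```

Quote the result when passing it on, e.g. @megacli -CfgLdAdd -r0 "[32:1]" -a0@, because an unquoted @[32:1]@ is also a shell glob pattern and would be replaced if a matching single-character file name exists in the current directory.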
Discard the preserved cache of disks that are no longer in the server:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
</pre>

Clear foreign configurations from foreign disks:

<pre>
megacli -CfgForeign -Clear -aAll
</pre>

In many cases, you need to do both:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
megacli -CfgForeign -Clear -aAll
</pre>

h2. SEE ALSO

* [[Managing OpenWRT]]
* [[The_ungleich_ceph_handbook]]