The ungleich hardware maintenance guide » History » Version 29

Nico Schottelius, 01/27/2025 01:21 PM

{{toc}}

h1. The ungleich hardware maintenance guide

This guide describes common operations on the hardware we use.

h2. Using the ungleich-hardware container in kubernetes and docker

To manage hardware on server1 in kubernetes, you can use:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "server1"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>

To use it with docker:

<pre>
docker run -v /dev:/dev --privileged -ti harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.5
</pre>
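
The pod manifest above is pinned to server1. As a minimal sketch, the shell function below prints the same manifest for any other node. The function name @hardware_pod_manifest@ and its arguments are made up for illustration; only the manifest content is taken from this guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of the ungleich tooling): print the pod
# manifest from this guide for an arbitrary node.
# $1 = node hostname as used in the kubernetes.io/hostname label
# $2 = container image (optional, defaults to the image used in this guide)
hardware_pod_manifest() {
    node=$1
    image=${2:-harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.5}
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: ${image}
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "${node}"
  volumes:
    - name: dev
      hostPath:
        path: /dev
EOF
}

# Usage (needs kubectl access to the cluster; assumes the image ships a shell):
#   hardware_pod_manifest server2 | kubectl apply -f -
#   kubectl exec -ti ungleich-hardware -- sh
#   kubectl delete pod ungleich-hardware
```

Because the pod name stays @ungleich-hardware@, only one such pod can exist per namespace at a time; delete it before moving on to the next node.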

h2. APU BIOS Update

* Download the correct BIOS from https://pcengines.github.io/
** Check whether the board is an apu1, apu2, apu3 or apu4 before downloading
* Install flashrom
* "Flash the BIOS using flashrom":https://github.com/pcengines/apu2-documentation/blob/master/docs/firmware_flashing.md
** @flashrom -w THEROMFILE -p internal@
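
The flashing step above can be wrapped as sketched below. Reading a backup of the current flash content before writing is an addition that this guide does not prescribe, and the function name and the backup file name are hypothetical. With @DRY_RUN=1@ the commands are only printed.

```shell
#!/bin/sh
# Sketch: back up the current flash content, then write the new ROM.
# $1 = ROM file to flash
# $2 = backup file name (optional, hypothetical default)
# Set DRY_RUN=1 to only print the flashrom commands.
flash_apu_bios() {
    rom=$1
    backup=${2:-apu-bios-backup.rom}
    if [ ! -f "$rom" ]; then
        echo "ROM file not found: $rom" >&2
        return 1
    fi
    # Expands to "echo" when DRY_RUN is set and non-empty, else to nothing.
    run=${DRY_RUN:+echo}
    # Assumption: keep a copy of the current flash content as a way back.
    $run flashrom -p internal -r "$backup" || return 1
    # Write the new ROM, as described in this guide.
    $run flashrom -w "$rom" -p internal
}

# Usage:
#   DRY_RUN=1 flash_apu_bios THEROMFILE
#   flash_apu_bios THEROMFILE
```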

h2. APU Serial and bootloader configuration

* Ensure that the bootloader passes @console=ttyS0,115200@ to the kernel
* Ensure that a getty is running on the serial console
* Use grub-bios as the bootloader
** Install using @grub-install /dev/sda@
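
For GRUB, the serial settings from the list above usually live in @/etc/default/grub@. The fragment below is a sketch that assumes the console is the first serial port (unit 0, ttyS0) at 115200 baud. The variable names are standard GRUB ones, but the file location and the way the configuration is regenerated depend on the distribution.

```shell
# /etc/default/grub (fragment)
# Kernel console on the first serial port; merge this with any options
# that are already set in GRUB_CMDLINE_LINUX.
GRUB_CMDLINE_LINUX="console=ttyS0,115200"
# Show the GRUB menu itself on the serial port as well.
GRUB_TERMINAL="serial"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"
```

After changing the file, the GRUB configuration has to be regenerated, typically with @grub-mkconfig -o /boot/grub/grub.cfg@ (the output path may differ).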

h2. Updating the Perc H800 SAS controller

* @wget 'https://dl.dell.com/FOLDER03292738M/3/SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN?uid=4b8a2506-f4d4-46a9-ab19-3c2a5008a782&fn=SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN' -O SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @chmod u+x SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @./SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* https://www.dell.com/support/product-details/de-de/product/poweredge-rc-h800/drivers
* The link above is dead, use attachment:RAID_FRMW_LX_R287840.BIN instead
** The extracted ROM is attachment:FW1062E.rom

h3. Container usage

* Use the ROM from above
* Flash it with megacli
* Reboot afterwards

<pre>
megacli -adpfwflash -f FW1062E.rom -a0
</pre>

h3. Container notes

* The firmware updater does not run on Debian (and thus not in ungleich-hardware based containers)

<pre>
docker run -ti --privileged -v /dev:/dev --rm --name fwupdate redhat/ubi8
docker cp ./RAID_FRMW_LX_R287840.BIN fwupdate:/tmp
...
</pre>

Issues with redhat/ubi8:

<pre>
Lesser General Public License, Version 2.1, February 1999.  Under these GNU licenses, you are also entitled to obtain
Collecting inventory...
/tmp/RAID_FRMW_LX_R287840.BIN-18-3462/spsetup.sh: line 888: ./sasdupie: No such file or directory
/tmp/RAID_FRMW_LX_R287840.BIN-18-3462/spsetup.sh: line 897: ps: command not found

Inventory collection failed.
</pre>

h2. HP servers disk management

* See also https://www.thegeekstuff.com/2014/07/hpacucli-examples/

Required kernel modules:

<pre>
sg
cciss
</pre>
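
Whether the required modules are loaded can be checked against @/proc/modules@. The function below is a small sketch; its name is hypothetical, and modules that are compiled into the kernel do not show up in @/proc/modules@, so they are reported as missing even though they are available.

```shell
#!/bin/sh
# Print every module name from the argument list that is not listed in
# the given modules file (normally /proc/modules).
# $1 = modules file, remaining arguments = module names to check
missing_modules() {
    modules_file=$1
    shift
    for mod in "$@"; do
        # The first column of /proc/modules is the module name.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$mod"
        fi
    done
}

# Usage: load whatever is missing
#   for mod in $(missing_modules /proc/modules sg cciss); do modprobe "$mod"; done
```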

Show all drives/controller overview:

<pre>
hpacucli ctrl all show config

hpacucli ctrl slot=0 pd all show
</pre>

Add a disk as raid0:

<pre>
hpacucli ctrl slot=0 create type=ld drives=1I:1:1 raid=0
</pre>

Deleting a logical drive (replace X with the logical drive number):

<pre>
hpacucli ctrl slot=0 ld X delete
</pre>

Copy of https://www.thegeekstuff.com/2014/07/hpacucli-examples/ (mainly to cache it):

<pre>
1. Two ways to execute the command

When you type the command hpacucli, it will display a “=>” prompt as shown below where you can enter all the hpacucli commands explained in the article.

# hpacucli
HP Array Configuration Utility CLI 9.20.9.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.
=> rescan

Or, if you don’t want to get to the hpacucli prompt, you can just enter the following directly in the Linux prompt. The following is exactly the same as the above.

# hpacucli rescan

2. Display Controller and Disk Status

To display the detailed status of the controller and the disk status, execute the following command.

# hpacucli
=> ctrl all show config

Smart Array P410i in Slot 0 (Embedded)    (sn: 50014380101D61C0)

   array A (SAS, Unused Space: 0  MB)

      logicaldrive 1 (136.7 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)

   unassigned

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250 (WWID: 50014380101D61CF)

In this example, as shown in the above output, we have a total of 7 physical drives. The first RAID group RAID 1 contains 2 physical drives and the remaining physical drives are not assigned to any of the logical drives.

3. View Controller Status

To display the status of just the controller, do the following. In this example, the controller is working perfectly without any issues.

=> ctrl all show status

Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK

4. View Drive Status

To display the status of the physical drives, do the following. In this example, we have two 146GB physical drives, and 5 300GB physical drives, and all are in perfect condition.

=> ctrl slot=0 pd all show status

   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 146 GB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 146 GB): OK
   physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 300 GB): OK
   physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 300 GB): OK
   physicaldrive 2I:1:6 (port 2I:box 1:bay 6, 300 GB): OK
   physicaldrive 2I:1:7 (port 2I:box 1:bay 7, 300 GB): OK
   physicaldrive 2I:1:8 (port 2I:box 1:bay 8, 300 GB): OK

5. View Individual Drive Status

To display the detailed status of a specific physical drive, do the following.

In this example, we would like to know the status of “pd” (physical disk) in slot 0. The specific disk is “2I:1:6”, which we figured out from the output of the previous command.

As shown in the output below, this displays the Serial Number, Make, Model, Size and Firmware version of this specific disk. This can be very helpful during troubleshooting.

=> ctrl slot=0 pd 2I:1:6 show detail

Smart Array P410i in Slot 0 (Embedded)

   unassigned

      physicaldrive 2I:1:6
         Port: 2I
         Box: 1
         Bay: 6
         Status: OK
         Drive Type: Unassigned Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPD4
         Serial Number: EB01PC416C4C1214
         Model: HP      EG0300FBDSP
         Current Temperature (C): 38
         Maximum Temperature (C): 56
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown

6. View All Logical Drives

The following command will display all available logical drives on the system. As shown in the output below, we currently have only one logical drive in RAID 1 with a total size of around 136GB.

=> ctrl slot=0 ld all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      logicaldrive 1 (136.7 GB, RAID 1, OK)

7. Create New RAID 0 Logical Drive

Execute the following command to create a new logical drive using the RAID 0 option.

=> ctrl slot=0 create type=ld drives=1I:1:3 raid=0

The above command creates a logical drive with the physical drive 1I:1:3 in a RAID 0 configuration in slot 0.

8. Create New RAID 1 Logical Drive

Execute the following command to create a new logical drive using the RAID 1 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4 raid=1

The above command creates a logical drive with the two physical drives 1I:1:3 and 1I:1:4 in a RAID 1 configuration in slot 0.

9. Create New RAID 5 Logical Drive

Execute the following command to create a new logical drive using the RAID 5 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4,2I:1:6,2I:1:7,2I:1:8 raid=5

The above command creates a logical drive with the five physical drives in a RAID 5 configuration in slot 0.

Once these logical drives are created, you should see the disks in fdisk and can format them from there and start using them.

After you create a logical drive, execute the following command to verify that the LD got created. In this example, it shows that the RAID 5 logical drive got created successfully.

=> ctrl slot=0 ld all show status

   logicaldrive 1 (136.7 GB, RAID 1): OK
   logicaldrive 2 (1.1 TB, RAID 5): OK

10. Rescan for New Devices

If you’ve added new physical hard disks, they won’t show up automatically. You have to scan for new devices as shown below.

=> rescan

11. View Detailed Logical Drive Status

To display the detailed status of the logical drive, do the following:

=> ctrl slot=0 ld 2 show

Smart Array P410i in Slot 0 (Embedded)

   array B

      Logical Drive: 2
         Size: 1.1 TB
         Fault Tolerance: RAID 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 1024 KB
         Status: OK
         Caching:  Enabled
         Parity Initialization Status: In Progress
         Unique Identifier: 600508B1001031303144363143301000
         Disk Name: /dev/cciss/c0d1
         Mount Points: None
         Logical Drive Label: A4967E2950014380101D61C008BE
         Drive Type: Data

The above shows the RAID type, the disk name assigned to the logical drive, and other information about the logical drive number 2.

12. Delete Logical Drive

To delete the logical drive with the number 2, use the command below.

=> ctrl slot=0 ld 2 delete

Warning: Deleting an array can cause other array letters to become renamed.
         E.g. Deleting array A from arrays A,B,C will result in two remaining
         arrays A,B ... not B,C

Warning: Deleting the specified device(s) will result in data being lost.
         Continue? (y/n) y

13. Add New Physical Drive to Logical Volume

To add new drives to an existing logical volume, do the following.

=> ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7

In this example, we are adding the two additional drives specified above to the logical volume number 2.

14. Add Spare Disks

To add spare disks to arrays, which can be used in case of disk failures on one of the logical drives, do the following:

=> ctrl slot=0 array all add spares=2I:1:6,2I:1:7

In this example, we are adding two spare disks to the array.

15. Enable or Disable Cache

The commands below enable or disable the cache for the entire slot.

=> ctrl slot=0 modify dwc=disable

=> ctrl slot=0 modify dwc=enable

16. Erase Physical Drive

Execute the following command to erase a physical drive in array B on slot 0.

=> ctrl slot=0 pd 2I:1:6 modify erase

17. Blink Physical Disk LED

To blink the LED on the physical drives for the logical drive 2, do the following. This will make the LEDs blink on all the physical drives that belong to logical drive 2.

=> ctrl slot=0 ld 2 modify led=on

Once you know which drives belong to logical drive 2, turn the LED blinking off as shown below.

=> ctrl slot=0 ld 2 modify led=off
</pre>
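
When many disks have to be added as single-disk raid0 drives, the unassigned drives can be picked out of the @ctrl all show config@ output shown above. The following is a sketch with hypothetical function names. It only prints the @create@ commands, so they can be reviewed before being run, and it assumes the output layout shown in the cached article.

```shell
#!/bin/sh
# Print the IDs of all physical drives listed under "unassigned" in the
# output of `hpacucli ctrl all show config` (read from stdin).
list_unassigned_drives() {
    awk '
        # The "unassigned" header stands alone on its line.
        $1 == "unassigned" && NF == 1 { in_unassigned = 1; next }
        # A new controller, array or SEP line ends the unassigned section.
        $1 == "Smart" || $1 == "array" || $1 == "SEP" { in_unassigned = 0 }
        in_unassigned && $1 == "physicaldrive" { print $2 }
    '
}

# Print (not run) one "create raid0" command per drive ID read from stdin.
# $1 = controller slot, defaults to 0 as in the examples of this guide.
print_raid0_commands() {
    slot=${1:-0}
    while read -r drive; do
        echo "hpacucli ctrl slot=$slot create type=ld drives=$drive raid=0"
    done
}

# Usage:
#   hpacucli ctrl all show config | list_unassigned_drives | print_raid0_commands 0
```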

h2. Dell servers disk management (megacli)

Listing all disks:

<pre>
megacli -PDList -aALL
</pre>

Adding disks:

<pre>
# Syntax; X is the adapter number (the host is 0, the md-array is 1)
megacli -CfgLdAdd -r0 [Enclosure Device ID:slot] -aX

# Sample call, if enclosure and slot are KNOWN (aka not N/A)
megacli -CfgLdAdd -r0 [32:0] -a0

# Sample call, if enclosure is N/A
megacli -CfgLdAdd -r0 [:0] -a0
</pre>
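
The @[Enclosure Device ID:slot]@ values can be derived from the @-PDList@ output. The sketch below assumes that every disk is reported with an @Enclosure Device ID:@ line followed by a @Slot Number:@ line; the function name is hypothetical. An enclosure of @N/A@ is turned into the empty form used in the second sample call above.

```shell
#!/bin/sh
# Turn `megacli -PDList -aALL` output (stdin) into [enclosure:slot] IDs.
# Assumes each disk has an "Enclosure Device ID: ..." line followed by a
# "Slot Number: ..." line.
pdlist_to_drive_ids() {
    awk -F': *' '
        { sub(/[ \t\r]+$/, "") }   # drop trailing whitespace
        $1 == "Enclosure Device ID" { enc = ($2 == "N/A") ? "" : $2 }
        $1 == "Slot Number"         { print "[" enc ":" $2 "]" }
    '
}

# Usage:
#   megacli -PDList -aALL | pdlist_to_drive_ids
# Then, per disk (quoted, so the shell does not treat [..] as a glob):
#   megacli -CfgLdAdd -r0 '[32:0]' -a0
```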

Discard the preserved cache of disks that are no longer in the server:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
</pre>

Remove foreign configurations from foreign disks:

<pre>
megacli -CfgForeign -Clear -aAll
</pre>

In many cases you need to do both:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
megacli -CfgForeign -Clear -aAll
</pre>

Growing a raid6:

<pre>
megacli -ldrecon -Start -r6 -Add -PhysDrv[12:4] -l0 -a0
</pre>

Deleting a logical drive:

<pre>
root@2157f4626763:/# megacli -CfgLdDel -L0 -a0

Adapter 0: Deleted Virtual Drive-0(target id-0)

Exit Code: 0x00
root@2157f4626763:/#
</pre>

h2. SEE ALSO

* [[Managing OpenWRT]]
* [[The_ungleich_ceph_handbook]]