The ungleich hardware maintenance guide » History » Version 28

Nico Schottelius, 01/27/2025 01:20 PM

{{toc}}

h1. The ungleich hardware maintenance guide

This guide describes common operations on hardware we use.

h2. Using the ungleich-hardware container in kubernetes and docker

To manage hardware on server1 in kubernetes, you can use:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "server1"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>

To use it with docker:

<pre>
docker run -v /dev:/dev --privileged -ti harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.5
</pre>

h2. APU Bios Update

* Download the correct BIOS from https://pcengines.github.io/
** Check whether the board is an apu1, apu2, apu3 or apu4 before downloading
* Install flashrom
* "Flash the BIOS using flashrom":https://github.com/pcengines/apu2-documentation/blob/master/docs/firmware_flashing.md
** @flashrom -w THEROMFILE -p internal@

h2. APU Serial and bootloader configuration

* Ensure that the bootloader passes @console=ttyS0,115200@ to the kernel
* Ensure that there is a getty running on the serial console
* Use grub-bios as the bootloader
** Install using @grub-install /dev/sda@
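
The serial settings listed above could look roughly like this in @/etc/default/grub@ on a distribution that generates its configuration with grub-mkconfig. This is only a sketch, not the configuration actually deployed on our APUs: the variable names are the standard GRUB ones, but serial unit 0 and the file location are assumptions to adapt.

```shell
# /etc/default/grub (sketch; values are assumptions, adapt them to the board)

# Kernel side: send the Linux console to the first serial port.
GRUB_CMDLINE_LINUX="console=ttyS0,115200"

# Bootloader side: show the GRUB menu on the same serial port.
GRUB_TERMINAL="serial"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"

# After editing, regenerate the configuration (the output path varies by distribution):
# grub-mkconfig -o /boot/grub/grub.cfg
```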

h2. Updating the Perc H800 SAS controller

* @wget 'https://dl.dell.com/FOLDER03292738M/3/SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN?uid=4b8a2506-f4d4-46a9-ab19-3c2a5008a782&fn=SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN' -O SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @chmod u+x SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @./SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* https://www.dell.com/support/product-details/de-de/product/poweredge-rc-h800/drivers
* Link above is dead, use attachment:RAID_FRMW_LX_R287840.BIN
** And the extracted rom is attachment:FW1062E.rom

h3. Container notes

* The firmware updater does not run on Debian (and thus not in ungleich-hardware based containers)

<pre>
docker run -ti --privileged -v /dev:/dev --rm --name fwupdate redhat/ubi8
docker cp ./RAID_FRMW_LX_R287840.BIN fwupdate:/tmp
...
</pre>

Issues seen with the redhat/ubi8 image:

<pre>
Lesser General Public License, Version 2.1, February 1999.  Under these GNU licenses, you are also entitled to obtain
Collecting inventory...
/tmp/RAID_FRMW_LX_R287840.BIN-18-3462/spsetup.sh: line 888: ./sasdupie: No such file or directory
/tmp/RAID_FRMW_LX_R287840.BIN-18-3462/spsetup.sh: line 897: ps: command not found

Inventory collection failed.
</pre>

h2. HP servers disk management

* See also https://www.thegeekstuff.com/2014/07/hpacucli-examples/

Required kernel modules:

<pre>
sg
cciss
</pre>
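
A small check along the following lines can tell whether both modules are loaded before calling hpacucli. It is only a sketch: the function name @missing_modules@ is made up for this guide, and it assumes the usual @/proc/modules@ layout with the module name as the first field of each line.

```shell
#!/bin/sh
# missing_modules FILE MODULE...
# Print every MODULE that is not listed in FILE (normally /proc/modules).
# Hypothetical helper, not part of any package.
# Note: drivers compiled into the kernel never appear in /proc/modules,
# so they are reported as missing even though they work.
missing_modules() {
    modules_file=$1
    shift
    for module in "$@"; do
        # The module name is the first whitespace-separated field.
        if ! awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "$module"
        fi
    done
}

# Usage on a real host (needs root, so only shown as a comment):
# for m in $(missing_modules /proc/modules sg cciss); do modprobe "$m"; done
```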

Show all drives/controller overview:

<pre>
hpacucli ctrl all show config

hpacucli ctrl slot=0 pd all show
</pre>
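
To spot failed drives quickly, the status listing can be filtered for anything that is not OK. The sketch below assumes the line format of @hpacucli ctrl slot=0 pd all show status@ as quoted in the cached article further down (@physicaldrive ID (...): STATUS@). The function name @failed_drives@ is made up here, and other hpacucli versions may print a different format.

```shell
#!/bin/sh
# failed_drives: read "pd all show status" output on stdin and print the
# ID and status of every physical drive whose status is not "OK".
# Hypothetical helper; it assumes lines of the form
#   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 146 GB): OK
failed_drives() {
    awk '$1 == "physicaldrive" {
        status = $0
        sub(/^.*\): */, "", status)   # keep only the text after the closing "):"
        if (status != "OK") print $2, status
    }'
}

# Usage on a real host (slot 0 is an assumption):
# hpacucli ctrl slot=0 pd all show status | failed_drives
```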

Add a disk as raid0:

<pre>
hpacucli ctrl slot=0 create type=ld drives=1I:1:1 raid=0
</pre>

Deleting a logical drive (replace X with the logical drive number):

<pre>
hpacucli ctrl slot=0 ld X delete
</pre>

Copy of https://www.thegeekstuff.com/2014/07/hpacucli-examples/ (kept here mainly as a cache):

<pre>
1. Two ways to execute the command

When you type the command hpacucli, it will display a “=>” prompt as shown below where you can enter all the hpacucli commands explained in the article.

# hpacucli
HP Array Configuration Utility CLI 9.20.9.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.
=> rescan

Or, if you don’t want to get to the hpacucli prompt, you can just enter the following directly in the Linux prompt. The following is exactly same as the above.

# hpacucli rescan

2. Display Controller and Disk Status

To display the detailed status of the controller and the disk status, execute the following command.

# hpacucli
=> ctrl all show config

Smart Array P410i in Slot 0 (Embedded)    (sn: 50014380101D61C0)

   array A (SAS, Unused Space: 0  MB)

      logicaldrive 1 (136.7 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)

   unassigned

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250 (WWID: 50014380101D61CF)

In this example, as shown in the above output, we have total 7 physical drives. The first RAID group RAID 1 contains 2 physical drives and the remaining physical drives are not assigned to any of the logical drives.

3. View Controller Status

To display the status of just the controller, do the following. In this example, the controller is working perfectly without any issues.

=> ctrl all show status

Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK

4. View Drive Status

To display the status of the physical drive, do the following. In this example, we have two 146GB physical drives, and 5 300GB physical drives, and all are in perfect condition.

=> ctrl slot=0 pd all show status

   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 146 GB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 146 GB): OK
   physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 300 GB): OK
   physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 300 GB): OK
   physicaldrive 2I:1:6 (port 2I:box 1:bay 6, 300 GB): OK
   physicaldrive 2I:1:7 (port 2I:box 1:bay 7, 300 GB): OK
   physicaldrive 2I:1:8 (port 2I:box 1:bay 8, 300 GB): OK

5. View Individual Drive Status

To display the detail status of a specific physical drive, do the following.

In this example, we like to know the status of “pd” (physical disk) in slot 0. The specific disk is “2I:1:6”, which we figured out from the output of the previous command.

As shown in the output below, this displays the Serial Number, Make, Model, Size and Firmware version of this specific disk. This can be very helpful during troubleshooting.

=> ctrl slot=0 pd 2I:1:6 show detail

Smart Array P410i in Slot 0 (Embedded)

   unassigned

      physicaldrive 2I:1:6
         Port: 2I
         Box: 1
         Bay: 6
         Status: OK
         Drive Type: Unassigned Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPD4
         Serial Number: EB01PC416C4C1214
         Model: HP      EG0300FBDSP
         Current Temperature (C): 38
         Maximum Temperature (C): 56
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown

6. View All Logical Drives

The following command will display all available logical drives on the system. As shown in the output below, we currently have only one logical drive in RAID 1 with total size of around 136GB.

=> ctrl slot=0 ld all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      logicaldrive 1 (136.7 GB, RAID 1, OK)

7. Create New RAID 0 Logical Drive

Execute the following command to create a new logical drive using RAID 0 option.

=> ctrl slot=0 create type=ld drives=1I:1:3 raid=0

The above command creates a logical drive with the physical drives 1I:1:3 on RAID 0 configuration in slot 0.

8. Create New RAID 1 Logical Drive

Execute the following command to create a new logical drive using RAID 1 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4 raid=1

The above command creates a logical drive with the two physical drives 1I:1:3 and 1I:1:4 on RAID 1 configuration in slot 0.

9. Create New RAID 5 Logical Drive

Execute the following command to create a new logical drive using RAID 5 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4,2I:1:6,2I:1:7,2I:1:8 raid=5

The above command creates a logical drive with the five physical drives on RAID 5 configuration in slot 0.

Once these logical drives are created, you should see the disks from the fdisk and you can format it from there and start using it.

After you create a logical drive, execute the following command to verify that the LD got created. In this example, it shows that the RAID 5 logical drive got created successfully.

=> ctrl slot=0 ld all show status

   logicaldrive 1 (136.7 GB, RAID 1): OK
   logicaldrive 2 (1.1 TB, RAID 5): OK

10. Rescan for New Devices

If you’ve added new physical hard disk, they won’t automatically show-up immediately. You have to scan for new devices as shown below.

=> rescan

11. View Detailed Logical Drive Status

To display the detailed status of the logical drive, do the following:

=> ctrl slot=0 ld 2 show

Smart Array P410i in Slot 0 (Embedded)

   array B

      Logical Drive: 2
         Size: 1.1 TB
         Fault Tolerance: RAID 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 1024 KB
         Status: OK
         Caching:  Enabled
         Parity Initialization Status: In Progress
         Unique Identifier: 600508B1001031303144363143301000
         Disk Name: /dev/cciss/c0d1
         Mount Points: None
         Logical Drive Label: A4967E2950014380101D61C008BE
         Drive Type: Data

The above shows the RAID type, the disk name assigned to the logical drive, and other information about the logical drive number 2.

12. Delete Logical Drive

To delete a logical drive with the number 2 use the below command.

=> ctrl slot=0 ld 2 delete

Warning: Deleting an array can cause other array letters to become renamed.
         E.g. Deleting array A from arrays A,B,C will result in two remaining
         arrays A,B ... not B,C

Warning: Deleting the specified device(s) will result in data being lost.
         Continue? (y/n) y

13. Add New Physical Drive to Logical Volume

To add the new drives to existing logical volume, do the following.

=> ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7

In this example, we are adding two additional drives specified above to the logical volume number 2.

14. Add Spare Disks

To add the spare disks to arrays that can be used in case of disk failures on one of the logical drives, do the following:

=> ctrl slot=0 array all add spares=2I:1:6,2I:1:7

In this example, we are adding two spare disks to the array.

15. Enable or Disable Cache

The below commands enable or disable cache for the entire slot.

=> ctrl slot=0 modify dwc=disable

=> ctrl slot=0 modify dwc=enable

16. Erase Physical Drive

Execute the following command to erase a physical drive in array B on slot 0.

=> ctrl slot=0 pd 2I:1:6 modify erase

17. Blink Physical Disk LED

To blink the LED on the physical drives for the logical drive 2, do the following. This will make the LEDs blink on all the physical drives that belongs to logical drive 2.

=> ctrl slot=0 ld 2 modify led=on

Once you know which drive belongs to logical drive 2, turn the LED blinking off as shown below.

=> ctrl slot=0 ld 2 modify led=off
</pre>

h2. Dell servers disk management (megacli)

Listing all disks:

<pre>
megacli -PDList -aALL
</pre>

Adding disks:

<pre>
# Syntax: megacli -CfgLdAdd -r0 [EnclosureDeviceID:Slot] -aX
# X selects the adapter: the host is 0, the md-array is 1

# Sample call, if enclosure and slot are KNOWN (i.e. not N/A)
megacli -CfgLdAdd -r0 [32:0] -a0

# Sample call, if enclosure is N/A
megacli -CfgLdAdd -r0 [:0] -a0
</pre>
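
The enclosure and slot values for the bracket argument can be read from the @-PDList@ listing. The following sketch turns that listing into ready-to-paste @[enclosure:slot]@ arguments. It assumes that each disk is reported with an @Enclosure Device ID:@ line followed by a @Slot Number:@ line, as megacli normally prints them; the function name @pd_args@ is invented for this example.

```shell
#!/bin/sh
# pd_args: read "megacli -PDList -aALL" output on stdin and print one
# "[enclosure:slot]" argument per physical disk. An enclosure reported as
# "N/A" is left empty, which yields "[:slot]".
# Hypothetical helper; the field labels are assumed from typical output.
pd_args() {
    awk -F': *' '
        /^Enclosure Device ID:/ { enc = ($2 == "N/A") ? "" : $2 }
        /^Slot Number:/         { printf "[%s:%s]\n", enc, $2 }
    '
}

# Usage on a real host:
# megacli -PDList -aALL | pd_args
```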

Discard the preserved cache of disks that are no longer in the server:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
</pre>

Remove foreign configurations on foreign disks:

<pre>
megacli -CfgForeign -Clear -aAll
</pre>

In many cases you need to do both:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
megacli -CfgForeign -Clear -aAll
</pre>

Growing a raid6:

<pre>
megacli -ldrecon -Start -r6 -Add -PhysDrv[12:4] -l0 -a0
</pre>

Deleting a logical drive:

<pre>
root@2157f4626763:/# megacli -CfgLdDel -L0 -a0

Adapter 0: Deleted Virtual Drive-0(target id-0)

Exit Code: 0x00
root@2157f4626763:/#
</pre>

h2. SEE ALSO

* [[Managing OpenWRT]]
* [[The_ungleich_ceph_handbook]]