The ungleich hardware maintenance guide » History » Version 27

Nico Schottelius, 01/27/2025 01:17 PM

{{toc}}

h1. The ungleich hardware maintenance guide

This guide describes common maintenance operations on the hardware we use.

h2. Using the ungleich-hardware container in Kubernetes and Docker

To manage the hardware of server1 from Kubernetes, you can use the following Pod:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "server1"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>

To use it with Docker:

<pre>
docker run -v /dev:/dev --privileged -ti harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.5
</pre>
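
The Pod above is pinned to server1. To run the same container on another node, the manifest can be rendered by a small shell function. This is a sketch: the function name @hw_pod_manifest@, the per-node pod name @ungleich-hardware-NODE@ and the optional image tag argument are assumptions of this example, not something the image provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ungleich-hardware Pod manifest from this
# guide for an arbitrary node. Usage: hw_pod_manifest NODE [TAG]
# The node name is appended to the pod name (an assumption of this sketch)
# so that pods on different nodes do not collide.
hw_pod_manifest() {
  if [ -z "${1:-}" ]; then
    echo "usage: hw_pod_manifest NODE [TAG]" >&2
    return 1
  fi
  local node="$1"
  local tag="${2:-0.0.5}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware-${node}
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:${tag}
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "${node}"
  volumes:
    - name: dev
      hostPath:
        path: /dev
EOF
}

# Example usage (assumes kubectl points at the right cluster and that the
# image ships /bin/sh):
#   hw_pod_manifest server2 | kubectl apply -f -
#   kubectl exec -ti ungleich-hardware-server2 -- sh
```

Because the container only runs @sleep@, it stays up so that you can exec into it and use the tools.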

h2. APU BIOS Update

* Download the correct BIOS from https://pcengines.github.io/
** Check whether the board is an apu1, apu2, apu3 or apu4 before downloading
* Install flashrom
* "Flash the BIOS using flashrom":https://github.com/pcengines/apu2-documentation/blob/master/docs/firmware_flashing.md
** @flashrom -w THEROMFILE -p internal@
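
Before flashing, it is a good idea to keep a copy of the firmware that is currently installed, so that you can go back if the new image misbehaves. The sketch below wraps the @flashrom@ call from the list: it first reads the current ROM into a timestamped backup file (@-r@) and then writes the new image (@-w@). The function names and the @DRY_RUN@ switch are assumptions of this example.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the flashrom call from this guide: back up the
# currently installed firmware, then write the new ROM image.
# Usage: apu_flash ROMFILE   (DRY_RUN=1 only prints the flashrom commands)

# Print the command when DRY_RUN=1, otherwise execute it.
_apu_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

apu_flash() {
  local rom="${1:-}"
  local backup
  if [ -z "$rom" ] || [ ! -r "$rom" ]; then
    echo "ROM file '${rom}' is not readable" >&2
    return 1
  fi
  backup="bios-backup-$(date +%Y%m%d%H%M%S).rom"

  # 1. Read the current firmware into a timestamped backup file.
  _apu_run flashrom -p internal -r "$backup" || return 1
  # 2. Write the new image; flashrom verifies the write by default.
  _apu_run flashrom -p internal -w "$rom"
}
```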

h2. APU serial and bootloader configuration

* Ensure that the kernel command line configured in the bootloader contains @console=ttyS0,115200@
* Ensure that a getty is running on the serial port
* Use grub-bios as the bootloader
** Install using @grub-install /dev/sda@
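
On a Debian-based system, the points above typically map to a few variables in @/etc/default/grub@. The sketch below sets them. The helper names are hypothetical, and passing the file path as an argument lets you try it on a copy first. The serial parameters (unit 0, 8N1, 115200 baud) are assumptions derived from the @console=ttyS0,115200@ setting above.

```shell
#!/usr/bin/env bash
# Set KEY="VALUE" in a shell-style config file such as /etc/default/grub:
# an existing (or commented-out) KEY= line is replaced, otherwise one is appended.
set_grub_var() {
  local file="$1" key="$2" value="$3"
  if grep -qE "^#?${key}=" "$file"; then
    sed -i -E "s|^#?${key}=.*|${key}=\"${value}\"|" "$file"
  else
    printf '%s="%s"\n' "$key" "$value" >> "$file"
  fi
}

# Hypothetical helper: kernel console and GRUB menu on ttyS0 at 115200 baud.
# Note: this replaces any existing GRUB_CMDLINE_LINUX options.
apu_serial_grub() {
  local file="${1:-/etc/default/grub}"
  set_grub_var "$file" GRUB_CMDLINE_LINUX "console=ttyS0,115200"
  set_grub_var "$file" GRUB_TERMINAL "serial"
  set_grub_var "$file" GRUB_SERIAL_COMMAND \
    "serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
}

# Afterwards, as root, install GRUB, regenerate grub.cfg and enable the
# serial getty (systemd):
#   grub-install /dev/sda
#   update-grub
#   systemctl enable --now serial-getty@ttyS0.service
```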

h2. Updating the PERC H800 SAS controller

* @wget 'https://dl.dell.com/FOLDER03292738M/3/SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN?uid=4b8a2506-f4d4-46a9-ab19-3c2a5008a782&fn=SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN' -O SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @chmod u+x SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @./SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* Drivers and firmware: https://www.dell.com/support/product-details/de-de/product/poweredge-rc-h800/drivers
* The link above is dead; use attachment:RAID_FRMW_LX_R287840.BIN instead

h3. Container notes

* The firmware updater does not run on Debian (and therefore not in containers based on ungleich-hardware)

<pre>
docker run -ti --privileged -v /dev:/dev --rm --name fwupdate redhat/ubi8
docker cp ./RAID_FRMW_LX_R287840.BIN fwupdate:/tmp
...
</pre>

Issues seen with ubi8:

<pre>
Lesser General Public License, Version 2.1, February 1999.  Under these GNU licenses, you are also entitled to obtain
Collecting inventory...
/tmp/RAID_FRMW_LX_R287840.BIN-18-3462/spsetup.sh: line 888: ./sasdupie: No such file or directory
/tmp/RAID_FRMW_LX_R287840.BIN-18-3462/spsetup.sh: line 897: ps: command not found

Inventory collection failed.
</pre>

h2. HP servers disk management

* See also https://www.thegeekstuff.com/2014/07/hpacucli-examples/

Required kernel modules:

<pre>
sg
cciss
</pre>

Show an overview of all controllers and drives:

<pre>
hpacucli ctrl all show config

hpacucli ctrl slot=0 pd all show
</pre>
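
To see which drives are not yet part of any array, the @show config@ output can be filtered. The sketch below prints the drive IDs listed in the @unassigned@ section. It assumes the output layout shown in the cached article further down, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "hpacucli ctrl all show config" output on stdin
# and print the IDs (e.g. 1I:1:3) of physical drives in the "unassigned" section.
hp_unassigned_drives() {
  awk '
    /^[^[:space:]]/           { in_unassigned = 0 }  # controller header line
    /^[[:space:]]*array /     { in_unassigned = 0 }  # an array section starts
    /^[[:space:]]*SEP /       { in_unassigned = 0 }  # enclosure processor line
    /^[[:space:]]*unassigned/ { in_unassigned = 1 }  # unassigned section starts
    in_unassigned && $1 == "physicaldrive" { print $2 }
  '
}

# Example:
#   hpacucli ctrl all show config | hp_unassigned_drives
```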

Add a disk as a RAID 0 logical drive:

<pre>
hpacucli ctrl slot=0 create type=ld drives=1I:1:1 raid=0
</pre>
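
If several new disks should each become their own RAID 0 logical drive, the command above can be repeated once per drive. This is a minimal sketch: the function name, the @-s SLOT@ option (defaulting to slot 0) and the @DRY_RUN@ switch are assumptions of this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create one RAID 0 logical drive per physical drive ID.
# Usage: hp_raid0_each [-s SLOT] DRIVE...   e.g. hp_raid0_each 1I:1:3 1I:1:4
# With DRY_RUN=1 the hpacucli commands are only printed.
hp_raid0_each() {
  local slot=0
  if [ "${1:-}" = "-s" ]; then
    slot="$2"
    shift 2
  fi
  local drive
  for drive in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "hpacucli ctrl slot=${slot} create type=ld drives=${drive} raid=0"
    else
      # Stop at the first failure instead of continuing blindly.
      hpacucli ctrl "slot=${slot}" create type=ld "drives=${drive}" raid=0 || return 1
    fi
  done
}
```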

Deleting a logical drive (replace X with the logical drive number):

<pre>
hpacucli ctrl slot=0 ld X delete
</pre>
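
To find the number to use for @X@, list the logical drives with @hpacucli ctrl slot=0 ld all show@. The hypothetical helper below extracts only the numbers, assuming the @logicaldrive N (...)@ line format shown in the cached article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the logical drive numbers found in
# "hpacucli ctrl slot=N ld all show" output read from stdin.
hp_ld_numbers() {
  awk '$1 == "logicaldrive" && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Example:
#   hpacucli ctrl slot=0 ld all show | hp_ld_numbers
```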

Copy of https://www.thegeekstuff.com/2014/07/hpacucli-examples/ (mainly kept as a cache):

<pre>
1. Two ways to execute the command

When you type the command hpacucli, it displays a “=>” prompt, as shown below, where you can enter all the hpacucli commands explained in the article.

# hpacucli
HP Array Configuration Utility CLI 9.20.9.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.
=> rescan

Or, if you don’t want to get to the hpacucli prompt, you can enter the command directly at the Linux prompt. The following is exactly the same as the above.

# hpacucli rescan

2. Display Controller and Disk Status

To display the detailed status of the controller and the disks, execute the following command.

# hpacucli
=> ctrl all show config

Smart Array P410i in Slot 0 (Embedded)    (sn: 50014380101D61C0)

   array A (SAS, Unused Space: 0  MB)

      logicaldrive 1 (136.7 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)

   unassigned

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250 (WWID: 50014380101D61CF)

In this example, as shown in the output above, there are 7 physical drives in total. The RAID 1 group (array A) contains 2 physical drives, and the remaining physical drives are not assigned to any logical drive.

3. View Controller Status

To display the status of just the controller, do the following. In this example, the controller is working without any issues.

=> ctrl all show status

Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK

4. View Drive Status

To display the status of the physical drives, do the following. In this example, there are two 146 GB and five 300 GB physical drives, all in perfect condition.

=> ctrl slot=0 pd all show status

   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 146 GB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 146 GB): OK
   physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 300 GB): OK
   physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 300 GB): OK
   physicaldrive 2I:1:6 (port 2I:box 1:bay 6, 300 GB): OK
   physicaldrive 2I:1:7 (port 2I:box 1:bay 7, 300 GB): OK
   physicaldrive 2I:1:8 (port 2I:box 1:bay 8, 300 GB): OK

5. View Individual Drive Status

To display the detailed status of a specific physical drive, do the following.

In this example, we want to know the status of a “pd” (physical disk) in slot 0. The specific disk is “2I:1:6”, which we found in the output of the previous command.

As shown in the output below, this displays the serial number, make, model, size and firmware version of this specific disk. This can be very helpful during troubleshooting.

=> ctrl slot=0 pd 2I:1:6 show detail

Smart Array P410i in Slot 0 (Embedded)

   unassigned

      physicaldrive 2I:1:6
         Port: 2I
         Box: 1
         Bay: 6
         Status: OK
         Drive Type: Unassigned Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPD4
         Serial Number: EB01PC416C4C1214
         Model: HP      EG0300FBDSP
         Current Temperature (C): 38
         Maximum Temperature (C): 56
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown

6. View All Logical Drives

The following command displays all logical drives on the system. As shown in the output below, there is currently only one logical drive, in RAID 1, with a total size of around 136 GB.

=> ctrl slot=0 ld all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      logicaldrive 1 (136.7 GB, RAID 1, OK)

7. Create New RAID 0 Logical Drive

Execute the following command to create a new logical drive using the RAID 0 option.

=> ctrl slot=0 create type=ld drives=1I:1:3 raid=0

The above command creates a logical drive with the physical drive 1I:1:3 in a RAID 0 configuration in slot 0.

8. Create New RAID 1 Logical Drive

Execute the following command to create a new logical drive using the RAID 1 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4 raid=1

The above command creates a logical drive with the two physical drives 1I:1:3 and 1I:1:4 in a RAID 1 configuration in slot 0.

9. Create New RAID 5 Logical Drive

Execute the following command to create a new logical drive using the RAID 5 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4,2I:1:6,2I:1:7,2I:1:8 raid=5

The above command creates a logical drive with the five physical drives in a RAID 5 configuration in slot 0.

Once these logical drives are created, the disks show up in fdisk, and you can format them and start using them.

After you create a logical drive, execute the following command to verify that it was created. In this example, it shows that the RAID 5 logical drive was created successfully.

=> ctrl slot=0 ld all show status

   logicaldrive 1 (136.7 GB, RAID 1): OK
   logicaldrive 2 (1.1 TB, RAID 5): OK

10. Rescan for New Devices

If you’ve added new physical hard disks, they won’t show up automatically. You have to scan for new devices as shown below.

=> rescan

11. View Detailed Logical Drive Status

To display the detailed status of a logical drive, do the following:

=> ctrl slot=0 ld 2 show

Smart Array P410i in Slot 0 (Embedded)

   array B

      Logical Drive: 2
         Size: 1.1 TB
         Fault Tolerance: RAID 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 1024 KB
         Status: OK
         Caching:  Enabled
         Parity Initialization Status: In Progress
         Unique Identifier: 600508B1001031303144363143301000
         Disk Name: /dev/cciss/c0d1
         Mount Points: None
         Logical Drive Label: A4967E2950014380101D61C008BE
         Drive Type: Data

The above shows the RAID type, the disk name assigned to the logical drive, and other information about logical drive number 2.

12. Delete Logical Drive

To delete logical drive number 2, use the command below.

=> ctrl slot=0 ld 2 delete

Warning: Deleting an array can cause other array letters to become renamed.
         E.g. Deleting array A from arrays A,B,C will result in two remaining
         arrays A,B ... not B,C

Warning: Deleting the specified device(s) will result in data being lost.
         Continue? (y/n) y

13. Add New Physical Drive to Logical Volume

To add new drives to an existing logical volume, do the following.

=> ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7

In this example, we are adding the two drives specified above to logical volume number 2.

14. Add Spare Disks

To add spare disks to the arrays, to be used if a disk fails in one of the logical drives, do the following:

=> ctrl slot=0 array all add spares=2I:1:6,2I:1:7

In this example, we are adding two spare disks to the array.

15. Enable or Disable Cache

The commands below enable or disable the cache for the entire slot.

=> ctrl slot=0 modify dwc=disable

=> ctrl slot=0 modify dwc=enable

16. Erase Physical Drive

Execute the following command to erase a physical drive in array B on slot 0.

=> ctrl slot=0 pd 2I:1:6 modify erase

17. Blink Physical Disk LED

To blink the LEDs of the physical drives of logical drive 2, do the following. This makes the LEDs blink on all physical drives that belong to logical drive 2.

=> ctrl slot=0 ld 2 modify led=on

Once you know which drives belong to logical drive 2, turn the LED blinking off as shown below.

=> ctrl slot=0 ld 2 modify led=off
</pre>
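
The drive status output from section 4 above can also be checked automatically, for instance from cron or a monitoring system. The sketch below prints every physical drive whose status is not @OK@ and returns non-zero if it finds one. The function name and the expected line format (@physicaldrive ID (...): STATUS@) are assumptions based on the sample output above; any status other than @OK@ is reported as-is without further interpretation.

```shell
#!/usr/bin/env bash
# Hypothetical check: read "hpacucli ctrl slot=N pd all show status" output on
# stdin, print every physical drive whose status is not OK and return 1 if
# at least one was found (0 otherwise).
hp_check_pd_status() {
  awk '
    BEGIN { bad = 0 }
    $1 == "physicaldrive" {
      status = $0
      sub(/.*\): */, "", status)        # drop everything up to the last "):"
      sub(/[[:space:]]+$/, "", status)  # drop trailing whitespace
      if (status != "OK") { print $2 ": " status; bad = 1 }
    }
    END { exit bad }
  '
}

# Example, e.g. from cron:
#   hpacucli ctrl slot=0 pd all show status | hp_check_pd_status || echo "check the disks"
```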

h2. Dell servers disk management (megacli)

Listing all disks:

<pre>
megacli -PDList -aALL
</pre>
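
@-PDList@ prints a long block per disk. To get just the enclosure/slot argument needed for @-CfgLdAdd@ together with the disk state, the output can be reduced with @awk@. This sketch assumes that each disk block contains @Enclosure Device ID@, @Slot Number@ and @Firmware state@ lines in that order, as MegaCli usually prints them; the function name is hypothetical, so please verify against your controller's output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "megacli -PDList -aALL" output (stdin) into one
# line per disk: the [Enclosure:Slot] argument for -CfgLdAdd and the
# firmware state. An enclosure of N/A becomes "[:slot]", as in this guide.
megacli_pd_slots() {
  awk -F': ' '
    { sub(/[[:space:]]+$/, "") }                      # strip trailing whitespace
    $1 == "Enclosure Device ID" { enc = ($2 == "N/A") ? "" : $2 }
    $1 == "Slot Number"         { slot = $2 }
    $1 == "Firmware state"      { printf "[%s:%s] %s\n", enc, slot, $2 }
  '
}

# Example:
#   megacli -PDList -aALL | megacli_pd_slots
```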

Adding a disk as a RAID 0 virtual drive:

<pre>
megacli -CfgLdAdd -r0 [Enclosure Device ID:Slot Number] -aX   # X is the adapter number (host is 0, md-array is 1)

# Sample call if enclosure and slot are known (i.e. not N/A)
megacli -CfgLdAdd -r0 [32:0] -a0

# Sample call if the enclosure is N/A
megacli -CfgLdAdd -r0 [:0] -a0
</pre>
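
To add several disks as individual RAID 0 virtual drives, the sample calls above can be run in a loop. In this sketch the function name, the @ADAPTER@ variable (default @0@) and the @DRY_RUN@ switch are assumptions. A disk whose enclosure is @N/A@ is passed with an empty enclosure (@:0@), like in the second sample call. The bracketed enclosure/slot argument is quoted so that the shell does not treat it as a glob pattern.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create one RAID 0 virtual drive per disk.
# Arguments are ENCLOSURE:SLOT pairs, e.g. 32:0, or :3 when megacli reports
# the enclosure as N/A. The adapter defaults to 0 (override with ADAPTER=1).
# With DRY_RUN=1 the commands are only printed.
megacli_raid0_each() {
  local adapter="${ADAPTER:-0}"
  local pd
  for pd in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "megacli -CfgLdAdd -r0 [${pd}] -a${adapter}"
    else
      # Quoted so that [32:0] is passed literally, not expanded as a glob.
      megacli -CfgLdAdd -r0 "[${pd}]" "-a${adapter}" || return 1
    fi
  done
}
```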

Discard the preserved cache of disks that are no longer in the server:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
</pre>

Remove foreign configurations from foreign disks:

<pre>
megacli -CfgForeign -Clear -aAll
</pre>

In many cases, you need to do both:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
megacli -CfgForeign -Clear -aAll
</pre>

Growing a RAID 6 (here: adding the disk in enclosure 12, slot 4 to logical drive 0 on adapter 0):

<pre>
megacli -ldrecon -Start -r6 -Add -PhysDrv[12:4] -l0 -a0
</pre>

Deleting a logical drive (here: logical drive 0 on adapter 0):

<pre>
root@2157f4626763:/# megacli -CfgLdDel -L0 -a0

Adapter 0: Deleted Virtual Drive-0(target id-0)

Exit Code: 0x00
root@2157f4626763:/#
</pre>

h2. SEE ALSO

* [[Managing OpenWRT]]
* [[The_ungleich_ceph_handbook]]