The ungleich hardware maintenance guide » History » Version 26

Nico Schottelius, 01/27/2025 01:08 PM

{{toc}}

h1. The ungleich hardware maintenance guide

This guide describes common operations on hardware we use.

h2. Using the ungleich-hardware container in kubernetes and docker

To manage hardware on server1 in kubernetes, you can use:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "server1"

  volumes:
    - name: dev
      hostPath:
        path: /dev

</pre>
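
A minimal sketch of how the pod above could be started and entered, assuming the manifest was saved as @ungleich-hardware.yaml@ (a hypothetical file name) and that the image ships @/bin/sh@. The @KUBECTL@ variable and the function names are illustrative; setting @KUBECTL=echo@ only prints the commands instead of running them.

<pre>
#!/bin/sh
# Assumption: KUBECTL and the function names below are illustrative helpers.
KUBECTL="${KUBECTL:-kubectl}"

# Create the pod and wait until it is ready.
hw_pod_start() {
    $KUBECTL apply -f "${1:-ungleich-hardware.yaml}" &&
        $KUBECTL wait --for=condition=Ready pod/ungleich-hardware --timeout=120s
}

# Open an interactive shell inside the running pod.
hw_pod_shell() {
    $KUBECTL exec -ti ungleich-hardware -- /bin/sh
}

# Remove the pod once the maintenance work is done.
hw_pod_stop() {
    $KUBECTL delete pod ungleich-hardware
}
</pre>

Typical use: @hw_pod_start && hw_pod_shell@, followed by @hw_pod_stop@.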

To use it with docker:

<pre>
docker run -v /dev:/dev --privileged -ti harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.5
</pre>

h2. APU BIOS update

* Download the correct BIOS from https://pcengines.github.io/
** Check whether the board is an apu1, apu2, apu3 or apu4 before downloading
* Install flashrom
* "Flash the BIOS using flashrom":https://github.com/pcengines/apu2-documentation/blob/master/docs/firmware_flashing.md
** @flashrom -w THEROMFILE -p internal@
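
The steps above could be scripted roughly as follows. This is a sketch, not a tested procedure: the function name and the backup file name are assumptions, and reading the chip with @flashrom -r@ first is an extra safety step that the list above does not mention.

<pre>
#!/bin/sh
# Back up the current firmware, then write the new image.
# Usage: apu_flash THEROMFILE [BACKUPFILE]
# To check the board model first, "dmidecode -s baseboard-product-name"
# may help (assumption: the board reports apu1/2/3/4 there).
apu_flash() {
    rom="$1"
    backup="${2:-apu-backup.rom}"
    if [ ! -r "$rom" ]; then
        echo "cannot read $rom" >&2
        return 1
    fi
    # Keep a copy of the running firmware in case the new image misbehaves.
    flashrom -p internal -r "$backup" || return 1
    flashrom -p internal -w "$rom"
}
</pre>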

h2. APU Serial and bootloader configuration

* Ensure that the bootloader has @console=ttyS0,115200@ configured
* Ensure that there is a getty running on the serial port
* Use grub-bios as the bootloader
** Install using @grub-install /dev/sda@
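
A sketch of matching settings in @/etc/default/grub@, assuming the distribution generates its GRUB configuration from that file (the path and the way the configuration is regenerated vary between distributions):

<pre>
# /etc/default/grub (fragment, read as shell variable assignments)
# Kernel console on the first serial port at 115200 baud.
GRUB_CMDLINE_LINUX="console=ttyS0,115200"
# Show the GRUB menu itself on the serial port as well.
GRUB_TERMINAL="serial"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"
</pre>

After editing the file, regenerate the configuration, for example with @grub-mkconfig -o /boot/grub/grub.cfg@.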

h2. Updating the Perc H800 SAS controller

* @wget 'https://dl.dell.com/FOLDER03292738M/3/SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN?uid=4b8a2506-f4d4-46a9-ab19-3c2a5008a782&fn=SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN' -O SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @chmod u+x SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @./SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* https://www.dell.com/support/product-details/de-de/product/poweredge-rc-h800/drivers
* Link above is dead, use attachment:RAID_FRMW_LX_R287840.BIN

h3. Container notes

* The firmware updater does not run on Debian (and thus not in ungleich-hardware based containers); use a CentOS container instead:

<pre>
docker run -ti --privileged -v /dev:/dev centos

...

</pre>

h2. HP servers disk management

* See also https://www.thegeekstuff.com/2014/07/hpacucli-examples/

Required kernel modules:

<pre>
sg
cciss
</pre>

Show all drives/controller overview:

<pre>
hpacucli ctrl all show config

hpacucli ctrl slot=0 pd all show
</pre>

Add a disk as RAID 0:

<pre>
hpacucli ctrl slot=0 create type=ld drives=1I:1:1 raid=0
</pre>

Delete a logical drive (replace X with the number of the logical drive):

<pre>
hpacucli ctrl slot=0 ld X delete
</pre>

Copied from https://www.thegeekstuff.com/2014/07/hpacucli-examples/ (mainly to cache it):

<pre>
1. Two ways to execute the command

When you type the command hpacucli, it will display a “=>” prompt as shown below where you can enter all the hpacucli commands explained in the article.

# hpacucli
HP Array Configuration Utility CLI 9.20.9.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.
=> rescan

Or, if you don’t want to get to the hpacucli prompt, you can just enter the following directly in the Linux prompt. The following is exactly the same as the above.

# hpacucli rescan

2. Display Controller and Disk Status

To display the detailed status of the controller and the disk status, execute the following command.

# hpacucli
=> ctrl all show config

Smart Array P410i in Slot 0 (Embedded)    (sn: 50014380101D61C0)

   array A (SAS, Unused Space: 0  MB)

      logicaldrive 1 (136.7 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)

   unassigned

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250 (WWID: 50014380101D61CF)

In this example, as shown in the above output, we have a total of 7 physical drives. The first RAID group RAID 1 contains 2 physical drives and the remaining physical drives are not assigned to any of the logical drives.

3. View Controller Status

To display the status of just the controller, do the following. In this example, the controller is working perfectly without any issues.

=> ctrl all show status

Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK

4. View Drive Status

To display the status of the physical drive, do the following. In this example, we have two 146GB physical drives, and 5 300GB physical drives, and all are in perfect condition.

=> ctrl slot=0 pd all show status

   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 146 GB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 146 GB): OK
   physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 300 GB): OK
   physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 300 GB): OK
   physicaldrive 2I:1:6 (port 2I:box 1:bay 6, 300 GB): OK
   physicaldrive 2I:1:7 (port 2I:box 1:bay 7, 300 GB): OK
   physicaldrive 2I:1:8 (port 2I:box 1:bay 8, 300 GB): OK

5. View Individual Drive Status

To display the detailed status of a specific physical drive, do the following.

In this example, we would like to know the status of “pd” (physical disk) in slot 0. The specific disk is “2I:1:6”, which we figured out from the output of the previous command.

As shown in the output below, this displays the Serial Number, Make, Model, Size and Firmware version of this specific disk. This can be very helpful during troubleshooting.

=> ctrl slot=0 pd 2I:1:6 show detail

Smart Array P410i in Slot 0 (Embedded)

   unassigned

      physicaldrive 2I:1:6
         Port: 2I
         Box: 1
         Bay: 6
         Status: OK
         Drive Type: Unassigned Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPD4
         Serial Number: EB01PC416C4C1214
         Model: HP      EG0300FBDSP
         Current Temperature (C): 38
         Maximum Temperature (C): 56
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown

6. View All Logical Drives

The following command will display all available logical drives on the system. As shown in the output below, we currently have only one logical drive in RAID 1 with total size of around 136GB.

=> ctrl slot=0 ld all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      logicaldrive 1 (136.7 GB, RAID 1, OK)

7. Create New RAID 0 Logical Drive

Execute the following command to create a new logical drive using RAID 0 option.

=> ctrl slot=0 create type=ld drives=1I:1:3 raid=0

The above command creates a logical drive with the physical drives 1I:1:3 on RAID 0 configuration in slot 0.

8. Create New RAID 1 Logical Drive

Execute the following command to create a new logical drive using RAID 1 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4 raid=1

The above command creates a logical drive with the two physical drives 1I:1:3 and 1I:1:4 on RAID 1 configuration in slot 0.

9. Create New RAID 5 Logical Drive

Execute the following command to create a new logical drive using RAID 5 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4,2I:1:6,2I:1:7,2I:1:8 raid=5

The above command creates a logical drive with the five physical drives on RAID 5 configuration in slot 0.

Once these logical drives are created, you should see the disks in fdisk and you can format them from there and start using them.

After you create a logical drive, execute the following command to verify that the LD got created. In this example, it shows that the RAID 5 logical drive got created successfully.

=> ctrl slot=0 ld all show status

   logicaldrive 1 (136.7 GB, RAID 1): OK
   logicaldrive 2 (1.1 TB, RAID 5): OK

10. Rescan for New Devices

If you’ve added new physical hard disks, they won’t show up automatically. You have to scan for new devices as shown below.

=> rescan

11. View Detailed Logical Drive Status

To display the detailed status of the logical drive, do the following:

=> ctrl slot=0 ld 2 show

Smart Array P410i in Slot 0 (Embedded)

   array B

      Logical Drive: 2
         Size: 1.1 TB
         Fault Tolerance: RAID 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 1024 KB
         Status: OK
         Caching:  Enabled
         Parity Initialization Status: In Progress
         Unique Identifier: 600508B1001031303144363143301000
         Disk Name: /dev/cciss/c0d1
         Mount Points: None
         Logical Drive Label: A4967E2950014380101D61C008BE
         Drive Type: Data

The above shows the RAID type, the disk name assigned to the logical drive, and other information about the logical drive number 2.

12. Delete Logical Drive

To delete a logical drive with the number 2 use the below command.

=> ctrl slot=0 ld 2 delete

Warning: Deleting an array can cause other array letters to become renamed.
         E.g. Deleting array A from arrays A,B,C will result in two remaining
         arrays A,B ... not B,C

Warning: Deleting the specified device(s) will result in data being lost.
         Continue? (y/n) y

13. Add New Physical Drive to Logical Volume

To add the new drives to existing logical volume, do the following.

=> ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7

In this example, we are adding two additional drives specified above to the logical volume number 2.

14. Add Spare Disks

To add the spare disks to arrays that can be used in case of disk failures on one of the logical drives, do the following:

=> ctrl slot=0 array all add spares=2I:1:6,2I:1:7

In this example, we are adding two spare disks to the array.

15. Enable or Disable Cache

The below commands enable or disable cache for the entire slot.

=> ctrl slot=0 modify dwc=disable

=> ctrl slot=0 modify dwc=enable

16. Erase Physical Drive

Execute the following command to erase a physical drive in array B on slot 0.

=> ctrl slot=0 pd 2I:1:6 modify erase

17. Blink Physical Disk LED

To blink the LED on the physical drives for the logical drive 2, do the following. This will make the LEDs blink on all the physical drives that belong to logical drive 2.

=> ctrl slot=0 ld 2 modify led=on

Once you know which drive belongs to logical drive 2, turn the LED blinking off as shown below.

=> ctrl slot=0 ld 2 modify led=off
</pre>
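
Based on the @pd all show status@ output format in the cached article above, a small filter can list only the drives that need attention. This is a sketch: the function name is an assumption, and it relies on each drive line ending in its status word as in the listing above.

<pre>
#!/bin/sh
# Print physical drives whose status is not OK.
# Reads the output of "hpacucli ctrl slot=N pd all show status" on stdin.
# Exit status is 1 if at least one drive is not OK, 0 otherwise.
hp_failed_drives() {
    awk '/physicaldrive/ && $NF != "OK" { print; bad = 1 } END { exit bad + 0 }'
}
</pre>

Example: @hpacucli ctrl slot=0 pd all show status | hp_failed_drives@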

h2. Dell servers disk management (megacli)

Listing all disks:

<pre>
megacli -PDList -aALL
</pre>
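
The full listing is long. A sketch of a filter that condenses it to one line per disk, assuming the output contains the field labels @Enclosure Device ID@, @Slot Number@ and @Firmware state@ in that order for each disk (the function name is illustrative):

<pre>
#!/bin/sh
# Condense "megacli -PDList -aALL" output (read from stdin) to lines of the
# form "[enclosure:slot] firmware state". An enclosure of N/A is printed as
# an empty enclosure, i.e. "[:slot]".
pd_summary() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc = $2 }
        /^Slot Number/         { slot = $2 }
        /^Firmware state/      { printf "[%s:%s] %s\n", (enc == "N/A" ? "" : enc), slot, $2 }
    '
}
</pre>

Example: @megacli -PDList -aALL | pd_summary@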

Adding disks:

<pre>
# Syntax: X selects the adapter (host is 0, md-array is 1)
megacli -CfgLdAdd -r0 [Enclosure Device ID:slot] -aX

# Sample call, if enclosure and slot are KNOWN (aka not N/A)
megacli -CfgLdAdd -r0 [32:0] -a0

# Sample call, if enclosure is N/A
megacli -CfgLdAdd -r0 [:0] -a0
</pre>
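
When several disks need to be added, the call above can be wrapped in a loop. A sketch under assumptions: the function name and the @MEGACLI@ variable are illustrative, and adapter 0 is hard-coded as in the sample calls. Setting @MEGACLI=echo@ prints the commands instead of running them.

<pre>
#!/bin/sh
MEGACLI="${MEGACLI:-megacli}"

# Create one RAID 0 logical drive per slot on adapter 0.
# Usage: add_raid0_disks ENCLOSURE SLOT...
# Pass "" as ENCLOSURE if the enclosure is N/A.
add_raid0_disks() {
    enclosure="$1"
    shift
    for slot in "$@"; do
        $MEGACLI -CfgLdAdd -r0 "[${enclosure}:${slot}]" -a0 || return 1
    done
}
</pre>

For example, @add_raid0_disks 32 0 1 2@ adds the disks in slots 0 to 2 of enclosure 32.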

Remove the preserved cache of disks that are no longer in the server:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
</pre>

Remove foreign configurations on foreign disks:

<pre>
megacli -CfgForeign -Clear -aAll
</pre>

In many cases both are needed:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
megacli -CfgForeign -Clear -aAll
</pre>

Growing a RAID 6:

<pre>
megacli -ldrecon -Start -r6 -Add -PhysDrv[12:4] -l0 -a0
</pre>

Deleting a logical drive:

<pre>
root@2157f4626763:/# megacli -CfgLdDel -L0 -a0

Adapter 0: Deleted Virtual Drive-0(target id-0)

Exit Code: 0x00
root@2157f4626763:/#
</pre>

h2. SEE ALSO

* [[Managing OpenWRT]]
* [[The_ungleich_ceph_handbook]]