The ungleich hardware maintenance guide » History » Version 25

Nico Schottelius, 01/27/2025 01:04 PM

{{toc}}

h1. The ungleich hardware maintenance guide

This guide describes common operations on hardware we use.

h2. Using the ungleich-hardware container in kubernetes and docker

To manage hardware on server1 in kubernetes, you can use:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "server1"

  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>

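The manifest above is pinned to server1. To reuse it on another node, the hostname can be templated. The following is only a sketch: @render_hardware_pod@ is a hypothetical helper name, and it assumes that the argument matches the node's @kubernetes.io/hostname@ label and that the image ships a shell for @kubectl exec@.

```shell
#!/bin/sh
# Hypothetical helper: print the pod manifest from this section for a given node.
# Assumption: the argument equals the node's kubernetes.io/hostname label.
render_hardware_pod() {
    node="$1"
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.5
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "${node}"
  volumes:
    - name: dev
      hostPath:
        path: /dev
EOF
}

# Usage (not run here; assumes the image contains sh):
#   render_hardware_pod server2 | kubectl apply -f -
#   kubectl exec -ti ungleich-hardware -- sh
```

Only the hostname is substituted; everything else is taken unchanged from the manifest above.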
To use it with Docker:

<pre>
docker run -v /dev:/dev --privileged -ti harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.5
</pre>

h2. APU BIOS Update

* Download the correct BIOS from https://pcengines.github.io/
** Check whether the board is an apu1, apu2, apu3 or apu4 before downloading
* Install flashrom
* "Flash the BIOS using flashrom":https://github.com/pcengines/apu2-documentation/blob/master/docs/firmware_flashing.md
** @flashrom -w THEROMFILE -p internal@
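
Before flashing, it may be worth checking that the downloaded image is intact. A minimal sketch, assuming a SHA256 sum for the image is available from the download source; @verify_rom@ is a hypothetical helper, and THEROMFILE and EXPECTEDSUM are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE has the expected SHA256 sum.
verify_rom() {
    file="$1"
    expected="$2"
    # sha256sum prints "<sum>  <file>"; keep only the sum.
    actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage (not run here; THEROMFILE and EXPECTEDSUM are placeholders):
#   verify_rom THEROMFILE EXPECTEDSUM && flashrom -w THEROMFILE -p internal
```

Chaining with @&&@ means flashrom is only started when the checksum matches.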

h2. APU Serial and bootloader configuration

* Ensure that the bootloader passes @console=ttyS0,115200@ to the kernel
* Ensure that a getty is running on the serial port
* Use grub-bios as the bootloader
** Install using @grub-install /dev/sda@
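
To confirm the first point on a running system, the kernel command line can be inspected. A small sketch; @has_serial_console@ is a hypothetical helper that looks for exactly the setting named above:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a kernel command line contains the
# serial console setting from the list above.
has_serial_console() {
    # Pad with spaces so the option is matched as a whole word.
    case " $1 " in
        *" console=ttyS0,115200 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (not run here):
#   has_serial_console "$(cat /proc/cmdline)" || echo "serial console missing"
```

The match is deliberately strict: variants such as @console=ttyS0,115200n8@ are not accepted by this sketch and would need their own pattern.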

h2. Updating the Perc H800 SAS controller

* @wget 'https://dl.dell.com/FOLDER03292738M/3/SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN?uid=4b8a2506-f4d4-46a9-ab19-3c2a5008a782&fn=SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN' -O SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @chmod u+x SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @./SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* https://www.dell.com/support/product-details/de-de/product/poweredge-rc-h800/drivers
* The link above is dead; use attachment:RAID_FRMW_LX_R287840.BIN

h2. HP servers disk management

* See also https://www.thegeekstuff.com/2014/07/hpacucli-examples/

Required kernel modules:

<pre>
sg
cciss
</pre>

Show all drives/controller overview:

<pre>
hpacucli ctrl all show config

hpacucli ctrl slot=0 pd all show
</pre>

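For a quick health check, the per-drive status output can be filtered for drives that are not OK. This is only a sketch: @failed_drives@ is a hypothetical helper, and it assumes the @pd all show status@ line format shown in the cached article later in this section.

```shell
#!/bin/sh
# Hypothetical helper: read "pd all show status" output on stdin and print
# the IDs of physical drives whose status is not OK.
# Assumption: lines look like
#   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 146 GB): OK
failed_drives() {
    awk '$1 == "physicaldrive" && $NF != "OK" { print $2 }'
}

# Usage (not run here):
#   hpacucli ctrl slot=0 pd all show status | failed_drives
```

An empty result means every listed drive reported OK.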
Add a single disk as a RAID 0 logical drive:

<pre>
hpacucli ctrl slot=0 create type=ld drives=1I:1:1 raid=0
</pre>

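To find drives that are still free for a new logical drive, the controller overview can be filtered for its "unassigned" section. A sketch under assumptions: @unassigned_drives@ is a hypothetical helper, and the parsing relies on the @ctrl all show config@ layout shown in the cached article later in this section.

```shell
#!/bin/sh
# Hypothetical helper: read "ctrl all show config" output on stdin and print
# the IDs of the physical drives listed under "unassigned".
unassigned_drives() {
    awk '
        # The "unassigned" header opens the section we are interested in.
        $1 == "unassigned" { in_unassigned = 1; next }
        # A new controller, array or SEP line closes it again.
        $1 == "Smart" || $1 == "array" || $1 == "SEP" { in_unassigned = 0 }
        in_unassigned && $1 == "physicaldrive" { print $2 }
    '
}

# Usage (not run here):
#   hpacucli ctrl all show config | unassigned_drives
```

Each printed ID, for example 1I:1:3, can be used as the @drives=@ value of the create command above.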
Deleting a logical drive (replace X with the logical drive number):

<pre>
hpacucli ctrl slot=0 ld X delete
</pre>

Copy of https://www.thegeekstuff.com/2014/07/hpacucli-examples/ (kept here mainly as a cache):

<pre>
1. Two ways to execute the command

When you type the command hpacucli, it will display a “=>” prompt as shown below where you can enter all the hpacucli commands explained in the article.

# hpacucli
HP Array Configuration Utility CLI 9.20.9.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.
=> rescan

Or, if you don’t want to get to the hpacucli prompt, you can just enter the following directly in the Linux prompt. The following is exactly same as the above.

# hpacucli rescan

2. Display Controller and Disk Status

To display the detailed status of the controller and the disk status, execute the following command.

# hpacucli
=> ctrl all show config

Smart Array P410i in Slot 0 (Embedded)    (sn: 50014380101D61C0)

   array A (SAS, Unused Space: 0  MB)

      logicaldrive 1 (136.7 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)

   unassigned

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250 (WWID: 50014380101D61CF)

In this example, as shown in the above output, we have total 7 physical drives. The first RAID group RAID 1 contains 2 physical drives and the remaining physical drives are not assigned to any of the logical drives.

3. View Controller Status

To display the status of just the controller, do the following. In this example, the controller is working perfectly without any issues.

=> ctrl all show status

Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK

4. View Drive Status

To display the status of the physical drive, do the following. In this example, we have two 146GB physical drives, and 5 300GB physical drives, and all are in perfect condition.

=> ctrl slot=0 pd all show status

   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 146 GB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 146 GB): OK
   physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 300 GB): OK
   physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 300 GB): OK
   physicaldrive 2I:1:6 (port 2I:box 1:bay 6, 300 GB): OK
   physicaldrive 2I:1:7 (port 2I:box 1:bay 7, 300 GB): OK
   physicaldrive 2I:1:8 (port 2I:box 1:bay 8, 300 GB): OK

5. View Individual Drive Status

To display the detail status of a specific physical drive, do the following.

In this example, we like to know the status of “pd” (physical disk) in slot 0. The specific disk is “2I:1:6”, which we figured it out from the output of the previous command.

As shown in the output below, this displays the Serial Number, Make, Model, Size and Fireware version of this specific disk. This can be very helpful during troubleshooting.

=> ctrl slot=0 pd 2I:1:6 show detail

Smart Array P410i in Slot 0 (Embedded)

   unassigned

      physicaldrive 2I:1:6
         Port: 2I
         Box: 1
         Bay: 6
         Status: OK
         Drive Type: Unassigned Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPD4
         Serial Number: EB01PC416C4C1214
         Model: HP      EG0300FBDSP
         Current Temperature (C): 38
         Maximum Temperature (C): 56
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown

6. View All Logical Drives

The following command will display all available logical drives on the system. As shown in the output below, we currently have only one logical drive in RAID 1 with total size of around 136GB.

=> ctrl slot=0 ld all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      logicaldrive 1 (136.7 GB, RAID 1, OK)

7. Create New RAID 0 Logical Drive

Execute the following command to create a new logical drive using RAID 0 option.

=> ctrl slot=0 create type=ld drives=1I:1:3 raid=0

The above command creates a logical drive with the physical drives 1I:1:3 on RAID 0 configuration in slot 0.

8. Create New RAID 1 Logical Drive

Execute the following command to create a new logical drive using RAID 1 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4 raid=1

The above command creates a logical drive with the two physical drives 1I:1:3 and 1I:1:4 on RAID 1 configuration in slot 0.

9. Create New RAID 5 Logical Drive

Execute the following command to create a new logical drive using RAID 5 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4,2I:1:6,2I:1:7,2I:1:8 raid=5

The above command creates a logical drive with the five physical drives on RAID 5 configuration in slot 0.

Once these logical drives are created, you should see the disks from the fdisk and you can format it from there and start using it.

After you create a logical drive, execute the following command to verify that the LD got created. In this example, it shows that the RAID 5 logical drive got created successfully.

=> ctrl slot=0 ld all show status

   logicaldrive 1 (136.7 GB, RAID 1): OK
   logicaldrive 2 (1.1 TB, RAID 5): OK

10. Rescan for New Devices

If you’ve added new physical hard disk, they won’t automatically show-up immediately. You have to scan for new devices as shown below.

=> rescan

11. View Detailed Logical Drive Status

To display the detailed status of the logical drive, do the following:

=> ctrl slot=0 ld 2 show

Smart Array P410i in Slot 0 (Embedded)

   array B

      Logical Drive: 2
         Size: 1.1 TB
         Fault Tolerance: RAID 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 1024 KB
         Status: OK
         Caching:  Enabled
         Parity Initialization Status: In Progress
         Unique Identifier: 600508B1001031303144363143301000
         Disk Name: /dev/cciss/c0d1
         Mount Points: None
         Logical Drive Label: A4967E2950014380101D61C008BE
         Drive Type: Data

The above shows the RAID type, the disk name assigned to the logical drive, and other information about the logical drive number 2.

12. Delete Logical Drive

To delete a logical drive with the number 2 use the below command.

=> ctrl slot=0 ld 2 delete

Warning: Deleting an array can cause other array letters to become renamed.
         E.g. Deleting array A from arrays A,B,C will result in two remaining
         arrays A,B ... not B,C

Warning: Deleting the specified device(s) will result in data being lost.
         Continue? (y/n) y

13. Add New Physical Drive to Logical Volume

To add the new drives to existing logical volume, do the following.

=> ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7

In this example, we are adding two additional drives specified above to the logical volume number 2.

14. Add Spare Disks

To add the spare disks to arrays that can be used in case of disk failures on one of the logical drives, do the following:

=> ctrl slot=0 array all add spares=2I:1:6,2I:1:7

In this example, we are adding two spare disks to the array.

15. Enable or Disable Cache

The below commands enable or disable cache for the entire slot.

=> ctrl slot=0 modify dwc=disable

=> ctrl slot=0 modify dwc=enable

16. Erase Physical Drive

Execute the following command to erase a physical drive in array B on slot 0.

=> ctrl slot=0 pd 2I:1:6 modify erase

17. Blink Physical Disk LED

To blink the LED on the physical drives for the logical drive 2, do the following. This will make the LEDs blink on all the physical drives that belongs to logical drive 2.

=> ctrl slot=0 ld 2 modify led=on

Once you know which drive belongs to logical drive 2, turn the LED blinking off as shown below.

=> ctrl slot=0 ld 2 modify led=off
</pre>

h2. Dell servers disk management (megacli)

Listing all disks:

<pre>
megacli -PDList -aALL
</pre>

Adding disks:

<pre>
# Syntax; X is the adapter number (host is 0, md-array is 1)
megacli -CfgLdAdd -r0 [Enclosure Device ID:slot] -aX

# Sample call, if enclosure and slot are KNOWN (aka not N/A)
megacli -CfgLdAdd -r0 [32:0] -a0

# Sample call, if enclosure is N/A
megacli -CfgLdAdd -r0 [:0] -a0
</pre>

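When scripting this, the two argument forms above can be produced by one function. A minimal sketch; @megacli_drive_arg@ is a hypothetical helper, and treating an empty value like "N/A" is an assumption made here for convenience:

```shell
#!/bin/sh
# Hypothetical helper: build the [enclosure:slot] argument for megacli.
# An enclosure of "N/A" (or an empty value) yields the "[:slot]" form.
megacli_drive_arg() {
    enclosure="$1"
    slot="$2"
    if [ -z "$enclosure" ] || [ "$enclosure" = "N/A" ]; then
        printf '[:%s]\n' "$slot"
    else
        printf '[%s:%s]\n' "$enclosure" "$slot"
    fi
}

# Usage (not run here):
#   megacli -CfgLdAdd -r0 "$(megacli_drive_arg 32 0)" -a0
```

Quoting the command substitution keeps the shell from treating the square brackets as a glob pattern.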
Discard the preserved cache of disks that are no longer in the server:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
</pre>

Remove foreign configurations from foreign disks:

<pre>
megacli -CfgForeign -Clear -aAll
</pre>

In many cases you need to do both:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
megacli -CfgForeign -Clear -aAll
</pre>

Growing a RAID 6:

<pre>
megacli -ldrecon  -Start -r6 -Add -PhysDrv[12:4] -l0 -a0
</pre>

Deleting a logical drive:

<pre>
root@2157f4626763:/# megacli -CfgLdDel -L0 -a0

Adapter 0: Deleted Virtual Drive-0(target id-0)

Exit Code: 0x00
root@2157f4626763:/#
</pre>

h2. SEE ALSO

* [[Managing OpenWRT]]
* [[The_ungleich_ceph_handbook]]