The ungleich hardware maintenance guide » History » Version 18

Nico Schottelius, 02/12/2022 04:45 PM

{{toc}}

h1. The ungleich hardware maintenance guide

This guide describes common operations on the hardware we use.

h2. Using the ungleich-hardware container in kubernetes and docker

To manage hardware on server1 in Kubernetes, you can use a pod like this:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "server1"
  volumes:
    - name: dev
      hostPath:
        path: /dev
</pre>
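
The same manifest can be generated for any node instead of editing the hostname by hand. This is a minimal sketch, not an official tool: the function name @hardware_pod_manifest@ is hypothetical, and the commented-out @kubectl exec@ call assumes the image ships a shell.

```shell
#!/bin/sh
# Print the pod manifest from this guide for a given node.
# $1: value of the kubernetes.io/hostname label (defaults to server1).
hardware_pod_manifest() {
    node=${1:-server1}
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3
    args:
    - sleep
    - "1000000"
    volumeMounts:
      - mountPath: /dev
        name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "${node}"
  volumes:
    - name: dev
      hostPath:
        path: /dev
EOF
}

# Typical use (needs cluster access, so it is left commented out):
#   hardware_pod_manifest server1 | kubectl apply -f -
#   kubectl exec -ti ungleich-hardware -- sh    # assumes the image has sh
#   kubectl delete pod ungleich-hardware
```

Because the pod only sleeps, delete it once the maintenance work is done.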

To use it with Docker:

<pre>
docker run -v /dev:/dev --privileged -ti harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3
</pre>

h2. APU Bios Update

* Download the correct BIOS from https://pcengines.github.io/
** Check whether the board is an apu1, apu2, apu3 or apu4 before downloading
* Install flashrom
* "Flash the BIOS using flashrom":https://github.com/pcengines/apu2-documentation/blob/master/docs/firmware_flashing.md
** @flashrom -w THEROMFILE -p internal@
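
Flashing an image for the wrong board model is the easiest mistake to make here, so a small guard can help. The sketch below is only an illustration: the function names are hypothetical, and it assumes the downloaded file name starts with the model followed by an underscore (for example @apu2_v4.19.0.1.rom@). By default it only prints the flashrom command.

```shell
#!/bin/sh
# Succeed only if the ROM file name starts with "<model>_".
# The "<model>_" naming scheme is an assumption about the downloaded files.
rom_matches_model() {
    case "$(basename "$2")" in
        "$1"_*) return 0 ;;
        *)      return 1 ;;
    esac
}

# flash_apu MODEL ROMFILE
# Prints the flashrom command by default; set DRY_RUN=0 to actually flash.
flash_apu() {
    if ! rom_matches_model "$1" "$2"; then
        echo "refusing to flash: $2 does not look like a $1 image" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "0" ]; then
        flashrom -w "$2" -p internal
    else
        echo "flashrom -w $2 -p internal"
    fi
}
```

Example: @flash_apu apu2 apu2_v4.19.0.1.rom@ prints the command, while @flash_apu apu2 apu4_v4.19.0.1.rom@ is refused.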

h2. APU Serial and bootloader configuration

* Ensure that the bootloader passes @console=ttyS0,115200@ to the kernel
* Ensure that there is a getty running on the serial console
* Use grub-bios as the bootloader
** Install using @grub-install /dev/sda@
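
Whether the running kernel actually received the console parameter can be checked from @/proc/cmdline@. A minimal sketch, with a hypothetical function name; it matches the exact string from the list above, so a variant such as @console=ttyS0,115200n8@ would not be accepted.

```shell
#!/bin/sh
# has_serial_console [FILE]
# Succeed if FILE (default: /proc/cmdline) contains console=ttyS0,115200
# as a separate kernel parameter.
has_serial_console() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'console=ttyS0,115200'
}

# Typical use on the APU itself (left commented out):
#   has_serial_console || echo "serial console is not configured" >&2
```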

h2. Updating the Perc H800 SAS controller

* @wget 'https://dl.dell.com/FOLDER03292738M/3/SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN?uid=4b8a2506-f4d4-46a9-ab19-3c2a5008a782&fn=SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN' -O SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @chmod u+x SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @./SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@

h2. HP servers disk management

* See also https://www.thegeekstuff.com/2014/07/hpacucli-examples/

Required kernel modules:

<pre>
sg
cciss
</pre>

Show all drives/controller overview:

<pre>
hpacucli ctrl all show config

hpacucli ctrl slot=0 pd all show
</pre>
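
For monitoring it can be handy to reduce the drive overview to the drives that need attention. The sketch below assumes the @pd all show status@ output format quoted in the cached article further down this page, where each line ends in the drive status; the function name @failed_drives@ is hypothetical.

```shell
#!/bin/sh
# Read "pd all show status" output on stdin and print the ID of every
# physical drive whose last field is not "OK".
failed_drives() {
    awk '$1 == "physicaldrive" && $NF != "OK" { print $2 }'
}

# Typical use on a host with hpacucli installed (left commented out):
#   hpacucli ctrl slot=0 pd all show status | failed_drives
```

An empty result means every listed drive reported OK.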

Add a disk as RAID 0:

<pre>
hpacucli ctrl slot=0 create type=ld drives=1I:1:1 raid=0
</pre>
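
When several new disks should each become their own RAID 0 logical drive, the commands can be derived from the controller overview. This is a sketch under assumptions: it relies on the @ctrl all show config@ layout quoted in the cached article further down this page (an @unassigned@ heading followed by @physicaldrive@ lines), the function name is hypothetical, and it only prints the commands so they can be reviewed before running them.

```shell
#!/bin/sh
# raid0_commands [SLOT]
# Read "ctrl all show config" output on stdin and print one
# "create type=ld ... raid=0" command per unassigned physical drive.
raid0_commands() {
    slot=${1:-0}
    awk -v slot="$slot" '
        /^Smart Array/      { unassigned = 0 }           # new controller block
        $1 == "array"       { unassigned = 0 }           # drives already in use
        $1 == "unassigned"  { unassigned = 1; next }     # free drives follow
        unassigned && $1 == "physicaldrive" {
            printf "hpacucli ctrl slot=%s create type=ld drives=%s raid=0\n", slot, $2
        }
    '
}

# Typical use (left commented out):
#   hpacucli ctrl all show config | raid0_commands 0
```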

Deleting a logical drive:

<pre>
hpacucli ctrl slot=0 ld 2 delete
</pre>

Copy of https://www.thegeekstuff.com/2014/07/hpacucli-examples/ (mainly to cache it):

<pre>
1. Two ways to execute the command

When you type the command hpacucli, it will display a “=>” prompt as shown below where you can enter all the hpacucli commands explained in the article.

# hpacucli
HP Array Configuration Utility CLI 9.20.9.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.
=> rescan

Or, if you don’t want to get to the hpacucli prompt, you can just enter the following directly in the Linux prompt. The following is exactly same as the above.

# hpacucli rescan

2. Display Controller and Disk Status

To display the detailed status of the controller and the disk status, execute the following command.

# hpacucli
=> ctrl all show config

Smart Array P410i in Slot 0 (Embedded)    (sn: 50014380101D61C0)

   array A (SAS, Unused Space: 0  MB)

      logicaldrive 1 (136.7 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)

   unassigned

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK)

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 250 (WWID: 50014380101D61CF)

In this example, as shown in the above output, we have total 7 physical drives. The first RAID group RAID 1 contains 2 physical drives and the remaining physical drives are not assigned to any of the logical drives.

3. View Controller Status

To display the status of just the controller, do the following. In this example, the controller is working perfectly without any issues.

=> ctrl all show status

Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK

4. View Drive Status

To display the status of the physical drive, do the following. In this example, we have two 146GB physical drives, and 5 300GB physical drives, and all are in perfect condition.

=> ctrl slot=0 pd all show status

   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 146 GB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 146 GB): OK
   physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 300 GB): OK
   physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 300 GB): OK
   physicaldrive 2I:1:6 (port 2I:box 1:bay 6, 300 GB): OK
   physicaldrive 2I:1:7 (port 2I:box 1:bay 7, 300 GB): OK
   physicaldrive 2I:1:8 (port 2I:box 1:bay 8, 300 GB): OK

5. View Individual Drive Status

To display the detail status of a specific physical drive, do the following.

In this example, we like to know the status of “pd” (physical disk) in slot 0. The specific disk is “2I:1:6”, which we figured it out from the output of the previous command.

As shown in the output below, this displays the Serial Number, Make, Model, Size and Firmware version of this specific disk. This can be very helpful during troubleshooting.

=> ctrl slot=0 pd 2I:1:6 show detail

Smart Array P410i in Slot 0 (Embedded)

   unassigned

      physicaldrive 2I:1:6
         Port: 2I
         Box: 1
         Bay: 6
         Status: OK
         Drive Type: Unassigned Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPD4
         Serial Number: EB01PC416C4C1214
         Model: HP      EG0300FBDSP
         Current Temperature (C): 38
         Maximum Temperature (C): 56
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown

6. View All Logical Drives

The following command will display all available logical drives on the system. As shown in the output below, we currently have only one logical drive in RAID 1 with total size of around 136GB.

=> ctrl slot=0 ld all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      logicaldrive 1 (136.7 GB, RAID 1, OK)

7. Create New RAID 0 Logical Drive

Execute the following command to create a new logical drive using RAID 0 option.

=> ctrl slot=0 create type=ld drives=1I:1:3 raid=0

The above command creates a logical drive with the physical drives 1I:1:3 on RAID 0 configuration in slot 0.

8. Create New RAID 1 Logical Drive

Execute the following command to create a new logical drive using RAID 1 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4 raid=1

The above command creates a logical drive with the two physical drives 1I:1:3 and 1I:1:4 on RAID 1 configuration in slot 0.

9. Create New RAID 5 Logical Drive

Execute the following command to create a new logical drive using RAID 5 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4,2I:1:6,2I:1:7,2I:1:8 raid=5

The above command creates a logical drive with the five physical drives on RAID 5 configuration in slot 0.

Once these logical drives are created, you should see the disks from the fdisk and you can format it from there and start using it.

After you create a logical drive, execute the following command to verify that the LD got created. In this example, it shows that the RAID 5 logical drive got created successfully.

=> ctrl slot=0 ld all show status

   logicaldrive 1 (136.7 GB, RAID 1): OK
   logicaldrive 2 (1.1 TB, RAID 5): OK

10. Rescan for New Devices

If you’ve added new physical hard disk, they won’t automatically show-up immediately. You have to scan for new devices as shown below.

=> rescan

11. View Detailed Logical Drive Status

To display the detailed status of the logical drive, do the following:

=> ctrl slot=0 ld 2 show

Smart Array P410i in Slot 0 (Embedded)

   array B

      Logical Drive: 2
         Size: 1.1 TB
         Fault Tolerance: RAID 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 1024 KB
         Status: OK
         Caching:  Enabled
         Parity Initialization Status: In Progress
         Unique Identifier: 600508B1001031303144363143301000
         Disk Name: /dev/cciss/c0d1
         Mount Points: None
         Logical Drive Label: A4967E2950014380101D61C008BE
         Drive Type: Data

The above shows the RAID type, the disk name assigned to the logical drive, and other information about the logical drive number 2.

12. Delete Logical Drive

To delete a logical drive with the number 2 use the below command.

=> ctrl slot=0 ld 2 delete

Warning: Deleting an array can cause other array letters to become renamed.
         E.g. Deleting array A from arrays A,B,C will result in two remaining
         arrays A,B ... not B,C

Warning: Deleting the specified device(s) will result in data being lost.
         Continue? (y/n) y

13. Add New Physical Drive to Logical Volume

To add the new drives to existing logical volume, do the following.

=> ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7

In this example, we are adding two additional drives specified above to the logical volume number 2.

14. Add Spare Disks

To add the spare disks to arrays that can be used in case of disk failures on one of the logical drives, do the following:

=> ctrl slot=0 array all add spares=2I:1:6,2I:1:7

In this example, we are adding two spare disks to the array.

15. Enable or Disable Cache

The below commands enable or disable cache for the entire slot.

=> ctrl slot=0 modify dwc=disable

=> ctrl slot=0 modify dwc=enable

16. Erase Physical Drive

Execute the following command to erase a physical drive in array B on slot 0.

=> ctrl slot=0 pd 2I:1:6 modify erase

17. Blink Physical Disk LED

To blink the LED on the physical drives for the logical drive 2, do the following. This will make the LEDs blink on all the physical drives that belongs to logical drive 2.

=> ctrl slot=0 ld 2 modify led=on

Once you know which drive belongs to logical drive 2, turn the LED blinking off as shown below.

=> ctrl slot=0 ld 2 modify led=off
</pre>

h2. Dell servers disk management

Listing all disks:

<pre>
megacli -PDList -aALL
</pre>

Adding disks:

<pre>
# Syntax. X is the adapter number: host is 0, md-array is 1
megacli -CfgLdAdd -r0 [Enclosure Device ID:slot] -aX

# Sample call, if enclosure and slot are KNOWN (aka not N/A)
megacli -CfgLdAdd -r0 [32:0] -a0

# Sample call, if enclosure is N/A
megacli -CfgLdAdd -r0 [:0] -a0
</pre>
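
The @[Enclosure Device ID:slot]@ argument can be derived from the disk listing. The sketch below is an illustration with a hypothetical function name; it assumes the listing contains @Enclosure Device ID: ...@ and @Slot Number: ...@ lines for each disk and that an unknown enclosure is reported as @N/A@.

```shell
#!/bin/sh
# Read "megacli -PDList -aALL" output on stdin and print one
# [enclosure:slot] argument per disk. An enclosure of "N/A" is
# printed as an empty enclosure, i.e. [:slot].
pd_addresses() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc = ($2 == "N/A") ? "" : $2 }
        /^Slot Number/         { printf "[%s:%s]\n", enc, $2 }
    '
}

# Typical use on a host with megacli installed (left commented out):
#   megacli -PDList -aALL | pd_addresses
```

The printed values can be passed as the address argument of @megacli -CfgLdAdd -r0@.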

Remove the preserved cache of disks that are no longer in the server:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
</pre>

Remove foreign configurations from foreign disks:

<pre>
megacli -CfgForeign -Clear -aAll
</pre>

In many cases you need to do both:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
megacli -CfgForeign -Clear -aAll
</pre>

h2. SEE ALSO

* [[Managing OpenWRT]]
* [[The_ungleich_ceph_handbook]]