The ungleich hardware maintenance guide » History » Version 18
Nico Schottelius, 02/12/2022 04:45 PM
{{toc}}

h1. The ungleich hardware maintenance guide

This guide describes common operations on the hardware we use.

h2. Using the ungleich-hardware container in kubernetes and docker

To manage hardware on server1 in kubernetes, you can use:

<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3
    args:
    - sleep
    - "1000000"
    volumeMounts:
    - mountPath: /dev
      name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "server1"

  volumes:
  - name: dev
    hostPath:
      path: /dev

</pre>

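To run the same pod on a different server, only the @nodeSelector@ hostname has to change. The following is a minimal sketch, assuming a POSIX shell; @render_hardware_pod@ is a hypothetical helper name and not part of any existing tooling. It prints the manifest from this section with the hostname filled in from its first argument.

```shell
#!/bin/sh
# render_hardware_pod NODE
# Hypothetical helper: prints the ungleich-hardware pod manifest with the
# nodeSelector set to NODE, so it can be piped into kubectl.
render_hardware_pod() {
    node="$1"
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3
    args:
    - sleep
    - "1000000"
    volumeMounts:
    - mountPath: /dev
      name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "${node}"
  volumes:
  - name: dev
    hostPath:
      path: /dev
EOF
}

# Possible usage (needs kubectl access to the cluster; that the image
# ships a "sh" binary is an assumption):
#   render_hardware_pod server2 | kubectl apply -f -
#   kubectl exec -ti ungleich-hardware -- sh
#   kubectl delete pod ungleich-hardware
```

Deleting the pod when the maintenance is finished avoids leaving a privileged container with access to @/dev@ running on the node.
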
To use it with docker:

<pre>
docker run -v /dev:/dev --privileged -ti harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3
</pre>

h2. APU Bios Update

* Download the correct BIOS from https://pcengines.github.io/
** Check whether the board is an apu1, apu2, apu3 or apu4 before downloading
* Install flashrom
* "Flash the BIOS using flashrom":https://github.com/pcengines/apu2-documentation/blob/master/docs/firmware_flashing.md
** @flashrom -w THEROMFILE -p internal@

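The steps above can be wrapped in a small script that refuses to flash an image for the wrong board and keeps a backup of the current firmware. This is only a sketch under assumptions: the function names are hypothetical, and the check assumes the ROM file name starts with the board name (for example @apu2_...rom@), which should be verified against the file that was actually downloaded. @flashrom -r@ reads the current flash content into a file and @flashrom -w@ writes a new image.

```shell
#!/bin/sh
# rom_matches_board BOARD ROMFILE
# Succeeds if the ROM file name starts with BOARD followed by "_" or "-".
# Assumption: the downloaded file is named after the board, e.g. apu2_<version>.rom.
rom_matches_board() {
    board="$1"
    rom="$(basename "$2")"
    case "$rom" in
        "${board}"_*|"${board}"-*) return 0 ;;
        *) return 1 ;;
    esac
}

# flash_apu BOARD ROMFILE
# Hypothetical wrapper: checks the file name, backs up the current
# firmware and only then writes the new image. Must run as root.
flash_apu() {
    board="$1"
    romfile="$2"
    rom_matches_board "$board" "$romfile" || {
        echo "ROM $romfile does not look like a $board image" >&2
        return 1
    }
    # Read the current firmware into a dated backup file first
    flashrom -p internal -r "backup-$(date +%Y%m%d).rom" || return 1
    # Write the new image
    flashrom -p internal -w "$romfile"
}

# Possible usage:
#   flash_apu apu2 ./THEROMFILE
```

The file name check is a convenience only; it does not replace checking the board revision by hand before flashing.
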
h2. APU Serial and bootloader configuration

* Ensure that the bootloader passes @console=ttyS0,115200@ to the kernel
* Ensure that a getty is running on the serial port
* Use grub-bios as the bootloader
** Install it using @grub-install /dev/sda@

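Whether the running kernel was booted with the serial console can be checked from @/proc/cmdline@. The following is a minimal sketch; @has_serial_console@ is a hypothetical helper name. The GRUB settings in the comments are the usual variables in @/etc/default/grub@ and only apply on distributions that use that file.

```shell
#!/bin/sh
# has_serial_console [CMDLINE_FILE]
# Succeeds if the kernel command line contains console=ttyS0,115200.
# Reads /proc/cmdline unless another file is given.
has_serial_console() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put every kernel argument on its own line and match the whole argument
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'console=ttyS0,115200'
}

# Possible usage:
#   has_serial_console || echo "serial console is not configured" >&2
#
# Typical settings in /etc/default/grub (assumption: the distribution
# generates grub.cfg from this file):
#   GRUB_CMDLINE_LINUX="console=ttyS0,115200"
#   GRUB_TERMINAL="serial"
#   GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"
# followed by:
#   grub-mkconfig -o /boot/grub/grub.cfg
```
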
h2. Updating the Perc H800 SAS controller

* @wget 'https://dl.dell.com/FOLDER03292738M/3/SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN?uid=4b8a2506-f4d4-46a9-ab19-3c2a5008a782&fn=SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN' -O SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @chmod u+x SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @./SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@

h2. HP servers disk management

* See also https://www.thegeekstuff.com/2014/07/hpacucli-examples/

Required kernel modules:

<pre>
sg
cciss
</pre>

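A small sketch for loading the two modules only when they are missing, assuming a POSIX shell with @awk@ and @modprobe@ available; both function names are hypothetical. Modules that are built into the kernel do not show up in @/proc/modules@, in which case @modprobe@ is simply called again and should succeed without doing anything.

```shell
#!/bin/sh
# module_loaded NAME [MODULES_FILE]
# Succeeds if NAME is listed in the first column of /proc/modules
# (or of the file given as second argument).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# load_hp_modules
# Loads sg and cciss unless they are already listed as loaded. Must run as root.
load_hp_modules() {
    for mod in sg cciss; do
        module_loaded "$mod" || modprobe "$mod" || return 1
    done
}

# Possible usage:
#   load_hp_modules && hpacucli ctrl all show config
```
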
Show an overview of all controllers and drives:

<pre>
hpacucli ctrl all show config

hpacucli ctrl slot=0 pd all show
</pre>

Add a disk as raid0:

<pre>
hpacucli ctrl slot=0 create type=ld drives=1I:1:1 raid=0
</pre>

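When several new disks should each become their own raid0 logical drive, the drive IDs can be taken from the "unassigned" part of the @show config@ output. The sketch below assumes that output format (an @unassigned@ line followed by @physicaldrive@ lines) and a single controller; the function names are hypothetical. It only prints the commands so that they can be reviewed before being run.

```shell
#!/bin/sh
# unassigned_drives
# Reads "hpacucli ctrl all show config" output on stdin and prints the IDs
# of the physical drives listed under "unassigned", one per line.
unassigned_drives() {
    awk '
        $1 == "unassigned" { in_unassigned = 1; next }
        $1 == "array" || $1 == "SEP" { in_unassigned = 0 }
        in_unassigned && $1 == "physicaldrive" { print $2 }
    '
}

# raid0_commands SLOT
# Reads the same output on stdin and prints one create command
# per unassigned drive for the controller in SLOT.
raid0_commands() {
    slot="$1"
    unassigned_drives | while read -r drive; do
        echo "hpacucli ctrl slot=${slot} create type=ld drives=${drive} raid=0"
    done
}

# Possible usage (review the printed commands before executing them):
#   hpacucli ctrl all show config | raid0_commands 0
```
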
Delete a logical drive:

<pre>
hpacucli ctrl slot=0 ld 2 delete
</pre>

Copy from https://www.thegeekstuff.com/2014/07/hpacucli-examples/ (mainly to cache it):

<pre>
1. Two ways to execute the command

When you type the command hpacucli, it will display a “=>” prompt as shown below where you can enter all the hpacucli commands explained in the article.

# hpacucli
HP Array Configuration Utility CLI 9.20.9.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.
=> rescan

Or, if you don’t want to get to the hpacucli prompt, you can just enter the following directly in the Linux prompt. The following is exactly the same as the above.

# hpacucli rescan

2. Display Controller and Disk Status

To display the detailed status of the controller and the disk status, execute the following command.

# hpacucli
=> ctrl all show config

Smart Array P410i in Slot 0 (Embedded) (sn: 50014380101D61C0)

   array A (SAS, Unused Space: 0 MB)

      logicaldrive 1 (136.7 GB, RAID 1, OK)

      physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
      physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)

   unassigned

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
      physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK)

   SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250 (WWID: 50014380101D61CF)

In this example, as shown in the above output, we have a total of 7 physical drives. The first RAID group (RAID 1) contains 2 physical drives and the remaining physical drives are not assigned to any of the logical drives.

3. View Controller Status

To display the status of just the controller, do the following. In this example, the controller is working perfectly without any issues.

=> ctrl all show status

Smart Array P410i in Slot 0 (Embedded)
   Controller Status: OK
   Cache Status: OK

4. View Drive Status

To display the status of the physical drives, do the following. In this example, we have two 146GB physical drives and five 300GB physical drives, and all are in perfect condition.

=> ctrl slot=0 pd all show status

   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 146 GB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 146 GB): OK
   physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 300 GB): OK
   physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 300 GB): OK
   physicaldrive 2I:1:6 (port 2I:box 1:bay 6, 300 GB): OK
   physicaldrive 2I:1:7 (port 2I:box 1:bay 7, 300 GB): OK
   physicaldrive 2I:1:8 (port 2I:box 1:bay 8, 300 GB): OK

5. View Individual Drive Status

To display the detailed status of a specific physical drive, do the following.

In this example, we would like to know the status of “pd” (physical disk) in slot 0. The specific disk is “2I:1:6”, which we figured out from the output of the previous command.

As shown in the output below, this displays the Serial Number, Make, Model, Size and Firmware version of this specific disk. This can be very helpful during troubleshooting.

=> ctrl slot=0 pd 2I:1:6 show detail

Smart Array P410i in Slot 0 (Embedded)

   unassigned

      physicaldrive 2I:1:6
         Port: 2I
         Box: 1
         Bay: 6
         Status: OK
         Drive Type: Unassigned Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 10000
         Firmware Revision: HPD4
         Serial Number: EB01PC416C4C1214
         Model: HP EG0300FBDSP
         Current Temperature (C): 38
         Maximum Temperature (C): 56
         PHY Count: 2
         PHY Transfer Rate: 6.0Gbps, Unknown

6. View All Logical Drives

The following command will display all available logical drives on the system. As shown in the output below, we currently have only one logical drive in RAID 1 with a total size of around 136GB.

=> ctrl slot=0 ld all show

Smart Array P410i in Slot 0 (Embedded)

   array A

      logicaldrive 1 (136.7 GB, RAID 1, OK)

7. Create New RAID 0 Logical Drive

Execute the following command to create a new logical drive using the RAID 0 option.

=> ctrl slot=0 create type=ld drives=1I:1:3 raid=0

The above command creates a logical drive with the physical drive 1I:1:3 in a RAID 0 configuration in slot 0.

8. Create New RAID 1 Logical Drive

Execute the following command to create a new logical drive using the RAID 1 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4 raid=1

The above command creates a logical drive with the two physical drives 1I:1:3 and 1I:1:4 in a RAID 1 configuration in slot 0.

9. Create New RAID 5 Logical Drive

Execute the following command to create a new logical drive using the RAID 5 option.

=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4,2I:1:6,2I:1:7,2I:1:8 raid=5

The above command creates a logical drive with the five physical drives in a RAID 5 configuration in slot 0.

Once these logical drives are created, you should see the disks in fdisk, and you can format them from there and start using them.

After you create a logical drive, execute the following command to verify that the LD got created. In this example, it shows that the RAID 5 logical drive got created successfully.

=> ctrl slot=0 ld all show status

   logicaldrive 1 (136.7 GB, RAID 1): OK
   logicaldrive 2 (1.1 TB, RAID 5): OK

10. Rescan for New Devices

If you’ve added new physical hard disks, they won’t show up immediately. You have to scan for new devices as shown below.

=> rescan

11. View Detailed Logical Drive Status

To display the detailed status of the logical drive, do the following:

=> ctrl slot=0 ld 2 show

Smart Array P410i in Slot 0 (Embedded)

   array B

      Logical Drive: 2
         Size: 1.1 TB
         Fault Tolerance: RAID 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 1024 KB
         Status: OK
         Caching: Enabled
         Parity Initialization Status: In Progress
         Unique Identifier: 600508B1001031303144363143301000
         Disk Name: /dev/cciss/c0d1
         Mount Points: None
         Logical Drive Label: A4967E2950014380101D61C008BE
         Drive Type: Data

The above shows the RAID type, the disk name assigned to the logical drive, and other information about logical drive number 2.

12. Delete Logical Drive

To delete the logical drive with the number 2, use the command below.

=> ctrl slot=0 ld 2 delete

Warning: Deleting an array can cause other array letters to become renamed.
         E.g. Deleting array A from arrays A,B,C will result in two remaining
         arrays A,B ... not B,C

Warning: Deleting the specified device(s) will result in data being lost.
         Continue? (y/n) y

13. Add New Physical Drive to Logical Volume

To add new drives to an existing logical volume, do the following.

=> ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7

In this example, we are adding the two additional drives specified above to logical volume number 2.

14. Add Spare Disks

To add spare disks to arrays, which can be used in case of disk failures on one of the logical drives, do the following:

=> ctrl slot=0 array all add spares=2I:1:6,2I:1:7

In this example, we are adding two spare disks to the array.

15. Enable or Disable Cache

The commands below enable or disable the cache for the entire slot.

=> ctrl slot=0 modify dwc=disable

=> ctrl slot=0 modify dwc=enable

16. Erase Physical Drive

Execute the following command to erase a physical drive in array B on slot 0.

=> ctrl slot=0 pd 2I:1:6 modify erase

17. Blink Physical Disk LED

To blink the LED on the physical drives for logical drive 2, do the following. This will make the LEDs blink on all the physical drives that belong to logical drive 2.

=> ctrl slot=0 ld 2 modify led=on

Once you know which drives belong to logical drive 2, turn the LED blinking off as shown below.

=> ctrl slot=0 ld 2 modify led=off
</pre>

h2. Dell servers disk management

Listing all disks:

<pre>
megacli -PDList -aALL
</pre>

Adding disks:

<pre>
# General form; X in -aX selects the adapter (host is 0, md-array is 1)
megacli -CfgLdAdd -r0 [Enclosure Device ID:slot] -aX

# Sample call, if enclosure and slot are KNOWN (aka not N/A)
megacli -CfgLdAdd -r0 [32:0] -a0

# Sample call, if enclosure is N/A
megacli -CfgLdAdd -r0 [:0] -a0
</pre>

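The two sample calls differ only in how the @[enclosure:slot]@ argument is written. A minimal sketch of that rule, assuming the enclosure value is copied from the @-PDList@ output and may literally be @N/A@; both function names are hypothetical, and the command is only printed, not executed.

```shell
#!/bin/sh
# megacli_pd_spec ENCLOSURE SLOT
# Prints the [enclosure:slot] argument. An enclosure of "N/A" (or an empty
# one) is left out, which gives the "[:slot]" form.
megacli_pd_spec() {
    enclosure="$1"
    slot="$2"
    case "$enclosure" in
        N/A|"") enclosure="" ;;
    esac
    printf '[%s:%s]\n' "$enclosure" "$slot"
}

# raid0_add_command ENCLOSURE SLOT [ADAPTER]
# Prints the megacli call that adds one disk as raid0; ADAPTER defaults to 0.
raid0_add_command() {
    echo "megacli -CfgLdAdd -r0 $(megacli_pd_spec "$1" "$2") -a${3:-0}"
}

# Possible usage:
#   raid0_add_command 32 0
#   raid0_add_command N/A 0
```

When running the printed command from a shell, quoting the bracket argument (for example @'[32:0]'@) keeps the shell from treating it as a file name pattern.
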
Remove the preserved cache of disks that are no longer in the server:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
</pre>

Remove foreign configurations on foreign disks:

<pre>
megacli -CfgForeign -Clear -aAll
</pre>

In many cases both are needed:

<pre>
megacli -DiscardPreservedCache -Lall -aAll
megacli -CfgForeign -Clear -aAll
</pre>

h2. SEE ALSO

* [[Managing OpenWRT]]
* [[The_ungleich_ceph_handbook]]