Revision 17 (Nico Schottelius, 02/12/2022 01:17 PM) → Revision 18/23 (Nico Schottelius, 02/12/2022 04:45 PM)
{{toc}}
h1. The ungleich hardware maintenance guide
This guide describes common operations on hardware we use.
h2. Using the ungleich-hardware container in kubernetes and docker
To manage hardware on server1 in kubernetes, you can use:
<pre>
apiVersion: v1
kind: Pod
metadata:
  name: ungleich-hardware
spec:
  containers:
  - name: ungleich-hardware
    image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3
    args:
    - sleep
    - "1000000"
    volumeMounts:
    - mountPath: /dev
      name: dev
    securityContext:
      privileged: true
  nodeSelector:
    kubernetes.io/hostname: "server1"
  volumes:
  - name: dev
    hostPath:
      path: /dev
</pre>
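A minimal sketch of starting, entering and removing the pod, assuming the manifest above is saved to a file (the name @ungleich-hardware.yaml@ is hypothetical), that @kubectl@ is configured for the cluster, and that the image contains @sh@. The @KUBECTL@ variable and the function names are illustrative additions so the calls can be dry-run with @echo@.

```shell
# Command to run; set KUBECTL=echo for a dry run (illustrative variable).
KUBECTL="${KUBECTL:-kubectl}"

# Create the pod from a manifest file and wait until it is ready.
start_hardware_pod() {
    manifest="$1"   # hypothetical file name, e.g. ungleich-hardware.yaml
    "$KUBECTL" apply -f "$manifest" &&
        "$KUBECTL" wait --for=condition=Ready pod/ungleich-hardware --timeout=120s
}

# Open an interactive shell inside the running pod.
enter_hardware_pod() {
    "$KUBECTL" exec -ti ungleich-hardware -- sh
}

# Remove the pod again once the maintenance is done.
stop_hardware_pod() {
    "$KUBECTL" delete pod ungleich-hardware
}
```

Typical use would be @start_hardware_pod ungleich-hardware.yaml@, then @enter_hardware_pod@, and @stop_hardware_pod@ afterwards, since the pod is privileged and should not stay around longer than needed.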
To use it with docker:
<pre>
docker run -v /dev:/dev --privileged -ti harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3
</pre>
h2. APU Bios Update
* Download the correct BIOS from https://pcengines.github.io/
** Check whether the board is an apu1, apu2, apu3 or apu4 before downloading
* Install flashrom
* "Flash bios using flashrom":https://github.com/pcengines/apu2-documentation/blob/master/docs/firmware_flashing.md
** @flashrom -w THEROMFILE -p internal@
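Flashing the wrong or a corrupted image can leave the board unbootable, so it may be worth checking the download first. The following is a sketch only, assuming a SHA-256 digest for the ROM file is available; the function names and the @FLASHROM@ override are hypothetical helpers, not part of flashrom.

```shell
# Return success only if the SHA-256 of the ROM file matches the expected digest.
verify_rom() {
    rom="$1"
    expected="$2"
    actual="$(sha256sum "$rom" | cut -d' ' -f1)"
    [ "$actual" = "$expected" ]
}

# Flash only after the checksum matched.
# FLASHROM can be overridden for a dry run (illustrative variable).
flash_rom() {
    rom="$1"
    expected="$2"
    if verify_rom "$rom" "$expected"; then
        "${FLASHROM:-flashrom}" -w "$rom" -p internal
    else
        echo "checksum mismatch for $rom, not flashing" >&2
        return 1
    fi
}
```

Usage would be @flash_rom THEROMFILE EXPECTED_SHA256@; the flashrom call itself is the same one as in the list above.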
h2. APU Serial and bootloader configuration
* Ensure that the bootloader passes @console=ttyS0,115200@ on the kernel command line
* Ensure that there is a getty running on the serial port (ttyS0)
* Use grub-bios as the bootloader
** Install using @grub-install /dev/sda@
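For GRUB, the serial settings above could look like the following fragment of @/etc/default/grub@. This is a sketch under assumptions: the file location and the need to regenerate the configuration with @grub-mkconfig@ afterwards depend on the distribution, and unit 0 is assumed to correspond to ttyS0.

```shell
# /etc/default/grub (fragment); regenerate grub.cfg with grub-mkconfig afterwards.
# Send kernel messages to the first serial port at 115200 baud.
GRUB_CMDLINE_LINUX="console=ttyS0,115200"
# Show the GRUB menu itself on the serial port as well.
GRUB_TERMINAL="serial"
GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"
```

For the getty, an Alpine-style @/etc/inittab@ entry would be @ttyS0::respawn:/sbin/getty -L ttyS0 115200 vt100@; on systemd-based systems @systemctl enable --now serial-getty@ttyS0.service@ should achieve the same.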
h2. Updating the Perc H800 SAS controller
* @wget 'https://dl.dell.com/FOLDER03292738M/3/SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN?uid=4b8a2506-f4d4-46a9-ab19-3c2a5008a782&fn=SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN' -O SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @chmod u+x SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
* @./SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@
h2. HP servers disk management
* See also https://www.thegeekstuff.com/2014/07/hpacucli-examples/
Required kernel modules:
<pre>
sg
cciss
</pre>
Show all drives/controller overview:
<pre>
hpacucli ctrl all show config
hpacucli ctrl slot=0 pd all show
</pre>
Add a disk as raid0:
<pre>
hpacucli ctrl slot=0 create type=ld drives=1I:1:1 raid=0
</pre>
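When several disks need to be added as individual RAID 0 drives, the create commands can be generated from the controller overview. The sketch below assumes the @ctrl all show config@ output lists free disks under an @unassigned@ line, as hpacucli normally prints it; the function names are hypothetical. It only prints the commands so they can be reviewed before running.

```shell
# Print the IDs of all drives listed under "unassigned" in the output of
# "hpacucli ctrl all show config" (read from stdin).
unassigned_drives() {
    awk '
        $1 == "array"      { unassigned = 0 }
        $1 == "unassigned" { unassigned = 1 }
        unassigned && $1 == "physicaldrive" { print $2 }
    '
}

# Print (not run) one RAID 0 create command per unassigned drive.
raid0_commands() {
    slot="$1"
    unassigned_drives | while read -r drive; do
        echo "hpacucli ctrl slot=$slot create type=ld drives=$drive raid=0"
    done
}
```

Usage: @hpacucli ctrl all show config | raid0_commands 0@, then check the printed lines and execute the ones that are wanted.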
Deleting a logical drive:
<pre>
hpacucli ctrl slot=0 ld 2 delete
</pre>
Cached copy of https://www.thegeekstuff.com/2014/07/hpacucli-examples/ (kept here in case the original disappears):
<pre>
1. Two ways to execute the command
When you type the command hpacucli, it will display a “=>” prompt as shown below where you can enter all the hpacucli commands explained in the article.
# hpacucli
HP Array Configuration Utility CLI 9.20.9.0
Detecting Controllers...Done.
Type "help" for a list of supported commands.
Type "exit" to close the console.
=> rescan
Or, if you don’t want to get to the hpacucli prompt, you can just enter the following directly in the Linux prompt. The following is exactly same as the above.
# hpacucli rescan
2. Display Controller and Disk Status
To display the detailed status of the controller and the disk status, execute the following command.
# hpacucli
=> ctrl all show config
Smart Array P410i in Slot 0 (Embedded) (sn: 50014380101D61C0)
array A (SAS, Unused Space: 0 MB)
logicaldrive 1 (136.7 GB, RAID 1, OK)
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK)
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK)
unassigned
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK)
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK)
physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK)
physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK)
physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK)
SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250 (WWID: 50014380101D61CF)
In this example, as shown in the above output, we have a total of 7 physical drives. The first RAID group (RAID 1) contains 2 physical drives, and the remaining physical drives are not assigned to any logical drive.
3. View Controller Status
To display the status of just the controller, do the following. In this example, the controller is working perfectly without any issues.
=> ctrl all show status
Smart Array P410i in Slot 0 (Embedded)
Controller Status: OK
Cache Status: OK
4. View Drive Status
To display the status of the physical drive, do the following. In this example, we have two 146GB physical drives, and 5 300GB physical drives, and all are in perfect condition.
=> ctrl slot=0 pd all show status
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 146 GB): OK
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 146 GB): OK
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 300 GB): OK
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 300 GB): OK
physicaldrive 2I:1:6 (port 2I:box 1:bay 6, 300 GB): OK
physicaldrive 2I:1:7 (port 2I:box 1:bay 7, 300 GB): OK
physicaldrive 2I:1:8 (port 2I:box 1:bay 8, 300 GB): OK
5. View Individual Drive Status
To display the detail status of a specific physical drive, do the following.
In this example, we want to know the status of a “pd” (physical disk) on the controller in slot 0. The specific disk is “2I:1:6”, which we found in the output of the previous command.
As shown in the output below, this displays the Serial Number, Make, Model, Size and Firmware version of this specific disk. This can be very helpful during troubleshooting.
=> ctrl slot=0 pd 2I:1:6 show detail
Smart Array P410i in Slot 0 (Embedded)
unassigned
physicaldrive 2I:1:6
Port: 2I
Box: 1
Bay: 6
Status: OK
Drive Type: Unassigned Drive
Interface Type: SAS
Size: 300 GB
Rotational Speed: 10000
Firmware Revision: HPD4
Serial Number: EB01PC416C4C1214
Model: HP EG0300FBDSP
Current Temperature (C): 38
Maximum Temperature (C): 56
PHY Count: 2
PHY Transfer Rate: 6.0Gbps, Unknown
6. View All Logical Drives
The following command will display all available logical drives on the system. As shown in the output below, we currently have only one logical drive in RAID 1 with total size of around 136GB.
=> ctrl slot=0 ld all show
Smart Array P410i in Slot 0 (Embedded)
array A
logicaldrive 1 (136.7 GB, RAID 1, OK)
7. Create New RAID 0 Logical Drive
Execute the following command to create a new logical drive using RAID 0 option.
=> ctrl slot=0 create type=ld drives=1I:1:3 raid=0
The above command creates a logical drive with the physical drives 1I:1:3 on RAID 0 configuration in slot 0.
8. Create New RAID 1 Logical Drive
Execute the following command to create a new logical drive using RAID 1 option.
=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4 raid=1
The above command creates a logical drive with the two physical drives 1I:1:3 and 1I:1:4 on RAID 1 configuration in slot 0.
9. Create New RAID 5 Logical Drive
Execute the following command to create a new logical drive using RAID 5 option.
=> ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4,2I:1:6,2I:1:7,2I:1:8 raid=5
The above command creates a logical drive with the five physical drives on RAID 5 configuration in slot 0.
Once these logical drives are created, the new disks should be visible in fdisk; you can partition and format them from there and start using them.
After you create a logical drive, execute the following command to verify that the LD got created. In this example, it shows that the RAID 5 logical drive got created successfully.
=> ctrl slot=0 ld all show status
logicaldrive 1 (136.7 GB, RAID 1): OK
logicaldrive 2 (1.1 TB, RAID 5): OK
10. Rescan for New Devices
If you’ve added new physical hard disks, they won’t show up automatically. You have to scan for new devices as shown below.
=> rescan
11. View Detailed Logical Drive Status
To display the detailed status of the logical drive, do the following:
=> ctrl slot=0 ld 2 show
Smart Array P410i in Slot 0 (Embedded)
array B
Logical Drive: 2
Size: 1.1 TB
Fault Tolerance: RAID 5
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Strip Size: 256 KB
Full Stripe Size: 1024 KB
Status: OK
Caching: Enabled
Parity Initialization Status: In Progress
Unique Identifier: 600508B1001031303144363143301000
Disk Name: /dev/cciss/c0d1
Mount Points: None
Logical Drive Label: A4967E2950014380101D61C008BE
Drive Type: Data
The above shows the RAID type, the disk name assigned to the logical drive, and other information about the logical drive number 2.
12. Delete Logical Drive
To delete a logical drive with the number 2 use the below command.
=> ctrl slot=0 ld 2 delete
Warning: Deleting an array can cause other array letters to become renamed.
E.g. Deleting array A from arrays A,B,C will result in two remaining
arrays A,B ... not B,C
Warning: Deleting the specified device(s) will result in data being lost.
Continue? (y/n) y
13. Add New Physical Drive to Logical Volume
To add the new drives to existing logical volume, do the following.
=> ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7
In this example, we are adding two additional drives specified above to the logical volume number 2.
14. Add Spare Disks
To add the spare disks to arrays that can be used in case of disk failures on one of the logical drives, do the following:
=> ctrl slot=0 array all add spares=2I:1:6,2I:1:7
In this example, we are adding two spare disks to the array.
15. Enable or Disable Cache
The below commands disable or enable the drive write cache (dwc) for the controller in slot 0.
=> ctrl slot=0 modify dwc=disable
=> ctrl slot=0 modify dwc=enable
16. Erase Physical Drive
Execute the following command to erase a physical drive in array B on slot 0.
=> ctrl slot=0 pd 2I:1:6 modify erase
17. Blink Physical Disk LED
To blink the LED on the physical drives for the logical drive 2, do the following. This will make the LEDs blink on all the physical drives that belong to logical drive 2.
=> ctrl slot=0 ld 2 modify led=on
Once you know which drive belongs to logical drive 2, turn the LED blinking off as shown below.
=> ctrl slot=0 ld 2 modify led=off
</pre>
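The @pd all show status@ output from the cached article lends itself to a simple health check. The following is a sketch, assuming each drive line ends in its status as in that output; the function names are hypothetical.

```shell
# Read "hpacucli ctrl slot=N pd all show status" output on stdin and print
# the ID of every physical drive whose status is not OK.
failed_drives() {
    awk '$1 == "physicaldrive" && $NF != "OK" { print $2 }'
}

# Succeed if all drives are OK; otherwise report the bad ones and fail.
drives_ok() {
    bad="$(failed_drives)"
    if [ -n "$bad" ]; then
        echo "drives not OK: $bad" >&2
        return 1
    fi
}
```

Usage: @hpacucli ctrl slot=0 pd all show status | drives_ok@, for example from a monitoring script that alerts on a non-zero exit code.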
h2. Dell servers disk management
Listing all disks:
<pre>
megacli -PDList -aALL
</pre>
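The full @-PDList@ output is long. A short filter can reduce it to one line per disk with the enclosure, slot and state, which are the values needed for adding disks. This is a sketch that assumes the output contains the fields @Enclosure Device ID@, @Slot Number@ and @Firmware state@; the function name is hypothetical.

```shell
# Read "megacli -PDList -aALL" output on stdin and print one line per disk:
# "<enclosure>:<slot> <firmware state>".
pd_states() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc = $2 }
        /^Slot Number/         { slot = $2 }
        /^Firmware state/      { print enc ":" slot, $2 }
    '
}
```

Usage: @megacli -PDList -aALL | pd_states@.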
Adding disks:
<pre>
# Syntax; X is the adapter number (host is 0, md-array is 1)
megacli -CfgLdAdd -r0 [Enclosure Device ID:slot] -aX
# Sample call, if enclosure and slot are KNOWN (aka not N/A)
megacli -CfgLdAdd -r0 [32:0] -a0
# Sample call, if enclosure is N/A
megacli -CfgLdAdd -r0 [:0] -a0
</pre>
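The two sample calls differ only in whether the enclosure is known. A small helper can build the right form for both cases. This is a sketch with a hypothetical function name; it prints the command instead of running it, and quotes the bracket expression so that the shell does not treat it as a glob pattern.

```shell
# Print the megacli call that adds one disk as a single-disk RAID 0.
# Arguments: enclosure ("N/A" if unknown), slot, adapter (defaults to 0).
ld_add_command() {
    enclosure="$1"
    slot="$2"
    adapter="${3:-0}"
    # An unknown enclosure is written as an empty value: [:slot]
    if [ "$enclosure" = "N/A" ]; then
        enclosure=""
    fi
    echo "megacli -CfgLdAdd -r0 '[$enclosure:$slot]' -a$adapter"
}
```

For example, @ld_add_command 32 0@ covers the case with a known enclosure and @ld_add_command N/A 0@ the case without one.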
Remove cache of disks that are not in the server anymore:
<pre>
megacli -DiscardPreservedCache -Lall -aAll
</pre>
Remove foreign configurations on foreign disks:
<pre>
megacli -CfgForeign -Clear -aAll
</pre>
In many cases you need to do both:
<pre>
megacli -DiscardPreservedCache -Lall -aAll
megacli -CfgForeign -Clear -aAll
</pre>
h2. SEE ALSO
* [[Managing OpenWRT]]
* [[The_ungleich_ceph_handbook]]