Project

General

Profile

The ungleich hardware maintenance guide » History » Revision 23

Revision 22 (Nico Schottelius, 02/07/2024 12:00 PM) → Revision 23/29 (Nico Schottelius, 02/07/2024 12:01 PM)

{{toc}} 

 h1. The ungleich hardware maintenance guide 

 This guide describes common operations on hardware we use. 

 h2. Using the ungleich-hardware container in kubernetes and docker 

 To manage hardware on server1 in kubernetes, you can use: 

 <pre> 
 apiVersion: v1 
 kind: Pod 
 metadata: 
   name: ungleich-hardware 
 spec: 
   containers: 
   - name: ungleich-hardware 
     image: harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3 
     args: 
     - sleep 
     - "1000000" 
     volumeMounts: 
       - mountPath: /dev 
         name: dev 
     securityContext: 
       privileged: true 
   nodeSelector: 
     kubernetes.io/hostname: "server1" 

   volumes: 
     - name: dev 
       hostPath: 
         path: /dev 

 </pre> 

 To use it wit docker: 

 <pre> 
 docker run -v /dev:/dev --privileged -ti harbor.ungleich.svc.p10.k8s.ooo/ungleich-public/ungleich-hardware:0.0.3 
 </pre> 

 h2. APU Bios Update 

 * Download the correct bios from https://pcengines.github.io/ 
 ** Check whether it's apu1/2/3/4 before downloading 
 * Install flashrom 
 * "Flash bios using flashrom":https://github.com/pcengines/apu2-documentation/blob/master/docs/firmware_flashing.md 
 ** @flashrom -w THEROMFILE -p internal@ 

 h2. APU Serial and bootloader configuration 

 * Ensure that the bootloader has "console=ttyS0,115200" configured 
 * Ensure that there is a getty running on serial 
 * Use grub-bios as the bootloader 
 ** Install using @grub-install /dev/sda@ 

 h2. Updating the Perc H800 SAS controller 

 * @wget 'https://dl.dell.com/FOLDER03292738M/3/SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN?uid=4b8a2506-f4d4-46a9-ab19-3c2a5008a782&fn=SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN' -O    SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN@ 
 * chmod u+x SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN 
 * ./SAS-RAID_Firmware_XKF5X_LN_12.10.7-0001_A13.BIN 

 h2. HP servers disk management 

 * See also https://www.thegeekstuff.com/2014/07/hpacucli-examples/ 

 Required kernel modules: 

 <pre> 
 sg 
 cciss 
 </pre> 

 Show all drives/controller overview: 

 <pre> 
 hpacucli ctrl all show config 

 hpacucli ctrl slot=0 pd all show 
 </pre> 

 Add a disk as raid0: 

 <pre> 
 hpacucli ctrl slot=0 create type=ld drives=1I:1:1 raid=0 
 </pre> 

 Deleting a logical drive: 

 <pre> 
 hpacucli ctrl slot=0 ld X 2 delete 
 </pre> 

 Copy from https://www.thegeekstuff.com/2014/07/hpacucli-examples/ (to cache it mainly): 

 <pre> 
 1. Two ways to execute the command 

 When you type the command hpacucli, it will display a “=>” prompt as shown below where you can enter all the hpacucli commands explained in the article. 

 # hpacucli 
 HP Array Configuration Utility CLI 9.20.9.0 
 Detecting Controllers...Done. 
 Type "help" for a list of supported commands. 
 Type "exit" to close the console. 
 => rescan 

 Or, if you don’t want to get to the hpacucli prompt, you can just enter the following directly in the Linux prompt. The following is exactly same as the above. 

 # hpacucli rescan 

 2. Display Controller and Disk Status 

 To display the detailed status of the controller and the disk status, execute the following command. 

 # hpacucli 
 => ctrl all show config 

 Smart Array P410i in Slot 0 (Embedded)      (sn: 50014380101D61C0) 

    array A (SAS, Unused Space: 0    MB) 

       logicaldrive 1 (136.7 GB, RAID 1, OK) 

       physicaldrive 1I:1:1 (port 1I:box 1:bay 1, SAS, 146 GB, OK) 
       physicaldrive 1I:1:2 (port 1I:box 1:bay 2, SAS, 146 GB, OK) 

    unassigned 

       physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 300 GB, OK) 
       physicaldrive 1I:1:4 (port 1I:box 1:bay 4, SAS, 300 GB, OK) 
       physicaldrive 2I:1:6 (port 2I:box 1:bay 6, SAS, 300 GB, OK) 
       physicaldrive 2I:1:7 (port 2I:box 1:bay 7, SAS, 300 GB, OK) 
       physicaldrive 2I:1:8 (port 2I:box 1:bay 8, SAS, 300 GB, OK) 

    SEP (Vendor ID PMCSIERA, Model    SRC 8x6G) 250 (WWID: 50014380101D61CF) 

 In this example, as shown in the above output, we have total 7 physical drives. The first RAID group RAID 1 contains 2 physical drives and the remaining physical drives are not assigned to any of the logical drives. 
 3. View Controller Status 

 To display the status of just the controller, do the following. In this example, the controller is working perfectly without any issues. 

 => ctrl all show status 

 Smart Array P410i in Slot 0 (Embedded) 
    Controller Status: OK 
    Cache Status: OK 

 4. View Drive Status 

 To display the status of the physical drive, do the following. In this example, we have two 146GB physical drives, and 5 300GB physical drives, and all are in perfect condition. 

 => ctrl slot=0 pd all show status 

    physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 146 GB): OK 
    physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 146 GB): OK 
    physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 300 GB): OK 
    physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 300 GB): OK 
    physicaldrive 2I:1:6 (port 2I:box 1:bay 6, 300 GB): OK 
    physicaldrive 2I:1:7 (port 2I:box 1:bay 7, 300 GB): OK 
    physicaldrive 2I:1:8 (port 2I:box 1:bay 8, 300 GB): OK 

 5. View Individual Drive Status 

 To display the detail status of a specific physical drive, do the following. 

 In this example, we like to know the status of “pd” (physical disk) in slot 0. The specific disk is “2I:1:6”, which we figured it out from the output of the previous command. 

 As shown in the output below, this displays the Serial Number, Make, Model, Size and Fireware version of this specific disk. This can be very helpful during troubleshooting. 

 => ctrl slot=0 pd 2I:1:6 show detail 

 Smart Array P410i in Slot 0 (Embedded) 

    unassigned 

       physicaldrive 2I:1:6 
          Port: 2I 
          Box: 1 
          Bay: 6 
          Status: OK 
          Drive Type: Unassigned Drive 
          Interface Type: SAS 
          Size: 300 GB 
          Rotational Speed: 10000 
          Firmware Revision: HPD4 
          Serial Number: EB01PC416C4C1214 
          Model: HP        EG0300FBDSP 
          Current Temperature (C): 38 
          Maximum Temperature (C): 56 
          PHY Count: 2 
          PHY Transfer Rate: 6.0Gbps, Unknown 

 6. View All Logical Drives 

 The following command will display all available logical drives on the system. As shown in the output below, we currently have only one logical drive in RAID 1 with total size of around 136GB. 

 => ctrl slot=0 ld all show 

 Smart Array P410i in Slot 0 (Embedded) 

    array A 

       logicaldrive 1 (136.7 GB, RAID 1, OK) 

 7. Create New RAID 0 Logical Drive 

 Execute the following command to create a new logical drive using RAID 0 option. 

 => ctrl slot=0 create type=ld drives=1I:1:3 raid=0 

 The above command creates a logical drive with the physical drives 1I:1:3 on RAID 0 configuration in slot 0. 
 8. Create New RAID 1 Logical Drive 

 Execute the following command to create a new logical drive using RAID 1 option. 

 => ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4 raid=1 

 The above command creates a logical drive with the two physical drives 1I:1:3 and 1I:1:4 on RAID 1 configuration in slot 0. 
 9. Create New RAID 5 Logical Drive 

 Execute the following command to create a new logical drive using RAID 5 option. 

 => ctrl slot=0 create type=ld drives=1I:1:3,1I:1:4,2I:1:6,2I:1:7,2I:1:8 raid=5 

 The above command creates a logical drive with the five physical drives on RAID 5 configuration in slot 0. 

 Once these logical drives are created, you should see the disks from the fdisk and you can format it from there and start using it. 

 After you create a logical drive, execute the following command to verify that the LD got created. In this example, it shows that the RAID 5 logical drive got created successfully. 

 => ctrl slot=0 ld all show status 

    logicaldrive 1 (136.7 GB, RAID 1): OK 
    logicaldrive 2 (1.1 TB, RAID 5): OK 

 10. Rescan for New Devices 

 If you’ve added new physical hard disk, they won’t automatically show-up immediately. You have to scan for new devices as shown below. 

 => rescan 

 11. View Detailed Logical Drive Status 

 To display the detailed status of the logical drive, do the following: 

 => ctrl slot=0 ld 2 show 

 Smart Array P410i in Slot 0 (Embedded) 

    array B 

       Logical Drive: 2 
          Size: 1.1 TB 
          Fault Tolerance: RAID 5 
          Heads: 255 
          Sectors Per Track: 32 
          Cylinders: 65535 
          Strip Size: 256 KB 
          Full Stripe Size: 1024 KB 
          Status: OK 
          Caching:    Enabled 
          Parity Initialization Status: In Progress 
          Unique Identifier: 600508B1001031303144363143301000 
          Disk Name: /dev/cciss/c0d1 
          Mount Points: None 
          Logical Drive Label: A4967E2950014380101D61C008BE 
          Drive Type: Data 

 The above shows the RAID type, the disk name assigned to the logical drive, and other information about the logical drive number 2. 
 12. Delete Logical Drive 

 To delete a logical drive with the number 2 use the below command. 

 => ctrl slot=0 ld 2 delete 

 Warning: Deleting an array can cause other array letters to become renamed. 
          E.g. Deleting array A from arrays A,B,C will result in two remaining 
          arrays A,B ... not B,C 

 Warning: Deleting the specified device(s) will result in data being lost. 
          Continue? (y/n) y 

 13. Add New Physical Drive to Logical Volume 

 To add the new drives to existing logical volume, do the following. 

 => ctrl slot=0 ld 2 add drives=2I:1:6,2I:1:7 

 In this example, we are adding two additional drives specified above to the logical volume number 2. 
 14. Add Spare Disks 

 To add the spare disks to arrays that can be used in case of disk failures on one of the logical drives, do the following: 

 => ctrl slot=0 array all add spares=2I:1:6,2I:1:7 

 In this example, we are adding two spare disks to the array. 
 15. Enable or Disable Cache 

 The below commands enable or disable cache for the entire slot. 

 => ctrl slot=0 modify dwc=disable 

 => ctrl slot=0 modify dwc=enable 

 16. Erase Physical Drive 

 Execute the following command to erase a physical drive in array B on slot 0. 

 => ctrl slot=0 pd 2I:1:6 modify erase 

 17. Blink Physical Disk LED 

 To blink the LED on the physical drives for the logical drive 2, do the following. This will make the LEDs blink on all the physical drives that belongs to logical drive 2. 

 => ctrl slot=0 ld 2 modify led=on 

 Once you know which drive belongs to logical drive 2, turn the LED blinking off as shown below. 

 => ctrl slot=0 ld 2 modify led=off 
 </pre> 

 h2. Dell servers disk management (megacli) 

 Listing all disks: 

 <pre> 
 megacli -PDList -aALL 
 </pre> 

 Adding disks: 

 <pre> 
 megacli -CfgLdAdd -r0 [Enclosure Device ID:slot] -aX (X : host is 0. md-array is 1) 

 # Sample call, if enclosure and slot are KNOWN (aka not N/A) 
 megacli -CfgLdAdd -r0 [32:0] -a0 

 # Sample call, if enclosure is N/A 
 megacli -CfgLdAdd -r0 [:0] -a0 
 </pre> 

 Remove cache of disks that are not in the server anymore: 

 <pre> 
 megacli -DiscardPreservedCache -Lall -aAll 
 </pre> 

 Remove foreign configurations on foreign disks 

 <pre> 
 megacli -CfgForeign -Clear -aAll 
 </pre> 

 Do both in many cases: 

 <pre> 
 megacli -DiscardPreservedCache -Lall -aAll 
 megacli -CfgForeign -Clear -aAll 
 </pre> 

 Growing a raid6 

 <pre> 
 megacli -ldrecon    -Start -r6 -Add -PhysDrv[12:4] -l0 -a0 
 </pre> 

 Deleting a logical drive 

 <pre> 
 root@2157f4626763:/# megacli -CfgLdDel -L0 -a0 
                                     
 Adapter 0: Deleted Virtual Drive-0(target id-0) 

 Exit Code: 0x00 
 root@2157f4626763:/#  
 </pre> 

 h2. SEE ALSO 

 * [[Managing OpenWRT]] 
 * [[The_ungleich_ceph_handbook]]