
Properly preparing and working with ZFS under FreeBSD

Some time ago we needed to build a reasonably capacious array for storing operational incremental backups. We did not particularly want to spend money on it, but the space was needed. The solution turned out to be simple and fairly convenient. A lot of text follows.





We took a SuperChassis 825TQ-560LPB chassis as the base,



into which we put 8 Hitachi 1 TB disks and an LSI Logic SAS3081E-R SAS HBA controller. We planned the layout in advance: no hardware RAID, relying only on the capabilities of ZFS.


The first version of the server ran FreeBSD 7.2-STABLE for about three months, during which we caught the main problems: the limits on kernel memory (vm.kmem_size) and on the ARC cache (vfs.zfs.arc_max).

With too little memory allocated to the kernel the system panicked; with too small an arc_max the throughput dropped dramatically. The system booted from a 1 GB USB flash drive, which made it possible to quietly upgrade the OS by simply swapping the flash drive for one with the new build.
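These limits live in /boot/loader.conf. As a reference point, a minimal sketch for a machine with 2 GB of RAM (these are the values we eventually settled on; they are listed again at the end of the article):

 vm.kmem_size="1536M"
 vm.kmem_size_max="1536M"
 vfs.zfs.arc_max="384M"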



The array was initially built addressing the disks by device name.

Below are excerpts from dmesg and from console sessions.

 mpt0: <LSILogic SAS/SATA Adapter> port 0x2000-0x20ff mem 0xdc210000-0xdc213fff,0xdc200000-0xdc20ffff irq 16 at device 0.0 on pci3
 mpt0: [ITHREAD]
 mpt0: MPI Version=1.5.20.0
 mpt0: Capabilities: ( RAID-0 RAID-1E RAID-1 )
 mpt0: 0 Active Volumes (2 Max)
 mpt0: 0 Hidden Drive Members (14 Max)

 da0 at mpt0 bus 0 scbus0 target 0 lun 0
 da0: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da0: 300.000MB/s transfers
 da0: Command Queueing enabled
 da0: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da1 at mpt0 bus 0 scbus0 target 1 lun 0
 da1: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da1: 300.000MB/s transfers
 da1: Command Queueing enabled
 da1: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da2 at mpt0 bus 0 scbus0 target 2 lun 0
 da2: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da2: 300.000MB/s transfers
 da2: Command Queueing enabled
 da2: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da3 at mpt0 bus 0 scbus0 target 3 lun 0
 da3: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da3: 300.000MB/s transfers
 da3: Command Queueing enabled
 da3: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da4 at mpt0 bus 0 scbus0 target 4 lun 0
 da4: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da4: 300.000MB/s transfers
 da4: Command Queueing enabled
 da4: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da5 at mpt0 bus 0 scbus0 target 5 lun 0
 da5: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da5: 300.000MB/s transfers
 da5: Command Queueing enabled
 da5: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da6 at mpt0 bus 0 scbus0 target 7 lun 0
 da6: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da6: 300.000MB/s transfers
 da6: Command Queueing enabled
 da6: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da7 at mpt0 bus 0 scbus0 target 8 lun 0
 da7: <ATA Hitachi HDE72101 A31B> Fixed Direct Access SCSI-5 device
 da7: 300.000MB/s transfers
 da7: Command Queueing enabled
 da7: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)

 ugen3.2: <JetFlash> at usbus3
 umass0: <JetFlash Mass Storage Device, class 0/0, rev 2.00/1.00, addr 2> on usbus3
 umass0: SCSI over Bulk-Only; quirks = 0x0100
 umass0:2:0:-1: Attached to scbus2
 da8 at umass-sim0 bus 0 scbus2 target 0 lun 0
 da8: <JetFlash Transcend 1GB 8.07> Removable Direct Access SCSI-2 device
 da8: 40.000MB/s transfers
 da8: 963MB (1972224 512 byte sectors: 64H 32S/T 963C)
 Trying to mount root from ufs:/dev/ufs/FBSDUSB



In principle everything is more or less clear: there is a controller, there are LUNs, there are disks, and there are names for the devices attached to the controller.

 backupstorage# zpool create storage raidz da0 da1 da2 da3 da4 da5 da6 da7
 backupstorage# zpool status -v
   pool: storage
  state: ONLINE
  scrub: none requested
 config:

         NAME        STATE     READ WRITE CKSUM
         storage     ONLINE       0     0     0
           raidz1    ONLINE       0     0     0
             da0     ONLINE       0     0     0
             da1     ONLINE       0     0     0
             da2     ONLINE       0     0     0
             da3     ONLINE       0     0     0
             da4     ONLINE       0     0     0
             da5     ONLINE       0     0     0
             da6     ONLINE       0     0     0
             da7     ONLINE       0     0     0

 errors: No known data errors




Everything works until one of the disks dies. The seventh disk started failing, and the controller kicked it out of the array on a timeout. I was not even sure the disk itself had died, because the disks were brand new and had not been under any load.

Hitachi's DFT found no errors and said everything was fine, it just slowed down sometimes. MHDD found many sectors with access times around 500 ms; after the 50th such sector I simply exchanged the drive under warranty.



But the disk was really only half the trouble. I dealt with the disk, but the problems its failure exposed made me think.



Problem one: an array whose disk numbering is not bound to LUNs

ZFS on Solaris was designed around mapping disks to controller numbers and to LUNs on those controllers. With such a scheme you only need to replace the dead disk and resilver the array. In FreeBSD, as in Linux, device names are assigned sequentially and do not depend on the physical port on the controller. This is actually the biggest ambush.
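A quick way to see the current mapping of device names to controller targets and LUNs is camcontrol (a standard FreeBSD tool; the output naturally differs per machine, so it is not reproduced here):

 backupstorage# camcontrol devlist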

For example, let's pull the fifth disk out of the system, emulating a hardware failure.

 backupstorage# camcontrol rescan all
 Re-scan of bus 0 was successful
 Re-scan of bus 1 was successful
 Re-scan of bus 2 was successful

 mpt0: mpt_cam_event: 0x16
 mpt0: mpt_cam_event: 0x12
 mpt0: mpt_cam_event: 0x16
 mpt0: mpt_cam_event: 0x16
 mpt0: mpt_cam_event: 0x16
 (da4:mpt0:0:4:0): lost device
 (da4:mpt0:0:4:0): Synchronize cache failed, status == 0x4a, scsi status == 0x0
 (da4:mpt0:0:4:0): removing device entry

 backupstorage# zpool status -v
   pool: storage
  state: DEGRADED
  scrub: none requested
 config:

         NAME        STATE     READ WRITE CKSUM
         storage     DEGRADED     0     0     0
           raidz1    DEGRADED     0     0     0
             da0     ONLINE       0     0     0
             da1     ONLINE       0     0     0
             da2     ONLINE       0     0     0
             da3     ONLINE       0     0     0
             da4     REMOVED      0     0     0
             da5     ONLINE       0     0     0
             da6     ONLINE       0     0     0
             da7     ONLINE       0     0     0

Perfect. The controller saw that it had lost the disk and marked it as missing. You can insert a new disk and resilver the array.
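A hedged sketch of what that replacement would look like, assuming the fresh disk were to show up as, say, da8:

 backupstorage# zpool replace storage da4 da8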

BUT if you reboot the server, a very interesting picture awaits you (I will trim extra information from the dmesg quote so as not to flood the screen with text).

 da0 at mpt0 bus 0 scbus0 target 0 lun 0
 da0: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da0: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da1 at mpt0 bus 0 scbus0 target 1 lun 0
 da1: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da1: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da2 at mpt0 bus 0 scbus0 target 2 lun 0
 da2: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da2: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da3 at mpt0 bus 0 scbus0 target 3 lun 0
 da3: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da3: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da4 at mpt0 bus 0 scbus0 target 5 lun 0
 da4: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da4: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da5 at mpt0 bus 0 scbus0 target 7 lun 0
 da5: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da5: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 da6 at mpt0 bus 0 scbus0 target 8 lun 0
 da6: <ATA Hitachi HDE72101 A31B> Fixed Direct Access SCSI-5 device
 da6: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 SMP: AP CPU #1 Launched!
 GEOM: da0: the primary GPT table is corrupt or invalid.
 GEOM: da0: using the secondary instead -- recovery strongly advised.
 GEOM: da1: the primary GPT table is corrupt or invalid.
 GEOM: da1: using the secondary instead -- recovery strongly advised.
 GEOM: da2: the primary GPT table is corrupt or invalid.
 GEOM: da2: using the secondary instead -- recovery strongly advised.
 GEOM: da3: the primary GPT table is corrupt or invalid.
 GEOM: da3: using the secondary instead -- recovery strongly advised.
 GEOM: da4: the primary GPT table is corrupt or invalid.
 GEOM: da4: using the secondary instead -- recovery strongly advised.
 GEOM: da5: the primary GPT table is corrupt or invalid.
 GEOM: da5: using the secondary instead -- recovery strongly advised.
 GEOM: da6: the primary GPT table is corrupt or invalid.
 GEOM: da6: using the secondary instead -- recovery strongly advised.
 ugen3.2: <JetFlash> at usbus3
 umass0: <JetFlash Mass Storage Device, class 0/0, rev 2.00/1.00, addr 2> on usbus3
 umass0: SCSI over Bulk-Only; quirks = 0x0100
 umass0:2:0:-1: Attached to scbus2
 da7 at umass-sim0 bus 0 scbus2 target 0 lun 0
 da7: <JetFlash Transcend 1GB 8.07> Removable Direct Access SCSI-2 device
 da7: 40.000MB/s transfers
 da7: 963MB (1972224 512 byte sectors: 64H 32S/T 963C)
 Trying to mount root from ufs:/dev/ufs/FBSDUSB
 GEOM: da4: the primary GPT table is corrupt or invalid.
 GEOM: da4: using the secondary instead -- recovery strongly advised.
 GEOM: da5: the primary GPT table is corrupt or invalid.
 GEOM: da5: using the secondary instead -- recovery strongly advised.
 GEOM: da0: the primary GPT table is corrupt or invalid.
 GEOM: da0: using the secondary instead -- recovery strongly advised.
 GEOM: da1: the primary GPT table is corrupt or invalid.
 GEOM: da1: using the secondary instead -- recovery strongly advised.
 GEOM: da2: the primary GPT table is corrupt or invalid.
 GEOM: da2: using the secondary instead -- recovery strongly advised.
 GEOM: da3: the primary GPT table is corrupt or invalid.
 GEOM: da3: using the secondary instead -- recovery strongly advised.
 GEOM: da6: the primary GPT table is corrupt or invalid.
 GEOM: da6: using the secondary instead -- recovery strongly advised.
 GEOM: da4: the primary GPT table is corrupt or invalid.
 GEOM: da4: using the secondary instead -- recovery strongly advised.



What do we see?

Compared with the first dmesg dump there are now 8 devices instead of 9, and the flash drive has become da7 instead of da8. GEOM starts complaining that the GPT labels cannot be read and that everything is bad.

The zpool simply collapsed before our eyes: the disk numbering shifted up by one, the array lost its reference points, and the disk labels got confused.

 backupstorage# zpool status -v
   pool: storage
  state: UNAVAIL
 status: One or more devices could not be used because the label is missing
         or invalid.  There are insufficient replicas for the pool to continue
         functioning.
 action: Destroy and re-create the pool from a backup source.
    see: http://www.sun.com/msg/ZFS-8000-5E
  scrub: none requested
 config:

         NAME        STATE     READ WRITE CKSUM
         storage     UNAVAIL      0     0     0  insufficient replicas
           raidz1    UNAVAIL      0     0     0  insufficient replicas
             da0     ONLINE       0     0     0
             da1     ONLINE       0     0     0
             da2     ONLINE       0     0     0
             da3     ONLINE       0     0     0
             da4     FAULTED      0     0     0  corrupted data
             da5     FAULTED      0     0     0  corrupted data
             da6     FAULTED      0     0     0  corrupted data
             da6     ONLINE       0     0     0


Note that there are TWO da6 disks! This is because one physical disk is now detected as da6, while the disk that used to be da6 still carries an on-disk label saying da6.



Now let's try plugging the pulled disk back in.

 backupstorage# camcontrol rescan all
 Re-scan of bus 0 was successful
 Re-scan of bus 1 was successful
 Re-scan of bus 2 was successful

 da8 at mpt0 bus 0 scbus0 target 4 lun 0
 da8: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da8: 300.000MB/s transfers
 da8: Command Queueing enabled
 da8: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
 GEOM: da8: the primary GPT table is corrupt or invalid.
 GEOM: da8: using the secondary instead -- recovery strongly advised.



The disk was detected, but it came up as da8. Of course we could try to rebuild the array right now, but that would lead nowhere good, so we just reboot.

 backupstorage# zpool status -v
   pool: storage
  state: ONLINE
  scrub: none requested
 config:

         NAME        STATE     READ WRITE CKSUM
         storage     ONLINE       0     0     0
           raidz1    ONLINE       0     0     0
             da0     ONLINE       0     0     0
             da1     ONLINE       0     0     0
             da2     ONLINE       0     0     0
             da3     ONLINE       0     0     0
             da4     ONLINE       0     0     0
             da5     ONLINE       0     0     0
             da6     ONLINE       0     0     0
             da7     ONLINE       0     0     0

 errors: No known data errors




After the reboot ZFS calmly finds all the disks, grumbles a bit in the log, and keeps working.



 GEOM: da0: the primary GPT table is corrupt or invalid.
 GEOM: da0: using the secondary instead -- recovery strongly advised.
 GEOM: da1: the primary GPT table is corrupt or invalid.
 GEOM: da1: using the secondary instead -- recovery strongly advised.
 GEOM: da2: the primary GPT table is corrupt or invalid.
 GEOM: da2: using the secondary instead -- recovery strongly advised.
 GEOM: da3: the primary GPT table is corrupt or invalid.
 GEOM: da3: using the secondary instead -- recovery strongly advised.
 GEOM: da4: the primary GPT table is corrupt or invalid.
 GEOM: da4: using the secondary instead -- recovery strongly advised.
 GEOM: da5: the primary GPT table is corrupt or invalid.
 GEOM: da5: using the secondary instead -- recovery strongly advised.
 GEOM: da6: the primary GPT table is corrupt or invalid.
 GEOM: da6: using the secondary instead -- recovery strongly advised.
 GEOM: da7: the primary GPT table is corrupt or invalid.
 GEOM: da7: using the secondary instead -- recovery strongly advised.
 GEOM: da0: the primary GPT table is corrupt or invalid.
 GEOM: da0: using the secondary instead -- recovery strongly advised.
 GEOM: da2: the primary GPT table is corrupt or invalid.
 GEOM: da2: using the secondary instead -- recovery strongly advised.
 GEOM: da7: the primary GPT table is corrupt or invalid.
 GEOM: da7: using the secondary instead -- recovery strongly advised.



The resilver is almost painless, since we did not touch the data while the array was broken.

 GEOM: da0: the primary GPT table is corrupt or invalid.
 GEOM: da0: using the secondary instead -- recovery strongly advised.
 GEOM: da3: the primary GPT table is corrupt or invalid.
 GEOM: da3: using the secondary instead -- recovery strongly advised.
 ====================

 backupstorage# zpool status -v
   pool: storage
  state: ONLINE
  scrub: resilver completed after 0h0m with 0 errors on Wed Nov 25 11:06:15
 config:

         NAME        STATE     READ WRITE CKSUM
         storage     ONLINE       0     0     0
           raidz1    ONLINE       0     0     0
             da0     ONLINE       0     0     0
             da1     ONLINE       0     0     0
             da2     ONLINE       0     0     0  512 resilvered
             da3     ONLINE       0     0     0  512 resilvered
             da4     ONLINE       0     0     0
             da5     ONLINE       0     0     0
             da6     ONLINE       0     0     0  512 resilvered
             da7     ONLINE       0     0     0  512 resilvered

 errors: No known data errors


In general, we became convinced that this scheme is not very reliable with automatic disk numbering on the controller. That said, as long as the number of disks does not change, they can be rearranged in any order and the array will assemble properly.



Problem two: GEOM labels (glabel) on disks

To try to solve the disk naming problem, we decided to label the disks with glabel and build the array from the labels. When you create a label on a disk, it is written to the end of the disk; when the disk is detected, the system reads the label and creates a virtual device /dev/label/<labelname>.
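Creating the labels themselves is one glabel call per disk (the same command used later in this article for da8); a sketch for the whole set, assuming the disks are currently da0 through da7:

 backupstorage# glabel label disk0 da0
 backupstorage# glabel label disk1 da1
 backupstorage# glabel label disk2 da2
 backupstorage# glabel label disk3 da3
 backupstorage# glabel label disk4 da4
 backupstorage# glabel label disk5 da5
 backupstorage# glabel label disk6 da6
 backupstorage# glabel label disk7 da7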

 backupstorage# zpool create storage raidz label/disk0 label/disk1 label/disk2 label/disk3 label/disk4 label/disk5 label/disk6 label/disk7
 backupstorage# zpool status -v
   pool: storage
  state: ONLINE
  scrub: none requested
 config:

         NAME             STATE     READ WRITE CKSUM
         storage          ONLINE       0     0     0
           raidz1         ONLINE       0     0     0
             label/disk0  ONLINE       0     0     0
             label/disk1  ONLINE       0     0     0
             label/disk2  ONLINE       0     0     0
             label/disk3  ONLINE       0     0     0
             label/disk4  ONLINE       0     0     0
             label/disk5  ONLINE       0     0     0
             label/disk6  ONLINE       0     0     0
             label/disk7  ONLINE       0     0     0

 errors: No known data errors
 backupstorage# ls /dev/label/disk*
 /dev/label/disk0    /dev/label/disk2    /dev/label/disk4    /dev/label/disk6
 /dev/label/disk1    /dev/label/disk3    /dev/label/disk5    /dev/label/disk7


The array lives happily; now we pull out disk1 and set it aside.



Reboot the server and look in the log.

 GEOM: da0: corrupt or invalid GPT detected.
 GEOM: da0: GPT rejected -- may not be recoverable.
 GEOM: da1: corrupt or invalid GPT detected.
 GEOM: da1: GPT rejected -- may not be recoverable.
 GEOM: da2: corrupt or invalid GPT detected.
 GEOM: da2: GPT rejected -- may not be recoverable.
 GEOM: da3: corrupt or invalid GPT detected.
 GEOM: da3: GPT rejected -- may not be recoverable.
 GEOM: da4: corrupt or invalid GPT detected.
 GEOM: da4: GPT rejected -- may not be recoverable.
 GEOM: da5: corrupt or invalid GPT detected.
 GEOM: da5: GPT rejected -- may not be recoverable.
 GEOM: da6: corrupt or invalid GPT detected.
 GEOM: da6: GPT rejected -- may not be recoverable.
 GEOM: label/disk0: corrupt or invalid GPT detected.
 GEOM: label/disk0: GPT rejected -- may not be recoverable.
 GEOM: label/disk2: corrupt or invalid GPT detected.
 GEOM: label/disk2: GPT rejected -- may not be recoverable.
 GEOM: label/disk3: corrupt or invalid GPT detected.
 GEOM: label/disk3: GPT rejected -- may not be recoverable.
 GEOM: label/disk4: corrupt or invalid GPT detected.
 GEOM: label/disk4: GPT rejected -- may not be recoverable.
 GEOM: label/disk5: corrupt or invalid GPT detected.
 GEOM: label/disk5: GPT rejected -- may not be recoverable.
 GEOM: label/disk6: corrupt or invalid GPT detected.
 GEOM: label/disk6: GPT rejected -- may not be recoverable.
 GEOM: label/disk7: corrupt or invalid GPT detected.
 GEOM: label/disk7: GPT rejected -- may not be recoverable.
 da7 at umass-sim0 bus 0 scbus2 target 0 lun 0
 da7: <JetFlash Transcend 1GB 8.07> Removable Direct Access SCSI-2 device
 da7: 40.000MB/s transfers
 da7: 963MB (1972224 512 byte sectors: 64H 32S/T 963C)
 Trying to mount root from ufs:/dev/ufs/FBSDUSB
 GEOM: da0: corrupt or invalid GPT detected.
 GEOM: da0: GPT rejected -- may not be recoverable.
 GEOM: da1: corrupt or invalid GPT detected.
 GEOM: da1: GPT rejected -- may not be recoverable.
 GEOM: label/disk2: corrupt or invalid GPT detected.
 GEOM: label/disk2: GPT rejected -- may not be recoverable.
 GEOM: da2: corrupt or invalid GPT detected.
 GEOM: da2: GPT rejected -- may not be recoverable.
 GEOM: label/disk3: corrupt or invalid GPT detected.
 GEOM: label/disk3: GPT rejected -- may not be recoverable.
 GEOM: da3: corrupt or invalid GPT detected.
 GEOM: da3: GPT rejected -- may not be recoverable.
 GEOM: label/disk4: corrupt or invalid GPT detected.
 GEOM: label/disk4: GPT rejected -- may not be recoverable.
 GEOM: da4: corrupt or invalid GPT detected.
 GEOM: da4: GPT rejected -- may not be recoverable.
 GEOM: label/disk5: corrupt or invalid GPT detected.
 GEOM: label/disk5: GPT rejected -- may not be recoverable.
 GEOM: da5: corrupt or invalid GPT detected.
 GEOM: da5: GPT rejected -- may not be recoverable.


Yet despite all this cursing, the array turns out to be alive, with an intact structure and a DEGRADED status, which is much better than in the first case.

 backupstorage# zpool status
   pool: storage
  state: DEGRADED
 status: One or more devices has experienced an unrecoverable error.  An
         attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
         using 'zpool clear' or replace the device with 'zpool replace'.
    see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: none requested
 config:

         NAME             STATE     READ WRITE CKSUM
         storage          DEGRADED     0     0     0
           raidz1         DEGRADED     0     0     0
             label/disk0  ONLINE       0     0     0
             label/disk1  REMOVED      0    94     0
             label/disk2  ONLINE       0     0     0
             label/disk3  ONLINE       0     0     0
             label/disk4  ONLINE       0     0     0
             label/disk5  ONLINE       0     0     0
             label/disk6  ONLINE       0     0     0
             label/disk7  ONLINE       0     0     0

 errors: No known data errors
 ====================


Now we wipe the disk we pulled out earlier and plug it back into the system.

The glabel label is written at the very end of the disk, so the usual zeroing of the beginning of the disk will not help us; you have to zero either the end of the disk or the whole disk. It all depends on how much time you have.
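A hedged sketch of zeroing only the tail of the disk, where glabel keeps its metadata (the very last sector). The device name and sector count here are illustrative, with the count taken from the dmesg output above (1953525168 sectors of 512 bytes); triple-check the target device before pointing dd at it:

 # wipe the last 2048 sectors: 1953525168 - 2048 = 1953523120
 backupstorage# dd if=/dev/zero of=/dev/da8 bs=512 seek=1953523120 count=2048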

 mpt0: mpt_cam_event: 0x16
 mpt0: mpt_cam_event: 0x12
 mpt0: mpt_cam_event: 0x16
 da8 at mpt0 bus 0 scbus0 target 1 lun 0
 da8: <ATA Hitachi HDT72101 A31B> Fixed Direct Access SCSI-5 device
 da8: 300.000MB/s transfers
 da8: Command Queueing enabled
 da8: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)


We give the disk the label “disk8” and try to replace the “dead” disk with the new one.

 backupstorage# ls /dev/label/
 disk0    disk2    disk3    disk4    disk5    disk6    disk7    disk8

 backupstorage# zpool replace storage label/disk1 label/disk8
 cannot replace label/disk1 with label/disk8: label/disk8 is busy
 backupstorage# zpool replace -f storage label/disk1 label/disk8
 cannot replace label/disk1 with label/disk8: label/disk8 is busy


The system refuses to replace the disk for us, claiming the device is busy. The “dead” disk can only be replaced via the direct device name. Why it works this way I never fully understood. So we abandon the idea of replacing it under a new label and simply put the label “disk1” on the new disk.

 backupstorage# glabel label disk1 da8


Now we need to tell ZFS that the disk has returned to the system.

 backupstorage# zpool online storage label/disk1
 backupstorage# zpool status
   pool: storage
  state: ONLINE
 status: One or more devices has experienced an unrecoverable error.  An
         attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
         using 'zpool clear' or replace the device with 'zpool replace'.
    see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: resilver completed after 0h0m with 0 errors on Wed Nov 25 18:29:17 2009
 config:

         NAME             STATE     READ WRITE CKSUM
         storage          ONLINE       0     0     0
           raidz1         ONLINE       0     0     0
             label/disk0  ONLINE       0     0     0  6.50K resilvered
             label/disk1  ONLINE       0    94     1  10.5K resilvered
             label/disk2  ONLINE       0     0     0  6K resilvered
             label/disk3  ONLINE       0     0     0  3.50K resilvered
             label/disk4  ONLINE       0     0     0  6.50K resilvered
             label/disk5  ONLINE       0     0     0  6.50K resilvered
             label/disk6  ONLINE       0     0     0  5.50K resilvered
             label/disk7  ONLINE       0     0     0  3K resilvered

 errors: No known data errors


Everything falls into place, and after resilvering you can reset the pool status to normal.
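Resetting the error counters is the one-liner that the status output itself suggests:

 backupstorage# zpool clear storage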

And here the real problem begins. Since the glabel label is written at the end of the disk, and ZFS knows nothing about it being there, ZFS will overwrite the label once the disk fills up. The disk in the array then reverts to its physical device name, and we are back at problem one.



Solution to the problem

The solution turned out to be somewhat banal and simple. FreeBSD has long been able to create GPT partitions on large disks. In FreeBSD 7.2 GPT partition labels did not work, and partitions were accessed by the direct device name /dev/da0p1 (example for the first GPT partition).

In FreeBSD 8.0 the GPT label system was reworked: GPT partitions can now be named and addressed as virtual devices via /dev/gpt/<partitionlabel>.



All we really need to do is create named partitions on the disks and build the array from them. How to do this is described in the ZFS Boot HOWTO, which is easily googled.

 backupstorage# gpart create -s GPT da0
 backupstorage# gpart add -b 34 -s 1953525101 -i 1 -t freebsd-zfs -l disk0 da0
 backupstorage# gpart show
 =>        34  1953525101  da0  GPT  (932G)
           34  1953525101    1  freebsd-zfs  (932G)
 backupstorage# gpart show -l
 =>        34  1953525101  da0  GPT  (932G)
           34  1953525101    1  disk0  (932G)
 backupstorage# ls /dev/gpt
 disk0


After creating the GPT scheme, gpart show displays where the data area of the disk begins and ends. We then create a partition over it and give it the label “disk0”.

We perform this operation for every disk in the system and assemble the array from the resulting partitions.
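A hedged sketch of the full sequence in sh syntax, assuming the disks are da0 through da7 and the same partition size as in the gpart output above:

 backupstorage# for i in 0 1 2 3 4 5 6 7; do
 >   gpart create -s GPT da$i
 >   gpart add -b 34 -s 1953525101 -i 1 -t freebsd-zfs -l disk$i da$i
 > done
 backupstorage# zpool create storage raidz gpt/disk0 gpt/disk1 gpt/disk2 gpt/disk3 gpt/disk4 gpt/disk5 gpt/disk6 gpt/disk7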

 backupstorage# zpool status -v
   pool: storage
  state: ONLINE
  scrub: none requested
 config:

         NAME           STATE     READ WRITE CKSUM
         storage        ONLINE       0     0     0
           raidz1       ONLINE       0     0     0
             gpt/disk0  ONLINE       0     0     0
             gpt/disk1  ONLINE       0     0     0
             gpt/disk2  ONLINE       0     0     0
             gpt/disk3  ONLINE       0     0     0
             gpt/disk4  ONLINE       0     0     0
             gpt/disk5  ONLINE       0     0     0
             gpt/disk6  ONLINE       0     0     0
             gpt/disk7  ONLINE       0     0     0

 errors: No known data errors


The server now calmly survives any reboot and any shuffling of the disks, as well as replacing a disk under the same label. Over the network this array delivers 70-80 megabytes per second; local write speed, depending on buffer occupancy, reaches 200 megabytes per second.



PS: while using GPT labels I ran into a strange glitch where the system did not see a new label on a disk, but it later went away by itself.

PPS: for enthusiasts who try to run FreeBSD 8.0 from a flash drive (if this has not been fixed yet), this dirty hack comes in handy.

 Index: sys/kern/vfs_mount.c
 ===================================================================
 RCS file: /home/ncvs/src/sys/kern/vfs_mount.c,v
 retrieving revision 1.308
 diff -u -r1.308 vfs_mount.c
 --- sys/kern/vfs_mount.c	5 Jun 2009 14:55:22 -0000	1.308
 +++ sys/kern/vfs_mount.c	29 Sep 2009 17:08:25 -0000
 @@ -1645,6 +1645,9 @@
 	options = NULL;
 
 +	/* NASTY HACK: wait for USB sticks to appear */
 +	pause("usbhack", hz * 10);
 +
 	root_mount_prepare();
 	mount_zone = uma_zcreate("Mountpoints", sizeof(struct mount),



With the new USB stack, USB devices are not always detected in time during boot, and mounting the system partitions fails.

This hack adds a 10-second pause so the USB drive has time to become ready.



PPPS: if you have questions, ask.



loader.conf settings for a machine with 2 gigabytes of memory:

 vm.kmem_size="1536M"
 vm.kmem_size_max="1536M"
 vfs.zfs.arc_max="384M"





© Aborche 2009

Source: https://habr.com/ru/post/77722/


