
ZFS on CentOS: working on the mistakes

I have been using ZFS for quite some time (since the OpenSolaris days) and am very pleased with this file system on Linux, despite its "unorthodox" license, so naturally I read the recent article about installing this FS on CentOS.



Having noticed a few mistakes in that guide to the accomplishment of the feat, I decided not to pass it by, as I usually would. Unfortunately, I cannot reply in the comments there, for a completely obvious reason.



The corrections, along with some useful tips, are below.



1. Installing ZFS on top of an mdadm array is a needless waste of CPU resources and extra disk I/O. ZFS itself will happily build RAID-0/1/5/6 (raidz2)/raidz3.


2. With GRUB2 there is absolutely no point in a separate partition for /boot. GRUB2, with the fix mentioned in the article, boots an OS located on ZFS perfectly well and has no problem reading the contents of the /boot directory on the root file system.
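A quick sanity check that the patched GRUB2 really can read the pool (assuming grub2-probe is installed) is to ask it which file-system driver it would use for /boot; it should answer zfs:

 $ grub2-probe --target=fs /boot 
 zfs 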



How it is done



Each disk (suppose there are two) needs only two partitions: one for GRUB2 and one for the pool. An example partition layout (GPT without UEFI; if you use UEFI, set the partition type accordingly):



Disk 1:



/dev/sda1    2048     206847     204800   100M  BIOS boot
/dev/sda2  206848 1953525134 1953318287 931,4G  FreeBSD ZFS


Disk 2:



/dev/sdb1    2048     206847     204800   100M  BIOS boot
/dev/sdb2  206848 1953525134 1953318287 931,4G  FreeBSD ZFS
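One possible way to produce such a layout is sgdisk from the gdisk package (a sketch; the type codes EF02 for BIOS boot and A504 for FreeBSD ZFS, and the sector numbers, match the example above and should be adjusted to your disks):

 $ sgdisk -n 1:2048:206847 -t 1:EF02 -n 2:206848:1953525134 -t 2:A504 /dev/sda 
 $ sgdisk -n 1:2048:206847 -t 1:EF02 -n 2:206848:1953525134 -t 2:A504 /dev/sdb 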


Creating a pool (mirror):



 $ zpool create -o ashift=12 diskpool mirror /dev/sda2 /dev/sdb2 


Here diskpool is the name of the pool; naturally, the name can be chosen to match your taste or pool-naming scheme.



Check the newly created pool like this:



 $ zpool status diskpool 


Among other things, we will see:



  NAME          STATE     READ WRITE CKSUM
  diskpool      ONLINE       0     0     0
    mirror-0    ONLINE       0     0     0
      sda2      ONLINE       0     0     0
      sdb2      ONLINE       0     0     0


If there are more drives (for example, four) and performance matters to you, you can create a RAID-10:



 $ zpool create -o ashift=12 diskpool mirror /dev/sda2 /dev/sdb2 mirror /dev/sdc2 /dev/sdd2 


When checking the status of the pool we will see:



  NAME          STATE     READ WRITE CKSUM
  diskpool      ONLINE       0     0     0
    mirror-0    ONLINE       0     0     0
      sda2      ONLINE       0     0     0
      sdb2      ONLINE       0     0     0
    mirror-1    ONLINE       0     0     0
      sdc2      ONLINE       0     0     0
      sdd2      ONLINE       0     0     0


If you have many disks and need a more capacious RAID:



Note



raidz is the equivalent of RAID-5: a single P parity, computed with XOR.



raidz2 is the equivalent of RAID-6: one P parity computed with XOR, plus one Q parity computed with a Reed-Solomon code over GF(2^8). The field is not a proper GF (it is optimized for performance and therefore contains 0), so strictly speaking it has no right to be called RAID-6, but raidz2 computes the Q parity about 8 times faster than RAID-6.



raidz3 has triple parity (I have never looked into the source code to check how the third parity sum is computed).
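To make the parity cost concrete: a raidz vdev of N disks holds roughly the data of N-1 of them, raidz2 holds N-2, and raidz3 holds N-3. For example, six 1 TB disks give about 5 TB of usable space in raidz, 4 TB in raidz2 and 3 TB in raidz3 (before metadata and padding overhead).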



Create a RAID5 on 4 disks:



 $ zpool create -o ashift=12 diskpool raidz /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 


When checking the pool we will see:



  NAME          STATE     READ WRITE CKSUM
  diskpool      ONLINE       0     0     0
    raidz1-0    ONLINE       0     0     0
      sda2      ONLINE       0     0     0
      sdb2      ONLINE       0     0     0
      sdc2      ONLINE       0     0     0
      sdd2      ONLINE       0     0     0


Let's create RAID6 on 5 disks:



 $ zpool create -o ashift=12 diskpool raidz2 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2 


Create a RAID-50 on 12 disks:



 $ zpool create -o ashift=12 diskpool raidz /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2 /dev/sdf2 raidz /dev/sdh2 /dev/sdi2 /dev/sdj2 /dev/sdk2 /dev/sdl2 /dev/sdm2 


The principle is obvious: any combination is possible. You can even mix different RAID types in the same pool, although there is little point in doing so, and there are limitations:



1. The L2ARC SSD cache is not available for pools that contain a mirror.

2. Mixing different RAID types in one pool guarantees unpredictable pool performance (unpredictable in the worst sense, like a sandwich landing butter side down).
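On the subject of point 2: the safe way to grow a pool later is to extend it with another vdev of the same type as the existing ones. A sketch, assuming the RAID-10 pool from above and two freshly partitioned disks (the device names are examples):

 $ zpool add diskpool mirror /dev/sde2 /dev/sdf2 
 $ zpool status diskpool 

If you try to add a vdev of a different type, zpool refuses with a mismatched-replication-level warning unless you force it with -f.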



Since I mentioned the L2ARC SSD cache:



 # /dev/sdf is the SSD
 $ zpool create -o ashift=12 diskpool raidz1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2 cache /dev/sdf 


When checking the pool we will see:



  NAME          STATE     READ WRITE CKSUM
  diskpool      ONLINE       0     0     0
    raidz1-0    ONLINE       0     0     0
      sda2      ONLINE       0     0     0
      sdb2      ONLINE       0     0     0
      sdc2      ONLINE       0     0     0
      sdd2      ONLINE       0     0     0
      sde2      ONLINE       0     0     0
  cache
    sdf         ONLINE       0     0     0
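A cache device does not have to be declared at pool-creation time; it can be attached to and detached from an existing pool at any moment:

 $ zpool add diskpool cache /dev/sdf 
 $ zpool remove diskpool sdf 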


When creating a pool, you can specify additional options like this:



 $ zpool create -o ashift=12 -o listsnapshots=on diskpool raidz /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 


All options except ashift and some feature@ flags can be changed after the pool has been created.
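For example, the listsnapshots property set above can be toggled later, and the current values (and sources) of all pool properties inspected, like this:

 $ zpool set listsnapshots=off diskpool 
 $ zpool get all diskpool 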



Attention! If you are using ZFS version 0.6.5 or higher, you must disable the following features when creating the pool:



feature@spacemap_histogram

feature@enabled_txg

feature@hole_birth

feature@extensible_dataset

feature@embedded_data

feature@bookmarks

feature@filesystem_limits

feature@large_blocks





If they are enabled, GRUB2 will not be able to boot from such a file system. These are new goodies that GRUB2 does not yet know about.
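You can check the state of every feature on an existing pool (disabled / enabled / active) like this:

 $ zpool get all diskpool | grep feature@ 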



So, let's create a pool with all the parameters we need:



 $ zpool create -o ashift=12 -o listsnapshots=on \
     -o feature@spacemap_histogram=disabled \
     -o feature@enabled_txg=disabled \
     -o feature@hole_birth=disabled \
     -o feature@extensible_dataset=disabled \
     -o feature@embedded_data=disabled \
     -o feature@bookmarks=disabled \
     -o feature@filesystem_limits=disabled \
     -o feature@large_blocks=disabled \
     diskpool raidz /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2 


You can also immediately specify options for the file systems that will be created later, using the -O key (more on those options below).
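For instance, the file-system options discussed further down could be set once at pool creation with -O and then inherited by every dataset (a sketch using the same mirror layout as above):

 $ zpool create -o ashift=12 -O compression=lz4 -O atime=off -O xattr=on diskpool mirror /dev/sda2 /dev/sdb2 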



Now it is necessary:



1. Create the file systems correctly

2. Tell the pool which file system is the main root and, if desired, an alternative one (a very handy thing)

3. Install GRUB2 on all disks in the pool.

4. Everything else



After creating the pool, you have the default file system associated with the pool:



 $ zfs list 


 NAME       USED  AVAIL  REFER  MOUNTPOINT
 diskpool   471G   442G   136K  legacy




If the zfs list output does not show "legacy" in the MOUNTPOINT column for this file system, fix that immediately:



 $ zfs set mountpoint=legacy diskpool 
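Verify the result; the VALUE column should now show legacy:

 $ zfs get mountpoint diskpool 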


Strictly speaking, this is the root file system of the pool, but we will not use it directly; instead, we create a separate virtual root file system like this:



 $ zfs create -o utf8only=on -o compression=lz4 -o atime=off -o relatime=on -o acltype=posixacl -o mountpoint=legacy -o xattr=on diskpool/ROOT 


This FS also has no mount point of its own, and it carries a number of useful options that will be inherited by every FS created under it later (unless you specify different options).
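Inheritance is easy to check: create a child dataset (diskpool/ROOT/home here is just an example name) and look at the SOURCE column of its properties:

 $ zfs create diskpool/ROOT/home 
 $ zfs get -o name,property,value,source compression,atime,xattr diskpool/ROOT/home 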



The options used
The options atime=off and relatime=on will significantly improve file-system performance at the price of file access-time stamps.



The compression=lz4 option enables lz4, a much faster variant of the lzjb compression algorithm, on the FS. There are benchmarks out there somewhere, and I remember being impressed by them. Whether to enable compression is not only a matter of taste but also of working comfort, and it depends a great deal on the purpose of the FS. I may cover that in a follow-up article.



Do you want UTF-8 support in file names and no annoying surprises? Then utf8only=on is the option to choose.



And xattr support is definitely needed (xattr=on). I personally greeted the arrival of POSIX ACL support (the acltype=posixacl option) in ZFS on Linux as a holiday (for the life of me, I cannot remember in which version it was added).
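All of these are ordinary dataset properties, so they can also be changed on an existing FS, and the effect of compression can be observed through the read-only compressratio property:

 $ zfs set compression=lz4 diskpool/ROOT 
 $ zfs get compressratio diskpool/ROOT 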



Next, we’ll tell the pool that this is our bootable file system:



 $ zpool set bootfs=diskpool/ROOT diskpool 
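And check it:

 $ zpool get bootfs diskpool 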


Next, follow the instructions of the original article in the OS installation section, with the following differences:



1. We do not create a separate /boot partition and do not mount anything on the /boot directory

2. No /etc/fstab for now

3. Installing GRUB2 on disks should be changed as follows:



 $ grub2-install --modules=zfs --boot-directory=/boot /dev/sda 
 $ grub2-install --modules=zfs --boot-directory=/boot /dev/sdb 
 $ grub2-mkconfig -o /boot/grub/grub.cfg 


4. Before you start rebuilding the initramfs, be sure to remove /mnt/etc/zfs/zpool.cache.
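On CentOS the initramfs is rebuilt with dracut; a sketch of that step, assuming the new system is mounted under /mnt as in the original article (the kernel version string is just an example, use the one actually installed):

 $ rm -f /mnt/etc/zfs/zpool.cache 
 $ chroot /mnt dracut --force /boot/initramfs-3.10.0-229.el7.x86_64.img 3.10.0-229.el7.x86_64 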



After that, everything again follows the original instructions.



Notes



It is not strictly necessary to use the disk device aliases from the /dev/disk/by-* directories (each member disk stores the pool configuration, with devices identified by wwn). You can also edit /etc/zfs/vdev_id.conf and give the disks names of your own with the alias option:



 alias disk1 wwn-0x5000c50045957af3-part2
 alias disk2 wwn-0x50004cf20b0d7fa0-part2
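After editing /etc/zfs/vdev_id.conf, re-trigger udev and the aliases will show up under /dev/disk/by-vdev/, where they can be used when creating the pool:

 $ udevadm trigger 
 $ ls /dev/disk/by-vdev/ 
 $ zpool create -o ashift=12 diskpool mirror /dev/disk/by-vdev/disk1 /dev/disk/by-vdev/disk2 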


If you are using multipath, add the following options in the same file:



 multipath yes

 #       PCI_ID   HBA PORT  CHANNEL NAME
 channel 85:00.0  1         A
 channel 85:00.0  0         B
 channel 86:00.0  1         A
 channel 86:00.0  0         B




Naturally, replace the HBA PCI IDs with your own.



To be honest, I have never installed ZFS on enterprise distributions; the reasons are obvious. The feat performed by kvaps here is equally obvious. Respect.

Source: https://habr.com/ru/post/268807/


