ZFS should be awesome, but it annoys me a bit that it seems stuck in the past, even from before it was recognized as the coolest and best file system. It is inflexible, it lacks modern integration with flash storage, and it is not natively supported by most operating systems. Yet I store all my valuable data on ZFS, because it still provides the best level of protection for a SOHO (small office / home office) environment. Here's why.
The first directive of storage systems: do not return incorrect data!

The ZFS revolution: around 2006
In my articles on FreeNAS I have insistently repeated that "ZFS is the best file system," but if you look at my social media posts, it becomes clear that I don't actually like it all that much. I concluded that this contradiction deserves an explanation and some context, so at the risk of upsetting ZFS fans, let's do that.
When ZFS first appeared in 2005, it was absolutely timely, but it has been stuck there ever since. The ZFS developers did a lot of things right, combining the best features of a volume manager with a zettabyte-scale file system in Solaris 10:
- ZFS achieved a level of scalability that every modern file system should have, with virtually no limits on the amount of data and metadata or on file size.
- ZFS checksums all data and metadata to detect corruption, an absolutely essential feature for long-term, large-scale data storage.
- When ZFS detects an error, it can automatically recover data from mirrors, parity blocks, or alternative storage locations.
- Mirroring and RAID-Z are built into the system, so multiple drives combine organically into a single logical volume.
- ZFS has robust snapshot and replication features, including the ability to incrementally update data on other volumes.
- Data can be compressed on the fly, and deduplication is also supported.
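As a taste of how these features fit together, here is a minimal command sketch. The pool name `tank` and the device paths are illustrative, and exact device naming varies by platform (these are FreeBSD-style names):

```shell
# Create a mirrored pool from two disks; ZFS acts as both
# the volume manager and the file system here.
zpool create tank mirror /dev/ada0 /dev/ada1

# Turn on transparent on-the-fly compression for the whole pool.
zfs set compression=lz4 tank

# Create a dataset and take a point-in-time snapshot of it.
zfs create tank/photos
zfs snapshot tank/photos@2017-07-01

# Walk every block in the pool, verifying checksums and
# repairing any damaged blocks from the mirror copy.
zpool scrub tank
zpool status tank
```

These commands require root privileges and a running ZFS implementation, so this is a sketch of the workflow rather than something to paste blindly.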
When ZFS appeared, it was revolutionary compared to the old volume managers and file systems. And Sun opened most of the ZFS source code, allowing it to be ported to other operating systems. As the industry's favorite new toy, ZFS quickly showed up on Linux and FreeBSD, and even Apple began integrating it into its next-generation file system for Mac OS X! The future looked bright!
Checksums on user data are essential, otherwise you will inevitably lose data: see "Why data integrity checks are required on large disks" and "The first storage system directive: do not lose data."
From 2007 to 2010: ZFS went downhill
But something terrible happened to ZFS on the way to its triumph: lawsuits, licensing problems, and FUD (fear, uncertainty, and doubt) spread by ill-wishers.
The first clouds appeared in 2007, when NetApp sued Sun, claiming that ZFS infringed its WAFL patents. Sun counter-sued the same year, and the litigation dragged on. Although ZFS definitely contained no NetApp code, its copy-on-write snapshot mechanism resembled WAFL's, and some of us in the industry worried that the NetApp lawsuit would affect the availability of ZFS as open source. Those risks were enough for Apple to drop ZFS support from Mac OS X 10.6 "Snow Leopard" right before that OS shipped.
Here's a great blog post about ZFS and Apple by Adam Leventhal, who worked on the project at the company: ZFS: Apple's New Filesystem That Wasn't
Then Sun fell on hard times, and Oracle seized the opportunity to buy the company. This sowed new doubts about the future of ZFS, since Oracle is not known as a great friend of broadly supported open-source projects. And the CDDL license applied to the ZFS code was deemed incompatible with the GPLv2 used by Linux, making it impossible to ship ZFS in the world's most popular server OS.
Although the OpenSolaris project continued after the Oracle acquisition, and ZFS was included in FreeBSD, this all happened largely outside the enterprise sector. Sure, NexentaStor and GreenBytes helped push ZFS forward in the enterprise, but Oracle's lack of support for Sun's servers also began to take its toll.
What are the problems with ZFS now?
OpenZFS remains almost the same file system it was ten years ago.
Many people remain skeptical of deduplication, which demands a lot of expensive memory. And I do mean expensive: almost every ZFS FAQ flatly insists on ECC-only memory and at least 8 GB of it. In my own FreeNAS experience, 32 GB is appropriate for an active small server with ZFS, and even at today's prices that costs $200-300.
And ZFS never really adapted to flash memory, which is now used everywhere. Although flash can serve as ZIL and L2ARC caches, that is a dubious benefit on systems with plenty of RAM, and ZFS has no true hybrid storage tiering. It is laughable that the ZFS documentation still goes on about a few gigabytes of SLC flash when multi-terabyte 3D NAND drives are already on the market. And nobody talks about NVMe, even though it is the standard in high-end PCs.
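For reference, this is what bolting flash onto an existing pool looks like today: a separate log device and a read cache, not a real storage tier. The device names here are illustrative:

```shell
# Dedicate a small, fast SSD partition to the ZFS intent log
# (ZIL/SLOG); this only accelerates synchronous writes.
zpool add tank log /dev/ada2p1

# Use another SSD as a second-level read cache (L2ARC);
# note that tracking L2ARC contents itself consumes RAM.
zpool add tank cache /dev/ada3
```

Neither device holds data permanently; cold data never migrates to flash the way it does in a genuine hybrid or tiered system, which is exactly the limitation described above.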
And then there is the question of flexibility, or rather the lack of it. Once you have created a ZFS volume, it is pretty much fixed for life. There are only three ways to expand a storage pool:
- Replace absolutely all disks in the pool with larger capacity disks (which is cool, but expensive).
- Add another vdev with a different set of disks (which can lead to unbalanced performance and redundancy, and a bunch of other potentially silly mistakes).
- Build a new pool and transfer the datasets there with the zfs send command (this is what I do, although there are some tricks involved).
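The third option looks roughly like the sketch below. The pool and snapshot names are illustrative, and a real migration also needs care with mount points and properties:

```shell
# Recursively snapshot everything in the old pool.
zfs snapshot -r tank@migrate

# Replicate the whole pool, snapshots and properties included,
# into the new pool built from the new disks.
zfs send -R tank@migrate | zfs receive -F newtank

# Later, catch up on changes made since the first pass
# with an incremental send.
zfs snapshot -r tank@migrate2
zfs send -R -i tank@migrate tank@migrate2 | zfs receive -F newtank
```

The incremental second pass keeps the downtime window small: you only stop using the old pool for the final catch-up, not for the full copy.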
Apart from the third option, there is no way to shrink a ZFS pool. Worse, you cannot change the data-protection scheme without rebuilding the entire pool, and that includes adding second or third parity disks. FreeNAS conscientiously spends a good deal of effort trying to talk newcomers out of RAID-Z1 [1], and complains if they choose it anyway.
These may all seem like small, nitpicky complaints, but taken together they subjectively banish ZFS to the Middle Ages once you have used Drobo, Synology, or modern cloud storage. With ZFS you "buy disks, lots of memory, create a RAID array, and never touch it again," which doesn't quite match how storage is used today [2].
What options?
I have probably painted ZFS in a less-than-flattering light. It was revolutionary once, but it is now starting to show its limits and fall out of step with the modern flash-centric storage world. So are there alternatives?
Linux has several decent volume managers and file systems, and most people use LVM or MD with ext4. File-system specialists were delighted by Btrfs, which combines the functions of a volume manager and a file system in ZFS style, with additional flexibility, picking up where ReiserFS left off. And Btrfs really could have become "ZFS for Linux," but not long ago development stumbled after last year's terrible data-loss bug in RAID 5 and 6, and hardly anyone is talking about it anymore. Still, I think that in five years I will be recommending Btrfs to Linux users, especially given its strong potential for containers [3].
On Windows, Microsoft is also rolling out its own next-generation file system, ReFS, which uses B+ trees (similar to Btrfs) and promises crazy scalability, resilience, and data protection [4]. Combined with Storage Spaces, it gives Microsoft a viable next-generation storage system for Windows Server, which can even use SSDs and 3D XPoint as a tier or cache.
And then there is Apple, which reportedly changed course on its storage system several times before settling on APFS, released this year in macOS High Sierra. APFS resembles Btrfs and ReFS in many ways, though it is implemented quite differently, with a stronger focus on the end user. While it concedes in some areas (user data is not checksummed and compression is not supported), APFS is exactly the system iOS and macOS needed. And APFS is the final nail in the coffin of the "ZFS on Mac OS X" idea.
Each of the three major operating systems now has a next-generation file system (and volume manager): Linux has Btrfs, Windows has ReFS and Storage Spaces, and macOS has APFS. FreeBSD seems to have kept its commitment to ZFS, but that is a small slice of the market. And every enterprise-grade system has long since progressed beyond what ZFS, and the enterprise ZFS offerings from Sun, Nexenta, and iXsystems, can do.
But for the home user, ZFS still towers over the old file systems. With no integrity checking, redundancy, or error recovery, NTFS (Windows), HFS+ (macOS), and ext3/4 (Linux) are simply unsuitable for long-term data storage. And even ReFS and APFS, since they do not verify data integrity, are unsuitable wherever data loss is unacceptable.
Author position: use ZFS (for now)
It's sad to admit, but as of 2017 ZFS is the best file system for long-term, large-scale data storage. Although working with it can be awkward (everywhere except FreeBSD, Solaris, and purpose-built appliances), its reliability and maturity make ZFS the only trustworthy tool for storing data outside enterprise storage systems. After all, storing data reliably is the only thing a file system really has to do. All my important data goes straight to ZFS, from photos to music, from movies to office files. It will be a long time before I trust anything other than ZFS!
Footnotes
1. For today's large drives, RAID-Z2 and RAID-Z3, with their extra redundancy, are preferable.
2. Oddly, although multiple pools and removable disks work fine in ZFS, almost nobody talks about that use case. The discussion always revolves around a single pool named "tank" that contains every disk in the system.
3. One thing Btrfs really lacks is flash support, and especially hybrid storage. But personally, I would rather they fixed RAID-6 support first.
4. Although data checksums in ReFS are still disabled by default.