As information technology develops, the need to store and process large amounts of information keeps growing. Large volumes of data are typically stored in distributed file systems, and this article discusses one of them.
While working on a project, I faced the need to store a large amount of data. One of the project's programs, written in C++, generates a large volume of statistics that has to be stored somewhere for later use. Running autonomously on a server, the program produces hundreds of gigabytes of information, and that volume is only projected to grow.
So the question arises: where to store all this data? If the data is kept on the same server the program runs on, disk space will be exhausted very soon. It became clear that a distributed file system was needed for data storage, but which one? A search turns up various distributed file systems, but one new system stands out: QFS (Quantcast File System). Let's see why. Do not confuse it with QFS (Quick File System), a file system from Sun Microsystems.
What is QFS (Quantcast File System)
Quantcast File System (QFS) is a high-performance, fault-tolerant, distributed file system designed to support MapReduce technology and other applications that sequentially read and write large files.
QFS is an open source distributed file system released under the Apache 2.0 license, developed by Quantcast and presented as an alternative to HDFS. It grew out of KFS (Kosmos File System), which Quantcast began using for secondary storage. In 2011, the company moved its main data processing to QFS and stopped using HDFS. Over the following year it wrote and read more than 4 exabytes of data, which demonstrated that the product was ready for public release. In September 2012, QFS 1.0 was released. The file system code is now available at github.com/quantcast/qfs
A bit about the company Quantcast, from Wikipedia:
Quantcast is a web analytics company that acts as an independent audience meter. Its main business is helping advertisers and advertising sites find each other, and then acting as a neutral third-party measurer that determines the amount of advertising sold.
QFS architecture

QFS consists of 3 components:
- Metaserver
A central metadata server that manages the file system directory structure and the mapping of files to physical storage.
- Chunk server
The server that stores the data.
- Client library
A library that provides a file system API that allows applications to interact with QFS.
QFS features
- Scalability
You can add another chunk server to the system at any time. Once added, the chunk server connects to the metaserver and becomes part of the system. The metaserver does not need to be restarted.
- Balancing
When placing data, the metaserver tries to keep it balanced across all nodes of the system.
- Rebalancing
The metaserver rebalances data between nodes when it detects that some nodes are underutilized (for example, a chunk server uses less than 20% of its disk space) while others are overloaded (for example, a chunk server uses more than 80% of its disk space).
- Fault tolerance
Resilience to data loss is one of the most important properties of a distributed file system. QFS supports chunk replication (storing multiple copies of each chunk) and Reed-Solomon 6 + 3 encoding, a non-binary cyclic code that can correct errors in data blocks. Reed-Solomon encoding provides the same degree of fault tolerance as classical replication while storing less data in total. For example, if HDFS needs 300 terabytes of free space to store 100 terabytes of data with replication on 3 servers, QFS needs only 150 terabytes for the same fault tolerance.
- Re-replication
If the replication degree of a file falls below the configured value (for example, because a chunk server has been down for an extended time), the metaserver automatically re-replicates the affected chunks of the file. This happens in the background without overloading the system.
- Data integrity
To cope with corruption of data blocks on disk, a checksum is calculated for each block. The checksum is verified on every read, and if a block turns out to be damaged, the re-replication mechanism is used to restore it.
- Client-side caching of meta-information
The QFS client library caches directory-related metadata to avoid repeated server lookups when resolving paths. Cached entries expire after 30 seconds.
- Writing to a file
The QFS client library uses a write-back cache. Once the cache is full, the data is sent to the chunk servers. An application can also flush data to the chunk servers manually by calling the sync() method.
- Version control
Chunks are versioned so that obsolete data can be identified. Consider the following scenario:
  - Chunk servers s1, s2 and s3 store version V of chunk C.
  - Server s1 goes down, and while it is down a client starts writing to C.
  - The write is performed on servers s2 and s3, and the chunk version changes to V2.
  - When s1 starts up again, it sends the metaserver the versions of all the chunks it stores.
  - The metaserver notices that s1 holds an outdated version of chunk C and reports this to s1.
  - Server s1 removes the obsolete chunk C.
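The scenario above can be replayed in a few lines of Python. This is a toy model under stated assumptions, not the QFS protocol: the classes `Metaserver` and `ChunkServer` and all of their methods are hypothetical names made up for illustration, and versions are plain integers.

```python
class Metaserver:
    """Toy metaserver (hypothetical): remembers the current version of every chunk."""

    def __init__(self):
        self.current_version = {}  # chunk id -> latest version

    def record_write(self, chunk_id, live_servers):
        # A write bumps the version on the metaserver and on every reachable replica.
        new_version = self.current_version.get(chunk_id, 0) + 1
        self.current_version[chunk_id] = new_version
        for server in live_servers:
            server.chunks[chunk_id] = new_version
        return new_version

    def stale_chunks(self, reported):
        # Compare the versions a chunk server reports with the current ones.
        return [cid for cid, version in reported.items()
                if version < self.current_version.get(cid, version)]


class ChunkServer:
    """Toy chunk server (hypothetical): maps chunk id -> version it holds."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}

    def rejoin(self, metaserver):
        # On startup, report held versions and delete whatever is obsolete.
        for cid in metaserver.stale_chunks(self.chunks):
            del self.chunks[cid]


meta = Metaserver()
s1, s2, s3 = ChunkServer("s1"), ChunkServer("s2"), ChunkServer("s3")

meta.record_write("C", [s1, s2, s3])  # all three hold version 1 of C
meta.record_write("C", [s2, s3])      # s1 is down; s2 and s3 move to version 2
s1.rejoin(meta)                       # s1 comes back and drops its stale copy

print(s1.chunks, s2.chunks, s3.chunks)  # {} {'C': 2} {'C': 2}
```

After the stale copy is dropped, the chunk is left with fewer replicas than before, which is the kind of situation the re-replication feature described earlier is meant to handle.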
- Client failover
The client library tolerates chunk server failures. If, during a read, the client library determines that the chunk server it is talking to has become unavailable, it switches to another chunk server and continues reading. This failover is transparent to the application.
- FUSE support on Linux
- Unix-style permissions support
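The storage figures from the Reed-Solomon example above can be checked with a little arithmetic. The sketch below is only an illustration, not QFS code: the function names `replicated_size` and `reed_solomon_size` are hypothetical, and it assumes that the 6 + 3 scheme stores 6 data stripes plus 3 parity stripes, so the raw footprint is 9/6 of the payload.

```python
from fractions import Fraction


def replicated_size(payload_tb, copies):
    """Raw space needed when every chunk is stored `copies` times."""
    return payload_tb * copies


def reed_solomon_size(payload_tb, data_stripes=6, parity_stripes=3):
    """Raw space needed with striping (assumed layout: N data + M parity stripes)."""
    overhead = Fraction(data_stripes + parity_stripes, data_stripes)
    return payload_tb * overhead


payload_tb = 100
print(replicated_size(payload_tb, copies=3))  # 300
print(reed_solomon_size(payload_tb))          # 150
```

Under this assumption the overhead factor is 1.5 instead of 3, which is where the 150 versus 300 terabytes in the example comes from.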
Known issues and limitations
- There is only one metaserver in the system, which makes it a single point of failure. Metaserver files are stored on a local disk, so to prevent loss of the file system contents this information should be periodically copied to a remote server.
- The maximum replication factor is 64.
- No dynamic balancing. The metaserver does not currently replicate additional copies of a file's chunks when the file becomes hot. The system does, however, perform a limited form of rebalancing whenever it determines that disks are underutilized on some nodes and overloaded on others.
- No support for snapshots.
- Random writes to files encoded with the Reed-Solomon algorithm are restricted. In practice, only sequential writes to such files are possible.
- Simultaneous reading and writing of a chunk is not supported. A chunk of a file cannot be read until the writing program closes it.
Quick start
Installing QFS in a test configuration is fairly straightforward.
An example of a C++ application working with QFS can be found in the project archive at /examples/cc/qfssample_main.cc
Known pitfalls
Experiments with this file system revealed some problems.
- The C++ client library is not stable on 32-bit Debian: data is lost during writes, and segmentation faults occur occasionally.
- If the metaserver and a chunk server run on the same machine, specify the metaserver's IPv4 address in the chunk server settings rather than localhost. Otherwise the client library will try to connect to the server at 127.0.0.1.
Conclusion
The point of this post is not to survey the various distributed file systems, nor to compare the capabilities of QFS with, say, Hadoop's HDFS. Its purpose is to draw the attention of the domestic IT community to a new distributed file system that stands out among the others: QFS (Quantcast File System). It is still young and not without flaws, but it is certainly ready to take its place in the DFS ecosystem in the near future.
As far as I know, this is the first Russian-language article dedicated to this project, so I chose an introductory style of presentation, without going deep into the project's architecture, installation, or configuration.