Dear Habr community!
I would like your advice on choosing a distributed cluster file system. I have no experience with them, they differ a lot from one another, and each has many features. On top of that, information on the subject is still scarce: concrete details are hard to find.
The system is based on Linux.
What I need from this file system:
- Distribution of data across network nodes
- Automatic replication (3 copies needed; a configurable count would be better)
- To clients it should look like a full POSIX file system (or at least close to one); to users it should look like a regular file system
- Built-in high availability, automatic repair, adding new nodes on the fly
- XFS support is desirable
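To make the "close to POSIX" requirement concrete, here is a minimal sketch of a smoke test that could be run against any mounted candidate. It is only an illustration: the function name `posix_smoke_test` and the mount path in the usage note are hypothetical, and passing these few checks does not prove full POSIX compliance.

```python
import os
import tempfile


def posix_smoke_test(mount_point):
    """Run a few basic POSIX checks inside mount_point; return {check: passed}."""
    results = {}
    # Work in a scratch directory so nothing else on the mount is touched.
    workdir = tempfile.mkdtemp(prefix="fs-check-", dir=mount_point)
    src = os.path.join(workdir, "a.txt")
    dst = os.path.join(workdir, "b.txt")
    link = os.path.join(workdir, "b.hardlink")
    try:
        # Write, fsync, and read the data back.
        with open(src, "wb") as f:
            f.write(b"hello")
            f.flush()
            os.fsync(f.fileno())
        with open(src, "rb") as f:
            results["write_read"] = f.read() == b"hello"

        # Permission bits should survive a chmod.
        os.chmod(src, 0o640)
        results["chmod"] = (os.stat(src).st_mode & 0o777) == 0o640

        # After rename() the old name is gone and the new one exists.
        os.rename(src, dst)
        results["rename"] = os.path.exists(dst) and not os.path.exists(src)

        # Hard links: a second name for the same file raises the link count to 2.
        try:
            os.link(dst, link)
            results["hardlink"] = os.stat(dst).st_nlink == 2
        except OSError:
            results["hardlink"] = False
    finally:
        # Clean up whatever was created, then the scratch directory itself.
        for path in (src, dst, link):
            if os.path.exists(path):
                os.remove(path)
        os.rmdir(workdir)
    return results
```

Usage would look like `print(posix_smoke_test("/mnt/storage"))`, where `/mnt/storage` stands for wherever the cluster file system is mounted. Any check that comes back `False` is worth investigating before putting web sites on that storage.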
The task?
The task is essentially simple: web hosting. The sites and their files will be stored on the storage cluster, and the web server will connect to it and work with it directly, as with a file system.
Results of my own searching:
The first thing I came across was
DRBD. It only does replication and can be used for geographic replication. It is not a file system at all.
Next came
GFS (Global File System). From what I read, it is not distributed: it simply lets clients attach to a central storage and all work with it at the same time. For small volumes this is quite good. Fault tolerance can be arranged by mirroring the data with the same DRBD. For large volumes, however, you have to resort to expensive storage systems, because GFS works with block devices attached via iSCSI, FC, InfiniBand, etc. Costs rise sharply with volume, because you have to buy expensive hardware, and two units of it, so that the second can stand by as a replica of the first. Perhaps a virtual block device could be assembled from a bunch of servers, but in my opinion that is a hack.
And then I finally came to
GlusterFS (official site). Judging by the description, it is what we need: a distributed cluster file system with data replication and data distribution across network nodes, which scales almost linearly. It has automatic recovery, adding nodes to the cluster on the fly, and so on; in short, a full-fledged, mature file system. It is used on many production clusters around the world.
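As a rough illustration of how replication and distribution combine, the sketch below groups storage bricks into replica sets and estimates usable capacity. It assumes the layout rule described in the GlusterFS documentation for distributed replicated volumes (the brick count must be a multiple of the replica count, and consecutive bricks in the list form one replica set); the node names, brick paths, and function names are hypothetical.

```python
def plan_replica_sets(bricks, replica=3):
    """Group bricks into replica sets of `replica` consecutive entries."""
    if replica < 1:
        raise ValueError("replica count must be at least 1")
    if len(bricks) % replica != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def usable_capacity_tb(brick_size_tb, brick_count, replica=3):
    """Usable space when every file is stored `replica` times."""
    return brick_size_tb * brick_count / replica


# Hypothetical cluster: six servers with one 2 TB brick each.
bricks = [f"node{n}:/data/brick" for n in range(1, 7)]
replica_sets = plan_replica_sets(bricks, replica=3)
capacity = usable_capacity_tb(2, len(bricks), replica=3)
```

Here `replica_sets` holds two groups of three bricks (node1 to node3 and node4 to node6), and `capacity` is 4.0 TB out of 12 TB raw. Under this assumption, growing the cluster while keeping 3 copies means adding servers three at a time.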
So, the questions:
- Has anyone worked with such systems? What should I expect from them, and what are the pitfalls?
- Does anyone know of other, more suitable file systems?
P.S. Please do not suggest Hadoop, MogileFS, and the like; those are more like frameworks for embedding in applications. I need a solution at the file system level.
P.P.S. Please note that we are discussing a fully functional, stable file system that can be used in production. Many people suggest products that are in early development (PohmelFS) and/or have a lot of restrictions (GridFS, which has neither permissions nor folders, and in which even creating files is an experimental feature; GridFS is built on top of MongoDB).