
Intel Enterprise Edition - “Shade” for Lustre

Building high-load cluster systems is not an easy task in itself, and it is further complicated by the fact that such solutions have to be as well balanced as possible. There is no room for “crutches” and “patches”: every component has to squeeze out the maximum number of FLOPS and IOPS. This, of course, applies to one of the critical components of any hardware solution, the file system. Over the history of supercomputing, several specialized file systems have been created. The most popular of them is Lustre, whose development began in the last century and which is currently supported by Intel. In the 3 years since Intel acquired Whamcloud, the developer of Lustre, the product has gained new features and tools. In this post, you will learn how.

As already mentioned, Lustre is a fairly widespread system with a long development history; currently, more than 60 percent of the supercomputers in the Top500 use it as their file system. It is therefore probably not worth spending much time describing it; a very detailed introduction can be found, for example, in the wiki. Nevertheless, a few words about how Lustre is built are in order, since they will be needed later on.



So, Lustre is a distributed parallel file system, that is, a set of servers that store their own data and operate independently of each other. At its core are the management server and the metadata server (MGS / MDS). The MDS is the metadata repository (file names and their attributes), and the MGS holds the information about which servers the file system is located on. These two servers can run on separate machines or on a single one. The objects themselves (fragments of a file's contents) reside on different devices (OSTs) managed by storage servers (OSSs). Unlike in traditional file systems, a Lustre inode serves as a key for looking up a structure that describes how the file's data is actually split up and where it is located. This creates an additional layer of abstraction.
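To make the extra layer of abstraction more tangible, here is a toy model in Python. It is only an illustrative sketch, not Lustre code: the names `Layout` and `ToyMDS`, the field names, and the example path are all invented for this post. It shows the idea described above: the metadata side maps a file name to an inode, and the inode is a key to a layout record saying which OSTs hold the file's objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical layout record: how a file is split and where it lives."""
    stripe_size: int     # bytes per fragment
    ost_indices: tuple   # OSTs holding the file's objects, in stripe order


class ToyMDS:
    """Toy metadata server: stores names and layouts, never file contents."""

    def __init__(self):
        self._names = {}     # path -> inode number
        self._layouts = {}   # inode number -> Layout
        self._next_inode = 1

    def create(self, path, layout):
        inode = self._next_inode
        self._next_inode += 1
        self._names[path] = inode
        self._layouts[inode] = layout
        return inode

    def lookup(self, path):
        # The inode is only a key; the layout says where the data really is.
        inode = self._names[path]
        return inode, self._layouts[inode]


mds = ToyMDS()
mds.create("/scratch/result.dat", Layout(stripe_size=1 << 20, ost_indices=(0, 2, 5)))
inode, layout = mds.lookup("/scratch/result.dat")
```

After the lookup, a client in this model would talk directly to the storage servers that manage OSTs 0, 2 and 5; the metadata server is no longer involved in moving the data itself.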
The main advantage of this approach is that the fragments of a file are stored on different servers and are requested in parallel. Instead of waiting for the data to be read as one large piece from one place, Lustre breaks a large piece into smaller ones and loads them in parallel from different places. This exceptional capacity for parallelism delivers the required data processing speed, which is essentially limited only by the throughput of the physical links. The file system does not impose rigid locks on files; instead, it maintains data integrity flexibly through a dedicated mechanism. This is very similar to keeping caches coherent between different processors in a multiprocessor system.
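The idea of splitting a file into fragments and fetching them in parallel can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual Lustre implementation: it assumes plain round-robin (RAID-0 style) placement, models each OST object as an in-memory byte array, and the function names `stripe`, `locate` and `read_parallel` are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def stripe(data, stripe_size, stripe_count):
    """Distribute data round-robin over stripe_count toy OST objects."""
    objects = [bytearray() for _ in range(stripe_count)]
    for start in range(0, len(data), stripe_size):
        ost = (start // stripe_size) % stripe_count
        objects[ost] += data[start:start + stripe_size]
    return objects


def locate(offset, stripe_size, stripe_count):
    """Map a file offset to (OST index, offset inside that OST's object)."""
    stripe_no = offset // stripe_size
    ost = stripe_no % stripe_count
    obj_offset = (stripe_no // stripe_count) * stripe_size + offset % stripe_size
    return ost, obj_offset


def read_parallel(objects, stripe_size, length):
    """Fetch every fragment concurrently and reassemble them in file order."""
    stripe_count = len(objects)

    def fetch(stripe_no):
        start = stripe_no * stripe_size
        ost, obj_offset = locate(start, stripe_size, stripe_count)
        size = min(stripe_size, length - start)
        return bytes(objects[ost][obj_offset:obj_offset + size])

    n_stripes = (length + stripe_size - 1) // stripe_size
    with ThreadPoolExecutor(max_workers=stripe_count) as pool:
        # map() returns results in submission order, so a join restores the file.
        return b"".join(pool.map(fetch, range(n_stripes)))


payload = bytes(range(256)) * 3
osts = stripe(payload, stripe_size=100, stripe_count=3)
restored = read_parallel(osts, stripe_size=100, length=len(payload))
```

In a real cluster each `fetch` would be a network request to a different storage server, which is why the aggregate speed grows with the number of servers until the links themselves become the bottleneck.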

The current limit on the total size of the stored data is 512 petabytes.



Intel, like a number of other software vendors, offers its own Lustre-based solution. It is built, of course, on the Lustre file system itself, which is freely distributed under the GNU GPL license; the current version is 2.7. On top of that come the add-ons created by a dedicated Intel unit, the High Performance Data Division (HPDD), and included in Intel Enterprise Edition for Lustre Software.

For developers, the most useful piece will be the Hadoop adapter, which lets you run MapReduce applications directly on Lustre. This kills two birds with one stone. First, Hadoop users get access to files stored on Lustre without having to use Hadoop's own distributed file system or perform additional copy operations.

Second, the system as a whole becomes simpler and more pleasant to work with: Hadoop coexists with Lustre and takes advantage of it without consuming separate storage of its own. Another useful tool for development and deployment is a set of APIs (including REST) that lets you integrate third-party software and storage systems with Lustre quickly and easily.



For server and storage administrators, there is even better news. The Intel Enterprise Edition for Lustre Software package contains Intel Manager for Lustre Software, a graphical application for launching, configuring, monitoring and administering a Lustre system, as well as for reporting faults in it. The manager provides a graphical interface for every file system management task and visualizes Lustre statistics by numerous criteria, giving a picture of its status. Another essential administrator tool is a command-line interface that supports scripting, so routine maintenance and management tasks can be automated.



So, on the basis of a good product that is still freely distributed, Intel has created a whole software ecosystem, aimed first of all at ease of use and at integration into complex solutions. As for the experts, it is up to them to decide whether to use plain Lustre or Lustre with a “shade” in the form of Intel Enterprise Edition for Lustre Software.

Thanks to Dmitry Eremin, a lead developer in the HPDD division, for his help in writing this post.

Source: https://habr.com/ru/post/257489/

