
Designing a software platform for a protected NAS


Suppose the NAS hardware has been assembled and the OS installed on it, for example as described here. You now have a working Debian server that boots, is connected to the network, and to which you have full physical access.


Now you need to design an environment that makes it easy and safe to add and remove application services and to manage their operation.


Inspired by an article by a certain Cloud Architect, I decided to build a system in which most services run in containers.


Similar methods (for example, partitioning I/O spaces with container virtualization) are also used in rather critical systems in the nuclear industry.


This is very convenient and safe:



But there are things in his version that I did not like:



This article offers an option that works on my NAS and so far suits me perfectly.


To improve readability, the topic is split into two articles: design and implementation.


Disk Organization


Here I will repeat myself a little and, for the sake of completeness, briefly describe the disk organization of the system.

System SSD


Two SSDs are reserved for the system, and the second SSD is a mirror of the first.


Data structure on SSD:


  • part_boot - partition with the bootloader. Size = 1 GB.
  • part_system - partition with the system. Size = 32 GB (recommended size: 16 GB * 2).
  • part_slog - partition with the SLOG. Size = 5 GB.
    The SLOG cannot be larger than the available memory: 32 GB RAM + 48 GB swap = 80 GB (plus 16 GB, taking possible swap expansion into account); 96 GB is the absolute limit.
    The usual rule for sizing a SLOG is to allocate no more than the amount of data the system can take in over 5 seconds.
    For a board with two 1G interfaces: SLOG size = 1000 / 8 * 2 * 5 = 1250 MB.
    With some margin, the size is 5 GB.

part_system and part_slog are encrypted in XTS mode.


Overall, their organization looks like this:


 SSD1: [part_boot]   -> [zfs_mirror]                     <---> SSD2
 SSD1: [part_system] -> [crypto_xts] -> [zfs_mirror]     <---> SSD2
 SSD1: [part_slog]   -> [crypto_xts] -> [zfs_zil_mirror] <---> SSD2

Partitions are duplicated using ZFS.
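
For illustration, here is a rough sketch of how such a layered mirror can be assembled; the device, mapper, and pool names are my own placeholders, not the exact commands from the implementation article:

# Assumption: sda2/sdb2 are the part_system partitions on the two SSDs.
cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 512 /dev/sda2
cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 512 /dev/sdb2
cryptsetup luksOpen /dev/sda2 system_crypt_a
cryptsetup luksOpen /dev/sdb2 system_crypt_b

# ZFS mirror on top of the two decrypted devices ("system" is a placeholder pool name).
zpool create -o ashift=12 system mirror \
    /dev/mapper/system_crypt_a /dev/mapper/system_crypt_b

# The part_slog partitions are later attached to the data pool as a mirrored log device.
zpool add tank0 log mirror /dev/mapper/slog_crypt_a /dev/mapper/slog_crypt_b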


SSD with L2ARC cache


The bottom layer is encrypted in XTS mode with a random key.


It contains two partitions:


  • part_swap - swap partition. Size = 48 GB (max RAM * 1.5 = 32 GB * 1.5).
  • part_l2arc - L2ARC. Size = 196 GB (ARC size * [3..10], where ARC size = 0.6 * max RAM, i.e. 58-196 GB; in addition, with deduplication disabled you need about 1 GB of L2ARC per 1 TB of data).

swap and l2arc are encrypted with a random key.


A random key for the swap partition is acceptable because the system will not use hibernation.
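
A sketch of how the random-key encryption can be declared, assuming sdc1 is part_swap and sdc2 is part_l2arc (device names are placeholders):

# /etc/crypttab: both partitions are re-keyed from /dev/urandom on every boot.
cat >> /etc/crypttab <<'EOF'
cryptswap  /dev/sdc1  /dev/urandom  swap,cipher=aes-xts-plain64,size=256
cryptl2arc /dev/sdc2  /dev/urandom  plain,cipher=aes-xts-plain64,size=256
EOF

# Use the decrypted devices: swap via fstab, L2ARC as a cache vdev of the data pool.
echo '/dev/mapper/cryptswap none swap sw 0 0' >> /etc/fstab
zpool add tank0 cache /dev/mapper/cryptl2arc
# Note: with a random key the L2ARC contents do not survive a reboot,
# so the cache device may have to be re-added after each boot.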


All remaining space is allocated to the L2ARC; its real necessity with up to 32 GB of RAM is questionable.


The L2ARC size will need to be fine-tuned while the system is in operation, based on cache-hit statistics.
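
For reference, on ZFS on Linux the relevant counters can be read from procfs; a quick sketch (field names can vary slightly between versions):

# L2ARC counters exposed by the kernel module.
grep -E '^l2_(hits|misses|size)' /proc/spl/kstat/zfs/arcstats

# Rough L2ARC hit ratio in percent.
awk '/^l2_hits/ {h=$3} /^l2_misses/ {m=$3} END {if (h+m) print 100*h/(h+m) "%"}' \
    /proc/spl/kstat/zfs/arcstats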


Organization:


 SSD3: | -> [part_swap]  -> [crypto_xts] -> [system swap]
       | -> [part_l2arc] -> [crypto_xts] -> [l2arc]

Drive cage


Since at the first stage it was planned to use 4 of the 8 possible disks, the disks in the cage are split into 2 ZFS VDEVs.


Each disk first gets an XTS encryption layer; a ZFS physical device is built on top of it.


4 physical devices are combined into one RAIDZ1. If you do not mind losing some disk space, or have more devices (for example, if you plan to buy all the disks at once), it is recommended to build a single RAIDZ2 array instead.


Performance Measurements
  • Direct copy to disk in 4 MB blocks: 185 MB/s (dsync: 136 MB/s)
  • LUKS partition with AES-NI instruction support: 184 MB/s (dsync: 135 MB/s)
  • ZFS volume: 170 MB/s (dsync: 50 MB/s)
  • LUKS on a ZFS volume: 274 MB/s (dsync: 38 MB/s)
  • ZFS on LUKS: 187 MB/s (dsync: 50 MB/s)

Hence the conclusion: despite the "assurances of experts", it is better to place the ZFS pool on top of LUKS, and not vice versa. LUKS adds almost no overhead (with AES-NI), and disk write caching can always be enabled manually (as can the block size, which in ZFS is variable anyway).
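
As an illustration of the chosen layering, a sketch of building the data pool over LUKS; disk and mapper names are placeholders, and in reality key files and crypttab entries are also needed:

# Assumption: sdd..sdg are the four HDDs currently installed in the cage.
cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 512 /dev/sdd   # repeat for sde, sdf, sdg
cryptsetup luksOpen /dev/sdd hdd1_crypt                                  # repeat for the rest

# A single RAIDZ1 vdev on top of the decrypted devices.
zpool create -o ashift=12 tank0 raidz1 \
    /dev/mapper/hdd1_crypt /dev/mapper/hdd2_crypt \
    /dev/mapper/hdd3_crypt /dev/mapper/hdd4_crypt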


The complete scheme is as follows:


 HDD1: [crypto_xts] -> [zfs_phdev] |
 HDD2: [crypto_xts] -> [zfs_phdev] |
 HDD3: [crypto_xts] -> [zfs_phdev] | -> [RAIDZ1] -> [tank0]
 HDD4: [crypto_xts] -> [zfs_phdev] |

 HDD5: [crypto_xts] -> [zfs_phdev] |
 HDD6: [crypto_xts] -> [zfs_phdev] |
 HDD7: [crypto_xts] -> [zfs_phdev] | -> [RAIDZ1] -> [tank1]
 HDD8: [crypto_xts] -> [zfs_phdev] |

File System Organization


Pool structure


The NAS will host various application systems, described below.
Each system adds its own directories at fixed points of the pool's directory structure; the directory names for each system will be given when that system is designed.


Below is the file structure of the pool, common to all systems:



Chart code
 @startsalt
 {
 {T
 +/
 ++ tank0
 +++ docker
 ++++ lib
 ++++ services
 +++ apps
 +++ repos
 +++ user_data
 ++++ music
 ++++ videos
 ++++ pictures
 ++++ ...
 ++ tank1
 +++ apps
 +++ user_data
 }
 }
 @endsalt

In the diagram:



The tank1 pool is reserved for the future: it will be created when the disk space is expanded.
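
Whether each of these directories is a separate dataset or a plain directory is a matter of taste; a sketch that creates them as datasets (so that snapshots and quotas can be applied per directory) might look like this:

# Datasets mirroring the directory tree shown above.
zfs create tank0/docker
zfs create tank0/docker/lib
zfs create tank0/docker/services
zfs create tank0/apps
zfs create tank0/repos
zfs create tank0/user_data
zfs create tank0/user_data/music
zfs create tank0/user_data/videos
zfs create tank0/user_data/pictures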


System software composition and component interaction


This is what the NAS looks like in the context of the systems interacting with it:



Chart code
 @startuml
 ' -----------------------------------------------------
 'left to right direction
 scale 0.72
 package Internet #efefff {
   cloud "Let's\nEncrypt" as le {
   }
   cloud "Cloud DNS" #ffffff {
     frame "A-records" as af {
       artifact "system.NAS.cloudns.cc" as d1
       artifact "omv.NAS.cloudns.cc" as d2
       artifact "ldap.NAS.cloudns.cc" as d3
       artifact "ssp.NAS.cloudns.cc" as d4
       artifact "cloud.NAS.cloudns.cc" as d5
       artifact "git.NAS.cloudns.cc" as d6
       artifact "backup.NAS.cloudns.cc" as d7
     }
     frame "TXT-record" as tf {
       artifact "Domain:_acme-challenge.*.NAS.cloudns.cc\nTxt value:9ihDbjxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
     }
   }
   cloud "Mail server" as ms #ffffff
 } 'Internet end
 package LAN #efffef {
   node "Router" as router #ffffff {
     component "DNS" {
       artifact ".*\.nas" as ndns
     }
     component "NAT" {
       artifact "port 80" as rop80
       artifact "port 443" as rop443
       artifact "port 5022" as rop5022
     }
   }
   node NAS #ffefef {
     component "Ports" {
       artifact "port 22" as nasp22
       artifact "port 80" as nasp80
       artifact "port 443" as nasp443
     }
     package Docker {
       component "letsencrypt-dns" as led {
       }
       component "nginx-reverse-proxy" as nrp {
       }
       component "nginx-local" as ngl {
       }
       component "LDAP server and WEB GUI" as ldaps {
       }
       component "LDAP SSP" as ssp {
       }
       collections "Application services\n(Backup, Cloud, DLNA, etc.)" as services
     }
     artifact "Certificate" as cert_file
     component "OMV WEB GUI" as omvwg {
     }
     component "OMV core" as omv {
     }
     component "SSH server" as sshd {
     }
     component "NUT" as nut {
     }
   }
 } 'LAN end
 node "UPS" as ups #ffefef
 actor "User" as user
 user <- af
 user <-> router
 user <- ms
 nut <-- ups : USB
 le <-- af
 le <-- tf
 led .-> tf
 led <-. le
 led .> cert_file
 nrp <. cert_file
 af -> router
 ndns -> NAS
 rop80 <--> nasp80
 rop443 <--> nasp443
 rop5022 <--> nasp22
 nasp22 <--> sshd
 nasp80 <--> nrp
 nasp443 <--> nrp
 nrp <--> ngl : docker0
 nrp <--> ldaps : docker0
 nrp <--> ssp : docker0
 ldaps <--> ssp : docker0
 nrp <--> services
 ngl <--> omvwg
 omv <--> omvwg
 omv .> ms
 omv ..> sshd : via WEB GUI
 omv ..> nut : via WEB GUI
 @enduml

This somewhat cluttered diagram shows the composition of the NAS services and most of their interactions.
The components, the purpose of the services, and the operating algorithms are described in more detail below.


Operating system


The system, as shown above, is installed on SSDs combined into a mirror with ZFS. OpenMediaVault (hereinafter OMV), a storage management system with a WEB GUI, was chosen as the OS.


It is quite simple to install from a package, and everything else is pulled in as dependencies: the kernel, additional repositories, etc.
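
A minimal sketch of the installation, assuming the OMV apt repository matching your Debian release has already been added:

apt-get update
apt-get install openmediavault
# Initialise the OMV configuration database (command name as of OMV 3.x/4.x).
omv-initsystem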


System software


The central components are:



Only OMV, SSH, and daemons that the user does not access directly run on the physical machine. All other systems run inside Docker containers.
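
A sketch of the general pattern, with placeholder names (the diagram above shows traffic over docker0; a user-defined bridge network is a common variant that additionally gives name resolution between containers):

# A shared bridge network for the reverse proxy and the services behind it.
docker network create nas-frontend

# Only the proxy publishes ports on the host; services are reachable by container name.
docker run -d --name nginx-reverse-proxy --network nas-frontend -p 80:80 -p 443:443 nginx
docker run -d --name cloud --network nas-frontend nextcloud   # any cloud image, used here only as an example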


User management


User authentication is done through LDAP. This allows users to be managed centrally, and most services support this mechanism, unlike, for example, RADIUS and similar newer, lighter-weight (and arguably more convenient) solutions.


The LDAP server runs in a container but can also be reached from the host network.
Services (GitLab, OMV, the cloud, etc.) are configured to use the LDAP server.
Users can change their passwords using LDAP Self Service Password.


When adding a user to the system, the user must first be registered via the console or a WEB interface; I use phpLDAPadmin.
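
For the console variant, registering a user boils down to loading an LDIF entry; a sketch with a hypothetical directory suffix and server name:

# newuser.ldif (the dc=... suffix and the ldap.nas host are placeholders for your tree).
cat > newuser.ldif <<'EOF'
dn: uid=jdoe,ou=users,dc=nas,dc=cloudns,dc=cc
objectClass: inetOrgPerson
objectClass: posixAccount
cn: John Doe
sn: Doe
uid: jdoe
uidNumber: 10001
gidNumber: 10001
homeDirectory: /home/jdoe
userPassword: {SSHA}change-me
EOF

ldapadd -x -H ldap://ldap.nas -D "cn=admin,dc=nas,dc=cloudns,dc=cc" -W -f newuser.ldif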


If you want to give users the ability to work with the OS itself, PAM LDAP can be used.


Network


Physical organization


As seen in the deployment diagram, the NAS sits behind a router on the local network. Ideally, to increase network security, it would be nice to isolate it in a DMZ with a second router, but this is optional.


The user can access the NAS both from the Internet and from the local network. Either way, the request passes through the router.


Since the board has several network interfaces, more than one is connected to the router at the same time (two in my case), and bonding is configured on them.


First, the interfaces are not left idle.
Second, this increases reliability and, in some modes, throughput.
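
A sketch of such a bond in /etc/network/interfaces terms (requires the ifenslave package; interface names, addresses, and the bonding mode are placeholders and must match what the router supports):

# Assumes /etc/network/interfaces sources the interfaces.d directory.
cat > /etc/network/interfaces.d/bond0 <<'EOF'
auto bond0
iface bond0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    gateway 192.168.1.1
    bond-slaves enp2s0 enp3s0
    bond-mode balance-alb
    bond-miimon 100
EOF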


Logical organization


As can be seen from the diagram, the router takes part in the operation of the system.


In the case of access from an external network, the router performs port forwarding. Ports 80 and 443 are used to provide access to services via HTTP and HTTPS, respectively. Port 5022 is forwarded to NAS port 22 for SSH access. Ideally, it is better to have some discipline for assigning port numbers: for example, ports 10001-10999 for host services on the NAS, ports 11001-11999 for the second home server, etc.


When accessing from the Internet, you need to be able to bind several domain names to your IP. This can be implemented in different ways, but I use a cloud DNS provider that offers a DNS zone. ClouDNS is used for this.


In the case of access from the local network, the router provides a DNS server. If the DNS server is moved from the router to the NAS, the system becomes completely autonomous, but this does not make much sense: without a functioning router, which organizes the local network (including connecting the NAS to it) and communication with external networks, the NAS cannot be used anyway.


The router's DNS server must be able to return a given IP when the requested name matches a regular expression.


For the NAS, there are two entries: " .*\.nas " and " .*\.NAS\.cloudns\.cc ", the latter being the zone where the NAS is registered with ClouDNS.


As a result, regardless of whether the Internet is available, the router directs all requests from the local network for domains in the NAS.cloudns.cc zone straight to the NAS.
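
If the router (or, if moved, the NAS itself) runs dnsmasq, two lines are enough; dnsmasq matches by domain suffix rather than by regular expression, which gives the same effect here (the address is a placeholder):

cat > /etc/dnsmasq.d/nas.conf <<'EOF'
address=/nas/192.168.1.10
address=/NAS.cloudns.cc/192.168.1.10
EOF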


An HTTP request arriving at NAS port 80 or 443 is redirected to the port of the nginx-reverse-proxy container.


It presents the user with a signed certificate for secure access over HTTPS and then, depending on the domain name, forwards the request to the container running the required service. For example, a request to cloud.NAS.cloudns.cc will be forwarded to the container in which the personal cloud is running.


There are two types of services:


  • services running in Docker containers;
  • services running directly on the physical host.

For services of the second type, redirection goes through the nginx-local container, which holds the domain names of the physical host.
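
For illustration, one nginx-reverse-proxy virtual host might look roughly like this (the upstream name "cloud" assumes the proxy and the service share a Docker network where container names resolve; the certificate paths follow the usual certbot layout and are assumptions):

cat > /etc/nginx/conf.d/cloud.conf <<'EOF'
server {
    listen 443 ssl;
    server_name cloud.NAS.cloudns.cc cloud.nas;

    ssl_certificate     /etc/letsencrypt/live/NAS.cloudns.cc/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/NAS.cloudns.cc/privkey.pem;

    location / {
        proxy_pass http://cloud:80;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF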


The path of a request inside the NAS is shown in the diagram below. The certificate renewal process, which also appears in the diagram, is described afterwards.



Chart code
 @startuml
 actor "User" as user
 participant "cloudns" as cld
 participant "letsencrypt-dns" as led
 participant "Let's Encrypt" as le
 participant "nginx-reverse-proxy" as nrp
 participant "Service" as app
 participant "nginx-local" as ngl

 user -> cld : request the IP from ClouDNS\n[nas.nas.cloudns.cc]
 user <- cld : IP\n[1.2.3.4]
 user -> cld
 user <- cld

 group Certificate renewal
   led -> le : certificate request
   led <- le : DNS challenge
   led -> cld : DNS challenge,\nvia the API
   led <- cld : TXT record created
   led -> le : request verification
   cld <- le : TXT record query
   cld -> le : TXT record value
   led <- le : certificate
 end

 user -> nrp : request to IP [1.2.3.4]\nports 80, 443\n[nas.nas.cloudns.cc]
 nrp -> app : forward the request\nby domain name [nas.*]\nover docker0

 group Service outside Docker
   app --> ngl : request to localhost
   ngl -> "host service" : request
   ngl <- "host service" : response
   app <-- ngl : response
 end

 nrp <- app : response
 user <- nrp : response
 @enduml

Certificate

Periodically, the letsencrypt-dns container obtains a certificate for a group of domains. The certificate is obtained using certbot. Before issuing a certificate, the Let's Encrypt service checks that the domain belongs to the party requesting the certificate for it.


In the case of ClouDNS, this is done using the so-called DNS challenge:



The TXT record can be inserted manually or via an API. To unify access to different DNS providers, there is the Lexicon library and tool.
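
A sketch of how certbot and Lexicon can be glued together for the DNS challenge (the hook path is arbitrary; ClouDNS credentials for Lexicon are passed separately and are omitted here):

# Auth hook: certbot exports CERTBOT_DOMAIN and CERTBOT_VALIDATION for it.
cat > /usr/local/bin/auth-hook.sh <<'EOF'
#!/bin/sh
lexicon cloudns create "$CERTBOT_DOMAIN" TXT \
    --name "_acme-challenge.$CERTBOT_DOMAIN" --content "$CERTBOT_VALIDATION"
sleep 60   # give the TXT record time to propagate
EOF
chmod +x /usr/local/bin/auth-hook.sh

certbot certonly --manual --preferred-challenges dns \
    --manual-auth-hook /usr/local/bin/auth-hook.sh \
    -d '*.NAS.cloudns.cc'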


Unfortunately, ClouDNS has a downside: its API is paid. Considering that I did not set up certificates right away and did not want to redo everything, I simply bought access, which costs $42 for 2 years (and it would be strange to begrudge forty bucks given a total NAS price of more than $3000).


If desired, you can find decent services with a free API.


Network Services


Infrastructure services related to the network that are required for operation and maintenance:



It is possible to use nginx-proxy-companion to obtain certificates, but it did not work for me.


Interaction with the power source


The NUT daemon was chosen as the core of the power management system: it is supported by an OMV plugin and has no serious universal alternatives.


Accordingly, an Eaton uninterruptible power supply was chosen from the start, so that no problems would arise with it in the Linux + NUT combination.


Eaton UPSes are generally very well supported. The only serious drawback of this one is the noise, but that is easily fixed by replacing the fan, as described in the "hardware" article.


For the NAS and the UPS to interact properly, you need to configure reactions to the events described below.


When the battery reaches the end of its service life, the following is performed:



When mains power is lost for more than 1 minute, the following is performed:



When the battery charge drops below the critical level, the following is performed:
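
Whatever the exact set of reactions, they can be wired up through NUT's upssched mechanism; a sketch under assumed timings (upsmon.conf must additionally point NOTIFYCMD at upssched and set the EXEC notify flags):

cat > /etc/nut/upssched.conf <<'EOF'
CMDSCRIPT /etc/nut/upssched-cmd
PIPEFN /var/run/nut/upssched.pipe
LOCKFN /var/run/nut/upssched.lock

AT ONBATT   * START-TIMER onbatt 60
AT ONLINE   * CANCEL-TIMER onbatt
AT LOWBATT  * EXECUTE lowbatt
AT REPLBATT * EXECUTE replbatt
EOF

cat > /etc/nut/upssched-cmd <<'EOF'
#!/bin/sh
case "$1" in
    onbatt|lowbatt) /sbin/shutdown -h now ;;              # power gone for 60 s or battery critical
    replbatt)       logger "UPS battery needs replacement" ;;
esac
EOF
chmod +x /etc/nut/upssched-cmd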



Security measures against unauthorized access


Only general security measures for the NAS are listed here:



Reliability Tools


To reduce the probability of system failure, the following are used:



If the SMART parameters of any disk reach critical values, the following is performed:
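
Such a reaction presupposes that SMART is being watched in the first place; a minimal smartd sketch (the test schedule and mail address are placeholders):

# /etc/smartd.conf: monitor all disks, run periodic self-tests, mail on threshold breaches.
cat > /etc/smartd.conf <<'EOF'
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m root@localhost
EOF
systemctl restart smartd    # the unit may be named smartmontools on some Debian releases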



System management


Software management is carried out through:



WEB interface


The system provides basic services accessible via HTTPS (in the NAS.cloudns.cc zone from the Internet, or in the nas zone on the local network):



Additional subsystems will add their own interfaces, which are described in the same way.


Organization of application systems


Preliminary composition of systems:



Each of the systems will be described separately.


Utility Components


DBMS


Since a DBMS is required for most systems, at the initial stage there was a desire to select a single DBMS, based on the capabilities and requirements of the subsystems, and run one instance of it. In the end, however, it turned out that using several DBMSs, depending on the subsystem implementation, is simpler and not particularly costly in terms of resources. This is the option I settled on.


Additional backup system


Tasks of this system:



Composition:






Source: https://habr.com/ru/post/359344/


All Articles