
iSCSI storage for the poor

Good day, dear community!

In this article I would like to share our experience of building disk storage: a path of many experiments, trials, errors and discoveries, seasoned with bitter disappointments, that finally ended in an interesting, relatively cheap and fast storage system.

If you are facing a similar task, or the title has simply caught your interest, then welcome under the cut.

Prologue


So, our department was recently tasked with providing a cluster of VMware ESXi 5.1 hypervisors with large storage. We planned to host the encrypted maildir for dovecot and the "cloud" file storage on it. A prerequisite for allocating the budget was providing storage space for company-critical information, and that partition had to be encrypted.

Hardware


Unfortunately, or perhaps fortunately, we were not given a large budget for such ambitious tasks. So, being true maximalists, we could not afford any brand-name storage, and within the allocated funds we chose the following hardware:


All this cost us about 200 thousand rubles.

Implementation


We decided to export the target, that is, serve storage resources to consumers, over iSCSI and NFS. The most sensible and fastest solution would of course have been FCoE, avoiding TCP and its overhead, and our network cards could in principle handle it; unfortunately, we have no SFP switch with FCoE support, and buying one was out of the question, as it would have cost us another 500 thousand rubles.
Digging through the Internet once more, we found a possible way out in vn2vn technology, but ESXi will only learn to work with vn2vn in version 6.x, so without further ado we turned our attention to iSCSI targets.

Our corporate standard for Linux servers is CentOS, but encryption is very slow in its current kernel (2.6.32-358), so I had to use Fedora as the OS. Yes, it is Red Hat's testing ground, but recent Linux kernels encrypt data practically on the fly, and the rest does not matter much for our purposes.
Besides, the current version 19 will serve as the basis for RHEL 7, which will let us safely migrate to CentOS 7 in the future.
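If you want to check this yourself, cryptsetup 1.6 and later ships a small benchmark of the kernel's in-kernel crypto throughput; running it under both kernels makes the dm-crypt speed difference easy to see (the output numbers are machine-specific, and AES-XTS with a 512-bit key is just a typical dm-crypt choice):

```
 [root@nas ~]$ cryptsetup benchmark --cipher aes-xts --key-size 512
```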

Targets


To keep the article from bloating and to stay on topic, I will omit all the uninteresting parts such as assembling the hardware, wrestling with the controller, installing the OS and so on. I will also try to describe the target itself as briefly as possible and limit myself to how it works with the ESXi initiator.

From the target, we wanted the following:

Meet the contenders.

LIO
linux-iscsi.org
With Linux kernel 3.10.10, it showed me 300 MB/s write and 600 MB/s read in blockio mode. It showed the same numbers with fileio and even with a RAM disk. The graphs showed write speed fluctuating heavily, probably because the ESXi initiator requires write synchronization. For the same reason, write IOPS were the same with fileio and blockio.
On the mailing lists it was recommended to disable emulate_fua_write, but that changed nothing. Moreover, the 3.9.5 kernel showed better results, which also raises doubts about its future.
Judging by the description, LIO can do a lot, but most features are only available in the commercial version. And the site, which in my opinion should above all be a source of information, is plastered with advertising, which leaves a bad taste. In the end, we decided to pass.

istgt
www.peach.ne.jp/archives/istgt
Used in FreeBSD.
The target works quite well, with a few "buts".
First, it cannot do blockio; second, it cannot use different MaxRec and MaxXtran values, at least I did not manage to. With small MaxRec values, sequential write speed did not exceed 250 MB/s, while reads stayed quite high at 700 MB/s. With 4K random writes at a queue depth of 32, I got roughly 40K IOPS. With a larger MaxRec, write speed rises to 700 MB/s while reads drop to 600 MB/s, and IOPS fall to 30K for reads and 20K for writes.
One could probably find a middle ground by tuning the settings, but it did not seem worth the trouble.
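A quick back-of-the-envelope calculation shows why the random-I/O figures look so different from the sequential ones: throughput is simply IOPS times block size. Using the istgt numbers above:

```shell
# 40K IOPS of 4 KiB random writes, expressed as MB/s:
echo $(( 40000 * 4096 / 1000000 ))   # prints 163
```

About 163 MB/s, well below the 700 MB/s sequential reads, which is exactly what you would expect from small random writes.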

STGT
stgt.sourceforge.net
With this target we had problems interfacing with the hypervisor. ESXi constantly confused the LUNs, taking one for another or ceasing to see them altogether. We suspected incorrect binding of serial numbers, but specifying them in the configs did not help.
The speed was not encouraging either: we could not get more than 500 MB/s out of it, neither on reads nor on writes. IOPS were about 20K for reads and roughly 15K for writes.
The verdict: configuration problems and mediocre speed. Rejected.

IET
iscsitarget.sourceforge.net
Worked almost flawlessly: 700 MB/s both read and write, about 30K IOPS for reads and 2,000 for writes.
The ESXi initiator forced the target to write data to disk immediately, bypassing the system cache. We were also put off by a few alarming reviews in the mailing lists: many reported unstable behavior under load.

SCST
scst.sourceforge.net
And finally, the winner of our race.
After rebuilding the kernel and minimally configuring the target itself, we got 750 MB/s read and 950 MB/s write. IOPS in fileio mode: 44K for reads and 37K for writes. Right out of the box, with almost no ritual dancing.
This target seemed to me the perfect choice.

iSCSI for VMWare ESXi 5.1 on SCST and Fedora


And now, in fact, for the sake of which we all gathered here.
A short guide to setting it up with the ESXi initiator. I did not decide right away to try writing an article for Habr, so the guide will not be step by step; I am reconstructing it from memory, but it contains the key settings that let us achieve the desired results.

ESXi 5.1 Preparation

The following settings were made in the hypervisor:


You will need to disable Interrupt Moderation and LRO on the network adapters. You can do this with the following commands:

 ethtool -C vmnicX rx-usecs 0 rx-frames 1 rx-usecs-irq 0 rx-frames-irq 0
 esxcfg-advcfg -s 0 /Net/TcpipDefLROEnabled
 esxcli system module parameters set -m ixgbe -p "InterruptThrottleRate=0"


The reasons why it is worth doing:
www.odbms.org/download/vmw-vfabric-gemFire-best-practices-guide.pdf
www.vmware.com/files/pdf/techpaper/VMW-Tuning-Latency-Sensitive-Workloads.pdf

To avoid having to set these values again after every reboot, you can add them to this script:
/etc/rc.local.d/local.sh
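As a sketch, the script can simply repeat the commands above (vmnicX here is a placeholder for your actual adapter names, not a value from our setup):

```
 #!/bin/sh
 # Re-apply NIC tuning on every boot of the hypervisor:
 # disable interrupt moderation and LRO.
 ethtool -C vmnicX rx-usecs 0 rx-frames 1 rx-usecs-irq 0 rx-frames-irq 0
 esxcfg-advcfg -s 0 /Net/TcpipDefLROEnabled
 esxcli system module parameters set -m ixgbe -p "InterruptThrottleRate=0"
```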


Fedora preparation

Download the latest version of Fedora and perform a minimal install.

Update the system and reboot:

 [root@nas ~]$ yum -y update && reboot 


The system will live only on the local network, so I disabled the firewall and SELinux:

 [root@nas ~]$ systemctl stop firewalld.service
 [root@nas ~]$ systemctl disable firewalld.service
 [root@nas ~]$ cat /etc/sysconfig/selinux
 SELINUX=disabled
 SELINUXTYPE=targeted


Configure the network interfaces and disable the NetworkManager.service service: it is not compatible with bridge interfaces, and we needed those for NFS.

 [root@nas ~]$ systemctl disable NetworkManager.service
 [root@nas ~]$ chkconfig network on
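For reference, with the legacy network service a bridge is described in ifcfg files; a rough sketch (device names and addresses are made-up examples, not our actual config):

```
 [root@nas ~]$ cat /etc/sysconfig/network-scripts/ifcfg-br0
 DEVICE=br0
 TYPE=Bridge
 BOOTPROTO=static
 IPADDR=10.0.0.1
 NETMASK=255.255.255.0
 ONBOOT=yes

 [root@nas ~]$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
 DEVICE=eth0
 BRIDGE=br0
 ONBOOT=yes
```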


Disable LRO on the network cards:

 [root@nas ~]$ cat /etc/rc.d/rc.local
 #!/bin/bash
 ethtool -K ethX lro off


Following Intel's recommendations, the following system parameters were changed:

 [root@nas ~]$ cat /etc/sysctl.d/ixgbe.conf
 net.ipv4.tcp_sack = 0
 net.ipv4.tcp_timestamps = 0
 net.ipv4.tcp_rmem = 10000000 10000000 10000000
 net.ipv4.tcp_wmem = 10000000 10000000 10000000
 net.ipv4.tcp_mem = 10000000 10000000 10000000
 net.core.rmem_max = 524287
 net.core.wmem_max = 524287
 net.core.rmem_default = 524287
 net.core.wmem_default = 524287
 net.core.optmem_max = 524287
 net.core.netdev_max_backlog = 300000


Target preparation

To use SCST, it is recommended to patch the kernel. This is optional, but performance is higher with the patches.
At the time of writing, the latest kernel version was 3.10.10-200. By the time you read this, the kernel may well have been updated, but I do not think that will affect the process much.

Creating an rpm package with a modified kernel is described in detail here:
fedoraproject.org/wiki/Building_a_custom_kernel/en

But in order to avoid difficulties I will describe the preparation in detail.

Create a user:
 [root@nas ~]$ useradd mockbuild 


Let's switch to that user's environment:
 [root@nas ~]$ su mockbuild
 [mockbuild@nas root]$ cd


Install the build packages and prepare the kernel sources:
 [mockbuild@nas ~]$ su -c 'yum install yum-utils rpmdevtools'
 [mockbuild@nas ~]$ rpmdev-setuptree
 [mockbuild@nas ~]$ yumdownloader --source kernel
 [mockbuild@nas ~]$ su -c 'yum-builddep kernel-3.10.10-200.fc19.src.rpm'
 [mockbuild@nas ~]$ rpm -Uvh kernel-3.10.10-200.fc19.src.rpm
 [mockbuild@nas ~]$ cd ~/rpmbuild/SPECS
 [mockbuild@nas ~]$ rpmbuild -bp --target=`uname -m` kernel.spec


Now patches will be required. Download SCST from svn repository:
 [mockbuild@nas ~]$ svn co https://scst.svn.sourceforge.net/svnroot/scst/trunk scst-svn 


Copy the required patches into ~/rpmbuild/SOURCES/:
 [mockbuild@nas ~]$ cp scst-svn/iscsi-scst/kernel/patches/put_page_callback-3.10.patch ~/rpmbuild/SOURCES/
 [mockbuild@nas ~]$ cp scst-svn/scst/kernel/scst_exec_req_fifo-3.10.patch ~/rpmbuild/SOURCES/


Add a line to the kernel config:
 [mockbuild@nas ~]$ vim ~/rpmbuild/SOURCES/config-generic
 ...
 CONFIG_TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION=y
 ...


Let's start editing kernel.spec.
 [mockbuild@nas ~]$ cd ~/rpmbuild/SPECS
 [mockbuild@nas ~]$ vim kernel.spec


Change:
 #% define buildid .local

to:
 %define buildid .scst


Add our patches, preferably after all the existing Patch entries:
 Patch25091: put_page_callback-3.10.patch
 Patch25092: scst_exec_req_fifo-3.10.patch

Add the commands to apply them, likewise after the existing ApplyPatch entries:
 ApplyPatch put_page_callback-3.10.patch
 ApplyPatch scst_exec_req_fifo-3.10.patch


After all these steps, start building the kernel rpm packages with the firmware files included:
 [mockbuild@nas ~]$ rpmbuild -bb --with baseonly --with firmware --without debuginfo --target=`uname -m` kernel.spec 


After the build is completed, install the firmware and kernel header files:
 [mockbuild@nas ~]$ cd ~/rpmbuild/RPMS/x86_64/
 [mockbuild@nas ~]$ su -c 'rpm -ivh kernel-firmware-3.10.10-200.scst.fc19.x86_64.rpm kernel-3.10.10-200.scst.fc19.x86_64.rpm kernel-devel-3.10.10-200.scst.fc19.x86_64.rpm kernel-headers-3.10.10-200.scst.fc19.x86_64.rpm'


Reboot.

After a successful boot, I hope, go to the directory with the SCST sources and build the target itself as root:
 [root@nas ~]$ make scst scst_install iscsi iscsi_install scstadm scstadm_install 


After the build, add the service to autostart:
 [root@nas ~]$ systemctl enable "scst.service" 


And set up the config in /etc/scst.conf. For example, here is mine:
 [root@nas ~]$ cat /etc/scst.conf
 HANDLER vdisk_fileio {
     DEVICE mail {
         filename /dev/mapper/mail
         nv_cache 1
     }
     DEVICE cloud {
         filename /dev/sdb3
         nv_cache 1
     }
     DEVICE vmstore {
         filename /dev/sdb4
         nv_cache 1
     }
 }

 TARGET_DRIVER iscsi {
     enabled 1
     TARGET iqn.2013-09.local.nas:raid10-ssdcache {
         LUN 0 mail
         LUN 1 cloud
         LUN 2 vmstore
         enabled 1
     }
 }


Create files that allow or prohibit connections to the target from specific addresses, if you need it:
 [root@nas ~]$ cat /etc/initiators.allow
 ALL 10.0.0.0/24
 [root@nas ~]$ cat /etc/initiators.deny
 ALL ALL


Once the configuration files are ready, start SCST:
 [root@nas ~]$ /etc/init.d/scst start 


If everything was done correctly, then the corresponding target will appear in ESXi.
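As a side note, before going to ESXi you can also sanity-check the target from any Linux box with open-iscsi installed; discovery should list the IQN from scst.conf (the portal IP below is an example):

```
 [root@client ~]$ iscsiadm -m discovery -t sendtargets -p 10.0.0.1
```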

Thank you for reading to the end!

Source: https://habr.com/ru/post/200466/

