
Free VMware vSphere Storage Appliance replacement based on DRBD

VMware recently announced new products in the vSphere 5 line, and we became very curious: what exactly is the VMware vSphere Storage Appliance?

In short, the point is the ability to build a fault-tolerant virtual infrastructure without an external storage system. To implement it, two or three virtual machines are deployed (one per host) which replicate the free space of the ESXi servers' disk subsystems and present it back to those same ESXi hosts as shared storage. A detailed description of the Storage Appliance (in Russian) is available here.

An interesting idea, but the price stings: around $6K. Besides, if you think about performance, won't the speed of the disk array take a hit? Approaching the question from the other side, there are many other ways to organize external storage. For example, you can build external storage from almost any hardware with the required number of disks and Openfiler, FreeNAS, Nexenta or Open-E installed; these software products can replicate between systems.
This approach is practiced by many companies that cannot afford an expensive storage system from a big-name manufacturer that would provide sufficient performance and reliability. As a rule, such systems come with two controllers, redundant power supplies, high-speed drives and so on...

But let's go back to the beginning and look at the scheme VMware proposes:



What do we see? Three ESXi hosts with one virtual machine deployed on each. The machines are gathered into a cluster and present the hosts' internal drives back to us as external storage.

The idea of assembling such a solution from the available tools has been in the air for a long time but never found a justification. Now VMware itself has given us a push to try it all out in a test environment.

There are plenty of solutions for building fault-tolerant storage, for example those based on Openfiler + DRBD + Heartbeat. But at the heart of all of them lies the idea of building an external storage box. Why not try to do something similar, but based on virtual machines?

As a foundation, let's take two virtual machines running Ubuntu plus the Ubuntu documentation on building a failover iSCSI target, and try to make our own appliance.

Partitioning disks on both cluster nodes:

/dev/sda1 - 10 GB / (primary, ext3, Bootable flag: on)
/dev/sda5 - 1 GB swap (logical)

/dev/sdb1 - 1 GB (primary) - DRBD metadata
/dev/sdc1 - 1 GB (primary) - DRBD device for the iSCSI configuration files
/dev/sdd1 - 50 GB (primary) - DRBD device for the iSCSI target itself


The size of sdd1 here is just an example; in practice it gets all the free space remaining on the local storage of the ESXi host.
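For reference, a minimal sketch of how the extra disks can be partitioned from the shell, assuming the additional VMDKs show up as /dev/sdb, /dev/sdc and /dev/sdd as listed above (fdisk works just as well):

# create one primary partition spanning each DRBD backing disk
for d in /dev/sdb /dev/sdc /dev/sdd; do
    parted -s "$d" mklabel msdos
    parted -s "$d" mkpart primary 1MiB 100%
done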

iSCSI network:
iSCSI server1: node1.demo.local IP address: 10.11.55.55
iSCSI server2: node2.demo.local IP address: 10.11.55.56
iSCSI Virtual IP address 10.11.55.50


Private network:
iSCSI server1: node1-private IP address: 192.168.22.11
iSCSI server2: node2-private IP address: 192.168.22.12


/etc/network/interfaces:

For node1:
auto eth0
iface eth0 inet static
address 10.11.55.55
netmask 255.0.0.0
gateway 10.0.0.1

auto eth1
iface eth1 inet static
address 192.168.22.11
netmask 255.255.255.0


For node2:
auto eth0
iface eth0 inet static
address 10.11.55.56
netmask 255.0.0.0
gateway 10.0.0.1

auto eth1
iface eth1 inet static
address 192.168.22.12
netmask 255.255.255.0


The /etc/hosts file for both nodes:

127.0.0.1 localhost
10.11.55.55 node1.demo.local node1
10.11.55.56 node2.demo.local node2
192.168.22.11 node1-private
192.168.22.12 node2-private


Package installation:
apt-get -y install ntp ssh drbd8-utils heartbeat jfsutils

Reboot the servers.

Change the group ownership and permissions of the DRBD utilities so that heartbeat can manage DRBD:
chgrp haclient /sbin/drbdsetup
chmod o-x /sbin/drbdsetup
chmod u+s /sbin/drbdsetup
chgrp haclient /sbin/drbdmeta
chmod o-x /sbin/drbdmeta
chmod u+s /sbin/drbdmeta


Use /etc/drbd.conf to describe the configuration. We define 2 resources:
1. DRBD device that will contain iSCSI configuration files;
2. DRBD device that will become our iSCSI target.

For node1:

/etc/drbd.conf:

resource iscsi.config {
    protocol C;

    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }

    startup {
        degr-wfc-timeout 120;
    }

    disk {
        on-io-error detach;
    }

    net {
        cram-hmac-alg sha1;
        shared-secret "password";
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }

    syncer {
        rate 100M;
        verify-alg sha1;
        al-extents 257;
    }

    on node1 {
        device /dev/drbd0;
        disk /dev/sdc1;
        address 192.168.22.11:7788;
        meta-disk /dev/sdb1[0];
    }

    on node2 {
        device /dev/drbd0;
        disk /dev/sdc1;
        address 192.168.22.12:7788;
        meta-disk /dev/sdb1[0];
    }
}

resource iscsi.target.0 {
    protocol C;

    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }

    startup {
        degr-wfc-timeout 120;
    }

    disk {
        on-io-error detach;
    }

    net {
        cram-hmac-alg sha1;
        shared-secret "password";
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }

    syncer {
        rate 100M;
        verify-alg sha1;
        al-extents 257;
    }

    on node1 {
        device /dev/drbd1;
        disk /dev/sdd1;
        address 192.168.22.11:7789;
        meta-disk /dev/sdb1[1];
    }

    on node2 {
        device /dev/drbd1;
        disk /dev/sdd1;
        address 192.168.22.12:7789;
        meta-disk /dev/sdb1[1];
    }
}
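Before bringing the resources up, it does not hurt to let drbdadm parse the file and print the resulting configuration back as a quick sanity check:

drbdadm dump all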


Copy the configuration to the second node:
scp /etc/drbd.conf root@10.11.55.56:/etc/

Wipe the backing devices and initialize the DRBD metadata on both servers:
[node1]dd if=/dev/zero of=/dev/sdc1
[node1]dd if=/dev/zero of=/dev/sdd1
[node1]drbdadm create-md iscsi.config
[node1]drbdadm create-md iscsi.target.0

[node2]dd if=/dev/zero of=/dev/sdc1
[node2]dd if=/dev/zero of=/dev/sdd1
[node2]drbdadm create-md iscsi.config
[node2]drbdadm create-md iscsi.target.0
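To verify that the metadata was actually written, drbdadm can dump it back while the resources are still down (just a sanity check, the output is not needed later):

[node1]drbdadm dump-md iscsi.config
[node1]drbdadm dump-md iscsi.target.0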


We start drbd:
[node1]/etc/init.d/drbd start
[node2]/etc/init.d/drbd start


Now we need to decide which server will be Primary and which Secondary for the initial synchronization between the disks. Let's assume node1 will be Primary.
Run the command on the first node:
[node1]drbdadm -- --overwrite-data-of-peer primary iscsi.config

The output of cat /proc/drbd:

version: 8.3.9 (api:88/proto:86-95)
srcversion: CF228D42875CF3A43F2945A
0: cs:Connected ro: Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:1048542 nr:0 dw:0 dr:1048747 al:0 bm:64 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:52428768


Format and mount the /dev/drbd0 partition:
[node1]mkfs.ext3 /dev/drbd0
[node1]mkdir -p /srv/data
[node1]mount /dev/drbd0 /srv/data


Create a file on the first node and then switch the second to Primary mode:
[node1]dd if=/dev/zero of=/srv/data/test.zeros bs=1M count=100

For node1:
[node1]umount /srv/data
[node1]drbdadm secondary iscsi.config


For node2:
[node2]mkdir -p /srv/data
[node2]drbdadm primary iscsi.config
[node2]mount /dev/drbd0 /srv/data


On the second node, a file of 100 MB in size will be visible.
ls -l /srv/data

Delete it and again switch to the first node:

On node2:
[node2]rm /srv/data/test.zeros
[node2]umount /srv/data
[node2]drbdadm secondary iscsi.config


On node1:
[node1]drbdadm primary iscsi.config
[node1]mount /dev/drbd0 /srv/data


Run ls /srv/data. If there is no data on the partition, the replication was successful.

Now let's move on to setting up the iSCSI target. Select the first node as Primary and start the synchronization of this partition as well:
[node1]drbdadm -- --overwrite-data-of-peer primary iscsi.target.0

cat /proc/drbd

version: 8.3.9 (api:88/proto:86-95)
srcversion: CF228D42875CF3A43F2945A
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:135933 nr:96 dw:136029 dr:834 al:39 bm:8 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
1: cs:SyncSource ro: Primary/Secondary ds:UpToDate/Inconsistent C r-----
ns:1012864 nr:0 dw:0 dr:1021261 al:0 bm:61 lo:1 pe:4 ua:64 ap:0 ep:1 wo:f oos:51416288
[>....................] sync'ed: 2.0% (50208/51196)M
finish: 0:08:27 speed: 101,248 (101,248) K/sec


Wait for sync ...

cat /proc/drbd

version: 8.3.9 (api:88/proto:86-95)
srcversion: CF228D42875CF3A43F2945A
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:135933 nr:96 dw:136029 dr:834 al:39 bm:8 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
1: cs:Connected ro: Primary/Secondary ds:UpToDate/UpToDate C r-----
ns:52428766 nr:0 dw:0 dr:52428971 al:0 bm:3200 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0


Install the iscsitarget package on both nodes:
[node1]apt-get -y install iscsitarget
[node2]apt-get -y install iscsitarget


Enable the option that allows the iSCSI target to run as a service:
[node1]sed -i 's/false/true/' /etc/default/iscsitarget
[node2]sed -i 's/false/true/' /etc/default/iscsitarget
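After this, /etc/default/iscsitarget should contain the line that lets the init script start the daemon:

ISCSITARGET_ENABLE=true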


Remove iscsitarget from the system startup scripts (heartbeat will start it instead):
[node1]update-rc.d -f iscsitarget remove
[node2]update-rc.d -f iscsitarget remove


Move the configuration file to the DRBD partition and symlink it on both nodes:
[node1]mkdir /srv/data/iscsi
[node1]mv /etc/iet/ietd.conf /srv/data/iscsi
[node1]ln -s /srv/data/iscsi/ietd.conf /etc/iet/ietd.conf
[node2]rm /etc/iet/ietd.conf
[node2]ln -s /srv/data/iscsi/ietd.conf /etc/iet/ietd.conf


We describe the iSCSI-target in the /srv/data/iscsi/ietd.conf file:

Target iqn.2011-08.local.demo:storage.disk.0
    # IncomingUser geekshlby secret   # uncomment to require CHAP authentication
    # OutgoingUser geekshlby password
    Lun 0 Path=/dev/drbd1,Type=blockio
    Alias disk0
    MaxConnections 1
    InitialR2T Yes
    ImmediateData No
    MaxRecvDataSegmentLength 8192
    MaxXmitDataSegmentLength 8192
    MaxBurstLength 262144
    FirstBurstLength 65536
    DefaultTime2Wait 2
    DefaultTime2Retain 20
    MaxOutstandingR2T 8
    DataPDUInOrder Yes
    DataSequenceInOrder Yes
    ErrorRecoveryLevel 0
    HeaderDigest CRC32C,None
    DataDigest CRC32C,None
    Wthreads 8
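Once iscsitarget is actually running (heartbeat will start it on the active node later), the exported LUN and client sessions can be checked through IET's proc interface, for example:

cat /proc/net/iet/volume
cat /proc/net/iet/session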


Now we need to configure heartbeat so that the virtual IP address and the iSCSI target move over to the surviving node if one of the nodes fails.

We describe the cluster in the /etc/heartbeat/ha.cf file:

logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
bcast eth0
bcast eth1
node node1
node node2


The authentication mechanism, /etc/heartbeat/authkeys:

auth 2
2 sha1 NoOneKnowsIt


Change the permissions on the /etc/heartbeat/authkeys file:
chmod 600 /etc/heartbeat/authkeys

We describe the cluster resources in the /etc/heartbeat/haresources file: the main node, the virtual IP, the file systems, and the services that will be started:

/etc/heartbeat/haresources

node1 drbddisk::iscsi.config Filesystem::/dev/drbd0::/srv/data::ext3
node1 IPaddr::10.11.55.50/8/eth0 drbddisk::iscsi.target.0 iscsitarget
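For a manual failover test later on, heartbeat ships helper scripts for this haresources-style setup; the path /usr/share/heartbeat is an assumption and may differ between versions:

[node1]/usr/share/heartbeat/hb_standby    # give the resources up, node2 takes over
[node1]/usr/share/heartbeat/hb_takeover   # take them back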


Copy the configuration to the second node:
[node1]scp /etc/heartbeat/ha.cf root@10.11.55.56:/etc/heartbeat/
[node1]scp /etc/heartbeat/authkeys root@10.11.55.56:/etc/heartbeat/
[node1]scp /etc/heartbeat/haresources root@10.11.55.56:/etc/heartbeat/


Unmount /srv/data and switch the first node to Secondary.
Start heartbeat:
[node1]/etc/init.d/heartbeat start

Reboot both servers.

[node1]/etc/init.d/drbd start
[node2]/etc/init.d/drbd start

[node1]drbdadm secondary iscsi.config
[node1]drbdadm secondary iscsi.target.0

[node2]drbdadm primary iscsi.config
[node2]drbdadm primary iscsi.target.0

[node1]cat /proc/drbd

[node1]/etc/init.d/heartbeat start


After heartbeat starts, we switch the first node to Primary and the second to Secondary (otherwise the resources will not start).

[node2]drbdadm secondary iscsi.config
[node2]drbdadm secondary iscsi.target.0

[node1]drbdadm primary iscsi.config
[node1]drbdadm primary iscsi.target.0


We look at tail -f /var/log/syslog
We wait…
Some time later…

Aug 26 08:32:14 node1 harc[11878]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
Aug 26 08:32:14 node1 ip-request-resp[11878]: received ip-request-resp IPaddr::10.11.55.50/8/eth0 OK yes
Aug 26 08:32:14 node1 ResourceManager[11899]: info: Acquiring resource group: node1 IPaddr::10.11.55.50/8/eth0 drbddisk::iscsi.target.0 iscsitarget
Aug 26 08:32:14 node1 IPaddr[11926]: INFO: Resource is stopped
Aug 26 08:32:14 node1 ResourceManager[11899]: info: Running /etc/ha.d/resource.d/IPaddr 10.11.55.50/8/eth0 start
Aug 26 08:32:14 node1 IPaddr[12006]: INFO: Using calculated netmask for 10.11.55.50: 255.0.0.0
Aug 26 08:32:14 node1 IPaddr[12006]: INFO: eval ifconfig eth0:0 10.11.55.50 netmask 255.0.0.0 broadcast 10.255.255.255
Aug 26 08:32:14 node1 avahi-daemon[477]: Registering new address record for 10.11.55.50 on eth0.IPv4.
Aug 26 08:32:14 node1 IPaddr[11982]: INFO: Success
Aug 26 08:32:15 node1 ResourceManager[11899]: info: Running /etc/init.d/iscsitarget start
Aug 26 08:32:15 node1 kernel: [ 5402.722552] iSCSI Enterprise Target Software - version 1.4.20.2
Aug 26 08:32:15 node1 kernel: [ 5402.723978] iscsi_trgt: Registered io type fileio
Aug 26 08:32:15 node1 kernel: [ 5402.724057] iscsi_trgt: Registered io type blockio
Aug 26 08:32:15 node1 kernel: [ 5402.724061] iscsi_trgt: Registered io type nullio
Aug 26 08:32:15 node1 heartbeat: [12129]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Aug 26 08:32:15 node1 harc[12129]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp
Aug 26 08:32:15 node1 ip-request-resp[12129]: received ip-request-resp IPaddr::10.11.55.50/8/eth0 OK yes
Aug 26 08:32:15 node1 ResourceManager[12155]: info: Acquiring resource group: node1 IPaddr::10.11.55.50/8/eth0 drbddisk::iscsi.target.0 iscsitarget
Aug 26 08:32:15 node1 IPaddr[12186]: INFO: Running OK
Aug 26 08:33:08 node1 ntpd[1634]: Listen normally on 11 eth0:0 10.11.55.50 UDP 123
Aug 26 08:33:08 node1 ntpd[1634]: new interface(s) found: waking up resolver


ifconfig
eth0 Link encap:Ethernet HWaddr 00:50:56:20:f9:6c
inet addr:10.11.55.55 Bcast:10.255.255.255 Mask:255.0.0.0
inet6 addr: fe80::20c:29ff:fe20:f96c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3622 errors:0 dropped:0 overruns:0 frame:0
TX packets:8081 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:302472 (302.4 KB) TX bytes:6943622 (6.9 MB)
Interrupt:19 Base address:0x2000

eth0:0 Link encap:Ethernet HWaddr 00:50:56:20:f9:6c
inet addr:10.11.55.50 Bcast:10.255.255.255 Mask:255.0.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:19 Base address:0x2000


eth1 Link encap:Ethernet HWaddr 00:50:56:20:f9:76
inet addr:192.168.22.11 Bcast:192.168.22.255 Mask:255.255.255.0
inet6 addr: fe80::20c:29ff:fe20:f976/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1765 errors:0 dropped:0 overruns:0 frame:0
TX packets:3064 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:171179 (171.1 KB) TX bytes:492567 (492.5 KB)
Interrupt:19 Base address:0x2080
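The target can also be sanity-checked from any Linux machine with the open-iscsi package installed, for example:

iscsiadm -m discovery -t sendtargets -p 10.11.55.50
# expected output along the lines of: 10.11.55.50:3260,1 iqn.2011-08.local.demo:storage.disk.0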


We connect the resulting iSCSI target to both ESX(i) hosts. After both hosts see the storage, we assemble the HA cluster. Although there is no space left on the hosts themselves for creating virtual machines, that space is now presented as shared virtual storage. If one of the nodes fails, the virtual machine on the second node will switch to Primary mode and continue serving the iSCSI target.
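On the ESXi side this is done through the vSphere Client: enable the software iSCSI adapter, add 10.11.55.50 as a Dynamic Discovery address, rescan and create a VMFS datastore. Roughly the same from the ESXi 5.x command line, as a sketch (the adapter name vmhba33 is an assumption and must be checked on your host):

esxcli iscsi software set --enabled=true
esxcli iscsi adapter discovery sendtarget add --adapter=vmhba33 --address=10.11.55.50:3260
esxcli storage core adapter rescan --adapter=vmhba33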

Using hdparm, I measured the disk speed inside a virtual machine deployed on this target:
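A measurement of this kind boils down to something like the following inside the guest (the device name is an assumption):

hdparm -tT /dev/sda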



Naturally, such a storage system is not suitable for serious production use. But if there are no heavily loaded virtual machines, or you simply need to test building an HA cluster, then this way of providing shared storage has a right to exist.

After reading this, many will say it is "wrong", that "performance will suffer", that "both nodes could fail", and so on. Yes, all of that may be true, but then why did VMware release its Storage Appliance?

P.S. By the way, for those too lazy to set everything up by hand, there is a Management Console for configuring a DRBD cluster: http://www.drbd.org/mc/screenshot-gallery/ .

madbug
Senior Systems Engineer DEPO Computers

Source: https://habr.com/ru/post/130573/

