
Free VMware vSphere Storage Appliance replacement based on DRBD

VMware recently announced new products in the vSphere 5 line, and we were very curious about one of them: what exactly is the VMware vSphere Storage Appliance?



In short, it lets you build a fault-tolerant virtual infrastructure without an external storage system. To implement it, two or three virtual machines are deployed (one per host); they replicate the free space of the ESXi servers' local disk subsystem and present it back to those same ESXi hosts as shared storage. The Storage Appliance is described in detail (in Russian) here.



An interesting idea, but the price stings: around $6K. And what about performance, will the disk array speed take a hit? Approaching the question from the other side, there are plenty of other ways to organize external storage. For example, you can build external storage from almost any hardware with the required number of disks and Openfiler, FreeNAS, Nexenta or Open-E installed; all of these products can replicate data between systems.


Many companies take exactly this approach when they cannot afford an expensive storage system from a well-known vendor, one that would provide sufficient performance and reliability with dual controllers, redundant power supplies, high-speed drives and so on.



But let's go back to the beginning and look at the scheme VMware offers:







What do we see? Three ESXi hosts, each with one appliance virtual machine deployed on it. The machines are joined into a cluster and present the hosts' internal drives back to them as external shared storage.



The idea of assembling such a solution from readily available tools has been in the air for a long time, but there was never a good reason to act on it. Now VMware itself has given us a push to try it all out in a test environment.



There are plenty of solutions for building fault-tolerant storage, for example Openfiler + DRBD + Heartbeat. But at the heart of all of them lies the idea of building an external storage box. Why not try to do something similar, but based on virtual machines?



As a foundation we take two virtual machines running Ubuntu, the Ubuntu documentation on building a failover iSCSI target, and try to make our own Appliance.



Partitioning disks on both cluster nodes:



/dev/sda1 - 10 GB / (primary, ext3, Bootable flag: on)

/dev/sda5 - 1 GB swap (logical)



/dev/sdb1 - 1 GB (primary): DRBD metadata for both resources

/dev/sdc1 - 1 GB (primary): DRBD device for the iSCSI configuration files

/dev/sdd1 - 50 GB (primary): DRBD device for the iSCSI target itself




The size of sdd1 here is just an example; in practice it takes up all the free space remaining on the ESXi host's local storage.
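
If the additional virtual disks are handed to the VMs unpartitioned, a layout like the one above can be created with parted. A minimal sketch, assuming the disks appear as /dev/sdb, /dev/sdc and /dev/sdd and each gets a single primary partition spanning the whole disk (repeat the same commands on node2):

[node1]parted -s /dev/sdb mklabel msdos mkpart primary 0% 100%

[node1]parted -s /dev/sdc mklabel msdos mkpart primary 0% 100%

[node1]parted -s /dev/sdd mklabel msdos mkpart primary 0% 100%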



iSCSI network:

iSCSI server1: node1.demo.local IP address: 10.11.55.55

iSCSI server2: node2.demo.local IP address: 10.11.55.56

iSCSI Virtual IP address 10.11.55.50




Private network:

iSCSI server1: node1-private IP address: 192.168.22.11

iSCSI server2: node2-private IP address: 192.168.22.12




/etc/network/interfaces:



For node1:

auto eth0

iface eth0 inet static

address 10.11.55.55

netmask 255.0.0.0

gateway 10.0.0.1



auto eth1

iface eth1 inet static

address 192.168.22.11

netmask 255.255.255.0




For node2:

auto eth0

iface eth0 inet static

address 10.11.55.56

netmask 255.0.0.0

gateway 10.0.0.1



auto eth1

iface eth1 inet static

address 192.168.22.12

netmask 255.255.255.0




The /etc/hosts file for both nodes:



127.0.0.1 localhost

10.11.55.55 node1.demo.local node1

10.11.55.56 node2.demo.local node2

192.168.22.11 node1-private

192.168.22.12 node2-private




Package installation:

apt-get -y install ntp ssh drbd8-utils heartbeat jfsutils



Reboot the servers.
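
After the reboot it is worth doing a quick sanity check that the DRBD kernel module is available (depending on the kernel it comes either with the stock kernel or with a separate module package):

[node1]modprobe drbd

[node1]cat /proc/drbd

[node2]modprobe drbd

[node2]cat /proc/drbd

At this point cat /proc/drbd should only print the DRBD version, with no resources configured yet.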



Change the group ownership and permissions of the DRBD helper binaries:

chgrp haclient /sbin/drbdsetup

chmod o-x /sbin/drbdsetup

chmod u+s /sbin/drbdsetup

chgrp haclient /sbin/drbdmeta

chmod o-x /sbin/drbdmeta

chmod u+s /sbin/drbdmeta
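
These changes allow heartbeat's unprivileged helpers (members of the haclient group, such as the drbd-peer-outdater handler referenced in drbd.conf below) to call the DRBD binaries. A quick check that the group and the setuid bit took effect:

ls -l /sbin/drbdsetup /sbin/drbdmeta

Both files should now belong to group haclient and show an "s" in the owner execute position.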




Use /etc/drbd.conf to describe the configuration. We define 2 resources:

1. DRBD device that will contain iSCSI configuration files;

2. DRBD device that will become our iSCSI target.



For node1:



/etc/drbd.conf:



resource iscsi.config {

protocol C;



handlers {

pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";

pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";

local-io-error "echo o > /proc/sysrq-trigger ; halt -f";

outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";

}



startup {

degr-wfc-timeout 120;

}



disk {

on-io-error detach;

}



net {

cram-hmac-alg sha1;

shared-secret "password";

after-sb-0pri disconnect;

after-sb-1pri disconnect;

after-sb-2pri disconnect;

rr-conflict disconnect;

}



syncer {

rate 100M;

verify-alg sha1;

al-extents 257;

}



on node1 {

device /dev/drbd0;

disk /dev/sdc1;

address 192.168.22.11:7788;

meta-disk /dev/sdb1[0];

}



on node2 {

device /dev/drbd0;

disk /dev/sdc1;

address 192.168.22.12:7788;

meta-disk /dev/sdb1[0];

}

}



resource iscsi.target.0 {

protocol C;



handlers {

pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";

pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";

local-io-error "echo o > /proc/sysrq-trigger ; halt -f";

outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";

}



startup {

degr-wfc-timeout 120;

}



disk {

on-io-error detach;

}



net {

cram-hmac-alg sha1;

shared-secret "password";

after-sb-0pri disconnect;

after-sb-1pri disconnect;

after-sb-2pri disconnect;

rr-conflict disconnect;

}



syncer {

rate 100M;

verify-alg sha1;

al-extents 257;

}



on node1 {

device /dev/drbd1;

disk /dev/sdd1;

address 192.168.22.11:7789;

meta-disk /dev/sdb1[1];

}



on node2 {

device /dev/drbd1;

disk /dev/sdd1;

address 192.168.22.12:7789;

meta-disk /dev/sdb1[1];

}

}




Copy the configuration to the second node:

scp /etc/drbd.conf root@10.11.55.56:/etc/
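
Before creating the metadata it makes sense to verify that both nodes parse the configuration identically; drbdadm dump prints the configuration as DRBD understands it and reports any syntax errors:

[node1]drbdadm dump

[node2]drbdadm dump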



Wipe the partitions and initialize the DRBD metadata on both servers:

[node1]dd if=/dev/zero of=/dev/sdb1

[node1]dd if=/dev/zero of=/dev/sdd1

[node1]drbdadm create-md iscsi.config

[node1]drbdadm create-md iscsi.target.0



[node2]dd if=/dev/zero of=/dev/sdb1

[node2]dd if=/dev/zero of=/dev/sdd1

[node2]drbdadm create-md iscsi.config

[node2]drbdadm create-md iscsi.target.0




We start drbd:

[node1]/etc/init.d/drbd start

[node2]/etc/init.d/drbd start




Now decide which server will be primary and which secondary for the initial synchronization between the disks. Let's assume node1 will be Primary.

Run the command on the first node:

[node1]drbdadm -- --overwrite-data-of-peer primary iscsi.config



The output of cat /proc/drbd:



version: 8.3.9 (api:88/proto:86-95)

srcversion: CF228D42875CF3A43F2945A

0: cs:Connected ro: Primary/Secondary ds:UpToDate/UpToDate C r-----

ns:1048542 nr:0 dw:0 dr:1048747 al:0 bm:64 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----

ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:52428768




Format and mount the /dev/drbd0 partition:

[node1]mkfs.ext3 /dev/drbd0

[node1]mkdir -p /srv/data

[node1]mount /dev/drbd0 /srv/data




Create a file on the first node and then switch the second to Primary mode:

[node1]dd if=/dev/zero of=/srv/data/test.zeros bs=1M count=100



For node1:

[node1]umount /srv/data

[node1]drbdadm secondary iscsi.config




For node2:

[node2]mkdir -p /srv/data

[node2]drbdadm primary iscsi.config

[node2]mount /dev/drbd0 /srv/data




On the second node, a file of 100 MB in size will be visible.

ls -l /srv/data
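
To make sure the contents, and not just the file name, were replicated, checksums can be compared as an optional extra step: note the hash on node1 before unmounting, then compare it on node2.

[node1]md5sum /srv/data/test.zeros

[node2]md5sum /srv/data/test.zeros

The two values must match.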



Delete it and switch back to the first node:



On node2:

[node2]rm /srv/data/test.zeros

[node2]umount /srv/data

[node2]drbdadm secondary iscsi.config




On node1:

[node1]drbdadm primary iscsi.config

[node1]mount /dev/drbd0 /srv/data




Run ls /srv/data: if the test file is gone from the partition, replication worked correctly.



Now move on to the iSCSI target. Make the first node Primary for this resource and let the partitions synchronize:

[node1]drbdadm -- --overwrite-data-of-peer primary iscsi.target.0



cat /proc/drbd



version: 8.3.9 (api:88/proto:86-95)

srcversion: CF228D42875CF3A43F2945A

0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----

ns:135933 nr:96 dw:136029 dr:834 al:39 bm:8 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

1: cs:SyncSource ro: Primary/Secondary ds:UpToDate/Inconsistent C r-----

ns:1012864 nr:0 dw:0 dr:1021261 al:0 bm:61 lo:1 pe:4 ua:64 ap:0 ep:1 wo:f oos:51416288

[>....................] sync'ed: 2.0% (50208/51196)M

finish: 0:08:27 speed: 101,248 (101,248) K/sec




Wait for the synchronization to finish...
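
The progress can be followed continuously instead of re-running the command by hand, for example:

watch -n2 cat /proc/drbd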



cat /proc/drbd



version: 8.3.9 (api:88/proto:86-95)

srcversion: CF228D42875CF3A43F2945A

0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----

ns:135933 nr:96 dw:136029 dr:834 al:39 bm:8 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

1: cs:Connected ro: Primary/Secondary ds:UpToDate/UpToDate C r-----

ns:52428766 nr:0 dw:0 dr:52428971 al:0 bm:3200 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0




Install the iscsitarget package on both nodes:

[node1]apt-get -y install iscsitarget

[node2]apt-get -y install iscsitarget




Enable the iSCSI target in its defaults file:

[node1]sed -i 's/false/true/' /etc/default/iscsitarget

[node2]sed -i 's/false/true/' /etc/default/iscsitarget




Remove the service from the boot runlevels (heartbeat will start and stop it instead):

[node1]update-rc.d -f iscsitarget remove

[node2]update-rc.d -f iscsitarget remove




Move the configuration file to the DRBD partition and symlink it on both nodes:

[node1]mkdir /srv/data/iscsi

[node1]mv /etc/iet/ietd.conf /srv/data/iscsi

[node1]ln -s /srv/data/iscsi/ietd.conf /etc/iet/ietd.conf

[node2]rm /etc/iet/ietd.conf

[node2]ln -s /srv/data/iscsi/ietd.conf /etc/iet/ietd.conf




We describe the iSCSI-target in the /srv/data/iscsi/ietd.conf file:



Target iqn.2011-08.local.demo:storage.disk.0

# IncomingUser geekshlby secret (CHAP authentication, not used here)

# OutgoingUser geekshlby password

Lun 0 Path=/dev/drbd1,Type=blockio

Alias disk0

MaxConnections 1

InitialR2T Yes

ImmediateData No

MaxRecvDataSegmentLength 8192

MaxXmitDataSegmentLength 8192

MaxBurstLength 262144

FirstBurstLength 65536

DefaultTime2Wait 2

DefaultTime2Retain 20

MaxOutstandingR2T 8

DataPDUInOrder Yes

DataSequenceInOrder Yes

ErrorRecoveryLevel 0

HeaderDigest CRC32C,None

DataDigest CRC32C,None

Wthreads 8
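
Since /srv/data is mounted and drbd1 is Primary on node1, the target can be started by hand once, just to verify the definition before heartbeat takes control (IET exposes its state under /proc/net/iet/):

[node1]/etc/init.d/iscsitarget start

[node1]cat /proc/net/iet/volume

[node1]/etc/init.d/iscsitarget stop

The volume listing should show LUN 0 backed by /dev/drbd1.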




Now heartbeat needs to be configured to manage the iSCSI target and its virtual IP address in case a node fails.



We describe the cluster in the /etc/heartbeat/ha.cf file:



logfacility local0

keepalive 2

deadtime 30

warntime 10

initdead 120

bcast eth0

bcast eth1

node node1

node node2




Authentication mechanism

/etc/heartbeat/authkeys:



auth 2

2 sha1 NoOneKnowsIt




Change the permissions on the /etc/heartbeat/authkeys file:

chmod 600 /etc/heartbeat/authkeys



We describe the cluster resources in the /etc/heartbeat/haresources file: the preferred node, the virtual IP address, the file systems and the services to be started:



/etc/heartbeat/haresources



node1 drbddisk::iscsi.config Filesystem::/dev/drbd0::/srv/data::ext3

node1 IPaddr::10.11.55.50/8/eth0 drbddisk::iscsi.target.0 iscsitarget




Copy the configuration to the second node:

[node1]scp /etc/heartbeat/ha.cf root@10.11.55.56:/etc/heartbeat/

[node1]scp /etc/heartbeat/authkeys root@10.11.55.56:/etc/heartbeat/

[node1]scp /etc/heartbeat/haresources root@10.11.55.56:/etc/heartbeat/




Unmount /srv/data and make the first node Secondary.

Start heartbeat:

[node1]/etc/init.d/heartbeat start



Reboot both servers.



[node1]/etc/init.d/drbd start

[node2]/etc/init.d/drbd start



[node1]drbdadm secondary iscsi.config

[node1]drbdadm secondary iscsi.target.0



[node2]drbdadm primary iscsi.config

[node2]drbdadm primary iscsi.target.0



[node1]cat /proc/drbd



[node1]/etc/init.d/heartbeat start




After heartbeat starts, we switch the first node back to Primary and the second to Secondary (otherwise the resources will not start):



[node2]drbdadm secondary iscsi.config

[node2]drbdadm secondary iscsi.target.0



[node1]drbdadm primary iscsi.config

[node1]drbdadm primary iscsi.target.0




We watch tail -f /var/log/syslog

We wait…

Some time later…



Aug 26 08:32:14 node1 harc[11878]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp

Aug 26 08:32:14 node1 ip-request-resp[11878]: received ip-request-resp IPaddr::10.11.55.50/8/eth0 OK yes

Aug 26 08:32:14 node1 ResourceManager[11899]: info: Acquiring resource group: node1 IPaddr::10.11.55.50/8/eth0 drbddisk::iscsi.target.0 iscsitarget

Aug 26 08:32:14 node1 IPaddr[11926]: INFO: Resource is stopped

Aug 26 08:32:14 node1 ResourceManager[11899]: info: Running /etc/ha.d/resource.d/IPaddr 10.11.55.50/8/eth0 start

Aug 26 08:32:14 node1 IPaddr[12006]: INFO: Using calculated netmask for 10.11.55.50: 255.0.0.0

Aug 26 08:32:14 node1 IPaddr[12006]: INFO: eval ifconfig eth0:0 10.11.55.50 netmask 255.0.0.0 broadcast 10.255.255.255

Aug 26 08:32:14 node1 avahi-daemon[477]: Registering new address record for 10.11.55.50 on eth0.IPv4.

Aug 26 08:32:14 node1 IPaddr[11982]: INFO: Success

Aug 26 08:32:15 node1 ResourceManager[11899]: info: Running /etc/init.d/iscsitarget start

Aug 26 08:32:15 node1 kernel: [ 5402.722552] iSCSI Enterprise Target Software - version 1.4.20.2

Aug 26 08:32:15 node1 kernel: [ 5402.723978] iscsi_trgt: Registered io type fileio

Aug 26 08:32:15 node1 kernel: [ 5402.724057] iscsi_trgt: Registered io type blockio

Aug 26 08:32:15 node1 kernel: [ 5402.724061] iscsi_trgt: Registered io type nullio

Aug 26 08:32:15 node1 heartbeat: [12129]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL

Aug 26 08:32:15 node1 harc[12129]: info: Running /etc/ha.d//rc.d/ip-request-resp ip-request-resp

Aug 26 08:32:15 node1 ip-request-resp[12129]: received ip-request-resp IPaddr::10.11.55.50/8/eth0 OK yes

Aug 26 08:32:15 node1 ResourceManager[12155]: info: Acquiring resource group: node1 IPaddr::10.11.55.50/8/eth0 drbddisk::iscsi.target.0 iscsitarget

Aug 26 08:32:15 node1 IPaddr[12186]: INFO: Running OK

Aug 26 08:33:08 node1 ntpd[1634]: Listen normally on 11 eth0:0 10.11.55.50 UDP 123

Aug 26 08:33:08 node1 ntpd[1634]: new interface(s) found: waking up resolver




ifconfig

eth0 Link encap:Ethernet HWaddr 00:50:56:20:f9:6c

inet addr:10.11.55.55 Bcast:10.255.255.255 Mask:255.0.0.0

inet6 addr: fe80::20c:29ff:fe20:f96c/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:3622 errors:0 dropped:0 overruns:0 frame:0

TX packets:8081 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:302472 (302.4 KB) TX bytes:6943622 (6.9 MB)

Interrupt:19 Base address:0x2000



eth0:0 Link encap:Ethernet HWaddr 00:50:56:20:f9:6c

inet addr:10.11.55.50 Bcast:10.255.255.255 Mask:255.0.0.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Interrupt:19 Base address:0x2000




eth1 Link encap:Ethernet HWaddr 00:50:56:20:f9:76

inet addr:192.168.22.11 Bcast:192.168.22.255 Mask:255.255.255.0

inet6 addr: fe80::20c:29ff:fe20:f976/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:1765 errors:0 dropped:0 overruns:0 frame:0

TX packets:3064 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:171179 (171.1 KB) TX bytes:492567 (492.5 KB)

Interrupt:19 Base address:0x2080
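
Before adding the target on the ESX(i) side it is convenient to check that the portal answers on the virtual IP. From any Linux machine with open-iscsi installed (an external check, not part of the appliance itself):

iscsiadm -m discovery -t sendtargets -p 10.11.55.50

The output should list iqn.2011-08.local.demo:storage.disk.0.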




We connect the resulting iSCSI target to both ESX(i) hosts. After both hosts see the storage, we assemble the HA cluster. There is no space left on the hosts' local datastores for creating virtual machines, but that space is now presented back to them as shared storage. If one node fails, the virtual machine on the other node switches to Primary mode and continues to serve the iSCSI target.
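
The switchover can also be rehearsed without pulling a cable: heartbeat ships small helper scripts for a controlled failover (depending on the package they live in /usr/share/heartbeat or /usr/lib/heartbeat):

[node1]/usr/share/heartbeat/hb_standby

[node2]cat /proc/drbd

After hb_standby on node1, /proc/drbd on node2 should show both resources as Primary and the virtual IP 10.11.55.50 should come up there; hb_takeover run on node1 moves everything back.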



Using hdparm, I measured the speed of the disk in a virtual machine installed on the target:
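
For reference, such a measurement inside a guest is simply (assuming the guest's virtual disk is /dev/sda):

hdparm -tT /dev/sda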







Naturally, such a storage system is not suitable for serious production use. But if there are no heavily loaded virtual machines, or you just need to test building an HA cluster, this way of providing shared storage has a right to exist.



After reading this, many will say that it is "wrong", that "performance will suffer", that "both nodes could fail", and so on. Yes! Maybe so. But then why did VMware release its own Storage Appliance?



P.S. By the way, if you don't feel like doing everything by hand, there is a Management Console for setting up a DRBD cluster: http://www.drbd.org/mc/screenshot-gallery/



madbug

Senior Systems Engineer DEPO Computers

Source: https://habr.com/ru/post/130573/


