
Budget SAN Storage on LSI Syncro Part 1


The second part of

So, I continue my infrequent series of articles on the topic “how not to pay HP / EMC / IBM many kilo- (or even mega-) dollars and build your own storage that is no worse.” I never brought the previous series to a victorious end, but I did get 90% of my thoughts down in text.

Our current goal is a fault-tolerant “All-Flash” storage (that is, built only from SSDs, without hard drives, although this is not essential) for a vSphere cluster, several times cheaper than its branded counterparts and with very good performance. We will connect to it via Fibre Channel, but nothing stops you from using iSCSI, FCoE or even, oh horror, InfiniBand.

Syncro


As the name implies, the basis for this whole contraption will be a rather unique product on the market called Syncro CS from LSI (now Avago).

What is it, and what is remarkable about it?

In essence, it is a kit of two ordinary LSI 9286-8e controllers (or 9271-8i, if internal ports are needed) and two supercapacitors that let each controller save its cache to the controller's flash module if power is lost. The kit costs several times more than a similar kit without HA functionality. But compared with DRBD-based solutions, this difference is more than offset by not needing a double set of drives.

But the most interesting part is the firmware. Thanks to it, these controllers, when connected to the same SAS network (for example, a disk shelf with expanders), establish communication with each other through it and operate as a failover cluster.

What matters to us:

Over the SAS network, these controllers synchronize their write caches (so-called cache coherency), exchange heartbeats to tell each other that they are alive and well, and so on.

They have some important quirks, which I established both empirically, with sweat and blood, and by careful reading of the documentation:

ALUA


Since the master and slave controllers access the array at different speeds, we will have to build the storage around the ALUA paradigm, which was invented for exactly such cases. It has all sorts of subtleties, but for us the point is that some of the storage ports can be marked as Optimized (that is, recommended for use) and others as Unoptimized (not recommended). All of these ports can serve I/O at any time, but as long as there are live paths to Optimized ports, the initiator will use them and keep the rest in reserve. If all the Optimized ports fail, the initiator (ESXi in our case), after some thought, will send I/O through the Unoptimized ports.

From the point of view of VMware ESXi, it looks like this:


The first three paths lead to the slave storage node, and the other three lead to the master, which can serve I/O at full speed.

If we did not use ALUA, I/O requests (in Round Robin mode) would go to both nodes, which would lead to uneven latency and throughput, essentially limited by the performance of the slave node.
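The path selection rule described above can be sketched as a small shell function. This only illustrates the initiator-side logic and is not how ESXi is actually implemented; the path names and the state labels (`optimized`, `nonoptimized`, `dead`) are made up for the example.

```shell
# Illustrative sketch of ALUA-aware path selection (not ESXi's real code).
# Input on stdin: one path per line as "<path-name> <state>", where state is
# "optimized", "nonoptimized" or "dead" (labels are assumptions of this sketch).
# Output: the paths an initiator would use for I/O.
select_paths() {
    awk '
        $2 == "optimized"    { opt[++n_opt] = $1 }
        $2 == "nonoptimized" { non[++n_non] = $1 }
        END {
            if (n_opt > 0) {
                # Live optimized paths exist: use only them.
                for (i = 1; i <= n_opt; i++) print opt[i]
            } else {
                # No optimized path left: fall back to non-optimized ones.
                for (i = 1; i <= n_non; i++) print non[i]
            }
        }'
}
```

For example, `printf '%s\n' 'pathA nonoptimized' 'pathB optimized' | select_paths` prints only `pathB`; if `pathB` is marked `dead`, the function falls back to `pathA`.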

Warning!
Many programs that work with SANs (for example, Microsoft Cluster Services) rely on SCSI Persistent Reservations, which let an initiator lock a LUN and then do all sorts of indecent things to it. These SCSI PRs will not be replicated to the other server.

That is, if the initiator places a lock on the LUN via a path leading to the first server, a path leading to the second server will not see this lock. This is especially important in Round Robin mode, since the paths change constantly and it is not known in advance over which path the lock will be set. In the case of VMware ESXi this is, in principle, not a problem: starting with VMFS5, SCSI PR is no longer used, and the Atomic Test and Set (ATS) instruction is used instead (it is not replicated either, but it locks only the required region of the LUN, and only briefly, rather than the whole LUN); in addition, ALUA should tell ESXi not to use the paths to the second server while the main server is alive.

There are commercial solutions based on SCST that implement SCSI PR synchronization, but I have not tried them. And, according to rumors, one such solution may soon be opened to the public.

All right, we are done with the theory; let's move on to practice.

Hardware and software


Hardware


Controller Servers:

Disk shelf:

FC switches:

Cost & Performance


All this happiness (together with the switches) cost about 4 million rubles at the time of purchase (November 2014), which, for a “raw capacity” of 19.2TB of flash, is very, very inexpensive. For comparison: for a low-end dual-controller MSA 2040 with 24 x 800GB SSDs on board, HP now wants something around 18 million (yes, I know about all sorts of discounts, and the dollar has risen, but I think the order of magnitude is still about right).
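To put those two prices side by side, here is a rough per-terabyte calculation. It is a sketch that uses only the figures quoted above (4 and 18 million rubles, 19.2TB raw in both cases, since 24 x 800GB is also 19.2TB); the function name is made up for the example.

```shell
# Rough cost per raw terabyte, in rubles.
# cost_per_tb <total_price_rub> <raw_capacity_tb>
cost_per_tb() {
    awk -v price="$1" -v tb="$2" 'BEGIN { printf "%.0f\n", price / tb }'
}
```

`cost_per_tb 4000000 19.2` gives roughly 208 thousand rubles per raw TB for the self-built storage, while `cost_per_tb 18000000 19.2` gives roughly 937 thousand for the MSA 2040 at the quoted price, a ratio of about 4.5.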

In terms of speed, HP promises no more than 85k IOPS; we will have a little more: from one node I got 720k IOPS at 75% CPU usage with the vdisk_nullio backend (it returns zeros on reads and discards writes, like /dev/null).

Moreover, for some reason the CPU load jumps sharply, from 40% to 75%, as IOPS rise from 600k to 720k; below that it grows evenly. The expected sequential speed is limited by the FC interfaces: about 3GB/s (4 ports at 800MB/s each in theory, taking into account 8b/10b encoding for 8Gbit FC).
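The FC estimate can be reproduced with integer arithmetic. This sketch follows the simplified reasoning in the text (nominal bit rate, minus 8b/10b overhead, divided by 8 bits per byte) and ignores framing and protocol overhead; the function names are made up for the example.

```shell
# Back-of-the-envelope FC throughput.
# fc_port_mbytes <nominal_gbit>          -> usable MB/s per port
fc_port_mbytes() {
    # Gbit/s -> Mbit/s, keep 8 of every 10 bits (8b/10b), then bits -> bytes.
    echo $(( $1 * 1000 * 8 / 10 / 8 ))
}

# fc_total_mbytes <nominal_gbit> <ports> -> aggregate MB/s
fc_total_mbytes() {
    echo $(( $(fc_port_mbytes "$1") * $2 ))
}
```

`fc_port_mbytes 8` yields 800 and `fc_total_mbytes 8 4` yields 3200, which matches the "about 3GB/s" ceiling for four 8Gbit ports.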

From Syncro I was able to squeeze 420k IOPS, which is close to the theoretical limit of 450k IOPS that LSI promises. When working with SSDs, it is very important to disable the write cache on the array (write-through mode) and enable Direct I/O; otherwise performance tops out at 150k IOPS even on reads. This is a peculiarity of LSI's FastPath technology: it is activated only when both conditions are met.
The sequential read speed from the array reaches 3.8GB/s. Here, most likely, we hit the performance limit of the controller itself (the PCI-E 3.0 x8 bus can move about 8GB/s).
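As an illustration of the two FastPath prerequisites, the helper below prints the StorCLI commands that would switch a virtual drive to write-through and Direct I/O. It is a sketch under assumptions: that StorCLI is the management tool in use (the same settings can also be changed in the controller BIOS or with other LSI tools), and that the controller and virtual drive numbers passed in are correct for your system. The commands are only printed, not executed, so they can be reviewed first.

```shell
# Print (do not execute) the StorCLI commands for the two settings named in
# the text: write-through cache and Direct I/O.
# fastpath_cmds <controller_index> <virtual_drive_index>
fastpath_cmds() {
    ctrl="$1"
    vd="$2"
    echo "storcli /c${ctrl}/v${vd} set wrcache=wt"
    echo "storcli /c${ctrl}/v${vd} set iopolicy=direct"
}
```

For example, `fastpath_cmds 0 0` prints the two commands for virtual drive 0 on controller 0; after checking them, the output can be piped to `sh`.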

More detailed benchmarks, I think, will come in the second part.

Build servers


There is nothing special to tell here.
We install a motherboard, processor, memory, two FC controllers and one Syncro controller in each server.
I also installed a small two-drive cage (Supermicro MCP-220-82609-0N) at the back of the case and put the SSDs for the OS in it. The main drive cage of the case remains unused so far; in the future it could also be connected to Syncro, although this would destroy the fault tolerance of the solution (power off one node and you take its backplane down with it).

An important piece of advice on reliability: enable memory mirroring in the BIOS, which is a kind of RAID1 for DDR.
Usable memory will be halved, but reliability will increase dramatically. Also run memtest for a week, to be sure. Check the BIOS logs for ECC errors; if there are any, replace the failed module and run the test again. Even if the errors are still correctable for now, they can become fatal; such cases have happened.
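Besides the BIOS logs, ECC error counters can also be read from a running Linux system through the EDAC subsystem, provided the kernel has an EDAC driver for the platform. The function below is a minimal sketch that sums the per-controller counters; the function name is made up, and the sysfs root is a parameter only so that the function can be tried against a test directory.

```shell
# Sum corrected (CE) and uncorrected (UE) ECC error counters exposed by the
# Linux EDAC subsystem under <root>/mc*/ce_count and <root>/mc*/ue_count.
# edac_totals [sysfs_root]   (default: /sys/devices/system/edac/mc)
edac_totals() {
    root="${1:-/sys/devices/system/edac/mc}"
    ce=0
    ue=0
    for mc in "$root"/mc*; do
        if [ -r "$mc/ce_count" ]; then
            ce=$(( ce + $(cat "$mc/ce_count") ))
        fi
        if [ -r "$mc/ue_count" ]; then
            ue=$(( ue + $(cat "$mc/ue_count") ))
        fi
    done
    echo "corrected=$ce uncorrected=$ue"
}
```

A non-zero `corrected` value after a long memtest or burn-in run is a reason to find and replace the module before the errors turn uncorrectable.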

Wiring diagram


SAS


The following diagram is brazenly stolen from the Syncro documentation:


The deeper point of such a connection is that if one of the disk shelves fails, the second will continue to work. If you connect the shelves one after another (daisy chain), then when the first shelf in the chain fails, the connection to all the others is lost as well. Here we get something like a ring topology, with additional shelves connected between these two.

In our case there is only one shelf (even though it contains three backplanes), so we will connect it in the usual way, like this:

Inside the shelf we connect backplanes in series.

FC


We connect servers to FC switches as follows:

Thus, if any of the HBA cards and/or one of the switches fails, we will still have connectivity.

Important! The switches must not be interconnected by trunks or any other ISLs!
They should be two independent fabrics.
That way, if we make a mistake while configuring one of them (or its software fails), the second will not be affected (in FC, zoning and other settings apply to all switches in the fabric).

I zoned on the principle of “one initiator port and one storage port per zone”.
With a large number of initiators this is time-consuming, since the number of zones equals [number of initiator ports] x [number of storage ports]. But given a list of storage and initiator ports, the zones can easily be generated by a script :) Laziness is the engine of progress.
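Such a generator might look like the sketch below. It is not the script I actually used: the input format (one `<alias> <wwpn>` pair per line), the zone naming scheme and the output format are all assumptions, and turning the output into switch commands is vendor-specific and left out. Since the two fabrics are independent, it would be run once per fabric with that fabric's ports.

```shell
# Generate single-initiator / single-target zone definitions.
# gen_zones <initiators_file> <targets_file>
# Each input file holds lines of "<alias> <wwpn>".
# Output: one line per zone, "<zone_name> <initiator_wwpn> <target_wwpn>".
gen_zones() {
    while read -r ialias iwwpn; do
        [ -n "$ialias" ] || continue          # skip empty lines
        while read -r talias twwpn; do
            [ -n "$talias" ] || continue
            echo "z_${ialias}_${talias} ${iwwpn} ${twwpn}"
        done < "$2"
    done < "$1"
}
```

With 2 initiator ports and 3 storage ports in a fabric, this prints 2 x 3 = 6 zones, in line with the formula above.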

Software part



Everything is more or less standard here: we install the OS on Linux software RAID (mdraid), since these motherboards have no hardware RAID.

Next, configure and build the kernel.
My config, in case it comes in handy for someone:
CONFIG_64BIT=y CONFIG_X86_64=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_OUTPUT_FORMAT="elf64-x86-64" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_DMA_MAP_STATE=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_ARCH_HAS_CPU_AUTOPROBE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ZONE_DMA32=y CONFIG_AUDIT_ARCH=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_64_SMP=y CONFIG_X86_HT=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-rdi -fcall-saved-rsi -fcall-saved-rdx -fcall-saved-rcx -fcall-saved-r8 -fcall-saved-r9 -fcall-saved-r10 -fcall-saved-r11" CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" CONFIG_LOCALVERSION="" CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y CONFIG_KERNEL_XZ=y CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y CONFIG_FHANDLE=y CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y 
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y CONFIG_TICK_ONESHOT=y CONFIG_NO_HZ_COMMON=y CONFIG_NO_HZ_IDLE=y CONFIG_HIGH_RES_TIMERS=y CONFIG_TICK_CPU_ACCOUNTING=y CONFIG_TASKSTATS=y CONFIG_TASK_DELAY_ACCT=y CONFIG_TASK_XACCT=y CONFIG_TASK_IO_ACCOUNTING=y CONFIG_TREE_RCU=y CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_FANOUT=64 CONFIG_RCU_FANOUT_LEAF=16 CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y CONFIG_LOG_BUF_SHIFT=18 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y CONFIG_ARCH_SUPPORTS_INT128=y CONFIG_ARCH_WANTS_PROT_NUMA_PROT_NONE=y CONFIG_ARCH_USES_NUMA_PROT_NONE=y CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y CONFIG_NUMA_BALANCING=y CONFIG_CGROUPS=y CONFIG_CGROUP_SCHED=y CONFIG_FAIR_GROUP_SCHED=y CONFIG_CFS_BANDWIDTH=y CONFIG_RT_GROUP_SCHED=y CONFIG_BLK_CGROUP=y CONFIG_NAMESPACES=y CONFIG_SCHED_AUTOGROUP=y CONFIG_SYSCTL=y CONFIG_ANON_INODES=y CONFIG_HAVE_UID16=y CONFIG_SYSCTL_EXCEPTION_TRACE=y CONFIG_HAVE_PCSPKR_PLATFORM=y CONFIG_UID16=y CONFIG_KALLSYMS=y CONFIG_PRINTK=y CONFIG_BUG=y CONFIG_ELF_CORE=y CONFIG_PCSPKR_PLATFORM=y CONFIG_BASE_FULL=y CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_SIGNALFD=y CONFIG_TIMERFD=y CONFIG_EVENTFD=y CONFIG_SHMEM=y CONFIG_AIO=y CONFIG_PCI_QUIRKS=y CONFIG_HAVE_PERF_EVENTS=y CONFIG_PERF_EVENTS=y CONFIG_VM_EVENT_COUNTERS=y CONFIG_SLUB_DEBUG=y CONFIG_SLUB=y CONFIG_SLUB_CPU_PARTIAL=y CONFIG_HAVE_OPROFILE=y CONFIG_OPROFILE_NMI_TIMER=y CONFIG_JUMP_LABEL=y CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y CONFIG_ARCH_USE_BUILTIN_BSWAP=y CONFIG_HAVE_IOREMAP_PROT=y CONFIG_HAVE_KPROBES=y CONFIG_HAVE_KRETPROBES=y CONFIG_HAVE_OPTPROBES=y CONFIG_HAVE_KPROBES_ON_FTRACE=y CONFIG_HAVE_ARCH_TRACEHOOK=y CONFIG_HAVE_DMA_ATTRS=y CONFIG_GENERIC_SMP_IDLE_THREAD=y CONFIG_HAVE_REGS_AND_STACK_ACCESS_API=y CONFIG_HAVE_DMA_API_DEBUG=y CONFIG_HAVE_HW_BREAKPOINT=y CONFIG_HAVE_MIXED_BREAKPOINTS_REGS=y CONFIG_HAVE_USER_RETURN_NOTIFIER=y CONFIG_HAVE_PERF_EVENTS_NMI=y 
CONFIG_HAVE_PERF_REGS=y CONFIG_HAVE_PERF_USER_STACK_DUMP=y CONFIG_HAVE_ARCH_JUMP_LABEL=y CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG=y CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y CONFIG_HAVE_CMPXCHG_LOCAL=y CONFIG_HAVE_CMPXCHG_DOUBLE=y CONFIG_ARCH_WANT_COMPAT_IPC_PARSE_VERSION=y CONFIG_ARCH_WANT_OLD_COMPAT_IPC=y CONFIG_HAVE_ARCH_SECCOMP_FILTER=y CONFIG_SECCOMP_FILTER=y CONFIG_HAVE_CC_STACKPROTECTOR=y CONFIG_CC_STACKPROTECTOR_NONE=y CONFIG_HAVE_CONTEXT_TRACKING=y CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN=y CONFIG_HAVE_IRQ_TIME_ACCOUNTING=y CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y CONFIG_HAVE_ARCH_SOFT_DIRTY=y CONFIG_MODULES_USE_ELF_RELA=y CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK=y CONFIG_OLD_SIGSUSPEND3=y CONFIG_COMPAT_OLD_SIGACTION=y CONFIG_SLABINFO=y CONFIG_RT_MUTEXES=y CONFIG_BASE_SMALL=0 CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_STOP_MACHINE=y CONFIG_BLOCK=y CONFIG_BLK_DEV_BSG=y CONFIG_PARTITION_ADVANCED=y CONFIG_MSDOS_PARTITION=y CONFIG_EFI_PARTITION=y CONFIG_BLOCK_COMPAT=y CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y CONFIG_CFQ_GROUP_IOSCHED=y CONFIG_DEFAULT_DEADLINE=y CONFIG_DEFAULT_IOSCHED="deadline" CONFIG_PADATA=y CONFIG_INLINE_SPIN_UNLOCK_IRQ=y CONFIG_INLINE_READ_UNLOCK=y CONFIG_INLINE_READ_UNLOCK_IRQ=y CONFIG_INLINE_WRITE_UNLOCK=y CONFIG_INLINE_WRITE_UNLOCK_IRQ=y CONFIG_ARCH_SUPPORTS_ATOMIC_RMW=y CONFIG_MUTEX_SPIN_ON_OWNER=y CONFIG_ZONE_DMA=y CONFIG_SMP=y CONFIG_X86_SUPPORTS_MEMORY_FAILURE=y CONFIG_SCHED_OMIT_FRAME_POINTER=y CONFIG_NO_BOOTMEM=y CONFIG_MCORE2=y CONFIG_X86_INTERNODE_CACHE_SHIFT=6 CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_X86_INTEL_USERCOPY=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_P6_NOP=y CONFIG_X86_TSC=y CONFIG_X86_CMPXCHG64=y CONFIG_X86_CMOV=y CONFIG_X86_MINIMUM_CPU_FAMILY=64 CONFIG_X86_DEBUGCTLMSR=y CONFIG_CPU_SUP_INTEL=y CONFIG_CPU_SUP_AMD=y CONFIG_CPU_SUP_CENTAUR=y CONFIG_HPET_TIMER=y CONFIG_HPET_EMULATE_RTC=y CONFIG_DMI=y CONFIG_SWIOTLB=y CONFIG_IOMMU_HELPER=y CONFIG_NR_CPUS=32 CONFIG_SCHED_SMT=y CONFIG_SCHED_MC=y CONFIG_PREEMPT_NONE=y 
CONFIG_X86_UP_APIC_MSI=y CONFIG_X86_LOCAL_APIC=y CONFIG_X86_IO_APIC=y CONFIG_X86_MCE=y CONFIG_X86_MCE_INTEL=y CONFIG_X86_MCE_THRESHOLD=y CONFIG_X86_THERMAL_VECTOR=y CONFIG_X86_16BIT=y CONFIG_X86_ESPFIX64=y CONFIG_MICROCODE=y CONFIG_MICROCODE_INTEL=y CONFIG_MICROCODE_OLD_INTERFACE=y CONFIG_X86_MSR=y CONFIG_X86_CPUID=y CONFIG_ARCH_PHYS_ADDR_T_64BIT=y CONFIG_ARCH_DMA_ADDR_T_64BIT=y CONFIG_DIRECT_GBPAGES=y CONFIG_NUMA=y CONFIG_X86_64_ACPI_NUMA=y CONFIG_NODES_SPAN_OTHER_NODES=y CONFIG_NODES_SHIFT=2 CONFIG_ARCH_SPARSEMEM_ENABLE=y CONFIG_ARCH_SPARSEMEM_DEFAULT=y CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000 CONFIG_SELECT_MEMORY_MODEL=y CONFIG_SPARSEMEM_MANUAL=y CONFIG_SPARSEMEM=y CONFIG_NEED_MULTIPLE_NODES=y CONFIG_HAVE_MEMORY_PRESENT=y CONFIG_SPARSEMEM_EXTREME=y CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER=y CONFIG_SPARSEMEM_VMEMMAP=y CONFIG_HAVE_MEMBLOCK=y CONFIG_HAVE_MEMBLOCK_NODE_MAP=y CONFIG_ARCH_DISCARD_MEMBLOCK=y CONFIG_MEMORY_ISOLATION=y CONFIG_PAGEFLAGS_EXTENDED=y CONFIG_SPLIT_PTLOCK_CPUS=4 CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK=y CONFIG_COMPACTION=y CONFIG_MIGRATION=y CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y CONFIG_PHYS_ADDR_T_64BIT=y CONFIG_ZONE_DMA_FLAG=1 CONFIG_BOUNCE=y CONFIG_VIRT_TO_BUS=y CONFIG_DEFAULT_MMAP_MIN_ADDR=0 CONFIG_ARCH_SUPPORTS_MEMORY_FAILURE=y CONFIG_MEMORY_FAILURE=y CONFIG_TRANSPARENT_HUGEPAGE=y CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y CONFIG_CROSS_MEMORY_ATTACH=y CONFIG_X86_RESERVE_LOW=64 CONFIG_MTRR=y CONFIG_MTRR_SANITIZER=y CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=1 CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1 CONFIG_X86_PAT=y CONFIG_ARCH_USES_PG_UNCACHED=y CONFIG_ARCH_RANDOM=y CONFIG_X86_SMAP=y CONFIG_SECCOMP=y CONFIG_HZ_100=y CONFIG_HZ=100 CONFIG_SCHED_HRTICK=y CONFIG_PHYSICAL_START=0x1000000 CONFIG_PHYSICAL_ALIGN=0x1000000 CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y CONFIG_USE_PERCPU_NUMA_NODE_ID=y CONFIG_ACPI=y CONFIG_ACPI_FAN=y CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_THERMAL=y CONFIG_ACPI_NUMA=y 
CONFIG_X86_PM_TIMER=y CONFIG_ACPI_HED=y CONFIG_ACPI_APEI=y CONFIG_ACPI_APEI_GHES=y CONFIG_ACPI_APEI_PCIEAER=y CONFIG_ACPI_APEI_MEMORY_FAILURE=y CONFIG_ACPI_EXTLOG=y CONFIG_CPU_FREQ=y CONFIG_CPU_FREQ_GOV_COMMON=y CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y CONFIG_CPU_FREQ_GOV_PERFORMANCE=y CONFIG_CPU_FREQ_GOV_ONDEMAND=y CONFIG_X86_ACPI_CPUFREQ=y CONFIG_CPU_IDLE=y CONFIG_CPU_IDLE_MULTIPLE_DRIVERS=y CONFIG_CPU_IDLE_GOV_LADDER=y CONFIG_CPU_IDLE_GOV_MENU=y CONFIG_INTEL_IDLE=y CONFIG_PCI=y CONFIG_PCI_DIRECT=y CONFIG_PCI_MMCONFIG=y CONFIG_PCI_DOMAINS=y CONFIG_PCIEPORTBUS=y CONFIG_PCIEAER=y CONFIG_PCIEASPM=y CONFIG_PCIEASPM_PERFORMANCE=y CONFIG_PCI_MSI=y CONFIG_PCI_LABEL=y CONFIG_ISA_DMA_API=y CONFIG_AMD_NB=y CONFIG_BINFMT_ELF=y CONFIG_COMPAT_BINFMT_ELF=y CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE=y CONFIG_BINFMT_SCRIPT=y CONFIG_COREDUMP=y CONFIG_IA32_EMULATION=y CONFIG_X86_X32=y CONFIG_COMPAT=y CONFIG_COMPAT_FOR_U64_ALIGNMENT=y CONFIG_SYSVIPC_COMPAT=y CONFIG_X86_DEV_DMA_OPS=y CONFIG_IOSF_MBI=m CONFIG_NET=y CONFIG_PACKET=y CONFIG_PACKET_DIAG=y CONFIG_UNIX=y CONFIG_UNIX_DIAG=y CONFIG_INET=y CONFIG_TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION=y CONFIG_IP_MULTICAST=y CONFIG_NET_IPIP=y CONFIG_NET_IPGRE_DEMUX=y CONFIG_NET_IP_TUNNEL=y CONFIG_NET_IPGRE=y CONFIG_NET_IPGRE_BROADCAST=y CONFIG_INET_TUNNEL=y CONFIG_INET_LRO=y CONFIG_INET_DIAG=y CONFIG_INET_TCP_DIAG=y CONFIG_INET_UDP_DIAG=y CONFIG_TCP_CONG_ADVANCED=y CONFIG_TCP_CONG_HTCP=y CONFIG_DEFAULT_HTCP=y CONFIG_DEFAULT_TCP_CONG="htcp" CONFIG_STP=y CONFIG_BRIDGE=y CONFIG_HAVE_NET_DSA=y CONFIG_VLAN_8021Q=y CONFIG_LLC=y CONFIG_NETLINK_MMAP=y CONFIG_NETLINK_DIAG=y CONFIG_RPS=y CONFIG_RFS_ACCEL=y CONFIG_XPS=y CONFIG_NET_RX_BUSY_POLL=y CONFIG_BQL=y CONFIG_NET_FLOW_LIMIT=y CONFIG_HAVE_BPF_JIT=y CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" CONFIG_DEVTMPFS=y CONFIG_DEVTMPFS_MOUNT=y CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y CONFIG_FW_LOADER=y CONFIG_FIRMWARE_IN_KERNEL=y CONFIG_EXTRA_FIRMWARE="" CONFIG_FW_LOADER_USER_HELPER=y 
CONFIG_CONNECTOR=y CONFIG_PROC_EVENTS=y CONFIG_ARCH_MIGHT_HAVE_PC_PARPORT=y CONFIG_PNP=y CONFIG_PNPACPI=y CONFIG_BLK_DEV=y CONFIG_BLK_DEV_LOOP=y CONFIG_BLK_DEV_LOOP_MIN_COUNT=8 CONFIG_HAVE_IDE=y CONFIG_SCSI_MOD=y CONFIG_RAID_ATTRS=y CONFIG_SCSI=y CONFIG_SCSI_DMA=y CONFIG_SCSI_NETLINK=y CONFIG_BLK_DEV_SD=y CONFIG_CHR_DEV_SG=y CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_CONSTANTS=y CONFIG_SCSI_SCAN_ASYNC=y CONFIG_SCSI_FC_ATTRS=y CONFIG_SCSI_SAS_ATTRS=y CONFIG_SCSI_LOWLEVEL=y CONFIG_MEGARAID_SAS=y CONFIG_ATA=y CONFIG_ATA_VERBOSE_ERROR=y CONFIG_ATA_ACPI=y CONFIG_SATA_PMP=y CONFIG_SATA_AHCI=y CONFIG_SATA_AHCI_PLATFORM=y CONFIG_MD=y CONFIG_BLK_DEV_MD=y CONFIG_MD_AUTODETECT=y CONFIG_MD_RAID0=y CONFIG_MD_RAID1=y CONFIG_MD_RAID10=y CONFIG_MD_RAID456=y CONFIG_BLK_DEV_DM_BUILTIN=y CONFIG_BLK_DEV_DM=y CONFIG_DM_CRYPT=y CONFIG_DM_ZERO=y CONFIG_DM_UEVENT=y CONFIG_NETDEVICES=y CONFIG_NET_CORE=y CONFIG_BONDING=y CONFIG_NET_FC=y CONFIG_NETCONSOLE=y CONFIG_NETCONSOLE_DYNAMIC=y CONFIG_NETPOLL=y CONFIG_NETPOLL_TRAP=y CONFIG_NET_POLL_CONTROLLER=y CONFIG_TUN=y CONFIG_ETHERNET=y CONFIG_MDIO=y CONFIG_NET_VENDOR_INTEL=y CONFIG_E1000E=y CONFIG_IGB=y CONFIG_IGB_HWMON=y CONFIG_IGB_DCA=y CONFIG_IXGBE=y CONFIG_IXGBE_HWMON=y CONFIG_IXGBE_DCA=y CONFIG_PPP=y CONFIG_PPP_DEFLATE=y CONFIG_PPP_FILTER=y CONFIG_PPP_MULTILINK=y CONFIG_PPPOE=y CONFIG_PPP_ASYNC=y CONFIG_PPP_SYNC_TTY=y CONFIG_SLHC=y CONFIG_INPUT=y CONFIG_INPUT_MOUSEDEV=y CONFIG_INPUT_MOUSEDEV_PSAUX=y CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768 CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y CONFIG_SERIO=y CONFIG_ARCH_MIGHT_HAVE_PC_SERIO=y CONFIG_SERIO_I8042=y CONFIG_SERIO_SERPORT=y CONFIG_SERIO_LIBPS2=y CONFIG_TTY=y CONFIG_VT=y CONFIG_CONSOLE_TRANSLATIONS=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y CONFIG_VT_HW_CONSOLE_BINDING=y CONFIG_UNIX98_PTYS=y CONFIG_SERIAL_8250=y CONFIG_SERIAL_8250_DEPRECATED_OPTIONS=y CONFIG_SERIAL_8250_PNP=y CONFIG_SERIAL_8250_CONSOLE=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_SERIAL_8250_DMA=y 
CONFIG_SERIAL_8250_PCI=y CONFIG_SERIAL_8250_NR_UARTS=2 CONFIG_SERIAL_8250_RUNTIME_UARTS=2 CONFIG_SERIAL_CORE=y CONFIG_SERIAL_CORE_CONSOLE=y CONFIG_HPET=y CONFIG_HPET_MMAP=y CONFIG_HPET_MMAP_DEFAULT=y CONFIG_DEVPORT=y CONFIG_I2C=y CONFIG_I2C_BOARDINFO=y CONFIG_I2C_CHARDEV=y CONFIG_I2C_HELPER_AUTO=y CONFIG_I2C_ALGOBIT=y CONFIG_I2C_I801=y CONFIG_I2C_SCMI=y CONFIG_PPS=y CONFIG_PTP_1588_CLOCK=y CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y CONFIG_HWMON=y CONFIG_HWMON_VID=y CONFIG_SENSORS_CORETEMP=y CONFIG_SENSORS_JC42=y CONFIG_SENSORS_W83627EHF=y CONFIG_SENSORS_ACPI_POWER=y CONFIG_THERMAL=y CONFIG_THERMAL_HWMON=y CONFIG_THERMAL_DEFAULT_GOV_STEP_WISE=y CONFIG_THERMAL_GOV_STEP_WISE=y CONFIG_SSB_POSSIBLE=y CONFIG_BCMA_POSSIBLE=y CONFIG_VGA_ARB=y CONFIG_VGA_ARB_MAX_GPUS=16 CONFIG_VGA_CONSOLE=y CONFIG_DUMMY_CONSOLE=y CONFIG_HID=y CONFIG_HIDRAW=y CONFIG_HID_GENERIC=y CONFIG_HID_A4TECH=y CONFIG_HID_APPLE=y CONFIG_HID_BELKIN=y CONFIG_HID_CHERRY=y CONFIG_HID_CHICONY=y CONFIG_HID_CYPRESS=y CONFIG_HID_EZKEY=y CONFIG_HID_KENSINGTON=y CONFIG_HID_LOGITECH=y CONFIG_HID_MICROSOFT=y CONFIG_HID_MONTEREY=y CONFIG_USB_HID=y CONFIG_USB_HIDDEV=y CONFIG_USB_OHCI_LITTLE_ENDIAN=y CONFIG_USB_SUPPORT=y CONFIG_USB_COMMON=y CONFIG_USB_ARCH_HAS_HCD=y CONFIG_USB=y CONFIG_USB_DEFAULT_PERSIST=y CONFIG_USB_XHCI_HCD=y CONFIG_USB_EHCI_HCD=y CONFIG_USB_EHCI_ROOT_HUB_TT=y CONFIG_USB_EHCI_TT_NEWSCHED=y CONFIG_USB_EHCI_PCI=y CONFIG_USB_EHCI_HCD_PLATFORM=y CONFIG_USB_UHCI_HCD=y CONFIG_USB_ACM=y CONFIG_USB_WDM=y CONFIG_USB_STORAGE=y CONFIG_USB_SERIAL=y CONFIG_USB_SERIAL_CONSOLE=y CONFIG_USB_SERIAL_GENERIC=y CONFIG_USB_SERIAL_FTDI_SIO=y CONFIG_USB_SERIAL_PL2303=y CONFIG_USB_SERIAL_WWAN=y CONFIG_USB_SERIAL_OPTION=y CONFIG_EDAC=y CONFIG_EDAC_MM_EDAC=y CONFIG_EDAC_GHES=y CONFIG_EDAC_I7CORE=y CONFIG_EDAC_SBRIDGE=y CONFIG_RTC_LIB=y CONFIG_RTC_CLASS=y CONFIG_RTC_HCTOSYS=y CONFIG_RTC_SYSTOHC=y CONFIG_RTC_HCTOSYS_DEVICE="rtc0" CONFIG_RTC_INTF_SYSFS=y CONFIG_RTC_INTF_PROC=y CONFIG_RTC_INTF_DEV=y CONFIG_RTC_DRV_CMOS=y 
CONFIG_DMADEVICES=y CONFIG_INTEL_IOATDMA=y CONFIG_DMA_ENGINE=y CONFIG_DMA_ACPI=y CONFIG_ASYNC_TX_DMA=y CONFIG_DMA_ENGINE_RAID=y CONFIG_DCA=y CONFIG_CLKEVT_I8253=y CONFIG_I8253_LOCK=y CONFIG_CLKBLD_I8253=y CONFIG_FIRMWARE_MEMMAP=y CONFIG_DMIID=y CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK=y CONFIG_UEFI_CPER=y CONFIG_DCACHE_WORD_ACCESS=y CONFIG_EXT4_FS=y CONFIG_EXT4_USE_FOR_EXT23=y CONFIG_JBD2=y CONFIG_FS_MBCACHE=y CONFIG_EXPORTFS=y CONFIG_FILE_LOCKING=y CONFIG_FSNOTIFY=y CONFIG_INOTIFY_USER=y CONFIG_FANOTIFY=y CONFIG_FUSE_FS=y CONFIG_CUSE=y CONFIG_ISO9660_FS=y CONFIG_JOLIET=y CONFIG_ZISOFS=y CONFIG_UDF_FS=y CONFIG_UDF_NLS=y CONFIG_FAT_FS=y CONFIG_MSDOS_FS=y CONFIG_VFAT_FS=y CONFIG_FAT_DEFAULT_CODEPAGE=437 CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1" CONFIG_PROC_FS=y CONFIG_PROC_SYSCTL=y CONFIG_PROC_PAGE_MONITOR=y CONFIG_SYSFS=y CONFIG_TMPFS=y CONFIG_HUGETLBFS=y CONFIG_HUGETLB_PAGE=y CONFIG_CONFIGFS_FS=y CONFIG_MISC_FILESYSTEMS=y CONFIG_PSTORE=y CONFIG_NLS=y CONFIG_NLS_DEFAULT="utf8" CONFIG_NLS_CODEPAGE_437=y CONFIG_NLS_CODEPAGE_855=y CONFIG_NLS_CODEPAGE_866=y CONFIG_NLS_CODEPAGE_1251=y CONFIG_NLS_ASCII=y CONFIG_NLS_ISO8859_1=y CONFIG_NLS_ISO8859_5=y CONFIG_NLS_ISO8859_15=y CONFIG_NLS_KOI8_R=y CONFIG_NLS_UTF8=y CONFIG_TRACE_IRQFLAGS_SUPPORT=y CONFIG_PRINTK_TIME=y CONFIG_DEFAULT_MESSAGE_LOGLEVEL=4 CONFIG_FRAME_WARN=1024 CONFIG_STRIP_ASM_SYMS=y CONFIG_ARCH_WANT_FRAME_POINTERS=y CONFIG_MAGIC_SYSRQ=y CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1 CONFIG_HAVE_DEBUG_KMEMLEAK=y CONFIG_DEBUG_MEMORY_INIT=y CONFIG_HAVE_DEBUG_STACKOVERFLOW=y CONFIG_HAVE_ARCH_KMEMCHECK=y CONFIG_PANIC_ON_OOPS_VALUE=0 CONFIG_PANIC_TIMEOUT=0 CONFIG_DEBUG_BUGVERBOSE=y CONFIG_RCU_CPU_STALL_TIMEOUT=60 CONFIG_ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS=y CONFIG_USER_STACKTRACE_SUPPORT=y CONFIG_HAVE_FUNCTION_TRACER=y CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST=y CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y CONFIG_HAVE_DYNAMIC_FTRACE=y CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y 
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y CONFIG_HAVE_SYSCALL_TRACEPOINTS=y CONFIG_HAVE_FENTRY=y CONFIG_HAVE_C_RECORDMCOUNT=y CONFIG_TRACING_SUPPORT=y CONFIG_HAVE_ARCH_KGDB=y CONFIG_STRICT_DEVMEM=y CONFIG_X86_VERBOSE_BOOTUP=y CONFIG_EARLY_PRINTK=y CONFIG_DOUBLEFAULT=y CONFIG_HAVE_MMIOTRACE_SUPPORT=y CONFIG_IO_DELAY_TYPE_0X80=0 CONFIG_IO_DELAY_TYPE_0XED=1 CONFIG_IO_DELAY_TYPE_UDELAY=2 CONFIG_IO_DELAY_TYPE_NONE=3 CONFIG_IO_DELAY_0X80=y CONFIG_DEFAULT_IO_DELAY_TYPE=0 CONFIG_OPTIMIZE_INLINING=y CONFIG_DEFAULT_SECURITY_DAC=y CONFIG_DEFAULT_SECURITY="" CONFIG_XOR_BLOCKS=y CONFIG_ASYNC_CORE=y CONFIG_ASYNC_MEMCPY=y CONFIG_ASYNC_XOR=y CONFIG_ASYNC_PQ=y CONFIG_ASYNC_RAID6_RECOV=y CONFIG_CRYPTO=y CONFIG_CRYPTO_ALGAPI=y CONFIG_CRYPTO_ALGAPI2=y CONFIG_CRYPTO_AEAD=y CONFIG_CRYPTO_AEAD2=y CONFIG_CRYPTO_BLKCIPHER=y CONFIG_CRYPTO_BLKCIPHER2=y CONFIG_CRYPTO_HASH=y CONFIG_CRYPTO_HASH2=y CONFIG_CRYPTO_RNG2=y CONFIG_CRYPTO_PCOMP2=y CONFIG_CRYPTO_MANAGER=y CONFIG_CRYPTO_MANAGER2=y CONFIG_CRYPTO_USER=y CONFIG_CRYPTO_GF128MUL=y CONFIG_CRYPTO_PCRYPT=y CONFIG_CRYPTO_WORKQUEUE=y CONFIG_CRYPTO_CRYPTD=y CONFIG_CRYPTO_ABLK_HELPER=y CONFIG_CRYPTO_GLUE_HELPER_X86=y CONFIG_CRYPTO_CBC=y CONFIG_CRYPTO_LRW=y CONFIG_CRYPTO_XTS=y CONFIG_CRYPTO_CMAC=y CONFIG_CRYPTO_HMAC=y CONFIG_CRYPTO_XCBC=y CONFIG_CRYPTO_VMAC=y CONFIG_CRYPTO_CRC32C=y CONFIG_CRYPTO_CRC32C_INTEL=y CONFIG_CRYPTO_CRC32=y CONFIG_CRYPTO_CRC32_PCLMUL=y CONFIG_CRYPTO_CRCT10DIF=y CONFIG_CRYPTO_CRCT10DIF_PCLMUL=y CONFIG_CRYPTO_SHA1=y CONFIG_CRYPTO_SHA1_SSSE3=y CONFIG_CRYPTO_SHA256_SSSE3=y CONFIG_CRYPTO_SHA512_SSSE3=y CONFIG_CRYPTO_SHA256=y CONFIG_CRYPTO_SHA512=y CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL=y CONFIG_CRYPTO_AES=y CONFIG_CRYPTO_AES_X86_64=y CONFIG_CRYPTO_AES_NI_INTEL=y CONFIG_HAVE_KVM=y CONFIG_RAID6_PQ=y CONFIG_BITREVERSE=y CONFIG_GENERIC_STRNCPY_FROM_USER=y CONFIG_GENERIC_STRNLEN_USER=y CONFIG_GENERIC_NET_UTILS=y CONFIG_GENERIC_FIND_FIRST_BIT=y CONFIG_GENERIC_PCI_IOMAP=y CONFIG_GENERIC_IOMAP=y CONFIG_GENERIC_IO=y CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y 
CONFIG_CRC_CCITT=y CONFIG_CRC16=y CONFIG_CRC_T10DIF=y CONFIG_CRC_ITU_T=y CONFIG_CRC32=y CONFIG_CRC32_SLICEBY8=y CONFIG_LIBCRC32C=y CONFIG_ZLIB_INFLATE=y CONFIG_ZLIB_DEFLATE=y CONFIG_GENERIC_ALLOCATOR=y CONFIG_HAS_IOMEM=y CONFIG_HAS_IOPORT=y CONFIG_HAS_DMA=y CONFIG_CHECK_SIGNATURE=y CONFIG_CPU_RMAP=y CONFIG_DQL=y CONFIG_NLATTR=y CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE=y CONFIG_DDR=y 
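Before building, it can be worth checking that the config really enables the options this setup depends on. The helper below is a small sketch; the function name is made up, and which options count as "required" is my assumption (the usage example picks a few that appear in the config above). It accepts both a normal one-option-per-line .config and a space-separated listing like the one shown here.

```shell
# Check that a kernel config enables the given options (built-in or module).
# check_kconfig <config_file> <OPTION>...   (names without the CONFIG_ prefix)
# Prints each option that is not set to y or m; returns non-zero if any is missing.
check_kconfig() {
    cfg="$1"
    shift
    missing=0
    for opt in "$@"; do
        # Split on spaces so flattened listings are handled as well.
        if ! tr ' ' '\n' < "$cfg" | grep -Eq "^CONFIG_${opt}=(y|m)$"; then
            echo "missing: CONFIG_${opt}"
            missing=1
        fi
    done
    return "$missing"
}
```

For example: `check_kconfig /usr/src/linux-3.14.xx/.config MEGARAID_SAS SCSI_FC_ATTRS BLK_DEV_MD MD_RAID1 EDAC`.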


We start the compilation and build a deb package:
# cd /usr/src/linux-3.14.xx
# fakeroot make-kpkg clean
# CONCURRENCY_LEVEL=12 fakeroot make-kpkg --us --uc --jobs 12 --stem=kernel-scst --revision=1 kernel_image

Then we check out SCST, build it and install it (specifying the path to the kernel sources):
# svn checkout svn://svn.code.sf.net/p/scst/svn/branches/3.0.x scst-svn
# cd scst-svn
# BUILD_2X_MODULE=y CONFIG_SCSI_QLA_FC=y CONFIG_SCSI_QLA2XXX_TARGET=y KDIR="/usr/src/linux-3.14.xx" make all install

More detailed instructions for installing SCST with the QLogic driver can be found on the project site (but there the QLogic driver is built from QLogic's own Git tree, which did not seem very stable to me, so we will take the driver bundled with SCST).

As a result, we get a kernel package plus the /lib/modules/3.14.xx/extra directory with the SCST modules, which has to be copied to the server manually. Of course, one could find a way to integrate them directly into the .deb package, but I was too lazy.

To use the FC cards you also need firmware, which can either be flashed into the adapter (more precisely, updated, since some version is already there) or simply placed in /lib/firmware, from where the driver will load it. To be on the safe side, and because the manufacturer recommends it, I did both. You can flash it either with QLogic's qaucli utility for Linux, or from FreeDOS (or EFI) using their own tools from the site.

Download the firmware (in our case, ql2500_fw.bin) and put it in place:
# mkdir -p /lib/firmware
# cd /lib/firmware
# wget http://ldriver.qlogic.com/firmware/ql2500_fw.bin

Next, we need the SCST management utility with the succinct name scstadmin and the libraries it uses. The utility itself is in the SCST source tree at scstadmin/scstadmin.sysfs/scstadmin; put it on the servers somewhere like /usr/bin so that it is on everyone's path. Then take the scstadmin/scstadmin.sysfs/scst-0.9.10/lib/SCST directory and put it in /usr/lib/perl/<Perl version>.

Then we install Pacemaker to manage our cluster and generate an authentication key.
Pacemaker will switch the ALUA states depending on the status of the nodes.

# apt-get -t wheezy-backports install pacemaker
# corosync-keygen

Copy the /etc/corosync/authkey key file to the same location on the second server.

To control SCST from Pacemaker we need a “resource” agent, which I borrowed from the ESOS project and adapted for myself.
Resource
#! /bin/sh
#
# $Id$
#
# Resource Agent for managing the Generic SCSI Target Subsystem
# for Linux (SCST) and related daemons.
#
# License: GNU General Public License (GPL)
# (c) 2012-2014 Marc A. Smith
#

# Initialization
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs

MODULES="scst scst_vdisk qla2x00tgt"
SCST_CFG="/etc/scst.conf"
PRE_SCST_CONF="/etc/pre-scst_xtra_conf"
POST_SCST_CONF="/etc/post-scst_xtra_conf"
SCST_SYSFS="/sys/kernel/scst_tgt"
ALUA_STATES="active nonoptimized standby unavailable offline transitioning"
NO_CLOBBER="/tmp/scst_ra-no_clobber"

# For optional SCST modules
if [ -f "/lib/modules/$(uname -r)/extra/ocs_fc_scst.ko" ]; then
    MODULES="${MODULES} ocs_fc_scst"
fi
if [ -f "/lib/modules/$(uname -r)/extra/chfcoe.ko" ]; then
    MODULES="${MODULES} chfcoe"
fi


scst_start() {
    # Exit immediately if configuration is not valid
    scst_validate_all || exit ${?}

    # If resource is already running, bail out early
    if scst_monitor; then
        ocf_log info "Resource is already running."
        return ${OCF_SUCCESS}
    fi

    # If our pre-SCST file exists, run it
    if [ -f "${PRE_SCST_CONF}" ]; then
        ocf_log info "Pre-SCST user config. file found; running..."
        ocf_run -warn sh "${PRE_SCST_CONF}"
    fi

    # Load all modules
    ocf_log info "Loading kernel modules..."
    for i in ${MODULES}; do
        ocf_log debug "scst_start() -> Module: ${i}"
        if [ -d /sys/module/${i} ]; then
            ocf_log warn "The ${i} module is already loaded!"
        else
            ocf_run modprobe ${i} || exit ${OCF_ERR_GENERIC}
        fi
    done

    # Configure SCST
    if [ -f "${SCST_CFG}" ]; then
        ocf_log info "Applying SCST configuration..."
        ocf_run scstadmin -config "${SCST_CFG}"
        # Prevent scst_stop() from clobbering the configuration file
        if [ ${?} -ne 0 ]; then
            ocf_log err "Something is wrong with the SCST configuration!"
            ocf_run touch "${NO_CLOBBER}"
            exit ${OCF_ERR_GENERIC}
        else
            if [ -f "${NO_CLOBBER}" ]; then
                ocf_run rm -f "${NO_CLOBBER}"
            fi
        fi
    fi

    # If our post-SCST file exists, run it
    if [ -f "${POST_SCST_CONF}" ]; then
        ocf_log info "Post-SCST user config. file found; running..."
        ocf_run -warn sh "${POST_SCST_CONF}"
    fi

    # If we are using ALUA, be sure we are using the "Slave" state initially
    if ocf_is_true ${OCF_RESKEY_alua}; then
        check_alua
        # Set the local target group ALUA state
        ocf_log debug "scst_start() -> Setting target group" \
            "'${OCF_RESKEY_local_tgt_grp}' ALUA state to" \
            "'${OCF_RESKEY_s_alua_state}'..."
        ocf_run scstadmin -noprompt -set_tgrp_attr \
            ${OCF_RESKEY_local_tgt_grp} -dev_group \
            ${OCF_RESKEY_device_group} -attributes \
            state\=${OCF_RESKEY_s_alua_state} || exit ${OCF_ERR_GENERIC}
        # For now, we simply assume the other node is the Master
        ocf_log debug "scst_start() -> Setting target group" \
            "'${OCF_RESKEY_remote_tgt_grp}' ALUA state to" \
            "'${OCF_RESKEY_m_alua_state}'..."
        ocf_run scstadmin -noprompt -set_tgrp_attr \
            ${OCF_RESKEY_remote_tgt_grp} -dev_group \
            ${OCF_RESKEY_device_group} -attributes \
            state\=${OCF_RESKEY_m_alua_state} || exit ${OCF_ERR_GENERIC}
    fi

    # Make sure the resource started correctly
    while ! scst_monitor; do
        ocf_log debug "scst_start() -> Resource has not started yet, waiting..."
        sleep 1
    done

    # Only return $OCF_SUCCESS if _everything_ succeeded as expected
    return ${OCF_SUCCESS}
}


scst_stop() {
    # Exit immediately if configuration is not valid
    scst_validate_all || exit ${?}

    # Check the current resource state
    scst_monitor
    local rc=${?}
    case "${rc}" in
    "${OCF_SUCCESS}")
        # Currently running; normal, expected behavior
        ocf_log info "Resource is currently running."
        ;;
    "${OCF_RUNNING_MASTER}")
        # Running as a Master; need to demote before stopping
        ocf_log info "Resource is currently running as Master."
        scst_demote || ocf_log warn "Demote failed, trying to stop anyway..."
        ;;
    "${OCF_NOT_RUNNING}")
        # Currently not running; nothing to do
        ocf_log info "Resource is already stopped."
        return ${OCF_SUCCESS}
        ;;
    esac

    # Unload the modules (in reverse)
    ocf_log info "Unloading kernel modules..."
    for i in $(echo ${MODULES} | tr ' ' '\n' | tac | tr '\n' ' '); do
        ocf_log debug "scst_stop() -> Module: ${i}"
        if [ -d /sys/module/${i} ]; then
            ocf_run rmmod -w ${i} || exit ${OCF_ERR_GENERIC}
        else
            ocf_log warn "The ${i} module is not loaded!"
        fi
    done

    # Make sure the resource stopped correctly
    while scst_monitor; do
        ocf_log info "scst_stop() -> Resource has not stopped yet, waiting..."
        sleep 1
    done

    # Only return $OCF_SUCCESS if _everything_ succeeded as expected
    return ${OCF_SUCCESS}
}


scst_monitor() {
    # Exit immediately if configuration is not valid
    scst_validate_all || exit ${?}

    # Check if SCST is loaded
    local rc
    if [ -e "${SCST_SYSFS}/version" ]; then
        ocf_log debug "scst_monitor() -> SCST version:" \
            "$(cat ${SCST_SYSFS}/version)"
        ocf_log debug "scst_monitor() -> Resource is running."
        crm_master -l reboot -v 100
        rc=${OCF_SUCCESS}
    else
        ocf_log debug "scst_monitor() -> Resource is not running."
        crm_master -l reboot -D
        rc=${OCF_NOT_RUNNING}
        return ${rc}
    fi

    # If we are using ALUA, then we can test if we are Master or not
    if ocf_is_true ${OCF_RESKEY_alua}; then
        dev_grp_path="${SCST_SYSFS}/device_groups/${OCF_RESKEY_device_group}"
        tgt_grp_path="${dev_grp_path}/target_groups/${OCF_RESKEY_local_tgt_grp}"
        tgt_grp_state="$(head -1 ${tgt_grp_path}/state)"
        ocf_log debug "scst_monitor() -> SCST local target" \
            "group state: ${tgt_grp_state}"
        if [ "x${tgt_grp_state}" = "x${OCF_RESKEY_m_alua_state}" ]; then
            rc=${OCF_RUNNING_MASTER}
        fi
    fi
    return ${rc}
}


scst_validate_all() {
    # Test for required binaries
    check_binary scstadmin

    # There can only be one instance of SCST running per node
    if [ ! -z "${OCF_RESKEY_CRM_meta_clone_node_max}" ] &&
        [ "${OCF_RESKEY_CRM_meta_clone_node_max}" -ne 1 ]; then
        ocf_log err "The 'clone-node-max' parameter must equal '1'."
        exit ${OCF_ERR_CONFIGURED}
    fi

    # If ALUA support is enabled, we need to check the parameters
    if ocf_is_true ${OCF_RESKEY_alua}; then
        # Make sure they are set to something
        if [ -z "${OCF_RESKEY_device_group}" ]; then
            ocf_log err "The 'device_group' parameter is not set!"
            exit ${OCF_ERR_CONFIGURED}
        fi
        if [ -z "${OCF_RESKEY_local_tgt_grp}" ]; then
            ocf_log err "The 'local_tgt_grp' parameter is not set!"
            exit ${OCF_ERR_CONFIGURED}
        fi
        if [ -z "${OCF_RESKEY_remote_tgt_grp}" ]; then
            ocf_log err "The 'remote_tgt_grp' parameter is not set!"
            exit ${OCF_ERR_CONFIGURED}
        fi
        if [ -z "${OCF_RESKEY_m_alua_state}" ]; then
            ocf_log err "The 'm_alua_state' parameter is not set!"
            exit ${OCF_ERR_CONFIGURED}
        fi
        if [ -z "${OCF_RESKEY_s_alua_state}" ]; then
            ocf_log err "The 's_alua_state' parameter is not set!"
            exit ${OCF_ERR_CONFIGURED}
        fi
        # Currently, we only support using one Master with this RA
        if [ ! -z "${OCF_RESKEY_CRM_meta_master_max}" ] &&
            [ "${OCF_RESKEY_CRM_meta_master_max}" -ne 1 ]; then
            ocf_log err "The 'master-max' parameter must equal '1'."
            exit ${OCF_ERR_CONFIGURED}
        fi
        if [ ! -z "${OCF_RESKEY_CRM_meta_master_node_max}" ] &&
            [ "${OCF_RESKEY_CRM_meta_master_node_max}" -ne 1 ]; then
            ocf_log err "The 'master-node-max' parameter must equal '1'."
exit ${OCF_ERR_CONFIGURED} fi fi return ${OCF_SUCCESS} } scst_meta_data() { cat <<-EOF <?xml version="1.0"?> <!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd"> <resource-agent name="scst" version="0.1"> <version>0.1</version> <longdesc lang="en">The SCST OCF resource agent for ESOS; includes SCST ALUA support.</longdesc> <shortdesc lang="en">SCST OCF RA script for ESOS.</shortdesc> <parameters> <parameter name="alua" unique="0" required="0"> <longdesc lang="en">Use to enable/disable updating ALUA status in SCST.</longdesc> <shortdesc lang="en">The 'alua' parameter.</shortdesc> <content type="boolean" default="false" /> </parameter> <parameter name="device_group" unique="0" required="0"> <longdesc lang="en">The name of the SCST device group (used with ALUA support).</longdesc> <shortdesc lang="en">The 'device_group' parameter.</shortdesc> <content type="string" default="" /> </parameter> <parameter name="local_tgt_grp" unique="0" required="0"> <longdesc lang="en">The name of the SCST local target group (used with ALUA support).</longdesc> <shortdesc lang="en">The 'local_tgt_grp' parameter.</shortdesc> <content type="string" default="" /> </parameter> <parameter name="remote_tgt_grp" unique="0" required="0"> <longdesc lang="en">The name of the SCST remote target group (used with ALUA support).</longdesc> <shortdesc lang="en">The 'remote_tgt_grp' parameter.</shortdesc> <content type="string" default="" /> </parameter> <parameter name="m_alua_state" unique="0" required="0"> <longdesc lang="en">The ALUA state (eg, active) for a Master node (used with ALUA support).</longdesc> <shortdesc lang="en">The 'm_alua_state' parameter.</shortdesc> <content type="string" default="active" /> </parameter> <parameter name="s_alua_state" unique="0" required="0"> <longdesc lang="en">The ALUA state (eg, nonoptimized) for a Slave node (used with ALUA support).</longdesc> <shortdesc lang="en">The 's_alua_state' parameter.</shortdesc> <content type="string" default="nonoptimized" /> 
</parameter> </parameters> <actions> <action name="meta-data" timeout="5" /> <action name="start" timeout="120" /> <action name="stop" timeout="60" /> <action name="monitor" timeout="20" depth="0" interval="10" role="Master" /> <action name="monitor" timeout="20" depth="0" interval="20" role="Slave" /> <action name="notify" timeout="20" /> <action name="promote" timeout="20" /> <action name="demote" timeout="20" /> <action name="reload" timeout="20" /> <action name="validate-all" timeout="20" /> </actions> </resource-agent> EOF } scst_usage() { echo "usage: ${0} {start|stop|monitor|validate-all|promote|demote|reload|notify|meta-data}" echo "" echo "Expects to have a fully populated OCF RA-compliant environment set." } scst_promote() { # Exit immediately if configuration is not valid scst_validate_all || exit ${?} # Test the resource's current state scst_monitor local rc=${?} case "${rc}" in "${OCF_SUCCESS}") # Running as Slave; normal, expected behavior ocf_log debug "scst_promote() -> Resource is" \ "currently running as Slave." ;; "${OCF_RUNNING_MASTER}") # Already a Master; unexpected, but not a problem ocf_log info "Resource is already running as Master." return ${OCF_SUCCESS} ;; "${OCF_NOT_RUNNING}") # Currently not running; need to start before promoting ocf_log info "Resource is currently not running." scst_start ;; *) # Failed resource; let the cluster manager recover ocf_log err "Unexpected error, cannot promote." exit ${rc} ;; esac # Promote only makes sense if we are using ALUA if ocf_is_true ${OCF_RESKEY_alua}; then check_alua # Set the local target group to the "Master" ALUA state ocf_log debug "scst_promote() -> Setting target group" \ "'${OCF_RESKEY_local_tgt_grp}' ALUA state to" \ "'${OCF_RESKEY_m_alua_state}'..." 
ocf_run scstadmin -noprompt -set_tgrp_attr \ ${OCF_RESKEY_local_tgt_grp} -dev_group \ ${OCF_RESKEY_device_group} -attributes \ state\=${OCF_RESKEY_m_alua_state} || exit ${OCF_ERR_GENERIC} # Since there can only be one Master, set the remote target group ocf_log debug "scst_promote() -> Setting target group" \ "'${OCF_RESKEY_remote_tgt_grp}' ALUA state to" \ "'${OCF_RESKEY_s_alua_state}'..." ocf_run scstadmin -noprompt -set_tgrp_attr \ ${OCF_RESKEY_remote_tgt_grp} -dev_group \ ${OCF_RESKEY_device_group} -attributes \ state\=${OCF_RESKEY_s_alua_state} || exit ${OCF_ERR_GENERIC} else ocf_log err "The ALUA parameters need to be configured before using MS." exit ${OCF_ERR_CONFIGURED} fi # After the resource has been promoted, check whether the promotion worked while true; do scst_monitor if [ ${?} -eq ${OCF_RUNNING_MASTER} ]; then ocf_log info "Resource was promoted successfully." break else ocf_log debug "scst_promote() -> Resource still" \ "awaiting promotion." sleep 1 fi done # Only return $OCF_SUCCESS if _everything_ succeeded as expected return ${OCF_SUCCESS} } scst_demote() { # Exit immediately if configuration is not valid scst_validate_all || exit ${?} # Test the resource's current state scst_monitor local rc=${?} case "${rc}" in "${OCF_RUNNING_MASTER}") # Running as Master; normal, expected behavior ocf_log debug "scst_demote() -> Resource is" \ "currently running as Master." ;; "${OCF_SUCCESS}") # Already running as Slave; nothing to do ocf_log debug "scst_demote() -> Resource is" \ "currently running as Slave." return ${OCF_SUCCESS} ;; "${OCF_NOT_RUNNING}") # Not running; getting a demote action in this state is unexpected ocf_log err "Resource is currently not running." exit ${OCF_ERR_GENERIC} ;; *) # Failed resource; let the cluster manager recover ocf_log err "Unexpected error, cannot demote." 
exit ${rc} ;; esac # Demote only makes sense if we are using ALUA if ocf_is_true ${OCF_RESKEY_alua}; then check_alua # Set the local target group to the "Slave" ALUA state ocf_log debug "scst_demote() -> Setting target group" \ "'${OCF_RESKEY_local_tgt_grp}' ALUA state to" \ "'${OCF_RESKEY_s_alua_state}'..." ocf_run scstadmin -noprompt -set_tgrp_attr \ ${OCF_RESKEY_local_tgt_grp} -dev_group \ ${OCF_RESKEY_device_group} -attributes \ state\=${OCF_RESKEY_s_alua_state} || exit ${OCF_ERR_GENERIC} # If we're a Slave, we assume the remote side is the Master ocf_log debug "scst_demote() -> Setting target group" \ "'${OCF_RESKEY_remote_tgt_grp}' ALUA state to" \ "'${OCF_RESKEY_m_alua_state}'..." ocf_run scstadmin -noprompt -set_tgrp_attr \ ${OCF_RESKEY_remote_tgt_grp} -dev_group \ ${OCF_RESKEY_device_group} -attributes \ state\=${OCF_RESKEY_m_alua_state} || exit ${OCF_ERR_GENERIC} else ocf_log err "The ALUA parameters need to be configured before using MS." exit ${OCF_ERR_CONFIGURED} fi # After the resource has been demoted, check whether the demotion worked while true; do scst_monitor if [ ${?} -eq ${OCF_RUNNING_MASTER} ]; then ocf_log debug "scst_demote() -> Resource still" \ "awaiting demotion." sleep 1 else ocf_log info "Resource was demoted successfully." break fi done # Only return $OCF_SUCCESS if _everything_ succeeded as expected return ${OCF_SUCCESS} } scst_notify() { # We're currently not using this ocf_log debug "scst_notify() -> Received a" \ "'${OCF_RESKEY_CRM_meta_notify_type}' /" \ "'${OCF_RESKEY_CRM_meta_notify_operation}' notification." return ${OCF_SUCCESS} } check_alua() { # Make sure the directories exist in the SCST sysfs structure if [ ! -d "${SCST_SYSFS}/device_groups/${OCF_RESKEY_device_group}" ]; then ocf_log err "The '${OCF_RESKEY_device_group}' device group does not exist!" exit ${OCF_ERR_INSTALLED} fi target_groups="${SCST_SYSFS}/device_groups/${OCF_RESKEY_device_group}/target_groups" if [ ! 
-d "${target_groups}/${OCF_RESKEY_local_tgt_grp}" ]; then ocf_log err "The '${OCF_RESKEY_local_tgt_grp}' target group does not exist!" exit ${OCF_ERR_INSTALLED} fi if [ ! -d "${target_groups}/${OCF_RESKEY_remote_tgt_grp}" ]; then ocf_log err "The '${OCF_RESKEY_remote_tgt_grp}' target group does not exist!" exit ${OCF_ERR_INSTALLED} fi # Check that the given ALUA states are valid local valid_m_alua_state=0 local valid_s_alua_state=0 for i in ${ALUA_STATES}; do if [ "x${OCF_RESKEY_m_alua_state}" = "x${i}" ]; then valid_m_alua_state=1 fi if [ "x${OCF_RESKEY_s_alua_state}" = "x${i}" ]; then valid_s_alua_state=1 fi done if [ ${valid_m_alua_state} -eq 0 ]; then ocf_log err "The 'm_alua_state' value is not valid: ${OCF_RESKEY_m_alua_state}" exit ${OCF_ERR_INSTALLED} fi if [ ${valid_s_alua_state} -eq 0 ]; then ocf_log err "The 's_alua_state' value is not valid: ${OCF_RESKEY_s_alua_state}" exit ${OCF_ERR_INSTALLED} fi } # Make sure meta-data and usage always succeed case ${__OCF_ACTION} in meta-data) scst_meta_data exit ${OCF_SUCCESS} ;; usage|help) scst_usage exit ${OCF_SUCCESS} ;; esac # Anything other than meta-data and usage must pass validation scst_validate_all || exit ${?} # Translate each action into the appropriate function call case ${__OCF_ACTION} in start) scst_start ;; stop) scst_stop ;; status|monitor) scst_monitor ;; notify) scst_notify ;; promote) scst_promote ;; demote) scst_demote ;; reload) ocf_log info "Reloading..." scst_start ;; validate-all) ;; migrate_to|migrate_from) scst_usage exit ${OCF_ERR_UNIMPLEMENTED} ;; *) scst_usage exit ${OCF_ERR_UNIMPLEMENTED} ;; esac # Log a debug message and exit rc=${?} ocf_log debug "${OCF_RESOURCE_INSTANCE} ${__OCF_ACTION} returned: ${rc}" exit ${rc} 


You need to put this file as /usr/lib/ocf/resource.d/esos/scst and make it executable.
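That step can be sketched roughly as follows. This is only an illustration: `install_scst_ra` is a hypothetical helper, the local file name `./scst` in the usage note is an assumption, and the optional root prefix exists purely so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# Copy an OCF resource agent into the "esos" provider directory
# and make it executable.
# $1: path to the agent script (hypothetical local file)
# $2: optional root prefix for a dry run; empty means the real filesystem
install_scst_ra() {
    src="$1"
    root="${2:-}"
    dest_dir="${root}/usr/lib/ocf/resource.d/esos"
    mkdir -p "${dest_dir}" || return 1
    cp "${src}" "${dest_dir}/scst" || return 1
    chmod 0755 "${dest_dir}/scst"
}
```

On a real node this would be run as root, for example `install_scst_ra ./scst`.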

Two nodes are not enough for a stable cluster (if connectivity between them breaks, we get split-brain), so we will add a third node that provides quorum but runs no resources. A peculiarity of Pacemaker comes into play here: the "health" of a resource is checked on every node of the cluster, regardless of whether the resource can run there at all (newer versions seem to have added an option that disables this behavior, but the version in the Debian repository does not seem to have it yet). Therefore, the quorum node needs a fake resource that will simply say "everything is calm in Baghdad".
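The arithmetic behind the third node fits in a few lines of shell. This only illustrates the majority rule, not how Pacemaker or Corosync is configured; `has_quorum` is a hypothetical helper, and one vote per node is assumed.

```shell
#!/bin/sh
# Majority quorum: a partition may keep running resources only if it
# holds strictly more than half of all votes (one vote per node assumed).
# $1: total nodes in the cluster, $2: nodes visible in this partition
has_quorum() {
    total="$1"
    visible="$2"
    [ $((visible * 2)) -gt "${total}" ]
}

# Two-node cluster with the link broken: each side sees 1 of 2 nodes.
if has_quorum 2 1; then echo "2 nodes, 1 visible: quorum"; else echo "2 nodes, 1 visible: no quorum"; fi
# Three-node cluster with one node isolated: the other side sees 2 of 3.
if has_quorum 3 2; then echo "3 nodes, 2 visible: quorum"; else echo "3 nodes, 2 visible: no quorum"; fi
```

With two nodes neither half can claim a majority after a split, while with three nodes exactly one side always can.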
Fake resource
#! /bin/sh
#
# $Id$
#
# Resource Agent for managing the Generic SCSI Target Subsystem
# for Linux (SCST) and related daemons.
#
# License: GNU General Public License (GPL)
# (c) 2012-2014 Marc A. Smith
#

# Initialization
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
MODULES="scst scst_vdisk qla2x00tgt"
SCST_CFG="/etc/scst.conf"
PRE_SCST_CONF="/etc/pre-scst_xtra_conf"
POST_SCST_CONF="/etc/post-scst_xtra_conf"
SCST_SYSFS="/sys/kernel/scst_tgt"
ALUA_STATES="active nonoptimized standby unavailable offline transitioning"
NO_CLOBBER="/tmp/scst_ra-no_clobber"


scst_monitor() {
    return ${OCF_NOT_RUNNING}
}


scst_meta_data() {
    cat <<-EOF
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="scst" version="0.1">
  <version>0.1</version>
  <longdesc lang="en">The SCST OCF resource agent for ESOS; includes SCST ALUA support.</longdesc>
  <shortdesc lang="en">SCST OCF RA script for ESOS.</shortdesc>
  <parameters>
    <parameter name="alua" unique="0" required="0">
      <longdesc lang="en">Use to enable/disable updating ALUA status in SCST.</longdesc>
      <shortdesc lang="en">The 'alua' parameter.</shortdesc>
      <content type="boolean" default="false" />
    </parameter>
    <parameter name="device_group" unique="0" required="0">
      <longdesc lang="en">The name of the SCST device group (used with ALUA support).</longdesc>
      <shortdesc lang="en">The 'device_group' parameter.</shortdesc>
      <content type="string" default="" />
    </parameter>
    <parameter name="local_tgt_grp" unique="0" required="0">
      <longdesc lang="en">The name of the SCST local target group (used with ALUA support).</longdesc>
      <shortdesc lang="en">The 'local_tgt_grp' parameter.</shortdesc>
      <content type="string" default="" />
    </parameter>
    <parameter name="remote_tgt_grp" unique="0" required="0">
      <longdesc lang="en">The name of the SCST remote target group (used with ALUA support).</longdesc>
      <shortdesc lang="en">The 'remote_tgt_grp' parameter.</shortdesc>
      <content type="string" default="" />
    </parameter>
    <parameter name="m_alua_state" unique="0" required="0">
      <longdesc lang="en">The ALUA state (eg, active) for a Master node (used with ALUA support).</longdesc>
      <shortdesc lang="en">The 'm_alua_state' parameter.</shortdesc>
      <content type="string" default="active" />
    </parameter>
    <parameter name="s_alua_state" unique="0" required="0">
      <longdesc lang="en">The ALUA state (eg, nonoptimized) for a Slave node (used with ALUA support).</longdesc>
      <shortdesc lang="en">The 's_alua_state' parameter.</shortdesc>
      <content type="string" default="nonoptimized" />
    </parameter>
  </parameters>
  <actions>
    <action name="meta-data" timeout="5" />
    <action name="start" timeout="120" />
    <action name="stop" timeout="60" />
    <action name="monitor" timeout="20" depth="0" interval="10" role="Master" />
    <action name="monitor" timeout="20" depth="0" interval="20" role="Slave" />
    <action name="notify" timeout="20" />
    <action name="promote" timeout="20" />
    <action name="demote" timeout="20" />
    <action name="reload" timeout="20" />
    <action name="validate-all" timeout="20" />
  </actions>
</resource-agent>
EOF
}


scst_usage() {
    echo "usage: ${0} {start|stop|monitor|validate-all|promote|demote|reload|notify|meta-data}"
    echo ""
    echo "Expects to have a fully populated OCF RA-compliant environment set."
}


# Make sure meta-data and usage always succeed
case ${__OCF_ACTION} in
meta-data)
    scst_meta_data
    exit ${OCF_SUCCESS}
    ;;
usage|help)
    scst_usage
    exit ${OCF_SUCCESS}
    ;;
esac

# Translate each action into the appropriate function call
case ${__OCF_ACTION} in
start)
    ;;
stop)
    ;;
status|monitor)
    scst_monitor
    ;;
notify)
    ;;
promote)
    ;;
demote)
    ;;
reload)
    ocf_log info "Reloading..."
    ;;
validate-all)
    ;;
migrate_to|migrate_from)
    scst_usage
    exit ${OCF_ERR_UNIMPLEMENTED}
    ;;
*)
    scst_usage
    exit ${OCF_ERR_UNIMPLEMENTED}
    ;;
esac

# Log a debug message and exit
rc=${?}
ocf_log debug "${OCF_RESOURCE_INSTANCE} ${__OCF_ACTION} returned: ${rc}"
exit ${rc}


It is installed on the quorum node the same way as the real agent on the main nodes.

Finally, let's create the arrays and write the SCST configuration. That will probably be enough for the first part.

Let's see what devices we have:
storcli64 /c0/eall/sall show

Controller = 0
Status = Success
Description = Show Drive Information Succeeded.

Drive Information :
=================

-------------------------------------------------------------------------
EID:Slt DID State DG       Size Intf Med SED PI SeSz Model      Sp
-------------------------------------------------------------------------
37:0     61 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:1     62 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:2     63 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:3     64 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:4     65 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:5     66 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:6     67 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:7     68 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:8     69 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:9     70 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:10    71 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:11    72 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:12    73 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:13    74 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:14    75 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:15    76 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:16    77 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:17    78 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:18    15 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:19    19 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:20    79 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:21    80 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:22    81 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
37:23    82 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:0      8 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:1      9 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:2     10 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:3     11 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:4     12 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:5     14 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:6     21 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:7     22 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:8     23 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:9     24 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:10    18 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:11    17 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:12    25 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:13    26 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:14    16 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:15    27 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:16    28 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:17    29 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:18    30 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:19    31 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:20    32 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:21    33 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:22    34 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
59:23    35 UGood -  372.093 GB SAS  SSD N   N  512B S842E400M2 U
-------------------------------------------------------------------------


All clear, so we will create two RAID6 arrays (23 drives + 1 hot spare each), one on each backplane:

# storcli /c0 add vd r6 name=SSD-RAID6-1 drives=37:0-22 WT nora direct Strip=64
# storcli /c0 add vd r6 name=SSD-RAID6-2 drives=59:0-22 WT nora direct Strip=64

These are the parameters LSI recommends for SSDs. Judging by my tests, a strip size anywhere from 8 KB to 128 KB has almost no effect on performance.
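For a rough idea of what each array yields, the usable capacity can be estimated in shell. A sketch under simple assumptions: RAID6 spends two drives' worth of space on parity, the per-drive size is the 372.093 GB that storcli reports, and `raid6_usable_gb` is a hypothetical helper. The controller may report a slightly different figure.

```shell
#!/bin/sh
# Usable capacity of a RAID6 array: (drives - 2) * size of one drive.
# $1: number of drives in the array, $2: size of one drive in GB
raid6_usable_gb() {
    awk -v n="$1" -v size="$2" 'BEGIN { printf "%.3f\n", (n - 2) * size }'
}

# 23 drives of 372.093 GB each, as in the two arrays created above
raid6_usable_gb 23 372.093
```

So each array gives the capacity of 21 drives, a little under 8 TB, and the pair roughly twice that.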

SCST configuration: /etc/scst.conf
Server-1 (with comments)
# Device handlers
## blockio works without the Page Cache, which suits SSDs
HANDLER vdisk_blockio {
    ## Device name; SCST derives t10_dev_id and usn from it,
    ## and ESXi uses them to identify the LUN.
    DEVICE SSD-RAID6-1 {
        ## Path to the block device (not /dev/sdX)
        filename /dev/disk/by-id/scsi-3600605b008b4be401c91ac4abce21c9b
        ## Write-through mode
        write_through 1
        ## The device is an SSD
        rotational 0
    }
    DEVICE SSD-RAID6-2 {
        filename /dev/disk/by-id/scsi-3600605b008b4be401c91ac53bd668eda
        write_through 1
        rotational 0
    }
}

# Target driver
TARGET_DRIVER qla2x00t {
    ## Port WWN, see /sys/kernel/scst_tgt/targets/qla2x00t
    TARGET 21:00:00:24:ff:54:09:32 {
        HW_TARGET
        enabled 1
        # Relative target ID, unique within ALUA (1-4 on this node and 5-8 on the other)
        rel_tgt_id 1
        ## LUNs exported through this target
        LUN 0 SSD-RAID6-1
        LUN 1 SSD-RAID6-2
    }
    TARGET 21:00:00:24:ff:54:09:33 {
        HW_TARGET
        enabled 1
        rel_tgt_id 2
        LUN 0 SSD-RAID6-1
        LUN 1 SSD-RAID6-2
    }
    TARGET 21:00:00:24:ff:54:09:80 {
        HW_TARGET
        enabled 1
        rel_tgt_id 3
        LUN 0 SSD-RAID6-1
        LUN 1 SSD-RAID6-2
    }
    TARGET 21:00:00:24:ff:54:09:81 {
        HW_TARGET
        enabled 1
        rel_tgt_id 4
        LUN 0 SSD-RAID6-1
        LUN 1 SSD-RAID6-2
    }
}

# ALUA settings
DEVICE_GROUP default {
    ## Devices in the group
    DEVICE SSD-RAID6-1
    DEVICE SSD-RAID6-2
    ## Targets of Server-1, the local group
    TARGET_GROUP local {
        ## Group ID
        group_id 256
        ## ALUA state
        state active
        ## Targets of Server-1
        TARGET 21:00:00:24:ff:54:09:32
        TARGET 21:00:00:24:ff:54:09:33
        TARGET 21:00:00:24:ff:54:09:80
        TARGET 21:00:00:24:ff:54:09:81
    }
    ## Targets of Server-2, the remote group
    TARGET_GROUP remote {
        group_id 257
        ## ALUA state
        state nonoptimized
        ## Targets of Server-2 with their relative IDs
        TARGET 21:00:00:24:ff:4a:af:b2 {
            rel_tgt_id 5
        }
        TARGET 21:00:00:24:ff:4a:af:b3 {
            rel_tgt_id 6
        }
        TARGET 21:00:00:24:ff:54:09:06 {
            rel_tgt_id 7
        }
        TARGET 21:00:00:24:ff:54:09:07 {
            rel_tgt_id 8
        }
    }
}


Server-2 (symmetrical)
HANDLER vdisk_blockio {
    DEVICE SSD-RAID6-1 {
        filename /dev/disk/by-id/scsi-3600605b008b4be401c91ac4abce21c9b
        write_through 1
        rotational 0
    }
    DEVICE SSD-RAID6-2 {
        filename /dev/disk/by-id/scsi-3600605b008b4be401c91ac53bd668eda
        write_through 1
        rotational 0
    }
}

TARGET_DRIVER qla2x00t {
    TARGET 21:00:00:24:ff:4a:af:b2 {
        HW_TARGET
        enabled 1
        rel_tgt_id 5
        LUN 0 SSD-RAID6-1
        LUN 1 SSD-RAID6-2
    }
    TARGET 21:00:00:24:ff:4a:af:b3 {
        HW_TARGET
        enabled 1
        rel_tgt_id 6
        LUN 0 SSD-RAID6-1
        LUN 1 SSD-RAID6-2
    }
    TARGET 21:00:00:24:ff:54:09:06 {
        HW_TARGET
        enabled 1
        rel_tgt_id 7
        LUN 0 SSD-RAID6-1
        LUN 1 SSD-RAID6-2
    }
    TARGET 21:00:00:24:ff:54:09:07 {
        HW_TARGET
        enabled 1
        rel_tgt_id 8
        LUN 0 SSD-RAID6-1
        LUN 1 SSD-RAID6-2
    }
}

DEVICE_GROUP default {
    DEVICE SSD-RAID6-1
    DEVICE SSD-RAID6-2
    TARGET_GROUP local {
        group_id 257
        state nonoptimized
        TARGET 21:00:00:24:ff:4a:af:b2
        TARGET 21:00:00:24:ff:4a:af:b3
        TARGET 21:00:00:24:ff:54:09:06
        TARGET 21:00:00:24:ff:54:09:07
    }
    TARGET_GROUP remote {
        group_id 256
        state active
        TARGET 21:00:00:24:ff:54:09:32 {
            rel_tgt_id 1
        }
        TARGET 21:00:00:24:ff:54:09:33 {
            rel_tgt_id 2
        }
        TARGET 21:00:00:24:ff:54:09:80 {
            rel_tgt_id 3
        }
        TARGET 21:00:00:24:ff:54:09:81 {
            rel_tgt_id 4
        }
    }
}
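Once SCST has loaded such a configuration, the ALUA state of each target group can be read back from sysfs, much as the resource agent's monitor action does. A minimal sketch, assuming the device group `default` and the target groups `local` and `remote` from these configs; `alua_state` is a hypothetical helper, and the `SCST_SYSFS` override is there only so it can be tried on a machine without SCST.

```shell
#!/bin/sh
# Root of the SCST sysfs tree; can be overridden for experiments.
SCST_SYSFS="${SCST_SYSFS:-/sys/kernel/scst_tgt}"

# Print the ALUA state of a target group.
# $1: device group name, $2: target group name
alua_state() {
    state_file="${SCST_SYSFS}/device_groups/$1/target_groups/$2/state"
    [ -r "${state_file}" ] || return 1
    # The first line of the attribute holds the state itself
    head -n 1 "${state_file}"
}

for tg in local remote; do
    if state="$(alua_state default "${tg}")"; then
        echo "${tg}: ${state}"
    else
        echo "${tg}: state not available" >&2
    fi
done
```

Going by the configs, Server-1 should show `local` as active and `remote` as nonoptimized, and Server-2 the other way round.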



That's all; I think this is enough for the first part. :)

Source: https://habr.com/ru/post/252403/

