
Ceph in ProxMox on ZFS

In a system administrator's job you constantly run into things and knowledge that are specific to your own shop. One of those things in our office is ProxMox installed on the ZFS file system, which gives you a solid RAID array without any hardware controllers. One day, wondering how else we could surprise and please our customers, we decided to put it all on top of the distributed Ceph storage. I'm not sure how sound that decision was, but I wanted to see it through. And then it started... I shoveled through mountains of articles and forums, but never found a single decent manual describing in detail what to do and how, so once I had it all working, this article was born. If you're interested, welcome under the cut.





So, pretty much everything is done in the console, and we don't really need the ProxMox web UI. I did all of this in test mode, so two virtual machines with four disks each were spun up inside a not-too-powerful physical ProxMox host (a kind of matryoshka). There were originally four disks because I wanted to build on ZFS RAID10, as on the future production hardware rather than this test box, but that didn't pan out for reasons unknown (honestly, I was too lazy to dig into it). It turned out ProxMox could not lay out ZFS RAID10 across the virtual disks, so a slightly different "geography" was chosen. ProxMox itself went onto one of the disks, a ZFS RAID1 mirror was built on two others, and the last one was meant for the Ceph journal, but I ended up forgetting about it, so for now let's leave it alone. So let's get started.


There will be a small introductory:


ProxMox is freshly installed on two nodes, named ceph1 and ceph2. Everything is done identically on both nodes, except where I point out otherwise. Our network is 192.168.111.0/24. The first node (ceph1) has the address 192.168.111.1, the second (ceph2) 192.168.111.2. The disks on both nodes are laid out as follows: /dev/vda is the disk ProxMox lives on, /dev/vdb and /dev/vdc are the disks intended for ZFS, and /dev/vdd is the disk for the Ceph journal.


The first thing we need to do is switch the paid ProxMox repository, which requires a subscription, to the free one:


nano /etc/apt/sources.list.d/pve-enterprise.list 

There we comment out the single existing line and add a new one below it:


 deb http://download.proxmox.com/debian jessie pve-no-subscription 
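
For ProxMox 4.x on Debian jessie the file should end up looking roughly like this (a sketch; the exact wording of the commented-out enterprise line may differ on your installation):

 # deb https://enterprise.proxmox.com/debian jessie pve-enterprise
 deb http://download.proxmox.com/debian jessie pve-no-subscription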

Next, update our ProxMox:


 apt update && apt dist-upgrade 

Install packages to work with Ceph:


 pveceph install -version hammer 

The next step is to build a cluster out of our ProxMox nodes.


On the first node we execute sequentially:


 pvecm create mycluster 

where mycluster is the name of our cluster.


On the second node:


 pvecm add 192.168.111.1 

Agree to accept the SSH key and enter the root password of the first node.


Check the result with pvecm status.
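
A quick sanity check that both nodes really joined (a sketch; run it on either node):

 pvecm status   # should report two nodes and that the cluster is quorate
 pvecm nodes    # lists the cluster members with their IDs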


Next, we initialize the Ceph configuration (done only on the first node, which will be “main”):


 pveceph init --network 192.168.111.0/24 

this will create a symlink at /etc/ceph/ceph.conf, which everything that follows will build on.


Immediately after this, we need to add an option to the [osd] section:


 [osd]
 journal dio = false

This is needed because ZFS does not support direct I/O.
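
One way to apply this is to simply append the option - a minimal sketch, assuming ceph.conf does not already contain an [osd] section; since /etc/ceph/ceph.conf is a symlink into /etc/pve, the change reaches both nodes:

 printf '\n[osd]\njournal dio = false\n' >> /etc/ceph/ceph.conf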


The next thing we do is prepare our ZFS pool. To do this, the disks need to be given a GPT partition table:


 fdisk /dev/vdb 

There we press g and then w (g creates a GPT partition table, w writes the changes). Repeat the same on /dev/vdc.
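
If you prefer not to drive fdisk interactively, the same step can be scripted; a sketch, assuming the disks are empty and anything on them may be destroyed:

 for d in /dev/vdb /dev/vdc; do
     printf 'g\nw\n' | fdisk "$d"   # g = create GPT table, w = write changes
 done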


Create a mirrored ZFS pool; following ProxMox convention, we will call it rpool:


 zpool create rpool mirror /dev/vdb /dev/vdc 

Check it with the zpool status -v command; you should see something like this:


   pool: rpool
  state: ONLINE
   scan: none requested
 config:

         NAME        STATE     READ WRITE CKSUM
         rpool       ONLINE       0     0     0
           mirror-0  ONLINE       0     0     0
             vdb     ONLINE       0     0     0
             vdc     ONLINE       0     0     0

 errors: No known data errors

The ZFS pool is created; time for the main part - Ceph.


Create a file system (an odd name, but that is what the ZFS docs call it) for our Ceph monitor:


 zfs create -o mountpoint=/var/lib/ceph/mon rpool/ceph-monfs 
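
A quick check that the dataset really mounted where the monitor expects it (a sketch):

 zfs list -o name,mountpoint rpool/ceph-monfs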

Let's create the monitor itself (first on the first node, then on the second):


 pveceph createmon 
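
Once the monitors exist on both nodes, the cluster should already answer status queries; a sketch (health warnings are expected at this point, since there are no OSDs yet):

 ceph mon stat   # should list the monitors on 192.168.111.1 and 192.168.111.2
 ceph -s         # overall cluster state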

Then comes the part I actually had to tinker with: how to carve a block device for the Ceph OSD (that is what it works with) out of ZFS, and have it actually work.


And it is all done quite simply - through a zvol:


 zfs create -V 90G rpool/ceph-osdfs 

90G is how much we hand over to Ceph to be torn apart. It is that small because the server is virtual and I did not give it more than 100G.
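
The zvol shows up as an ordinary block device, which is exactly what ceph-disk wants; a quick way to see which /dev/zd* node it was given (a sketch):

 ls -l /dev/zvol/rpool/ceph-osdfs   # a symlink pointing at the actual /dev/zd* device
 lsblk /dev/zd0                     # assuming this is the first zvol on the system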


Now let's create the Ceph OSD itself:


 ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid FSID /dev/zd0 

--fs-type xfs is chosen here because XFS is Ceph's default file system. FSID is our Ceph cluster ID, which can be found in /etc/ceph/ceph.conf. And /dev/zd0 is our zvol.
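
A sketch of how to pick the FSID up automatically rather than pasting it in by hand (ceph fsid asks the running monitors; the grep variant reads ceph.conf directly):

 FSID=$(ceph fsid)   # or: grep fsid /etc/ceph/ceph.conf | awk '{print $NF}'
 ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid "$FSID" /dev/zd0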


If after that your df -h doesn't show something like this:


 /dev/zd0p1 85G 35M 85G 1% /var/lib/ceph/osd/ceph-0 

then something went wrong and you either need to reboot or create the Ceph OSD again.
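
If the OSD does not come up on its own, these are the usual things to look at (a sketch; the partition name assumes the layout shown above):

 ceph-disk list                  # how ceph-disk sees /dev/zd0 and its partitions
 ceph-disk activate /dev/zd0p1   # manually activate the prepared data partition
 ceph osd tree                   # the new OSD should show up as "up" afterwards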


At this point our Ceph is essentially done, and we could carry on in the ProxMox web UI and create the RBD storage we need on it, but you still would not be able to use it (and that is what this was all started for). It is fixed in a simple way (the storage does still have to be created first) - you need to copy the Ceph key from the first node to the second.


Open the ProxMox storage configuration:


 nano /etc/pve/storage.cfg 

And add the RBD storage we need there:


 rbd: test
        monhost 192.168.111.1:6789;192.168.111.2:6789
        pool rbd
        krbd 1
        username admin
        content images

Here test is the name of our storage, and the IP addresses are where the Ceph monitors live, that is, our ProxMox nodes. The remaining options are defaults.


Next, create a directory for the key on the second node:


 mkdir /etc/pve/priv/ceph 

And copy the key from the first one:


 scp ceph1:/etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/test.keyring 

Here ceph1 is our first node, and test is the name of the storage.
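
Once the key is in place, the storage should report as available on both nodes; a quick check (a sketch):

 pvesm status | grep test   # the "test" RBD storage should be listed and enabled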


And that is it - the storage is active and working, and we can enjoy all of Ceph's goodies.


Thanks for your attention!


The following links were used to get all of this up and running:


» https://pve.proxmox.com/wiki/Storage:_Ceph
» https://pve.proxmox.com/wiki/Ceph_Server
» http://xgu.ru/wiki/ZFS
» https://forum.proxmox.com



Source: https://habr.com/ru/post/318548/

