
Rook or not Rook - that is the question



Earlier this month, on May 3, the major release 1.0.0 of Rook, a "management system for distributed data storage in Kubernetes," was announced. More than a year ago we published a general overview of Rook, and back then we were asked to share our experience of using it in practice. Now, in time for such a significant milestone in the project's history, we are happy to share our accumulated impressions.

In short, Rook is a set of Kubernetes operators that fully take control of the deployment, management, and automatic recovery of storage solutions such as Ceph, EdgeFS, Minio, Cassandra, and CockroachDB.
At the moment, the most mature solution (and the only one in the stable stage) is rook-ceph-operator.

Note: among the significant Ceph-related changes in the Rook 1.0.0 release are support for Ceph Nautilus and the ability to use NFS for CephFS or RGW buckets. Among the rest, the maturation of EdgeFS support to the beta level stands out.

So, in this article we:


Let's start with the general concepts and theory.

“I have an advantage of a whole rook!” (Unknown chess player)




One of Rook's main advantages is that it interacts with data storage through Kubernetes mechanisms. This means you no longer need to copy Ceph configuration commands from a cheat sheet into the console.

- Want to deploy CephFS in the cluster? Just write a YAML file!
- What? You also want an object store with the S3 API? Just write a second YAML file!

Rook is built by all the rules of a typical operator. Interaction with it happens via CRDs (Custom Resource Definitions), in which we describe the characteristics of the Ceph entities we need (since this is the only stable implementation, the article will talk about Ceph by default unless explicitly stated otherwise). Based on the specified parameters, the operator automatically executes the commands required for setup.
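
For example, a quick way to see what the operator brought with it is to query the registered CRDs. A minimal sketch (the exact list depends on the Rook version, and kube-rook is simply the namespace used in the manifests below):

# List the Ceph-related CRDs registered by the Rook operator
kubectl get crd | grep ceph.rook.io

# Show the object stores already described in the kube-rook namespace
kubectl -n kube-rook get cephobjectstores.ceph.rook.io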

Let's take a concrete look at an example of creating an Object Store, or rather, a CephObjectStoreUser.

apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: {{ .Values.s3.crdName }}
  namespace: kube-rook
spec:
  metadataPool:
    failureDomain: host
    replicated:
      size: 3
  dataPool:
    failureDomain: host
    erasureCoded:
      dataChunks: 2
      codingChunks: 1
  gateway:
    type: s3
    sslCertificateRef:
    port: 80
    securePort:
    instances: 1
    allNodes: false
---
apiVersion: ceph.rook.io/v1
kind: CephObjectStoreUser
metadata:
  name: {{ .Values.s3.crdName }}
  namespace: kube-rook
spec:
  store: {{ .Values.s3.crdName }}
  displayName: {{ .Values.s3.username }}

The parameters given in this listing are fairly standard and hardly need comments, but it is worth paying special attention to the ones assigned to template variables.

The general scheme of work boils down to this: through a YAML file we "order" resources, the operator executes the necessary commands for them and returns to us a secret that we can keep working with (see below). And from the variables listed above, the command and the name of the secret will be composed.

What command is that? When creating a user for the object storage, the Rook operator will execute the following inside its pod:

 radosgw-admin user create --uid="rook-user" --display-name="{{ .Values.s3.username }}" 

The result of executing this command is a JSON structure:

 { "user_id": "rook-user", "display_name": "{{ .Values.s3.username }}", "keys": [ { "user": "rook-user", "access_key": "NRWGT19TWMYOB1YDBV1Y", "secret_key": "gr1VEGIV7rxcP3xvXDFCo4UDwwl2YoNrmtRlIAty" } ], ... } 

The keys are what applications will later need to access the object storage via the S3 API. The Rook operator kindly collects them and adds them to your namespace as a secret named rook-ceph-object-user-{{ $.Values.s3.crdName }}-{{ $.Values.s3.username }}.
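
To peek at the generated keys manually, it is enough to read that secret back. A minimal sketch, assuming (hypothetically) that crdName=my-store and username=rook-user were substituted into the templates above:

# Hypothetical names: substitute your own .Values.s3.crdName / .Values.s3.username
SECRET=rook-ceph-object-user-my-store-rook-user

kubectl -n kube-rook get secret ${SECRET} -o jsonpath='{.data.AccessKey}' | base64 -d; echo
kubectl -n kube-rook get secret ${SECRET} -o jsonpath='{.data.SecretKey}' | base64 -d; echo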

To use the data from this secret, it is enough to pass it to the container as environment variables. As an example, here is a Job template in which we automatically create buckets for each user environment:

{{- range $bucket := $.Values.s3.bucketNames }}
apiVersion: batch/v1
kind: Job
metadata:
  name: create-{{ $bucket }}-bucket-job
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "2"
spec:
  template:
    metadata:
      name: create-{{ $bucket }}-bucket-job
    spec:
      restartPolicy: Never
      initContainers:
      - name: waitdns
        image: alpine:3.6
        command: ["/bin/sh", "-c", "while ! getent ahostsv4 rook-ceph-rgw-{{ $.Values.s3.crdName }}; do sleep 1; done" ]
      - name: config
        image: rook/ceph:v1.0.0
        command: ["/bin/sh", "-c"]
        args: ["s3cmd --configure --access_key=$(ACCESS-KEY) --secret_key=$(SECRET-KEY) -s --no-ssl --dump-config | tee /config/.s3cfg"]
        volumeMounts:
        - name: config
          mountPath: /config
        env:
        - name: ACCESS-KEY
          valueFrom:
            secretKeyRef:
              name: rook-ceph-object-user-{{ $.Values.s3.crdName }}-{{ $.Values.s3.username }}
              key: AccessKey
        - name: SECRET-KEY
          valueFrom:
            secretKeyRef:
              name: rook-ceph-object-user-{{ $.Values.s3.crdName }}-{{ $.Values.s3.username }}
              key: SecretKey
      containers:
      - name: create-bucket
        image: rook/ceph:v1.0.0
        command:
        - "s3cmd"
        - "mb"
        - "--host=rook-ceph-rgw-{{ $.Values.s3.crdName }}"
        - "--host-bucket= "
        - "s3://{{ $bucket }}"
        ports:
        - name: s3-no-sll
          containerPort: 80
        volumeMounts:
        - name: config
          mountPath: /root
      volumes:
      - name: config
        emptyDir: {}
---
{{- end }}

All of the actions listed in this Job were performed without stepping outside of Kubernetes. The structures described in the YAML files are stored in a Git repository and reused many times over. We see this as a huge plus for DevOps engineers and for the CI/CD process as a whole.

With Rook and RADOS, life is a joy


Using the Ceph + RBD combination imposes certain restrictions on mounting volumes into pods.

In particular, the namespace must contain a secret for accessing Ceph so that stateful applications can function. That's fine if you have 2-3 environments in their own namespaces: you can go and copy the secret by hand. But what do you do if a separate environment with its own namespace is created for every developer feature?

We solved this problem with the help of the shell-operator, which automatically copied the secret into new namespaces (an example of such a hook is described in this article).

#!/bin/bash

if [[ $1 == "--config" ]]; then
  # Tell shell-operator to trigger this hook whenever a namespace is added
  cat <<EOF
{"onKubernetesEvent": [
  {"name": "OnNewNamespace",
   "kind": "namespace",
   "event": ["add"]
  }
]}
EOF
else
  # Find the most recently created namespace and copy the Ceph secret into it
  NAMESPACE=$(kubectl get namespace -o json | jq -r '.items | max_by( .metadata.creationTimestamp ) | .metadata.name')
  kubectl -n ${CEPH_SECRET_NAMESPACE} get secret ${CEPH_SECRET_NAME} -o json \
    | jq ".metadata.namespace=\"${NAMESPACE}\"" \
    | kubectl apply -f -
fi

However, with Rook this problem simply does not exist. Mounting is performed using Rook's own drivers based on Flexvolume or CSI (the latter is still in beta) and therefore does not require secrets.
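
To illustrate, an application in any namespace can request storage with nothing but an ordinary PersistentVolumeClaim. A minimal sketch, assuming a block pool and a StorageClass named rook-ceph-block have already been created following the upstream Rook examples (both names are assumptions, not part of this article's setup):

# The StorageClass name below is an assumption taken from the upstream Rook examples
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: feature-branch-1        # any application namespace, no Ceph secret needed here
spec:
  storageClassName: rook-ceph-block  # handled by the Rook driver
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF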

Rook automatically solves many problems, which pushes us to use it in new projects.

Laying siege to Rook


Let's conclude the practical part by deploying Rook and Ceph so that you can conduct your own experiments. To make it easier to take this impregnable tower by storm, the developers have prepared a Helm package. Let's download it:

 $ helm fetch rook-master/rook-ceph --untar --version 1.0.0 

There are many different settings in the rook-ceph/values.yaml file. The most important thing is to specify tolerations for the agent and discover pods. We described in detail what the taints/tolerations mechanism can be used for in this article.

In short, we do not want the pods with the client application to end up on the same nodes as the disks used for data storage. The reason is simple: this way the work of the Rook agents will not affect the application itself.

So, open rook-ceph/values.yaml in your favorite editor and add the following block to the end:

discover:
  toleration: NoExecute
  tolerationKey: node-role/storage
agent:
  toleration: NoExecute
  tolerationKey: node-role/storage
  mountSecurityMode: Any

For each node reserved for data storage, add the corresponding taint:

 $ kubectl taint node ${NODE_NAME} node-role/storage="":NoExecute 

After that, install the Helm chart with the command:

 $ helm install --namespace ${ROOK_NAMESPACE} ./rook-ceph 
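
Before moving on, it is worth waiting until the operator pod is actually running. A small sketch using the same label that the status check below relies on:

# Watch until the Rook operator pod reports Running
kubectl -n ${ROOK_NAMESPACE} get pods -l app=rook-ceph-operator -w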

Now you need to create a cluster and specify the location of the OSDs:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  clusterName: "ceph"
  finalizers:
  - cephcluster.ceph.rook.io
  generation: 1
  name: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v13
  dashboard:
    enabled: true
  dataDirHostPath: /var/lib/rook/osd
  mon:
    allowMultiplePerNode: false
    count: 3
  network:
    hostNetwork: true
  rbdMirroring:
    workers: 1
  placement:
    all:
      tolerations:
      - key: node-role/storage
        operator: Exists
  storage:
    useAllNodes: false
    useAllDevices: false
    config:
      osdsPerDevice: "1"
      storeType: filestore
    resources:
      limits:
        memory: "1024Mi"
      requests:
        memory: "1024Mi"
    nodes:
    - name: host-1
      directories:
      - path: "/mnt/osd"
    - name: host-2
      directories:
      - path: "/mnt/osd"
    - name: host-3
      directories:
      - path: "/mnt/osd"

Check the status of Ceph - we expect to see HEALTH_OK:

 $ kubectl -n ${ROOK_NAMESPACE} exec $(kubectl -n ${ROOK_NAMESPACE} get pod -l app=rook-ceph-operator -o jsonpath='{.items[0].metadata.name}') -- ceph -s 

At the same time, let's check that the pods with the client application do not end up on the nodes reserved for Ceph:

 $ kubectl -n ${APPLICATION_NAMESPACE} get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName 

Next, optional components can be configured. More details about them are given in the documentation. For administration, we strongly recommend installing the dashboard and the toolbox.
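
As a sketch of those two steps (the manifest path refers to the v1.0.0 tag of the Rook repository, and the dashboard port assumes SSL is enabled; verify both against your installation):

# Deploy the toolbox pod from the example manifest in the Rook repository
kubectl apply -f https://raw.githubusercontent.com/rook/rook/v1.0.0/cluster/examples/kubernetes/ceph/toolbox.yaml

# Forward the Ceph dashboard served by the mgr module to localhost
kubectl -n ${ROOK_NAMESPACE} port-forward service/rook-ceph-mgr-dashboard 8443:8443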

Rook'i-hooks: Is Rook enough for everything?


As you can see, the development of Rook is in full swing. But there are still problems that do not allow us to completely abandon the manual configuration of Ceph:


Conclusions


“Right now Rook is shielded from the outside world by pawns, but we believe that one day it will play a decisive role in the game!” (The quotation was invented specifically for this article)

The Rook project has undoubtedly won our hearts - we believe that [with all its pros and cons] it definitely deserves your attention.

Our future plans boil down to making rook-ceph a module for the addon-operator, which will make using it in our numerous Kubernetes clusters even simpler and more convenient.


Source: https://habr.com/ru/post/451818/

