How I worked at CoreOS for a year

I first heard about CoreOS from Peter Lemenkov at the Yandex conference “Road to the Clouds” in September 2013. At the time I could not have imagined that I would take part in developing this OS.

CoreOS came up a second time in October 2014, when I was given the task of moving microservices written in Ruby (which, oddly enough, used different Ruby versions) to an environment better suited to continuous integration. That was when I first launched CoreOS, and it seemed awfully inconvenient to use. Its documentation was superficial. The services that turned CoreOS into a cluster OS had many flaws, and their constant errors caused nothing but annoyance. Moving even part of the infrastructure to CoreOS was out of the question.

The third time came in March 2015, when I was given the task of providing support as part of the CoreOS community support. How I coped with it is what this post is about.

Acquaintance

The first task was to build a cluster that performed functions close to the production system of one of the customers I worked with. I had to experiment with a Kafka-Storm-Cassandra stack. While building the proof-of-concept I ran into the same flaws in the documentation and in the etcd and fleet code. Even then it seemed illogical to me to bring up a ZooKeeper cluster when the system already had etcd. Unfortunately, so far nobody has wanted to write a translator from the ZooKeeper Jute protocol to the etcd REST API. The greatest difficulty at that point was writing topologies for Apache Storm. Thanks to the https://github.com/Yelp/pyleus project, I was able to avoid describing topologies in Java or Clojure. The working proof-of-concept was even successfully demonstrated to a potential customer, but, unfortunately, the project was never implemented because of funding problems.

Practice

With the experience and the bumps I had collected while learning CoreOS, I set about improving things. I took questions in the official IRC channel #coreos, answered them and wrote documentation. The most important thing is to take every incoming question seriously and try to answer even the silliest ones. It is worth noting that the new technologies I supported were as unknown to me as they were to the users asking about them. When a user had only just asked a question, I already had to reproduce their environment on my side, work out the details myself, dig into the source code and sometimes fix bugs there. This baptism of fire let me learn many of the nuances of systemd and the Linux kernel, practice C and learn to write Go.

Documentation

Very often, user questions were about systemd. Although the official systemd documentation describes everything in detail, users need examples; they have no time to read. Say what you like, but to win a user's heart you need ready-made examples, and lots of them. As part of CoreOS support I wrote quite a few examples and additional documentation related to systemd. For some queries Google even returned CoreOS documentation pages as the first results. Later the Arch Linux Wiki took over the top spot. Examples and explanations had to appear wherever a user could misinterpret the information. Almost every user misunderstanding about the documentation, and every “what if?” question that came to my mind, turned into a pull request on GitHub. If the answer to a user's question is not in the documentation, fix the documentation.

Reproduction

Before publishing an answer, I checked it, where possible, on the stable, beta and alpha releases of CoreOS. For that I adapted a bash script that used libvirt to bring up virtual machines for internal infrastructure from ready-made Ubuntu cloud images (I will probably write a separate post about it). When needed, the script downloads the official CoreOS image of the requested release ( https://stable.release.core-os.net/amd64-usr/current/ ), creates a snapshot on top of it and boots a virtual machine with a cloud-config attached. With snapshots, creating a “clean” virtual machine takes only 20 seconds (even less on an SSD). All virtual machines in a cluster share one base image, which saves both time and disk space; a cluster of three virtual machines, for example, comes up in just 30 seconds. The advantage of the libvirt-qemu approach over Vagrant is speed: even the libvirt provider for Vagrant was not as fast. Several dozen times a day I had to create and delete clusters that replicated a user's environment. The whole cluster was configured from a single cloud-config. Nowadays cloudinit is being actively replaced by Ignition, which runs at the earliest stage of system boot.
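
A rough sketch of that workflow, for illustration only. This is not the bash script mentioned above; it is a minimal Python sketch that only builds the download URL and the qemu-img commands and does not run anything. The image file name, the beta and alpha hosts (derived from the stable URL pattern), the qcow2 format of the base image and the node file names are all assumptions.

```python
from pathlib import Path
from typing import List

# Release channels named in the text.
CHANNELS = ("stable", "beta", "alpha")

# Assumed name of the QEMU image in the release directory; verify it
# against the directory listing before relying on it.
IMAGE_NAME = "coreos_production_qemu_image.img.bz2"


def image_url(channel: str) -> str:
    """Return the URL of the current image for the given release channel."""
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel}")
    # Host pattern assumed from the stable URL quoted in the text.
    return f"https://{channel}.release.core-os.net/amd64-usr/current/{IMAGE_NAME}"


def overlay_command(base: Path, overlay: Path) -> List[str]:
    """Build a qemu-img call creating a copy-on-write overlay on top of base.

    The overlay stores only the blocks that differ from the base image,
    which is why a "clean" machine is cheap to create.
    Assumes the base image is in qcow2 format.
    """
    return [
        "qemu-img", "create",
        "-f", "qcow2",      # format of the new overlay
        "-F", "qcow2",      # format of the backing (base) image
        "-b", str(base),    # shared, read-only base image
        str(overlay),
    ]


def cluster_commands(base: Path, workdir: Path, size: int) -> List[List[str]]:
    """One overlay per node; every node shares the same base image."""
    return [
        overlay_command(base, workdir / f"node{i}.qcow2")  # hypothetical names
        for i in range(1, size + 1)
    ]


if __name__ == "__main__":
    print(image_url("stable"))
    for cmd in cluster_commands(Path("coreos_stable.img"), Path("vms"), 3):
        print(" ".join(cmd))
```

Because each node only gets a thin overlay, deleting a cluster means removing the overlay files; the downloaded base image stays untouched and can be reused for the next reproduction.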

Steering

Do not forget the Kubernetes project, which is closely tied to CoreOS products. To study Kubernetes, I wrote a cloud-config that reproduces the configuration exactly as described in the official documentation written by the Kubernetes team at CoreOS. It also let me reproduce a problem a user had run into within a few minutes and propose a solution.

Solution

I tried to offer a solution to every user question, or at least a workaround. Even access to the CoreOS lobby did not help me much, because I provided support from the European part of the world while the entire CoreOS team worked in the Pacific time zone. So as not to wait until my colleagues woke up on the other side of the Earth, I often studied the source code to understand how the software worked, and sometimes fixed bugs in the code right away.

Opinion

Many people are still skeptical, even categorical, about containerization. You only have to mention Docker or CoreOS and a shower of ** pours down on you. I always try to point out that each task has its own tool, and each tool has its pros and cons. If a system already runs on virtual machines and there is no need to change it or spoil it with improvements, then let it keep working. Nobody is forcing you to move everything to containers. But when the task has already been set, or there is free time to play with containers, a misunderstanding arises. People who are used to virtual machines, kickstarts and configuration management systems apply the old approach to containers and then complain that containers do not work. I will not start a flame war; I will just link to http://12factor.net/ once again and remind you that in most cases containers should not store configuration inside themselves or depend on the host.
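
As an illustration of the twelve-factor idea of keeping configuration outside the container, here is a minimal Python sketch that reads settings from environment variables. The variable names DATABASE_URL and LISTEN_PORT and the Settings class are hypothetical, chosen only for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Runtime settings; nothing here is baked into the image."""
    database_url: str
    listen_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment (hypothetical variable names).

    DATABASE_URL is required; LISTEN_PORT falls back to 8080.
    """
    return Settings(
        database_url=env["DATABASE_URL"],
        listen_port=int(env.get("LISTEN_PORT", "8080")),
    )
```

The same image can then run unchanged in any environment; only the variables passed at start differ.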

Perfection

There is no limit to perfection. There was always work to do. On calm days I spent time on my TODO list. I wrote every small idea or flaw into that list, and the list never ended.

Context

The hardest part was keeping several tasks in my head and working on them at once. Sometimes I was chatting with five users at the same time in the IRC channel. It is very good brain training, but prolonged context switching is exhausting.

Fleet

As soon as the number of support requests began to fall, I joined a small team working on the fleet project. At that time the project had been abandoned, because all attention had shifted to Kubernetes. But there were still users who needed to work with systemd as a cluster. In three months we wrote about 15 new functional tests and set them up to run automatically on https://semaphoreci.com . By the release of version v0.12.0 we had closed 36 issues and made 20 changes to the code. By version v0.13.0 we plan to close about 20 issues and make 16 changes to the code.

Today

Today a CoreOS support team has been formed in San Francisco, and I have moved on to other projects. Over the year I was lucky enough to talk with people such as Matthew Garrett, Brian 'redbeard' Harrington, Lennart Poettering, Kelsey Hightower and others.

P.S. I suspect I cut into the commercial support revenue:

nalum: anyone using the CoreOS managed linux service?
balboah: no we just ask kayrus :)
nalum: Who is kayrus?
balboah

P.P.S. The CoreOS Fest conference takes place in Berlin next week. Those who want to attend still have time to buy tickets: https://coreos.com/fest/

Source: https://habr.com/ru/post/282964/
