📜 ⬆️ ⬇️

“The incident with Gitlab is a very good and revealing story”, - Alexey Lesovsky about PostgreSQL administration

Friends, today's publication opens a new section in the blog of the PG Day Russia conference: interviews with experts in the field of database maintenance. A conversation with a professional is a great opportunity to lift the veil of secrecy, learn the secrets of the profession, find out what and how the colleagues who have dedicated their lives to work with the DBMS earn. We hope that the upcoming issues will help you to look at the workflow from a new perspective, give you the opportunity to ask your question, get advice, or navigate further steps along your own career path.

In our pilot interview, we spoke with Alexey Lesovsky , DBA from Data Egret (formerly PostgreSQL-Consulting). Alexey is a specialist with many years of experience in the administration of PostgreSQL. Regular visitors to technical conferences know, not by hearsay, that his reports and master classes are renowned for the depth of work and attention to detail.

PG Day : Lesha, let's start with introductory information. Tell me briefly about yourself, how did you decide to become a DBA and how did you even get to such a life?

AL : In general, I didn’t have an idea to become a DBA. I did not seek this. I worked as a system administrator in a company that was involved in web projects, administered Linux servers, was engaged in virtualization. Their whole stack was built on modern technology. There were rails, there were memekashi, radishes and there was Postgres.
')


The team got tired of fussing around MySQL at some point, and they decided to transfer existing projects to Postgres. So I administered Postgres there as well. At first it was tasks from the series to see something, add, change rights, roll out staging for developers. Gradually, the range of tasks expanded. My colleagues and I carried out upgrades from the major versions and did it quite extraordinarily, not according to official documentation. Resorted to all sorts of interesting tools. As a result, I got carried away with databases, got carried away with Postgres.

Then I got a job at another company, where I met my future colleagues from the consulting field (Data Egret - Ed.), They worked as separate DBAs. We became friends and I went to work for them and became fully engaged in the administration of Postgres, database administration. And here practically I am not engaged more in Linux. I haven't been busy with administration for a long time. But with Linux and with PostgreSQL I still work.

PG Day : Despite the fact that you do not administer Linux, you still have very extensive experience, and you talk a lot about it. I think everyone will be interested in how you will learn some new things about Linux, what do you use for this? How do you maintain your skill level?

AL : First of all, this is LWN mailing . There are a lot of announcements there, what is happening in Linux itself, in its core, what changes and innovations are a very cool resource. In addition, I read PostgreSQL Planet - this is such a blog aggregator dedicated to PostgreSQL, contains many blogs and new articles are published very often. There is still a very good resource http://highscalability.com, it is dedicated to productivity: how big companies solve their IT problems. Very informative blog.

And, of course, I read a lot of official documentation . Postgres is a very complicated thing, there are so many things that I still don’t know. And all this can be found in the official documentation. But, if something is not in the documentation, you can always read the source code for PostgreSQL. The code is very well and thoroughly documented. You can always open the source code, start from some place and slowly get to the moment you need, see, search and find what interests you.

PG Day : Since we're talking about Linux, what tools do you use for your daily work with Linux? What do you think is most important in the first place?

AL : At home, I have Gentoo Linux on my computer, I have been using this distribution for a long time, I consider it fairly reliable and stable. The desktop environment is KDE , the terminal is yakuake, a very convenient Quake-like console that appears on hotkeys. Then everything is more or less standard - Chrome, Skype, Slack, Telegram, Libre Fffice, vim. There is also an offline program for taking notes - Keepnote, in it I keep notes, the knowledge base, todo lists. There used to be an even more cool Mytetra , but notes began to disappear in it, there were problems with the launch, and I left it. And if a business trip happens, I take a laptop with Windows, on which almost everything is the same. In general, no exotic.

The most important thing, I, of course, consider the availability of the Internet, given that I work from home and have the opportunity to work from anywhere in the world, it is important for me that there is Internet. It is also important to have an SSH client , SSH is such a protocol for working with remote Linux servers and in general, wherever there is an SSH daemon. In general, the most important is the Internet and SSH-client. And if I work on a Windows workstation, Putty is enough for me. I connect to the servers, I just have a plain text console, which is provided by SSH, and a command shell. There I already work with the servers themselves and use the utilities.

What utilities do I use on the servers? This is a standard admin set, quite simple. The top command, commands from the sysstat package . If you need to look at something more specific, related to hardware or drivers, then utilities that are designed for specific hardware are used.

But the main utility is, of course, top . Everything always begins with it. If you need to look at something, determine where to dig further, I open the top, I look, I plan to move further.

PG Day : Basically, in your work, the tools are usually quite household and standard. But have there ever been occasions when you had to dig deep, use some non-trivial solutions? Maybe even write some patches? Has he encountered such situations in his practice?

AL : I’ll say right away that I didn’t have to write patches, and I don’t plan much in the future, because this is a very long workflow. There are a lot of instructions on this topic, how to become a developer of the Linux kernel or Postgres, how to execute, send, receive patches, how to interact with the community.

But, nevertheless, complex and initially incomprehensible problems are periodically encountered, and there are tools with which to dig. Tools, as a rule, are also quite specific. They have kilometer pages of documentation and at first it is very difficult to use them, especially for beginners. But, nevertheless, when you use them often, you gradually understand what keys are and when to use them. Basically, these are the strace and ltrace utilities, if our program crashes with some kind of error or segfault occurs, or some kind of abnormal behavior that terminates the program. What do these utilities do? They allow you to keep track of which system and library calls the program performs during its operation. Very often they have to be combined with command line tools, for example, run in cycles or only when certain conditions occur.

Another utility from the same series is perf . This is such a powerful hammer that allows you to dig deep into what the program does. I launch perf, I specify parameters for the analysis and tracing. The program collects information. She gets the output file, which is then either analyzed using the same perf or other auxiliary scripts. But these programs have a rather high threshold of entry and you can use common examples that are given in the documentation. But gradually you need to understand what you can explore with these programs and somehow check, test, watch on combat loads. This requires experience and practice.

PG Day : Recently, you started a lot of lectures, master classes, trainings. Tell us about your impressions of this experience. What are people who come to your shows interested in? What do you like when working with the audience?

AL : Yes, that's right, I have been giving lectures, reports, speaking with master classes, I have been doing this for two years now. I started in 2014. I like to talk about what experience I have, what practical scenarios I have, cases have been in work, and this is the main thing. You can not tell the reports and master classes, if you have to deal with it.

In my reports I try to always pay attention to the practical side of exploitation , that is, less theory and more practice. What I encountered in my work, what problems, how we solved them with colleagues or independently. I try to cover all this in my reports. I hope that people go to my reports precisely because of this, because of the practical aspects of the activity.

PG Day : In the summer of 2017, you plan to give another big master class for the upcoming PG Day. I think the audience will be interested to know what awaits them at this master class, what you will tell, what problems you will tell, who might be interested in this?

AL : Yes, I am preparing a workshop on Postgres tuning, Linux tuning and troubleshooting for the summer post-season conference in St. Petersburg. I want to talk about how to configure Linux so that Postgres works well on it. How can I look for problems in Linux when Postgres runs on it? How to configure Postgres on the same Linux and how to search and fix problems in Postgres, which works on this very Linux.

I will talk about practical tools , with a minimum of theory. How to run utilities and tools, what settings affect the operation of the operating system and Postgres. Why do you need to customize them?

The main feature that is planned for the master class is a server on which all these techniques can be demonstrated. On this server, I will model scenarios for the behavior of Postgres, the operating system, and we will be able to “troubleshoot” them with the participants, look at the metrics, take some action, try to solve these problems. In the course of this master class, I will show all my main methods of troubleshooting and try to tell people how to do this with Postgres. I hope that participants can learn from some of my experience.

PG Day : I noticed that you were going to pay attention to the issue of equipment selection. Where did the need arise to carefully select equipment for working with the base, in particular Postgres?

AL : I decided to capture this moment, because we choose the operating system on which Postgres will work, and the operating system will work on some hardware. Accordingly, the question of the choice of iron also often arises. In the master class, I will spend some of the time selecting the hardware, which servers to choose and how to configure them. Workflow on the server can be very different: web servers, virtualization, databases. Some settings, such as virtualization for databases, are simply not needed. You can turn them off and save a little bit on performance. In the master class I will tell about it.

PG Day : Do not hold out and ask one question about file systems. What conference will not come, fierce battles, how to choose a file system for the database, and how they are better. How will your story differ from these traditional battles “file system No. 1 against file system No. 2”?

AL : In fact, we live in 2017, the issue of choosing file systems was already decided five years ago, and even earlier. Now for databases there are two file systems. These are ext4 and xfs . Both file systems give approximately the same results in any tests and are fairly stable in operation. They practically do not find any bugs. They work stably and well. Therefore, in my master class there will be no comparisons, the thesis is quite simple: use xfs, or use ext4, that's all.

No need to rush between exotic file systems like btrfs, zfs on Linux or, God forbid, ReiserFS. Personally, I prefer ext4 because there is such a joke as a reserved space for root files, 5% is allocated. This thing is very useful and very often rescued when disk space quickly runs out. This reserve can be reduced and a lot of free space appears in the file system. In xfs, too, there is such a thing, but it is somehow cleverly buried and does not obviously work. And by default there is a very small reserve. So at the moment I like ext4 more.

PG Day : In conclusion, we will ask a couple of questions that were wishes from the audience. The first question about the sensational story with Gitlab. There was a lot of controversy about whether or not they were messy. What can you essentially say on this issue? How to avoid such problems and what did they do wrong?

AL : Actually, this is a very good and revealing story. Their database was handled by a jack of all trades. From this whole story, it seemed to me that he didn’t have enough knowledge how to work with PostgreSQL, despite the fact that Posgres has very large and extensive documentation. He found that there is a lag between the master and the replica, the so-called lag. And he decided to perezalit cue . It is not right. In fact, it was necessary to act quite differently. And the second point is inattention, the lack of established procedures. In serious companies there are always instructions on how and what to do in emergency situations. All switchovers are spelled out with a level of detail that does not imply any ambiguity. All commands are spelled out, all decisions, branches yes / no.

In the event of such a situation, a person, an engineer on duty or an on-duty administrator on duty, gets this instruction and does everything according to the instructions. The absence of these instructions played a cruel joke. So I had to restore, improvise on the go, do something there, write to Twitter. Everyone was interested in watching it. People had fun. However, for the fact that the person made a mistake, it seemed to me, no extremely harsh measures were taken there. On the one hand, this is good, because unfortunately everyone makes mistakes. And I also make mistakes. On the other hand, the main thing is to learn from your mistakes and not to repeat them.

PG Day : If I correctly made a generalizing conclusion from what you said, in your practice, problems are usually due to the fact that people do not have any instructions and a clear understanding of what to do or there are some other common rakes on which people jump? Maybe some kind of farewell that you want to give to people?

AL : In fact, I don’t remember any well-established or frequent rake. Situations are always different, it is impossible to write instructions for all situations. We must build on the risks that exist in the infrastructure or in the same database. There is a risk of data loss, there is a risk of losing access to the database. These risks and they can be calculated. You can write basic instructions on them and use them.

It is also very often that something is overlooked somewhere. The place ended on the disk, the backups stopped being removed, the replica fell off, the caching broke and that's it. Everything is simple here, you need to use monitoring and configure to monitor everything.

What can advise people? Be attentive to the details , look at which console you are in, try to document all the unpleasant and new stories that have happened to you - you will have a good knowledge base. From these stories you can always draw some conclusions so that these stories do not repeat in the future. The most important thing is to be attentive and, of course, to study what you work with. If you work with Postgres, learn Postgres, work with Ruby - learn ruby. Take time to self-education.

Subscribe to newsletters that are associated with Postgres, and in general with databases. Such mailings allow you to monitor trends, what is happening now in the field of DBMS. From the same newsletters, you can take links to useful blogs, and read them between newsletters. Still necessarily need fundamental knowledge related to the device DBMS. There are interesting books about the theory of device transaction engines, transaction logs and much more (Stonebraker, Weikum & Vossen, etc.)

We hope that the interview with Alexey was interesting for you. We are pleased to invite everyone to St. Petersburg to our PG Day'17 Russia conference, during which Alexei will conduct an intensive training Call of Postgres: Advanced Operations , which contains a wealth of information on how to configure PostgreSQL and hardware with the Linux operating system for optimal performance. Will consider the numerous components and their configuration. Separately, the methods of diagnosing and troubleshooting the operating system and Postgres itself will be studied.

See you at PG Day'17 Russia !

Source: https://habr.com/ru/post/328006/


All Articles