Becoming a PostgreSQL Contributor

In this article I would like to describe what PostgreSQL development looks like through the eyes of one of its contributors. I started working on this DBMS in December 2015, when I joined Postgres Professional. That is not so long ago, so the things that were not obvious to me at first are still fresh in my memory. I want to write them down to make life easier for new people joining our team, and for anyone who wants to try their hand at developing an open source relational DBMS. I will cover how the PostgreSQL development process works, which tools I use in my daily work, how to prepare patches, and so on. If that sounds interesting, read on.

Set of tools

The question that excites the minds of millions: which IDE or text editor should you use? :) Practice shows that you can develop PostgreSQL in just about anything. Some of my colleagues use Sublime Text, some prefer Vim, some Emacs, and there are also KDevelop and Visual Studio Code users. I personally used CLion quite successfully at first and have since switched to Vim + ctags. The main thing is that the editor has syntax highlighting and go-to-definition, and perhaps a few simple extras such as variable renaming and spell checking. Clever automatic refactoring is unlikely to be needed, because a patch containing the results of such refactoring is unlikely to be accepted easily.

The second, no less exciting question: which OS or Linux distribution should you choose? Many developers in our company use Ubuntu. There are also macOS users. Nobody seems to work on Windows directly; to develop for that platform people usually run a virtual machine. There is one Arch Linux user. I used Ubuntu for a long time myself, but recently bumped my head and switched to FreeBSD. In general, any *nix system should do.

PostgreSQL compiles successfully with GCC, Clang, and Visual Studio, and possibly with some other compilers (Intel C++ Compiler?). Moreover, the community strives to keep the code compatible with all of these compilers, so you can use any of them. You can also use your favorite debugger, be it GDB, LLDB, something built into your IDE, or even WinDbg.
The PostgreSQL code lives in Git. Besides the official repository there is a mirror on GitHub, but it is purely a mirror: there is no point in opening issues or sending pull requests there. While you are developing a patch, nobody cares which version control system you use, but the patch itself is usually sent as the output of the git diff command.

As a first approximation, I do not think I have forgotten anything. From time to time I also use perf, tcpdump, strace / truss, dtrace, rr, lcov, various static analyzers, and other tools, but the need for them arises as an exception rather than the rule. The main development tools are a text editor, git, a compiler, a debugger and, of course, your brain. Oh, and an email client, but more on that below.

Building, running tests, and so on

PostgreSQL currently uses Autotools. Autotools is not a very pleasant thing in itself, and it is not designed for Windows. So for building PostgreSQL on that platform there is a separate set of Perl scripts, which is something of a kludge. My colleague Yury Zhuravlev is trying to push through a patch that moves PostgreSQL to CMake. It is not easy, though, because the current PostgreSQL extension system is tightly coupled to Autotools.

All projects using Autotools are built in approximately the same way:

    ./configure --prefix=...
    make -j4 -s
    make check
    make install

For quick local deployment of PostgreSQL I use a set of scripts, many of which Stas Kelvich shared with me.

A subtle point that trips up every novice PostgreSQL contributor without exception: if you change a .h file, do not forget to run make clean. By default, when a .h file changes, the .c files that depend on it are not rebuilt. If you are unaware of this, you can observe a wide range of interesting magical effects :)

An idea for the first patch, and how else you can help the project

People looking for patch ideas are often pointed to the TODO list. In my opinion this is rather harmful advice, for several reasons. First, the list is not always up to date. Second, it contains items that nobody knows exactly how to do properly, which is why they were simply added to the TODO in the hope that inspiration would strike someday. Third, most of the tasks on the list are quite complex. I would advise starting with something simpler.

The easiest way is to look for typos in the code and documentation. There really are a lot of them. This happens because committers often rewrite proposed patches a little, or more than a little, before merging them. The result is an entirely new patch that nobody has read, hence the typos. You can simply follow new commits and send one or two patches a week. It is hard to break anything by fixing code comments, so such a patch will be gladly accepted.

Sometimes a piece of code can be refactored a little. This is also a fairly simple change: make the code cleaner and more correct, run the tests, and if nothing is broken, propose a patch.

Bug fixes. Bugs (usually minor ones) are regularly reported on the pgsql-bugs@ mailing list. Fixing a bug is usually easy pickings: write a test that reproduces the bug, rewrite the code so that the test no longer fails, and send the patch.

Optimization. Also easy pickings: the code should do the same thing, only faster. Write a benchmark that reproduces the performance problem, rewrite the code so that it runs faster, and send the patch.

Improving documentation and comments. Say you are trying to understand how some code works and cannot. It looks like you have found a place where the code comments can be improved!

You can often find something to patch by building the project with an unusual compiler (for example, a very old or very new version of GCC), on an unusual platform (ARM, PowerPC, ...), under an unusual operating system (NetBSD, OpenIndiana). The tests usually do not fail, but a couple of compiler warnings may slip through. Running a static analyzer over the code often helps as well.

If you have no idea for a patch of your own, you can help the project significantly by reviewing and/or testing someone else's patch. Programmers, as a rule, love writing code but are not so fond of reviewing and testing it, so the PostgreSQL community is really short of reviewers. Being a reviewer, by the way, is fairly simple. You need to make sure that the patch applies, that the code then compiles and passes the tests, and that the problem the author set out to solve is actually solved. If it is not clear to you how to verify this, the author may not have described it well enough. That is a reason to ask the author a question in the corresponding thread and move the patch to the Waiting on author status. And if you can also read the code and give sensible advice on renaming variables or splitting procedures into several smaller ones, you are simply priceless! Code review, submitting patches to a commitfest, and the various patch states are discussed below.
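
The mechanical part of a review can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name review_patch is made up, the patch is assumed to be git diff output, and the demo at the bottom runs against a throwaway toy repository rather than a PostgreSQL checkout, where the build and test steps would be the make and make check commands from the build section.

```shell
set -eu

# review_patch SRC_DIR PATCH_FILE [BUILD_CMD] [TEST_CMD]
# Hypothetical helper: checks that a patch applies, builds, and passes tests.
# PATCH_FILE should be an absolute path, because the function changes directory.
review_patch() {
    src=$1
    patch=$2
    build_cmd=${3:-"make -s"}
    test_cmd=${4:-"make -s check"}
    (
        cd "$src"
        git apply --check "$patch" || { echo "patch does not apply"; exit 1; }
        git apply "$patch"
        sh -c "$build_cmd" || { echo "build failed"; exit 2; }
        sh -c "$test_cmd"  || { echo "tests failed"; exit 3; }
        echo "patch applies, builds and passes tests"
    )
}

# Demo on a throwaway repository (toy files, not PostgreSQL).
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
echo 'echo helo' > hello.sh
git add hello.sh
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "initial"

# The "author" fixes a typo and saves the change as git diff output.
echo 'echo hello' > hello.sh
git diff > "$work/fix-typo.patch"
git checkout -q -- hello.sh        # back to the unpatched tree

# In the toy project "building" is a syntax check and "testing" compares output.
result=$(review_patch "$work/repo" "$work/fix-typo.patch" \
    'sh -n hello.sh' 'test "$(sh hello.sh)" = hello')
echo "$result"
```

The last check on the list, whether the patch actually solves the problem the author describes, cannot be scripted and remains the reviewer's job.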

About mailing lists and blogs

All communication between PostgreSQL developers takes place on the pgsql-hackers@ mailing list. It also makes sense to subscribe to pgsql-committers@, which carries notifications about the latest commits to master and occasionally a discussion of a specific commit. The traffic on these two lists is not that heavy; this is not LKML. It is quite realistic to read them in your main mailbox without any filters (although I do not read every thread). I personally receive them all at my work email address.

It may also make sense to subscribe to pgsql-general@ (general questions) and the already mentioned pgsql-bugs@ (bug reports), but strictly speaking this is not required for development.

As for the choice of mail client, any will do in principle. Many people use Thunderbird. I stayed with Claws Mail for a long time and have now moved to Mutt. I have seen one of my colleagues use Gmail.

It is good practice not to send HTML email to the list. Wrap the text of the message at 72 characters. Needless to say, only English is used. Unlike on LKML, attachments are not prohibited. Large attachments, however, are better uploaded somewhere else than sent directly to the mailing list.
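
One way to check the line width before sending is with standard Unix tools. The sketch below assumes the draft is kept in a plain-text file (the file name draft.txt, its contents, and the helper longest_line are made up) and uses fold to wrap it at 72 characters; most mail clients can do the same wrapping themselves.

```shell
set -eu

work=$(mktemp -d)
draft="$work/draft.txt"

# A made-up draft with one overly long line.
cat > "$draft" <<'EOF'
Hello hackers,

The attached patch fixes a typo in a comment. It does not change any behavior, so no new tests are included.
EOF

# Hypothetical helper: prints the length of the longest line in a file.
longest_line() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' "$1"
}

echo "longest line before wrapping: $(longest_line "$draft")"

# fold -s breaks lines at spaces, -w sets the maximum width.
fold -s -w 72 "$draft" > "$work/wrapped.txt"

echo "longest line after wrapping: $(longest_line "$work/wrapped.txt")"
```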

As far as I know, the PostgreSQL community has no code of conduct. That does not remove the need to be polite, to avoid sarcasm, never to get personal, and so on. Emails, especially in English, often come across as somewhat dry, so it is a good idea to use more words such as please and thank you. I personally try to start every message with something like “Thank you everyone for great comments!” and finish with something like “As always, please.” Try it and you will be surprised how much friendlier the community becomes.

Perhaps I should say a few words about the main figures on the mailing list, such as Tom Lane, Simon Riggs, Robert Haas, Andres Freund, Alvaro Herrera, Bruce Momjian, and others. The problem is that there are quite a lot of them, and it is hard to say in advance who will take an interest in your patch. So I will just say that, at first, it is a good idea to read the signatures of the people who reply to you, look at the domains of their email addresses, and search for their names in git log or, failing that, on Google.

By the way, some people in the PostgreSQL community write blogs (which are easy to find with Google) that are well worth subscribing to. I am currently subscribed to the following PostgreSQL-related RSS feeds:

# PostgreSQL
http://postgresmen.ru/news.xml
http://planet.postgresql.org/rss20.xml
http://habrahabr.ru/rss/company/postgrespro/blog/
http://www.postgrespro.ru/rss
http://www.postgresql.org/news.rss
http://postgresweekly.com/rss/1ijl6aaa
http://postgres-edu.blogspot.com/feeds/posts/default
http://feeds.feedburner.com/depesz
http://rhaas.blogspot.com/feeds/posts/default
http://amitkapila16.blogspot.com/feeds/posts/default
http://obartunov.livejournal.com/data/rss

Note that the list includes Planet PostgreSQL, which aggregates many blogs not listed here.

How to send a patch

In general, before starting work on a large patch it makes sense to send a proposal to pgsql-hackers@ describing what you want to do, how, and why. It may turn out that nobody needs it. Or, on the contrary, that it is needed so badly that several solutions have already been proposed over the past five years, which you do not know about and should study first. Or you may simply get a couple of tips on the implementation: where to look, which edge cases to take into account, and so on. PostgreSQL developers are busy people with plenty to do, so do not be afraid that someone will steal your brilliant idea. More likely you will be told that it is unlikely to work and given the opportunity to prove otherwise.

On code style. PostgreSQL is written in ANSI C, so forget about C11, C++, or Rust right away. Code is formatted with the pgindent utility. Instructions for building it are in the PostgreSQL sources, in the file src/tools/pgindent/README. Always run your code through pgindent before creating a patch, otherwise nobody will even look at it. (But make sure that pgindent does not change code you have not touched! If it does, it may be easier to format the code by hand.) Beyond that there are no particularly strict rules. Just look at how the code is styled around the place you are working on and try to write the same way.

When the patch is ready, send it to pgsql-hackers@ with the [PATCH] tag and a brief description in the subject line. In the body of the message, explain which problem the patch solves and how. Read the mailing list archive to see what this usually looks like. If the patch is small, for example it fixes a couple of typos, it may be accepted right away without further questions. In more complex cases the patch must be submitted to the nearest commitfest.
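
Preparing the patch file itself might look like the sketch below. The branch name fix-typo, the file README.demo, and the patch file name are invented for the example, and everything runs in a throwaway repository so that the sketch is self-contained; in a real PostgreSQL checkout only the last three git commands would be needed.

```shell
set -eu

# Throwaway repository standing in for a PostgreSQL checkout (assumption).
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is "master"
echo 'PostgreSQL is a relational databse.' > README.demo
git add README.demo
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "initial"

# Work on a topic branch; how you commit there is up to you.
git checkout -q -b fix-typo
echo 'PostgreSQL is a relational database.' > README.demo
git -c user.name=Demo -c user.email=demo@example.com commit -q -a -m "Fix typo"

# What gets mailed is the plain diff against master.
git --no-pager diff --check master     # reports whitespace errors, if any
git --no-pager diff --stat master      # quick look at which files are touched
git diff master > "$work/readme-typo.patch"
```

Looking at the --stat output before sending is a cheap way to notice files that ended up in the patch by accident, for example after running a formatter.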


A commitfest is the local name for a sprint. One commitfest lasts one month. For example, the September commitfest is open right now, and all new patches are added to it. At the beginning of September, review of the patches in the September commitfest will begin, and all new patches will go into the November one (there is no commitfest in October; that month is spent fixing bugs and so on). This continues until March, for a total of four commitfests: in September, November, January, and March. Then comes the code freeze: bugs are fixed and alpha and beta releases are prepared.

Patches in a commitfest can be in different states, all with self-explanatory names. Needs review means that the patch needs a code review. Waiting on author means that some action is required from the patch author. Ready for committer means that the patch has passed code review and there are no outstanding questions about it; one of the committers can now look at it and either commit it or return it to the author for rework. And so on.

Be patient. If nobody reacts to your patch, it does not mean that nobody needs it; everyone is simply busy with other patches right now. If your patch is in the commitfest and is not stuck in Waiting on author, nobody will forget about it, do not worry. If a reviewer or committer has replied to you, read the reply carefully, make the appropriate changes, and send a new version of the patch. In my personal experience, arguing with reviewers or committers is a thankless task. It is faster to fix the code and send the corrected patch. Besides, you often realize afterwards that the reviewer or committer was right and you were wrong. That said, some of my colleagues have had a different experience and believe, on the contrary, that you should always argue.

While you are waiting for a reaction to your patch, it is a good idea to review someone else's patch. There is an unspoken rule in the PostgreSQL community: if you keep sending patches but never review anyone else's, people will very quickly stop reviewing yours. Moreover, the sooner the other patches in the commitfest are accepted or rejected, the sooner your turn will come and the more time you will have to make changes before the commitfest closes.

Conclusion

Additional materials for self-study:

The video course "Hacking PostgreSQL" by Anastasia Lubennikova. A wonderful course on PostgreSQL internals. Videos and slides are available.
The book Database System Implementation. They say this is exactly how PostgreSQL works.
Basics of debugging with GDB. There you will also find links to articles on debugging with LLDB, on the great rr tool, and more.
How to profile code using perf, bcc / eBPF, and other tools. The articles also link to materials on DTrace and SystemTap.
Valgrind tutorials and static analyzers for C / C++. These tools help find all sorts of bugs in code, and knowing how to use them is extremely useful.
Our company is always hiring. The work is interesting, although somewhat specific. Having been used to weekly sprints with new code rolled out every week, I found it hard to adjust for a long time.

That's all I wanted to talk about today. If you have questions, I will be happy to answer them in the comments.

Continued: Contributing to PostgreSQL: examples of real patches, part 1 of N

Source: https://habr.com/ru/post/308442/
