Hello to all!
Today I am pleased to announce the course “Hacking PostgreSQL” from 16 lessons, where we will explore together the features of the open database architecture and make changes at the source code level. The course will be held in Moscow, at the Postgres Professional site. Beginning of the course is scheduled for February 2016. Lectures will begin immediately after the February conference
pgconf.ru and will be held once a week in the evening. We will upload videos and lecture materials as they are processed.
The course is compiled from the personal experience of the developers of our company, materials from conferences, articles and thoughtful reading of the documentation and source codes. First of all, it is addressed to novice PostgreSQL core developers. But it will be interesting to DBA, which sometimes have to get into the code, and just everyone who is not indifferent to the architecture of a large system, who wants to learn “How does it really work?”
We specifically announce the course in advance to prepare lectures based on your comments. And also because there are prerequisites:
- SQL. Specific knowledge is not required, but you should be aware of how indices, transactions, and so on work on the part of SQL. Materials to choose from: PostgreSQL Tutorial and SQL Tutorial
- C. There will be a lot of code. If you just want to see, it’s enough to be able to read. But in order to perform practical tasks, it is better to refresh knowledge. For example, in this course Practical Programming in C
- Basic data structures and algorithms. You can read here and see the course here .
We want to create an open database of knowledge about the internal structure of the database using the example of PostgreSQL. We hope that heated discussions with administrators will reach a new level of mutual understanding, and users with developers will finally be able to speak the same language. And while preparing the course, we reread the code, correct minor issues and outdated comments and find interesting tasks that can be added to the TODO.
')
You can not grasp the immensity, especially in 16 classes. Although we tried very hard. As a result, the course program turned out this:
1. Architecture Overview
The first lecture is intended to give a general idea of PostgreSQL subsystems and their connections, as well as briefly define the terms that we will use in the following lectures.
2. PostgreSQL Community and Developer Tools
A digression about how the PostgreSQL community works, how international development is coordinated, what steps you need to go through to get your patch accepted, and some useful little things that will help a novice kernel developer. As well as an overview of the tools you need to know how to use.
3. Extensibility
In the third lecture, the creation of its own extension (contrib) is discussed in stages. This is the easiest and most natural way to join the development and add new functionality to PostgreSQL. In addition, the internal extensibility of the postgress will be considered on the example of adding a new data type.
4. Review Source Code
In this lecture, we will follow the path of performing various queries, from receiving the query text to issuing the result.
5. Features of the code
In order to understand the source code of the system, and even more to develop it, you need to get acquainted with the accepted agreements. This lecture focuses on the Datum data type, variable-length object handling, various macros, and calling conventions.
6. System directory
The system catalog contains metadata about all system objects. In addition to information about user tables, functions, and triggers, it also stores information about data types, operators, and methods for accessing them, and much more. The lecture provides an overview of the main catalog tables and the interface for interacting with them.
7. Physical data presentation
After this lecture, you will know how attributes are laid in a line, lines on a page, pages in a table, and tables in a database. And also how Postgres handles data alignment and storing large attributes. This will help to see some of the limitations of the architecture and to understand (and maybe come up with new) techniques for designing the optimal database schema.
8. Work with memory
For convenience and efficiency of working with memory instead of the standard C functions (malloc / free), PostgreSQL uses their analogs palloc / pfree. This lecture will discuss how memory contexts are organized, how to use them correctly in code, and what situations can lead to unexpectedly large memory consumption.
9. Shared memory and locks
In this chapter, you will learn how PostgreSQL’s memory manager is organized, how many different types of locks are used to make parallel transactions work correctly, and what actually changes the memory settings in postgresql.conf.
10. Nodes & Trees
Information about the SQL query in PostgreSQL is contained in structures of type Node. They are used as nodes of the query tree (Query tree), in parse trees (Parse tree) and the scheduler (Plan tree). In this lecture, we will look at the main types of nodes and analyze the operation algorithms of these trees.
11. PostgreSQL Core Patch
12. Debugging. Testing functionality and performance
In the lecture, we will talk about debugging and testing tools that can be used to explore PostgreSQL. And also about how to stop comparing warm to soft and start testing performance correctly.
13. Transactions. MVCC and VACUUM
14. WAL. Both recovery and replication
WAL, aka Write-Ahead Log, aka Advanced Write Log. In this lecture we will try to tell the most useful about its format, use and settings.
15. Indices
A story about how different types of indexes are arranged in PostgreSQL, reasoning about data structures, tips on how to efficiently use and maintain indexes. As well as an invitation to discuss with the developers about the prospects and new ideas for development.
16. Trends in the development of the DBMS in general and PostgreSQL in particular
The final lecture of this course, where we discuss the prospects for the development of PostgreSQL. Different types of clusters, column storage, in-memory structures, parallel query processing, hints, spatial data, and other interesting tasks that are too large for this course.
I will co-ordinate and read this course, Anastasia Lubennikova, PostgreSQL core developer.
Suggestions and suggestions leave in the comments.
UPD:
The course will be open and free. There may be restrictions only on the capacity of the audience. Therefore, sign up in advance so that we can estimate the actual number of applicants.
Sign up
here