
Questions about relational DBMSs for which there is never enough time

Introduction


By the will of fate, I have been working in an enterprise setting for about five years, and over that time a pool of questions about relational databases has accumulated that regularly pops into my head before going to sleep.

But finding detailed answers to them is time-consuming: you get sent off into the wonderful land of multi-volume manuals, statistics courses, and thick books.

I am asking the community for help in eradicating my own and others' ignorance.
Note: answers to some of the questions were touched on in this article, but not fully covered.
I will hide my empirical assumptions under the cut.

Questions


  1. What happens in the DBMS when table size + cache + indexes << RAM?

    Assumptions and new questions
    Logic and some experience suggest that if the database's configured memory footprint is smaller than the RAM, and the system has enough memory to hold it, the whole database should end up in RAM. But since ACID requires consistency and durability, writes must still be committed to disk, at least through the log subsystem. Is that really so? If not, is it possible to configure the DBMS so that reads are always served from RAM?
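    A sketch of how one might check this in PostgreSQL (assuming pg here, since it appears in the examples below): shared_buffers sets the size of the database's own cache, and the pg_statio_* views show how many block reads were served from that cache versus requested from the OS.

      -- How often are table reads served from shared_buffers?
      -- heap_blks_hit  = blocks found in the buffer cache
      -- heap_blks_read = blocks requested from the OS (possibly still its page cache, not disk)
      SELECT relname,
             heap_blks_hit,
             heap_blks_read,
             round(heap_blks_hit::numeric
                   / nullif(heap_blks_hit + heap_blks_read, 0), 4) AS cache_hit_ratio
      FROM pg_statio_user_tables
      ORDER BY heap_blks_read DESC;

    A ratio close to 1 suggests the working set fits in the cache; durability is still provided by the WAL being flushed on commit.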

  2. If the table size >> RAM, but there is no index on it (or the query does not hit one), how does the memory manager determine how much space the selection from the table may occupy?

    Assumptions
    The required amount of space can be estimated statistically, from the distribution of values within the columns (such data is usually available in the DBMS statistics subsystem).
    The amount of available memory can be estimated from the average load recently, giving the expected amount of free space.
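    As a sketch of where such statistics live in PostgreSQL: the planner's per-column estimates are exposed in the pg_stats view, and the resulting row-count and row-width estimates appear directly in EXPLAIN output (the table "orders" here is hypothetical).

      -- Column-level statistics the planner works from
      SELECT attname, n_distinct, avg_width, null_frac
      FROM pg_stats
      WHERE tablename = 'orders';

      -- The planner's estimate (rows x width) for a filter with no usable index
      EXPLAIN SELECT * FROM orders WHERE amount > 100;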

  3. How and where does the sequential scan of table rows for filtering happen (and from where: memory/disk)? What kinds of locks will occur (for databases with and without row versioning)?

    Assumptions
    • Most likely, there is a facade of a single memory manager that can pull a value from the cache/index and, if it is not there, fetch it from disk.
    • Surely it has a queue of requests for fetching rows. The likely order in the architecture: the query optimizer decides where to take the value from (table or index); a manager determines whether that index/table is in memory or on disk; the cache answers whether the index/table rows with a given identifier are present; the needed part of the index/table is then pulled into memory and run through RAM, selecting the desired values.
    • Where exactly: most likely in a separate memory pool of the process; if there is no room, the OS swaps, or the database maps it to disk itself.
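    The memory-vs-disk part of this can be observed in PostgreSQL at least; a sketch (the table "events" is made up): EXPLAIN with the BUFFERS option reports, per plan node, how many blocks came from the buffer cache and how many had to be read in.

      -- Where do the blocks of a sequential scan come from?
      EXPLAIN (ANALYZE, BUFFERS)
      SELECT * FROM events WHERE payload_size > 4096;
      -- Typical output fragment:
      --   Seq Scan on events ... Buffers: shared hit=1220 read=9310
      -- "hit" blocks were served from shared_buffers, "read" blocks from the OS/disk.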


  4. What happens if the size of a matching result set >> RAM? Which databases can stream data to the client as it is produced, and which return it only when the full result set is ready? Can this behavior be configured?

    Assumptions and new questions
    • The locks will, of course, depend on the consistency settings required of the database (see the Wikipedia article on transaction isolation levels).
    • The question to which I still have no easy answer: can rows change while a large volume is being read, despite the read in progress? Would this fall under the "dirty read" situation? What happens if another transaction tries to change the row we are reading right now? Can a reading transaction block other transactions? With row versioning everything is simpler: we read rows whose version = last committed at the moment the read started.
    • Theoretically, it should be spilled to disk by the database or the OS.
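    For the streaming part of the question, PostgreSQL at least lets the client pull rows incrementally through a cursor instead of materializing the whole result first; a sketch (the table name is hypothetical):

      -- Stream a huge result set in batches instead of materializing it at once
      BEGIN ISOLATION LEVEL REPEATABLE READ;  -- keep a consistent snapshot while streaming
      DECLARE big_scan CURSOR FOR
          SELECT * FROM events WHERE payload_size > 4096;
      FETCH 1000 FROM big_scan;  -- repeat until no rows come back
      CLOSE big_scan;
      COMMIT;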


  5. How does the query optimizer formally prove the equivalence of conditions in the ON clause of a JOIN and in the WHERE clause of the same query? An example is under the cut.

    Example and Assumptions
    • In pg, I have repeatedly noticed that there is no difference in performance or query plan between

      select * from a inner join b on a.id = b.id where an > N and bn < N
      and
      select * from (select * from a where an > N) as a
          inner join (select * from b where bn < N) as b on a.id = b.id

    • Even for more complex queries, where the conditions sat somewhat "higher" in the query construction tree, pg somehow proved the equivalence and selected for the join not all the data, but only the rows matching the filter.
    • That is, there is some mechanism for propagating selection predicates down the query tree. MS SQL 2008 behaved worse and showed a significant difference in query time. Why?
      What methods of logic/programming can prove the equivalence of ON and WHERE?
      When and where does it work?
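    The effect is visible directly in the plans; a sketch (assuming pg and the two tables above, with N replaced by a literal): both formulations should produce the same plan, with the predicates pushed down into the scans below the join.

      EXPLAIN SELECT * FROM a INNER JOIN b ON a.id = b.id WHERE an > 10 AND bn < 10;
      -- Expected plan shape (abridged):
      --   Hash Join  Hash Cond: (a.id = b.id)
      --     -> Seq Scan on a  Filter: (an > 10)
      --     -> Seq Scan on b  Filter: (bn < 10)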


  6. How does an index on more than one field work? How does work with a b-tree index proceed if it does not fit entirely into RAM? Through the OS virtual memory? Or is there a way to load the index partially?

    What is the question
    I cannot understand how a b-tree is organized when there is more than one column in the index and they are of different types. The logical assumption is that the tree key is some hash function of the columns, but then the condition must obviously hold that
    H(a, b) > H(a) & H(a, b) > H(b) & H(a, b) < H(a+1, b) & H(a, b+1) < H(a+1, b)
    I suspect a perceptual hash with designated "bit ranges" could help here, where each range of bits corresponds to a column, but the length of such a hash confuses me.
    Another option, it seems to me, is not to hash at all, but to compare the values of the index fields explicitly when loading from/descending the tree. But then it is not clear how that would work for a query like

     select * from A where Ab > 1 and Ac < 3
    if the index is declared as
    <Aa, Ab, Ac>
    How are we going to traverse the tree without a restriction on the first-level column?
    Will the planner use this index?
    Related questions: does the order in which index fields are listed in the WHERE clause matter?
    Will we still hit the index if some of the fields are listed in the ON clause of a subquery and some in the WHERE of the outer query, i.e. does predicate redistribution work together with the estimate of whether the index is hit?
    How can this be influenced?
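    A sketch of the experiment in pg (using the hypothetical table A from above): create the composite index and ask the planner what it does when the leading column is missing from the predicate.

      CREATE INDEX a_composite_idx ON A (Aa, Ab, Ac);

      -- No condition on the leading column Aa: a b-tree range scan cannot start,
      -- so the planner will typically scan the whole index or the whole table.
      EXPLAIN SELECT * FROM A WHERE Ab > 1 AND Ac < 3;

      -- With a condition on Aa the tree can be descended normally:
      EXPLAIN SELECT * FROM A WHERE Aa = 42 AND Ab > 1 AND Ac < 3;

    For what it is worth, b-tree implementations typically compare composite keys column by column (lexicographically), each column with its own type's comparator, rather than hashing them.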

    More specialized questions that are also hard to answer.

    1. Under what conditions is it worth partitioning a table?
    2. When do partitions become too many?
    3. At what data size should I start thinking about sharding?
    4. When should the data schema be normalized, and when denormalized?
    5. How long can a string be, and why do so many databases move long strings into separate storage?
    6. Why is using a CTE in pg queries much faster than using a temporary table? Is that really so? If so, are there exceptions? (A comparison sketch follows this list.)
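    As a minimal sketch of the comparison from question 6 (assuming pg; the table and column names are hypothetical), here is the same intermediate result expressed both ways:

      -- Variant 1: the intermediate result as a CTE
      WITH recent AS (
          SELECT * FROM events WHERE created_at > now() - interval '1 day'
      )
      SELECT count(*) FROM recent WHERE payload_size > 4096;

      -- Variant 2: the same via a temporary table
      -- (extra DDL, a full write of the intermediate rows, and an explicit cleanup)
      CREATE TEMP TABLE recent AS
          SELECT * FROM events WHERE created_at > now() - interval '1 day';
      SELECT count(*) FROM recent WHERE payload_size > 4096;
      DROP TABLE recent;

    One plausible explanation for a difference: a temporary table always materializes and writes the intermediate rows, while a CTE may, depending on the PostgreSQL version and plan, be materialized in memory or inlined into the outer query.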

    Explanations


    I would be glad to find the answers myself, but most of these questions are drowned in forum floods or hidden deep in the documentation and source code of large projects. Responses from the comments will be attached to the relevant questions. Whatever I can, I will try to find myself.

    Thank you for understanding.

Source: https://habr.com/ru/post/321388/

