Indices and Selectivity (PostgreSQL)

An index across a field in a database can potentially speed up a SELECT operation with a condition on this field, can speed up a query like this: ORDER BY is a LIMIT 20 field, but the index significantly slows down table changes, etc.

When is an index needed, when will it help and will be used for SELECTs? It all depends on the selectivity of the index, i.e. from the count of lines that we get if we set the condition:

  indexed_field = value

An excellent candidate for indexing is selectivity 1, i.e. a unique index (for example, id), when by the specified value we find a maximum of one record.
')
Consider as an example a table of users with fields of information about the region: country (country_id) and city (city_id). Well, when the selectivity is <5% (for example, the field city_id user). At the same time, PostgreSQL is smart, it considers not selectivity “in general” across the field, but selectivity in the form of a histogram for individual field values. Those. if we set a condition like

  country = Russia

then we get 10% of the records from the database, and if the condition

  country = Uruguay

then we get 2 entries, and this is what PostgreSQL understands. (Of course, here we propose that there are far fewer users from Uruguay on our server than there are users from Russia).

So, if the selectivity is bad (we get a lot of records), PostgreSQL will prefer to perform a full database scan without using an index. And this index only hinders.

PS Cross post from my blog

Source: https://habr.com/ru/post/36705/

All Articles

Indices and Selectivity (PostgreSQL)

More articles: