📜 ⬆️ ⬇️

Overview of index types Oracle, MySQL, PostgreSQL, MS SQL

In one of the comments there was a request to tell more about the indices, and since, in RuNet, there is practically no summary of the supported indices of various DBMSs, in this review I will look at which types of indices are supported in the most popular DBMS

B-tree


The B-Tree index family is the most commonly used type of index, organized as a balanced tree, of ordered keys. They are supported by almost all DBMS, both relational and non-relational, and for almost all types of data.



Since most probably know them well (or can read about them, for example, here ), the only thing that should perhaps be noted here is that this type of index is optimal for a set with a good distribution of values ​​and high power (cardinality -number of unique values).
')

Spatial indices


Currently, all DBMS data has spatial data types and functions for working with them, for Oracle it is a set of types and functions in the MDSYS schema, for PostgreSQL it is point, line, lseg, polygon, box, path, polygon, circle, in MySQL - geometry, point, linestring, polygon, multipoint, multilinestring, multipolygon, geometrycollection, MS SQL - Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, GeometryCollection.
In the scheme of work of spatial queries, usually there are two stages or two stages of filtering. DBMS with weak spatial support work out only the first stage (coarse filtering, MySQL). As a rule, an approximate, approximate representation of objects is used at this stage. The most common type of approximation is the Minimum Bounding Rectangle (MBR) [100].
For spatial data types, there are special indexing methods based on R-trees (R-Tree index) and grids (Grid-based Spatial index).

Spatial grid

Spatial grid (spatial grid) index is a tree-like structure similar to a B-tree, but is used to provide access to spatial (Spatial) data, that is, to index multi-dimensional information, such as geographic data with two-dimensional coordinates (latitude and longitude). ). In this structure, the nodes of the tree are the cells of space. For example, for a two-dimensional space: first, the entire parent area will be divided into a grid of strictly defined resolution, then each grid cell in which the number of objects exceeds the maximum set of objects in the cell will be divided into the next level subgrid. This process will continue until the maximum nesting is reached (if set), or until everything is divided up to cells that do not exceed the maximum objects.


In the case of three-dimensional or multidimensional space, these will be rectangular parallelepipeds (cuboids) or parallelotopes.


Quadtree

Quadtree is a subspecies of the Grid-based Spatial index, in which there are always 4 descendants in the parent cell and the grid resolution varies depending on the nature or complexity of the data.



R-tree

R-Tree (Regions Tree) is also a tree-like data structure similar to the Spatial Grid, proposed in 1984 by Antonin Guttman. This data structure also divides the space into many hierarchically nested cells, but which, unlike the Spatial Grid, do not have to completely cover the parent cell and can intersect.
Various algorithms can be applied to splitting overflowed vertices, which causes the division of R-trees into subtypes: with quadratic and linear complexity (Guttman, of course, described with exponential complexity — Exhaustive Search, but naturally he is not used anywhere).
The quadratic subtype is divided into two rectangles with a minimum area covering all objects. Linear - in the division by maximum distance.



HASH


Hash indices were proposed by Arthur Fuller, and they do not preserve the values ​​themselves, but their hashes, thereby reducing the size (and, accordingly, their processing speed) of indices from large fields. Thus, when querying using HASH indexes, the hash from the desired value will not be compared with the field hashes, but with the hashes of the fields.
Due to the non-linearity of hash functions, this index cannot be sorted by value, which leads to the impossibility of using more / less and “is null” in comparisons. In addition, since hashes are not unique, collision resolution methods are used for matching hashes.

Bitmap


Bitmap index - the bit index method consists in creating separate bitmaps (sequence 0 and 1) for each possible value of the column, where each bit corresponds to a row with an indexed value, and its value equal to 1 means that the entry corresponding to the bit position contains the indexed value for this column or property.

EmpidFloor
oneMale
2Female
3Female
fourMale
fiveFemale


Bit cards
ValueStartthe endBit mask
MaleAddress of the first lineLast line address10010
FemaleAddress of the first lineLast line address01101

The main advantage of bit indices is that on large sets with low power and good clustering by their values, the index will be less than B * -tree. (Read more here or here )

Reverse index


Reverse index is also a B-tree index but with a reversed key, used mainly for monotonously increasing values ​​(for example, an auto-incrementing identifier) ​​in OLTP systems in order to remove competition for the last leaf block of the index, since due to the reversal of the value, two adjacent index entries fall into different index blocks. It cannot be used for range search.
Example:
Field in the table (bin)Reverse index key (bin)
0000000110,000,000
......
0000100110010000
0000101001010000
0000101111010000
As you can see, the value in the index changes much more than the value in the table itself, and therefore in the b-tree structure, they will fall into different blocks.

Inverted index


An inverted index is a full-text index that stores for each key token a sorted list of addresses of table entries that contain the given key.
oneMom washed the frame
2Dad washed the frame
3Dad was washing a car
fourMom polished car

In simplified form, it will look like this:
Mama1.4
Soapone
Rama1.2
Dad2.3
Polishedfour
Car3.4


Partial index


Partial index is an index built on the part of the table that satisfies a specific condition of the index itself. This index was created to reduce the size of the index.

Function-based index


The very flexible type of index is a functional index, that is, an index whose keys store the result of user-defined functions. Functional indexes are often built for fields whose values ​​are pre-processed before comparison in a SQL command. For example, when comparing string data with case-insensitive characters, the UPPER function is often used. Creating a functional index with the UPPER function improves the efficiency of such comparisons.
In addition, a functional index can help to implement any other missing index type of this DBMS (except, perhaps, a bit index, for example, Hash for Oracle)

Summary Table of Index Types


MysqlPostgreSQLMS SQLOracle
B-Tree indexthere isthere isthere isthere is
Spatial indexes supportedR-Tree with quadratic splittingRtree_GiST (linear split is used)4-level Grid-based spatial index (separate for geographic and geodetic data)R-Tree with quadratic partitioning; Quadtree
Hash indexOnly in tables of type Memorythere isNotNot
Bitmap indexNotthere isNotthere is
Reverse indexNotNotNotthere is
Inverted indexthere isthere isthere isthere is
Partial indexNotthere isthere isNot
Function based indexNotthere isthere isthere is

It is worth mentioning that in PostgreSQL GiST allows you to create an index based on R-Tree for any of your own data types. For this you need to implement all 7 functions of the R-Tree mechanism.

Additionally, you can read here:
Oracle Spatial User Guide and Reference
Spatial data in MS SQL
MS SQL: Spatial Indexing Overview
Hilbert r-tree
PostgreSQL spatial types
PostgreSQL Spatial Functions
Indexing spatial data in Microsoft SQL Server 2000 DBMS
Papadias D., Theodoridis T. Spatial Relations, Minimum Bounding Rectangles and Spatial Data Structures // Technical Report KDB-SLAB-TR-94-04,
Faloutsos C., Kamel I. Hilbert R-Tree: An Improved R-Tree Using Fractals // Department of CS, University of Maryland, Technical Research Report TR-93-19,
Wikipedia: Hilbert R-tree
External Memory Search Methods
Arthur Fuller. Intelligent database design using hash keys

Ps. Maybe I forgot to mention something, write on a personal or in a comment - I will add.

Source: https://habr.com/ru/post/102785/


All Articles