Previous Post: What is Teradata?How does Teradata distribute strings?
- Teradata uses a hashing algorithm for randomly distributing table rows between AMPs (advantages: distribution is the same, regardless of the amount of data, and depends on the content of the row, not the demographics of the data)
- Primary Index determines whether the rows of the table are evenly or unevenly distributed between AMPs.
- A uniform distribution of the rows of the table leads to a uniform load distribution.
- Each AMP is responsible only for its own subset of rows of each table.
- Strings are placed in an irregular manner (advantages: support for maintaining order is not required, order is not dependent on any submitted query)
')
Primary Key (PK) vs. Primary Index (PI)
Primary Key (primary key) - this is the convention of the relational model, which uniquely identifies each line.
The Primary Index is a Teradata convention, which defines the distribution of rows and access.
A well-designed database contains tables in which PI is the same as PK, as well as tables in which PI is defined in columns other than PK and may affect access paths.
Primary Key (PK) | Primary Index (PI) |
---|
Logical data modeling concept | Mechanism for row allocation and access |
Teradata does not need a PK definition | The table must necessarily have one PI |
No limit on the number of columns | Can be from 1 to 64 columns |
PK is defined in a logical data model. | PI is defined when the table is created. |
Value must be unique | The value does not have to be unique |
Uniqueness identifies each line | Used to place a string in AMP |
Values should not change | Values are subject to change (updated) |
Cannot be null | May be NULL |
Not applicable to the access path | Determines the most efficient access path. |
Selected for logical correctness | Selected for the physical performance of the compound values |
There are two types of PIs:
UPI (Unique Primary Index) and
NUPI (Non-Unique Primary Index). When using UPI, lines are distributed evenly among AMPs, and when using NUPI, lines with the same index values belong to the same AMP-y.
Creating tables with UPI and NUPI:
CEATE TABLE Table1 ( Col1 INTEGER, Col2 INTEGER ) UNIQUE PRIMARY INDEX (Col1);
CEATE TABLE Table2 ( Col1 INTEGER, Col2 INTEGER ) PRIMARY INDEX (Col2);
Hash allocation of strings
The Primary Index value is passed to the hashing algorithm, which ensures uniform distribution of unique values among all AMPs. The algorithm produces a 32-bit hash value. The first 16 bits (Hash Bucket Number) are used as a pointer to a hash map (Hash Map). Hash values are calculated using a hashing algorithm. The hash card is uniquely configured for each system; it is an array that binds the DSW (statistics key) to a specific AMP. Two systems with the same number of AMPs will have the same hash card. Changing the number of AMPs in the system requires changes in the hash map.
NoPI Table
NoPI- table is a
table without Primary Index (feature of the 13th version of Teradata). In this case, new rows are always added to the end of the table and are never added to the middle of the hash sequence.
Rows will also be distributed between AMPs. A new random code will determine which AMPs will receive strings or groups of strings. Within AMP, rows are simply added to the end of the table. They will have a unique identifier, which increases the uniqueness of the value.
Benefits:
The table will reduce the offset in intermediate tables ETL (Extract, Transform, Load — Extract, transform, load) that do not have a Primary Index.
Downloads (FastLoad and TPump Array Insert) in the NoPI intermediate table are faster.
PPI and MLPPI
PPI (Partitioned Primary Index) is a partitioning mechanism that is used to improve performance for large tables when sending queries that specify a range limit.
Data distribution using PPI is based on PI: Primary Index -> Hash value -> AMP definition, which will get the string. With NPPI (No Partitioned Primary Index), the rows in the AMP are arranged in the order of the hash rows. When using PPI, strings are first arranged by partitions, and then in the order of the hash strings.
MLPPI (Multi-Level Partitioned Primary Index) allows for multipartitioning or the use of an uncompressed Join Index. In multilevel partitioning, each partitioning level is determined independently using the
RANGE_N or
CASE_N expressions .
The goal of partitioning (partitioning) is to reduce the load on the system if the data in the table is rarely or even never updated (for example, various logs).
PSJust today, we published an article
Physical Design of Storage Structures in the Teradata DBMS on the Teradata blog, so I will not write about the
Secondary Index .
I hope the company will continue the regular series of articles, and I will not have to translate the documentation from English.