
What is Teradata?

Preface: I had to study the structure of the Teradata database for work, and it turned out that there is almost no information about it on the Internet, especially in Russian. So I decided to collect all the available information in one place.



The rapid growth in storage capacity and the falling cost of storing data have led to methods that provide faster access to the data you need: indexes, storing data in sorted form, and so on. These methods cope with their task quite well, but growing competition pushes us to look for new, faster ways to access information. "Whoever owns the information owns the world." Of main interest here is a database with the traditional relational data model that meets the ACID requirements (Atomicity, Consistency, Isolation, Durability) and is intended for Big Data analytics.



Teradata is a parallel relational DBMS that runs on several operating systems.




The variety of supported operating systems is one of the reasons Teradata is considered to have an open architecture.



Teradata DBMS is a large database server that communicates with multiple clients over TCP/IP or through a channel connection to an IBM mainframe.



Companies choose Teradata DBMS for a variety of reasons.







Teradata provides high-speed access to data through MPP (Massively Parallel Processing), a massively parallel architecture whose distinguishing feature is that memory is physically partitioned between processing units. Teradata systems are built from Intel-based servers connected by BYNET, a private messaging network, and are offered with disk arrays from either LSI or EMC for storing production databases. More information about storage system configuration can be found in the company's blog.
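The idea of physically divided data can be illustrated with a minimal sketch. It is only a model of the principle, not of Teradata itself: the data is assumed to be split into partitions already, each worker aggregates its own partition, and the partial results are then combined. The names `partial_sum` and `parallel_total` and the sample numbers are invented for illustration, and Python threads only mimic the structure of the computation rather than real hardware parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition):
    """Work done by one unit, using only its own slice of the data."""
    return sum(partition)

def parallel_total(partitions):
    """Aggregate each partition independently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_sum, partitions))
    return sum(partials), partials

# Hypothetical sales amounts, already split across three units.
partitions = [[10, 20, 30], [5, 15], [40]]
total, partials = parallel_total(partitions)
print(partials, total)
```

No unit needs to see another unit's rows to compute its partial result; only the small partial sums travel between units.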



AMP


The basic concept in the Teradata Database architecture is the AMP (Access Module Processor), a separate processing unit (a virtual processor) that stores and independently processes its own data. That is, each AMP stores and processes only its own part of the database and depends little on the other AMPs. In this respect the Teradata Database is similar to Hadoop (a system for distributed computing). However, with a poorly designed database, a massively parallel architecture can overload the network channels between AMPs and produce even worse results than a single powerful non-parallel database server, such as a classic Oracle database server. Teradata Manager, DBSConsole and Teradata Administrator are used for load balancing between AMPs and for other administrative tasks. In particular, these tools let you set filters and priorities for user processes running on individual AMPs or on the server as a whole.
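A rough sketch of why database design matters for AMPs: if rows are assigned to AMPs by hashing some key, a nearly unique key spreads rows evenly, while a key with few distinct values piles them onto a few AMPs. Everything below is an assumption made for illustration: the number of AMPs, the use of SHA-256 as a stand-in hash (Teradata uses its own hashing algorithm), and the function names `amp_for` and `rows_per_amp`.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size

def amp_for(key, num_amps=NUM_AMPS):
    """Map a distribution-key value to an AMP number via a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps

def rows_per_amp(keys, num_amps=NUM_AMPS):
    """Count how many rows each AMP would receive for the given key values."""
    counts = Counter(amp_for(k, num_amps) for k in keys)
    return [counts.get(amp, 0) for amp in range(num_amps)]

# Nearly unique key: rows spread over all AMPs.
even = rows_per_amp(range(10_000))
# Key with only two distinct values: at most two AMPs hold all the rows.
skewed = rows_per_amp(["M", "F"] * 5_000)
print("even:  ", even)
print("skewed:", skewed)
```

In the skewed case most AMPs sit idle while one or two do all the work, which is the kind of imbalance a poor design can cause.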



The Teradata architecture is described in more detail in the company's blog.



Teradata has a query optimizer that relies on statistical information about the data.
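How statistics can drive a plan choice is easiest to see on a toy example. The sketch below is a generic illustration of cost-based reasoning, not Teradata's actual optimizer: the `stats` dictionary, the uniform-distribution assumption, the 1% threshold and the function names are all hypothetical.

```python
# Hypothetical collected statistics: row count and distinct values per column.
stats = {
    "orders": {
        "rows": 1_000_000,
        "distinct": {"customer_id": 50_000, "status": 4},
    },
}

def estimated_rows(table, column):
    """Estimate rows matching `column = <value>`, assuming values are uniform."""
    t = stats[table]
    return t["rows"] / t["distinct"][column]

def choose_access(table, column, index_threshold=0.01):
    """Pick an index lookup only when the predicate is selective enough."""
    selectivity = estimated_rows(table, column) / stats[table]["rows"]
    return "index lookup" if selectivity <= index_threshold else "full scan"

print(choose_access("orders", "customer_id"))
print(choose_access("orders", "status"))
```

With stale or missing statistics the estimates, and therefore the chosen plan, can be badly wrong, which is why the optimizer depends on them.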



Starting with version 14, Teradata can store data both by row and by column (horizontal and vertical partitioning). Hybrid data storage is also described in the company's blog.
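The difference between the two layouts can be shown with plain Python structures. This is a conceptual sketch only; the table contents and the helper `to_columns` are invented, and it says nothing about how Teradata lays data out on disk.

```python
# Row-oriented layout: one record per row.
rows = [
    {"id": 1, "store": "A", "amount": 10.0},
    {"id": 2, "store": "B", "amount": 25.0},
    {"id": 3, "store": "A", "amount": 7.5},
]

def to_columns(rows):
    """Pivot a row-oriented table into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}

# Column-oriented layout: one list per column.
columns = to_columns(rows)

# Row layout: fetching one whole record is a single lookup.
record = rows[1]
# Column layout: an aggregate reads only the one column it needs.
total_amount = sum(columns["amount"])
print(record, total_amount)
```

Row storage suits queries that need whole records, while column storage suits aggregates over a few columns of many rows; a hybrid scheme lets one table serve both patterns.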



Data mart


Traditionally, data processing was divided into two categories: OLTP (On-line Transaction Processing) and DSS (Decision Support Systems). But for analytical databases with a large amount of information, data processing is divided into OLAP (On-line Analytical Processing) and DM (Data Mining).



| Type | Description | Example | Number of rows accessed | Response time |
|---|---|---|---|---|
| OLTP | Small transactions arriving in a large flow; the client needs a minimal response time from the system | Update a current account to reflect a deposit | Few | Seconds |
| DSS | Decision support system for a complete and objective analysis of the business activity | What were the monthly sales of shoes at retailer X? | Many (millions) | Seconds or minutes |
| OLAP | Data processing technology that prepares summary (aggregated) information from large data sets structured along multiple dimensions | Show the 10 best-selling products across all stores in 2005 | Many detailed rows or a moderate number of summary rows | Seconds or minutes |
| Data mining | Predictive data analysis | Which customers are most likely to respond to the promotion? | A moderate number of long, detailed rows | Phase 1: minutes or hours; Phase 2: seconds or fractions of a second |
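To make the contrast between the OLTP and OLAP examples concrete, here is a small sketch using Python's built-in `sqlite3` module as a stand-in database. The table names, columns and sample data are invented for illustration and are not taken from Teradata.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE sales (store TEXT, product TEXT, year INTEGER, qty INTEGER);
    INSERT INTO accounts VALUES (1, 100.0), (2, 50.0);
    INSERT INTO sales VALUES
        ('S1', 'shoes', 2005, 30), ('S2', 'shoes', 2005, 20),
        ('S1', 'socks', 2005, 70), ('S2', 'hats',  2005, 10),
        ('S1', 'hats',  2004, 500);
""")

# OLTP: touch one row by key (post a deposit to one account).
conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (25.0, 1))
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]

# OLAP: scan many rows and aggregate (best-selling products in 2005).
top_products = conn.execute("""
    SELECT product, SUM(qty) AS total_qty
    FROM sales
    WHERE year = 2005
    GROUP BY product
    ORDER BY total_qty DESC
    LIMIT 10
""").fetchall()

print(balance, top_products)
```

The OLTP statement reads and writes a single row found by its key, while the OLAP query has to read every 2005 sale before it can rank the products.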


Thus, Big Data is more conveniently processed using so-called data marts: slices of a data warehouse, each holding thematic, narrowly focused information aimed, for example, at the users of one working group.



The concept has several advantages.





However, the data mart concept does not offer ways to ensure the integrity and consistency of the stored data.



The Teradata database architecture eliminates the need to load and transform data into separate data marts, so the same data store is available for all user needs.
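One way to picture a data mart that needs no separate loading is a view defined over the shared tables. The sketch below assumes this approach and uses Python's built-in `sqlite3` module as a stand-in; the source does not say how Teradata implements it, and the table, view and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('north', 'shoes', 100.0), ('north', 'hats', 40.0),
        ('south', 'shoes', 60.0);
    -- A "data mart" for the northern sales team, defined as a view.
    CREATE VIEW north_sales_mart AS
        SELECT product, SUM(amount) AS total
        FROM sales
        WHERE region = 'north'
        GROUP BY product;
""")

def mart_rows():
    """Read the mart; the view is evaluated against the current base data."""
    return conn.execute(
        "SELECT product, total FROM north_sales_mart ORDER BY product"
    ).fetchall()

before = mart_rows()
conn.execute("INSERT INTO sales VALUES ('north', 'hats', 10.0)")
after = mart_rows()
print(before, after)
```

Because the mart is computed from the single shared table, a new sale shows up in it immediately, with no copy to reload and no second copy that could drift out of sync.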



Sources:

[1] Teradata company blog: Teradata - DBMS parallel from birth

[2] Teradata company blog: Speed or Volume? Automation of storage systems management with heterogeneous characteristics

[3] Teradata company blog: Statistics in Teradata DBMS

[4] Teradata company blog: Constructive and Hybrid Storage of Teradata Database Entries

[5] Are relational databases doomed?

[6] Simple and accessible about analytical databases.

[7] Data access speed: the battle for the future

[8] Wikipedia

[9] Documentation in English in paper form.



Next, you need to understand the concept of the Primary Index: how these indexes are assigned and how they affect performance.



UPD

Next post: Row Distribution and Access in Teradata (Primary Index)

Source: https://habr.com/ru/post/209078/


