What is Teradata?

Preface: I had to study the architecture of the Teradata database for work, and it turned out that there is almost no information about it on the Internet, especially in Russian. So I decided to collect all the available information in one place.

The rapid growth in storage capacity and the falling cost of storing data have led to the emergence of techniques that provide faster access to the data you need: indexes, storing data in sorted form, and so on. These techniques do their job quite well, but growing competition pushes us to look for new, faster ways to access information. As the saying goes, "Whoever owns the information owns the world." Of primary interest here is a database with the traditional relational data model that meets the ACID requirements (Atomicity, Consistency, Isolation, Durability) and is intended for Big Data analytics.

Teradata is a parallel relational DBMS that runs on the following operating systems:

MP-RAS UNIX
Microsoft Windows 2000/2003 Server
SuSE Linux

Support for several operating systems is one of the reasons Teradata is said to have an open architecture.

The Teradata DBMS is a large database server that communicates with multiple clients over TCP/IP or through a channel connection to an IBM mainframe.

Companies choose Teradata DBMS for a variety of reasons:

Support for large volumes of data: more than 400 TB in a single system
Modular scalability from small databases (10 GB) to large ones (100+ TB)
A parallel-aware optimizer that removes the need for complex tuning to get a query to run efficiently
Automatic data distribution, which eliminates complex indexing schemes and time-consuming reorganizations
A database designed and built on a parallel architecture from the very beginning
Support for ad hoc queries in ANSI-standard SQL, including SQL access to database management information (log files); this also makes it possible to submit queries written for other database management systems to Teradata
A single point of control for database administration (Teradata Manager)

Teradata provides high-speed access to data through MPP (Massively Parallel Processing), a massively parallel architecture. Its distinguishing feature is that memory is physically partitioned: each node has its own. Teradata systems are built from Intel-based servers connected by BYNET, a private messaging network, and are offered with disk arrays from either LSI or EMC for storing production databases. More information about storage configuration can be found on the company's blog [2].

AMP

The basic concept in the Teradata Database architecture is the AMP (Access Module Processor), a separate processing unit that holds and independently processes its own data. That is, each AMP stores and processes only its own part of the database and depends little on other AMPs. In this respect, the Teradata Database is similar to Hadoop (a system for distributed computing). However, with a poorly designed database, a massively parallel architecture can overload the network links between AMPs and perform even worse than a single powerful non-parallel database server, such as a classic Oracle database server. Teradata Manager, DBSConsole, and Teradata Administrator are used for load balancing between AMPs and for other administrative tasks. In particular, these tools let you set filters and priorities for user processes running on individual AMPs or on the server as a whole.
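To make the idea of "each AMP owns its part of the data" more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Teradata's implementation: the number of AMPs, the function names, and the use of MD5 as a stand-in hash are all hypothetical. It spreads rows across AMPs by hashing a key column, lets each AMP sum its own rows, and shows how a poorly chosen key leads to an uneven spread.

```python
import hashlib
from collections import defaultdict

NUM_AMPS = 4  # hypothetical system size


def amp_for(key, num_amps=NUM_AMPS):
    """Map a distribution key to an AMP number with a stable hash.

    MD5 is only a stand-in here; it is not Teradata's real hash function.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def distribute(rows, key_column):
    """Assign every row to exactly one AMP based on its key column."""
    amps = defaultdict(list)
    for row in rows:
        amps[amp_for(row[key_column])].append(row)
    return amps


def parallel_sum(amps, column):
    """Each AMP sums only its own rows; only partial sums are combined."""
    partials = {amp: sum(r[column] for r in rows) for amp, rows in amps.items()}
    return sum(partials.values()), partials


rows = [{"order_id": i, "country": "RU" if i % 10 else "DE", "amount": i}
        for i in range(1, 1001)]

by_order = distribute(rows, "order_id")   # near-unique key: even spread
by_country = distribute(rows, "country")  # two distinct values: heavy skew

total, partials = parallel_sum(by_order, "amount")
```

With the near-unique `order_id` key, every AMP receives a similar share of the rows. With `country`, at most two AMPs hold all the data while the others sit idle, which is one way a badly designed database can waste a parallel system.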

The Teradata architecture is described in more detail on the company's blog [1].

Teradata has a query optimizer that relies on statistical information about the data [3].
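As a rough illustration of how statistics can drive a plan choice, consider the toy Python sketch below. Everything in it is an assumption made for the example: the `ColumnStats` structure, the uniform-distribution estimate, and the 5% threshold are hypothetical and do not describe Teradata's actual cost model.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    """Hypothetical collected statistics for one column."""
    row_count: int
    distinct_values: int


def estimate_rows(stats: ColumnStats) -> float:
    """Estimate rows matching `column = value`, assuming uniform values."""
    return stats.row_count / max(stats.distinct_values, 1)


def choose_plan(stats: ColumnStats, index_threshold: float = 0.05) -> str:
    """Pick index access for selective predicates, a full scan otherwise.

    The 5% threshold is an arbitrary illustration, not a Teradata rule.
    """
    if stats.row_count == 0:
        return "full-table scan"
    selectivity = estimate_rows(stats) / stats.row_count
    return "index access" if selectivity <= index_threshold else "full-table scan"


customer_id = ColumnStats(row_count=1_000_000, distinct_values=1_000_000)
gender = ColumnStats(row_count=1_000_000, distinct_values=2)

plan_by_id = choose_plan(customer_id)   # highly selective predicate
plan_by_gender = choose_plan(gender)    # matches about half the table
```

The point of the sketch is only that the same query shape can get different plans depending on what the statistics say about the data.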

Starting with version 14, Teradata can store data both by rows and by columns (horizontal and vertical partitioning). Hybrid data storage is also described on the company's blog [4].
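The difference between the two layouts can be sketched in a few lines of Python. This is a conceptual illustration only; the table, its column names, and the helper functions are made up, and the sketch says nothing about Teradata's on-disk format.

```python
rows = [
    {"id": 1, "product": "boots", "store": "A", "amount": 120.0},
    {"id": 2, "product": "sandals", "store": "B", "amount": 45.0},
    {"id": 3, "product": "boots", "store": "B", "amount": 110.0},
]

# Row layout: each record is kept together, convenient for fetching whole rows.
row_store = [tuple(r.values()) for r in rows]

# Column layout: each column is kept together, convenient for scanning one attribute.
column_store = {name: [r[name] for r in rows] for name in rows[0]}


def total_amount_row_store(store):
    """Walks every full record just to read one field (position 3 is 'amount')."""
    return sum(record[3] for record in store)


def total_amount_column_store(store):
    """Reads only the 'amount' column and ignores the rest."""
    return sum(store["amount"])
```

Both functions return the same total, but the column layout touches only the values the query needs, which is why it suits analytical queries over a few columns of a wide table.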

Data mart

Traditionally, data processing was divided into two categories: OLTP (Online Transaction Processing) and DSS (Decision Support Systems). For analytical databases with large amounts of information, however, data processing is divided into OLAP (Online Analytical Processing) and DM (Data Mining).

Type	Description	Example	Number of rows accessed	Response time
OLTP	Many small transactions arriving at a high rate, where the client needs minimal response time from the system	Update a checking account to reflect a deposit	Few	Seconds
DSS	Decision support system for complete and objective analysis of a subject area	What were the monthly shoe sales for retailer X?	Many (millions)	Seconds or minutes
OLAP	Data processing technology that prepares summary (aggregated) information from large data sets structured along multiple dimensions	Show the 10 best-selling products across all stores in 2005	Many detailed rows or a moderate number of summary rows	Seconds or minutes
Data mining	Predictive data analysis	Which customers are most likely to respond to a promotion?	A moderate number of long detailed rows	Phase 1: minutes or hours; Phase 2: seconds or fractions of a second
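The contrast between the OLTP and OLAP rows of the table can be shown with two queries. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in engine so that it runs anywhere; the tables, columns, and data are hypothetical, and `LIMIT` is SQLite syntax that other systems may spell differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE sales (product TEXT, store TEXT, year INTEGER, quantity INTEGER);
    INSERT INTO accounts VALUES (1, 100.0), (2, 250.0);
    INSERT INTO sales VALUES
        ('boots', 'A', 2005, 30), ('boots', 'B', 2005, 25),
        ('sandals', 'A', 2005, 40), ('sneakers', 'B', 2005, 10),
        ('boots', 'A', 2004, 99);
""")

# OLTP: a short transaction that touches a single row.
with conn:
    conn.execute("UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
                 (50.0, 1))

# OLAP: an aggregate over many rows, summarized along one dimension.
top_products = conn.execute("""
    SELECT product, SUM(quantity) AS total
    FROM sales
    WHERE year = 2005
    GROUP BY product
    ORDER BY total DESC
    LIMIT 10
""").fetchall()

balance = conn.execute(
    "SELECT balance FROM accounts WHERE account_id = 1").fetchone()[0]
```

The first statement reads and writes one row and must return almost instantly; the second scans every 2005 sale to produce a small summary, which is the kind of work a parallel analytical system is built for.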

Thus, Big Data is more conveniently processed using so-called data marts. A data mart is a slice of a data warehouse: a body of thematic, narrowly focused information aimed, for example, at the users of a single working group.

The concept has several advantages:

Analysts see and work only with the data they really need.
The target database is as close to the end user as possible.
Data marts usually contain thematic subsets of pre-aggregated data, so they are easier to design and configure.
Powerful computing hardware is not required to implement data marts.

However, the data mart concept does not offer any way to ensure the integrity and consistency of the stored data.
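A small Python sketch can illustrate both the idea of a data mart and the consistency problem just mentioned. The warehouse rows, department names, and the `build_data_mart` helper are hypothetical and exist only for this example.

```python
from collections import defaultdict

# Hypothetical warehouse: detailed sales rows for every department.
warehouse = [
    {"dept": "shoes", "month": "2005-01", "amount": 100},
    {"dept": "shoes", "month": "2005-01", "amount": 50},
    {"dept": "shoes", "month": "2005-02", "amount": 70},
    {"dept": "toys", "month": "2005-01", "amount": 30},
]


def build_data_mart(rows, dept):
    """Copy one department's rows out of the warehouse, pre-aggregated by month."""
    mart = defaultdict(int)
    for row in rows:
        if row["dept"] == dept:
            mart[row["month"]] += row["amount"]
    return dict(mart)


shoes_mart = build_data_mart(warehouse, "shoes")

# A late-arriving row lands in the warehouse after the mart was built.
warehouse.append({"dept": "shoes", "month": "2005-02", "amount": 20})

# The mart is a separate copy, so it disagrees with the warehouse
# until it is rebuilt.
fresh_mart = build_data_mart(warehouse, "shoes")
is_stale = shoes_mart != fresh_mart
```

The mart is small and easy to query, but because it is a derived copy, nothing keeps it in step with the warehouse except reloading it.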

The Teradata database architecture eliminates the need to load and transform data into separate data marts, so the same data store is available for all user needs.

Sources:
[1] Teradata company blog: Teradata - DBMS parallel from birth
[2] Teradata company blog: Speed or Volume? Automating the management of storage systems with heterogeneous characteristics
[3] Teradata company blog: Statistics in the Teradata DBMS
[4] Teradata company blog: Columnar and Hybrid Storage of Records in the Teradata DBMS
[5] Are relational databases doomed?
[6] A simple and accessible introduction to analytical databases
[7] Data access speed: the battle for the future
[8] Wikipedia
[9] Printed documentation in English

The next step is to understand the concept of the Primary Index: how these indexes are defined and how they affect performance.

Update
Next post: Row Distribution and Access in Teradata (Primary Index)

Source: https://habr.com/ru/post/209078/
