Preface: I had to study the architecture of the Teradata database for work, and it turned out that there is almost no information about it on the Internet, especially in Russian. So I decided to collect all the available information in one place.
The rapid growth in storage capacity and the falling cost of storing data have led to the emergence of techniques that provide faster access to the required data: indexes, storing data in sorted form, and so on. These techniques cope with their task quite well, but growing competition forces us to look for new, faster ways to access information. "He who owns the information owns the world." Of main interest here is a database that uses the traditional relational data model, meets the ACID requirements (Atomicity, Consistency, Isolation, Durability) and is intended for Big Data analytics.
Teradata is a parallel relational DBMS that runs on the following operating systems:
- MP-RAS UNIX
- Microsoft Windows 2000/2003 Server
- SuSE Linux
Support for several operating systems is one of the reasons why Teradata is said to have an open architecture.
Teradata DBMS is a large database server that communicates with multiple clients over TCP/IP or through a channel connection to an IBM mainframe.
Companies choose Teradata DBMS for a variety of reasons:
- Support for large amounts of information: more than 400 TB in a single system
- Modular scalability from small databases (10 GB) to large ones (100+ TB)
- A parallel-aware optimizer that eliminates the need for complex tuning to get a query to run efficiently
- Automatic data distribution eliminates complex indexing schemes and time-consuming reorganizations.
- The database is designed and built on a parallel architecture from the very beginning.
- Support for ad hoc queries in ANSI-standard SQL, including queries against database management information (log files), which allows queries written for other database management systems to be submitted to Teradata
- A single point of control for database administration (Teradata Manager)
Teradata provides high-speed access to data through MPP (Massively Parallel Processing), a massively parallel architecture. Its distinguishing feature is that memory is physically partitioned rather than shared. Teradata offers Intel-based servers connected by BYNET, a private messaging network. Teradata systems are supplied with branded disk arrays, from either LSI or EMC, for storing production databases. More information about the configuration of storage systems can be found in the company's blog.
AMP
The basic concept in the Teradata Database architecture is the AMP (Access Module Processor), a separate processing unit that holds and independently processes its own data. That is, each AMP stores and processes only its part of the database and depends little on other AMPs. In this respect, Teradata Database is similar to Hadoop (a system for distributed computing). However, with a poorly designed database, a massively parallel architecture can overload the network channels between AMPs and produce even worse results than a single powerful non-parallel database server, such as a classic Oracle database server. Teradata Manager, DBSConsole and Teradata Administrator are used for load balancing between AMPs and for other administrative tasks. In particular, these tools let you set filters and priorities for user processes running on individual AMPs or on the server as a whole.
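The idea that each AMP owns and processes only its own share of the rows can be illustrated with a minimal Python sketch. It assumes rows are assigned to AMPs by hashing a key column; the MD5 hash, the number of AMPs and the names `amp_for_key`, `distribute` and `parallel_count` are illustrative assumptions, not Teradata's actual algorithm or API.

```python
import hashlib
from collections import defaultdict

NUM_AMPS = 4  # hypothetical number of AMPs in the system


def amp_for_key(key: str, num_amps: int = NUM_AMPS) -> int:
    """Map a key to an AMP number. MD5 is only a stand-in hash (assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def distribute(rows, key_field, num_amps=NUM_AMPS):
    """Place every row on exactly one AMP, chosen by the hash of its key."""
    amps = defaultdict(list)
    for row in rows:
        amps[amp_for_key(str(row[key_field]), num_amps)].append(row)
    return amps


def parallel_count(amps):
    """Each AMP counts only its own rows; the partial counts are then summed."""
    partials = {amp: len(rows) for amp, rows in amps.items()}
    return sum(partials.values()), partials


rows = [{"customer_id": i, "amount": i * 10} for i in range(1000)]
amps = distribute(rows, "customer_id")
total, partials = parallel_count(amps)
print(total, partials)
```

If the key is chosen badly (for example, a column with only a few distinct values), most rows land on the same AMP and the work is no longer spread evenly, which is one way a poor design undermines the parallel architecture.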
The Teradata architecture is described in more detail in the company's blog.
Teradata has a query optimizer, which is based on statistical information about the data.
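How statistics can drive an optimizer's decision is shown in the following simplified sketch. It is not Teradata's optimizer: the `ColumnStats` fields, the uniform-distribution assumption and the cost constants are all hypothetical and chosen only to illustrate the principle.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    row_count: int        # total number of rows in the table
    distinct_values: int  # number of distinct values in the column


def estimate_rows(stats: ColumnStats) -> float:
    """Estimate rows matching `column = value`, assuming a uniform distribution."""
    return stats.row_count / max(stats.distinct_values, 1)


def choose_access_path(stats: ColumnStats,
                       index_cost_per_row: float = 4.0,
                       scan_cost_per_row: float = 1.0) -> str:
    """Pick the cheaper plan; the cost constants are made up for illustration."""
    index_cost = estimate_rows(stats) * index_cost_per_row
    scan_cost = stats.row_count * scan_cost_per_row
    return "index" if index_cost < scan_cost else "full_scan"


selective = ColumnStats(row_count=1_000_000, distinct_values=500_000)
unselective = ColumnStats(row_count=1_000_000, distinct_values=2)
print(choose_access_path(selective), choose_access_path(unselective))
```

With stale or missing statistics the estimate is wrong, and so is the chosen plan, which is why an optimizer of this kind depends on the statistics being kept up to date.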
Starting with version 14, Teradata can store data both by rows and by columns (horizontal and vertical partitioning).
Hybrid data storage is also described in the company's blog.
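The difference between row and column storage can be sketched in a few lines of Python. The table contents and the helper names are invented for illustration; the sketch shows only the general idea, not how Teradata lays out data on disk.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Moscow", "amount": 100},
    {"id": 2, "city": "Kazan", "amount": 250},
    {"id": 3, "city": "Moscow", "amount": 50},
]


def to_columns(rows):
    """Convert the row layout to a column layout: one list of values per field."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)


def total_row_store(rows, field):
    """Summing one field still visits every complete record."""
    return sum(row[field] for row in rows)


def total_column_store(columns, field):
    """Summing one field reads only that field's values."""
    return sum(columns[field])


print(total_row_store(rows, "amount"), total_column_store(columns, "amount"))
```

Both functions return the same result, but the column layout touches only the values of the requested field, which is why it suits analytical queries that read a few columns from many rows.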
Data mart
Traditionally, data processing was divided into two categories: OLTP (On-line Transaction Processing) and DSS (Decision Support Systems). For analytical databases with a large amount of information, however, data processing is divided into OLAP (On-line Analytical Processing) and DM (Data Mining).
Type | Description | Example | Number of rows accessed | Response time |
---|---|---|---|---|
OLTP | Handles small transactions arriving in a large flow; the client needs a minimal response time from the system | Update a checking account to reflect a deposit | Few | Seconds |
DSS | Decision support system for a complete and objective analysis of the subject area | What were the monthly sales of shoes at retailer X? | Many (millions) | Seconds or minutes |
OLAP | Data processing technology that prepares summary (aggregated) information from large data sets structured according to a multidimensional principle | Show the 10 best-selling products across all stores in 2005 | Many detailed rows or a moderate number of summary rows | Seconds or minutes |
Data mining | Predictive data analysis | Which customers are most likely to respond to a promotion? | A moderate number of long, detailed rows | Phase 1: minutes or hours; Phase 2: seconds or fractions of a second |
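The OLAP example from the table (the best-selling products across all stores in a given year) boils down to aggregating many detailed rows into a few summary rows. A minimal Python sketch, with invented sample data and a hypothetical function name `top_products`:

```python
from collections import Counter


def top_products(sales, year, n=10):
    """Sum quantities per product across all stores for one year, keep the top n."""
    totals = Counter()
    for sale in sales:
        if sale["year"] == year:
            totals[sale["product"]] += sale["quantity"]
    return totals.most_common(n)


# Invented detail rows: one row per store, product and year.
sales = [
    {"store": "A", "product": "boots", "year": 2005, "quantity": 5},
    {"store": "B", "product": "boots", "year": 2005, "quantity": 7},
    {"store": "A", "product": "sandals", "year": 2005, "quantity": 3},
    {"store": "A", "product": "boots", "year": 2004, "quantity": 100},
]

print(top_products(sales, 2005, n=2))  # [('boots', 12), ('sandals', 3)]
```

The query reads every detail row for the year but returns only a handful of aggregated rows, which matches the "many detailed rows or a moderate number of summary rows" profile in the table.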
Thus, it is more convenient to process Big Data using so-called data marts. A data mart is a slice of the data warehouse: a collection of thematic, narrowly focused information aimed, for example, at the users of a single working group.
The concept has several advantages:
- Analysts see and work only with the data they really need.
- The target database is as close to the end user as possible.
- Data marts usually contain thematic subsets of pre-aggregated data, so they are easier to design and configure.
- Implementing data marts does not require powerful computing hardware.
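The notion of a data mart as a thematic, pre-aggregated subset of the warehouse can be sketched as follows. The warehouse rows, the field names and the function `build_data_mart` are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def build_data_mart(warehouse_rows, department):
    """Keep only one department's rows and pre-aggregate revenue by month."""
    mart = defaultdict(float)
    for row in warehouse_rows:
        if row["department"] == department:
            mart[row["month"]] += row["revenue"]
    return dict(mart)


# Invented warehouse contents covering several departments.
warehouse = [
    {"department": "shoes", "month": "2005-01", "revenue": 100.0},
    {"department": "shoes", "month": "2005-01", "revenue": 200.0},
    {"department": "shoes", "month": "2005-02", "revenue": 50.0},
    {"department": "toys", "month": "2005-01", "revenue": 999.0},
]

shoes_mart = build_data_mart(warehouse, "shoes")
print(shoes_mart)
```

Analysts of the shoes department then work with a small table of monthly totals instead of the full warehouse, but the mart is a copy that has to be rebuilt whenever the warehouse changes.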
However, the data mart concept does not offer ways to ensure the integrity and consistency of the stored data.
The Teradata database architecture eliminates the need to load and transform data into separate data marts, so the same data store is available for all user needs.
Sources:
[1] Teradata company blog: Teradata, a DBMS parallel from birth
[2] Teradata company blog: Speed or Volume? Automation of storage systems management with heterogeneous characteristics
[3] Teradata company blog: Statistics in Teradata DBMS
[4] Teradata company blog: Columnar and Hybrid Storage of Teradata Database Records
[5] Are relational databases doomed?
[6] Simple and accessible about analytical databases.
[7] Data access speed: the battle for the future
[8] Wikipedia
[9] Documentation in English in paper form.
Next, we need to understand the concept of the Primary Index: how these indexes are assigned and how they affect performance.
UPD
Next post: Row Distribution and Access in Teradata (Primary Index)