
Writing data to Teradata using DML

Going straight ahead, one cannot get very far ...
(The Little Prince, Antoine de Saint-Exupéry)


Recently, a colleague asked me for help with Teradata. She is actively rolling it out, and the first step of that rollout is loading data into it on a daily basis: large volumes, as quickly as possible. I was asked to find alternative ways of loading data into Teradata that would not depend heavily on the resources allocated to the user. In the course of this work I got much better acquainted with the .NET Data Provider for Teradata, and along the way discovered some curious details that, in my opinion, are worth knowing. Since not everyone is familiar with Teradata, I will start with a brief description of it.


What is Teradata


From the point of view of a programmer who loads data into it and reads data back out:
Teradata is a large data warehouse whose hardware and software architecture is optimized for parallelism.

From Wikipedia:
Teradata is a massively parallel processing (MPP) system. The company's offering consists of hardware, software, a database, and consulting services; the data stored in it can be retrieved and analyzed.

Data loading in Teradata


Loading and unloading large amounts of data to and from Teradata is normally done with the
Teradata Load and Unload Utilities: FastLoad, MultiLoad, TPump, FastExport, and the Teradata Parallel Transporter (TPT).

Our company mainly uses Teradata FastLoad; as the ETL tool we use Informatica.

Compared with DML, the load/unload utilities require a deeper dive into Teradata internals.

This makes writing data with the DML Insert command still relevant: DML is more common, less demanding of configuration, and fully supported by Teradata. I decided to check how effectively this command can be used. For the experiment, I created a test project whose purpose was to understand how the Teradata provider behaves.

First I tried to solve the problem head-on: read one row from Oracle, then insert it into Teradata. This turned out to be very slow. To find the culprit, I modified the program: it now read from Oracle and spawned a separate thread for each row, which then inserted that row into Teradata. As the number of threads grew rapidly, it became clear that the main cause of the low speed was the Insert operations in Teradata themselves. Realizing that inserting a single row takes a long time, I tried inserting several rows at once, separating the Insert statements with ';'. The speed increased significantly. Increasing the number of rows inserted in a single request, I eventually hit an exception:

[.NET Data Provider for Teradata][100056] Total size of parameters is greater than the max Data parcel size. 
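For context, here is roughly what such a multi-statement request looks like. This is a minimal Python sketch rather than the .NET code the article is about, and the table and column names are made up for illustration:

```python
def build_multi_insert(table, columns, n_rows):
    """Build one request containing n_rows INSERT statements
    separated by ';', with positional parameter markers ('?')."""
    cols = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    stmt = f"INSERT INTO {table} ({cols}) VALUES ({placeholders})"
    return ";".join(stmt for _ in range(n_rows))

# Example: a batch of 3 rows for a hypothetical two-column table.
sql = build_multi_insert("target_tbl", ["id", "val"], 3)
```

The whole batch travels to the server as a single request, which is what makes it faster than row-by-row inserts, and also what eventually runs into the Data parcel size limit.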

Digging through the source code, I found the place where this exception is thrown:

 if (bodyLength > this._maxParamRowSpace ||
     bodyLength - num > this._sessionContext[WpSessionContext.SessionLimits.MaxDataParcelSize])
     throw UtlTrace.TraceException((Exception) new TdException("InvalidDataPclSize", new object[0]));

I did not manage to calculate the exact size of the Data parcel: numbers that seemed correct in theory fell apart in practice as soon as I changed the number of fields or their types in the Insert statement. But the source code makes one thing clear: the exception is thrown inside the provider, before Teradata is contacted at all. This means a loop can keep increasing the number of Insert statements until the error occurs, at no cost in round trips to Teradata:

 do {
     numParamsInExec = _oneExec(ref com, numParamsInExec, ref err);
 } while (err == Const.ERR_100056);
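The same probing idea, sketched in Python with a mocked provider call (the real code uses the .NET Data Provider for Teradata; the parcel limit and per-row size below are invented numbers): grow the batch until the provider rejects the request locally, then step back.

```python
ERR_100056 = 100056      # "parameters exceed the max Data parcel size"
MAX_PARCEL = 64_000      # invented parcel limit, in bytes
BYTES_PER_ROW = 700      # invented per-row parameter size, in bytes

def try_exec(num_rows):
    """Mock of the provider's execute: raises locally (no round trip
    to Teradata) when the parameter payload exceeds the parcel size."""
    if num_rows * BYTES_PER_ROW > MAX_PARCEL:
        raise RuntimeError(ERR_100056)

def find_max_batch(start=1):
    """Double the batch size until error 100056 fires; the last
    successful size is taken as the per-request maximum."""
    n = start
    while True:
        try:
            try_exec(n)
        except RuntimeError:
            return n // 2
        n *= 2
```

A doubling probe is deliberately coarse; a binary search between the last good and first bad sizes would find the exact limit, but for a one-time calibration this is enough.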

Thus we determine the maximum number of Insert statements per request. We then insert the data in batches, and whenever exception 100056 occurs we reduce the batch size. This solution significantly increased the speed of writing data to Teradata without any special utilities, so the method is useful when you are limited in resources, or when you need to write relatively small amounts of data to Teradata.
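The resulting write loop can be sketched as follows (again in Python with a mocked provider call; in the real .NET code the thing to catch would be the provider's exception with code 100056, and the limits below are invented):

```python
ERR_100056 = 100056      # "parameters exceed the max Data parcel size"
MAX_PARCEL = 64_000      # invented parcel limit, in bytes
BYTES_PER_ROW = 700      # invented per-row parameter size, in bytes

def exec_batch(rows):
    """Mock multi-statement insert: fails locally, before any
    round trip, if the batch exceeds the parcel size."""
    if len(rows) * BYTES_PER_ROW > MAX_PARCEL:
        raise RuntimeError(ERR_100056)

def write_all(rows, batch_size):
    """Insert rows in batches; on error 100056 halve the batch size
    and retry the same slice, as described in the text."""
    inserted = 0
    i = 0
    while i < len(rows):
        batch = rows[i:i + batch_size]
        try:
            exec_batch(batch)
        except RuntimeError:
            batch_size = max(1, batch_size // 2)
            continue        # retry the same rows with a smaller batch
        inserted += len(batch)
        i += len(batch)
    return inserted
```

Starting with a deliberately oversized batch is safe here because the rejection is local and cheap; the loop converges to a workable batch size after a few halvings and then streams the rest of the data at that size.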

Source: https://habr.com/ru/post/271735/

