
Universal Soldier: Groovy Transformer in DataStage

The ETL capabilities of IBM DataStage cover a fairly wide range of data integration requirements, but sooner or later you need to extend them, either by implementing parallel routines in C or by writing Java classes that are then used in the Java Transformer or Java Client stages. The built-in BASIC language is rather limited and long outdated, so it cannot be regarded as serious help. For example, it cannot work with XML structures, and implementing MD5 hashing in BASIC, while possible, takes considerable time to develop and debug.
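For comparison, the MD5 case mentioned above takes only a few lines in Java with the standard `java.security.MessageDigest` API. A minimal sketch; the class and method names are illustrative and not part of DataStage:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Example {
    // Returns the MD5 digest of a UTF-8 string as 32 lowercase hex characters.
    static String md5Hex(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
            // %032x left-pads with zeros, so digests starting with 0x00 keep their full width
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide MD5, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello")); // 5d41402abc4b2a76b9719d911017c592
    }
}
```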
In any case, it would be good to have a flexible tool for working with a data stream that does not require recompiling its source code and can be used from the DataStage Client editor. A colleague and close friend of mine was asked to develop such a tool, the Groovy Transformer, and it is the subject of this article.

Why Groovy? Because the language is quite flexible and offers all the features of Java, since it is built on top of Java, while also giving developers the following advantages:


The main idea of Groovy Transformer is to run Groovy code inside a Java Transformer. Because the code can be written directly in this stage, you decide what to execute: Groovy code stored in a file, code passed in through job parameters, or code you write yourself.
So we need to learn how to create a Java Transformer. Readers who already know how can skip this section. I will keep it brief, since the documentation covers this part in sufficient detail.
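The three possible sources of Groovy code listed above could be resolved with a simple precedence rule. The following is a hypothetical sketch: the `ScriptSource` class, the `resolve` method, and the order file, then job parameter, then inline code are assumptions for illustration, not Groovy Transformer's actual behavior.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ScriptSource {
    // Hypothetical precedence: a script file wins over a job parameter,
    // which wins over code typed directly into the stage.
    static String resolve(String scriptPath, String jobParameterCode, String inlineCode) {
        if (scriptPath != null && !scriptPath.isBlank()) {
            try {
                return Files.readString(Path.of(scriptPath));
            } catch (IOException e) {
                // Wrap so callers do not have to handle a checked exception
                throw new UncheckedIOException(e);
            }
        }
        if (jobParameterCode != null && !jobParameterCode.isBlank()) {
            return jobParameterCode;
        }
        return inlineCode == null ? "" : inlineCode;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "", "println 'inline code'")); // println 'inline code'
    }
}
```

Whatever the real precedence is, resolving the source once (for example in `initialize()`) avoids reading the file again for every row.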
So, to create a Java transformer, we need to create a class that inherits from the Stage class:
    import com.ascentialsoftware.jds.*;

    class MyJavaTransformer extends Stage {
    }

You then need to implement the three most frequently used methods: initialize(), process() and terminate().
The initialize() method is executed before the stage starts processing the stream and can contain declarations of objects you intend to use throughout the life of the transformer.
The process() method is executed for each row of the input stream and must contain your processing logic.
The terminate() method is executed at the end of the transformer's life and can contain clean-up actions for temporary objects (yes, I know there are no destructors in Java; I mean any leftovers you used: files, tables, whatever).
Note for parallel mode: DataStage starts a separate JVM for each node. In other words, with four nodes DataStage launches four JVMs. Because these virtual machines are isolated, there is no convenient way to exchange data between the transformer instances running in them.
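To make the call order concrete, here is a framework-free sketch that mimics the lifecycle described above in a single JVM. `MiniStage`, the `READY`/`END_OF_DATA` values, and the `run` driver loop are hypothetical stand-ins written for illustration; the real `Stage` class lives in tr4j.jar, and DataStage itself drives these calls.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class LifecycleSketch {
    // Hypothetical status codes; the real ones are constants of the Stage class.
    static final int READY = 0;
    static final int END_OF_DATA = 1;

    // Hypothetical stand-in for com.ascentialsoftware.jds.Stage.
    static abstract class MiniStage {
        private Iterator<String> input;
        final List<String> output = new ArrayList<>();

        void bind(List<String> rows) { input = rows.iterator(); }
        // Returns null once the input is exhausted, like readRow() in the text.
        String readRow() { return input.hasNext() ? input.next() : null; }
        void writeRow(String row) { output.add(row); }

        abstract void initialize();
        abstract int process();
        abstract void terminate();
    }

    // Example transformer: upper-cases every row and counts how many it saw.
    static class UpperCaseStage extends MiniStage {
        int count; // state set up in initialize() lives for the whole run

        void initialize() { count = 0; }

        int process() {
            String row = readRow();
            if (row == null) return END_OF_DATA;
            writeRow(row.toUpperCase());
            count++;
            return READY;
        }

        void terminate() { System.out.println("rows processed: " + count); }
    }

    // Plays the role of DataStage: initialize once, process until end of data, terminate once.
    static List<String> run(MiniStage stage, List<String> rows) {
        stage.bind(rows);
        stage.initialize();
        while (stage.process() != END_OF_DATA) {
            // each call handles one input row
        }
        stage.terminate();
        return stage.output;
    }

    public static void main(String[] args) {
        System.out.println(run(new UpperCaseStage(), List.of("a", "b")));
    }
}
```

In parallel mode each node's JVM would hold its own `UpperCaseStage` with its own `count`, which is exactly why the counters cannot be combined across nodes from inside the transformer.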
Now we are ready to create a template for our Java Transformer:

    import com.ascentialsoftware.jds.*;

    public class MyJavaTransformer extends Stage {
        public void initialize() {
            trace("Init");
        }

        public void terminate() {
            trace("Terminate");
        }

        public int process() {
            // placeholder: the processing logic goes here
            return 0;
        }
    }

To read the rows entering and leaving the transformer, use the Row object and two methods: readRow() to read values from the input stream and writeRow() to write to the output.
The Row object also exposes the metadata and the values of each column. The following example replaces the values of all character (VarChar) columns with "Hello from Java"; all other columns are passed through unchanged:

    public int process() {
        Row inputRow = readRow();
        if (inputRow == null)
            return OUTPUT_STATUS_END_OF_DATA; // no more input rows
        Row outputRow = createOutputRow();
        for (int i = 0; i < inputRow.getColumnCount(); i++) {
            Object column = inputRow.getValueAsSQLTyped(i);
            if (column instanceof java.lang.String)
                // character columns get the constant text
                outputRow.setValueAsSQLTyped(i, "Hello from Java");
            else
                // every other column passes through unchanged
                outputRow.setValueAsSQLTyped(i, column);
        }
        writeRow(outputRow);
        return OUTPUT_STATUS_READY; // row written, ready for the next one
    }
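The per-column rule itself does not depend on DataStage, so it can be unit-tested outside the tool. A sketch that applies the same rule to a plain `Object[]` row; `ReplaceStrings` and `transform` are hypothetical names, and in the real stage the values would come from `Row.getValueAsSQLTyped()`:

```java
import java.util.Arrays;

public class ReplaceStrings {
    // Same rule as process() above: String values are replaced,
    // everything else (including null) passes through unchanged.
    static Object[] transform(Object[] input) {
        Object[] output = new Object[input.length];
        for (int i = 0; i < input.length; i++) {
            output[i] = (input[i] instanceof String) ? "Hello from Java" : input[i];
        }
        return output;
    }

    public static void main(String[] args) {
        Object[] row = {42, "abc", null, 3.5};
        System.out.println(Arrays.toString(transform(row))); // [42, Hello from Java, null, 3.5]
    }
}
```

Keeping the logic in a plain static method like this lets `process()` stay a thin adapter between `Row` and ordinary Java values.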


Note: to compile the transformer class, do not forget to add the tr4j.jar library to the classpath or to your IDE project.
Now we can formulate the requirements for our Groovy Transformer.

Groovy Transformer is a Java Transformer that compiles Groovy code on the fly. It also provides syntactic sugar that simplifies the routine operations involved in working with the Stage class.

So, our transformer should:


Groovy Transformer was built to meet these requirements; you can download it here: http://geckelberryfinn.ru/fr/GroovyTransformer.html . (Caution: this Java Transformer is itself written in Groovy =), so decompiling it will be problematic.)

Groovy Transformer predefines the following objects (name, type, description, example):

  1. GTransformer (Object): a reference to this Stage instance; exposes all of its methods and attributes.
     Example: GTransformer.createOutputRow()
  2. OutputMatching (HashMap): maps column names to their indexes.
     Examples: OutputMatching.get(k); OutputMatching.ID; OutputMatching.LIBL;
  3. MetaData (HashMap): metadata of the input stream's columns.
     Examples: MetaData.ID.Description; MetaData.ID.Derivation; MetaData.ID.SQLType; MetaData.ID.DataElementName;
  4. OutputMetaData (HashMap): metadata of the output stream's columns.
     Examples: OutputMetaData.ID.Description; OutputMetaData.ID.Derivation; OutputMetaData.ID.SQLType; OutputMetaData.ID.DataElementName;
  5. InputColumns (HashMap): all input columns, accessible by name.
     Examples: InputColumns.ID; InputColumns.LIBL;
  6. OutputRows (List<HashMap>): the list of rows of the output stream. Use this object when more rows leave the transformer than enter it.
     Example:
        HashMap curRow = new HashMap();
        OutputRows[0] = curRow;
        OutputRows[0].ID = 0;
        OutputRows[0].LIBL = "First item";
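Conceptually, OutputRows and OutputMatching complement each other: each map in OutputRows is keyed by column name, and a name-to-index map turns those names into output positions. A hypothetical Java sketch of that conversion, assuming OutputMatching maps output column names to zero-based positions; `OutputRowsSketch` and `toPositional` are illustrative names, not Groovy Transformer's actual code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OutputRowsSketch {
    // Converts name-keyed rows into positional rows using a column-name -> index map.
    static List<Object[]> toPositional(List<Map<String, Object>> outputRows,
                                       Map<String, Integer> outputMatching) {
        List<Object[]> result = new ArrayList<>();
        for (Map<String, Object> row : outputRows) {
            Object[] values = new Object[outputMatching.size()];
            for (Map.Entry<String, Object> cell : row.entrySet()) {
                Integer index = outputMatching.get(cell.getKey());
                if (index == null) {
                    // A typo in a column name should fail loudly rather than be dropped
                    throw new IllegalArgumentException("Unknown output column: " + cell.getKey());
                }
                values[index] = cell.getValue();
            }
            result.add(values); // columns not set in the map stay null
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> outputMatching = new HashMap<>();
        outputMatching.put("ID", 0);
        outputMatching.put("LIBL", 1);

        Map<String, Object> curRow = new HashMap<>();
        curRow.put("ID", 0);
        curRow.put("LIBL", "First item");

        List<Map<String, Object>> outputRows = new ArrayList<>();
        outputRows.add(curRow);

        for (Object[] r : toPositional(outputRows, outputMatching)) {
            System.out.println(Arrays.toString(r)); // [0, First item]
        }
    }
}
```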


Thus, there are two ways to build the output stream:
  1. Fill in the OutputRows list;
  2. Call the createOutputRow() method and then writeRow() of the GTransformer object.
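The first way is handy when one input row produces several output rows. A hypothetical plain-Java sketch of building such a list: one row with an ID and a semicolon-separated LIBL is fanned out into one map per item. The column names follow the examples above, while the `;` separator and the `FanOutSketch`/`fanOut` names are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FanOutSketch {
    // Splits the LIBL value on ';' and emits one output row per item, repeating the ID.
    static List<Map<String, Object>> fanOut(Map<String, Object> inputColumns) {
        List<Map<String, Object>> outputRows = new ArrayList<>();
        String libl = (String) inputColumns.get("LIBL");
        if (libl == null) return outputRows; // nothing to emit for a null value
        for (String item : libl.split(";")) {
            Map<String, Object> curRow = new HashMap<>();
            curRow.put("ID", inputColumns.get("ID"));
            curRow.put("LIBL", item.trim());
            outputRows.add(curRow);
        }
        return outputRows;
    }

    public static void main(String[] args) {
        Map<String, Object> input = new HashMap<>();
        input.put("ID", 7);
        input.put("LIBL", "red; green; blue");
        System.out.println(fanOut(input).size()); // 3
    }
}
```

With the second way, the same loop would instead call createOutputRow() and writeRow() once per item.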


Which way to use depends on the specific situation.

To start using Groovy Transformer in your projects, fill in the properties of the Java Transformer stage:

Below I will give a few examples of using Groovy Transformer:



Some useful links:
  1. Groovy Reference: http://groovy.codehaus.org/Documentation
  2. Java Transformer Example: http://www.ibm.com/developerworks/data/library/techarticle/dm-1106etljob/index.html

Source: https://habr.com/ru/post/163615/

