It is no secret that encrypted parameters (i.e., having the type Encrypted) used in IBM DataStage in versions prior to 8.7 are very easy to decrypt. These encrypted settings are often used to transfer passwords needed to connect to databases.
When building corporate ODS (and in some cases even in the case of data warehousing), it makes sense to create universal jobs - the so-called generics, which are completely configurable from the outside and do not contain information specific to each table, and therefore they can be used for many ETL processes. This is especially necessary when extracting data from source databases (Extraction). In this case, you need to store passwords for each data source in configuration files. And you have to bend, under the security policies of various enterprises, to pretend that this is a reliable encryption algorithm and to store passwords to corporate data in an encrypted DataStage.
But problems arise if you want to pass such parameters to the job. What problems and how to solve them, and I will write in this article.
And what is the problem?
Suppose you have a configuration file in which you describe your ETL process. It does not matter in what form it is stored. We, for example, use XML. In this config, you want to save the database password in an encrypted form, for example:
<export> <parameters> <parameter name="SQL" handleQuotes="Y"> <value><![CDATA[select * from STAGING.TABLE]]></value> </parameter> <parameter name="DB" value="SAMPLEDB"/> <parameter name="USER" value="USER"/> <parameter name="PASSWORD" value="L<I@@9V8M=;M07GILIJLBK96BLN"/> </parameters> </export>
You read the configuration, extract the necessary parameters to the generic. Well, if you try to transfer this password as an Encrypted parameter, DataStage will regard it as an unencrypted password and re-encrypt it. Moreover, it does not matter how you transmit it: in the sequencer through JobActivity or through the Basic DSSetParam function.
DSXchange and other StackOverflow contain some information about how to do this. But all this is somehow very mediocre. Variants like using external means of encryption / decryption and subsequent transfer of parameters in an open (String) form will not suit us, since the passwords will glow in the DataStage Director log (we remember that we are silent and keep the secret of the Polichinel regarding the reliability of the internal algorithm).
Possible solutions to the problem
- Use DataStage C API;
- Create a parameter file and use it with dsjob -run -paramfile;
- Decrypt password before sending it;
- Create a ParameterSet for each data source.
')
In short: none of these methods work. Well, or we are not satisfied. And that's why.
- Indeed, the API contains the DSPARAM structure, described as follows.
typedef struct _DSPARAM { int paramType; union { char *pString; char *pEncrypt; int pInt; float pFloat; char *pPath; char *pListValue; char *pDate; char *pTime; } paramValue; } DSPARAM;
which contains a pointer to the encrypted parameter char *pEncrypt
. The paramType
field must contain the parameter type, in this case DSJ_PARAMTYPE_ENCRYPTED
.
I have not tried this method. The fact is that, in my opinion, this is too unnecessarily costly a way to just start the job , in addition, you will have to implement all the logic of working with the job: starting, transferring all parameters, tracking its status and returning the status to the sequencer with reverse propagation of the crash in which case (Error Handler does not catch the exception in this case). The readability of the ETL process will fall and support such a project will already be possible only to fairly skilled peppers (yes, not everyone can hire seniors who have knowledge C but also knowledge of the DataStage). In addition, the client does not always give you the right to write the directory Server / PXEngine / lib (Server / PXEngine / user_lib) where you will need to put the compiled object.
Summarizing: it seems that this locomotive will fly, but it does not always make sense to try - This option does not work. At all. DataStage also encrypts the parameters again as it does in the case of DSSetParam. And besides, start of a job out of a sequencer a thing unpleasant enough from the point of view of support of the decision. Mark and this option.
- Yes, we can easily and easily decrypt passwords and then also encrypt them if necessary. But IBM has changed the encryption algorithm in versions starting from 8.7, changing it to a more robust AES. Those. if we migrate our solution (believe me, this will happen sooner or later), it will cease to work in new versions of the DataStage. Mark this option as well.
- The first thing that comes to mind: it's ugly. The second thing that comes: it is inconvenient. In a combat environment, you will have to release a new ParameterSet each time you add the next data source. Well, in the third - it will not work, because we can not dynamically change the name of the parameter set when starting the job.
Solution to the problem
I found only one way to solve this problem. Maybe there is another, more obvious solution. But I don't know about him.
I noticed in the ParameterSets configuration dialog on the Values tab

I have never used this tab before and I dare to suggest that very few people used it and in general know why it is needed. In this tab, you can specify the name of the text file in which the values of the parameter set you created will be stored.
This file is stored in the directory
${PROJECT_DIR}/ParameterSets/__/
I could not believe that the Encrypted settings will be stored in this file in the clear. And if this is not the case, DataStage will not re-encrypt them. Checking the hypothesis.

Fine! If we now replace the contents of this file with other parameters (substituting the encrypted password we need) and test the operation of the job with this set of parameters, we will see that everything works as it should.
Now, in order to pass parameters for several independent instances of a single job (if it is Multiple Instance), you will need to perform the following steps:
- Create a ParameterSet for with the required fields. We do not need to create a lot of sets. Only one. One set can use multiple files for the values of its fields.
- Specify any file name in the Values tab of the ParameterSets configuration. Anyone. This is for the DataStage to create a directory for the parameter.
- In the job parameters add this parameter set

- Before you run the job with an encrypted parameter, create a process that adds (and overwrites, if necessary) a file with a unique name for each job instance. Let me remind you that the file must be written to the directory
${PROJECT_DIR}/ParameterSets/__/
- To start the job you need to specify the file name as the parameter value of your ParameterSet

Note that when you start the job, DataStage marks the applied parameters as From value file.
Instead of conclusion
The above method was not tested by us in versions> 8.5, but in theory it should work, since we did not use anything supernatural here. The values of the parameters themselves are better declared at the project level in the DataStage Administrator and received during the execution. In DSParams, all Encrypted settings are also stored in encrypted form, so all the above considerations apply to this case. For example, we use this method of configuring our processes:
<export> <parameters> <parameter name="SQL" handleQuotes="Y"> <value><![CDATA[select count(*) from STAGING.TABLE]]></value> </parameter> <parameter name="PASSWORD"> <value><![CDATA[${SOURCE_PASSWORD}]]></value> </parameter> <parameter name="DB"> <value><![CDATA[${SOURCE_DB}]]></value> </parameter> <parameter name="USER"> <value><![CDATA[${SOURCE_USER}]]></value> </parameter> </parameters> </export>
where SOURCE_ * are project variables.