📜 ⬆️ ⬇️

Apply the Check Knowledge Module (CKM) in projects based on Oracle Data Integrator

With this post, we open the cycle of materials devoted to the data validation module that is part of ETL (or ELT - as ORACLE is positioning) of the Oracle Data Integrator product. In our opinion, the module functional is undeservedly ignored in favor of more sophisticated and “intelligent” products of the Data Quality class. In this regard, we have a desire to look at CKM not as a kind of atavism, but as a holistic solution, allowing to provide basic control over the processed data.

For this we plan:

  1. tell about the types of checks included in the standard Oracle module and what settings need to be made to activate them;
  2. touch upon the features of implementation, the possibilities for expanding the standard module, the use of a substitution API, which is used to ensure the versatility of the functional being developed;
  3. for a specific example, consider the possibilities provided by Oracle Data Integrator Tools, and the option of transferring the DEV-> PROD settings using the topology;
  4. evaluate the workplace of the operator processing the errors detected by the CKM module.

As an introduction, it is worth noting that all modules in the ODI are proudly called the Knowledge Module, which apparently reflects the following facts:
')
  1. certain behavior is pre-wired to the module;
  2. the name of the module (Check, Loading, Integration, etc.) corresponds to the class of tasks solved by the module;
  3. Within one class, specific modules can be selected or developed to solve specific types of problems.

CKM-module refers to the so-called template type of modules, i.e. It involves (in addition to influencing the module’s own behavior through setting options) modifying the “body” or creating its own modules from the “zero” of this class.

In this article we will begin the story with standard checks that are implemented in the module supplied with ODI CKM. Consider how they are set and what functionality can be implemented using them.

Immediately it should be noted that the module can be used in two modes (STATIC_CONTROL and FLOW_CONTROL): control of the source / target data or validation of the data obtained as a result of integration mapping, before placing them in the target table, respectively.

So, the “standard” includes five types of “self-explanatory” checks:


These checks are set at the description level of the data model in ODI, but can be activated / deactivated in different places depending on the mode of operation of the module.

The first four types of checks are set at the Constarints level of a specific DataStore (DS — entity description — tables, files, etc. — in the ODI metadata repository) and use separate entries for each defined restriction.


In the illustration we see a set of checks set for two Data Store: DIM_COUNTRY and REF_CALENDAR. Please note that when creating under one of DS, both checks of the foreign key become visible under the other DS linked by the FK condition.

The last type of check, Mandatory, is set at the DS field level:


Let's consider what general and particular parameters are set for different types of checks.
1. The modes for which the considered data control (Static / Flow) can be used correspond to the CKM - STATIC_CONTROL / FLOW_CONTROL operating modes, respectively. The picture below is red.


The flags in the green box (only for PK / AK / CK checks) indicate the need for physical presence and activity of this restriction in the final system, respectively, if this possibility is implied. The same meaning is carried in the values ​​of the parameters Type = Database reference and the Activate on Database flag for limiting the FK type.


Or Database Condition value for CK limit.


In fact, the parameters responsible for the physical presence and activity of the restriction are only intended to reflect the desired / existing state of the final system - their installation / removal does not directly affect the final system. These parameters are put down when building a data model based on the metadata of the final system (reverse engineering). Either the corresponding definition of restriction is added to the object generation script in the final system, automatically created on the basis of the description of the data model — with an explicit request for creation.

2. Attributes participating in the constraint condition for PK / AK specify the fields that are part of the key:


for FK, the reference fields (on the left) and the corresponding key fields in the parent table (on the right):


If the FK restriction type is set as the Complex User Reference, then the table link condition is specified in the Expression field


For CK, a condition is set that will be checked (it can be quite complicated).


Here it should be said that in the expression for both FK and CK it is allowed to use functions from the wildcard API (about them in the next article). However, for CK (subject to Type = Oracle Data Integrator Condition), this condition will go through a built-in test (green jackdaw), but for FK it will not.

3. Individual settings.
A. For PK and AK.

The restriction type Primary Key or Alternate Key indicates the corresponding type of verification. The Not Unique Index option is used only as an indication of the need to create an additional index in order to increase productivity.


B. For FK

When determining the limits, the data model and the parent table (more precisely, DS) are selected from the available data — in the figure below red.


In the simple version (Type = User Reference / Database Reference) the connection condition sets the foreign key constraint, in the “advanced” version (Type = Complex User Reference) a more complex condition is allowed, as already described above (the option is highlighted in green in the figure).

C. For CK and NN, options for the available settings have already been covered when parsing other items

So, the limitations are written in the data model - what's next? Now they should be involved in the STATIC_CONTROL check. For FLOW_CONTROL, there is an additional level of control of the ACTIVATION of restrictions, which, when creating a mapping, is set in accordance with the settings available in the model, but can be redefined. Get to know him.

To do this, consider the Logical tab, which is selected when viewing a specific integration mapping.


It is necessary to select the resulting DS and go to its properties. Here, in the indicated red blocks, you can activate / deactivate existing checks.


But there are a number of nuances.

NN checks can be activated / deactivated regardless of whether the Mandatory and Flow flags are set in the data model. Those. these settings are completely independent of the model and completely override them. Thus, changes made at the model level will not affect the existing mapping and will be taken into account only when creating a new one.

PK / AK / FK / CK checks can be activated / deactivated, but if the Flow-flag is cleared in the model, then activating the restriction at the mapping level does not help - this check will not be performed in the FLOW_CONTROL mode regardless of the value specified at the mapping level. The reverse works - the check can be disabled at the mapping level.

There is a feature of the behavior of ODI Studio 12c (version 12.1.3.0.0). When the Flow flag for PK / AK / FK / CK restrictions at the model level is changed, this fact is not automatically reflected in the existing mapping (the word does not appear / does not disappear in the Constarints window opposite the corresponding restriction). This will happen only when re-selecting the flag value in the interface. Therefore, in order to avoid misunderstandings in the behavior of the module, taking into account the previous nuance, when deactivating the constraints in the model, it is necessary to manually update all the associated mappings.

And the last thing that needs to be said in this article for completeness of the topic touched upon is how to use CKM in the checks. For STATIC_CONTROL mode, the module must be specified in the data model settings.


This will allow analyzing the “purity” of data in any of the existing DS, for example, by selecting the corresponding context menu item on a specific DS or by clicking the Datastore Static Control button on the Definition tab when viewing DS.


To activate the STATIC_CONTROL / FLOW_CONTROL mode at the mapping level, you will need to specify the corresponding CKM_STATIC / CKM_FLOW directive in the IKM code of the module connected to the mapping.


And also on the Physical tab in the mapping, specify the CKM module itself and make sure that the FLOW_CONTROL option is activated when the IKM is connected (highlighted in green).


The article was prepared by Alexei Polev, architect of the Jet Infosystems Department of Applied Financial Systems.

Source: https://habr.com/ru/post/312190/


All Articles