Good day! We are a team of system analysts in one of Rostelecom's data management divisions. The company has more than 300 heterogeneous data sources; such diversity is necessary to support Rostelecom's work across all its numerous business areas. We study these sources and, where necessary, load parts of their data into the warehouse.
*The Curiosity rover on the surface of Mars: it, too, has many heterogeneous data sources. Picture borrowed from therahnuma.com.*

In this process, two subtasks stand out: choosing a strategy for collecting data from source tables depending on their properties, and preparing the "receiver" tables of the data warehouse. For this we use various GUI and reverse-engineering tools. In addition, while collecting this information, a system analyst gradually accumulates a pool of auxiliary queries against the DBMS dictionary tables (mainly Oracle). In this article I will share the "gentleman's set" of such scripts used by our team.
First, a few notes that apply to all of the scripts below:
- Many scripts use xmlagg to aggregate strings, because listagg cannot handle the overly long strings that result from concatenation (see the sketch after this list).
- In all scripts except "Procedures, functions and packages", the target tables are specified via the filter table in the "with" block, where the schema name and the table names are filled in.
- Each script is accompanied by one or more usage scenarios, a specification of the result set, and a list of the system tables used (so you can assess whether the script can run on a specific database).
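As a minimal sketch of that limitation, assuming a hypothetical table t with a column col_name: on long results listagg raises ORA-01489, while the xmlagg pattern used throughout these scripts returns a CLOB.

```sql
-- Aggregate column values into one comma-separated CLOB via xmlagg;
-- the same query with listagg would fail once the result exceeds 4000 bytes
select rtrim(
         xmlagg(
           xmlelement(e, col_name, ', ').extract('//text()')
           order by col_name
         ).getclobval()
       , ', '
       ) as col_list
  from t;
```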
Script "Information about the tables"
Specification:

| Column name | Comment |
| --- | --- |
| SCHEMA_NAME | Schema name (OWNER) |
| TABLE_NAME | Table name |
| COMMENTS | Table comment |
| HEIGHT | Number of rows in the table (approximate) |
| WIDTH | Number of columns |
| DATETIME_COLUMNS | Columns with date/time data types, plus columns whose names suggest they are timestamps (patterns %period%, %date%, %time%) |
| AVG_ROW_LEN | Average row length in bytes |
| PART_KEY | Columns by which the table is partitioned |
| SUBPART_KEY | Columns by which the table is subpartitioned |
System tables used: all_tab_columns, all_tab_comments, all_tab_statistics, all_part_key_columns, all_subpart_key_columns.
The query is useful for choosing a strategy for extracting data from the source system. If the table in question has a primary key, you can organize the extraction and then isolate the "increment" on that key. If there is a timestamp, for example in technical fields holding information about row insertion or update, you can extract only the records changed or added over a period. Information about the partition structure can be useful when creating a similar "receiver" table.
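For illustration, a timestamp-based incremental extraction could look like this minimal sketch, where src_schema.src_table and the technical column update_dt are hypothetical names:

```sql
-- Hypothetical incremental extraction: take only rows inserted or updated
-- since the last successful load (:last_load_dt is kept by the ETL process)
select *
  from src_schema.src_table
 where update_dt >= :last_load_dt;
```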
Query body:

```sql
with filter (owner, table_name) as (
  -- list specific tables and/or whole schemas to inspect
  select 'SCHEMA_NAME_1', t.*
    from table(
           sys.odcivarchar2list(
             'TABLE_NAME_1'
           , 'TABLE_NAME_2'
           )
         ) t
  union all
  select owner
       , table_name
    from all_tables
   where owner = 'SCHEMA_NAME_2'
)
select a.owner as schema_name
     , a.table_name
     , e.comments
     , b.height
     , c.width
     , d.datetime_columns
     , b.avg_row_len
     , p.part_key
     , s.subpart_key
  from filter a
  -- approximate row count and average row length from optimizer statistics
  left join (
        select owner
             , table_name
             , num_rows as height
             , avg_row_len
          from all_tab_statistics
         where object_type = 'TABLE'
       ) b
    on a.table_name = b.table_name
   and a.owner = b.owner
  left join (
        select owner
             , table_name
             , count(1) as width
          from all_tab_columns
         group by owner
             , table_name
       ) c
    on a.table_name = c.table_name
   and a.owner = c.owner
  -- date/time columns plus columns that look like timestamps by name
  left join (
        select owner
             , table_name
             , listagg(
                 column_name || ' (' || data_type || ')'
               , ', '
               ) within group (order by column_id) as datetime_columns
          from all_tab_columns
         where data_type = 'DATE'
            or data_type like 'TIMESTAMP%'
            or data_type like 'INTERVAL%'
            or lower(column_name) like '%period%'
            or lower(column_name) like '%date%'
            or lower(column_name) like '%time%'
         group by owner
             , table_name
       ) d
    on a.table_name = d.table_name
   and a.owner = d.owner
  left join (
        select owner
             , table_name
             , comments
          from all_tab_comments
         where table_type = 'TABLE'
       ) e
    on a.table_name = e.table_name
   and a.owner = e.owner
  left join (
        select owner
             , name as table_name
             , listagg(column_name, ', ') within group (order by column_position) as part_key
          from all_part_key_columns
         where object_type = 'TABLE'
         group by owner
             , name
       ) p
    on a.owner = p.owner
   and a.table_name = p.table_name
  left join (
        select owner
             , name as table_name
             , listagg(column_name, ', ') within group (order by column_position) as subpart_key
          from all_subpart_key_columns
         where object_type = 'TABLE'
         group by owner
             , name
       ) s
    on a.owner = s.owner
   and a.table_name = s.table_name
 order by a.owner
     , a.table_name
;
```
Script "Partitions and Subpartitions"
Specification:

| Column name | Comment |
| --- | --- |
| SCHEMA_NAME | Schema name (OWNER) |
| TABLE_NAME | Table name |
| PART_KEY | Columns by which the table is partitioned |
| PARTITION_NAME | Partition name |
| PARTITION_POSITION | Partition number |
| PARTITION_HEIGHT | Number of rows in the partition |
| SUBPART_KEY | Columns by which the table is subpartitioned |
| SUBPARTITION_NAME | Subpartition name |
| SUBPARTITION_POSITION | Subpartition number |
| SUBPARTITION_HEIGHT | Number of rows in the subpartition |
System tables used: all_tab_partitions, all_tab_subpartitions, all_part_key_columns, all_subpart_key_columns.
The script is useful for obtaining the characteristics (name, size) of partitions when they are used directly as data sources, as in the example below.
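For example, once the script has returned partition names, a single partition can be extracted directly; the schema, table, and partition names here are hypothetical:

```sql
-- Hypothetical example: read one partition reported by the script
select *
  from src_schema.src_table partition (P_2019_01);
```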
Query body:

```sql
with filter (owner, table_name) as (
  select 'SCHEMA_NAME_1', t.*
    from table(
           sys.odcivarchar2list(
             'TABLE_NAME_1'
           , 'TABLE_NAME_2'
           )
         ) t
  union all
  select owner
       , table_name
    from all_tables
   where owner = 'SCHEMA_NAME_2'
)
select f.owner as schema_name
     , f.table_name
     , p.part_key
     , pc.partition_name
     , pc.partition_position
     , pc.num_rows as partition_height
     , s.subpart_key
     , sc.subpartition_name
     , sc.subpartition_position
     , sc.num_rows as subpartition_height
  from filter f
  join (
        select owner
             , name as table_name
             , listagg(column_name, ', ') within group (order by column_position) as part_key
          from all_part_key_columns
         where object_type = 'TABLE'
         group by owner
             , name
       ) p
    on f.owner = p.owner
   and f.table_name = p.table_name
  left join all_tab_partitions pc
    on p.table_name = pc.table_name
   and p.owner = pc.table_owner
  left join (
        select owner
             , name as table_name
             , listagg(column_name, ', ') within group (order by column_position) as subpart_key
          from all_subpart_key_columns
         where object_type = 'TABLE'
         group by owner
             , name
       ) s
    on p.owner = s.owner
   and p.table_name = s.table_name
  left join all_tab_subpartitions sc
    on f.owner = sc.table_owner
   and f.table_name = sc.table_name
   and pc.partition_name = sc.partition_name
 order by f.owner
     , f.table_name
;
```
Script "Attribute composition of tables"
Specification:

| Column name | Comment |
| --- | --- |
| SCHEMA_NAME | Schema name (OWNER) |
| TABLE_NAME | Table name |
| COLUMN_ID | Ordinal number of the column |
| COLUMN_NAME | Column name |
| DATA_TYPE | Data type |
| COLUMN_COMMENT | Column comment |
| SCALE | Scale (for the NUMBER data type) |
| PRECISION | Precision (for the NUMBER data type) |
| BYTE_LENGTH | Field length in bytes |
| ENCODING | Character set for text data types (CHAR, VARCHAR2, NCHAR, NVARCHAR2) |
| CHAR_LENGTH | Maximum string length for text data types (CHAR, VARCHAR2, NCHAR, NVARCHAR2) |
| NOT_NULL | "Required" flag |
| IS_PRIMARY | "Included in the primary key" flag |
| DEFAULT_VALUE | Default value |
| COLUMN_IMPACT | List of columns of other tables that reference this column |
| COLUMN_DEPEND | List of columns of other tables that this column references |
System tables used: all_tables, all_constraints, all_cons_columns, all_tab_columns, all_col_comments, v$nls_parameters.
This script is useful when preparing the "receiver" tables in the data warehouse, when you need detailed information about a table, its relationships with other tables, and its full attribute composition. The auxiliary table filter2 controls which tables are scanned for references (both from and to). By default, tables from all schemas except the system ones are taken.
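For instance, to narrow the reference search to particular schemas, filter2 can be rewritten; the schema names below are hypothetical, and the fragment is runnable on its own to preview the resulting scope:

```sql
-- Hypothetical variant of filter2: search for references only in two schemas
with filter2 (owner, table_name) as (
  select owner, table_name
    from all_tables
   where owner in ('BILLING', 'CRM')
)
select * from filter2;
```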
Query body:

```sql
with filter (owner, table_name) as (
  select 'SCHEMA_NAME_1', t.*
    from table(
           sys.odcivarchar2list(
             'TABLE_NAME_1'
           , 'TABLE_NAME_2'
           )
         ) t
  union all
  select owner
       , table_name
    from all_tables
   where owner = 'SCHEMA_NAME_2'
)
-- tables scanned for references (from and to); by default all non-system schemas
, filter2 (owner, table_name) as (
  select owner, table_name
    from all_tables
   where owner not in ('MDSYS', 'CTXSYS', 'SYSTEM', 'XDB', 'SYS')
)
-- all foreign key relations resolved down to column pairs
, refs as (
  select b.constraint_type as from_constraint_type
       , b.constraint_name as from_constraint_name
       , d.position as from_position
       , d.column_name as from_column_name
       , b.table_name as from_table_name
       , b.owner as from_owner
       , a.owner as to_owner
       , a.table_name as to_table_name
       , c.column_name as to_column_name
       , c.position as to_position
       , a.constraint_name as to_constraint_name
       , a.constraint_type as to_constraint_type
    from all_constraints a
    left join all_constraints b
      on a.r_constraint_name = b.constraint_name
     and a.r_owner = b.owner
    left join all_cons_columns c
      on a.constraint_name = c.constraint_name
     and a.table_name = c.table_name
     and a.owner = c.owner
    left join all_cons_columns d
      on b.constraint_name = d.constraint_name
     and b.table_name = d.table_name
     and b.owner = d.owner
   where a.constraint_type = 'R'
     and b.constraint_type in ('P', 'U')
     and c.position = d.position
)
, depends as (
  select rtrim(
           xmlagg(
             xmlelement(
               e
             , to_owner || '.' || to_table_name || '.' || to_column_name
             , ', '
             ).extract('//text()')
             order by to_owner
           ).getclobval()
         , ', '
         ) as val
       , from_owner as owner
       , from_table_name as table_name
       , from_column_name as column_name
    from refs
   where (to_owner, to_table_name) in (select * from filter2)
   group by from_table_name
       , from_column_name
       , from_owner
)
, impacts as (
  select rtrim(
           xmlagg(
             xmlelement(
               e
             , from_owner || '.' || from_table_name || '.' || from_column_name
             , ', '
             ).extract('//text()')
             order by from_owner
           ).getclobval()
         , ', '
         ) as val
       , to_owner as owner
       , to_table_name as table_name
       , to_column_name as column_name
    from refs
   where (from_owner, from_table_name) in (select * from filter2)
   group by to_table_name
       , to_column_name
       , to_owner
)
select f.owner as schema_name
     , f.table_name
     , a.column_id
     , a.column_name
     , a.data_type
     , b.comments as column_comment
     , decode(a.data_type, 'NUMBER', nvl(a.data_scale, 0), '') as scale
     , decode(a.data_type, 'NUMBER', nvl(a.data_precision, 38), '') as precision
     , a.data_length as byte_length
     , case
         when a.data_type in ('CHAR', 'VARCHAR2', 'NCHAR', 'NVARCHAR2')
         then d.value
       end as encoding
     , case
         when a.data_type in ('CHAR', 'VARCHAR2', 'NCHAR', 'NVARCHAR2')
         then a.char_length
       end as char_length
       /* The source text of the query breaks off at this point; the remainder
          below is reconstructed from the specification above, so verify it
          against your environment before use. */
     , decode(a.nullable, 'N', 'Y', 'N') as not_null
     , case when pk.column_name is not null then 'Y' else 'N' end as is_primary
     , a.data_default as default_value
     , im.val as column_impact
     , de.val as column_depend
  from filter f
  join all_tab_columns a
    on f.owner = a.owner
   and f.table_name = a.table_name
  left join all_col_comments b
    on a.owner = b.owner
   and a.table_name = b.table_name
   and a.column_name = b.column_name
  cross join (
        select value
          from v$nls_parameters
         where parameter = 'NLS_CHARACTERSET'
       ) d
  left join (
        -- primary key columns
        select c.owner, c.table_name, c.column_name
          from all_constraints k
          join all_cons_columns c
            on k.owner = c.owner
           and k.constraint_name = c.constraint_name
         where k.constraint_type = 'P'
       ) pk
    on a.owner = pk.owner
   and a.table_name = pk.table_name
   and a.column_name = pk.column_name
  left join impacts im
    on a.owner = im.owner
   and a.table_name = im.table_name
   and a.column_name = im.column_name
  left join depends de
    on a.owner = de.owner
   and a.table_name = de.table_name
   and a.column_name = de.column_name
 order by f.owner
     , f.table_name
     , a.column_id
;
```
Script "Procedures, functions and packages"
Specification:

| Column name | Comment |
| --- | --- |
| SCHEMA_NAME | Schema name (OWNER) |
| NAME | Name of the procedure / function / package / package body |
| BODY | Source code of the object |
| TYPE | Object type (PACKAGE BODY, PACKAGE, FUNCTION, PROCEDURE) |
| WRAPPED | Flag indicating whether the body is obfuscated (wrapped) |
System tables used: all_source.
When analyzing a source system, the task of studying the data flows circulating in it may arise. That is almost impossible without the code base of packages, functions, and procedures, especially when documentation is incomplete or missing. For convenience, this script presents the listed objects as a table. Using a console utility, the query result can be written to the output stream and redirected by a simple handler (a bash script) into files for further study in your favorite editor. Moreover, various handlers can be attached to the output stream: beautifiers, unwrappers, and so on.
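As a sketch of that workflow (the file names and the saved query name are assumptions), the query can be run and spooled from SQL*Plus:

```sql
-- Run in SQL*Plus; the settings let CLOB bodies print without truncation
set long 100000000 longchunksize 32767 pagesize 0 linesize 32767 trimspool on
spool all_sources.txt
@sources.sql  -- the "Procedures, functions and packages" query saved to a file
spool off
```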
Query body:

```sql
select t.owner as schema_name
     , t.name as name
     , xmlagg(
         xmlelement(e, t.text, '').extract('//text()')
         order by t.line asc
       ).getclobval() as body
     , f.wrapped
     , t.type as type
  from (
        -- the first source line of a wrapped object contains the keyword "wrapped"
        select owner, name, type
             , case when lower(text) like '%wrapped%' then 1 else 0 end as wrapped
          from all_source
         where type in (
                 'PACKAGE BODY'
               , 'PACKAGE'
               , 'FUNCTION'
               , 'PROCEDURE'
               )
           and line = 1
           and owner not in ('MDSYS', 'CTXSYS', 'SYSTEM', 'XDB', 'SYS')
       ) f
  join all_source t
    on f.owner = t.owner
   and f.name = t.name
   and f.type = t.type
 group by t.owner
     , t.name
     , t.type
     , f.wrapped
 order by t.owner
     , t.name
     , t.type
;
```
Conclusion
The scripts described above free our system analysts from many routine actions in collecting information about a database and let them focus on more creative things, such as the loading strategy and the structure of the "receiver" tables. I hope the scripts will be useful to you as well. It would be interesting to learn how you automate these and similar tasks.