Useful scripts when migrating from Oracle to PostgreSQL

During the migration from Oracle to PostgreSQL using ora2pg, the question arose of a mismatch of data types between different databases. By default, not all columns are converted correctly, and the lack of an Oracle Boolean creates ambiguity at all - some of the columns must be transferred as numbers, and some as logical values. At the same time, hibernate knows all about data types and can create a reference schema.

The final migration process was as follows: creating the structure of the tables via ora2pg, correcting the structure according to the reference scheme, transferring data, converting blob and Boolean, adding functions not present in PostgreSQL (nvl, nvl2, regexp_substr), creating the remaining structure — indexes, view, and others.

Under the cut, the sql scripts accumulated during the conversion for semi-automatic migration.

Training

As a utility for converting data, ora2pg was used. The process of using is very well described in the article .
')
Create a project in ora2pg, set up a project, and generate a schema.

$./export_schema.sh

Create the 'ora_schema' schema and tables for the file './schema/tables/table.sql' in PostgreSQL
Switch Hibernate to create mode and create another reference scheme 'hb_schema'. In the case of using the view in the project, the number of tables in different schemas will not converge. Hibernate will generate full tables instead of view, and this should be taken into account.

Fix column types

We look in what columns it is necessary to make changes

 select hb.table_name as table , hb.column_name as column, hb.data_type as hb_type, ora.data_type as ora_type from information_schema.columns hb join information_schema.columns ora on hb.table_name = ora.table_name and hb.column_name = ora.column_name where hb.table_schema = 'hb_schema' and ora.table_schema= 'ora_schema' and ora.data_type != hb.data_type ORDER BY hb.table_name, hb.column_name;

Substitutions for simple cases

 select 'ALTER TABLE ora_schema.' || ora.table_name || ' ALTER COLUMN ' || ora.column_name || ' TYPE ' || hb.data_type || case when hb.data_type = 'numeric' then '('|| hb.numeric_precision ||','|| hb.numeric_scale||')' else '' end || '; ' from information_schema.columns as hb INNER JOIN information_schema.columns as ora on hb.table_name = ora.table_name and hb.column_name = ora.column_name where hb.table_schema = 'hb_schema' and ora.table_schema= 'ora_schema' and ora.data_type != hb.data_type and (hb.data_type = 'integer' or hb.data_type = 'numeric' or hb.data_type = 'bigint' or hb.data_type = 'date') ORDER BY hb.table_name, hb.column_name;

We transfer data

Before transferring the data, we check that no type matches remain only in the boolean and bytea (oid) columns, and all the columns with numerical values have the correct accuracy and are not 'double precision' in order to avoid random rounding.

By default, the copy process goes into one stream. Change the configuration to the desired number

 JOBS 4 ORACLE_COPIES 4

Start the process of copying data

 ora2pg -t COPY -o data.sql -b ./data -c ./config/ora2pg.conf

We process boolean

They need to be removed if there is a default value, change the type and return the default value

 --ALTER TABLE mytabe ALTER COLUMN mycolumn DROP DEFAULT; --ALTER TABLE mytabe ALTER mycolumn TYPE bool USING CASE WHEN mycolumn is null then null when mycolumn=0 THEN FALSE ELSE TRUE END; --ALTER TABLE mytabe ALTER COLUMN mycolumn SET DEFAULT FALSE; select case when ora.column_default is not null then 'ALTER TABLE ora_schema.'|| ora.table_name || ' ALTER COLUMN ' || ora.column_name || ' DROP DEFAULT; ' else '' end || 'ALTER TABLE ora_schema.'|| ora.table_name || ' ALTER ' || ora.column_name || ' TYPE bool USING CASE WHEN ' || ora.column_name ||' is null then null WHEN ' || ora.column_name ||'=0 THEN FALSE ELSE TRUE END;' || case when cast(ora.column_default as NUMERIC) = 0 then 'ALTER TABLE ora_schema.'|| ora.table_name || ' ALTER COLUMN ' || ora.column_name || ' SET DEFAULT FALSE' when cast(ora.column_default as NUMERIC) = 1 then 'ALTER TABLE ora_schema.'|| ora.table_name || ' ALTER COLUMN ' || ora.column_name || ' SET DEFAULT TRUE' else '' end from information_schema.columns as hb INNER JOIN information_schema.columns as ora on hb.table_name = ora.table_name where hb.table_schema = 'hb_schema' and ora.table_schema= 'ora_schema' and hb.column_name = ora.column_name and hb.udt_name != ora.udt_name and hb.data_type = 'boolean' ORDER BY hb.table_name, hb.column_name;

Processed by oid (bytea)

Create a procedure for converting bytea to oid

 CREATE OR REPLACE FUNCTION ora_schema.bytea_to_oid(p_blob bytea) RETURNS oid AS $BODY$ declare v_oid oid; v_int integer; begin if octet_length(p_blob)=0 then v_oid:=null; else select lo_create(0) into v_oid; select lo_open(v_oid, x'20000'::int | x'40000'::int) into v_int; select lowrite (0, p_blob) into v_int; end if; return v_oid; END;$BODY$ LANGUAGE plpgsql VOLATILE COST 100; ALTER FUNCTION ora_schema.bytea_to_oid(bytea) OWNER TO postgres;

Create a temporary column.

 select 'ALTER TABLE ora_schema.' || ora.table_name || ' ADD COLUMN ' || hb.column_name || '_oid oid; ' from information_schema.columns as hb JOIN information_schema.columns as ora on hb.table_name = ora.table_name and hb.column_name = ora.column_name where hb.table_schema = 'hb_schema' and ora.table_schema= 'ora_schema' and hb.udt_name != ora.udt_name and hb.data_type = 'oid' ORDER BY hb.table_name, hb.column_name;

We transfer data

 select 'update ' || ora.table_name || ' set ' || hb.column_name || '_oid = bytea_to_oid(' || hb.column_name || ');' from information_schema.columns as hb JOIN information_schema.columns as ora on hb.table_name = ora.table_name and hb.column_name = ora.column_name where hb.table_schema = 'hb_schema' and ora.table_schema= 'ora_schema' and hb.udt_name != ora.udt_name and hb.data_type = 'oid' ORDER BY hb.table_name, hb.column_name;

Remove the old column

 select 'ALTER TABLE ora_schema.' || ora.table_name || ' DROP COLUMN ' || hb.column_name || '; ' from information_schema.columns as hb INNER JOIN information_schema.columns as ora on hb.table_name = ora.table_name and hb.column_name = ora.column_name where hb.table_schema = 'hb_schema' and ora.table_schema= 'ora_schema' and hb.udt_name != ora.udt_name and hb.data_type = 'oid' ORDER BY hb.table_name, hb.column_name;

Rename temporary column

 select 'ALTER TABLE ora_schema.' || ora.table_name || ' RENAME COLUMN ' || hb.column_name ||'_oid to '|| hb.column_name||'; ' from information_schema.columns as hb INNER JOIN information_schema.columns as ora on hb.table_name = ora.table_name and hb.column_name ||'_oid' = ora.column_name where hb.table_schema = 'hb_schema' and ora.table_schema= 'ora_schema' and hb.data_type = 'oid' ORDER BY hb.table_name, hb.column_name;

Add missing functions in PostgreSQL

So that there is no need to rewrite the code, simply create those that are missing in PostgreSQL, but which are in Oracle and are used in the project.

last_day (date)

 create function last_day(dt date) returns date LANGUAGE SQL AS $$ select cast (date_trunc('month',dt)+interval '1 month' - interval '1 day' as date) $$;

nvl (date, date)

 create function nvl(var1 date, var2 date) returns date LANGUAGE plpgsql AS $$ begin return case when var1 is null then var2 else var1 end; end; $$;

nvl (integer, integer)

 create function nvl(var1 integer, var2 integer) returns integer LANGUAGE plpgsql AS $$ begin return case when var1 is null then var2 else var1 end; end; $$;

nvl (numeric, numeric)

 create function nvl(var1 numeric, var2 numeric) returns numeric LANGUAGE plpgsql AS $$ begin return case when var1 is null then var2 else var1 end; end; $$;

nvl (text, text)

 create function nvl(var1 text, var2 text) returns text LANGUAGE plpgsql AS $$ begin return case when var1 is null then var2 else var1 end; end; $$;

nvl (timestamp with time zone, timestamp with time zone)

 create function nvl(var1 timestamp with time zone, var2 timestamp with time zone) returns timestamp with time zone LANGUAGE plpgsql AS $$ begin return case when var1 is null then var2 else var1 end; end; $$;

nvl (timestamp, timestamp)

 create function nvl(var1 timestamp without time zone, var2 timestamp without time zone) returns timestamp without time zone LANGUAGE plpgsql AS $$ begin return case when var1 is null then var2 else var1 end; end; $$;

nvl (varchar, varchar)

 create function nvl(var1 character varying, var2 character varying) returns character varying LANGUAGE plpgsql AS $$ begin return case when var1 is null then var2 else var1 end; end; $$;

nvl2 (date, date, date)

 create function nvl2(var1 date, var2 date, var3 date) returns date LANGUAGE plpgsql AS $$ begin return case when var1 is not null then var2 else var3 end; end; $$;

nvl2 (integer, integer, integer)

 create function nvl2(var1 integer, var2 integer, var3 integer) returns integer LANGUAGE plpgsql AS $$ begin return case when var1 is not null then var2 else var3 end; end; $$;

nvl2 (numeric, numeric, numeric)

 create function nvl2(var1 numeric, var2 numeric, var3 numeric) returns numeric LANGUAGE plpgsql AS $$ begin return case when var1 is not null then var2 else var3 end; end; $$;

nvl2 (text, text, text)

 create function nvl2(var1 text, var2 text, var3 text) returns text LANGUAGE plpgsql AS $$ begin return case when var1 is not null then var2 else var3 end; end; $$;

nvl (timestamp with time zone, timestamp with time zone)

 create function nvl2(var1 timestamp with time zone, var2 timestamp with time zone, var3 timestamp with time zone) returns timestamp with time zone LANGUAGE plpgsql AS $$ begin return case when var1 is not null then var2 else var3 end; end; $$;

nvl2 (timestamp, timestamp, timestamp)

 create function nvl2(var1 timestamp without time zone, var2 timestamp without time zone, var3 timestamp without time zone) returns timestamp without time zone LANGUAGE plpgsql AS $$ begin return case when var1 is not null then var2 else var3 end; end; $$;

nvl2 (varchar, varchar, varchar)

 create function nvl2(var1 character varying, var2 character varying, var3 character varying) returns character varying LANGUAGE plpgsql AS $$ begin return case when var1 is not null then var2 else var3 end; end; $$;

regexp_substr (text, text)

 create function regexp_substr(str text, pattern text) returns text LANGUAGE SQL AS $$ SELECT (regexp_matches(str, pattern))[1] $$;

regexp_substr (varchar, varchar)

 create function regexp_substr(str character varying, pattern character varying) returns text LANGUAGE SQL AS $$ SELECT (regexp_matches(str, pattern))[1] $$;

trunc (date, varchar)

 create function trunc(dt date, formatstr character varying) returns date LANGUAGE plpgsql AS $$ begin return date_trunc(formatstr,dt); end; $$;

trunc (timestamp with time zone, varchar)

 create function trunc(dt timestamp with time zone, formatstr character varying) returns date LANGUAGE plpgsql AS $$ begin return date_trunc(formatstr,dt); end; $$;

trunc (timestamp, varchar)

 create function trunc(dt timestamp without time zone, formatstr character varying) returns date LANGUAGE plpgsql AS $$ begin return date_trunc(formatstr,dt); end; $$;

Text fields

In Oracle, there is no unlimited text field; Lob is used instead. PostgreSQL has a special type - text. To be able to use both bases, the text field annotations should be as follows.

 @Column(name = "script", nullable = true) @Type(type = "org.hibernate.type.MaterializedClobType") public String scriptText;

For Postgres, we also use a self-written dialect that knows about the added functions.

 public class PostgresDialect extends PostgreSQL9Dialect { public PostgresDialect() { super(); this.registerFunction("nvl", new StandardSQLFunction("nvl")); } public SqlTypeDescriptor getSqlTypeDescriptorOverride(int sqlCode) { Object descriptor; switch (sqlCode) { case 2005: descriptor = LongVarcharTypeDescriptor.INSTANCE; break; default: descriptor = super.getSqlTypeDescriptorOverride(sqlCode); } return (SqlTypeDescriptor) descriptor; } }

Sequence

Oracle and Postgress have different syntax nextval from Sequence.
Oracle

 SELECT my_seq.nextval;

Postgres

 select nextval('my_seq');

We lead to a general view. To do this, we create functions in Postgres and Oracle and rewrite them everywhere to use this function.
Oracle

 create or replace function seq_nextval(p_sequence_name varchar) return integer as l_statement varchar(4000); l_value integer; begin l_statement := 'select '||upper(p_sequence_name)||'.nextval from dual'; execute immediate l_statement into l_value; return l_value; end;

Postgres

 create function seq_nextval(p_sequence_name text) returns bigint as $$ select nextval(p_sequence_name); $$ language sql;

Using

 select seq_nextval('my_seq')

Using the Postgres nextval ('my_seq') function directly will not work, because although Oracle will allow you to create this function, it will not allow you to select values.

Transformers.ALIAS_TO_ENTITY_MAP

If you use Transformers.ALIAS_TO_ENTITY_MAP, then you must specify the return types. Oracle will default keys to uppercase, PotgresSql to lowercase, and the mismatch of keys is corrected only manually.

Sysdate

Sysdate is not in PotgresSql, and current_timestamp works in both. Replace with him

substring

Unlike java, in databases, indices start from one and correctly specify substring ('str', 1, 2) to get the first two characters, not substring ('str', 0, 2). But Oracle allows the second option. It is necessary to fix and not to use the 0 index

Temporary tables

In PostgreSQL, unlike Oracle, temporary tables are created every time and live during a session. In Oracle, only the contents of temporary tables live in any framework, and the tables themselves are permanently created.
From this stem the following problems:
1) When creating each session, you will have to create a temporary table. (Now it can be simplified using the ability of the application server to perform the initialization sql block when creating a new session (connection pool → advanced → Init SQL)
2) Validation of entities in hibernate is performed within the framework of the configured schema (if you do not specify a schema, validation will be performed according to all available schemas). Since in Postgre, temporary tables are created in separate schemas, validation will fail.
(You can get around by creating a real table with the same structure. With native queries, work will be done with a temporary table, not with a regular one).
3) It is possible to work with such a temporary table only through native queries, since Hibernate adds the name of the schema to all generated queries (either all the work with temporary tables should be done through native queries, or to abandon temporary tables and use the usual one with data delimitation using a unique key. Both methods involve rewriting the functionality).

To continue working with temporary tables in Postgress, as in Oracle, we will use table functions and create a view based on a table function. Details here

The function of creating temporary tables

create_permanent_temp_table

 create or replace function create_permanent_temp_table( p_table_name varchar, p_schema varchar default null) returns void as $$ declare -- https://github.com/yallie/pg_global_temp_tables v_table_name varchar := p_table_name || '$tmp'; v_trigger_name varchar := p_table_name || '$iud'; v_final_statement text; v_table_statement text; -- create temporary table... v_all_column_list text; -- id, name, ... v_new_column_list text; -- new.id, new.name, ... v_assignment_list text; -- id = new.id, name = new.name, ... v_cols_types_list text; -- id bigint, name varchar, ... v_old_column_list text; -- id = old.id, name = old.name, ... v_old_pkey_column text; -- id = old.id begin -- check if the temporary table exists if not exists(select 1 from pg_class where relname = p_table_name and relpersistence = 't') then raise exception 'Temporary table % does not exist. %', p_table_name, 'Create an ordinary temp ' || 'table first, then use create_permanent_temp_table function to convert it to a permanent one.' using errcode = 'UTMP1'; end if; -- make sure that the schema is defined if p_schema is null or p_schema = '' then p_schema := current_schema; end if; -- generate the temporary table statement with pkey as ( select cc.conrelid, format(E', constraint %I primary key(%s)', cc.conname, string_agg(a.attname, ', ' order by array_position(cc.conkey, a.attnum))) pkey from pg_catalog.pg_constraint cc join pg_catalog.pg_class c on c.oid = cc.conrelid join pg_catalog.pg_attribute a on a.attrelid = cc.conrelid and a.attnum = any(cc.conkey) where cc.contype = 'p' group by cc.conrelid, cc.conname ) select format(E'\tcreate temporary table if not exists %I\n\t(\n%s%s\n\t)\n\ton commit drop;', v_table_name, string_agg( format(E'\t\t%I %s%s', a.attname, pg_catalog.format_type(a.atttypid, a.atttypmod), case when a.attnotnull then ' not null' else '' end ), E',\n' order by a.attnum ), (select pkey from pkey where pkey.conrelid = c.oid)) as sql into v_table_statement from pg_catalog.pg_class c join pg_catalog.pg_attribute a on a.attrelid = c.oid and a.attnum > 0 join pg_catalog.pg_type t on a.atttypid = t.oid where c.relname = p_table_name and c.relpersistence = 't' group by c.oid, c.relname; -- generate the lists of columns select string_agg(a.attname, ', '), string_agg(format('new.%I', a.attname), ', '), string_agg(format('%I = new.%I', a.attname, a.attname), ', '), string_agg(format('%I %s', a.attname, pg_catalog.format_type(a.atttypid, a.atttypmod)), ', '), string_agg(format('%I = old.%I', a.attname, a.attname), ' and ') into v_all_column_list, v_new_column_list, v_assignment_list, v_cols_types_list, v_old_column_list from pg_catalog.pg_class c join pg_catalog.pg_attribute a on a.attrelid = c.oid and a.attnum > 0 join pg_catalog.pg_type t on a.atttypid = t.oid where c.relname = p_table_name and c.relpersistence = 't'; -- generate the list of primary key columns select string_agg(format('%I = old.%I', a.attname, a.attname), ' and ' order by array_position(cc.conkey, a.attnum)) into v_old_pkey_column from pg_catalog.pg_constraint cc join pg_catalog.pg_class c on c.oid = cc.conrelid join pg_catalog.pg_attribute a on a.attrelid = cc.conrelid and a.attnum = any(cc.conkey) where cc.contype = 'p' and c.relname = p_table_name and c.relpersistence = 't' group by cc.conrelid, cc.conname; -- if primary key is defined, use the primary key columns if length(v_old_pkey_column) > 0 then v_old_column_list := v_old_pkey_column; end if; -- generate the view function v_final_statement := format(E'-- rename the original table to avoid the conflict alter table %I rename to %I; -- the function to select from the temporary table create or replace function %I.%I() returns table(%s) as $x$ begin -- generated by pg_global_temp_tables -- create table statement %s return query select * from %I; end; $x$ language plpgsql set client_min_messages to error;\n', p_table_name, v_table_name, p_schema, p_table_name, v_cols_types_list, v_table_statement, v_table_name); -- generate the view v_final_statement := v_final_statement || format(E' create or replace view %I.%I as select * from %I.%I();\n', p_schema, p_table_name, p_schema, p_table_name); -- generate the trigger function v_final_statement := v_final_statement || format(E' create or replace function %I.%I() returns trigger as $x$ begin -- generated by pg_global_temp_tables -- create temporary table %s -- handle the trigger operation if lower(tg_op) = \'insert\' then insert into %I(%s) values (%s); return new; elsif lower(tg_op) = \'update\' then update %I set %s where %s; return new; elsif lower(tg_op) = \'delete\' then delete from %I where %s; return old; end if; end; $x$ language plpgsql set client_min_messages to error;\n', p_schema, v_trigger_name, v_table_statement, -- function header v_table_name, v_all_column_list, v_new_column_list, -- insert v_table_name, v_assignment_list, v_old_column_list, -- update v_table_name, v_old_column_list); -- delete -- generate the view trigger v_final_statement := v_final_statement || format(E' drop trigger if exists %I on %I.%I; create trigger %I instead of insert or update or delete on %I.%I for each row execute procedure %I.%I();', v_trigger_name, p_schema, p_table_name, v_trigger_name, p_schema, p_table_name, p_schema, v_trigger_name); -- create all objects at once execute v_final_statement; end; $$ language plpgsql set client_min_messages to error;

Function of deleting temporary tables

drop_permanent_temp_table

 create or replace function drop_permanent_temp_table( p_table_name varchar, p_schema varchar default null) returns void as $$ declare -- https://github.com/yallie/pg_global_temp_tables v_table_name varchar := p_table_name || '$tmp'; v_trigger_name varchar := p_table_name || '$iud'; v_count int; v_drop_statements text; begin -- make sure that the schema is defined if p_schema is null or p_schema = '' then p_schema := current_schema; end if; -- check if the supporting functions exist select count(*) into v_count from pg_catalog.pg_proc p join pg_catalog.pg_namespace n on n.oid = p.pronamespace where p.proname in (p_table_name, v_trigger_name) and p.pronargs = 0 and n.nspname = p_schema and p.prosrc like '%pg_global_temp_tables%'; if v_count <> 2 then raise exception 'The table %.% does not seem to be persistent temporary table. %', p_schema, p_table_name, 'The function only supports tables created by pg_global_temp_tables library.' using errcode = 'UTMP2'; end if; -- generate the drop function statements v_drop_statements := format(E'-- drop the functions and cascade the view drop function %I.%I() cascade; drop function %I.%I() cascade;', p_schema, p_table_name, p_schema, v_trigger_name); -- drop the functions execute v_drop_statements; end; $$ language plpgsql set client_min_messages to error;

Using
Create a regular temporary table

 CREATE temp TABLE filter_table ( id BIGINT NOT NULL, id_str VARCHAR(255), key VARCHAR(255) NOT NULL, fd DATE, id_long BIGINT, sd DATE, CONSTRAINT filter_table_pkey PRIMARY KEY (key, id) ) ON COMMIT PRESERVE ROWS;

We run it through the generation function:

 select create_permanent_temp_table('filter_table', 'schema_name');

Source: https://habr.com/ru/post/340566/

All Articles

Useful scripts when migrating from Oracle to PostgreSQL

Training

Fix column types

We transfer data

We process boolean

Processed by oid (bytea)

Add missing functions in PostgreSQL

Text fields

Sequence

Transformers.ALIAS_TO_ENTITY_MAP

Sysdate

substring

Temporary tables

More articles: