
It just so happens that our organization has worked with Oracle for a long time, and closely. I myself first met Oracle Database back in version 6 and have felt no discomfort since. Then market realities intervened: recently we began to notice that customers look much more favorably on our projects when those projects use a free DBMS. This is the story of porting one of them.
Choosing a free DBMS was a long and difficult affair for me, but in the end it came down to two well-known alternatives.
PostgreSQL attracted with its rich (and still evolving) functionality, while MySQL lured with performance and "zero" administration. Since Oracle had spoiled us and we all knew and loved SQL, the numerous new-fangled NoSQL options were eliminated back in the semi-finals.
It is hard to say which DBMS I would have ended up with, had it not been for one side job that gave me the chance to try both PostgreSQL and MySQL live, comparing them without haste and reaching what I consider a well-grounded decision. Besides functionality, I naturally compared performance as well. I will spare the details, but one of the key requirements of that side job was the ability to insert a large number of records into the database quickly and reliably (ACID, yes). A test was run on exactly this topic:
The ordinate axis shows the number of fixed-length records stored in the database per second; the number in the legend is the transaction size. Note that MySQL was measured "as is", while PostgreSQL used a small patch that provided the partitioned tables and materialized views I was used to. Since the point was "safe" storage, MyISAM appears on the graph solely for completeness, to show where the "theoretical maximum" of the desired performance lies on the hardware used.
Since the testing was done quite a while ago, and the available hardware had never so much as smelled an SSD, the absolute values in the graph should not be treated as dogma. Of course data can be stored even faster; what interested me was the performance ratio of the various DBMS working under (almost) identical conditions. It came as a surprise that PostgreSQL, even weighed down by partitioning triggers, works almost as fast as MySQL with InnoDB, and on large transactions (1000 records and more) starts to catch up with MyISAM!
As you can easily guess, the graph above finally convinced me to switch to PostgreSQL. Re-creating the tables with adjusted column types (number into numeric and integer, varchar2 into varchar and text, and so on) was trivial. XML and XSLT helped with the data transfer.
About the benefits of XML: strictly speaking, XML had helped even earlier. One feature of our product is storing descriptions of business entities in the database as ordinary tabular data (I doubt this is very original). Comparing such "metadata" across two different schemas was a real headache until I wrote a small package that dumps them into XML descriptions. Sorting the tags inside the descriptions made it possible to compare them as ordinary text files. XSLT completed the picture by providing automatic generation of SQL scripts from the description files.
It remained to make all the SQL code written for Oracle work. Most queries ran as is; some ran after minor cosmetic changes. First of all, I created a dual table:

create table dual ( x varchar(1) not null );
insert into dual(x) values ('x');
Not that it was impossible to do without it, but it was used so often in our queries that rewriting them all was simply impractical. To keep PostgreSQL "satisfied", we had to add more strictness to the queries:
Oracle version:
select b.id id, b.name name
  from ( select list_value from acme_obj_list_value group by list_value ),
       acme_list_value b
 where b.id = list_value

PostgreSQL version:
select b.id id, b.name as name
  from ( select list_value from acme_obj_list_value group by list_value ) a,
       acme_list_value b
 where b.id = a.list_value
Every inline view must be named, and it is highly desirable to put the 'as' keyword before column aliases. For most columns it can be omitted, but with names like 'name' or 'value' omitting it leads to an error. The next step was to replace platform-specific code with constructs supported by both Oracle and PostgreSQL. This concerns nvl and decode, as well as the outdated outer join syntax. The first two are easily replaced by the standard (and more flexible) coalesce and case; queries using the old outer join syntax have to be rewritten:
Oracle version:
select ot.name, mv.str_value
  from acme_object o, acme_meta_value mv, acme_obj_type ot
 where o.id = :object_id
   and ot.id = o.obj_type_id
   and mv.owner_id(+) = ot.id
   and mv.param_id(+) = 9520
PostgreSQL version:
select ot.name, mv.str_value
  from acme_object o
 inner join acme_obj_type ot on (ot.id = o.obj_type_id)
  left join acme_meta_value mv on (mv.owner_id = ot.id and mv.param_id = 9520)
 where o.id = :object_id
ANSI joins have been supported by Oracle since version 9 and are, in my opinion, more convenient (if less concise) than the outdated form using (+). Do not try to mix the two join styles in a single SQL query: once an outer join is written with left join, it is only logical to write inner joins with inner join as well, rather than listing tables in the from clause separated by commas.
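Returning to nvl and decode: their replacement is mechanical. A minimal sketch (the column and status values here are illustrative, not taken from the project):

```sql
-- Oracle only:          nvl(p.discount, 0)
-- Oracle + PostgreSQL:  coalesce(p.discount, 0)

-- Oracle only:
--   decode(o.status, 'N', 'new', 'C', 'closed', 'unknown')
-- Oracle + PostgreSQL:
--   case o.status
--     when 'N' then 'new'
--     when 'C' then 'closed'
--     else 'unknown'
--   end
```

Unlike nvl, coalesce takes any number of arguments and returns the first non-null one; the searched form of case (with full boolean conditions after each when) covers the decode variants that compare against expressions rather than constants.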
The bulk of the SQL migration work involved rewriting hierarchical queries. The connect by clause is, naturally, not supported in PostgreSQL, and yet there were plenty of queries like this:
An example of a hierarchical query:
select t.id as value
  from acme_object t, acme_obj_ref_value rv
 where rv.object_id = t.id
   and rv.attr_id = 220102
   and rv.ref_value = :object_id
   and t.obj_type_id in ( select ot.id
                            from acme_obj_type ot
                         connect by prior ot.id = ot.parent_id
                           start with ot.id = 200335 )
Simply rewriting such queries with a CTE would not make them platform-independent. Although Oracle supports recursive queries starting with version 11.2, its syntax differs from the one used in PostgreSQL: PostgreSQL requires the recursive keyword, while Oracle does not "understand" it. Fortunately, in most cases the hierarchical part of the query could be "hidden" inside a view.
Oracle version:
create or replace view acme_arm(id) as
select ot.id
  from acme_obj_type ot
connect by prior ot.id = ot.parent_id
  start with ot.id = 200335

PostgreSQL version:
create or replace view acme_arm(id) as
with recursive t(id) as (
  select id from acme_obj_type where id = 200335
  union all
  select a.id from acme_obj_type a inner join t on (t.id = a.parent_id)
)
select id from t
Rewriting the merge statements turned out to be more of a puzzle (fortunately, they were used less often than hierarchical queries). PostgreSQL does not support this statement, but it does support a from clause and a returning clause in the update statement; the latter yields a full result set (just like a select), which allows the update to be placed inside a with clause. I'll just leave this here:
Oracle version:
merge into acme_obj_value d
using ( select object_id from acme_state_tmp ) s
   on (d.object_id = s.object_id)
 when matched then
   update set d.date_value = least(l_dt, d.date_value)
 when not matched then
   insert (d.id, d.object_id, d.date_value)
   values (acme_param_sequence.nextval, s.object_id, l_dt)
PostgreSQL version:
with s as (
  select object_id from acme_state_tmp
), upd as (
  update acme_obj_value
     set date_value = least(l_dt, acme_obj_value.date_value)
    from s
   where acme_obj_value.object_id = s.object_id
  returning acme_obj_value.object_id
)
insert into acme_obj_value(id, object_id, date_value)
select nextval('acme_param_sequence'), s.object_id, l_dt
  from s
 where s.object_id not in (select object_id from upd)
This example also shows that working with sequences in PostgreSQL differs from Oracle. Of course, one could have defined an Oracle function mimicking the one PostgreSQL uses to fetch sequence values, but I wanted to avoid rewriting the Oracle code (as well as the Java code), and such an approach would have carried extra overhead anyway.
Working with date and time brought a lot of joy. The date type commonly used in Oracle encourages a certain sloppiness in handling its values: a value can be treated as a number whose integer part is the number of days elapsed since a certain "magic" date and whose fractional part is the time of day, down to the second. After some getting used to (as with most Oracle features), this is rather convenient, but PostgreSQL is much stricter about data types.
date '2001-09-28' + interval '1 hour'
Thus you can add a constant interval to a date, but what if the value to be added is variable? The required expression is not at all obvious:
date '2001-09-28' + (to_char(p_hours, '99') || ' hour')::interval
The space in the string before 'hour' is required! It is also worth noting that PostgreSQL's strictness extends to converting numbers to strings (and back, of course): a format mask is required even if it consists of nothing but nines, and the implicit conversions so familiar from Oracle simply do not happen.
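For completeness, PostgreSQL does offer less fragile alternatives to the string-concatenation trick; a sketch, assuming p_hours is a numeric variable (make_interval requires PostgreSQL 9.4 or later, so it may not have been available at the time):

```sql
-- the concatenation trick from the text (note the leading space in ' hour'):
select date '2001-09-28' + (to_char(p_hours, '99') || ' hour')::interval;

-- multiplying an interval by a number avoids to_char entirely:
select date '2001-09-28' + p_hours * interval '1 hour';

-- make_interval (9.4+) is the most explicit form; it takes integer hours:
select date '2001-09-28' + make_interval(hours => p_hours::int);
```

The multiplication form also handles fractional values gracefully, since an interval multiplied by a numeric stays an interval.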
The remaining queries underwent less radical changes. All string-handling code needed revision, simply because the corresponding functions look different in Oracle and PostgreSQL. The rownum pseudocolumn, where it still remained, had to be replaced with the window function row_number(). Where a condition on rownum merely limited the number of rows returned, the queries were rewritten using the limit clause.
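A minimal sketch of both replacements (acme_object stands in for any project table):

```sql
-- Oracle: first ten rows
--   select * from acme_object where rownum <= 10;

-- PostgreSQL: a plain row limit
select * from acme_object limit 10;

-- PostgreSQL: when the row number itself is needed,
-- row_number() assigns it over an explicit ordering
select row_number() over (order by id) as rn, id, name
  from acme_object;
```

Unlike rownum, row_number() is computed after sorting, so the combination of "number the rows" and "order them" that requires a subquery in Oracle is expressed directly.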
Table functions also deserve a mention. Both Oracle and PostgreSQL have them. The implementations differ, of course, but calling them from a SQL query looks similar. Unfortunately, as with recursive CTEs, a single keyword spoils everything:

Oracle version:
select * from table(acme_table_function(...))

PostgreSQL version:
select * from acme_table_function(...)
It remained to deal with packages. PostgreSQL has no such thing, but on closer inspection it turns out not to really need it. Indeed, what are packages for in Oracle? Setting aside global variables and initialization code (which we do not use), the main advantage of packages is that they break dependency chains: when database objects change, only the bodies of dependent packages are invalidated, not their headers. The ability to make recursive calls between package routines is one consequence of this fact.
PostgreSQL does not implement a dependency mechanism at all, and recursive calls to stored functions (PostgreSQL has no procedures) work fine anyway. To keep client-code changes to a minimum, it is enough to maintain the mere appearance of still working with packages, and PostgreSQL schemas are perfect for this. True, such a "package" cannot have "private" functions, but that is not a big problem. Here is what the code looks like:
Package emulation in PostgreSQL:
drop function acme_utils.get_str_res(numeric);
drop function acme_utils.c_str_res_ot();
drop function acme_utils.c_str_res_id_attr();
drop schema acme_utils;

create schema acme_utils;

create or replace function acme_utils.c_str_res_ot()
returns numeric as $$
begin
  return 20069;
end;
$$ language plpgsql IMMUTABLE;

create or replace function acme_utils.c_str_res_id_attr()
returns numeric as $$
begin
  return 20070;
end;
$$ language plpgsql IMMUTABLE;

create or replace function acme_utils.get_str_res(in p_res_id numeric)
returns text as $$
declare
  res text;
begin
  select o.name
    into strict res
    from acme_object o
   inner join acme_obj_value rid on (rid.object_id = o.id
                                 and rid.attr_id = acme_utils.c_str_res_id_attr())
   where o.obj_type_id = acme_utils.c_str_res_ot()
     and rid.num_value = p_res_id;
  return res;
end;
$$ language plpgsql STABLE;
Having to drop all objects before "re-creating" the schema is a bit tiresome, but one can live with it. Note the unfamiliar word 'strict' in the code above: it restores the familiar Oracle behavior when a query fetches zero rows or more than one. Among other memorable moments, I can mention the peculiar construct that retrieves the number of rows modified by the last statement:
Oracle version:
insert into acme_resource_tmp(id, device_id, profile_id, owner_id, type_id, res_num, name)
select acme_main_sequence.nextval, t.device_id, t.profile_id, r.id, p.type_id, t.num, t.value
  from acme_state_tmp t
 inner join acme_profile_detail p on (p.profile_id = t.profile_id and p.param_id = t.param_id)
 inner join acme_resource r on (r.device_id = t.device_id and r.owner_id is null);
l_ic := sql%rowcount;

PostgreSQL version:
insert into acme_resource_tmp(id, device_id, profile_id, owner_id, type_id, res_num, name)
select nextval('acme_main_sequence'), t.device_id, t.profile_id, r.id, p.type_id, t.num, t.value
  from acme_state_tmp t
 inner join acme_profile_detail p on (p.profile_id = t.profile_id and p.param_id = t.param_id)
 inner join acme_resource r on (r.device_id = t.device_id and r.owner_id is null);
get diagnostics l_ic = row_count;
Naturally, the bodies of all packages had to be rewritten; fortunately, there were not that many of them. From the narrative so far it should be clear that all the rewritten SQL code falls into three categories:
- Queries that minor rewriting brought to a platform-independent form
- Queries whose platform-specific fragments could be hidden in views
- Definitely platform-specific code
The first two categories cause no difficulty. The last one can be a problem when platform-specific constructs appear in queries generated by the client: rewriting the Java code is undesirable, and splitting the sources into two versions for different DBMS is even less appealing. Unfortunately, platform-dependent constructs could not be completely eliminated from the client code. Mostly it was the table keyword in table-function calls that got in the way, along with references to sequences and a few hierarchical queries.
It was decided to store all platform-dependent queries in the database and load them into the program's cache on first use. Initially the plan was for each database to store its own versions of the queries, but it turned out to be more convenient to store the queries for all supported DBMS together. In Oracle a CLOB column held the query text; in PostgreSQL, text. For uniformity, a CLOB-to-varchar2 conversion was used, which limited the maximum query size to 4000 characters (one query did outgrow this limit, but since it was meant for the PostgreSQL version, it did not have to be "squeezed"). The to_char conversion itself had to be hidden behind a view:
Oracle version:
create or replace view acme_query(name, sql) as
select a.name, to_char(c.clob_value)
  from acme_object a
 inner join acme_obj_list_value b on (b.object_id = a.id and b.attr_id = 10061)
 inner join acme_obj_value c on (c.object_id = a.id and c.attr_id = 10062)
 where a.obj_type_id = 10004
   and b.list_value = 10061;

PostgreSQL version:
create or replace view acme_query(name, sql) as
select a.name, c.clob_value
  from acme_object a
 inner join acme_obj_list_value b on (b.object_id = a.id and b.attr_id = 10061)
 inner join acme_obj_value c on (c.object_id = a.id and c.attr_id = 10062)
 where a.obj_type_id = 10004
   and b.list_value = 10062;
Summing up, I can say that the work was not as terrible as it seemed at the start. Most of it went into rewriting hierarchical queries and Oracle packages, and most of the problems stemmed from PostgreSQL's stricter SQL syntax and its lack of the implicit conversions we were used to. The amount of work would have been smaller had we written stricter, more platform-independent code in Oracle from the beginning.