Comparing the schemas of two or more different Postgres databases is a common task, but it can become more tricky if these databases run on different versions of Postgres. A quick and canonical way to compare schemas is to use the same
pg_dump program to interact with each base with the
--schema-only parameter. This method works great, but there are some pitfalls, especially when copying
views .

(
Photo taken by
Philippe Vieux-Jeanton )
Premise
Let's start with some prerequisites for how this problem was discovered. We have an instance that is in the process of upgrading Postgres version from 9.2 to 9.6 (the latest version at the time of this writing). Using
pg_upgrade was not possible, as it was planned not only to include
checksums of the data , but also to change the encoding to UTF-8. A number of factors, especially a change in encoding, meant that the typical update process was
pg_dump old_database | psql new_database is not possible. Thus, we have a very specific program that gently migrates parts of the data, performing actions on them along the way.
Problem
As a final assessment of sanity, we wanted to make sure that the final schema of the database updated to version 9.6 is as identical as possible to the current schema of the product database version 9.2. When comparing the output of pg_dump, we quickly discovered the problem of the way the views were displayed. Version 9.2 uses a very meager, single-line output, while version 9.6 uses a multi-line "nicely deduced" variation. Needless to say, this meant that none of the views were the same when comparing the output of pg_dump.
')
The problem lies in the
pg_get_viewdef () system function, which pg_dump uses to return a human-readable and Postgres-recognized version of the view. To demonstrate the problem and solution, we will create a simple representation on both bases, after which we compare them with pg_dump:
$ psql -p 5920 vtest -c \ 'create view gregtest as select count(*) from pg_class where reltuples = 0' CREATE VIEW $ psql -p 5960 vtest -c \ 'create view gregtest as select count(*) from pg_class where reltuples = 0' CREATE VIEW $ diff -u <(pg_dump vtest -x -p 5920 --schema-only) <(pg_dump vtest -x -p 5960 --schema-only) --- /dev/fd/70 2016-09-29 12:34:56.019700912 -0400 +++ /dev/fd/72 2016-09-29 12:34:56.019720902 -0400 @@ -2,7 +2,7 @@ -- PostgreSQL database dump -- --- Dumped from database version 9.2.18 +-- Dumped from database version 9.6.0 -- Dumped by pg_dump version 9.6.0 SET statement_timeout = 0; @@ -35,22 +35,14 @@ -- CREATE VIEW gregtest AS -SELECT count(*) AS count FROM pg_class WHERE (pg_class.reltuples = (0)::double precision); + SELECT count(*) AS count + FROM pg_class + WHERE (pg_class.reltuples = (0)::double precision);
The only difference, apart from the server version, is a view that does not correspond at all, and diff is concerned with it. (For the purposes of this article, all minor lines have been removed from the output).
As stated earlier, the culprit is the
pg_get_viewdef () function. His job is to present the filling of the presentation in an adequate, readable form. There are two major changes she makes with this output: adding parentheses and adding indents with spaces. In recent versions, despite the fact that documents hint, indents (beautiful output) cannot be disabled, which means there is no easy way to force the server with version 9.6 to give the difference in views in one line, as does the server with version 9.2 by default. Moreover, there are five versions of the pg_get_viewdef function, each of which takes a different argument:
- view name
- view name and boolean argument
- OID
- OID and Boolean Argument
- OID and integer argument
In Postgres version 9.2, the
pg_get_viewdef (text, boolean) version will enable and disable indents, moreover, you can see that no default indents are added:
$ psql vtest -p 5920 -Atc "select pg_get_viewdef('gregtest')" SELECT count(*) AS count FROM pg_class WHERE (pg_class.reltuples = (0)::double precision); $ psql vtest -p 5920 -Atc "select pg_get_viewdef('gregtest',false)" SELECT count(*) AS count FROM pg_class WHERE (pg_class.reltuples = (0)::double precision); $ psql vtest -p 5920 -Atc "select pg_get_viewdef('gregtest',true)" SELECT count(*) AS count + FROM pg_class + WHERE pg_class.reltuples = 0::double precision;
In Postgres version 9.6, however, you always encounter a “beautiful” display, no matter which of the five versions of the function you choose and which arguments you pass to them! Here is a call to the same function as in the example above on version 9.6:
$ psql vtest -p 5960 -Atc "select pg_get_viewdef('gregtest')" SELECT count(*) AS count FROM pg_class WHERE (pg_class.reltuples = (0)::double precision); $ psql vtest -p 5960 -Atc "select pg_get_viewdef('gregtest',false)" SELECT count(*) AS count FROM pg_class WHERE (pg_class.reltuples = (0)::double precision); $ psql vtest -p 5960 -Atc "select pg_get_viewdef('gregtest',true)" SELECT count(*) AS count FROM pg_class WHERE pg_class.reltuples = 0::double precision;
Solutions
When I first encountered this problem, three solutions surfaced in my mind:
- Write a script that will transform and normalize the output of the scheme
- Change Postgres source code to change pg_get_viewdef behavior
- Get pg_dump's call to pg_get_viewdef in such a way as to get identical output
Initially, I decided that a quick Perl script would be the easiest way. And by the time I got one working version of the script, it was a lot of pain to turn the output from “beautiful” into “ugly”, especially spaces and the use of brackets. The approach of brute force, by simply removing all the round and square brackets, the extra spaces from the rules and definitions of the representations almost worked, but the conclusion was rather
ugly difficult to read, and besides, there were problems with the extra spaces.
Approach number two, changing the Postgres source code, is actually quite simple. At some point, the source code was changed in such a way that the insertion of the gaps was forced to be turned on. Changing the only symbol in the
src / backend / utils / adt / ruleutils.c file decided everything:
- #define PRETTYFLAG_INDENT 2 + #define PRETTYFLAG_INDENT 0
Although this solution eliminated the problem with indents and spaces, the brackets are still different and it is not so easy to solve. Overall, not the best solution.
The third approach was to change the pg_dump source code. In particular, it uses the
pg_get_viewdef (oid) function format. By changing this format to
pg_get_viewdef (oid, integer), the format of the function and applying to the input of the argument 0, both version 9.2 and version 9.5 output the same thing:
$ psql vtest -p 5920 -tc "select pg_get_viewdef('gregtest'::regclass, 0)" SELECT count(*) AS count + FROM pg_class + WHERE pg_class.reltuples > 0::double precision; $ psql vtest -p 5960 -tc "select pg_get_viewdef('gregtest'::regclass, 0)" SELECT count(*) AS count + FROM pg_class + WHERE pg_class.reltuples > 0::double precision;
This modified version will reproduce the same pattern in our test database:
$ diff -u <(pg_dump vtest -x -p 5920 --schema-only) <(pg_dump vtest -x -p 5960 --schema-only) --- /dev/fd/80 2016-09-29 12:34:56.019801980 -0400 +++ /dev/fd/88 2016-09-29 12:34:56.019881988 -0400 @@ -2,7 +2,7 @@ -- PostgreSQL database dump -- --- Dumped from database version 9.2.18 +-- Dumped from database version 9.6.0 -- Dumped by pg_dump version 9.6.0 SET statement_timeout = 0;
The best solution, in the opinion of my colleague David Christensen, is simply to have Postgres do all the hard work itself with the magic of import / export. By the end of the day, pg_dump's output is not only human-readable, but also designed in such a way that Postgres can recognize it. Thus, we can feed the old scheme of version 9.2 to the time base of version 9.6, then turn around and copy it. As a result, we have identical
pg_get_viewdef () calls for both schemes. Here it is on our test bases:
$ createdb -p 5960 vtest92 $ pg_dump vtest -p 5920 | psql -q -p 5960 vtest92 $ diff -s -u <(pg_dump vtest92 -x -p 5960 --schema-only) <(pg_dump vtest -x -p 5960 --schema-only) Files /dev/fd/63 and /dev/fd/62 are identical
Conclusion
Attempts to compare schemas of different versions can be quite difficult, so it is better not to even try. Copying and restoring schemes is a cheap operation, so just copy both schemes to one server, and then make a comparison.