How to work with MS Access in Linux

Many people use Acces ... even in production ... even to this day. Therefore, there are times when someone wants to connect to this database from some unexpected place. For example, from a unix server. Of course, you want to connect for a reason, and to use data from Access in a web application. And, without any doubt, there will be a desire to use this data together with information from other, more modern databases.

So, I want to describe several approaches to communicating with a creature called MS Access. Therefore, the initial task is as follows: establish a connection with MS Access from a Ruby on Rails application or from PostgreSQL (using FDW ) and access data, preferably in real time.

Below I will try to collect all the information related to the task described above and try to describe non-trivial cases and underwater stools. I hope this description will save time for someone ... or just, to some extent, amuse the respected audience.

Immediately tldr for those who are only interested in the facts and opinion of the author on this issue.
')

Conversion to CSV

To begin with, I will describe a simple working solution. It is guaranteed to work on Ubuntu 14.04. Should work on other Linux distributions. And does not require any ~~higher / sacred~~ knowledge, skills and other magic.

There is such a thing mdbtools . It is put very simply:

sudo apt-get install mdbtools

Details about its dependencies, manual build, package features, and much more can be found on the GitHub page .

This package provides a bunch of different tools for working with Access. We will not consider the whole bunch, but dwell on one thing. The one that can turn mdb files into csv:

 mdb-export 'mdb-file' 'table-name' > result.csv

As a result, we obtain a csv-file with the contents of the specified table. Further, this file can be subjected to all conceivable and inconceivable treatments and tortures, because csv is an insanely simple and widespread format.

Query execution in MS Access

Now a more difficult task: to execute a SQL query, having in your hands an mdb file and a machine with something unix. It is not hard to guess that you need to put a couple more packages and create several configs.

First, you'll need ODBC . This is the standard API for communicating with the database. In Unix, unixODBC exists for this purpose. Its installation is very simple:

 sudo apt-get install unixodbc libmdbodbc1

The second package contains libmdbodbc.so , which will be needed just below.

The next step is to find a suitable ODBC driver for MS Access. The closest available driver is from mdbtools. Next, you need to dig into the configs: describe the driver and declare the database.

The driver is described in /etc/odbcinst.ini :

 [MDBTools] Description = MDBTools Driver Driver = libmdbodbc.so Setup = libmdbodbc.so FileUsage = 1 UsageCount = 1

And the database is announced in /etc/odbc.ini:

 [testdb] Description = test Driver = MDBTools Database = /opt/db/MS_Access.mdb

It is worth noting that in the “Driver” you need to specify the name of the driver, which is described in odbcinst.ini.

More about odbcinst.ini and odbc.ini can be found here .

So, the configuration is done. Now you can proceed to query. For this purpose, we use the isql utility from the unixODBC package:

 isql testdb

If everything is done correctly, then a console should appear to execute queries:

 SQL> SELECT * from "" +------------+-----------------------------------------------------+ |  |  | +------------+-----------------------------------------------------+ | 1 |  | | 2 |   | | 3 |  | | 4 |  | +------------+-----------------------------------------------------+ SQLRowCount returns 4 4 rows fetched

Lastly, it is worth noting that there is an analogue of isql with unicode support. It is called iusql.

Oddities isql

To be honest, the isql utility is quite easy. It has a lot of restrictions on syntax and no user friendliness and understanding. For example: put a semicolon at the end of the expression - get an error and try to guess what it is. There are no tips, tips and other delights of modern development here. This is not PotgreSQL, which will kindly say that you were mistaken in the expression and suggest the correct version. Here you just send and do not even tell the reason. Therefore, for at least some ease of working with isql, the pyodbc-cli shell was created. With its help, you can somehow weaken the fight against isql and focus on writing queries.

Exotic table / column encodings

There are a lot of rumors about the 'Charset' parameter, which affects the staked page used. Here is an example of using this parameter:

 [testdb] Description = test Driver = MDBTools Database = /opt/db/MS_Access.mdb Charset = CP1251

The effect of this parameter on the work of isql was not observed. In isql, I can work with both mdb files containing Cyrillic and regular Unicode mdb files. At the same time, the iusql utility, regardless of the 'Charset' parameter, produced many question marks (like these::) when working with a Cyrillic mdb file.

Alternatives for isql

The alternative for isql is mdb-sql from the mdbtools package. Ini-files are not needed for this utility. You just need to set it on a specific mdb file:

 mdb-sql /opt/db/MS_Access.mdb

The man page will respond well to all questions about using the utility. The only feature: the utility could not swallow the above Cyrillic mdb file. There were no problems with Unicode files.

Ruby / Rails Path

It is now the middle of 2016, the last release of MS Access was on September 22, 2015. But bad luck, the latest work on the adapter for ActiveRecord is dated 2008. Therefore, I, as is customary, have two news: good and bad.

I'll start with the good: there is an odbc-rails and its reincarnation activerecord-odbc-adapter .

And now the bad: as noted above, the latest commits to the adapter's repository are dated 2008 and support for Rails and ActiveRecord versions one and two is announced; therefore, I do not know how to run it on Rails 3+ (and can it even be done at all). The reasons for my ignorance are approximately as follows. First: the adapter has bad documentation (or rather its absence). And secondly: there is no desire to go into the source, to understand and bring them back to life. So if you have enough knowledge, experience and time - you can finish and describe how to use it. Good luck in this case!

Ruby-ODBC

Since everything is sad with the adapter, you can look in other directions. One side is called ruby-odbc .

The last update of this heme is dated 2011, but, at the moment, it is more or less working. To install the heme you need to perform some simple steps:

 sudo apt-get install unixodbc unixodbc-dev gem install ruby-odbc

Without the unixodbc-dev package, compiling the native extension will fail with the error: ERROR: sql.h not found .

Further, we assume that ODBC is configured in the system (that is, odbcinst.ini and odb.ini are present). In this case, you can open irb and do the following:

 001 > require 'odbc' => true 002 > client = ODBC.connect("testdb") => #<odbc::database:0x00000000e38d98> 003 > statement = client.prepare 'SELECT * FROM ""' => #<odbc::statement:0x00000000e11608 @_a="[]," @_h="{}," @_c0="{}," @_c1="{}," @_c2="{}," @_c3="{}"> 004 > statement.execute => #<odbc::statement:0x00000000e11608 @_a="[]," @_h="{}," @_c0="{}," @_c1="{}," @_c2="{}," @_c3="{}"> 005 > first_row = statement.fetch => [1, "\xD0\x94\xD0\xBE\xD0\xBA\xD1\x83\xD0\xBC\xD0\xB5\xD0\xBD\xD1\x82\xD0\xB0\xD1\x86\xD0\xB8\xD1\x8F\x00"] 006 > first_row[1].force_encoding("utf-8") => "\u0000"

More information about the syntax and available ruby-odbc heme commands can be found in the ruby-odbc / test directory on GitHub.

Mdb gem

This gem provides DSL'ku to work with mdb-files. And she looks pretty cute. But there is a nuance: gem is just a ruby wrapper over the above described mdbtools. That is, the hem converts mdb to csv and processes this csv in memory. No magic and direct access to the database.

Alternative to ODBC driver

There is a commercial version of the ODBC driver for MS Access . But there is no factual information about him. In the optimistic version, this adapter will help with advanced queries in Access (the driver from mdbtools does not know much: no LIMIT, GROUP, AS, etc.). But this is only speculation. What will be in fact can be found only by buying it, or by taking a 14-day trial, which is available after registering on the site. In addition to this information, there were neither user reviews, nor any bug reports, nor any references to the fact that someone used the driver and he helped him with something.

PotgreSQL Path

For Postgres, there is an extension OGR . It is part of GDAL . Which, in turn, is a huge library for converting raster and vector formats of geospatial data. For our current purposes, the purpose of the library has absolutely no meaning. The main thing is that it is stated that it can work with mdb-format.

Installation

First you need to put a few dependencies:

 sudo apt-get install gdal-bin libgdal-dev sudo apt-get install postgis postgresql-9.3-postgis-2.1

This team will pull in a ton of dependencies ... but this is normal. The first set of packages for ogr_fdw, the second for postgis.

Step two: gather pgsql-ogr-fdw from sources. Here is a small bash-style manual:

 git clone git@github.com:pramsey/pgsql-ogr-fdw.git cd pgsql-ogr-fdw sudo apt-get install postgresql-server-dev-9.3 sudo apt-get install checkinstall make sudo checkinstall

Yes, you can take make install, but we do not want the seals to suffer . In the dialog that appears from checkinstall, it is necessary to correct the “version” parameter. It is necessary to make it in the format "numbers separated by dots" (for example: '0.1.0'). Otherwise, with default values, the package build will drop.

Step three: go and put the extensions in Postgres:

 CREATE EXTENSION ogr_fdw; CREATE EXTENSION postgis;

There is a suspicion that postgis is superfluous, but in the Readme on GitHub it is said that both are needed, therefore I will leave this question to inquisitive readers.

Step four: time to create FDW. In ogr_fdw there are two possible ways to work with Access. The first uses system ODBC. Details on this option can be found here . The second is more interesting, uses the MDB format from OGR, which provides direct access to the file using Jackcess . Details on this option are here . Below I will describe both ways.

Finally, one note: OGR is an extremely powerful thing; The ability to work with MS Access is a small part of the whole variety of available formats and, dear reader, it may quite reasonably be said that this is firing from a cannon on sparrows ... but the choice is not great and there has been no other tool to find it. And yes, here is a list of all formats supported by OGR .

ORG ODBC format

This approach uses the ODBC system settings and works by analogy with the above osql and ruby-odbc, but inside the database. All available options for initializing FDW are presented on the GDAL ODBC driver page. Below I will give only a simple example of use.

Actually, here it is:

 postgres=# CREATE SERVER testdb_access postgres-# FOREIGN DATA WRAPPER ogr_fdw postgres-# OPTIONS( postgres(# datasource 'ODBC:testdb', postgres(# format 'ODBC'); CREATE SERVER postgres=# CREATE FOREIGN TABLE access_sections ( postgres(# "" decimal, postgres(# "" varchar) postgres-# SERVER testdb_access postgres-# OPTIONS (layer ''); CREATE FOREIGN TABLE postgres=# SELECT * FROM access_sections; ERROR: unable to connect to layer to "" HINT: Does the layer exist?

As far as I understood from the OGR, layer documentation, in our case, it is equivalent to the database table.

A list of all layers can be obtained using the ogrinfo utility:

 $ ogrinfo -al 'ODBC:testdb' geometry_columns is not a table in this database Got no result for 'SELECT f_table_name, f_geometry_column, geometry_type FROM geometry_columns' command INFO: Open of `ODBC:testdb' using driver `ODBC' successful.

Based on this message, we can assume that everything works, but the target database (that is, the mdb file) does not contain the required Geo-data format and the OGR stumbles upon this unfortunate misunderstanding. I do not know how to wean him to forcefully check the format of the provided database. But some write that this approach works fine under Windows. In general, if you know how to reason with OGR ODBC and get it to work with an arbitrary mdb file, please say so, do not keep this knowledge in yourself.

A separate question: how PG will work with Cyrillic (and with any other non-Latin) table and column names. On the one hand, Postgres doesn't matter what the table / column name is, wrap them in double quotes and at least you can use special characters. On the other hand: who knows whether it is applicable to FDW, and it is not yet possible to check on a specific example.

ORG MDB format

This approach is based on the Jackcess Java library. Since this is Java and it has its own rich inner world, this approach has no links with system ODBC and, therefore, problems with drivers for MS Access are alien to it. But there are other features that I will describe below.

Immediately I’ll warn you that due to the “rich” documentation on the entire described process, the lack of extensive experience with Java and some monstrosity of the target package, we were able to assemble the working version in 3 days and ~ 20 complete rebuilds of the package. Therefore, I will immediately say about some things:

This approach is only suitable for working with unencrypted mdb files (that is, with files without a password);
since this is a package build, all the dependencies described below, paths, versions, and other attributes are valid for my particular case and environment, you may not be able to do this at all, and it’s not recommended to copy and paste commands.

So, everything written below is a more detailed version of the original official description of the GDAL ACCESS MDB database driver .

First: you need to put openjdk-6-jdk.

 sudo apt-get install openjdk-6-jdk

After a brief , thoughtful reading of the source code for GDAL , it seemed that he supported openjdk-7-jdk. But I could not get it to work with the 7th version.

Next, you need libgdal-dev.

 sudo apt-get install libgdal-dev

Here you need to remember the version of the package. It is directly related to the version of the GDAL package. In my case, this is version 1.10.1 .

Note: mdb format support starts from version 1.9.0 .

And lastly, you need to demolish the gdal-bin package, since we are going to build the extended version of it from source.

 sudo apt-get remove gdal-bin

Secondly: you need to download several JARs (ancient and not so), namely: jackcess-1.2.2.jar, commons-lang-2.4.jar and commons-logging-1.1.1.jar; then put them in lib / ext. In my case, the full path to this directory is: / usr / lib / jvm / java-6-openjdk-amd64 / jre / lib / ext . The aforementioned versions of JARs can be found inside this utility . For me, everything works with any later version of commons-logging (1. *), with any other minor version of commons-lang (2. *) and jackcess (1. *). Errors appeared only when using the next major jaccess version (2.1.4).

Thirdly: you need to download and configure GDAL.

 git clone git@github.com:OSGeo/gdal.git cd gdal/gdal/ git checkout 1.10

Here you need to go to the branch corresponding to the version of the libgdal-dev package, which was installed in the item number of times. Otherwise, the compiled binary will be incompatible with libraries.

Next you need to call configure. There are two ways to call. Plain:

 ./configure --with-java=yes --with-jvm-lib-add-rpath=yes --with-mdb=yes

and with explicit directions:

 ./configure --with-java=/usr/lib/jvm/java-6-openjdk-amd64 \ --with-jvm-lib=/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/amd64/server \ --with-jvm-lib-add-rpath=yes \ --with-mdb=yes

The second option may be useful if there are several versions of Java on the system (for example, openjdk-6-jdk and openjdk-7-jdk) or if the first option did not produce the desired result.

After the end of configure, you need to find the cherished word 'yes' opposite the MDB format.

Fourthly: you need to find a cup of tea / coffee or something stronger and run the assembly package.

 sudo checkinstall

Here you need to answer a couple of simple questions and wait. In my case, it was necessary to wait about 10 minutes.

Here it should be noted that the package will be weighty, about 300MB. Of course, you can throw out all unnecessary from it, collect it with your hands and approach the size of the gdal-bin package from the repository (~ 900Kb), but this goes beyond the narrative and, therefore, will not be described.

Fifth: if something went wrong, the package build fell off, then Google and a bright mind to help you.

Sixthly: if everything went well, then after checkinstall the package should have been automatically installed and now you need to check whether the received binaries support the mdb format:

 $ ogrinfo --formats | grep MDB -> "MDB" (readonly)

If there was no information about mdb in the ogrinfo output, go to the beginning of this section, re-read the manuals, look at the dependencies, system parameters, moon phase and other attributes that may affect the compilation and the final binary, and try to rebuild the whole economy again.

If the team and the output match, then everything is fine and the most muddy part is behind. Now ogrinfo can work with mdb files and provide information about their contents:

 $ ogrinfo /opt/db/test-database.mdb INFO: Open of `/opt/db/test-database.mdb' using driver `MDB' successful. 1: closeouts 2: economics

Seventh: you can now configure FDW in Postgres. Here is a small script with an example of this action:

 postgres=# CREATE SERVER acc FOREIGN DATA WRAPPER ogr_fdw OPTIONS ( datasource '/opt/db/test-database.mdb', format 'MDB' ); CREATE SERVER postgres=# CREATE FOREIGN TABLE economics( ID integer) SERVER acc OPTIONS(layer 'economics'); CREATE FOREIGN TABLE postgres=# SELECT * FROM economics; id ---- 1 2 3 4 5 (5 rows)

And, in general, everything. In conclusion of this section I will say a few words about the "encrypted" mdb-files.

If FDW cannot pull data out of Access, ogrinfo swears like this:

 Exception in thread "main" com.healthmarketscience.jackcess.UnsupportedCodecException: Decoding not supported. Please choose a CodecProvider which supports reading the current database encoding. at com.healthmarketscience.jackcess.DefaultCodecProvider$UnsupportedHandler.decodePage(DefaultCodecProvider.java:115)

then, most likely, you have a password-protected mdb file. In this case, you should look at the FAQ from Jaccess and think about finishing the OGR Access driver. As far as I understand, there is a project Jackcess Encrypt . This project provides CryptCodecProvider, which, in turn, provides an implementation of the CodecProvider interface for Jackess and supports some encryption formats for mdb files. But, unfortunately, the current driver from GDAL does not know how to work with Jackcess Encrypt and, therefore, does not support encrypted files. So, there is a good direction to work in the open source camp.

Other FDW

A list of all existing FDW for Postgres can be found on the official wiki page . There is a ZhengYang / odbc_fdw , in which the last commit is dated 2011. And CartoDB / odbc_fdw , which is actively developed and supports Postgres 9.5+. So the choice is small.

Conclusion

Working with MS Access is painful ... doubly painful if you need to do it under Linux. So immediately good advice: pull data from any data into any modern database and get rid of the car of problems. If you can not pull out, then work with the access in Windows. There is a normal driver provided by Microsoft "out of the box", there are at least some articles and configuration examples about the docking of Access and Postgres in Windows and, in general, products of the same company usually work well with each other. If there is no such possibility either, then you have two options again: turn everything into CSV and work with it or try to directly access the mdb file. The first option is simple, works out of the box and does not require special skills. The second option is much more complicated, takes time, nerves, straight arms, has a set of restrictions, pitfalls and other unpleasant things. Therefore, choose wisely.