DOS-application (Clipper) organization transferred to a modern platform. The question arose of what to do with the data accumulated over more than 10 years and “sliced” by years, and the not very “broad” database table contains more than a million records. While the management was determined with the decision, asked what the theory says and whether there are regulatory documents on this subject. Practical recommendations found a little.
In the article
V.I. Tikhonova “Organization of archival storage of electronic documents: problems, practice, recommendations” classified the following approaches to archiving ED:
Migration
It consists in transferring the archived data to a new application. The feasibility in this case is doubtful, since for current calculations data from previous years are not required, and the differences in the structures involve a significant number of uncertainties that are resolved mostly manually.
')
Some personal experience of migration reminds that this is troublesome. Usually, a story is not dragged, but limited to account balances (Bookkeeping), a couple of certificates of individuals (Salary). And they start a new life from the first working day of January.
Conversion
- transformation of the database structure into other forms of electronic documents. Those. reports (forms) generated by the application in a systematic set of files in any generally accepted readable format (.txt, .xls, .doc, .pdf).
With a high frequency of reporting (monthly) and a large number of reporting facilities, labor costs may be high, and managing the final archive is difficult.
In clinical cases (the source database structure is not documented or unavailable), conversion and subsequent parsing of reports may be the only way to upload data (at least their semantics is clear).
In the Oracle database,
PL / SQL package AS_READ_XLSX is useful for analyzing the contents of xlsx reports, and for mass conversion xls to xlsx is the Libre Office utility:
soffice --headless --convert-to xlsx --outdir <_> <.xls>
Emulation and Virtualization
Since any transformations do not exclude errors and losses, the undeniable advantage of the approach is the access to the original data from the original application. Labor costs are minimal, options are possible:
- VirtualBox virtual machine with FREE DOS. I remembered memory management, and, after several attempts, started the application.
- DOSemu on a Linux server. The application started without any additional manipulations. Client access over SSH (from Windows using PutTTY, no installation required). On the client it is possible to copy-paste screen reports and forms to new documents.
On the server, you can upload zipped Clipper bases into an iso image and, when launching the application, unarchive into a temporary folder, keeping the original unchanged. - Dosbox JS-DOS API: we start DOS in the browser / Habrahabr (did not try).
- Windows virtualization.
Encapsulation
- export (unloading) of the database into structured cross-platform file formats (XML).
Possible contextual search for XML-documents and view any browser. Volumes of the final XML archive can grow significantly with respect to DBF.
The export / import of tables in XML is supported by most DBMSs, and the built-in language tools (DOM, XMLQuery, XSLT) allow processing more complex structures.
Exporting DBF to XML "as is" is an intermediate solution. It is preferable to “cut off the excess” and transform the relational model into a hierarchical one (natural for XML). By separating the data from the view, you can style the XML files or feed the XML to the report generators.
Returning to the problem. It was supposed to import into the database of the main DBMS of the organization - Oracle (again migration!). But importing and merging databases is half the battle. And then what? Provide a set of reports duplicating the functions of both the original and the new application?
With both hands for emulation, but preparing for any development of events. Experienced colleagues suggested: in Oracl, import from Access works well with DBF attached. There is one thing: DBF tables contain fields with integers packed in C4. And with the difference of encodings ... Well, you understand. In this case, the
PL / SQL package of importing tables from BLOBs is available .
While the story has no continuation.
Updated 06/15/16