When designing the full life cycle of an Enterprise application, the issue of organizing their access to data becomes very important. There are several reasons for this:
- pricing or other data warehouse vendor policies change regularly, but enterprises using storage data do not always agree with these changes;
- With the growth of the enterprise itself and the scale of its IT infrastructure, existing data storage solutions may cease to meet its needs or financial capabilities;
- data storage technologies are evolving, new tools are emerging, designed to solve specialized tasks;
- as part of open source projects, cheap or even free alternatives to expensive commercial solutions are being developed.
All this may lead to the fact that at some point the company wants (or will be forced) to change the data storage technology or start using new technologies simultaneously with the current ones. However, if during the design of automated systems their business logic was not separated from working with data warehouses, then a change in the storage tool can lead to costly and poorly managed migration.
The problem of separating business logic and working with data at the level of an individual application is solved by the widely known architectural template Data Access Layer (DAL), which is described more than once in Habrahabr. In order for this template to be scaled to the level of the entire enterprise, it is necessary to supplement it with a number of architectural principles that are discussed in this article. Following these principles will allow an enterprise to perform a controlled (controlled) replacement or add storage technology to its IT architecture.
Problematics
This section describes in more detail the original issues and prerequisites that led to the development of the DAL concept.The need to work with specialized databases
Currently, classic relational database management systems (RDBMS) have ceased to be the only means for solving the problem of data storage during application development. From universal DBMS is the transition to specialized. At the same time, specialization of storage means goes in different directions: by volume (Big Data class), load (HighLoad class), performance (High Performance, Fast Data classes), etc.
')
For the architecture of a particular application, the selection of a means or class of storage means for a specific task being solved is included in the norm. For large-scale IT systems, the simultaneous use of several databases of various types within one application (heterogeneous storage, Fig. 1) for different groups of data is often optimal.
Fig. 1. Universal and specialized databasesFor example, for individual tasks that do not need to present simple structure data in the relational model, but require increased performance and scalability requirements, specialized key-value (key-value) storages may be used. For other special types of data (document, photo, video, excel, text) and various non-functional requirements, there are a lot of other means of storage with various access interfaces (NoSQL) that are widely used.
At the same time, the SQL language remains the most suitable interface for working with relational data, when an application requires complex data structures and analytical queries with moderate non-functional requirements (volumes of stored data, scaling, parallel processing, performance).
Thus, a modern enterprise with a developed IT architecture is increasingly confronted with the need to move away from the “single gold standard” of the applied DBMS (often Oracle or Microsoft databases) and find themselves face to face with the need to provide the application infrastructure in the form of multiple databases and warehouses , including the NoSQL class.
At the level of application architecture, this leads to the emergence of many previously inaccessible opportunities to use specialized databases, which allows us to speed up development, reduce maintenance costs, provide fundamentally new business opportunities (if, of course, new opportunities are used wisely). For example, the use of cluster databases with storage in memory (the so-called in-memory database, characteristic representatives - VoltDB or SAP HANA) can implement a radically different approach to solving business intelligence problems by speeding up calculations by several orders of magnitude.
In turn, at the level of the technical architecture of an enterprise, there is a need to work with a large number of heterogeneous databases, which significantly complicates and increases the cost of all IT service processes, requires retraining of specialists, leads to more complex management tasks, including life cycle management of the technologies used.
The optimal solution of the listed problems should combine, on the one hand, the possibility of simultaneous use of several modern specialized databases, and on the other, controllability of this design at the level of the general policy and technological strategy of the enterprise.
Environmental Impact
The second significant factor initiating the process of change in the technical infrastructure of an enterprise is the change in the commercial (or other) conditions of the solution providers for the enterprise. This is especially true for RDBMS, with which business applications are often more closely connected than with operating systems, network equipment or file storage systems.
Under certain conditions, there may be an urgent need to replace RDBMS of certain suppliers. In this case, an urgent replacement of the database in many actively working business applications can be extremely expensive and complex project, fraught with disruptions and downtime of information systems, and as a result - and business. To eliminate the risk of getting into such a difficult situation for an enterprise could be a gradual and systematic preparation of business applications for a DBMS change should the need arise.
The need for modernization and technological development
The emergence of the task of replacing storage facilities can be caused not only by a change in external conditions, but also by the need for technological modernization and development.
In the case of external changes, the architect is faced with the fact that the storage medium used is outdated, no longer maintained by the manufacturer, or has lost competitiveness for other reasons. If replacement of the storage medium requires substantial processing of the application, it will remain irrational from the point of view of expenditure of funds for a long time. As a result, part of the enterprise architecture is gradually becoming obsolete morally, and the IT service is beginning to experience problems with support, maintenance and staffing, since it is difficult to select employees to work with outdated products.
In the second case, even if the competitiveness of a specific storage medium is preserved, trends and technological development tasks may require a transition to new technologies, opening up fundamentally different opportunities for the development of application functionality. If an existing application is closely connected with one type of database or only one type of database is fixed within the framework of the entire enterprise architecture, then many opportunities for developing business applications are closed.
Data Access Layer Concept
This section describes the general concept and principles of DAL, with which you can solve the above problems. Here the concept is described in the abstract and is not yet associated with the use of certain technologies or software products.Principle of managed specialized storage
So, the concept being developed should correspond to the set problem:
- to ensure the possibility of using specialized databases in the IT architecture of an enterprise while maintaining controllability (for example, by controlling their diversity and use);
- prepare business applications for a smooth and technological change of the DBMS in case of aggravation of relations with the supplier of the DBMS or, possibly, for reasons of technological modernization and technological development.
The concept is based on the principle of managed specialized data storage, which introduces a unified and controlled way to access various data for applications (Fig. 2).
Fig. 2. The principle of managed specialized data storageAt the level of enterprise technology policy, a fixed set of data classes is defined for which some storage facilities are provided in the infrastructure. At the same time, each data class assumes a specialized access interface (so far - at the logical level) that is optimal for this class. For example, for the key-value data class, the access interface must provide read and write operations on the key. And for the “relational model” data class, the access interface is represented as some fixed SQL dialect.
For each such logical data class, the infrastructure can provide storage with a different level of non-functional requirements that is clearly fixed by some SLA (defined by the class of storage).
Enterprise applications request and use various classes of data and corresponding access interfaces, abstracting from specific technologies that currently implement one or another interface. Access control to various databases is provided in a controlled mode, and the application “does not know” with which implementation of the database it works.
As a result, on the one hand, an application can use specialized storage facilities, accessing them through a well-defined software interface. On the other hand, if it is necessary to replace a particular DBMS with a similar one, you will not have to make serious modifications in the application. In extreme cases, a relatively inexpensive adaptation to the fine features of the new technology without the need to revise the architecture. The entire structure of access to data “applications - various DBMS” across the enterprise becomes observable and controlled.
Restricting the way data is accessed at the software architecture level
Fig. 3. Restriction of access to data at the level of software architectureAnother important principle of DAL is related to the mechanism for providing the chosen data access method for business applications (Fig. 3). Theoretically, it would be sufficient to fix a number of data access restrictions at the level of the architectural policy of the enterprise and monitor compliance with these restrictions when designing and developing information systems.
However, in practice such control is rather difficult to organize. This requires time-consuming expensive expert architects from the enterprise. At the same time, it is still possible that, for one reason or another, the manufacturer of the information system will not go beyond the contractual class of data and SLA. The accumulation of such precedents over time leads to the fact that the application is not portable to another DBMS, despite the formally limited data access interface.
The solution to this problem can only be a rigid separation of the application from the DBMS in such a way that the application basically does not get direct access to the DBMS, but works only through a specialized data access module - this is the Data Access Layer. In essence, it turns out that DAL fixes the architectural policy of an enterprise in terms of data access at the level of software architecture, which gives much more guarantees for the fulfillment of a data class contract compared to fixation at the level of specifications and agreements.
For information systems implemented on different basic technologies and requiring different data access interfaces, the implementation of DAL modules may be different. Instances of these modules may also be different for different applications, and may be common - for reasons of economy of resources or, on the contrary, to ensure the independence of the functioning of systems. But in all cases, the principle of indirect application access to the storage facilities should be respected.
Removal of business logic from the database
The principle of separation of applications from storage facilities through unified access interfaces can solve the problem of a rapid transition to alternative storage technology, but only if the application itself is deployed outside the database. In practice, information systems are still often found and actively used, in which a substantial part of the business logic of an application is implemented within the DBMS in the embedded procedural language. For such systems, easy portability to other DBMS can be ensured only by processing the system with the removal of business logic from the DBMS to a separate application server.
For all newly developed or actively developed systems, it is also necessary to control that a large amount of functionality does not accumulate inside the DBMS, which can make it difficult to switch to another means of storage. It is technically rather difficult to control this, since it is necessary to embed an intermediate link not only in the process of accessing data, but also in the process of managing data structures and DBMS settings. Therefore, such control should be provided when deploying new systems and their updates, as well as in the processes of architectural audit of the proposed technical solutions with the help of expert evaluation.
Concepts
This section refines and details the DAL description object model briefly introduced in the previous section.
Fig. 4. Conceptual model of the DAL conceptThe concept introduces several concepts (Fig. 4):
- data class;
- storage facility;
- storage facility characteristic;
- storage class;
- data group;
- data container.
These concepts and taxonomies are used as a methodical tool for designing speakers, as well as a means of managing the level of data storage across the enterprise.
Data class
Defines an ideal data abstraction, independent of either specific technologies or implementation limitations:
- an abstract (“logical-mathematical”) model of the data itself (for example, “a set of relational relations” or “a set of key-value pairs”);
- a model (or several models) of data access and calculations over data (for example, “relational algebra” or MapReduce);
- security model (for example, "named containers", "users" and "rights");
- data structure model (for example, “schema with description of tables” or “list of attribute groups”).
At the data class level, there are still no transactional issues, no performance issues, or in-memory type implementation characteristics. The business logic of an application is implemented on the basis of a specific data class, and a change in the class of data cannot happen without redesigning the application logic.
Data structure
Defines a data structure in a certain model (defined by a data class) that is specific to a particular application: for example, for a relational model, a list of tables, columns, and keys.
Storage facilities
Specific storage facilities on the market or deployed in the enterprise (various types of databases) provide the capabilities of one or several classes of storage for several classes of data.
Feature Storage
Represents any significant numerical, qualitative characteristic or binary sign of the storage medium:
- read / write performance;
- scalability;
- work in memory (in-memory);
- restrictions;
- specialization (for example, Big Data or Fast Data);
- support for ACID transactions, etc.
Storage Class (SLA)
Within the framework of one data class groups such a set of storage characteristics, which, on the one hand, is often required in applications, and on the other hand, provides an alternative implementation (storage facility).
Different data classes may define different SLAs. Translating application data from one storage class to another may require partial redesign, since compensation will be needed for the weakening or disappearing properties of the SLA. Alterations are also likely, if it is necessary to allocate a separate group of data within an application for placement in another data class or storage medium followed by integration of this data with other groups.
Data group
Data within a specific application, according to the logic of this application, belong to the same data class and require one SLA; for example, in an application, the “X reference books” data group belonging to the key-value class and requiring quick reading by key can be highlighted.
Data container
The identifiable amount of data stored by some storage medium and corresponding to one data group of the same application. In response to a data storage request for the application, one or more data containers are allocated.
Data Access Layer Structure
This section describes the principal design of the DAL software that implements the concept of unified data access. In the description of the structure of DAL are considered including the issues of selection and application of specific technologies and software products.DAL functions and boundaries in the enterprise IT landscape
The DAL structure is a set of software tools from which a unified data access layer can be built within an enterprise IT architecture that implements the principle of “managed specialized data storage” and fixes it in a rigid form of software architecture. DAL forms a controlled "layer" between business applications and storage facilities (databases) (Fig. 5).
Fig. 5. DAL structure in enterprise architectureThe DAL structure is a collection of loosely coupled software modules for accessing data from different applications. The weak connectivity of the modules allows you to avoid problems in synchronizing the development processes and updating various applications and DAL software. The access modules themselves can have specific implementations for various underlying technologies on which business applications are developed.
Principles applied in DAL design
- The task is not automatic, non-stop, cost-effective or instantaneous switching of a working speaker from one storage medium to another. Design solutions should not completely exclude (which is long and difficult), but significantly reduce the dependence of the application logic on the storage medium in such a way as to greatly facilitate a possible transition. It is necessary to rely on a reasonable approach of NP designers and their understanding of architectural principles and goals, and therefore not try to build a super-secure technical system.
- In designing, ready-made, existing and standardized tools and components should be used to the maximum. If an enterprise wants to ensure maximum independence from vendors, then it is preferable to use Open Source products and technologies, but only sufficiently mature, with the appropriate size of the community and examples of implementations in responsible enterprises.
- You should strive for simplicity of transferring existing speakers to DAL.
- It is better to increase the capabilities of the implemented DAL gradually, iteratively. The first step should be simple, cheap and fast, aimed at solving the most urgent problems (in particular, the portability of relational DBMS).
- A new approach to storage management may not be limited to technical means, but include, for example, changes in the organizational mechanisms of the enterprise in terms of infrastructure management, as well as in the design and technical audit processes of the plant.
Layout options for DAL structure components
Fig. 6. Layout options for DAL structure componentsThe DAL structure involves two options for the placement of data access modules (Fig. 6).
- In the form of a library included in the application. In this embodiment, it is necessary to develop different types of access modules for different speaker design technologies.
- In the form of a network service. This option is intended for those cases when for some reason the data access service in the form of a library is impossible or undesirable (for example, the IC is developed on a platform that does not have a library for working with DAL). In this embodiment, DAL is available for any application that can work with network services via HTTP. For this placement method, the set of data access interfaces provided may be limited.
Data Access Software Interfaces
To implement access of business applications to the data of each class, one should give preference to industry standards and de facto standards for programmatic access interfaces (APIs). This will ensure not only the simplest adaptation of existing speakers to work with DAL, but also the long-term sustainability of the data access layer to subsequent technological changes.
Software data access interfaces (APIs) that business applications use, may vary for different underlying application technologies. This optimizes development and conforms to de facto standards for this underlying technology. It is important to note that this separation does not violate the general principles of the concept of unified data access, since - and this is important - the application remains independent of the specific database technology used. For example, for applications developed in Java, the standard JDBC interface in the Java world can be the standard interface for accessing relational data, while the ODBC interface is provided for C / C ++ applications.
SQL dialect constraint when working through ODBC / JDBC interfaces
Fig. 7. SQL unification and translationFor relational data, modern DBMSs provide a large number of specific features and dialects of the SQL language. The active use of various tools in applications leads to a deterioration of their portability. Therefore, as part of the implementation of DAL, the SQL-based access to the relational data must specifically restrict the SQL dialect used in such a way that the application in principle does not have the ability to be tied to the features of a particular DBMS (Fig. 7).
In addition to the task of restricting the dialect used, there also arises the task of translating individual syntax elements of SQL into a dialect of a specific DBMS. This need arises from the fact that some common RDBMS capabilities widely used in applications, although present in the vast majority of them, are not included in the SQL standard.
RDBMS Access Module for Java Applications
Based on the design principles outlined above, the first and most relevant module for implementing DAL is the access module to relational data. Despite the fact that, as described in the problematic, corporate business applications are increasingly taking advantage of the potential advantages of using NoSQL-DB, relational databases remain the most widely used and popular types of storage tools (often simply because a large number of application software is already developed based on RDBMS).
We also recall that for applications implemented on different basic technologies, it is highly likely that the development of different access modules will be necessary. This section describes the DAL module device for Java applications. This technology is widespread in large corporations and banks and is characterized by a large number of sustainable and supported industry standards that can be taken as the basis for developing a unified data access interface.
Interfaces to access relational data
For Java applications, two standard access interfaces to RDBMS are widely distributed:
- access directly to relational data through the JDBC interface and the SQL language;
- access via the object model through the JPA (Java Persistence API) object-relational adapter.
In practice, applications use a combination of these two access methods. JPA is often used to modify data and read single objects, and for complex reports and samples, SQL queries are executed directly through a JDBC connection or through special inputs to JPA (so-called native queries).
It seems expedient to fix these two methods together as a unified access interface. It is necessary, as mentioned above, to explicitly restrict the SQL dialect used in direct queries to some “unified” variant, since JDBC and JPA native queries allow you to execute arbitrary queries in the syntax of an arbitrary DBMS.
Constructive implementation of the DAL module for Java
So, the DAL module for accessing Java applications to RDBMS should provide the JPA and JDBC interfaces, but at the same time limit the use of SQL to its “unified” version, in order to ensure the most rapid transfer to another RDBMS, if necessary.
A module with such requirements could be developed independently, but the best option is to use ready-made software components with similar functionality for this. In our company, when implementing DAL, the
JBoss Teiid product, originally intended for federation (virtualization) of data access
across the enterprise , was selected as such a component after analyzing a number of existing finished products (Fig. 8).
Fig. 8. Constructive implementation of the DAL module for JavaThe following functions and properties of JBoss Teiid correspond to the DAL requirements described above:
- providing a JPA interface for Java applications;
- Providing a JDBC interface for Java applications
- Using your own unified SQL dialect (based on the SQL-99 standard);
- translating your own SQL dialect into specific dialects of supported databases;
- availability of connectors for most used RDBMS and simple extensibility;
- the ability to work as a built-in module, and in network service mode;
- open source (open source), the possibility of free use for commercial purposes, a developed community and examples of implementation in large enterprises.
In addition, the use of Teiid offers the following additional features:
- database consolidation by creating a so-called “virtual DB” (virtual DB, VDB) in the settings using the visual designer. Such a “virtual database” is associated with real means of storage (not only relational);
- efficient execution of “cross-queries” - SQL queries to the “virtual database”, which are automatically translated into SQL queries to the real database;
- caching the results of queries to the "virtual database".
DAL Usage Examples
As part of the study of the migration possibilities for PostgreSQL, our company conducted a pilot project to transfer one of the IT systems from Oracle to this DBMS. The object was a calculated back-office system built on the principle of a three-tier architecture. Most of the business logic (basic calculations) of the IT system was at the application server level, and part of the logic (accounting) was in Oracle stored procedures. When designing this IT system, the principle of data access via the Data Access Layer was originally incorporated into the architecture. This layer was built on JPA technology with the implementation of Hibernate.
Since the migration needed to be carried out in a short time and the transfer of the accounting component logic to PostgreSQL would take considerable time, we decided to first implement an intermediate option - hybrid data storage at the same time in Oracle and PostgreSQL.
The accounting logic and the tables necessary for it remained in Oracle, and the rest of the application data was migrated to PostgreSQL. The transfer of the account part to PostgreSQL was planned for the second stage of the migration.For the organization of hybrid access, a solution based on JBoss Teiid was prepared: a “virtual database” was created, combining access to PostgreSQL tables and stored procedures and accounting part tables in Oracle. This allowed the system to access two DBMS as a single database. Since the application worked through DAL, all these subtleties were screened for it. Also, at the DAL level, improvements were made to convert Oracle-specific SQL constructs into expressions supported by Teiid. These improvements concerned only work with data and did not affect the business functionality of the system. After completing the modifications, the system successfully passed modular and functional testing.A recent post was devoted to the technical details of the database migration from Oracle to PostgreSQL.my colleague Maxim Tregubov.Also, the approach of organizing access to data through DAL played a positive role in the modernization of a number of applications of our company. During the upgrade, various application logs (audit logs, interactions with external systems, etc.) were transferred to Hadoop-based archive storage. The remaining application data remained in the operational relational DBMS. Since access to these logs was carried out through a separate component with its own API, the replacement of the storage medium did not affect the rest of the application's functionality and did not require significant improvements.Conclusion
If an enterprise wants to preserve independence from suppliers of specific solutions for IT infrastructure, then developers of enterprise IT systems should abstract themselves as far as possible from the specifics of these solutions. Following the architectural principles of the Data Access Layer will allow the enterprise not to be held hostage by proprietary DBMS and choose the most suitable solution for data storage in terms of its capabilities and cost. In addition, the company will be able to use new technologies for working with data without significant modifications to the AU.