Message-based integration: advantages and differences from other approaches
Over its lifetime, every enterprise accumulates a variety of software products that implement its business processes. Sooner or later this raises the challenge of making these products interact, so that end-to-end business processes can rely on data and functionality that lie beyond the scope of any single product.
The main ways to integrate applications:
• File sharing
• Exchange through a common database
• Remote function calls
• An enterprise service bus (MQ, ESB)
File sharing
This exchange method builds on the file mechanism that underlies all modern operating systems. Its main advantage is that the source system does not need to know anything about the consumer systems: it simply generates a data file and uploads it to a repository (for example, a file directory), from which the other participants in the integration can read it. Quite a few software solutions still use this approach as their primary integration method.
The advantages of integration through file sharing include:
• No tight coupling between the integrated applications
• No additional software to install
• Overall ease of implementation; no high demands on developer qualifications
However, this scheme has several important limitations that must be considered when designing an integration model:
• Most applications behave rather "selfishly" and treat the file as if they were its only consumers. As a result, systems quite often compete with each other for access to the file. This is especially pronounced when delivery is confirmed by moving the file to a special directory or deleting it.
• To avoid the first problem, a separate file is often generated for each consumer. But this creates several new problems at once: the source system produces more data, outgoing traffic grows, and placing files in shared folders takes longer. There is a further problem with this approach: to form the files correctly, the source system must know exactly who will consume each type of data.
• Many systems have no built-in file-system integration. They provide no way to subscribe to file creation or modification events, to detect when a write has completed, and so on. Such systems have to poll file resources periodically, which imposes an atypical extra load: the system spends part of its resources not on its own business processes but on the integration itself. Sometimes, to make the file exchange more reliable, separate time windows must be explicitly allocated for uploading and for downloading data, which temporarily desynchronizes the integrated systems.
• The exchange is point-to-point. It is quite difficult to trace routes and data-flow history centrally, and centralized management of the integration model is hard.
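The contention and atomicity problems above can be mitigated with careful use of atomic renames. Below is a minimal sketch of the pattern, not a production implementation; the directory names and the `produce`/`consume_all` functions are illustrative assumptions:

```python
import json
import os
import tempfile
from pathlib import Path

# Shared directories (a temp dir here; in practice a network share or SFTP drop).
BASE = Path(tempfile.mkdtemp())
INBOX = BASE / "inbox"          # where the source system drops files
PROCESSED = BASE / "processed"  # "delivery confirmed" directory
INBOX.mkdir()
PROCESSED.mkdir()

def produce(name, payload):
    """Source system: publish the file atomically so readers never see a partial write."""
    tmp = INBOX / (name + ".tmp")
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, INBOX / name)  # atomic on the same filesystem

def consume_all():
    """Consumer: claim each file with an atomic rename, so competing consumers never double-process."""
    results = []
    for f in sorted(INBOX.glob("*.json")):
        claimed = PROCESSED / f.name
        try:
            os.rename(f, claimed)  # atomic claim; fails if another consumer won
        except OSError:
            continue
        results.append(json.loads(claimed.read_text()))
    return results
```

Note that even this sketch illustrates the drawbacks listed above: the consumer must poll `INBOX`, and "delivery confirmation" is nothing more than moving the file.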
Exchange through a common database
Data exchange through a common database evolved from file-based exchange as a way to overcome its shortcomings. In this approach, a dedicated integration database is set up, to which all participants in the integration can connect. The source system places its data in this database, and each consumer system reads only the data it needs.
The main advantages include:
• Built-in DBMS mechanisms for controlling concurrent access to data. Data cannot be read or modified until the writing transaction completes.
• Unified mechanisms for writing and reading data. All applications use standard DBMS facilities for working with data, which allows common approaches to development and change management.
• A uniform data format for all participants in the integration. This eliminates the problem of semantic dissonance: all applications convert data to the same types, and every application has full knowledge of the current data types and their structure.
• Faster data delivery than file exchange. No dedicated access windows need to be allocated: data can be read immediately after it is written to the database.
• Built-in DBMS mechanisms for logging data access make it possible to investigate the reasons for any particular delivery discrepancy.
The disadvantages of the scheme include:
• The shared database is a single point of failure for the entire integration landscape. If it fails, the integration scheme as a whole stops working. Applications must therefore provide their own mechanisms for accumulating unsent information and for monitoring the availability of the integration database.
• Under a high exchange rate, the integration database itself may become a bottleneck: access to the data is contended, and locks may be taken to modify it.
• A fairly high degree of coupling between applications. Changing the exchange schema requires coordinated changes in all affected systems.
• Working with a single format raises the bar for designing the integration database schema, since the stored data must satisfy every participant in the integration. The data must be stored in formats and structures that all participants can read unambiguously.
• Every participant in the integration landscape must be able to connect to the integration database, and not all of them support modern DBMSs. This limits the choice of DBMS and can increase transfer overhead. Moreover, it is not uncommon for an application to have no mechanisms at all for reading from or writing to a third-party DBMS.
• In distributed networks, excess traffic appears. If the source system sits in the same network segment as its consumers, but the integration database sits in another, the data must still travel to the integration database and only then be read back by those consumers. The result is redundant data transport between segments.
• There is a certain mismatch between sent and received data, because the source system has no influence over the granularity with which consumer systems pick data up. Under these conditions it is quite difficult to verify the completeness and consistency of the data a recipient receives. Cases are likely where received data cannot be processed because related data has been delayed, with no way to know when the missing data will arrive. Mechanisms for deferred processing and data parking are therefore required.
Remote function call
The approaches described above (file sharing and exchange through a common database) address interaction between applications at the level of data, but not at the level of functions. To interact at the level of functions, various remote-function-call technologies and mechanisms are used.
To implement this approach, the following technologies can be used that provide mechanisms for remote procedure call:
• COM
• CORBA
• SOAP
• Java RMI, etc.
In this case, each application must itself implement the mechanisms that expose its data for remote access.
The main advantages of the approach include:
• No need for an intermediate data store. Consumer systems request data themselves as the need arises.
• Data consistency. The source system prepares the data beforehand, applying all of its data-integrity logic.
• Speed of data retrieval. There are no delays from writing data to, and reading it back from, intermediate stores.
• The ability to organize both "pull" and "push" schemes. In the pull scheme, the source system exposes functionality for retrieving data, and the subscriber system calls it whenever it needs data. In the push scheme, consumer systems expose functionality for accepting data, and the source system performs the transfer when the corresponding event occurs.
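The pull scheme can be sketched with Python's standard-library XML-RPC, one of the remote-call technologies in the same family as those listed above. The `get_orders` function and its fields are illustrative assumptions:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def get_orders(since_id):
    """Source system: prepare consistent data on demand."""
    orders = [{"id": 1, "total": 10.0}, {"id": 2, "total": 25.5}]
    return [o for o in orders if o["id"] > since_id]

# Source system exposes the function remotely (port 0 = pick a free port).
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_orders)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Consumer system "pulls" data whenever the need arises.
port = server.server_address[1]
proxy = ServerProxy(f"http://127.0.0.1:{port}")
```

The sketch also makes the main drawback below visible: if the server process is down, `proxy.get_orders(...)` simply fails, and the consumer must buffer or retry on its own.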
The disadvantages of the approach include:
• Tight coupling between applications. The consumer system's ability to operate becomes fully dependent on the availability and health of the source system. Systems must implement data buffers that retain integration data while other participants in the integration are unreachable.
• Scaling the integration landscape requires modifying both source and consumer systems.
• If the integration landscape includes systems built on different remote-procedure technologies, for example some on CORBA and some on SOAP, then either every application must support every technology, or a layer of proxy applications appears to reduce the exchange to a single technology.
• Because the technologies differ, systems may operate with different structures and data types, which adds data-conversion costs.
• Under a high exchange rate, the application spends ever more resources not on servicing its business processes but on servicing the integration layer.
Enterprise service bus
For a comprehensive solution to the problem of transferring data and accessing application functionality, a messaging approach based on specialized products is used. These products can be divided roughly into two types: message queue services (MQ) and enterprise service buses (ESB). The general approach to building the integration is as follows: a system connects to the integration bus through specialized connectors. The connector's main task is to provide a channel for receiving data into the system and transmitting data out of it. The source system's task is simply to hand data to the connector; routing, transformation, and delivery of messages to consumer systems happen without its participation.
The connectors sit as close to the systems as possible and guarantee that data can be handed over even when the network connection is down, relieving the integrated systems of the overhead of ensuring data safety and transmission.
The main advantages of the approach are:
• Loose coupling between the integrated systems. In a well-built integration model a system knows nothing at all about the other participants in the landscape: all its work reduces to sending messages to the bus and receiving messages from it. This gives the highest flexibility and scalability of all the approaches reviewed so far.
• Data transformation capabilities. These allow applications designed for different data formats to be integrated without modification: data is sent once in the source system's format and received by each consumer in its "native" format. This lowers processing costs and makes it possible to integrate systems whose modification is impossible or highly undesirable for one reason or another. Moreover, the cost of transformation does not fall on the integrated systems.
• Data routing. One of the most important service-bus mechanisms, it drastically reduces the dependence and coupling between participants in the integration. With routing, the source system simply sends a message to the bus once. It needs no knowledge of who should receive the message or whether they are ready to receive it; the message is delivered to all consumers according to the current route.
Accordingly, scaling the scheme does not require changes to all systems: it is enough to modify the route by adding or removing a consumer. Routing also allows messages to be delivered conditionally, and since the systems themselves play no part in evaluating route conditions, this behavior can be changed without modifying the systems.
• Guaranteed data delivery. This service-bus mechanism greatly simplifies delivery over unstable channels and takes the load off source systems: they do not have to check for a working channel or keep intermediate message stores while the delivery channel is down. It also reduces the algorithmic burden of implementing delivery confirmation. All of this is handled by the service bus's own integration mechanisms.
• Security of data in transit. It is no secret that in many cases confidential data leaks precisely while being transferred. Buses encrypt the transmitted data and support secure network connections.
• Centralized integration management, an important component of any integration landscape. This approach greatly reduces the overhead of initial configuration, scaling, and keeping the scheme healthy as a whole. It also allows the necessary competences to be concentrated in one place instead of being spread across the integrated systems.
• State diagnostics. An important feature of specialized service buses is their diagnostic mechanisms, which help identify problems both in data transfer and in the state of the integrated systems. The most advanced products provide proactive diagnostics, which can detect potential problems at an early stage, before they manifest in full force, and allow a set of pre-emptive actions to be taken promptly.
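The loose coupling and routing described above can be illustrated with a tiny in-process stand-in for a message bus. A real MQ/ESB adds persistence, delivery guarantees, and transformation; the `MiniBus` class and topic names here are illustrative assumptions:

```python
from collections import defaultdict
from queue import Queue

class MiniBus:
    """In-process sketch of bus-style routing: publish once, deliver to every subscriber."""

    def __init__(self):
        self.routes = defaultdict(list)  # topic -> list of consumer queues

    def subscribe(self, topic):
        # Adding or removing a consumer changes only the route, never the sender.
        q = Queue()
        self.routes[topic].append(q)
        return q

    def publish(self, topic, message):
        # The source system knows nothing about the consumers.
        for q in self.routes[topic]:
            q.put(message)

bus = MiniBus()
billing = bus.subscribe("orders")
warehouse = bus.subscribe("orders")
bus.publish("orders", {"id": 42, "total": 99.9})
```

Note how the publisher's code would not change if a third consumer subscribed to "orders": that is exactly the scaling property the routing mechanism provides.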
The main disadvantages of the model are considered to be:
• Additional costs of acquiring and supporting specialized software products (MQ, ESB); additional server resources are often required as well.
• The need for staff training on these software products.
Criteria for choosing the integration method
How do you choose one integration method over another? There are several main criteria, but bear in mind that the weight of each is determined by the conditions at hand and the tasks to be solved:
• Ability of all integration contour applications to use the selected integration method
It is no secret that applications may be built in different architectural styles and development paradigms. Some provide integration mechanisms, some do not, and some implement only a single one. For example, if we choose file sharing as the integration method, we must be sure that every application in the integration contour can exchange files and work with the formats the other applications produce.
• Ability to make changes to applications
Following from the previous criterion, we must assess whether an application can be modified to participate in the integration contour, along with the total labor cost of the modification and the availability of the required specialists on the market.
• Reliability requirements
It is necessary to assess the requirements for reliable data delivery: whether delivery confirmation is required, whether previously sent data can be redelivered, and whether the chosen integration mechanisms support the reliability features needed.
• Level of application coupling
Depending on the chosen integration model, applications join the integration contour with varying degrees of coupling. It must be assessed whether the required degree of coupling can be achieved. For example, with integration via remote function calls, we should understand whether an application is prepared to work in a scheme where the absence of the consumer system makes it impossible to transfer data.
• Data delivery latency
The chosen integration type and the way outgoing information is formed impose restrictions on the frequency and speed of data transfer. The impact of delivery delays on the enterprise's business processes should be evaluated.
• Data Protection Requirements
Requirements for ensuring data protection during system integration should be assessed. Protection can be performed by encrypting data or by working with secure transmission channels.
Conclusions
If we apply the selection criteria to the previously considered integration patterns, we can formulate the following conclusions:
• File sharing is suitable for integration models with a low exchange rate and a small number of systems in the integration circuit. As the number of integrated systems and the intensity and complexity of the exchange grow, this approach is better avoided.
• Exchange through a common database removes some of the problems of file sharing, but it too is not recommended for complex and intensive integration landscapes, for several reasons: it creates strong coupling between systems, changing a particular interaction scheme is difficult, applications need to be modified to work with the integration database, and the data rate is only average. In addition, the integration database itself may become a bottleneck of the entire scheme.
• Remote function calls are suitable for organizing exchange within a single technology stack. In most cases, systems need to be modified to work with new data. The approach offers a high data-exchange rate when an event-driven model is used, but it is characterized by high maintenance and scaling complexity.
• Messaging through an enterprise service bus is the most balanced approach, even for a small number of systems and simple integration landscapes. The high exchange rate and loose coupling of applications make this scheme suitable for integrating a large number of applications and scaling the solution afterwards.
Thus, it can be argued with great confidence that data exchange through the messaging mechanism implemented by service buses is the best fit for organizing data exchange, and it deserves the close attention of any developer involved in building integration schemes. Given the right choice of service bus, it can save considerable time, cost, and nerves.