
Building information landscapes using service buses

The main issues of integration landscapes

In today's world, business applications can hardly exist without integration with other applications. Integration lets us do everything from the simplest data exchange to building complex composite applications.

However, the accumulated portfolio of applications, with its architectural, platform, and "age" differences, carries a host of integration-level problems.
The main problems of this type include:

• Diverse application integration mechanisms
• Mismatched or partially overlapping data formats, redundant data transmission
• Lack of a single point of control
• Difficulties with scaling the integration scheme
• The need to modify subscriber applications.

A. Diverse integration mechanisms

Most applications are given one integration interface or another at design time. However, each vendor has its own understanding of the application's role in the integration scheme, and most often places its application at the center of the integration landscape (financial-system providers especially). The architecture of the application's integration layer is built on that vision. Since applications are developed in different periods, they usually receive whichever integration technologies dominate at the time, and because of the "egocentric" architecture, only one or two of the most popular technologies are typically built in. As a result, we end up with a set of "egoist applications", each expecting the whole integration to be built around its own embedded integration model.

B. Mismatched data formats

This problem has the same roots as the one above. Each application places itself at the center of the integration landscape and plans to play by its own rules: it describes the formats and composition of data based on its own data model and algorithms. Most often the transmitted data is kept as close as possible to the application's internal data model, to avoid the overhead of format conversion. As a result, the data differs partially or completely in format, down to different media formats. With simple integration schemes this quite often forces the same data to be sent multiple times in different formats.

C. Lack of a single point of control

The absence of a single point of control is one of the cornerstone problems in building integration landscapes. When each application tries to manage the integration from its own point of view, the situation quickly becomes poorly manageable. Monitoring and maintenance require planning a set of measures that are sometimes not even related to each other, and isolated competence centers emerge around particular applications. Part of the problem can be addressed by an integrated approach to enterprise-wide automation: a single platform, common protocols, uniform formats, end-to-end documentation. But despite the transparency and finiteness of these methods, in reality this is very rare.

D. Difficulties with scaling an integration scheme

The problems of centralized management gradually grow into problems of scaling the integration scheme. To plan scaling, you need to assemble the full picture: who transmits and expects data, when, in what volume and composition, and which formats and protocols they use and support. In such circumstances, changing the format of even one information package can lead to increased labor costs and downtime.

E. The need to modify subscriber applications

Loading and unloading data often requires modifying the subscriber application. This means constant modification of applications as the integration landscape scales, application downtime while changes are made, increased computational load on the application to support the integration, and a need for specialists on every platform on which subscriber applications run. It also makes modifying subscriber systems much harder: when planning changes, an application's project team must take into account and plan the improvements required by the integration model. The team must either dive into the architecture of the integration links itself or bring in a specialist with that knowledge.

The main approaches to building integration landscapes

It is no secret that modern products of the ESB (Enterprise Service Bus) class are aimed precisely at a comprehensive solution to the listed problems of integration landscapes.

Despite different approaches and component bases, they all follow the same basic principles:

• Standardization of formats
• Connection of subscriber systems to a single network through a library of connectors (see the sketch after this list)
• Guaranteed message delivery
• Processing and routing rules moved to the global level
• Unified logging and monitoring mechanisms
• Advanced scaling tools.
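
To make the "library of connectors" principle concrete, here is a minimal sketch in Python (used throughout this article purely for illustration; no specific ESB product's API is implied). The `Connector` contract and the CSV adapter are invented names:

```python
from abc import ABC, abstractmethod
import csv
import os


class Connector(ABC):
    """Uniform contract that every subscriber-system adapter fulfils."""

    @abstractmethod
    def extract(self) -> list[dict]:
        """Pull new messages from the connected system."""

    @abstractmethod
    def deliver(self, message: dict) -> None:
        """Push a message into the connected system."""


class CsvFileConnector(Connector):
    """Hypothetical connector for a legacy system that exchanges CSV files."""

    def __init__(self, path: str):
        self.path = path

    def extract(self) -> list[dict]:
        with open(self.path, newline="") as f:
            return list(csv.DictReader(f))

    def deliver(self, message: dict) -> None:
        new_file = not os.path.exists(self.path)
        with open(self.path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(message))
            if new_file:
                writer.writeheader()
            writer.writerow(message)
```

The bus then talks to every system through the same two calls, regardless of whether the other side is a file drop, a message queue, or a web service.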

Following these principles, a more or less standard set of work is usually performed to build the integration. At the first stage, the structures of the transmitted data are analyzed and standardized. Two approaches can be used: reduce all data from source systems to uniform (canonical) structures, or convert data to the structures of the subscriber systems just before delivery. The canonical approach is used most often. It allows the transmitted data to be brought to the standard immediately on receipt and used from then on (for example, to route data dynamically based on its content). Where necessary, templates of information packages are formed. Another advantage of this approach is that all participants in the integration understand the structure of the data across the integration perimeter in the same way.

The second approach reduces the overhead of data conversion, performing it on receipt only for the subscriber systems that need it. In a mixed scheme, conversion is either split by data type or always performed at every point along the data delivery route. This simplifies the subsequent maintenance of the integration landscape: the planned data structure is known and can be tracked precisely with monitoring and automatic validation tools.
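
A minimal sketch of the first (canonical) approach, assuming two invented source systems with incompatible payloads; all field names are illustrative:

```python
# Canonical structure every message is normalised into on arrival.
CANONICAL_FIELDS = ("customer_id", "amount", "currency")


def from_billing(raw: dict) -> dict:
    # Hypothetical billing payload: {"custNo": ..., "sum": ..., "cur": ...}
    return {"customer_id": raw["custNo"],
            "amount": float(raw["sum"]),
            "currency": raw["cur"]}


def from_crm(raw: dict) -> dict:
    # Hypothetical CRM payload: {"client": {"id": ...}, "total": ...}, EUR only.
    return {"customer_id": raw["client"]["id"],
            "amount": float(raw["total"]),
            "currency": "EUR"}


NORMALISERS = {"billing": from_billing, "crm": from_crm}


def normalise(source: str, raw: dict) -> dict:
    """Convert a source-specific payload into the canonical structure."""
    message = NORMALISERS[source](raw)
    assert set(message) == set(CANONICAL_FIELDS)
    return message
```

Once normalised, a message can be routed by its content, which is exactly what the route scheme below relies on.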

Once the formats and data structures are determined, the information delivery processes are designed. There are "route" and "streaming" data delivery schemes. They are quite similar to each other, but there are some differences.

Route scheme

This scheme assumes that the sending system does not need to know about the data consumers. In turn, consumer systems do not depend on the data provider: they simply receive the data they are subscribed to as soon as it appears on the network. This makes it possible to build loosely coupled schemes with the ability to buffer and aggregate data. The scheme also differs in that the mechanisms for receiving, processing, and loading data are separated into their own processes and are not directly tied to the routing procedure.

To implement this scheme, you must define:

• Data types and extraction methods
• List of recipients and the conditions for delivering each data type to them
• List of routes
• A set of transformation procedures.

It is important to understand that for each object, only the actions related to that object are defined. For a source system, we determine which data types it will provide, how we will extract them, and the data transformation mechanism (if required). For a subscriber system, we likewise define the data types it will accept, the loading mechanisms, and the pre-delivery transformation schemes (if required). The routes describe the delivery conditions for each data type to each subscriber system. This approach makes the integration scheme easy to scale under any changes.
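
A toy content-based router in the spirit of this scheme, again with invented data types and subscriber names; a real ESB would express the same routes declaratively in its own tooling:

```python
from typing import Callable

# A route: (data type, delivery condition, recipients).
Route = tuple[str, Callable[[dict], bool], list[str]]

ROUTES: list[Route] = [
    ("invoice", lambda m: True,                   ["accounting", "archive"]),
    ("invoice", lambda m: m["amount"] > 10_000,   ["risk_control"]),
    ("order",   lambda m: m["currency"] == "EUR", ["eu_warehouse"]),
]


def route(data_type: str, message: dict) -> dict[str, dict]:
    """Return {subscriber: message} for every matching route."""
    deliveries: dict[str, dict] = {}
    for dtype, condition, recipients in ROUTES:
        if dtype == data_type and condition(message):
            for recipient in recipients:
                deliveries[recipient] = message
    return deliveries


# The sender only announces the data type; it knows nothing about consumers.
print(route("invoice", {"customer_id": 7, "amount": 25_000, "currency": "EUR"}))
# -> delivered to accounting, archive and risk_control
```

Adding a subscriber means adding one route entry; neither the source nor the other subscribers change.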

The disadvantages of this scheme include reduced end-to-end transparency of the data flow history. In practice this is most often compensated by monitoring tools.

Streaming scheme

With this approach, the integration specialist describes the complete data flow. Source systems and receiver systems are rigidly interconnected within the flow.

For a complete flow description, the following objects and mechanisms should be defined:

• List of available source systems
• Connection methods to source systems
• Data extraction methods for each source system
• Data transformation methods
• Methods of splitting and adding data
• List of subscriber systems
• Connection methods to subscriber systems
• Data transfer methods for each subscriber system.

The most common approach is for one stream to operate on a single data type, although the classical flow model is not required to. Such systems are designed by defining the list of flows and structuring the information within each flow; then the points where information originates, the list of recipients, and the transformations they require are determined.
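
A sketch of a single flow declaration under these assumptions (the structure and names are invented; real products describe flows in their own designers or DSLs):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Flow:
    """One rigidly described flow: source, transformations, receivers."""
    name: str
    source: str                                      # where the data originates
    extract: Callable[[], list[dict]]                # extraction method for this source
    transforms: list[Callable[[dict], dict]]         # ordered transformation steps
    subscribers: dict[str, Callable[[dict], None]]   # receiver -> load method

    def run(self) -> None:
        for message in self.extract():
            for step in self.transforms:
                message = step(message)
            for _name, load in self.subscribers.items():
                load(message)
```

Every new flow repeats this full description, which is exactly the scaling drawback noted below.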

It is important to understand that every new stream description requires producing the entire list of objects and mechanisms described above all over again.

The disadvantages of this model are:

• Potential partial overlap of data between different streams;
• Streams that operate on several data types complicate understanding of the complete scheme and require data-management schemes within the streams (streams within streams); the complexity grows with the nesting level;
• Each flow requires its own description of all extraction, transformation, connection, and other procedures, even when similar procedures exist in adjacent flows, which raises the cost of changing the scale of the integration model.

In either approach, data transformation schemes are an important component of the integration model; only the place where they are described differs. In the first case, transformation schemes are described relative to the systems (the output of the source system, the input of the subscriber system) and form a single data-processing procedure for that source. In the second case, the transformation scheme is described inside the data stream and can be divided into several independent sections. Note that, unlike point-to-point integration models, moving the transformation procedure onto the bus significantly reduces the load on the application: in practice, an ESB with transformation schemes relieves the client application of the entire atypical exchange workload. This solves one of the typical problems, the need to modify the subscriber system.

Transformation schemes usually provide the following functionality (a sketch of several of these operations follows the list):

• Changing data transfer formats
• Changing the structure of the transmitted data
• Data enrichment
• Data depletion, to reduce the transmitted volume
• Splitting an information package into several
• Merging several information packages into one
• Validating data against applied business rules.
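
A minimal sketch of three of the operations above (enrichment, depletion, validation); the reference table and the business rule are invented:

```python
CURRENCY_NAMES = {"EUR": "Euro", "USD": "US Dollar"}  # reference data for enrichment


def enrich(message: dict) -> dict:
    """Enrichment: attach a human-readable currency name."""
    out = dict(message)
    out["currency_name"] = CURRENCY_NAMES.get(message["currency"], "unknown")
    return out


def deplete(message: dict, allowed: set[str]) -> dict:
    """Depletion: keep only the fields this subscriber needs."""
    return {k: v for k, v in message.items() if k in allowed}


def validate(message: dict) -> dict:
    """Validation by an applied business rule: amounts must be positive."""
    if message["amount"] <= 0:
        raise ValueError(f"rejected: non-positive amount {message['amount']}")
    return message


msg = {"customer_id": 7, "amount": 120.0, "currency": "EUR"}
print(deplete(enrich(validate(msg)), {"customer_id", "amount", "currency_name"}))
# -> {'customer_id': 7, 'amount': 120.0, 'currency_name': 'Euro'}
```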

Well-designed transformation schemes can also significantly reduce the volume of transmitted data: the data is unloaded once, atomically, and only in the required volume. The unloading is initiated either by a data-change event or by a request from the transformation scheme for the data needed to perform enrichment.
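
A sketch of event-initiated unloading, assuming a hypothetical change-event queue on the source side:

```python
import queue

change_events: queue.Queue[int] = queue.Queue()  # ids of changed records
DATABASE = {1: {"customer_id": 1, "amount": 50.0, "currency": "EUR"},
            2: {"customer_id": 2, "amount": 75.0, "currency": "USD"}}


def on_record_changed(record_id: int) -> None:
    """Source-system side: publish a change event instead of the whole table."""
    change_events.put(record_id)


def unload_changed(fields: tuple[str, ...]) -> list[dict]:
    """Extract each changed record exactly once, restricted to the fields
    the transformation scheme actually asked for."""
    batch = []
    while not change_events.empty():
        record = DATABASE[change_events.get()]
        batch.append({f: record[f] for f in fields})
    return batch


on_record_changed(2)
print(unload_changed(("customer_id", "amount")))  # only the needed fields, once
```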

All data extraction, transformation, and loading actions are logged in various logs. The most common approach is to allocate a separate server for storing these logs, which makes it possible to attach specialized monitoring systems and quickly obtain the data in human-readable form.

Log storage is divided into operational and archival. For operational storage, cyclically rewritten log files are often used; as they fill up, they are offloaded into archival storage.
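
A sketch of the operational/archival split using Python's standard logging library; the file name, size cap, and segment count are illustrative (a real deployment would offload rotated segments to the archive server rather than keep them alongside):

```python
import logging
from logging.handlers import RotatingFileHandler

# Operational log: cyclically rewritten, capped in size.
handler = RotatingFileHandler(
    "esb_operational.log",
    maxBytes=1_000_000,  # roll over at ~1 MB
    backupCount=5,       # rotated segments awaiting archival
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

log = logging.getLogger("esb")
log.setLevel(logging.INFO)
log.addHandler(handler)

log.info("message id=42 extracted from source 'billing'")
log.info("message id=42 delivered to subscriber 'accounting'")
```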

The state of ESB objects and hardware performance counters are logged separately. This makes it possible to build a complete operational picture of the passage of data and of the current state of the system as a whole. The most advanced solutions include tools for proactive diagnostics of the network state, which warn the employees maintaining the integration network about a possible problem so that potential threats can be eliminated in advance.
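
A toy proactive check over one performance counter (queue depth); the thresholds are invented, and a real system would feed such checks from its monitoring pipeline:

```python
def check_queue_depth(depth: int, warn_at: int = 800, fail_at: int = 1000) -> str:
    """Warn maintenance staff before a counter crosses into failure."""
    if depth >= fail_at:
        return "ALERT: queue overflow imminent, intervene now"
    if depth >= warn_at:
        return "WARNING: queue depth approaching limit, schedule maintenance"
    return "OK"


for depth in (120, 850, 1005):
    print(depth, "->", check_queue_depth(depth))
```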

As a result, we have a clear and transparent data model, managed from a single point and provided with various means of centralized monitoring.

***
Building an information landscape solves many of the problems inherent in point-to-point integrations, but it requires a thoughtful approach to design and to the choice of solutions. You need to weigh the pros and cons of the implemented approaches and choose the product that provides maximum coverage of the future landscape with connectors and transformation blocks and has mature monitoring and proactive diagnostics. Then choose the approach to designing the data transmission schemes, evaluating which one is preferable in your case. The result is a stable, scalable integration model with full, centralized control of the process.

Stanislav Pigolkin, Technical Director

Source: https://habr.com/ru/post/331340/
