For more than 40 years, enterprises have been automating their business processes with the help of ERP systems. Over time, as the number and depth of automated business processes grow, the volume of data grows rapidly as well. For companies operating in a competitive environment, analyzing this information and drawing the right conclusions from it can bring commercial success: higher revenues, lower costs, greater efficiency.
The problem is that as data grows, analyzing it becomes increasingly difficult. The main obstacles are low performance and, often, the lack of dedicated analysis tools in ERP systems. Staying on the current ERP architecture, it is no longer possible to perform data analysis in a reasonable time: everything works slowly, or with a steady tendency to slow down further. Even increasing the computing power of the ERP servers often helps only in the short term.
Therefore, about 30 years ago, software architects started thinking about a new class of systems: data warehouses. The usual objectives of implementing a data warehouse (DW) are the following:
- Accumulating and storing data in structures that are optimal for analysis;
- Consolidating data from multiple systems;
- Reducing the computational load on the ERP systems that supply the data;
- Giving users the ability to:
  - create reports on DW data independently, using convenient tools;
  - use data mining and OLAP analysis capabilities that were not available before.
The question immediately arose of how to organize data in the warehouse properly, so that it not only satisfies the requirement of a “structure optimal for analysis”, but also keeps development and maintenance manageable and inexpensive. Through trial and error, data warehouse architects arrived at a layered, cake-like architecture: LSA, the Layered Scalable Architecture. As the name implies, it is not only layered but also scalable.
In the figure, LSA is the green block. Above it are the data visualization tools: reports based on Excel, web browsers, or simply file formats for exporting DW data. Below the green LSA block are the types of systems that supply the data.

From the implementation point of view, LSA is a logical division of the DW data structures (in DBMS terms, tables) into several levels, each of which performs a specific function. Data is copied from level to level, being transformed along the way, and ends up in structures that are optimal for analysis (see the warehouse implementation objectives above). A level is not a single object, structure, or table: in 99% of cases one level contains many objects with different structures.
As mentioned earlier, data should be copied from the supplier systems to the DW, so that the load of analytical calculations falls on the DW, a system specially optimized for such tasks. For temporary storage of data in the format in which it was loaded, LSA provides a special “loading level”. No data conversion is done when loading to this level; only type checking is performed: for example, that no letters end up in a numeric field and that there is no “August 32”.
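To make this concrete, here is a minimal sketch (plain Python, not SAP BW code; the field names are hypothetical) of the only kind of check performed at the loading level:

```python
from datetime import datetime

def validate_record(record):
    """Return a list of type errors for one incoming record; no transformations."""
    errors = []
    # A numeric field must not contain letters.
    try:
        float(record["amount"])
    except (TypeError, ValueError):
        errors.append("amount is not a number: %r" % (record["amount"],))
    # A date field must be a real calendar date ("August 32" is rejected).
    try:
        datetime.strptime(record["posting_date"], "%Y-%m-%d")
    except (TypeError, ValueError):
        errors.append("posting_date is not a valid date: %r" % (record["posting_date"],))
    return errors

# The second record fails both checks and is reported, but nothing is
# converted or enriched at the loading level.
rows = [
    {"amount": "125.40", "posting_date": "2018-08-31"},
    {"amount": "12x.40", "posting_date": "2018-08-32"},
]
for row in rows:
    print(row, validate_record(row))
```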
Once the data is stored at this level, from the point of view of further processing in the DW it no longer matters where the data came from: a file, an ERP system, or a source of another type. From the LSA point of view, storage at the loading level is assumed to be temporary. Therefore, usually immediately after the next load completes successfully, the data is copied “higher”, being transformed along the way, to the next level, and so on. At the loading level it is usually recommended to keep data for no more than 5 to 10 days, after which it should be deleted. Keeping data longer and accumulating large volumes at this level is irrational for performance reasons: when reading, filtering on the non-indexed fields of the tables at this level can be very slow.
However, there are scenarios when data in the “load” format may be needed in the DW much later than after 5 to 10 days. For such cases the DW provides a “corporate memory” level, not shown in the figure. In structure and field composition, each corporate memory object is identical to the corresponding loading-level object, but data in corporate memory is not deleted as quickly. Still, to avoid accumulating effectively unlimited volumes there, data from this level should be archived regularly, with the possibility of quick restoration on demand.
At the next level, called the “data warehouse level” (or storage level), data is kept in the most detailed structures that consumers (user reports, other systems) may need. An important note: only the consumers defined by the requirements captured in the project! In practice this is not always enough. An experienced DW architect should try to anticipate, at the time of requirements gathering and design, needs that have not yet been stated, and include them in the model. When the client finally voices them, they will either (voilà!) already be covered by the data model, or modifying the model will not take much effort.
When loading to the storage level, the following operations can be performed (the list is incomplete; a sketch of these operations follows the list):
- Reconciling values that are homogeneous but coded differently in the reference data of different systems; for example, gender recorded as “F” in one source and “Female” in another must be reduced to a single value;
- Calculations that require the highest possible level of data detail (for example, currency conversion at the exchange rate effective on the document date);
- Validity checks of characteristic values (check constraints);
- Filtering.
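A minimal sketch of these four operations, assuming hypothetical field names, code mappings, and exchange rates:

```python
from typing import Optional

GENDER_MAP = {"F": "F", "Female": "F", "M": "M", "Male": "M"}   # reference-data reconciliation
VALID_PLANTS = {"1000", "2000"}                                  # allowed characteristic values
FX_RATES = {("USD", "EUR", "2018-08-31"): 0.86}                  # rate per document date

def transform(record: dict) -> Optional[dict]:
    # 1. Reconcile codes that differ between source systems.
    record["gender"] = GENDER_MAP[record["gender"]]
    # 2. Calculation requiring document-level detail: currency conversion
    #    at the exchange rate valid on the document date.
    rate = FX_RATES[(record["currency"], "EUR", record["posting_date"])]
    record["amount_eur"] = round(float(record["amount"]) * rate, 2)
    # 3. Validity check of characteristic values (check constraint).
    if record["plant"] not in VALID_PLANTS:
        raise ValueError("unknown plant: %r" % (record["plant"],))
    # 4. Filtering: records that should not reach the warehouse are dropped.
    if record.get("doc_type") == "TEST":
        return None
    return record

print(transform({"gender": "Female", "currency": "USD", "posting_date": "2018-08-31",
                 "amount": "125.40", "plant": "1000", "doc_type": "INV"}))
```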
It is highly desirable to preserve “isolation” when setting up loads into objects of this level: the transformation logic for loading one storage-level object should not depend on the data of another object of the same level. Otherwise the load order has to be respected, and as the transformation logic grows more complex it is easy to end up in a “mutual waiting” situation at the end of a load.
It is at the storage level that the data should represent a “single version of truth”. If consumers have any questions about data quality, they should be referred to the storage level and the data checked there. For this to work, the processes that load data to this level on time and technically correctly, with all checks performed, become critical.
Suppose all of this is done, and at the data warehouse level we have a single version of the truth. But the information here is stored in the most detailed structures, which usually means a large volume of data. In addition, not all required indicators and analytical attributes are calculated and stored at this level. Indeed, calculating all required indicators at the most detailed level can be irrational because of the large number of unnecessary calculations, or simply impossible in principle, since the calculation may require not only prior aggregation but also data from other objects of the same level, and those objects may, for example, have a different update schedule.
Therefore a third level of data is needed, one that stores data subsets adapted to the needs of a user group and/or a group of reports. This level is called the “data mart level”.
At this level, along with simple aggregation (for example, compacting data to product group and shipment month), data from several data warehouse level objects is also combined. Objects at this level store the results of indicator and analytics calculations.
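As a rough illustration (using pandas; table and column names are hypothetical), a data mart load might join two warehouse-level objects, aggregate to product group and shipment month, and store a calculated indicator:

```python
import pandas as pd

# Two warehouse-level objects, kept at document-line detail.
shipments = pd.DataFrame({
    "material": ["A1", "A2", "A1"],
    "ship_date": ["2018-07-03", "2018-07-15", "2018-08-02"],
    "quantity": [10, 5, 7],
    "amount_eur": [100.0, 80.0, 70.0],
})
materials = pd.DataFrame({
    "material": ["A1", "A2"],
    "product_group": ["PUMPS", "VALVES"],
})

mart = (
    shipments.merge(materials, on="material")                    # combine two DW-level objects
    .assign(ship_month=lambda d: d["ship_date"].str[:7])         # compact to shipment month
    .groupby(["product_group", "ship_month"], as_index=False)
    .agg(quantity=("quantity", "sum"), amount_eur=("amount_eur", "sum"))
)
mart["avg_price"] = mart["amount_eur"] / mart["quantity"]        # derived indicator stored in the mart
print(mart)
```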
The final level is that of virtual data providers and reports. It is designed to combine (virtually, i.e., without storing the result) data from various objects, both of its own level and of the data mart level. Finally, user reports are defined here, and they too usually contain logic for calculating indicators.
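A minimal sketch of the idea (pandas again; the marts and their columns are hypothetical): data is combined at read time, nothing new is persisted, and only the report query materializes a result in memory:

```python
import pandas as pd

def virtual_provider(actuals_mart, plan_mart):
    """Combine two marts at read time; nothing is written back to the DBMS."""
    return pd.concat(
        [actuals_mart.assign(version="ACTUAL"), plan_mart.assign(version="PLAN")],
        ignore_index=True,
    )

def plan_actual_report(actuals_mart, plan_mart):
    """A report on top of the virtual provider: plan/actual by product group."""
    combined = virtual_provider(actuals_mart, plan_mart)
    return combined.pivot_table(index="product_group", columns="version",
                                values="amount_eur", aggfunc="sum")

actuals = pd.DataFrame({"product_group": ["PUMPS", "VALVES"], "amount_eur": [170.0, 80.0]})
plan = pd.DataFrame({"product_group": ["PUMPS", "VALVES"], "amount_eur": [150.0, 100.0]})
print(plan_actual_report(actuals, plan))
```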
The xDBMS cylinder in the diagram spans the three lower levels: it indicates that the data of these levels is stored in DBMS tables. The data of the uppermost, “virtual” level, as the diagram shows, is not stored in DBMS tables but is read from them.
And, of course, the “Napoleon” cake in the diagram is an excellent metaphor for layering. Just as no standard fixes the number of layers in a Napoleon, LSA is not a dogma either. The LSA concept only proposes a basic set of levels to focus on when designing the DW data model; the rest is decided by the confectioner, that is, the DW architect, together with the client.
SAP released a data warehousing product to the market as far back as 1998: SAP NetWeaver BW (Business Warehouse, or SAP BW), widely known in the SAP world. SAP BW is constantly being improved, and new versions are released regularly. SAP BW is not a ready-made data warehouse; it is a tool for building data warehouses. The set of object types from which a data warehouse is constructed in SAP BW maps well onto the LSA levels; for each level there is a recommended set of object types:
- For the loading level: DataSources with PSA tables (Persistent Staging Area);
- For the data warehouse level: standard DSOs (DataStore Objects);
- For the “corporate memory” level: write-optimized DSOs;
- For the “data mart” level: InfoCubes;
- Finally, for the top level of virtual providers and reports: MultiProviders, InfoSets, virtual InfoCubes, and the reports themselves, created in the BEx Query Designer tool.
All of these objects (with the exception of reports and DataSources) at all the listed levels form a “data model”, which is built in SAP BW from so-called InfoObjects: characteristics (also referred to as features or analytical attributes) and key figures (indicators). InfoObjects in BW are like bricks when building a house. That is the technical view; from the functional point of view, in the SAP BW data model characteristics with their attributes implement master (reference) data.
In SAP BW, objects such as “transformation” and “data transfer process” (DTP) are used to copy data from one level to another, applying the transformations along the way. To run the sequences of extraction, transformation, and loading steps regularly, a “process chains” designer is provided.
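Purely as an illustration of the process-chain concept (this is plain Python, not SAP BW code), such a chain can be thought of as an ordered list of steps, each of which runs only after the previous one has completed successfully:

```python
def load_to_staging():
    print("data source -> PSA")       # extraction into the loading level

def staging_to_dso():
    print("PSA -> standard DSO")      # transformation + DTP to the warehouse level

def dso_to_infocube():
    print("DSO -> InfoCube")          # transformation + DTP to the data mart level

PROCESS_CHAIN = [load_to_staging, staging_to_dso, dso_to_infocube]

def run_chain(chain):
    for step in chain:
        step()                        # an exception here stops the whole chain

run_chain(PROCESS_CHAIN)
```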
All versions of SAP BW can use a relational DBMS from one of the main vendors as the primary database: Oracle, Microsoft, IBM, SAP/Sybase. “Primary” here means that all business data, metadata, and system information are stored in the tables of that DBMS.
Whichever DBMS is chosen, uniform LSA rules are recommended for designing an effective data model in SAP BW.
Since 2012, in addition to those mentioned above, SAP BW has supported SAP HANA as its primary DBMS. The new capabilities of SAP HANA and the platform of the same name made it possible to revise LSA so substantially that for BW on HANA some key approaches changed de facto, which is reflected in the new name: LSA++. I will describe these changes in more detail in one of the following publications.