📜 ⬆️ ⬇️

Kimball University: 10 basic rules for multidimensional modeling


Margy Ross - President of the Kimball Group .

Observe the following rules to ensure the granularity of the data, as well as the flexibility and availability of information resources in the future. Break the rules and throw confused users on the blank, impenetrable walls of your data storage.

Students attending the Kimball Group's multidimensional modeling lectures asked me for a list of “Kimball commandments” for multidimensional modeling. Refrain from using religious terminology. Therefore, the following, obtained by trial and error, we will call not too strict recommendations and rules "as-nothing-not-break."

Rule # 1: Load detailed atomic data into multidimensional structures.
')
Given the unpredictability of business user requests, to ensure all sorts of filtering and grouping, multidimensional models should be filled with the most detailed information as possible. This is the foundation on which everything will hold. Users typically need to see more than one entry at a time. And you cannot know all that many arbitrary, almost unpredictable, options for expanding / collapsing report elements. If only generalized data is available, then we can already assume cases where users will be pressed into a piece of brick wall, while trying to dig deeper into details. Of course, to increase the performance of frequent queries to aggregated data, atomic detailing can be supplemented with summary multidimensional models. But business users cannot live only with summarized data. In order to freely respond to ever-changing questions, they need as much detail as possible, even bloody ones.

Rule number 2: Build a multidimensional model around business processes.

Business processes are various activities carried out in an organization and represent a series of measurable events. Examples: order receipt or customer billing. Usually, business processes fix or generate unique performance indicators related to each event. Such indicators are interpreted as facts, packaged in separate (for each business process) tables of atomic facts. In addition to the fact tables of an individual process, sometimes summary fact tables are created that combine the data of many business processes into one fact table, but with a general level of detail. I repeat, the fact summary tables supplement the detailed business process tables, but do not replace them.

Rule No. 3: Ensure that each fact table is associated with a time dimension table.

The measured events described in Rule No. 2 always have a variety of date stamps associated with them. For example, a cut of a monthly balance or money transfer, fixed to hundredths of a second. Each fact table must have at least one foreign key to the related table with time dimensions. This table should be with granularity on the same day, with calendar attributes and any non-standard characteristics that describe the date of the event being measured. Examples include fiscal month and corporate holiday index. It happens that several external dates-keys are presented in the fact table.

Rule No. 4: Ensure that all facts are in a separate fact table, the same granularity or level of detail.

All fact tables can be classified into three main types of granularity: transactional, periodic cuts, or cuts with accumulation. Regardless of the type of granularity, each dimension in the fact table must be exactly the same level of detail. Combining facts from a single table, but representing different levels of detail, is bound to cause confusion among business users. And business intelligence tools will become subject to overstatement, duplication and other errors in the results of calculations.

Rule No. 5: In the fact tables, get rid of many-to-many relationships.

Since fact tables store the results of business process events, they are often based on the many-to-many (M: M) relationship between foreign keys, for example, many products are sold in many stores for many days. These foreign key fields should never be null (Null). Sometimes dimensions may contain multiple values ​​for a single indicator event. For example, a multitude of diagnoses related to a visit to a health organization or a multitude of clients related to a bank account. In such cases, it is unreasonable to solve the problem of dimensions with a set of values ​​directly in the fact table, since this will break the normal granularity of the measured event. Therefore, the connection with the fact table, the many-to-many relationship, occurs through an intermediate table (Bridge) with a double key.

Rule No. 6: Get rid of the many-to-one relationship between dimension tables.

Hierarchical, fixed-depth, many-to-one relationships (M: 1) between attributes tend to be denormalized or collapsed into a flat dimension table. If you have spent most of your career designing intuitive models for traditional transaction processing systems, then resist the instinctive desire to normalize the M: 1 relationship or break up into smaller sub-dimensions of the snowflake type (comment “). Denormalization of dimensions - this is a game called "multidimensional modeling".

The set of relations M: 1, presented in one dimension table, is a common practice. Also, in the dimension table there are one-to-one relationships, such as a unique product description associated with the product code. Sometimes, in the case of very detailed dimension tables with millions of rows and frequently changing convolution attributes, it is allowed to have a many-to-one relationship in the fact table. However, it is necessary to approach this use of the fact table very, very carefully, carefully weighing everything and considering it.

Rule No. 7: Store in the dimension tables the headings for the reports and the value areas for the filters.

The codes and, more importantly, the accompanying decoding and descriptions used to form labels and filter queries, should be assembled into dimension tables. Do not store fields with obscure codes or bulky descriptive fields directly in the fact table. But it’s not enough just to stuff the code into the dimension table. Do not expect that they will automatically be correctly supported in the BI tool, and users will not need explanatory decryptions. If this is a row / column header or filter from a drop-down menu, then they should be stored as a dimension attribute.

Because in Rule No. 5 we stated that the foreign keys of the fact table can never contain undefined values, it is reasonable to avoid undefined values ​​in the attribute fields of the dimension table. Replace the value of Null in the attribute with some default value, for example, NA or any other according to your taste. You, as the manager (situation: the situation with a certain servant or butler, and not a simple harsh administrator) is played up by the data source, as far as possible, prevents users from confusion.

Rule No. 8: Ensure that the dimension tables use a surrogate key.

Inexpressive, successively given surrogate keys (with the exception of the dimension with dates, where more expressive chronological keys are adopted (note: although, in theory, chronology is also a sequence)) provide a number of operational advantages. Such keys are smaller, which in turn reduces fact tables and indexes, but improves performance. Surrogate keys are absolutely necessary if you accompany changes in the dimension attribute by adding a new record to the dimension, with each profile change. Even if your business users did not initially plan to visualize changes to the tracked attribute, using surrogates it will be easier for you to make corrections if the wind blows the other way. Surrogates also make it possible to bring many existing keys to a common profile (note lane: meaning, to some standard form), and also mitigate the consequences of unexpected operational activities, such as disposing of outdated product numbers or acquiring another company, with their own coding scheme.

Rule No. 9: Create consistent dimensions for data integration across the enterprise.

Consistent dimensions (also known as common / common, master / master, standard / standard, or reference / reference) are important for corporate data warehouses. Once loaded into an ETL system and then reusable in a variety of fact tables, consistent dimensions provide all multidimensional models with consistent descriptive attributes and provide an opportunity to investigate (drill) and integrate data from many different business processes. The Enterprise Data Warehouse Bus Matrix bus matrix is ​​a key architectural template for representing the main business processes of an enterprise and the dimensions associated with it. Reuse of agreed dimensions ultimately reduces time to market by eliminating redundant design and development efforts; However, the agreed dimensions require the observance of certain obligations (commitment), as well as investment in the position of data manager. In general, this is not an administrator in the usual sense, and in general, maybe even a non-technical specialist, but a person who is well acquainted with the subject area, for example, a business analyst) and a management system (data stewardship and governance), even if you do not need to negotiate with each and every one for each dimension attribute to achieve complete consistency.

Rule No. 10: Continuously correlate requirements with reality in order to provide business users with a suitable DW / BI tool to help them make decisions.

Developers of multidimensional models must constantly balance between the requirements of business users and, laid down in the foundation, the realities of an existing data source. Only in this case, the produced project will be successfully implemented and, more importantly, it has a real chance to go into action and bring benefits. Maintaining the requirements-versus-reality balance is the essence of life for DW / BI professionals, whatever you focus on: multidimensional models, project strategies, technical / ETL / BI architectures, or deployment / maintenance plans.

Conclusion

If you regularly read our articles Intelligent Enterprise, Toolkit books or monthly Design Tips , then these rules should not be new to you. Now we have compiled our rules into a single set, which can be addressed when you are ready to design or recheck your models.

Source: https://habr.com/ru/post/116293/


All Articles