📜 ⬆️ ⬇️

Create an OLAP cube. Part 1

OLAP

Continuing the theme of Multidimensional Cubes, OLAP and MDX and olap for a small company , traditionally, I suggest starting with a simple “Hello World” cube that will analyze the processes and trends of voting on Habré.

So let's try to create our first OLAP system.
But, before, rubbing your hands, run Business Intelligence Studio, I suggest first creating a data storage for habr-voices, the so-called Data Warehouse .
What for? There are several reasons for this:

A bit of theory.


In essence, Data Warehouse can be:
In the latter case, you most likely want to implement ETL processes (using Integration Services or something else), but this is a reason for another, no less interesting, article.
')

What should be Data Warehouse?


It's very simple - your Data Warehouse should have a star- shape structure or a snowflake model and consist of facts (facts) and dimensions.

Facts are actual records (records) about some process that we want to analyze, for example, the voting process on Habr, or the process of changing the price of goods on the exchange. Very often, the facts contain some numerical data, for example, the actual value of the voice or price.

Dimensions are the defining attributes of facts, and usually answer all kinds of questions: when did the fact happen, over or with what exactly, who was the object or subject, etc. In general, measurements have a more descriptive (i.e. textual) character, for example, user name or month name, since it will be much easier for the end user to perceive the results described by the text (for example, the name of the month) rather than numbers (the month number of the year).

Determining where we have the facts, and where the measurements - it is very easy to build a star model.

Star.


Star

In the center we indicate our fact table, and with the rays we derive measurements.

And now the snowflake.


A snowflake is the same star , only measurements can depend on measurements of the next level, and those in turn can include more levels.

Snowflake
Each of these models has its own advantages and disadvantages, and the choice of the model itself should be based on the requirements for cube design, data loading speed, disk space, etc.
Naturally, the final Data Warehouse is usually much more complicated and consists of several stars or snowflakes that can share common measurements.

HabraDW.


Let's move on to the actual development of our Data Warehouse.

Our goal is to analyze the tendencies of voting on Habré, finding patterns and trends.
The main trends that we want to identify:
For clarity, our first model will be absolutely simple - we will include only what is relevant to the voting and exclude all unnecessary, including the time of user registration and the fact of who posted the article, as well as the voting time (only date) and other attributes (all these data can be included in the following articles and try to analyze more complex things).

As a result, we have the following tables:

The final scheme of our star will be like this.


HabraDW

And here is the original SQL script that creates and fills (so far only random data) our repository.

Well, now everything is ready to load the data into the cube.
See you in the next article.

Source: https://habr.com/ru/post/67272/


All Articles