
For quite a long time I have been an inhabitant of Habr, but I have never been able to read articles on the topic of multidimensional cubes, OLAP and MDX, although the topic is very interesting and every day becomes more relevant.
It is no secret that in that short period of time the development of databases, electronic accounting and online systems, the data itself has accumulated a lot. Now the interest is also a full-fledged analysis of archives, and perhaps an attempt to predict situations for similar models in the future.
On the other hand, large companies even for several years, months or even weeks can accumulate such large data arrays that even their elementary analysis requires extraordinary approaches and hard hardware requirements. These can be banking transaction processing systems, exchange agents, telephone operators, etc.
I think everyone knows 2 different approaches to building a database design: OLTP and OLAP. The first approach (Online Transaction Processing - real-time transaction processing) is designed for efficient data collection in real time, while the second (Online Analytical Processing - real-time analytical processing) is aimed at selecting and processing data in the most efficient way.
Let's look at the main features of modern OLAP cubes, and what tasks they solve (based on Analysis Services 2005/2008):
- quick access to data
- preaggregation
- hierarchies
- work with time
- multidimensional data access language
- KPI (Key Performance Indicators)
- date mining
- layered caching
- multilingual support
So, consider the possibilities of OLAP cubes in a little more detail.
A little more about the possibilities
Quick access to dataActually, fast access to data, regardless of the size of the array, is the basis of OLAP systems. Since the main focus is on this, the data warehouse is usually built on principles different from the principles of relational databases.
Here, the time to sample simple data is measured in fractions of a second, and a query that exceeds several seconds most likely requires optimization.
')
PreaggregationIn addition to quickly sampling existing data, it is also possible to pre-aggregate the “most likely used” values. For example, if we have daily sales records of a product, the system
may also transfer monthly and quarterly sales to us, which means that if we request data monthly or quarterly, the system will instantly give us the result. Why is the preaggregation not always happening - because of theoretically possible combinations of goods / time /, etc.? there may be a huge amount, which means you need to have clear rules for which elements the aggregation will be built, and for which not. In general, the topic of taking into account these rules and the actual direct design of aggregations is quite extensive and in itself deserves a separate article.
HierarchiesIt is natural that by analyzing data and building final reports, there is a need to take into account that months consist of days, and they themselves form quarters, and cities belong to areas that in turn are part of regions or countries. The good news is that OLAP cubes initially view data in terms of hierarchies and relationships with other parameters of the same entity, so building and using hierarchies in cubes is a very simple matter.
Work with timeSince most of the data analysis takes place in temporary areas, it is the time in OLAP systems that has a special meaning, and therefore, simply defining for the system where we have time here, in the future you can easily use functions like Year To Date, Month To Date ( period from the beginning of the year / month to the current date), Parallel Period (on the same day or month, but last year), etc.
Multidimensional Data Access LanguageMDX (Multidimensional Expressions) is a query language for easy and efficient access to multidimensional data structures. And that says it all - there will be a few examples below.
Key Performance Indicators (KPI)Key performance indicators are a financial and non-financial assessment system that helps an organization determine the achievement of strategic goals. Key performance indicators can be fairly simply defined in OLAP systems and used in reports.
Date miningIntelligent data analysis (Data Mining) - in fact, identifying hidden patterns or relationships between variables in large data arrays.
The English term “Data Mining” does not have an unambiguous translation into Russian (data mining, data dissemination, information mining, data / information extraction), therefore in most cases it is used in the original. The most successful indirect translation is considered the term "data mining" (IAD). However, this is a separate, no less interesting topic for consideration.
Layered cachingActually, to ensure the highest data access speed, except for clever data structures and preaggregations, OLAP systems support multi-level caching. In addition to caching simple queries, parts of data read from the data warehouse, aggregated values, calculated values ​​are also cached. Thus, the longer you work with an OLAP cube, the faster it, in fact, begins to work. There is also the concept of "warming up the cache" - an operation that prepares the OLAP system to work with specific reports, queries, or all together.
Multi-language supportYes Yes Yes. At a minimum, Analysis Services 2005/2008 (admittedly, Enterprise Edition) natively supports multilingualism. It is enough to provide the translation of the string parameters of your data, and the localized data will be sent to the client who has specified his own language.
Multidimensional cubes
So all the same these multidimensional cubes?
Imagine a 3-dimensional space, in which the axes of time, goods and buyers.
A point in this space will be asked by the fact that one of the buyers bought a particular product in one month.

In fact, the plane (or the set of all such points) will be a cube, and, accordingly, Time, Goods and Buyers will be its dimensions.
Providing (and drawing) a four-dimensional and more cube is a bit more difficult, but the essence does not change, and most importantly, for OLAP systems it does not matter how many dimensions you will work (within reasonable limits, of course).
Some mdx
So, what is the beauty of MDX is most likely that you need to describe not how we want to choose the data, but
what we want.
For example,
SELECT
{ [Measures].[Units] } ON COLUMNS,
{ [ Time ].[June, 2009], [ Time ].[July, 2009] } ON ROWS
FROM [Sales]
WHERE ([Product].[iPhone], [Country].[Mozambik])
* This source code was highlighted with Source Code Highlighter .
SELECT
{ [Measures].[Units] } ON COLUMNS,
{ [ Time ].[June, 2009], [ Time ].[July, 2009] } ON ROWS
FROM [Sales]
WHERE ([Product].[iPhone], [Country].[Mozambik])
* This source code was highlighted with Source Code Highlighter .
SELECT
{ [Measures].[Units] } ON COLUMNS,
{ [ Time ].[June, 2009], [ Time ].[July, 2009] } ON ROWS
FROM [Sales]
WHERE ([Product].[iPhone], [Country].[Mozambik])
* This source code was highlighted with Source Code Highlighter .
Which means - I want the number of iPhones sold in Mozambique in June and July.
At the same time I describe exactly
what data I want and
how exactly I want to see them in the report.
Beautiful, isn't it?
But a little more complicated:
WITH MEMBER AverageSpend AS
[Measures].[Amount] / [Measures].[ Transaction Count ]
SELECT
{ AverageSpend } ON COLUMNS,
{ [Customer].[Sex].[Female], [Customer].[Sex].[Male] } ON ROWS
FROM [Sales]
WHERE ([Shop].[Apple])
* This source code was highlighted with Source Code Highlighter .
WITH MEMBER AverageSpend AS
[Measures].[Amount] / [Measures].[ Transaction Count ]
SELECT
{ AverageSpend } ON COLUMNS,
{ [Customer].[Sex].[Female], [Customer].[Sex].[Male] } ON ROWS
FROM [Sales]
WHERE ([Shop].[Apple])
* This source code was highlighted with Source Code Highlighter .
In fact, first we define the formula for calculating the “average purchase size” and try to compare - who (what sex), spends more money for one visit to the Apple store.
The language itself is extremely interesting to learn and use, and, perhaps, deserves a lot of discussion.
Conclusion
In fact, this article covers very little even basic concepts, I would call it an “appetizer” - an opportunity to interest the habr-community with this topic and develop it further. As for the development, there is a huge plowed field, and I will be happy to answer all the questions of interest.
PS This is my first post on OLAP and the first publication on Habré - I will be very grateful for the constructive feedback.
Update: Transferred to SQL, transferred to OLAP as soon as allowed to create new blogs.