Organizing OLAP Cube with Python Tools

Good afternoon, dear readers.
Today I will tell you about how you can build a simple data analysis system in Python. This will help me framework cubes and cubesviewer package.
Subes is a framework for working with multidimensional data using Python. In addition, it includes an OLAP HTTP server for simplified reporting application development and general data viewing.
ubesviewer is a web-interface for working with the above server.

Install and configure cubes

First you need to install the libraries necessary for the package to work:

pip install pytz python-dateutil jsonschema pip install sqlalchemy flask

Next, install the cubes package itself:
')

 pip install cubes

As practice has shown, it is better to use the version (1.0alpha2) from the current repository .

Additional settings under windows

If you plan to work under Windows, in the {PYTHON_DIR} \ Lib \ site-packages \ dateutil \ tz.py file, replace the 40 line:

 return myfunc(*args, **kwargs).encode()

 return myfunc(*args, **kwargs)

Then, regardless of the platform you are working on, you need to add the following fix for the json parser to work correctly. You need to add it to {PYTHON_DIR} \ Lib \ site-packages \ cubes-1.0alpha-py2.7.egg \ cubes \ metadata.py starting from line 90:

 elif len(parts.scheme) == 1 and os.path.isdir(source): # TODO: same hack as in _json_from_url return read_model_metadata_bundle(source)

Description of the cube configuration and the process of its deployment

For example, take the OLAP-cube, which comes in delivery with cubes . It is located in the examples / hello_world folder (you can take it from the repository).
Of greatest interest to us are 2 files:

slicer.ini - http cube server settings file
model.json - cube model description file

Let us dwell on them in more detail. We start with the file slicer.ini , which may include the following sections:

[workspace] - configuration of the workplace
[server] - server parameters (address, port, etc.)
[models] - list of models to download
[datastore] [store] - data storage options
[translations] - localization settings for the model.

So, we will analyze from our test file that the server will be located on the local machine and will work on port 5000. The local SQLite database called data.sqlite will be used as storage.
More information about configuring the server can be found in the documentation.
Also from the file it is clear that the description of the model of our cube is in the model.json file, the description of the structure of which we will now deal with.
The description file for the model is a json file that includes the following logical sections:

name - model name
label - label
description - description of the model
locale - the locale for the model (if localization is specified)
cubes - list of cubes metadata
dimensions - a list of measurement metadata
public_dimensions - list of available dimensions. By default, all measurements are available.

We are interested in the sections cubes and dimensions , since all others are optional.
The dimensions list item contains the following metadata:

Key	Description
name	measurement ID
label	Dimension name visible to user
description	measurement description for users
levels	List of measurement levels
hierarchies	Hierarchy list
default_hierarchy_name	Hierarchy ID

The cubes list item contains the following metadata:

Key	Description
name	measurement ID
label	Dimension name visible to user
description	measurement description for users
dimensions	list of dimension names given above
measures	list of measures
aggregates	list of aggregation function measures
mappings	markup of logical and physical attributes

Based on the above, it can be understood that we will have 2 dimensions in the model ( item, year ). The measurement item has 3 measurement levels:

category . Display name "Category", fields "category", "category_label"
subcategory . Display name "Sub-category", fields "subcategory", "subcategory_label"
line_item . Display name "Line Item", field "line_item"

The “amount” field will act as a measure in our cube, for which the functions of the sum and counting the number of lines are performed.
Read more about the markup of the cube model in the documentation.
After we figured out the settings, we need to create our test database. In order to do this, you need to run the prepare_data.py script:

 python prepare_data.py

Now we just have to start our test server with a cube called slicer :

 slicer serve slicer.ini

After that you can check the performance of our cube. To do this, in the browser, you can enter:
localhost : 5000 / cube / irbd_balance / aggregate? drilldown = year
In response, we will get a json object with the result of the aggregation of our data. Read more about the server response format here .

Install cubesviewer

When we set up our cube, you can proceed with the installation of subesviewer . To do this, copy the repository to your disk:

 git clone https://github.com/nonsleepr/cubesviewer.git

And then just move the contents of the / src folder to the right place.
It should be noted that cubesviewer is a Django application, so Django (not higher than version 1.4) is required for its operation , as well as the requests and django-piston packages. Since This version of Django is already out of date, then I gave a link above from where you can get a subview for Django version 1.6.
Installing it is a little different from the original in that you need to add a line to the [server] section of the [server] configuration file slicer.ini allow_cors_origin: localhost:8000
After that, you need to configure the application in the file {CUBESVIEWER_DIR} /web/cvapp/settings.py . By specifying the database settings, the address of the OLAP server (variable CUBESVIEWER_CUBES_URL ) and the address of the CUBESVIEWER_BACKEND_URL ( CUBESVIEWER_BACKEND_URL )
It remains to make a small fix in dajno-piston
Now you can synchronize our application with the database. To do this from {CUBESVIEWER_DIR} / web / cvapp you need to run:

 python manage.py syncdb

It remains to run the local server Django

 python manage.py runserver

Now it remains to enter the address specified in CUBESVIEWER_BACKEND_URL via the browser. And enjoy the finished result.

Conclusion

To illustrate the work, I took the simplest example. It should be noted that for production projects cubes can be deployed for example on apache or uswgi . Well, it will not be difficult to connect Subesviewer to it using this article.
If the topic is interesting to the community, then I will reveal it in one of the future articles.

Source: https://habr.com/ru/post/222421/

All Articles