📜 ⬆️ ⬇️

Organizing OLAP Cube with Python Tools

Good afternoon, dear readers.
Today I will tell you about how you can build a simple data analysis system in Python. This will help me framework cubes and cubesviewer package.
Subes is a framework for working with multidimensional data using Python. In addition, it includes an OLAP HTTP server for simplified reporting application development and general data viewing.
ubesviewer is a web-interface for working with the above server.


Install and configure cubes


First you need to install the libraries necessary for the package to work:

pip install pytz python-dateutil jsonschema pip install sqlalchemy flask 

Next, install the cubes package itself:
')
 pip install cubes 

As practice has shown, it is better to use the version (1.0alpha2) from the current repository .

Additional settings under windows

If you plan to work under Windows, in the {PYTHON_DIR} \ Lib \ site-packages \ dateutil \ tz.py file, replace the 40 line:
 return myfunc(*args, **kwargs).encode() 

on

 return myfunc(*args, **kwargs) 

Then, regardless of the platform you are working on, you need to add the following fix for the json parser to work correctly. You need to add it to {PYTHON_DIR} \ Lib \ site-packages \ cubes-1.0alpha-py2.7.egg \ cubes \ metadata.py starting from line 90:

 elif len(parts.scheme) == 1 and os.path.isdir(source): # TODO: same hack as in _json_from_url return read_model_metadata_bundle(source) 


Description of the cube configuration and the process of its deployment


For example, take the OLAP-cube, which comes in delivery with cubes . It is located in the examples / hello_world folder (you can take it from the repository).
Of greatest interest to us are 2 files:

Let us dwell on them in more detail. We start with the file slicer.ini , which may include the following sections:

So, we will analyze from our test file that the server will be located on the local machine and will work on port 5000. The local SQLite database called data.sqlite will be used as storage.
More information about configuring the server can be found in the documentation.
Also from the file it is clear that the description of the model of our cube is in the model.json file, the description of the structure of which we will now deal with.
The description file for the model is a json file that includes the following logical sections:

We are interested in the sections cubes and dimensions , since all others are optional.
The dimensions list item contains the following metadata:
KeyDescription
namemeasurement ID
labelDimension name visible to user
descriptionmeasurement description for users
levelsList of measurement levels
hierarchiesHierarchy list
default_hierarchy_nameHierarchy ID

The cubes list item contains the following metadata:
KeyDescription
namemeasurement ID
labelDimension name visible to user
descriptionmeasurement description for users
dimensionslist of dimension names given above
measureslist of measures
aggregateslist of aggregation function measures
mappingsmarkup of logical and physical attributes

Based on the above, it can be understood that we will have 2 dimensions in the model ( item, year ). The measurement item has 3 measurement levels:

The “amount” field will act as a measure in our cube, for which the functions of the sum and counting the number of lines are performed.
Read more about the markup of the cube model in the documentation.
After we figured out the settings, we need to create our test database. In order to do this, you need to run the prepare_data.py script:

 python prepare_data.py 

Now we just have to start our test server with a cube called slicer :

 slicer serve slicer.ini 

After that you can check the performance of our cube. To do this, in the browser, you can enter:
localhost : 5000 / cube / irbd_balance / aggregate? drilldown = year
In response, we will get a json object with the result of the aggregation of our data. Read more about the server response format here .

Install cubesviewer


When we set up our cube, you can proceed with the installation of subesviewer . To do this, copy the repository to your disk:

 git clone https://github.com/nonsleepr/cubesviewer.git 

And then just move the contents of the / src folder to the right place.
It should be noted that cubesviewer is a Django application, so Django (not higher than version 1.4) is required for its operation , as well as the requests and django-piston packages. Since This version of Django is already out of date, then I gave a link above from where you can get a subview for Django version 1.6.
Installing it is a little different from the original in that you need to add a line to the [server] section of the [server] configuration file slicer.ini allow_cors_origin: localhost:8000
After that, you need to configure the application in the file {CUBESVIEWER_DIR} /web/cvapp/settings.py . By specifying the database settings, the address of the OLAP server (variable CUBESVIEWER_CUBES_URL ) and the address of the CUBESVIEWER_BACKEND_URL ( CUBESVIEWER_BACKEND_URL )
It remains to make a small fix in dajno-piston
Now you can synchronize our application with the database. To do this from {CUBESVIEWER_DIR} / web / cvapp you need to run:

 python manage.py syncdb 

It remains to run the local server Django

 python manage.py runserver 

Now it remains to enter the address specified in CUBESVIEWER_BACKEND_URL via the browser. And enjoy the finished result.


Conclusion


To illustrate the work, I took the simplest example. It should be noted that for production projects cubes can be deployed for example on apache or uswgi . Well, it will not be difficult to connect Subesviewer to it using this article.
If the topic is interesting to the community, then I will reveal it in one of the future articles.

Source: https://habr.com/ru/post/222421/


All Articles