Good afternoon, dear readers.
Today I will tell you about how you can build a simple data analysis system in Python. This will help me framework
cubes and
cubesviewer package.
Subes is a framework for working with multidimensional data using Python. In addition, it includes an OLAP HTTP server for simplified reporting application development and general data viewing.
ubesviewer is a web-interface for working with the above server.
Install and configure cubes
First you need to install the libraries necessary for the package to work:
pip install pytz python-dateutil jsonschema pip install sqlalchemy flask
Next, install the
cubes package itself:
')
pip install cubes
As practice has shown, it is better to use the version (1.0alpha2) from the current
repository .
Additional settings under windows
If you plan to work under Windows, in the
{PYTHON_DIR} \ Lib \ site-packages \ dateutil \ tz.py file, replace the 40 line:
return myfunc(*args, **kwargs).encode()
on
return myfunc(*args, **kwargs)
Then, regardless of the platform you are working on, you need to add the following
fix for the json parser to work correctly. You need to
add it to
{PYTHON_DIR} \ Lib \ site-packages \ cubes-1.0alpha-py2.7.egg \ cubes \ metadata.py starting from line 90:
elif len(parts.scheme) == 1 and os.path.isdir(source):
Description of the cube configuration and the process of its deployment
For example, take the OLAP-cube, which comes in delivery with
cubes . It is located in the
examples / hello_world folder (you can take it from the repository).
Of greatest interest to us are 2 files:
- slicer.ini - http cube server settings file
- model.json - cube model description file
Let us dwell on them in more detail. We start with the file
slicer.ini , which may include the following sections:
[workspace]
- configuration of the workplace[server]
- server parameters (address, port, etc.)[models]
- list of models to download[datastore] [store]
- data storage options[translations]
- localization settings for the model.
So, we will analyze from our test file that the server will be located on the local machine and will work on port 5000. The local SQLite database called data.sqlite will be used as storage.
More information about configuring the server can be
found in the documentation.
Also from the file it is clear that the description of the model of our cube is in the
model.json file, the description of the structure of which we will now deal with.
The description file for the model is a
json file that includes the following logical sections:
name
- model namelabel
- labeldescription
- description of the modellocale
- the locale for the model (if localization is specified)cubes
- list of cubes metadatadimensions
- a list of measurement metadatapublic_dimensions
- list of available dimensions. By default, all measurements are available.
We are interested in the sections
cubes and
dimensions , since all others are optional.
The
dimensions list item contains the following metadata:
Key | Description |
---|
name | measurement ID |
label | Dimension name visible to user |
description | measurement description for users |
levels | List of measurement levels |
hierarchies | Hierarchy list |
default_hierarchy_name | Hierarchy ID |
The
cubes list item contains the following metadata:
Key | Description |
---|
name | measurement ID |
label | Dimension name visible to user |
description | measurement description for users |
dimensions | list of dimension names given above |
measures | list of measures |
aggregates | list of aggregation function measures |
mappings | markup of logical and physical attributes |
Based on the above, it can be understood that we will have 2 dimensions in the model (
item, year ). The measurement
item has 3 measurement levels:
- category . Display name "Category", fields "category", "category_label"
- subcategory . Display name "Sub-category", fields "subcategory", "subcategory_label"
- line_item . Display name "Line Item", field "line_item"
The
“amount” field will act as a measure in our cube, for which the functions of the sum and counting the number of lines are performed.
Read more about the markup of the cube model in the
documentation.After we figured out the settings, we need to create our test database. In order to do this, you need to run the
prepare_data.py script:
python prepare_data.py
Now we just have to start our test server with a cube called
slicer :
slicer serve slicer.ini
After that you can check the performance of our cube. To do this, in the browser, you can enter:
localhost : 5000 / cube / irbd_balance / aggregate? drilldown = yearIn response, we will get a json object with the result of the aggregation of our data. Read more about the server response format
here .
Install cubesviewer
When we set up our cube, you can proceed with the installation of
subesviewer . To do this, copy the
repository to your disk:
git clone https://github.com/nonsleepr/cubesviewer.git
And then just move the contents of the
/ src folder to the right place.
It should be noted that
cubesviewer is a Django application, so
Django (not higher than version 1.4) is required for its operation , as well as the
requests and
django-piston packages. Since This version of Django is already out of date, then I gave a link above from where you can get a subview for Django version 1.6.
Installing it is a little different from the original in that you need to add a line to the
[server]
section of the
[server]
configuration file
slicer.ini
allow_cors_origin: localhost:8000
After that, you need to
configure the application in the file
{CUBESVIEWER_DIR} /web/cvapp/settings.py . By specifying the database settings, the address of the OLAP server (variable
CUBESVIEWER_CUBES_URL
) and the address of the
CUBESVIEWER_BACKEND_URL
(
CUBESVIEWER_BACKEND_URL
)
It remains to make a small
fix in
dajno-pistonNow you can synchronize our application with the database. To do this from
{CUBESVIEWER_DIR} / web / cvapp you need to run:
python manage.py syncdb
It remains to run the local server Django
python manage.py runserver
Now it remains to enter the address specified in
CUBESVIEWER_BACKEND_URL
via the browser. And enjoy the finished result.

Conclusion
To illustrate the work, I took the simplest example. It should be noted that for production projects
cubes can be
deployed for example on
apache or
uswgi . Well, it will not be difficult to connect
Subesviewer to it using this article.
If the topic is interesting to the community, then I will reveal it in one of the future articles.