📜 ⬆️ ⬇️

Control of mobile communication expenses within the organization: implementation

image

I would like to tell you some technical details of creating a mobile spending control system, the concept of which was described in a previous post.
As an example, we will consider the invoice for one of the months from MTS and create a simple OLAP project.
Judging by the responses, this topic is quite relevant, so who are interested, I ask under the cat.


We take information on expenses from the invoice file, or rather from its consolidated part for each number:
image
')
From the file we extract:

We also extract from the account “header”:


We will skip the description of the parser for extracting all this data, due to the fact that in each region and each provider the format of the accounts will differ and universal software simply can not be.

You can pay attention to some features, namely:


We import data into the sql table in accordance with the above components.
Raw data data looks like this:

image

If you select all the unique values ​​by the type of traffic, then we get a list of several hundred lines, which can in no way contribute to the comfortable work of the analyst.
Below is only a small part of the data. You can see, for example, that within network roaming is indicated for each city separately.

image

It is necessary to bring data on the type of connection to the standard 10-15 values.
This can be done through the case statement, analyzing all possible variants of one value, for example:

CASE WHEN rom.SERVICEDESCRIPTION LIKE '%%sms%' THEN ' ' WHEN (rom.SERVICEDESCRIPTION LIKE '%sms%' AND rom.SERVICEDESCRIPTION NOT LIKE '%%') THEN '' WHEN (((rom.SERVICEDESCRIPTION LIKE '%gprs%' OR rom.SERVICEDESCRIPTION LIKE '% %') and rom.SERVICEDESCRIPTION not LIKE '%hsdpa%') AND rom.SERVICEDESCRIPTION NOT LIKE '%%') THEN 'gprs' WHEN ((rom.SERVICEDESCRIPTION LIKE '%%gprs%' OR rom.SERVICEDESCRIPTION LIKE '%% %') and rom.SERVICEDESCRIPTION not LIKE '%%hsdpa%') THEN ' gprs' WHEN ((rom.SERVICEDESCRIPTION LIKE '%hsdpa%') AND rom.SERVICEDESCRIPTION NOT LIKE '%%') THEN 'hsdpa' WHEN (rom.SERVICEDESCRIPTION LIKE '%%hsdpa%') THEN ' hsdpa' WHEN ((rom.SERVICEDESCRIPTION LIKE '% %' OR rom.SERVICEDESCRIPTION LIKE '%%%' OR rom.SERVICEDESCRIPTION LIKE '%. %' OR rom.SERVICEDESCRIPTION LIKE '% , , WAP  %') AND rom.SERVICEDESCRIPTION NOT LIKE '%%') THEN ' ' WHEN (rom.SERVICEDESCRIPTION LIKE '%% %' OR rom.SERVICEDESCRIPTION LIKE '%%%%' OR rom.SERVICEDESCRIPTION LIKE '%%. %' OR rom.SERVICEDESCRIPTION LIKE '% .%') THEN '  ' WHEN rom.SERVICEDESCRIPTION LIKE '%% %' THEN '  ' WHEN ((rom.SERVICEDESCRIPTION LIKE '% %') AND rom.SERVICEDESCRIPTION NOT LIKE '%%') THEN ' ' WHEN rom.SERVICEDESCRIPTION LIKE '% %' THEN ' ' WHEN rom.SERVICEDESCRIPTION LIKE '%%' THEN '' ELSE '' END 


As a result, we obtain the standard values:
image

Next, we need to convert text values ​​by the amount of traffic to numeric
Using the same case statement

 CASE WHEN rom.serviceqtydescription LIKE '0.00 ' THEN '0' WHEN rom.serviceqtydescription LIKE '%.%.%' THEN '1' WHEN rom.serviceqtydescription LIKE ' ' OR rom.serviceqtydescription LIKE '' THEN '0' WHEN rom.serviceqtydescription LIKE '%' THEN CAST(replace(rom.serviceqtydescription, ' ', '') AS numeric(20, 2)) WHEN rom.serviceqtydescription LIKE '%.' THEN CAST(replace(rom.serviceqtydescription, ' .', '') AS numeric(20, 2)) WHEN rom.serviceqtydescription LIKE '%. 0 .' THEN CAST(replace(rom.serviceqtydescription, ' . 0 .', '') AS numeric(20, 2)) WHEN rom.serviceqtydescription LIKE '%' THEN CAST(replace(rom.serviceqtydescription, ' ', '') AS numeric(20, 2)) / 60 WHEN rom.serviceqtydescription LIKE '%' THEN CAST(replace(rom.serviceqtydescription, ' ', '') AS numeric(20, 2)) WHEN rom.serviceqtydescription LIKE '%' THEN CAST(replace(replace(rom.serviceqtydescription, ' ', ''),',','.') AS numeric(20, 2)) / 1024 WHEN (rom.serviceqtydescription LIKE '%.% ' AND rom.serviceqtydescription NOT LIKE '0.0 ') THEN CAST(replace(rom.serviceqtydescription, ' ', '') AS numeric(20, 2)) * 1024 WHEN rom.serviceqtydescription LIKE '%' THEN CAST(replace(rom.serviceqtydescription, ' ', '') AS numeric(20, 2)) when rom.serviceqtydescription = ' ' then 0 ELSE CAST(replace(rom.serviceqtydescription, ' ', '') AS numeric(20, 2)) END AS 

We get: SMS in pieces, voice in minutes, data transmission in killobytes

The final table looks like this.

image

Now the data on the amounts and traffic amenable to arithmetic operations.
In this form, the data is already quite amenable to analysis either from the managment studio window, or through the Excel interface, connecting the database as an external data source.
As you understood, we made a fact table for the future of OLAP Cuba.
To create a simple Cuba, you need to select the measurements from the fact table:


It is highly desirable to output the data into separate tables — directories with a separate primary key of the int format and replace the values ​​in the fact table with the corresponding values ​​from the directories. Values ​​in reference books should be unique.
All of the above does not apply to the directory of time, it is generated on the server separately.

We start the creation of the OLAP project

In BIDS, you need to create a new data mining project.

Create a new data source named DWH

image

Create a data source view where we add our tables.

image
In this example, there is no provider table, but it is added in the same way. And also I didn’t follow the recommendations with the keys on the field of the int format - it will work but with large amounts of data it will affect the speed.

I will not describe the creation of the time dimension, there is enough information in the network about this.

The next step is to create a cube, for this we launch the creation wizard.
On the first screen, select - use existing tables.
Next you need to select our measurement table and follow the wizard.

image

image

image

image

After creating the cube, measurements will automatically be added.
The cube will take the following form:

image

Also, you need to double-click the time dimension and drag the necessary time components to the attributes section:

image

And also, if necessary, create hierarchies

image

You can begin to deploy the cube on the server, to do this, click on the project and start processing.
After processing, you can connect to the cube via Excel, it is done this way:
Data - from other sources - from analytics services, in the wizard we enter the name of the server to which we deployed the cube.

The solution described is not perfect, experts in the sql field, and BI will point out many flaws, but I hope that this description will allow people who are not familiar with building a business analytics system to start doing this.

Source: https://habr.com/ru/post/157871/


All Articles