In this article I describe the main stages of building an analytical reporting system with MS SQL Server 2008 R2 in an organization that runs its OLTP accounting systems on the 1C platform. This was my first experience building a Business Intelligence solution.

Background
The company I work for is engaged in wholesale trade and consists of approximately 30 offices located across the regions of Russia. Each office has its own 1C information database in which sales data are recorded. The organization uses two 1C database configurations: one in the central office in Moscow, the other in the branches (in the regions of Russia). Microsoft SQL Server 2008 R2 (SP2) Standard Edition (64-bit) serves as the DBMS behind the 1C systems.

There is no unified master data (NSI). The "Products" reference and several other directories that classify products and counterparties are synchronized by code or by another identifier stored in the 1C systems.

One of the main reports used by the organization is the sales report. The existing sales report can extract data only from the system in which it is generated; the generated reports are exported to MS Excel, where they are processed further. With the growth of the company and the opening of new offices, management tasked the IT department with developing a consolidated report that automatically provides sales information across all offices of the organization.
Business requirements
The main requirement of the business was the automatic generation of a sales report for all offices of the company. In addition, the report should contain data on the number and amount of sales in the following analytical sections:
- Period (year, quarter, month, day).
- Products (including attributes that classify products).
- Counterparties (including attributes that classify counterparties).
The report should allow filters on any of the analytical dimensions, and a filter may contain an arbitrary number of values. The report should take no longer than a minute to generate. Report generation should not significantly affect the performance of the 1C accounting systems. Implementing and maintaining the report should be as inexpensive as possible.
Preliminary assessment and decision selection
Based on the input data and requirements, the customer was offered the following solution:
- Develop a data warehouse containing all the information needed to generate the consolidated sales report.
- Deploy the data warehouse on a SQL Server Database Engine instance in the central office.
- Develop a multidimensional data model with the measures and dimensions needed for the sales report.
- Deploy the multidimensional database containing this model on a SQL Server Analysis Services (SSAS) instance in the central office.
- Develop SSIS ETL packages that will update the data in the data warehouse and in the multidimensional database.
- Deploy the SSIS packages on a SQL Server Integration Services instance in the central office.
- Run the SSIS packages automatically, notifying technical support specialists by e-mail about the status of package execution.
- Give company employees access to the multidimensional database so they can build the consolidated sales report with the PivotTable Report object in MS Excel.
- Train the staff who will work with the sales report.
Solution implementation
Stage number 1. Collecting information about the data sources in the 1C systems. Creating views to access the necessary data
Before starting to design the warehouse, I created views in the SQL databases that support the 1C systems. This produced two sets of views: one for the database in the central office (see Fig. 1) and one for the databases in the branches (Fig. 2). Recall that the databases in the branches share the same structure, which differs from the structure of the central office database.
Fig. 1. Views in the central office SQL database
Fig. 2. Views in branch office SQL databases

The set of views in the central office differs from that in the branches, because part of the master data (NSI) is common and is stored in full only in the central office database. Specifically, these are the views:
- dbo.ChainStores (Trading networks of clients).
- dbo.Countries (Classifier of countries of the world).
- dbo.Products (Products).
- dbo.ProductAnalogs (Product Analogs).
- dbo.ProductTypes (Product Type Classifier).
- dbo.Projects (Client type classifier).
- dbo.ProjectsForProductMatrix (Product type classifier).
- dbo.CrossProductsAndProjectsForProductMatrix (a representation for providing a many-to-many relationship between the dbo.Products and dbo.ProjectsForProductMatrix views).
Creating views in the SQL databases makes the solution more flexible. For example, if the table structure in the 1C databases changes, we will not have to modify the ETL packages; it will be enough to update the views.
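As an illustration, such a view might look like the following sketch. The underlying 1C table and column names (`_Reference37`, `_Code`, and so on) are internal, configuration-specific, and purely hypothetical here; the point is the stable contract the view exposes to the ETL layer:

```sql
-- Hypothetical sketch: expose a cryptic, generated 1C reference table
-- under stable, readable names for the ETL packages to consume.
CREATE VIEW dbo.Products
AS
SELECT
    r._IDRRef       AS Id,       -- internal 1C row identifier
    r._Code         AS Code,     -- classifier code used to match products across offices
    r._Description  AS Name,
    r._ParentIDRRef AS ParentId  -- supports the parent-child hierarchy later on
FROM dbo._Reference37 AS r       -- 1C stores the "Products" reference in a generated table
WHERE r._IsFolder = 0x00;        -- sketch: take only element rows, not group rows
```

If a 1C configuration update regenerates the reference under a different table name, only this view has to be repointed; the ETL packages keep working unchanged.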
Stage number 2. Development of data warehouse structure. Deploy Data Warehouse
With the first stage complete, we can easily obtain information about the composition and types of data stored in the sources and design the warehouse structure. To do this, it is enough to look at the column types of the views. For example, the dbo.Clients view looks as follows.
Fig. 3. The dbo.Clients view

Notice that the dbo.Clients view has a ParentId field. Using this field, we can later build a parent-child hierarchy for the Clients dimension in the multidimensional data model. A similar field is present in the dbo.Products and dbo.Managers views.
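To show what the ParentId field encodes, the hierarchy it describes can be walked with a recursive common table expression. This is only a sketch: the column names follow the view described above, and root elements are assumed to have ParentId IS NULL:

```sql
-- Sketch: traverse the Clients hierarchy encoded by the ParentId field.
-- Assumes root elements have ParentId IS NULL.
WITH ClientTree AS
(
    -- anchor: the root elements of the hierarchy
    SELECT Id, ParentId, Name, 0 AS Lvl
    FROM dbo.Clients
    WHERE ParentId IS NULL

    UNION ALL

    -- recursive step: attach each child to its parent, one level deeper
    SELECT c.Id, c.ParentId, c.Name, t.Lvl + 1
    FROM dbo.Clients AS c
    JOIN ClientTree AS t ON c.ParentId = t.Id
)
SELECT Id, Name, Lvl
FROM ClientTree
ORDER BY Lvl, Name;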
Before designing a data warehouse, you need to choose its schema. There are two common schemas: star and snowflake. Both have their pros and cons, and comparing them is beyond the scope of this article. I chose the snowflake schema, reasoning that after a future move to SQL Server 2012 and self-service BI, users will probably find the more normalized warehouse data easier to work with when building their own models in PowerPivot for Excel. The structure of the data warehouse I developed is shown in the following figure.
Fig. 4. The structure of the data warehouse

The dim.DimDates (dates), dim.DimOffices (offices), and dim.DimRegions (regions of Russia) tables were populated once and do not require automatic updates. The dim.DimOffices table contains the names of the company's offices, and dim.DimDates holds the date information for the corresponding dimension in the multidimensional data model. The dimension tables use a surrogate key as the primary key, because record keys from different data sources may overlap.
Stage number 3. Development of a multidimensional data model. Deploy Multidimensional Database
When creating the multidimensional data model, a Data Source View was built that includes all tables from the data warehouse except stage.FactSales. That table is used only to temporarily store sales data before it is loaded into the fact.FactSales fact table.
In the Sales cube, two measure groups are implemented (see Figure 5).
Fig. 5. Measures

The Cross Products And Projects For Product Matrix measure group provides the many-to-many relationship between the Product and Sales Channel dimensions for the product matrix.
The list of dimensions is shown in Figure 6.
Fig. 6. Dimensions

For the Products, Clients, and Managers dimensions, a parent-child hierarchy is implemented.
Fig. 7. The Products dimension

To control access to the multidimensional database, an Analists role was created and granted Read and Drillthrough rights on the Sales cube. Drillthrough rights allow users to obtain a breakdown showing the detail rows behind the cell values in a report.
Fig. 8. The Analists role

To deploy the multidimensional database on the server, specify the name of the SSAS instance and the name of the target database in the project properties, and click Deploy in the BIDS menu. Connecting to the SSAS instance with SQL Server Management Studio (SSMS), we can see that the multidimensional database has been created.
Fig. 9. The multidimensional Sales OLAP database

Stage number 4. Development of ETL packages. Deploying ETL packages. Setting up automatic execution of ETL packages
The most time-consuming stage in designing a Business Intelligence solution is the development of the ETL packages. Data sources, as a rule, differ in structure, and the data they store contain errors and come in different formats. For example, an employee's gender may be represented in different databases by the letters M and F or by the numbers 0 and 1, and before loading such data into the warehouse it must be cleaned and reduced to a common form. In addition, only data entered or modified since the last load should be refreshed in the warehouse. These are only the main difficulties; in practice there are many more. Fortunately, SSIS tooling solves most of these problems.

In my implementation, the dimension tables are refreshed completely: new records are added and existing records are overwritten. The fact table is cleared and repopulated for a period of three months by default. The update depth of the fact table, in months, is stored in the SSIS package configuration, which is a separate table in the data warehouse.
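To illustrate the kind of cleanup involved, the gender example above can be reduced to a common form with a simple expression during extraction. This is only a sketch; the table and column names, and the mapping of 0/1 to F/M, are hypothetical:

```sql
-- Sketch: normalize gender values coming from heterogeneous sources.
-- Some databases store the letters M/F, others the digits 0/1; the
-- warehouse should receive a single canonical form.
SELECT
    e.EmployeeId,
    CASE
        WHEN e.Gender IN ('M', '1') THEN 'M'
        WHEN e.Gender IN ('F', '0') THEN 'F'
        ELSE 'U'                       -- unknown / not filled in
    END AS Gender
FROM dbo.Employees AS e;
```

In practice the same normalization could equally be done in an SSIS Derived Column transformation; doing it in the source query keeps the data flow simpler.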
Fig. 10. SSIS packages

Figure 10 shows four SSIS packages with the following purposes:
- Update DW and Process Sales OLAP.dtsx - the master package, which implements the overall logic of the ETL process and runs all the other packages.
- Import Dimensions and Facts from Moscow.dtsx - a package that loads data from the central office database into the dimension and fact tables of the data warehouse.
- Import Dimensions and Facts from Filials.dtsx - a package that loads data from the branch office databases into the dimension and fact tables of the data warehouse.
- Process Sales OLAP.dtsx - a package that updates (processes) the data in the multidimensional database.
The logic (Control Flow) of the master package is as follows (see Figure 11).
Fig. 11. The Update DW and Process Sales OLAP package

Let us consider each element of this scheme:
- First, Set Package's Variables Values (Execute SQL Task) is executed. Its job is to read values from the package configuration and write them into package variables. The configuration also stores the update depth of the fact table in months. Package configurations are kept in a separate table in the data warehouse and can be changed by IT specialists.
- Next, Insert Default Values In Dimensions (Execute SQL Task) checks for and inserts empty elements into the warehouse. For example, after this task completes, the dim.DimProducts table should contain an element with an identifier (Id) equal to zero. Records with zero identifiers are created in every dimension table to preserve the logical integrity of the data: all fields of the fact table are NOT NULL with a default value of zero, and NULLs in the fact table would cause errors when processing the multidimensional database.
- Get List of Source OLTP Databases in Moscow (Execute SQL Task) retrieves the list of central office databases (in my case there is only one, but to keep the solution general I assumed there could be several). The list of databases, together with their connection strings, is stored in the dim.DimOffices table. The resulting rows are written to a package variable.
- For All OLTP Databases in Moscow (Foreach Loop Container) iterates over the rows obtained in the previous step and, for each row (that is, for each database), runs the Import Dimensions and Facts from Moscow.dtsx package. Parameters are passed from the master package to the called package by writing the package configuration values, which is done by the Set Package Configurations task (Execute SQL Task).
- The next two steps, Get List of Source OLTP Databases in Filials (Execute SQL Task) and For All OLTP Databases in Filials (Foreach Loop Container), are the same as the previous two, but for the branch databases.
- The last step, Process Sales OLAP (Execute Package Task), launches the package that updates the data in the multidimensional database.
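The "clear and repopulate" strategy for the fact table described above can be sketched in T-SQL as follows. The column names are hypothetical, and the update depth is read from the package configuration table mentioned earlier:

```sql
-- Sketch: refresh the fact table only within the configured window.
-- In the real package, @UpdateDepthMonths is read from the package
-- configuration table in the data warehouse.
DECLARE @UpdateDepthMonths int = 3;
DECLARE @FromDate date =
    DATEADD(MONTH, -@UpdateDepthMonths, CAST(GETDATE() AS date));

-- 1. Clear the window in the fact table.
DELETE FROM fact.FactSales
WHERE SaleDate >= @FromDate;

-- 2. Reload it from the staging table filled by the import packages.
INSERT INTO fact.FactSales (SaleDate, ProductKey, ClientKey, Quantity, Amount)
SELECT SaleDate, ProductKey, ClientKey, Quantity, Amount
FROM stage.FactSales
WHERE SaleDate >= @FromDate;
```

Reloading a bounded window keeps the nightly run fast while still picking up late corrections to recent documents; older periods can be rebuilt by temporarily increasing the depth in the configuration table.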
The packages described above are deployed on the SQL Server Integration Services instance. To run the master package automatically, an Update DW and Process Sales OLAP job was created on the SQL server (see Figure 12).
Fig. 12. SQL job to run the SSIS package

To monitor the ETL process, the job is configured to notify support staff by e-mail when it completes (see Figure 13).
Fig. 13. Setting up job completion notification by e-mail

Stage number 5. Providing access to the multidimensional database
Employees are given access to the multidimensional database by adding their domain accounts to the Analists role of the multidimensional database using SQL Server Management Studio (see Figure 14).
Fig. 14. Membership in the Analists role

Stage number 6. Training the staff
For user training, a 15-minute video was recorded demonstrating the MS Excel features that let you connect to the multidimensional database and build a report with the PivotTable Report object. One possible report layout is shown in Figure 15.
Fig. 15. Example of a PivotTable Report in Excel

Findings
The customer's requirements were implemented in full. Beta testing was carried out by the company's key users, who generate sales reports daily. In their feedback, they described the solution as very convenient, fast, and sufficient for comprehensive sales analysis. To put the solution in perspective, here are some numbers:
- 40 man-hours were spent implementing the solution. Everything described was done by one person, i.e. me. I had previously attended courses and passed the Microsoft exams for the Microsoft Certified Solutions Expert certification in Business Intelligence.
- The fact table in the working database contains ~ 40 million rows.
- The ETL process takes about 20 minutes.
- Report generation is performed within a few seconds.