
"Kings of Mathematics": Big Data Analytics in a bank. Project GAUSS in VTB

Which of the bank's offers for opening current accounts and deposits can be considered successful, and which should be improved? What can be improved in currency exchange procedures and in remote banking services? We in the VTB Transactional Business Department are constantly looking for answers to these questions. Read on to find out how our IT development strategy helps us do this and how customers benefit from it.



How do you quickly add up the numbers from 1 to 100? According to legend, the great German mathematician Carl Friedrich Gauss was the first to solve this problem while still a schoolboy. He noticed that the sums of pairs taken from opposite ends are the same: 1 + 100 = 101, 2 + 99 = 101, and so on. There are 50 such pairs, so he instantly got the result 50 × 101 = 5050, demonstrating excellent analytical skills.

Repetitive data processing tasks that arise daily in a modern bank are much more complicated than the one the future “king of mathematics” solved at the end of the 18th century. However, the approach to solving them has not changed since then. As before, to get results faster and make them more accurate, you need to automate processes.
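The contrast from the legend can be sketched in a few lines of Python. This is only an illustration: the function names `gauss_sum` and `naive_sum` are made up for the example and have nothing to do with the GAUSS system itself.

```python
def gauss_sum(n: int) -> int:
    """Sum 1..n with the pairing trick: n/2 pairs, each adding up to n + 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n + 1) // 2


def naive_sum(n: int) -> int:
    """Sum 1..n by adding the numbers one by one, for comparison."""
    total = 0
    for k in range(1, n + 1):
        total += k
    return total


print(gauss_sum(100))  # 5050
print(naive_sum(100))  # 5050
```

Both functions give the same answer, but the first does a fixed amount of work no matter how large n is, while the second does n additions.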

Building financial forecasts, creating analytical reports, and analyzing trends and risks without Big Data solutions is like summing the numbers from 1 to 100 by adding them one at a time. The GAUSS pilot project (Global Transaction Business Analytic Union Source & System), launched in VTB's Transactional Business Department earlier this year, brings together information from the bank's various databases and automates work with it.

What is the twenty-first century GAUSS?

A modern bank holds a huge amount of data on all transactions, and the volume is constantly growing. This information is of great value, but in order not to drown in it, you need to learn how to use it correctly.

The GAUSS project began with consolidating all the information available in the bank for 2014-2016 and providing convenient access to it. Employees working with the system can obtain the materials they need at any time, using any combination of parameters and options. As a result, preparing a report takes a couple of hours rather than a few days, as before, and employees work more efficiently. The reports feed decisions on improving the quality of customer service, creating more attractive offers, and so on.

The plan is to develop the project further, expanding the database with statistical information from all possible sources. GAUSS should become the basis for a unified corporate “Data Lake”, into which one can “dive” at any time for whatever information is important at the moment.

However, the scope of the GAUSS project is much broader than simply creating reports. We hope that very soon it will also make it possible to:

· Assess various risks (credit, customer, partnership);
· Detect fraudulent schemes;
· Simulate targeted commercial offers;
· Work with the Microsoft Business Intelligence analytical system, etc.

How does GAUSS work?

While working on the project, we deliberately avoided commercial solutions. GAUSS is built on the Hadoop / Hive / Ambari / Oozie / Spark / ORC / YARN stack, and to build data marts we use PostgreSQL, which we consider the world's leading “open” relational database. That said, PostgreSQL can be replaced with any other database without affecting the operation of the system.

Because of the huge volume of constantly arriving information and the emergence of new ways of analyzing it, no Big Data project can be solved with standard templates; each one is a new and complex task. We therefore built a coherent multi-stage architecture: RAW information is loaded from all sources, the data is then aggregated, processed and enriched, and finally the OLAP data cubes and the data marts for presenting the information are prepared.

To present data correctly, we developed flexible mechanisms for mapping source data to target information, quality management (Data Governance) for the generated information, and mechanisms for obtaining the details behind aggregates (data drilldown). This makes it possible to safely change direction in the course of the project and adapt to change.

The GAUSS system is developed according to the Agile / Scrum methodology, which lets us take into account new requirements from business customers, feedback received and incoming data, while keeping each team member focused on results. After all, when you work with Big Data, new hypotheses constantly arise about how to use the information hidden in the petabytes of the “data lake”.
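To make the staged flow described above more concrete, here is a toy sketch in plain Python: raw records from two sources are mapped to a common target shape, enriched, aggregated into a small cube, and then drilled back down to the detail rows. Everything in it is an assumption made for illustration (the source names, field names, sample records, and the functions `load_and_map`, `enrich`, `build_cube` and `drill_down`); the real system runs on the Hadoop stack and PostgreSQL mentioned earlier, not on in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical raw records: each source system uses its own field names.
RAW_SOURCES = {
    "fx": [
        {"cust": "C1", "amt": 100.0, "dt": "2016-01-15"},
        {"cust": "C2", "amt": 250.0, "dt": "2016-01-20"},
    ],
    "deposits": [
        {"client_id": "C1", "sum": 1000.0, "date": "2016-01-28"},
    ],
}

# Source-to-target mapping (assumed): target field -> source field.
MAPPINGS = {
    "fx": {"customer": "cust", "amount": "amt", "date": "dt"},
    "deposits": {"customer": "client_id", "amount": "sum", "date": "date"},
}


def load_and_map(raw_sources, mappings):
    """Stage 1: load raw records and rename fields to the target schema."""
    rows = []
    for source, records in raw_sources.items():
        mapping = mappings[source]
        for record in records:
            row = {target: record[src] for target, src in mapping.items()}
            row["product"] = source  # remember where the row came from
            rows.append(row)
    return rows


def enrich(rows):
    """Stage 2: add derived attributes, here the month of the operation."""
    return [{**row, "month": row["date"][:7]} for row in rows]


def build_cube(rows, dimensions):
    """Stage 3: aggregate amounts by the chosen dimensions.

    Each cell keeps its detail rows so that it can be drilled into later.
    """
    cube = defaultdict(lambda: {"total": 0.0, "details": []})
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        cube[key]["total"] += row["amount"]
        cube[key]["details"].append(row)
    return dict(cube)


def drill_down(cube, key):
    """Return the detail rows behind one aggregate cell."""
    return cube[key]["details"]


rows = enrich(load_and_map(RAW_SOURCES, MAPPINGS))
cube = build_cube(rows, ("customer", "month"))
print(cube[("C1", "2016-01")]["total"])  # 1100.0
for detail in drill_down(cube, ("C1", "2016-01")):
    print(detail["product"], detail["amount"])
```

Keeping the mapping in a separate table rather than in the code is what makes a new source cheap to add in this sketch: a new entry in `MAPPINGS` is enough, and the later stages stay untouched.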

Source: https://habr.com/ru/post/338810/

