Hello, Habr! We are publishing a review of the third and fourth days of Data Science Week 2016: Sberbank Data Day and the day dedicated to artificial intelligence.

Day 3
The third day of Data Science Week was mainly about Sberbank's experience of solving specific problems with big data technologies, although some of the presentations were more general and conceptual.
The speakers described Sberbank's aim to become a data-driven organization: a flexible structure in which business processes change and decisions are made in response to changes in incoming data. Sberbank expects this to give it a competitive advantage in how quickly it can launch new solutions that customers ask for.
Sberbank has built an effective infrastructure for storing and processing big data, based on Hadoop, Spark and NoSQL solutions.
The bank's collection and use of data is centered on customers, following the principle “combine data around the customer.” To solve business problems, the company analyzes a wide range of internal and external data.
Extended customer profiles are built from internal data: client questionnaires and applications, transaction history, and use of the bank's services. Clients are segmented by socio-demographic parameters, needs and preferences in order to understand which offers will interest them and through which channels it is best to reach them.
Credit scoring uses not only traditional data, such as socio-demographic parameters, credit history, transaction history and financial statements, but also a number of other sources. For example, the company uses data from mobile operators both in credit scoring and to detect fraud. A propensity to fraud is indicated by a large number of active SIM cards that are each used only for a short time, by small and frequent account top-ups, and by the geography of calls. Scoring also uses customer relationship graphs built from money transfer data and social network data. For credit scoring of companies, news texts that mention them are used, with automatic sentiment analysis applied to those texts.
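As a rough illustration of how the telecom indicators listed above could be turned into simple fraud flags, here is a minimal Python sketch. It is not Sberbank's model: the field names and every threshold are hypothetical and chosen only to make the example readable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TelecomProfile:
    """Hypothetical per-customer aggregates from mobile operator data."""
    active_sim_cards: int         # SIM cards currently active
    avg_sim_lifetime_days: float  # average time a SIM card stays in use
    topups_per_month: int         # number of account top-ups per month
    avg_topup_amount: float       # average top-up amount
    distinct_call_regions: int    # crude proxy for the geography of calls


def fraud_flags(p: TelecomProfile) -> List[str]:
    """Return the names of the indicators that fire (thresholds are made up)."""
    flags = []
    if p.active_sim_cards >= 4:
        flags.append("many_active_sims")
    if p.avg_sim_lifetime_days < 30:
        flags.append("short_sim_lifetime")
    if p.topups_per_month >= 10 and p.avg_topup_amount < 100:
        flags.append("small_frequent_topups")
    if p.distinct_call_regions >= 5:
        flags.append("wide_call_geography")
    return flags


suspicious = TelecomProfile(6, 12.0, 15, 50.0, 7)
ordinary = TelecomProfile(1, 900.0, 1, 500.0, 2)
print(fraud_flags(suspicious))
print(fraud_flags(ordinary))
```

In practice such flags would more likely serve as input features for a scoring model than as a verdict on their own.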
Currently, the company's underwriting procedure (as far as decisions on the basic categories are concerned) is largely automated. Rebuilding the scorecard has also been automated, although an expert decides whether or not to accept the automatically rebuilt scorecard.
Alexander Kulikov from Segmento talked about how analyzing the sequence of transactions and payment patterns allows the company to identify important events in customers' lives (for example, spending a large amount of money on medical treatment or buying a car) and to predict which transactions a customer is likely to make in the near future, and in which categories. This makes it possible to make customers the most relevant offers. Analyzing data on customers and their behavior also makes it possible to prepare pre-approved loan offers and present them to customers exactly when they are most needed.
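To make the idea of spotting life events in a transaction stream more concrete, here is a deliberately naive Python sketch. It assumes a hypothetical input format (a list of category and amount pairs) and a made-up rule: a purchase in a watched category counts as an event if it is several times larger than the customer's typical transaction. The approach described in the talk is certainly more sophisticated.

```python
from statistics import median


def detect_life_events(transactions, watched=("medical", "car_dealer"), factor=5.0):
    """Flag unusually large purchases in watched categories.

    transactions: list of (category, amount) tuples in chronological order.
    A purchase is flagged when its category is watched and its amount is
    at least `factor` times the customer's median transaction amount.
    """
    if not transactions:
        return []
    typical = median(amount for _, amount in transactions)
    return [
        (category, amount)
        for category, amount in transactions
        if category in watched and amount >= factor * typical
    ]


history = [
    ("groceries", 50),
    ("groceries", 60),
    ("medical", 1000),
    ("transport", 20),
    ("medical", 40),
]
print(detect_life_events(history))
```

The small pharmacy purchase is ignored, while the large medical payment is flagged as a possible life event.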
Search query data is used to personalize what the Sberbank site displays. For example, a client who has shown interest in tourism will be offered insurance for traveling abroad.
The company also uses image analysis and deep learning methods. Some time ago Sberbank deployed SAFI, a photo analysis system for preventing document fraud and identifying customers. As a result, losses from this type of fraud have decreased tenfold.
A separate presentation was devoted to the risks of using models. The speaker identified three main areas of risk: data, models and processes. Data risks are associated with inconsistency, incompleteness, unrepresentativeness and the presence of outliers. If these problems are not noticed and corrected, the cost of an error will be very high. As for models and their application, errors can arise from invalid assumptions, from attempts to blindly transfer a model developed for one subject area to another, and from the human factor (fraud, conflicts of interest within the organization). To limit model risk, the company uses user feedback, clear standards for modeling and data preparation, and procedures for testing whether models are applicable.
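Two of the data risks named above, incompleteness and outliers, are easy to check mechanically. The Python sketch below is only an illustration under stated assumptions: missing values are represented by None, and outliers are found with the common interquartile-range rule, which is our choice for the example and not something prescribed in the talk.

```python
from statistics import quantiles


def missing_share(values):
    """Fraction of entries that are missing (represented here by None)."""
    return sum(v is None for v in values) / len(values)


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in present if v < low or v > high]


monthly_income = [10, 12, 11, 13, None, 12, 11, 500]
print(missing_share(monthly_income))
print(iqr_outliers(monthly_income))
```

Checks of this kind would normally run before a model is trained or rebuilt, so that problems are caught while they are still cheap to fix.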
The last talk of the day was devoted to the eToro social trading platform, with which Sberbank has begun active cooperation. The system is built on the principle of a social network: it aggregates data from successful traders on the platform (analytics, transaction history) and shows it in an accessible form. Analogs of trust funds are formed automatically around successful traders. Based on a user's profile, experience and attitude to risk, the leverage available to them changes, and they are automatically offered suitable assets and traders whose behavior can be copied. The purpose of the platform is to give everyone easy and understandable access to financial markets, including Sberbank customers who wish to manage their assets through it.
Day 4
The last day of Data Science Week was devoted to artificial intelligence. Little was said about artificial intelligence in the broad sense; the talks were mostly about the prospects of using chat bots and personal assistants.

The talk by Konstantin Savenkov from Inten.to was devoted directly to this topic. According to the speaker, a number of trends point to rapid development of this area in the future.
First, people now spend more time in messengers than on social networks, and businesses want to reach their customers through this channel as well. One solution here is to use bots.
Second, almost all of the largest companies that develop messengers are creating platforms for bots and personal assistants, although almost no one uses them yet. Huge investments are being made in this direction. There are connector services that allow a bot to be written once and run on different platforms.
Finally, the API market is growing, so personal assistants now have something to control.
Speaking about the prospects of bots and assistants, the speaker noted that attempts to replace user-friendly graphical interfaces with a conversation with a bot lead nowhere and only complicate the process (for example, when ordering air tickets). However, when the interaction is based on limited input of information, as in communication between people, chat bots can be effective (examples: concierge services, carrying out instructions, legal services). Intelligent applications will help users avoid mistakes and give advice when choosing and making decisions (as a waiter does).
According to the speaker, the most promising paradigm in this area today is the personal assistant, which uses sophisticated technologies for understanding speech and the context of a message but provides a simple service. Understanding of speech and context is followed by a decision point. For example, this can be selecting a wine for a dish based on its ingredients. Next comes the service platform, which is used to carry out the user's instructions.
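The three stages described above (understanding the message, a decision point, and a service platform that carries out the instruction) can be sketched as a toy Python pipeline. Everything in it is hypothetical: the ingredient list, the pairing rules and the order_wine service are invented for the wine example and have nothing to do with Inten.to's actual technology. Real assistants use far richer language understanding than keyword matching.

```python
from typing import Callable, Dict, List

# Hypothetical knowledge used by the toy assistant.
KNOWN_INGREDIENTS = {"beef", "salmon", "mushrooms", "chicken"}
PAIRING_RULES = {"beef": "red", "mushrooms": "red", "salmon": "white", "chicken": "white"}


def understand(message: str) -> List[str]:
    """Stage 1: extract known ingredients from the message (keyword matching)."""
    words = (w.strip(".,!?").lower() for w in message.split())
    return [w for w in words if w in KNOWN_INGREDIENTS]


def decide(ingredients: List[str]) -> str:
    """Stage 2: decision point, pick the wine style most ingredients call for."""
    votes = [PAIRING_RULES[i] for i in ingredients]
    if not votes:
        return "unknown"
    return max(sorted(set(votes)), key=votes.count)


# Stage 3: the service platform, here a registry of callable services.
SERVICES: Dict[str, Callable[[str], str]] = {
    "order_wine": lambda style: f"ordered a bottle of {style} wine",
}


def handle(message: str) -> str:
    style = decide(understand(message))
    if style == "unknown":
        return "could you name the main ingredients?"
    return SERVICES["order_wine"](style)


print(handle("Grilled salmon with chicken, please"))
```

The point of the split is that the service registry can grow or be swapped out without touching the understanding and decision stages.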
Today, the ways of carrying out specific instructions are, as a rule, written manually by the company or selected by crowdsourcing. Inten.to sees its place in the market in creating a tool with which a personal assistant automatically selects the API needed to solve a given task.
Eugene Light, representing Segmento, spoke about the role of artificial intelligence in the development of technology and about the main trends that could help avoid a future fall in labor productivity. According to the speaker, the sphere of human labor will change dramatically. The on-demand economy (examples: Uber, GetTaxi), in which we order and receive a service when we need it, will expand. Freelancing will grow, and more and more people will take part in other projects alongside their main job. Flexible teams will be assembled for specific projects, and work to order will become popular. More and more people will perform small tasks (microtasks), and microproductivity in these small operations will increase. Finally, technologies based on artificial intelligence will enter our lives.

The presentation by NVIDIA representative Anton Joraev was devoted not to artificial intelligence itself but to the hardware and computing platforms for deep learning, which is widely used in this area.
Today, neural networks such as Baidu Deep Speech 2 have already matched humans in speech recognition quality. However, this was achieved at the cost of a many-fold increase in computational complexity and in the amount of data used. At the same time, using such technologies in applications requires a quick response, because the user will not wait long. NVIDIA has therefore focused on software and hardware that generate an execution strategy for an already trained neural network and provide high performance. The company has developed its own analogue of the TensorFlow framework used in deep learning; it is designed for specific hardware and therefore runs faster, and it can perform logical optimizations.
Riftman, whose representative spoke last, plans to use bots in its Xor system for hiring IT staff. The system analyzes code samples that developers have posted on GitHub, StackOverflow and other resources, and in this way finds specialists with the right skills. It uses similar mechanisms to validate resumes. Communication with a candidate is then carried out by a bot, regardless of whether the candidate is currently looking for work.
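How Xor actually analyzes code was not described in detail, so the following Python sketch is purely illustrative. It assumes the crudest possible signal: the file extensions in a developer's public repositories are counted to build a skill profile, which is then compared with the skills a vacancy requires. The extension table and the min_files threshold are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to skill.
EXT_TO_SKILL = {".py": "Python", ".go": "Go", ".java": "Java", ".ts": "TypeScript"}


def skill_profile(file_paths):
    """Count how many public files a developer has per recognized skill."""
    skills = Counter()
    for path in file_paths:
        skill = EXT_TO_SKILL.get(PurePosixPath(path).suffix.lower())
        if skill:
            skills[skill] += 1
    return skills


def matches(profile, required, min_files=2):
    """True if the profile shows at least min_files files for every required skill."""
    return all(profile[skill] >= min_files for skill in required)


files = ["app/main.py", "app/utils.py", "cmd/server.go", "README.md"]
profile = skill_profile(files)
print(matches(profile, ["Python"]))
print(matches(profile, ["Python", "Go"]))
```

A real system would look at the content and quality of the code rather than file names, but the overall shape (build a profile, then match it against requirements) is the same.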
According to Nikolai Manolov, a very large number of specialists have already outgrown their positions and are waiting for interesting offers, but in practice they stay out of sight of HR specialists. It is easier to reach a person through a bot: an email may end up in spam, and a phone call may cause a negative reaction. If the candidate does not like the offer, the bot collects feedback in order to improve the selection model and to understand what conditions should be offered and to whom. The bot will also be able to schedule an interview and send test assignments. Thus, almost all processes in this area can be automated.
All presentations are posted here.
Access to the video recordings of the talks can be obtained here.