Hello, Habr! This year we are once again running Imagine Cup, a competition of student IT startups. We invite all students to take part! In the meantime, let us tell you about the Social Globe team, which took third place last year. And deservedly so: their service tracks a person's digital footprint and builds a psychological profile from their social network data. People are identified using an ordinary camera and neural networks.

Imagine Cup 2018 series
1. Psychological portrait with the help of a neural network and an ordinary camera.
2. Study history without studying books.
Preface, description of the idea and formulation of the problem
This year, the Imagine Cup competition was held for the 15th time, and our team of three decided to take part. It all started with a hackathon held as part of Imagine Cup at the Higher School of Economics, whose purpose was to help participants come up with a project idea that could be presented successfully at the competition. After a long brainstorming session, we settled on a project for finding people who match the user's interests and character, whether as friends or as a partner, based on information about a person from social networks.
Many services do this with questionnaires, but filling them out takes a long time and, most importantly, is extremely ineffective. People do not write the truth about themselves; they describe the person they would like to be. Because of this, the accuracy of such profiles is only 40%.
We began to study scientific papers from various universities around the world: University of Cambridge, Stanford University, University of Antwerp, and others.
Some of the scientific papers:
- Mining Facebook Data for Predictive Personality Modeling (Dejan Markovikj, Sonja Gievska, Michal Kosinski, David Stillwell)
- Personality Traits Recognition on Social Network - Facebook (Firoj Alam, Evgeny A. Stepanov, Giuseppe Riccardi)
- The Relationship Between Dimensions of Love, Personality, and Relationship Length (Gorkan Ahmetoglu, Viren Swami, Tomas Chamorro-Premuzic)
Many papers, primarily those of Michal Kosinski, whose name is associated with the use of social network data in Donald Trump's election campaign, note that psychological information about a person can be determined accurately by analyzing their "digital footprints" in social networks: posts, reposts, likes, and comments. The studies were conducted on Facebook. For example, from about 68 likes, Kosinski's system could predict a user's skin color with 95% accuracy, sexual orientation with 88%, and support for a particular US political party with 85%.
In addition, these papers allowed us to conclude that there is a correlation between a combination of character traits and the quality of relationships. Character is often evaluated using the Big Five model, which describes personality with five independent traits: Extraversion, Agreeableness, Conscientiousness, Neuroticism, and Openness to Experience.
Based on these articles, we created the first prototype: a web service that people could log into with their Twitter account.
Then we received the first feedback. People were interested, though what attracted them most was not the search for other people but the information about their own psychological profile.
After taking second place with this prototype in the competition final at the Higher School of Economics (and first place in the audience vote), we began preparing for the all-Russian final, for which we decided to rework the project. Now the user views people, their psychology, and their interests not in a web service but in the real world. Using an ordinary camera, the service recognizes people and shows the relevant information obtained from social networks right next to each person's face. In the future, we planned to adapt the project for use with Microsoft's promising HoloLens technology.
Implementation and technologies used
General remarks
Our approach to analyzing a person's social network profile was based on scientific articles that report correlations between a person's behavior on the Internet and their interests and character traits.
This is where various Azure Cognitive Services proved very useful: syntactic and semantic text analysis (keyword extraction), plus a translator into English so that we could work with different languages.
Specifics: interests
A person's interests were chosen from a fixed list. Each category in the list was described by several keywords (for example, cooking might be described by the words vegetables, fruits, fryer, pan, and so on). Using semantic analysis of words, we determined which categories a user's tweets belonged to.
Specifics: character
The scientific papers we studied note that the Big Five traits correlate with syntactic features of a person's writing, for example the average word length, the number of words per sentence, and the number of verbs, as well as with the emotional coloring of the text. To determine the latter, we used a dictionary in which each word is assigned an emotional score. With it, we determined the emotional coloring of the keywords obtained in the previous stage and fed this to the input of the neural network along with the syntactic features.
Unfortunately, we had too little experience with neural networks and too little time to work through this subtask properly. As a result, we did not achieve good results in determining the Big Five traits, so we will not describe this part of the work in much detail.
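The feature extraction described above can be illustrated with a minimal sketch. It is not the team's code: the lexicon, its scores, and the function name text_features are hypothetical, and verb counting is left out because it would need a part-of-speech tagger that the article does not name.

```python
import re

# Hypothetical toy lexicon: word -> emotional score in [-1, 1].
# The dictionary actually used by the team is not named in the article.
EMOTION_LEXICON = {"love": 0.9, "great": 0.7, "hate": -0.9, "boring": -0.5}


def text_features(text, keywords, lexicon=EMOTION_LEXICON):
    """Return simple syntactic features of the text and the mean
    emotional score of those keywords that appear in the lexicon."""
    # Split into sentences on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words or not sentences:
        return {"avg_word_len": 0.0, "words_per_sentence": 0.0, "emotion": 0.0}
    scores = [lexicon[k.lower()] for k in keywords if k.lower() in lexicon]
    return {
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "words_per_sentence": len(words) / len(sentences),
        "emotion": sum(scores) / len(scores) if scores else 0.0,
    }
```

A dictionary like this one could then be flattened into a numeric vector and passed to a model input, which is roughly the role the features play in the description above.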
Architecture
In general, the user-facing flow was as follows. The application extracts a frame from the video stream every 3 seconds and sends it to the Face API service, where faces are detected and recognized by matching against a pre-built database. The database is filled by manually linking a set of a person's photos to their social network account. In the future, one could consider collecting data from pages and establishing these links automatically.
Once the link is found, the data on interests and character is returned from the database and displayed on screen next to each person found in the frame.
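The flow above can be sketched roughly as follows. This is an illustration under assumptions, not the team's implementation: the function names, the shape of the recognition result, and the profile store are all hypothetical, and the actual call to the face recognition service is deliberately left out.

```python
def sample_frames(frames, interval_s=3.0):
    """Yield (timestamp, frame) pairs at most once per interval_s seconds.

    frames is any iterable of (timestamp_in_seconds, frame) pairs,
    for example produced by a camera capture loop.
    """
    next_due = None
    for ts, frame in frames:
        if next_due is None or ts >= next_due:
            yield ts, frame
            next_due = ts + interval_s


def annotate(faces, profiles):
    """Attach stored profile data to each recognized face.

    faces: list of (person_id, bounding_box) pairs, a hypothetical
           shape for what a recognition service might return.
    profiles: dict mapping person_id to profile data from the database.
    Faces without a known profile are skipped.
    """
    return [
        {"box": box, "profile": profiles[pid]}
        for pid, box in faces
        if pid in profiles
    ]
```

Sampling by timestamp rather than by frame count keeps the 3-second interval stable even if the camera's frame rate varies.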
Database filling and profile analysis: the implementation used a microservice architecture, which supports horizontal scaling and is flexible enough for developing and running the project.
Go and C# were the main languages for the microservices. In C#, we wrote wrappers for word2vec, text analysis, the translator, and the neural network that analyzes a person's character. Go handled information processing, sending requests to the microservices, caching, and load balancing.
Interest Content Analysis Unit

This unit processed user tweets in English (tweets in other languages were first translated into English by a translator service) and sent them to the Microsoft text analysis service to extract keywords.
After that, the extracted keywords were sent to the Word2Vec microservice. It computed the semantic distance from each word to the keywords of each interest category. The significant distances were summed, giving a vector whose i-th component characterizes how close the whole text is to the i-th interest category. This vector was normalized, after which it could be used to determine a person's interests and to search for people with similar vectors.
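The computation described above might look like the following sketch. Several things here are assumptions rather than facts from the article: cosine similarity stands in for the "semantic distance", the 0.5 threshold stands in for "significant", and the two-dimensional embeddings and category lists are toy data in place of a trained Word2Vec model.

```python
import math

# Hypothetical toy embeddings; a real system would query a trained Word2Vec model.
EMBEDDINGS = {
    "pan": (1.0, 0.0),
    "vegetables": (0.9, 0.1),
    "soup": (0.8, 0.2),
    "goal": (0.0, 1.0),
    "match": (0.1, 0.9),
    "football": (0.2, 0.8),
}
# Hypothetical interest categories, each described by a few keywords.
CATEGORIES = {"cooking": ["pan", "vegetables"], "sports": ["goal", "match"]}


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def interest_vector(keywords, threshold=0.5):
    """Sum the significant similarities per category, then L2-normalize.

    The i-th component corresponds to the i-th category in CATEGORIES.
    """
    raw = []
    for cat_words in CATEGORIES.values():
        total = 0.0
        for kw in keywords:
            if kw not in EMBEDDINGS:
                continue  # unknown words contribute nothing
            for cw in cat_words:
                sim = cosine(EMBEDDINGS[kw], EMBEDDINGS[cw])
                if sim >= threshold:
                    total += sim
        raw.append(total)
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw] if norm else raw
```

Because the vectors are normalized, two users can be compared with a plain dot product of their interest vectors, which is one simple way to search for people with similar interests.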
To determine the Big Five traits, we used a neural network based on Kosinski's method, with five outputs, one for the strength of each trait in the person whose text was submitted to the microservice.

Visual Studio sped up development thanks to the ability to quickly deploy new versions to the servers, and Azure cloud analytics helped us quickly find bottlenecks and monitor the servers.
Results
We are proud to have taken third place in the Russian final of Imagine Cup, but most of all we are pleased with the positive feedback we received from visitors to our stand.
A little about the team

Our goal is to help people find and develop their strengths so that they can realize themselves in the era of the Fourth Industrial Revolution. To this end, we are developing a system that analyzes people's profiles based on Michal Kosinski's method, using machine learning and big data analysis, followed by personalized training.
By the way, here you can see a mini-interview with the guys.
Imagine Cup 2018
Microsoft's largest international technology competition, in which you can compete for a prize of $100,000. To do this, you need to assemble a team of up to 3 people, come up with and implement a project idea in one of the categories (AI, Big Data, or Mixed Reality), and present it to us.
All the latest information can be found in the VKontakte group and in the Telegram channel.
Sign up!
If you are from Russia: aka.ms/ImagineCup2018_ru
If you are from Kazakhstan: aka.ms/ImagineCup2018_kz
If you are from Belarus: aka.ms/ImagineCup2018_by