Test the theory of six handshakes

I want to tell you about my experiment in testing the Theory of Six Handshakes . I was inspired to write this material by the article “Analysis of VK Friendship Relations with Python” (to avoid repetitions, I will refer to it later). As a whole, the task was set differently by me, and the methods used are also different, I decided that it might be interesting.

Task formulation: visualize all connections between two users within one social network. In this connection should not be duplicated, for example, if Vanya knows Petya through Olya, then Olya does not participate in further iterations in finding common friends. To practice the API, I chose “Vkontakte”.
')
Based on the limitations of the API and the functionality of the methods, it was decided that the optimal number of "handshakes" from the point of view of the time of receiving information will be 3. So we will still check the "Theory of Three Handshakes", for now. Thus, with an average of 200 friends, we get a sample of 8 million people. For example, on the scale of Ukraine, I almost always found connections. Structurally, the task can be divided into the following stages:

Search for common friends between the source user 1 (user_1) and the source user 2 (user_2).
Search for common friends between user_2 and friends user_1.
Find common friends between user_2 friends and user_1 friends.
Getting detailed information about the found links.
Visualization.

So what we need:

import requests import time from threading import Thread from tokens import *

Requests - a common HTTP library for Python, described in the article "Library to simplify HTTP requests . "
Time is a basic module whose name speaks for itself. We will use to introduce delays in time.
Threading - the basic module for working with threads. Well described in the article "Learning to write multi-threaded and multiprocess applications in Python . "
Tokens - the tokens.py file will contain OAuth tokens for authorization in the API. How to get the token is described in the source article , as well as on the API page "Vkontakte" .

Before proceeding with the first stage, I’ll focus on the API functionality and some limitations:

To call the API method, use a POST or GET request.
The list of methods I used: users.get , friends.get , friends.getMutual , execute .
The execute method allows you to run up to 25 methods in a single query.
Per second, no more than 3 requests can be made (using one token).
The limit for the target_uids parameter of the friends.getMutual method is 300. I ’ll discuss this in more detail below.

Thus, globally, the scheme is reduced to sending GET requests to the Vkontakte server and analyzing responses from the server in json format. At the same time, we use the execute method and multithreading to optimize time.

Remark to the original article that inspired me. The author of the article STLEON uses the friends.getMutual method in one-to-one mode, using the target_uid parameter. I believe this was due to the lack of the target_uids parameter in the previous version of the API. I use this method in the “one to many” mode, which significantly saves time. The target_uids parameter has a limit on the length of the string, about which I did not find anything in the documentation. It was established experimentally that the maximum length is about 310-330 UID, depending on the length of each identifier. I rounded this figure to 300.

All the above is summarized by declaring the following constants:

 f_1_max = 300 f_2_max = 24 t = 0.35

Why f_2_max = 24, not 25, will be clear later.

Stage 1. Search for common friends between user_1 and user_2

Let us write the function with the help of which we will communicate with the server "Vkontakte" by means of a GET request:

 def vk (method, parameters, token): return requests.get('https://api.vk.com/method/%s?%s&access_token=%s' % (method, '&'.join(parameters), token)).json()

This function has three arguments:

method is the name of the method we are accessing via the API.
parameters are the parameters of this method (can be found in the description of each method).
token is a string that authorizes you on the server. I repeat that getting a token is described in detail here and here .

Further, we will use sets to save all the collected information. We initialize the sets for each of the three “handshakes”.

 edges_1, edges_2, edges_3 = set(), set(), set()

To fulfill the condition so that the links are not duplicated and Olya does not appear as a common friend of Petit and Vanya in all three “handshakes”, but only in the first, filters must be entered. Immediately add to the filter the first “handshake” of the original users.

 filter_1, filter_2 = set(), set() filter_1.update([user_1, user_2])

Find the friends of user_1 by calling the friends.get method. After completing the call to the API method, we introduce the necessary delay in time t = 0.35. Note that one of the parameters is the API version (v = 5.4 in my case). It is very important to indicate it everywhere, because inconsistencies may appear. The parameters of the order and count methods are optional.

 friends_1 = set(vk('friends.get', ['user_id=%s' % user_1, 'order=hints', 'count=900', 'v=5.4'], token_1)['response']['items']) time.sleep(t)

Next, we proceed directly to finding common friends between user_1 and user_2 by calling the friends.getMutual method.

 mutual_friends = vk('friends.getMutual', ['source_uid=%s' % user_1, 'order=hints', 'target_uid=%s' % user_2, 'v=5.4'], token_1)['response'] time.sleep(t)

And the last point of the first stage is saving information to multiple edges_1, updating filt_1 and removing found common friends from the user_1 friends list to avoid repetitions in the future.

 for user in mutual_friends: edges_1.update([(user_1, user), (user, user_2)]) friends_1.remove(user) filter_1.update([user])

Stage 2. Search for mutual friends between user_2 and friends user_1 (friends_1)

Globally, the second stage repeats the first, the only difference is that instead of searching for common friends in the one-to-one mode, we use the one-to-many mode, which requires a few extra lines of code.
We initialize the list in which we will save the friends we got, as well as some variables that we will need in intermediate calculations.

 user_1_mutual_friends, temp_users, j = [], [], 0

Next, counting the portions (not the most appropriate word) from the user_1 friends by 300 UID, we alternately send requests to the server about common friends between user_2 and the UID portion, which are written to the target_uids parameter of the friends.getMutual method.

 for i, friend in enumerate(friends_1): temp_users += [friend] j += 1 if j == f_1_max: user_1_mutual_friends += vk('friends.getMutual', ['source_uid=%s' % user_2, 'order=hints', 'target_uids=%s' % str(temp_users)[1:-1], 'v=5.4'], token_1)['response'] temp_users, j = [], 0 time.sleep(t) if i == len(friends_1) - 1 and len(friends_1) % f_1_max != 0: user_1_mutual_friends += vk('friends.getMutual', ['source_uid=%s' % user_2, 'order=hints', 'target_uids=%s' % str(temp_users)[1:-1], 'v=5.4'], token_1)['response'] time.sleep(t)

We save the received information in a set of edges_2 and update the information in the filter, as it was in the previous stage. There may be exceptions, for example, if the UID has closed access to shared friends or the user’s page is deleted, so we use the try-except construction.

 for friend in user_1_mutual_friends: if friend['id'] != user_2 and friend['id'] not in filter_1: try: if friend['common_count'] > 0: for common_friend in friend['common_friends']: if common_friend != user_1 and common_friend not in filter_1: edges_2.update([(user_1, friend['id']), (friend['id'], common_friend), (common_friend, user_2)]) friends_1.remove(friend['id']) filter_2.update([friend['id'], common_friend]) except: continue

Stage 3. Search for common friends between friends user_2 and friends user_1

This stage is the most time-consuming, since you need to send a lot of requests. It is here that it is impossible to do without using the execute method. From practice I will say that without the use of multithreading, the time to perform this stage according to this algorithm is 50 - 120 seconds, and in some cases even more. Using multiple threads, it is possible to reduce the time to the execution of a single execute request, which is processed from 5 to 12 seconds.

We declare filter_3, combining the sets filter_1 and filter_2. We transform the set of friends of user_1 (friends_1) into a list.

 filter_3 = filter_1.union(filter_2) friends_1 = list(friends_1)

This is followed by a monstrous block of code, in which we declare a function for finding common friends between friends user_1 and friends user_2 and saving information in multiple edges_3. Here again, the entire algorithm is the same as in the previous stages, only the “many-to-many” principle is used, which further complicates the code, especially in my implementation it is clearly redundant, so you have something to work on. Below I will give some explanations for this multi-letter.

 def get_edges_3 (friends_1, token): prefix_code = 'code=var friends = API.friends.get({"v": "5.4", "user_id":"%s", "count":"500", "order": "hints"}).items; ' % user_2 lines, j, k = [], 0, -1 for i, friend in enumerate(friends_1): lines += ['API.friends.getMutual({"v": "5.4", "source_uid": "%s", "count":"500", "target_uids": friends})' % friend] # Generating string for 'execute' request. j += 1 if j == f_2_max: code = prefix_code + 'return [' + ','.join(str(x) for x in lines) + '];' response = vk('execute', [code, 'v=5.4'], token_1) for friends in response['response']: k += 1 if len(edges_3) < max_edges_3: try: for one_friend in friends: if one_friend['common_count'] > 0: for common_friend in one_friend['common_friends']: if common_friend not in filter_3 and one_friend['id'] not in filter_3: edges_3.update([(user_1, friends_1[k]), (friends_1[k], common_friend), (common_friend, one_friend['id']), (one_friend['id'], user_2)]) except: continue lines, j = [], 0 time.sleep(t) if i == len(friends_1) - 1 and len(friends_1) % f_2_max != 0 : code = prefix_code + 'return [' + ','.join(str(x) for x in lines) + '];' response = vk('execute', [code, 'v=5.4'], token_1) for friends in response['response']: k += 1 if len(edges_3) < max_edges_3: try: for one_friend in friends: if one_friend['common_count'] > 0: for common_friend in one_friend['common_friends']: if common_friend not in filter_3 and one_friend['id'] not in filter_3: edges_3.update([(user_1, friends_1[k]), (friends_1[k], common_friend), (common_friend, one_friend['id']), (one_friend['id'], user_2)]) except: continue time.sleep(t)

The sum of the lines prefix_code and lines is VKScript code and is the only parameter for the execute method. This script contains 25 calls to API methods.

prefix_code is the part of the line containing reference number 1 to the friends.get method. Here we get the list of friends for user_2 and assign it to the variable friends.

lines - the second part of the line, containing calls №№ 2-25 to the method of friends.getMutual . Here we get a list of mutual friends between each of the 24 friends of user_1 and the list of friends of user_2. In the loop, we add prefix_code and 24 lines of lines, thus obtaining the code string, which we use as a parameter to the execute method.

Next, I will give an example using multiple threads, but I will not dwell on it in detail. All information can be found in the article "Learning to write multi-threaded and multiprocess applications in Python . "

 t1 = Thread(target=get_edges_3, args=(friends_1[ : len(friends_1) * 1/3], token_1)) t2 = Thread(target=get_edges_3, args=(friends_1[len(friends_1) * 1/3 : len(friends_1) * 2/3], token_2)) t3 = Thread(target=get_edges_3, args=(friends_1[len(friends_1) * 2/3 : ], token_3)) t1.start() t2.start() t3.start() t1.join() t2.join() t3.join()

Stage 4. Obtaining detailed information about the found links

Now we must lay down all the edges of our still unbuilt graph of friends and extract from them a list of vertices. Then, using the users.get method described above, in portions of 300 UID, we send requests for obtaining data on the last and first names of users. At the output we get a list, in each cell of which there will be a UID and a dictionary with information about this UID. These data, in combination with sets of edges, are later used for visualization.

 edges = list(edges_1) + list(edges_2) + list(edges_3) nodes = [] for edge in edges: nodes += [edge[0], edge[1]] nodes = list(set(nodes)) nodes_info, temp_nodes, j = [], [], 0 for i, node in enumerate(nodes): temp_nodes += [node] j += 1 if j == f_1_max: nodes_info += vk('users.get', ['user_ids=%s' % str(temp_nodes)[1:-1], 'fields=first_name, last_name', 'v=5.4'], token_1)['response'] temp_nodes, j = [], 0 time.sleep(t) if i == len(nodes) - 1 and len(nodes) % f_1_max != 0: nodes_info += vk('users.get', ['user_ids=%s' % str(temp_nodes)[1:-1], 'fields=first_name, last_name', 'v=5.4'], token_1)['response'] time.sleep(t) for i, node in enumerate(nodes_info): try: nodes[i] = (nodes[i], {'first_name': node['first_name'], 'last_name': node['last_name']}) except: continue

Stage 5. Visualization

I will not dwell on the technical implementation of this stage. I will describe only briefly my experience.

As in the original article, I tried to use the networkx library to build a graph. I changed the diameter and color of the vertices depending on the gender or the number of connections, tried many visualization methods that are available in this library, but I did not like the result. A random graph was not informative, with an average and a large number of edges and vertices. Information was lost.

I came to the conclusion that some kind of interactive solution is needed. The first thing I found was the D3.js library . But here, in the format of a regular graph, despite the interactivity, the result was unsatisfactory. Then, in the same library, an example of the tree structure “Radial Reingold – Tilford Tree” was found , which seemed to me suitable. With this construction, user_1 is in the center, and user_2 is at the edge of each branch of the tree.

I modeled the whole bundle using the CherryPy web framework and was satisfied with the result, although I had to enter all the same restrictions on the displayed data (depending on the type and number of links found). I deliberately omitted the preparation of data for visualization, as this procedure is of no interest and differs depending on the method chosen. My version of the code is available in the GitHub repository , which also describes the preparation of data for use with the D3.js library using the example of the “Radial Reingold – Tilford Tree” template.

It would also be interesting to display the relationship between the list of friends in this way (see figure below), so you can experiment. This example is also taken from D3.js and it is called D3 Dependencies .

As for the verification of the theory, in the scale of Ukraine the scheme with three handshakes works in 90% of cases. Exceptions are users with a very small number of friends.

Thanks for attention.

Source: https://habr.com/ru/post/273191/

All Articles