
Little secrets of big graphs


If you are wondering what kind of knowledge can be extracted from large amounts of data, how big real graphs get, and what social graph analysis tasks Facebook, Twitter, and others have posed, then this article is for you.

We’ll look at three tasks in total. The first one is Facebook Positive Link Prediction. To download the data, you need to register at kaggle.com.

You are given a social graph with 1862220 vertices and 9437519 edges, plus 262588 test vertices, which is already reason enough to be frightened ;) This graph was obtained from the real one by deleting edges. The task: for each user in the test set, predict up to 10 other users whom they should follow.
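To make the task concrete, here is a minimal baseline sketch in Python. It is not the competition's reference solution or any participant's method, just a common "followees of my followees" heuristic. The edge list, the function names `build_follow_graph` and `recommend`, and the tie-breaking rule are all assumptions made for illustration; the real Kaggle file format is not reproduced here.

```python
from collections import Counter, defaultdict


def build_follow_graph(edges):
    """Map each user to the set of users they follow (directed edges)."""
    follows = defaultdict(set)
    for src, dst in edges:
        follows[src].add(dst)
    return follows


def recommend(follows, user, k=10):
    """Rank two-hop candidates by the number of paths user -> v -> candidate."""
    already = follows.get(user, set())
    scores = Counter()
    for v in already:
        for candidate in follows.get(v, ()):
            # Skip the user themselves and anyone they already follow.
            if candidate != user and candidate not in already:
                scores[candidate] += 1
    # Highest score first; ties are broken by user id to stay deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:k]]


# Toy graph (hypothetical data): an edge (a, b) means "a follows b".
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 1)]
follows = build_follow_graph(edges)
print(recommend(follows, 1))  # [4, 5]
```

User 1 follows 2 and 3; both of them follow 4, and only 3 follows 5, so 4 is ranked ahead of 5. On a graph with millions of vertices this heuristic is only a starting point, but it gives a score to compare smarter methods against.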

The competition was held under the motto “Show them your talent, not just your resume”. Facebook will try to hire the best participants.
Useful links:
1. cs.stanford.edu/people/jure
2. www.machinedlearnings.com/2012/06/thought-on-link-prediction.html
The next task is called Community Detection and, as the name suggests, is devoted to the problem of identifying communities on Twitter. Read the proceedings of the 19th World Wide Web conference and download the Twitter social graph here . As is often the case, the English Wikipedia will help you get familiar with the topic: en.wikipedia.org/wiki/Community_structure . If you are feeling more determined than ever, you can turn to a more substantial source, for example, this one .
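One common way to judge a candidate split into communities is Newman's modularity, which compares the share of edges inside each community with what a random graph with the same degrees would give. The sketch below is an illustration only and is not taken from the conference materials mentioned above. It assumes a simple undirected graph with each edge listed once (the Twitter graph is directed, so this is a simplification), and the names `modularity` and `community_of` are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q = sum over communities c of (L_c / m - (d_c / 2m)^2).

    m   = total number of edges,
    L_c = number of edges with both endpoints in community c,
    d_c = sum of the degrees of the vertices in community c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")
    internal_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] += 1  # each edge adds one to the degree of each endpoint
        degree_sum[cb] += 1
        if ca == cb:
            internal_edges[ca] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph (hypothetical data): two triangles joined by the single edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {node: "A" for node in two_groups}

print(modularity(edges, two_groups))  # about 0.357 (exactly 5/14)
print(modularity(edges, one_group))   # 0.0
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which matches intuition. Many community detection methods search for a partition that maximizes this quantity.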

For those curious about where the wind is blowing from, the last task is Cascade Analysis. You can get acquainted with models of information confrontation in the media by reading the article by Yang and Leskovec ; its full bibliography will help you find answers to many questions. Data for experiments: snap.stanford.edu/data/memetracker9.html and snap.stanford.edu/data/bigdata/twitter7 .
memetracker.org/quotes-kdd09.pdf is an invaluable link for those who like to model information battles.
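To get a feel for what a cascade is before opening the data, here is a small simulation sketch. It uses the textbook independent cascade model, which is an assumption chosen for illustration and is not the model from the Yang and Leskovec article. The function name `independent_cascade`, the single shared probability `p`, and the toy edges are all hypothetical.

```python
import random
from collections import defaultdict, deque


def independent_cascade(edges, seeds, p, rng=None):
    """Simulate one run of the independent cascade model.

    Every newly activated node gets exactly one chance to activate each
    of its inactive out-neighbors, succeeding with probability p.
    Returns the set of all nodes that ended up active.
    """
    rng = rng or random.Random()
    out_neighbors = defaultdict(list)
    for src, dst in edges:
        out_neighbors[src].append(dst)

    active = set(seeds)
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in out_neighbors.get(node, ()):
            if nxt not in active and rng.random() < p:
                active.add(nxt)
                queue.append(nxt)
    return active


# Toy graph (hypothetical data): an edge (a, b) means information can flow from a to b.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "a")]

# With p = 1.0 the cascade reaches everything reachable from the seed.
print(sorted(independent_cascade(edges, ["a"], p=1.0)))  # ['a', 'b', 'c', 'd']
# With p = 0.0 nothing spreads beyond the seed.
print(sorted(independent_cascade(edges, ["a"], p=0.0)))  # ['a']
```

For intermediate values of `p` the result is random, so in practice one would average the cascade size over many runs, passing a seeded `random.Random` when the runs need to be reproducible.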

If you decide to take on one of the proposed tasks or a similar one, it is an excellent opportunity to write a paper or prepare a poster (depending on your goals and the results achieved) and submit it to “Graphs theory and application” at CSEDays'12 .
Good luck, and may your methods converge quickly! :)
Resources:
// Student reports
1. www.stanford.edu/class/cs224w/proj/jbank_Finalwriteup_v1.pdf
2. www.stanford.edu/class/cs224w/proj/jieyang_Finalwriteup_v3.pdf
// Data sets, publications, libraries for data analysis in C++, visualization
3. snap.stanford.edu
4. odysseas.calit2.uci.edu/doku.php/public:online_social_networks
5. law.di.unimi.it/datasets.php
6. rise4fun.com/agl
// Jure Leskovec
7. cs.stanford.edu/people/jure

Source: https://habr.com/ru/post/148162/

