
LiveJournal Top1000 Statistics

What is the blogosphere today? You may disagree, but in my opinion 80% of what people mean by the word “blogosphere” in the Runet lives on LiveJournal. Yes, Yandex indexes a large number of blog sites; there are also LiveInternet, diary.ru, the blogs on mail.ru, and much more. But try to remember the last time you read something interesting and worthy of attention on a LiveInternet blog. Or anything at all on the mail.ru blogs?

It is well known that LiveJournal is ruled by the “thousanders”, bloggers with a thousand or more readers (and, more recently, by the “ten-thousanders”).
Let's take a closer look: who are they, the top bloggers of the Runet?

In a hurry I threw together a robot that visited the profiles of the first thousand bloggers in the LiveJournal rating, ranked by the “friend of” count. There is also the so-called Yandex credibility rating, but let's not talk about sad things today.
The robot collected the profile data and carefully stacked it in one pile. It was written in C#. I will not bore you with unnecessary technical details; everything is simple and straightforward: go to a page, parse it for the required fields, save them, move on to the next page.
And so on, 1000 times.

Here is the function that takes a page URL as input and returns the page's HTML as a string. The result can then be parsed with the usual string functions or with regular expressions.

using System;
using System.IO;
using System.Net;
using System.Text;

// Downloads the page at strURL and returns its HTML as a string
// (an empty string on any failure).
private string GetPageByURL(string strURL)
{
    try
    {
        // prepare the web page we will be asking for
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(strURL);

        // execute the request; 'using' releases the connection when we are done
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        // we will read data via the response stream
        using (Stream resStream = response.GetResponseStream())
        // decode the whole stream as UTF-8; unlike decoding each 8 KB chunk
        // separately, a reader never splits a multi-byte character in two
        using (StreamReader reader = new StreamReader(resStream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
    catch (Exception)
    {
        // any network or decoding error yields an empty page
        return "";
    }
}




Now we loop through the pages:
www.livejournal.com/ratings/users/?page=1
...
www.livejournal.com/ratings/users/?page=50

We fetch them with the function above, then walk through each one as a string and collect the usernames and their “friend of” counts into an ArrayList.
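The page loop and the parsing step could be sketched roughly like this. It is only an illustration: the `http://` scheme, the class and method names, and above all the row markup matched by `RowPattern` are my assumptions, not the real LiveJournal HTML.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class RatingPages
{
    // Builds the rating-page URLs for pages 1..pageCount.
    // ASSUMPTION: the "http://" scheme is added here for illustration.
    public static List<string> BuildPageUrls(int pageCount)
    {
        var urls = new List<string>();
        for (int page = 1; page <= pageCount; page++)
        {
            urls.Add("http://www.livejournal.com/ratings/users/?page=" + page);
        }
        return urls;
    }

    // ASSUMPTION: hypothetical row markup, NOT the real LiveJournal HTML:
    //   <a class="user" href="...">drugoi</a> ... <td class="friendof">69145</td>
    private static readonly Regex RowPattern = new Regex(
        @"<a class=""user""[^>]*>(?<user>[A-Za-z0-9_]+)</a>.*?<td class=""friendof"">(?<count>\d[\d,]*)</td>",
        RegexOptions.Singleline);

    // Returns (username, "friend of" count) pairs found in one page of HTML.
    public static List<KeyValuePair<string, int>> ParseRatingPage(string html)
    {
        var result = new List<KeyValuePair<string, int>>();
        foreach (Match m in RowPattern.Matches(html))
        {
            // Strip thousands separators such as "69,145" before parsing.
            string digits = m.Groups["count"].Value.Replace(",", "");
            int count = int.Parse(digits, CultureInfo.InvariantCulture);
            result.Add(new KeyValuePair<string, int>(m.Groups["user"].Value, count));
        }
        return result;
    }

    public static void Main()
    {
        // Two rows in the hypothetical markup described above.
        string sample =
            "<tr><td><a class=\"user\" href=\"#\">drugoi</a></td><td class=\"friendof\">69145</td></tr>" +
            "<tr><td><a class=\"user\" href=\"#\">tema</a></td><td class=\"friendof\">68601</td></tr>";
        foreach (var row in ParseRatingPage(sample))
        {
            Console.WriteLine(row.Key + " " + row.Value);
        }
    }
}
```

In a real run the pattern would have to be adapted to whatever the rating page actually contains, and each URL from `BuildPageUrls(50)` would be passed to the download function.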

That gives us a list of 1000 people. We then go through it in a loop, visit each page http://[username].livejournal.com/profile, and parse it for the remaining fields.
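Parsing a profile with plain string functions might look something like the sketch below. The helper name `ExtractBetween` and the sample markup are made up for illustration; only the profile URL pattern comes from the text above.

```csharp
using System;

public static class ProfilePages
{
    // Profile URL in the form http://[username].livejournal.com/profile.
    public static string ProfileUrl(string username)
    {
        return "http://" + username + ".livejournal.com/profile";
    }

    // Returns the trimmed text between startMarker and the next endMarker,
    // or null when either marker is missing.
    public static string ExtractBetween(string text, string startMarker, string endMarker)
    {
        int start = text.IndexOf(startMarker, StringComparison.Ordinal);
        if (start < 0) return null;
        start += startMarker.Length;
        int end = text.IndexOf(endMarker, start, StringComparison.Ordinal);
        if (end < 0) return null;
        return text.Substring(start, end - start).Trim();
    }

    public static void Main()
    {
        // ASSUMPTION: hypothetical markup, not the real profile HTML.
        string html = "<b>Created on:</b> 2002-03-02<br><b>Journal entries:</b> 13,188<br>";
        Console.WriteLine(ProfileUrl("drugoi"));
        Console.WriteLine(ExtractBetween(html, "<b>Created on:</b>", "<br>"));
        Console.WriteLine(ExtractBetween(html, "<b>Journal entries:</b>", "<br>"));
    }
}
```

Returning null for a missing marker matters here, because many profiles simply leave fields such as city or region empty.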

After that we write everything to a database or a file, or simply dump it onto a page and copy-paste it from there into Excel by hand.
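For the "file" option, a CSV that Excel opens directly is probably the least effort. Here is a minimal sketch; the class name, the column choice, and the file name `top1000.csv` are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CsvExport
{
    // Quotes a field when it contains a comma, a quote, or a line break,
    // doubling any embedded quotes (the usual CSV convention).
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Joins the escaped fields of one record into a single CSV line.
    public static string ToCsvLine(IEnumerable<string> fields)
    {
        var escaped = new List<string>();
        foreach (string f in fields) escaped.Add(Escape(f));
        return string.Join(",", escaped);
    }

    public static void Main()
    {
        var lines = new List<string>
        {
            ToCsvLine(new[] { "User", "Friend of", "Journal entries" }),
            ToCsvLine(new[] { "drugoi", "69145", "13,188" })
        };
        // ASSUMPTION: "top1000.csv" is a made-up file name.
        File.WriteAllLines("top1000.csv", lines, Encoding.UTF8);
    }
}
```

The quoting is the important part: values such as "13,188" contain a comma and would otherwise spill into the next column.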

And so that LiveJournal does not take offense at my robot, I put a significant delay between requests; they warn you very strictly that if you come to them with your robots and do not wipe your feet, you will be banned. As a result the whole process took more than a day: writing the robot, testing, running it, formatting the results. I agree, in PHP it could have been done in a screen and a half of code and two hours all told, but .NET is more familiar to me.
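The delay itself needs nothing more than a sleep between calls. In the sketch below the two-second pause is an arbitrary illustrative value (I am not stating what LiveJournal requires), the names are invented, and a stub lambda stands in for the real download function.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PoliteCrawler
{
    // Calls fetch for every URL in order, sleeping for 'delay' between
    // consecutive calls (there is no pause before the first one).
    public static List<string> FetchAll(IList<string> urls, Func<string, string> fetch, TimeSpan delay)
    {
        var pages = new List<string>();
        for (int i = 0; i < urls.Count; i++)
        {
            if (i > 0) Thread.Sleep(delay);
            pages.Add(fetch(urls[i]));
        }
        return pages;
    }

    public static void Main()
    {
        var urls = new List<string>
        {
            "http://drugoi.livejournal.com/profile",
            "http://tema.livejournal.com/profile"
        };
        // ASSUMPTION: the 2-second delay is arbitrary, and the lambda is a
        // stub standing in for the real page download.
        var pages = FetchAll(urls, url => "<html>" + url + "</html>", TimeSpan.FromSeconds(2));
        Console.WriteLine(pages.Count);
    }
}
```

Passing the download function in as a parameter keeps the throttling logic testable without touching the network.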

The result was this table.


User | Friend of | Friends | City / Region / Country | Journal entries | Total comments | Created on | Last updated | Account type
drugoi | 69145 | 749 | Moscow, Norway | 13,188 | 1,698,002 comments received, 66,105 comments posted | 2002-03-02 | 1 hour ago | Permanent Account
tema | 68601 | 24 | South Palmyra, Russian Federation | 3,638 | 2,049,489 comments received, 6,880 comments posted | 2001-09-04 | 4 hours ago | Permanent Account
navalny | 52840 | 10,000 | Moscow, Moscow, Russian Federation | 2,306 | 957,191 comments received, 14,365 comments posted | 2006-04-19 | 3 hours ago | Paid Account
sergeydolya | 51964 | 1991870243,261 comments received, 28,394 comments posted | 2007-11-09 | 1 day ago | Permanent Account
pesen_net | 48525 | 202 | Riga, Russian Federation | 18753,083 comments received, 10,084 comments posted | 2007-04-22 | 6 weeks ago | Paid Account
zyalt | 35617 | 384 | Moscow, Moscow, Russian Federation | 1,619 | 246,360 comments received, 11,344 comments posted | 2006-07-26 | 22 hours ago | Paid Account
dolboeb | 33820 | 1942 | Moscow, Russian Federation | 8,335 | 522,484 comments received, 38,400 comments posted | 2001-02-06 | 58 minutes ago | Permanent Account
belonika | 33151 | 4604781208,475 comments received, 36,079 comments posted | 2008-09-08 | 6 hours ago | Paid Account
eprst2000 | 31454 | 11 | Moscow time, Moscow, Russian Federation | 46046,324 comments received, 3,724 comments posted | 2002-08-22 | 1 week ago | Paid Account
tebe_interesno | 29831 | 612 | Moscow, Moscow, Russian Federation | 54731,679 comments received, 8,823 comments posted | 2007-06-25 | 10 weeks ago | Paid Account
mi3ch | 29827 | 738 | Moscow, Moscow, Russian Federation | 6,930 | 374,776 comments received, 44,883 comments posted | 2003-04-03 | 2 hours ago | Permanent Account
shpilenok | 29637 | 119 | Bryansk region, Russian Federation | 30357,348 comments received, 4,461 comments posted | 2009-01-11 | 6 hours ago | Paid Account
zhgun | 26081 | 2918822,301 comments received, 8,626 comments posted | 2002-04-28 | 5 weeks ago | Paid Account
mantrabox | 25572 | 373 | Russian Federation | 2,915 | 60,720 comments received, 17,850 comments posted | 2002-12-29 | 1 week ago | Paid Account
olegtinkov | 25291 | 11 | Moscow, Russian Federation | 638137,481 comments received, 6,277 comments posted | 2009-02-21 | 18 hours ago | Paid Account
radulova | 24682 | 595 | Moscow, Russian Federation | 8,622 | 874,385 comments received, 31,657 comments posted | 2004-11-14 | 1 hour ago | Paid Account
tanyant | 24282 | 19931867,802 comments received, 6,868 comments posted | 2007-12-14 | 2 weeks ago | Plus Account
stillavin | 23615 | 1703 | Moscow, Moscow, Russian Federation | 1,299 | 311,283 comments received, 18,247 comments posted | 2006-08-23 | 3 days ago | Paid Account
mzadornov | 22568 | 80 | Moscow, Russian Federation | 16162,221 comments received, 136 comments posted | 2009-09-15 | 3 days ago | Plus Account
miumau | 21495 | 47 | Berlin, Germany | 2,957 | 163,632 comments received, 13,520 comments posted | 2002-02-27 | 1 hour ago | Paid Account


...
The full table did not fit into a Habr post, either in height or in width, but the complete file with 1000 entries is in Google Docs. The data is current as of today, July 21, 2011, and is unlikely to change significantly over the next couple of months, or even half a year.

I could not resist building a couple of charts myself, although everyone is free to use this data as they see fit.

Even simply sorting the columns up and down reveals interesting details.

For example, sorting the rows by the number of friends, we find that the user with the most friends is not navalny, who has 10,000 of them (although the limit for mere mortals on LiveJournal is 5,000 friends), but a certain inexi, who has 20,624.

Or, say, we sort by the number of blog entries. The one who has churned out the most is, of course, cypa; who else? Since 2003 he has made 43,390 entries.

And with the reverse sort we immediately find a curious bot: blog_d_medvedev. Since it was created in 2009, this pseudo-blogger has not made a single entry in its journal, yet 5,816 people have added it as a friend. Obviously some kind of robot, apparently just a toy in the wrong hands. Surely it did not go without cheating: friend marathons, rating inflation, vote rigging, and that is all there is to it.

Continuing to sort, we learn that the oldest blog in the top 1000 was created on March 31, 2000, and the youngest three months ago, in April of this year.
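If you would rather do these sorts in code than in a spreadsheet, a few LINQ queries are enough. The sketch below is only an illustration: the `Blogger` class and its field names are my invention, and the sample rows are made-up placeholders rather than figures from the file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Blogger
{
    public string User;
    public int Friends;
    public int Entries;
    public DateTime CreatedOn;
}

public static class TopStats
{
    // Each helper sorts by one column and takes the first row,
    // just like sorting a spreadsheet column and reading the top cell.
    public static Blogger MostFriends(IEnumerable<Blogger> all)
    {
        return all.OrderByDescending(b => b.Friends).First();
    }

    public static Blogger FewestEntries(IEnumerable<Blogger> all)
    {
        return all.OrderBy(b => b.Entries).First();
    }

    public static Blogger Oldest(IEnumerable<Blogger> all)
    {
        return all.OrderBy(b => b.CreatedOn).First();
    }

    public static Blogger Youngest(IEnumerable<Blogger> all)
    {
        return all.OrderByDescending(b => b.CreatedOn).First();
    }

    public static void Main()
    {
        // ASSUMPTION: placeholder rows, not real data from the file.
        var sample = new List<Blogger>
        {
            new Blogger { User = "user_a", Friends = 120, Entries = 900, CreatedOn = new DateTime(2003, 5, 1) },
            new Blogger { User = "user_b", Friends = 4000, Entries = 0, CreatedOn = new DateTime(2009, 1, 15) },
            new Blogger { User = "user_c", Friends = 35, Entries = 5200, CreatedOn = new DateTime(2001, 11, 30) }
        };
        Console.WriteLine("Most friends: " + MostFriends(sample).User);
        Console.WriteLine("Fewest entries: " + FewestEntries(sample).User);
        Console.WriteLine("Oldest: " + Oldest(sample).User);
        Console.WriteLine("Youngest: " + Youngest(sample).User);
    }
}
```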

The top also contains 139 Basic accounts, 560 Paid accounts, 15 Permanent accounts, 284 Plus accounts, and one Early Adopter (and who is that, by the way? billycorgan: what is he doing in the Russian top if he lives in the USA and writes in English?).

It turns out there are not that many paid accounts in the first 1000: just over half of the total.
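The same tally can be produced with a `GroupBy`, and the "just over half" is simple arithmetic: 560 paid accounts out of roughly 1000 is about 56%. A small sketch follows; the class and method names are assumptions, and the five sample values are the account types of the first five rows of the table above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AccountTypes
{
    // Counts how many times each account type occurs.
    public static Dictionary<string, int> CountByType(IEnumerable<string> accountTypes)
    {
        return accountTypes.GroupBy(t => t).ToDictionary(g => g.Key, g => g.Count());
    }

    // Share of 'part' in 'total', in percent.
    public static double SharePercent(int part, int total)
    {
        return 100.0 * part / total;
    }

    public static void Main()
    {
        // Account types of the first five rows of the table.
        var firstFive = new[]
        {
            "Permanent Account", "Permanent Account", "Paid Account",
            "Permanent Account", "Paid Account"
        };
        foreach (var pair in CountByType(firstFive))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
        Console.WriteLine(SharePercent(560, 1000));
    }
}
```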




Or, for example, a breakdown by country:


In short, there is plenty of work here for analysts, statisticians, promotion specialists of every kind, and other curious idlers.

At first I thought of making this an online, constantly updated service, but then I decided that 1000 daily requests (in fact, even more) to the LiveJournal servers would not earn my robot a pat on the head. So I limited myself to one-off statistics.

You are welcome to distribute the statistics file; no copyright restrictions apply to it.

UPD: I would be grateful if someone could tell me how to let any user sort the columns in Google Docs without letting them change the results, i.e. distort the data itself.
In any case, you can save the file to your computer via the File > Download As > Excel menu and sort it however you like at home in Microsoft Office.

Source: https://habr.com/ru/post/124677/

