
Statistics of LiveJournal entries, version 2

The previous study was rightly criticized for the poor representativeness of its source data, which the “Ufa photo” tag illustrated vividly. This time I have addressed the problem and increased the sample many times over.

Initial data


The study covered 10,000 user journals. The first hundred users in the rating were skipped; after that, two users were selected from each rating page. For each user, all entries back to 2006 were downloaded, except 18+ and friends-only (locked) posts. From each entry, the title, tags, text stripped of HTML markup, and the number of images and comments were extracted. The result was 1,777,308 entries.
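The per-entry extraction step above can be sketched with Python's standard `html.parser`. This is a minimal illustration, not the author's scraper: it assumes each entry's title, tags, HTML body and comment count have already been fetched, and the names `extract_features` and `EntryTextExtractor` are hypothetical.

```python
from html.parser import HTMLParser


class EntryTextExtractor(HTMLParser):
    """Collects the text content of an entry body and counts <img> tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.chunks = []
        self.image_count = 0

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> via the default
        # handle_startendtag implementation.
        if tag == "img":
            self.image_count += 1

    def handle_data(self, data):
        self.chunks.append(data)


def extract_features(title, tags, html_body, comment_count):
    """Hypothetical helper: turn one raw entry into the fields used in the charts."""
    parser = EntryTextExtractor()
    parser.feed(html_body)
    parser.close()  # flush any buffered trailing text
    text = "".join(parser.chunks)
    return {
        "title": title,
        "has_title": bool(title.strip()),
        "tags": list(tags),
        "text": text,
        "text_length": len(text),
        "image_count": parser.image_count,
        "comment_count": comment_count,
    }
```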

Enjoy watching!

Entries


Presence of a title
[Chart: Hastitle]
Title length in characters
[Chart: TitleLength]

Text length in characters
[Chart: TextLength]
As avenu suggested, LiveJournal is very similar to Twitter in terms of the most common post length in characters.
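One way to check which post length "prevails" is to bucket the lengths and take the most populated bucket. A minimal sketch; the 10-character bucket width and the name `modal_length_bucket` are assumptions, and the post does not say how its chart was binned.

```python
from collections import Counter


def modal_length_bucket(lengths, bucket_width=10):
    """Return (bucket_start, count) for the most common length bucket.

    Ties are broken in favour of the shorter bucket.
    """
    if not lengths:
        raise ValueError("no lengths given")
    buckets = Counter((n // bucket_width) * bucket_width for n in lengths)
    start, count = max(buckets.items(), key=lambda kv: (kv[1], -kv[0]))
    return start, count
```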

Entries by hour and day of the week
[Chart: Timeday]
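A chart like this boils down to counting entries per (weekday, hour) pair. A minimal sketch, assuming entry timestamps are already parsed into naive `datetime` objects in the time shown on the entry; `hour_weekday_counts` and `as_grid` are hypothetical names.

```python
from collections import Counter
from datetime import datetime


def hour_weekday_counts(timestamps):
    """Count entries per (weekday, hour) pair; weekday 0 is Monday."""
    return Counter((ts.weekday(), ts.hour) for ts in timestamps)


def as_grid(counts):
    """Turn the counter into a 7x24 grid: grid[weekday][hour]."""
    return [[counts.get((d, h), 0) for h in range(24)] for d in range(7)]


if __name__ == "__main__":
    sample = [datetime(2009, 10, 5, 14, 30), datetime(2009, 10, 10, 23, 59)]
    print(as_grid(hour_weekday_counts(sample))[0][14])
```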

Number of entries per user
[Chart: UserArticleCount]
Looks like a conspiracy. A cursory look at these journals revealed no pattern.
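The UPD at the end lists journals with 55 entries. A simple way to surface such an anomaly is to build the histogram of entries per user and flag counts that tower over their neighbours. The `factor` threshold and the function names below are hypothetical; this is a heuristic sketch, not the method used in the post.

```python
from collections import Counter


def entries_per_user(entry_authors):
    """Map each user name to the number of entries they wrote."""
    return Counter(entry_authors)


def spiky_counts(per_user, factor=3.0):
    """Return entry counts held by more than `factor` times as many users
    as the average of the two adjacent counts (treated as at least 1)."""
    hist = Counter(per_user.values())  # entry count -> number of users
    spikes = []
    for count, users in sorted(hist.items()):
        neighbours = (hist.get(count - 1, 0) + hist.get(count + 1, 0)) / 2
        if users > factor * max(neighbours, 1):
            spikes.append(count)
    return spikes
```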

Tags

[Chart: HasTags]

Tag length and number of tags
[Chart: Tags]

Popular tags
[Chart: TopTags]
pepelsbey, lytdybr is in there!
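Ranking popular tags is a frequency count over all entries' tag lists. A minimal sketch; lower-casing and stripping tags before counting is an assumption (it merges variants such as `lytdybr` and `LYTDYBR`), and `top_tags` is a hypothetical name.

```python
from collections import Counter


def top_tags(entries_tags, n=10):
    """Return the n most frequent tags as (tag, count) pairs.

    Tags are stripped and lower-cased before counting; empty tags are skipped.
    """
    counts = Counter(
        tag.strip().lower()
        for tags in entries_tags
        for tag in tags
        if tag.strip()
    )
    return counts.most_common(n)
```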

Comments

[Chart: Hascomments]

Number of comments per entry
[Chart: CommentsCount]

Text length vs. number of comments
[Chart: TextLengthComments]

Average text length vs. number of comments
[Chart: AvgTextLengthComments]

Conclusion


The larger sample noticeably changed only the list of popular tags; the other metrics were almost unaffected.

Thank you for your attention. I look forward to your suggestions, criticism, and comments.

UPD: Journals of users with 55 entries:
13whitemice, 55thairborngirl, a-mne-eshe, a-sebrov, akmych, al-re, ally-of-sunbeam, anton-platov, b0risl0dkin, bazil-t, beobachter, blog-knockknock, boriansky, brom-termit, catrin-flame, curious-ja, cybercool, da4, dj-nicks, djrediska, dr-bass, dugla, dyxlesska, echarri, ekateriana, eklery, ennochka, ermolaev-vlad, escaldo, estetika-nice, fabyla, father-kot, geyzer76, gizir, gonish, green-tiffani, gyqyv, hmixa, iliora, jazz-fun, jelka3, john-scar, k0mpas, karibus, krysia-i-basia, kushka, lagoun, lazutkina, light-tm, loony2004, love-spring, magnumx, makova547, malone-xbit, mariri, mashki, mia312, minorland, more-produkt, mozgovik, nankin, new-zebra, nikita-avanti, oksk, ovine, pastsimple, pavel-lv, peshi-eshe, poignant-art, pugachevsky, roketa, ryzha-sonya, samaposebe99, sank-a, saule-marsault, schattenphonix, seligoroff, sergik1977, servinn, sevavladimirov, shtefanesko, sklyankin, snow-cat, stas-y, steinboom, suhaverhi, svetik-sh, tamikori, tipo-femmina, tri-4tyre, turobei, uberlastung, ulianich, users, vale4ka-babo4ka, vernon-dimirest
I completely forgot that not all of each user's entries were retrieved, only those back to 2006, so bots seem rather unlikely. Most of these journals are active: they have entries from November and December (I collected the data in October). So it is unclear what this could be. Similar statistics should be collected for some other resource (Habr?).

Source: https://habr.com/ru/post/78942/

