
Hub Posts Rating


Hi, Habr!

I wanted to look at the best posts of my favorite hub and was horrified to discover that there is no such feature.

Since that feature has now appeared, and I would rather not move this post to drafts, the rating of the most commented and most favorited articles is given below. I have also added a few words about how the script is implemented.


The script is written in Python using the excellent grab library, or rather its Spider module. Thanks to its author (and fellow Habr user) itforge for the detailed documentation.
Sqlite3 was used as the database. Only the hub listing pages were parsed; the spider did not open the posts themselves, which helped the speed of the script: information about 111690 posts was collected in less than an hour, and that is in a single thread.
Code
from grab.spider import Spider, Task
import sqlite3 as lite


class HabraParser(Spider):
    # Start from the list of all hubs
    initial_urls = ['http://habrahabr.ru/hubs/']

    def prepare(self):
        # Buffer for the posts of the hub being parsed, plus the DB connection
        self.post = []
        self.con = lite.connect('files/habra_hubs.db')

    def task_initial(self, grab, task):
        # Link to the next page of the hub list, if there is one
        nav = grab.doc.select('//ul[@class="next-prev"]/li/a[@class="next"]')
        # Queue a 'hub' task for every hub on the current page
        for elem in grab.doc.select('//div[@class="info"]/div[@class="stat"]/a[2]'):
            self.add_task(Task(name='hub', url=elem.attr('href')))
        # Move on to the next page of the hub list
        if nav.exists():
            self.add_task(Task(name='initial', url=nav.attr('href')))

    def task_hub(self, grab, task):
        nav = grab.doc.select('//a[@class="next" and @id="next_page"]')
        # Walk over the posts on the page
        for elem in grab.doc.select('//div[@class="posts shortcuts_items"]/div'):
            # Skip placeholders for removed posts
            if elem.attr('class') == 'ufo-was-here':
                continue
            post_url = elem.node.find('h1[@class="title"]/a').get('href')
            post_title = elem.node.find('h1[@class="title"]/a').text
            # A counter that is missing or not a number is treated as 0
            try:
                comments = int(elem.node.find('.//span[@class="all"]').text)
            except Exception:
                comments = 0
            try:
                score = int(elem.node.find('.//span[@class="score"]').text)
            except Exception:
                score = 0
            try:
                favs = int(elem.node.find('.//div[@class="favs_count"]').text)
            except Exception:
                favs = 0
            self.post.append([score, comments, favs, post_url, post_title])
        if nav.exists():
            self.add_task(Task(name='hub', url=nav.attr('href')))
        else:
            # Last page of the hub: the hub name is the fifth '/'-separated part of the URL
            hub = task.url.split('/')[4]
            # Write the collected posts to the database
            self.save_data(hub)

    def save_data(self, hub):
        # Table names cannot be bound as SQL parameters, so the hub name is interpolated
        with self.con:
            self.cur = self.con.cursor()
            self.cur.execute("DROP TABLE IF EXISTS %s" % hub)
            self.cur.execute("CREATE TABLE %s(Score INT, Comments INT, Favs INT, Url TEXT, PostTitle TEXT)" % hub)
            self.cur.executemany("INSERT INTO %s VALUES(?, ?, ?, ?, ?)" % hub, self.post)
        self.post = []


After that, all that remains is to run queries of the form:

 ("SELECT * FROM %s ORDER BY Score DESC LIMIT 10" % hub) 

A simple script was written for this as well; it generated most of this article.
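As a rough illustration, such a report script might look like the sketch below. It assumes the table layout created by save_data (one table per hub with Score, Comments, Favs, Url and PostTitle columns). The function name top_posts, the column whitelist and the demo rows are my own assumptions for the example, not the script actually used for this post.

```python
import sqlite3

# Columns allowed for ordering. Identifiers cannot be passed as bound
# parameters, so they are checked against a whitelist instead.
SORTABLE = {"Score", "Comments", "Favs"}


def top_posts(con, hub, column="Favs", limit=10):
    """Return (PostTitle, Url, value) rows for the top posts of one hub table."""
    if column not in SORTABLE:
        raise ValueError("unsupported column: %r" % column)
    # Hypothetical sanity check: accept only simple names made of
    # letters, digits and underscores.
    if not hub.replace("_", "").isalnum():
        raise ValueError("unexpected hub name: %r" % hub)
    query = 'SELECT PostTitle, Url, {col} FROM "{hub}" ORDER BY {col} DESC LIMIT ?'.format(
        col=column, hub=hub)
    return con.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    # Demo on an in-memory database with made-up rows.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE demo_hub(Score INT, Comments INT, Favs INT, Url TEXT, PostTitle TEXT)")
    con.executemany("INSERT INTO demo_hub VALUES(?, ?, ?, ?, ?)", [
        (10, 5, 30, "http://example.com/1", "First post"),
        (20, 50, 10, "http://example.com/2", "Second post"),
        (30, 7, 20, "http://example.com/3", "Third post"),
    ])
    for title, url, favs in top_posts(con, "demo_hub", "Favs", limit=2):
        print(title, favs, url)
```

Passing "Comments" instead of "Favs" as the column gives the most commented posts from the same table.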

Hubs were selected by the following condition:

 habraindex > 100.0 and posts_number > 200 
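The selection above can be sketched in a few lines. This is only an illustration: the Hub record, the select_hubs helper and the sample values are hypothetical, and only the field names habraindex and posts_number and the two thresholds come from the condition itself.

```python
from typing import NamedTuple


class Hub(NamedTuple):
    # Hypothetical record for one hub; the field names follow the condition above.
    name: str
    habraindex: float
    posts_number: int


def select_hubs(hubs, min_index=100.0, min_posts=200):
    """Keep hubs whose habraindex and post count both exceed the thresholds."""
    return [h for h in hubs
            if h.habraindex > min_index and h.posts_number > min_posts]


if __name__ == "__main__":
    # Made-up values, only to show the filter in action.
    sample = [
        Hub("hub_a", 150.0, 500),
        Hub("hub_b", 99.9, 1000),
        Hub("hub_c", 300.0, 200),
    ]
    for hub in select_hubs(sample):
        print(hub.name)
```

Both comparisons are strict, so a hub with exactly 200 posts or a habraindex of exactly 100.0 is left out.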

The database, the source code and everything else are on GitHub.

Well, now the fun part. I think everyone will find something to their liking.









Dura lex














C++
Most commented:
++ 534
, XXI 409
— 346
309
qutIM 0.3: ! 301
GOTO ! 270
5++ ++ 270
Web- C++? 264
Doom 3 238
— ++ 216

Most favorited:
90 C++ 1159
1009
863
C++11, C++ 745
Doom 3 735
( 1) 639
++ 636
623
C++ 574
30 563


CSS
Most commented:
— 285
CSS 241
CSS 203
css- 174
. 170
HTML/CSS Google 167
margin, ( 1- ) 163
160
154
CSS 146

Most favorited:
HTML/CSS Google 2789
- 2786
frontend 2668
- 2455
CSS3-. 1984
CSS- 1868
1727
- 2012 1699
1637
62 (Responsive web design) 1555

Most commented:
? 712
575
: 437
« » 407
? ! 394
368
, 366
— 100% 343
, Excel 340
333

Most favorited:
1980
, : 1101
, , 919
— , 842
709
. 1. , income tax, , , Social Security taxes 549
1967 501
: 478
Forex ? 461
458








Most commented:
1236
410
«-» 377
: C# 287
10 285
30 2: Go 274
? 246
237
231
Google - Microsoft 182

Most favorited:
NoSQL : 1022
Nginx 5 980
10 WordPress $15 688
: C# 676
Node.js 602
571
474
web- 465
PHP- Pinba 447
Google — 427







Interfaces


Php
Most commented:
«PHP-» 729
667
PHP — ! ! 605
. 550
PHP 537
PHP: 524
PHP 515
PHP 446
PHP 432
PHP , 427

Most favorited:
php 1346
PHP: 1201
MVC - PHP 889
PHP 880
SQL- PHP MySQL 860
CMS (2013) 818
PHP MageConf 2012 814
PHPStorm 808
, 806
PDO 804

IT
Most commented:
Adobe Creative Suite 2 ( ?) 387
20 382
Sleep Box 378
( « ») 333
: 318
Sleepbox 313
. 293
284
? 282
275

Most favorited:
frontend 2668
- 2131
UICloud: 1757
1740
5 1483
1178
: , , - 1136
- Metro 854
5 853
, … 830



HTML
Most commented:
HTML ?! 290
HTML- - 273
- . , 236
, 225
HTML5 194
. 186
, Internet Explorer 181
. 170
154
notepad 1 153

Most favorited:
- 2939
CSS- 1868
1856
1727
1637
-. 1 1550
. 1461
- ( 2) 1449
HTML5, 1236
. 1234




Most commented:
! 1663
Google Wave 1476
DaruDar.org 1444
! 1313
Demonoid.Com 1209
2017 ? 1029
1019
996
, , 24 858
: 805

Most favorited:
20 , 20 3437
( ) 3382
2234
, IT 1075
eBay. 1038
GTD : 987
EMS — 953
941
. Theory. 905
: ? 779

GTD
Most commented:
? 712
710
575
— 534
200 000 IT- 530
. 529
! .. ? 497
468
465
465

Most favorited:
, ? 2805
1982
: 5 1314
2 1242
10 , 2013 1191
, 1116
, , «» : 1044
955
914
+10 884

UPD: Now each hub from the list has top 10 by the number of additions to favorites.
UPD2: Rating updated. Thanks to AraneusAdoro for noticing the error.
UPD3: Added: iOS Development, E-Commerce, Node.JS, Closet, GTD
I could add all 346 hubs, but the post would become very heavy. So I tried to cover the most useful hubs by applying the restrictions described above. If you want to see the rating of your favorite hub, send me a private message and I will add it.
UPD4: this feature has now been implemented on Habr itself!
UPD5: Post updated.

Source: https://habr.com/ru/post/204706/

