
Atom feed for the Habrahabr sandbox

As you know, an RSS feed is one of the most convenient ways to follow regularly updated content from anywhere. A few days ago the habrauser rozboris lamented that he reads habr.ru/new in his favorite RSS aggregator, but to see what has appeared in the Habrahabr sandbox he has to use an equally favorite browser. The idea of creating a sandbox feed seemed attractive to us.



So I wrote a script that you can subscribe to in order to read the latest posts from the sandbox.



What to write it in?



There are plenty of options: Python, Perl, PHP. I really like writing in Python, but I chose PHP: if the feed becomes popular, it will be easier for interested habrausers to deploy the script on their own servers (in my opinion, PHP is more common).



How it works



It's simple. When an aggregator requests the script, the script refreshes the posts from the first page of the sandbox in either of two cases: the database contains no posts (perhaps the script is running for the first time), or more than 20 minutes have passed since the last update.
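The refresh rule above can be sketched in a few lines. This is an illustrative Python version, not the PHP from the actual script; the names `needs_refresh`, `post_count`, and `last_update` are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 20-minute interval comes from the description above.
REFRESH_INTERVAL = timedelta(minutes=20)


def needs_refresh(post_count: int, last_update: Optional[datetime], now: datetime) -> bool:
    """Return True when the stored posts should be re-fetched from the sandbox."""
    # Case 1: nothing stored yet (for example, the very first run).
    if post_count == 0 or last_update is None:
        return True
    # Case 2: the stored posts are older than the refresh interval.
    return now - last_update > REFRESH_INTERVAL


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(needs_refresh(0, None, now))
    print(needs_refresh(10, now - timedelta(minutes=5), now))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule easy to check in isolation.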


The update works as follows: the script parses the topics and extracts each one's title, link, post body, and publication time. The publication date and time are converted to an Atom timestamp (yes, the feed uses the Atom format), and if the database has no topic with the same timestamp, the topic is added.
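As a rough illustration of this step, here is a Python sketch that formats a publication time as an RFC 3339 timestamp (the form Atom uses) and stores a topic only if no row with that timestamp exists. It uses the standard-library `sqlite3` module as a stand-in for MySQL, and the table name `topics`, its columns, and the function names are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def to_atom_timestamp(published: datetime) -> str:
    """Format an aware datetime as an RFC 3339 timestamp in UTC."""
    return published.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a single (hypothetical) topics table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE topics (title TEXT, link TEXT, body TEXT, published TEXT)")
    return db


def store_if_new(db: sqlite3.Connection, title: str, link: str, body: str,
                 published: datetime) -> bool:
    """Insert the topic unless one with the same timestamp is already stored."""
    stamp = to_atom_timestamp(published)
    row = db.execute("SELECT 1 FROM topics WHERE published = ?", (stamp,)).fetchone()
    if row is not None:
        return False  # a topic with this timestamp already exists
    db.execute(
        "INSERT INTO topics (title, link, body, published) VALUES (?, ?, ?, ?)",
        (title, link, body, stamp),
    )
    db.commit()
    return True


if __name__ == "__main__":
    db = open_db()
    moscow = timezone(timedelta(hours=3))  # assumed source time zone
    when = datetime(2009, 12, 10, 15, 30, tzinfo=moscow)
    print(to_atom_timestamp(when))
    print(store_if_new(db, "Example", "http://example.com/1", "<p>text</p>", when))
```

Keying on the timestamp alone follows the description above; note that two topics published at exactly the same moment would then be treated as one.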



After that, XML is generated from the topics in the database and returned to the user.
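Generating the XML might look roughly like the following Python sketch, built on the standard-library `xml.etree.ElementTree`. The function name `build_feed` and the dictionary keys are assumptions for illustration, and the result is only a skeleton: it omits elements that the Atom specification requires for a fully valid feed, such as `author`.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _tag(name: str) -> str:
    """Return the namespace-qualified tag name ElementTree expects."""
    return f"{{{ATOM_NS}}}{name}"


def build_feed(feed_id: str, title: str, updated: str, topics: list) -> str:
    """Serialize stored topics (dicts with title, link, body, published) as Atom XML."""
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    feed = ET.Element(_tag("feed"))
    ET.SubElement(feed, _tag("id")).text = feed_id
    ET.SubElement(feed, _tag("title")).text = title
    ET.SubElement(feed, _tag("updated")).text = updated
    for topic in topics:
        entry = ET.SubElement(feed, _tag("entry"))
        ET.SubElement(entry, _tag("id")).text = topic["link"]
        ET.SubElement(entry, _tag("title")).text = topic["title"]
        ET.SubElement(entry, _tag("link"), href=topic["link"])
        ET.SubElement(entry, _tag("updated")).text = topic["published"]
        # type="html" tells readers the body is escaped HTML markup.
        content = ET.SubElement(entry, _tag("content"), type="html")
        content.text = topic["body"]
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    print(build_feed(
        "urn:example:sandbox-feed",
        "Sandbox",
        "2009-12-10T12:30:00Z",
        [{"title": "Example", "link": "http://example.com/1",
          "body": "<p>text</p>", "published": "2009-12-10T12:30:00Z"}],
    ))
```

Building the tree with a library rather than concatenating strings means titles and bodies containing `&` or `<` are escaped automatically.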



Little things



To run the script on your own server, enter your MySQL access parameters in the code (you will need only one table, whose name can also be configured inside the script). I decided to use a database rather than a file next to the script to avoid the possible problem of concurrent access to such a file: say, one instance of the script is updating the posts while another is trying to read them.



In addition, under Habr's rules any bot that reads Habr content must follow certain rules; in particular, it must send a proper User-Agent containing information about the bot's owners. So I will soon put up a page with a short description of the project, our contacts, and a link to the source code.



If you do decide to run the script yourself, please enter your contact information in $user so that Habr can reach you if questions arise.
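For illustration, here is how a bot could attach its owner's contact details to each request, sketched in Python with the standard-library `urllib.request`. The bot name `sandbox-feed-bot/0.1` and the exact header layout are hypothetical; the `contact` argument plays the role of `$user` in the script.

```python
from urllib.request import Request


def build_request(url: str, contact: str) -> Request:
    """Prepare a request whose User-Agent identifies the bot and its owner."""
    # Hypothetical bot name; the contact string is whatever the operator supplies.
    user_agent = f"sandbox-feed-bot/0.1 (+{contact})"
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request("http://example.com/", "admin@example.com")
    # urllib stores header names capitalized, hence "User-agent" here.
    print(req.get_header("User-agent"))
```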



PS



The script works but is still rough. I hope you find my work useful. License: MIT.



Oh yeah, here's the link to the feed!



UPDATE: In my spare time I will fix some inaccuracies so that the feed conforms to the specification. Thanks for the comments!

Source: https://habr.com/ru/post/77831/


