
Meduza API: writing a full-text RSS feed

Like many others who use RSS, I am annoyed by feeds that deliver only a shortened version of each news item instead of the full text. Such a feed cannot be read offline.

Meduza.io is no exception: the text arrives truncated, and to read a single news item you have to switch from the reader to the browser every time. This was especially painful when Meduza had no proper mobile version and the web version was very slow on a phone.

There are various services that turn HTML into RSS, but when I came across a console client for Meduza, I immediately wondered which API it uses and whether I could use it to write my own application.

API


I did not have to look far: the console client's code is published on GitHub, and it is a JavaScript script that calls the API I was after.

Getting the list of news


https://meduza.io/api/v3/search?chrono=news&page=0&per_page=10&locale=en
chrono - one of news, cards, articles, shapito or polygon, depending on the section we want to receive;
page - the page number;
per_page - the number of entries per page;
locale - the locale, ru or en.

Getting a single news item
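
The parameters above can be assembled into a request URL as follows. This is only a minimal sketch: the helper names search_url and fetch_listing are hypothetical, and the structure of the JSON returned by the search endpoint is not described in this article, so inspect the response before relying on any particular key.

```ruby
require 'uri'
require 'json'
require 'open-uri'

MEDUZA_API = 'https://meduza.io/api/v3'

# Build the search URL from the four query parameters described above.
# (search_url is a hypothetical helper, not part of any Meduza library.)
def search_url(chrono: 'news', page: 0, per_page: 10, locale: 'en')
  query = URI.encode_www_form(chrono: chrono, page: page, per_page: per_page, locale: locale)
  "#{MEDUZA_API}/search?#{query}"
end

# Fetch one page of results and return the parsed JSON.
# The response layout is an unknown here: print it and look before using it.
def fetch_listing(**params)
  JSON.parse(URI.open(search_url(**params)).read)
end

puts search_url(chrono: 'shapito', page: 1)
```

Calling fetch_listing(chrono: 'news') would perform the actual HTTP request; search_url on its own makes no network calls, so it can be checked offline.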


https://meduza.io/api/v3/shapito/2015/06/02/vyshel-neofitsialnyy-terminalnyy-klient-meduzy
Here we simply substitute the url obtained in the previous step.
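
A small sketch of that substitution, assuming the link of a news item has the form https://meduza.io/<path> (as it does in the RSS feed used later in this article). The helper name api_url_for is hypothetical.

```ruby
MEDUZA = 'https://meduza.io'

# Turn the public link of a news item into the API url for the same item.
# Assumption: the link starts with the site address followed by a slash.
def api_url_for(link)
  path = link.sub(%r{\A#{Regexp.escape(MEDUZA)}/}, '')
  "#{MEDUZA}/api/v3/#{path}"
end

puts api_url_for('https://meduza.io/shapito/2015/06/02/vyshel-neofitsialnyy-terminalnyy-klient-meduzy')
```

The site address is escaped before it goes into the regular expression, so the dot in meduza.io matches only a literal dot.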

Idea


Take the original rss feed at https://meduza.io/rss/all, but replace each truncated news item with the full text received through the API.

Sample implementation (prototype)


I picked up Ruby and wrote some code to parse the source rss feed (on Ruby 3.0 and later, open-uri has to be called as URI.open; a bare open no longer accepts urls):

 Nokogiri::XML(URI.open('https://meduza.io/rss/all'))

And also the code that parses the json of a single news item:

 JSON::parse(URI.open('https://meduza.io/api/v3/' + post_url).read)['root']['content']['body']


Putting the two together gives something like this:

 require 'open-uri'
 require 'json'
 require 'nokogiri'

 $meduza = 'https://meduza.io'
 $meduza_rss = $meduza + '/rss/%s'
 $meduza_api = $meduza + '/api/v3/%s'

 class Meduza
   # Returns the rss feed as xml, with every description replaced by the full text.
   def Meduza.generate(feed = 'all')
     # URI.open: since Ruby 3.0 a bare open no longer accepts urls.
     doc = Nokogiri::XML(URI.open($meduza_rss % feed))
     doc.xpath('/rss/channel/item').each do |item|
       # Strip the site address from the link to get the path the API expects.
       post_id = item.xpath('link').inner_text.gsub(/^#{$meduza}\//, '')
       json = JSON::parse(URI.open($meduza_api % post_id).read)
       item.search('description').each do |description|
         # Replace the truncated text and make image urls absolute.
         description.content = json['root']['content']['body'].gsub('src="/image/', 'src="//meduza.io/image/')
       end
     end
     doc.to_xml
   end
 end

 puts Meduza.generate

Along the way, we change the relative url of pictures to absolute ones using the gsub method.
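
The image url rewrite can be tried in isolation on a plain string. This is a minimal sketch; the helper name absolutize_images and the sample markup are made up for illustration.

```ruby
# Rewrite site-relative image paths to protocol-relative absolute urls.
# absolutize_images is a hypothetical helper; host defaults to meduza.io.
def absolutize_images(html, host = 'meduza.io')
  html.gsub('src="/image/', "src=\"//#{host}/image/")
end

sample = '<p>Text</p><img src="/image/a.png">'
puts absolutize_images(sample)
```

Urls that are already absolute are left alone, because src="//meduza.io/image/... does not contain the literal src="/image/ that gsub looks for.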

Usage


I wrote a minimal application on Sinatra that can be deployed, for example, on Heroku and used completely free of charge. A service like this will help keep Heroku from putting the application to sleep.

The source code of the application is posted on GitHub. Thank you for your attention!

P.S. A link to the running application: meduza.herokuapp.com/rss (for as long as the free Heroku account can handle the load).

Source: https://habr.com/ru/post/259471/

