
Save Google Reader data. PHP version

Disclaimer: this note does not claim to be a full-fledged article. It is yet another "download all your data from Google Reader" piece, a note from the "I'll just leave this here" category.







Introduction



Several articles have already been written about the well-known closure of this excellent RSS reader.

This one was inspired by a small note that also offered an interim solution: saving your data from GReader with a Python script.

Actually, the essence of my note boils down to one thing: I wrote a similar script in PHP for myself (I don't know Python), and I figured it would be nice to share it.

On the one hand, it may help someone else; on the other, maybe someone will help me by pointing out mistakes, things I forgot, left unfinished or messed up somewhere.


What is it?



A single-file script that can pull all your subscriptions out of Google Reader and save them to your hard drive, history included: absolutely all available posts, including posts from "long dead" sites that are still available in the Reader. The link in the introduction above points to a post with a Python script that, as I understand it, can do the same.

The script uses practically no third-party libraries, so it requires no extra setup and no "go over there and download this, and then that" quests.



Where to get it?



The script itself can be grabbed here on GitHub (although, presumably, in July all of this will become irrelevant).

There is one script file and one batch file (.bat). The script was written under Windows, but it should work anywhere.

As for PHP for Windows, there is a simplified option, which I already described here.

The gist: take a standalone PHP archive and extract it into any folder (any will do, preferably a short path without spaces), for example c:\php. After that, either add this directory to the system PATH environment variable, or edit the first line of the batch file that runs the script; if you extracted it into c:\php, you don't need to do anything at all (that is the path written in the attached batch file). Alternatively, download a fresh build from php.net, or perhaps you already have all of this installed.



All that remains is to enter our Google credentials at the beginning of the PHP script, set the desired options, run the batch file and wait until it downloads everything.



How does it work?



Now for a description of the script and what it can do.

For starters, it is probably worth pointing out that the only "non-standard" things it uses are the cURL library and the json_decode() function.

cURL, I suppose, is enabled by default for most people, while the JSON functions only appeared somewhere around PHP 5; for earlier versions the script still works, because by default it replaces this function with simple regular expressions. So the only real requirement is cURL.
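For illustration, here is a minimal sketch (not the author's actual code) of how such a prerequisite check might look:

<?php
// Minimal sketch: make sure cURL is present and decide whether the native
// json_decode() can be used or the regex-based fallback is needed.
if (!function_exists('curl_init')) {
    die("The cURL extension is required but is not enabled in this PHP build.\n");
}
$GLOBALS['use_json_decode'] = function_exists('json_decode'); // false on old PHP versions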

Also, for the record, it is probably worth mentioning that the authorization code for the service was taken from this small class. In fact, only a couple of functions for obtaining a token were taken from it; the rest was reworked and built into the end of the script.
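Roughly speaking, getting the token boils down to one POST to the old ClientLogin endpoint; here is a simplified sketch of that idea (the endpoint, the 'reader' service name and the function name are my assumptions, not a quote from the script):

<?php
// Sketch: obtain a ClientLogin auth token for the Reader service.
function fetch_auth_token($user, $password) {
    $ch = curl_init('https://www.google.com/accounts/ClientLogin');
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query(array(
            'service' => 'reader',
            'Email'   => $user,
            'Passwd'  => $password,
        )),
    ));
    $response = curl_exec($ch);
    curl_close($ch);
    // The response contains a line like "Auth=..."; pull the Auth value out.
    return preg_match('/Auth=(\S+)/', $response, $m) ? $m[1] : false;
}

The token is then sent with every subsequent request in an "Authorization: GoogleLogin auth=..." header.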



Now for the settings. They are at the beginning of the script and look like this:

$GLOBALS['account_user']='googleuser@gmail.com';
$GLOBALS['account_password']='qwerty';
$GLOBALS['is_atom']=true;
$GLOBALS['try_consolidate']=true;
$GLOBALS['fetch_count']=1000;
$GLOBALS['fetch_special_feeds']=true;
$GLOBALS['fetch_regular_feeds']=true;
$GLOBALS['atom_ext']="atom.xml.txt";
$GLOBALS['json_ext']="json.txt";
$GLOBALS['save_dir']="./feeds/";
$GLOBALS['log_file']=$GLOBALS['save_dir']."log.txt";
$GLOBALS['use_json_decode']=false;//function_exists('json_decode');
/* !!!!!!!!!! */
$GLOBALS['need_readinglist']=false;
/* !!!!!!!!!!
important! this will fetch a very full feed list, mixed from all subscribtions and ordered by post date.
in most cases this data is unusefull and this option will double the script worktime and the hdd space requirement.
so probably you don't need set this to true.
!!!!!!!!!! */




Where to enter the username and password is, I think, obvious )

If two-step verification is enabled on your Google account, an application-specific password works fine in the script.



The rest:

$GLOBALS['is_atom'] - whether to fetch data in JSON or XML (Atom) format. If true, the XML version is created.



$GLOBALS['try_consolidate'] - if true, the script tries to write each subscription into one continuous file.

The thing is, Google doesn't hand out more than a thousand entries per request, so the script fetches chunks of $GLOBALS['fetch_count'] records (1000 is the maximum valid value of this parameter). It can either stack each such chunk into numbered "thousandth" files, or try to keep appending to one and the same file without breaking its structure (JSON or XML). Since actually parsing the incoming data while the script runs would be expensive, it contains a rather clumsy file-merging mechanism built on unpretentious regular expressions, which nevertheless works. In general, you can play with the parameters and see what comes out.
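To illustrate the chunking idea, here is a rough sketch (not the author's actual code; the feed URL pattern, the n/c query parameters and the continuation tag are the old Reader API conventions, quoted from memory):

<?php
// Sketch: fetch a feed in chunks of $count items, following the continuation
// token that Google Reader returned with each response.
function fetch_feed_chunks($feed_url, $auth_token, $count) {
    $continuation = null;
    $chunks = array();
    do {
        $url = 'https://www.google.com/reader/atom/feed/' . urlencode($feed_url)
             . '?n=' . $count
             . ($continuation ? '&c=' . urlencode($continuation) : '');
        $ch = curl_init($url);
        curl_setopt_array($ch, array(
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_HTTPHEADER     => array('Authorization: GoogleLogin auth=' . $auth_token),
        ));
        $data = curl_exec($ch);
        curl_close($ch);
        $chunks[] = $data;
        // The continuation token is embedded in the response; extract it with a regex.
        $continuation = preg_match('/<gr:continuation>([^<]+)<\/gr:continuation>/', $data, $m)
            ? $m[1] : null;
    } while ($continuation);
    return $chunks;
}

Each returned chunk is then either written to its own numbered file or merged into the previous one when $GLOBALS['try_consolidate'] is enabled.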



$GLOBALS['fetch_special_feeds'] = true; - whether to pull the special-purpose feeds such as "notes", "starred items" and so on. Someone may not need them.
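For reference, these "special" feeds were ordinary streams with their own state URLs; a small sketch, listing a couple of them from memory of the old API (treat the exact names as assumptions):

<?php
// Sketch: a couple of the Reader's "special" streams (state URLs from memory).
$special_streams = array(
    'starred' => 'user/-/state/com.google/starred',   // starred items
    'shared'  => 'user/-/state/com.google/broadcast', // shared items
);
foreach ($special_streams as $name => $stream) {
    $url = 'https://www.google.com/reader/atom/' . $stream . '?n=' . $GLOBALS['fetch_count'];
    // ...fetch $url the same way as a regular feed and save it under $name...
}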



$GLOBALS['fetch_regular_feeds'] = true; - whether to pull the regular feeds from your subscription list individually. You can switch this off if, for some reason, you only need the main feed where everything is mixed together (the $GLOBALS['need_readinglist'] parameter).
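Fetching the feeds individually implies first getting the list of subscriptions; a minimal sketch of that step, assuming the historical subscription/list endpoint and the $auth_token from the authorization example above:

<?php
// Sketch: request the list of all subscriptions as JSON.
$ch = curl_init('https://www.google.com/reader/api/0/subscription/list?output=json');
curl_setopt_array($ch, array(
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => array('Authorization: GoogleLogin auth=' . $auth_token),
));
$subscription_list = curl_exec($ch); // JSON with an array of {id, title, categories, ...}
curl_close($ch);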



$GLOBALS['atom_ext'] = "atom.xml.txt";

$GLOBALS['json_ext'] = "json.txt";

These set the file extensions that the script assigns to everything it downloads; depending on the $GLOBALS['is_atom'] parameter it picks one or the other.
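In other words, something along these lines (a hypothetical two-liner, with $feed_name standing in for whatever safe file name the script builds):

$ext  = $GLOBALS['is_atom'] ? $GLOBALS['atom_ext'] : $GLOBALS['json_ext'];
$file = $GLOBALS['save_dir'] . $feed_name . '.' . $ext; // e.g. ./feeds/somefeed.atom.xml.txt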



$GLOBALS['save_dir'] = "./feeds/"; - the directory to download into. By default it will create a feeds directory next to the script, as you might guess from this parameter )



$GLOBALS['log_file'] - by default, a log.txt file will appear in the feeds subdirectory, duplicating everything the script prints to the screen.
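The "print and duplicate to the log" behaviour can be pictured as a helper like this (hypothetical name, a sketch rather than the script's actual function):

<?php
// Sketch: echo a message and append the same line to the log file.
function log_msg($msg) {
    echo $msg . "\n";
    file_put_contents($GLOBALS['log_file'], $msg . "\n", FILE_APPEND);
}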



$GLOBALS['use_json_decode'] - whether to use the json_decode() function or make do with the simplified replacement. If you set it like this:

$GLOBALS['use_json_decode'] = function_exists('json_decode'); then it will automatically use the built-in function whenever your PHP version supports it. In theory this should work, but in practice I have nothing to test it on.



And the last setting, $GLOBALS['need_readinglist'] = false;, highlighted with a bunch of exclamation marks and a comment: whether to pull the Reader's main feed. It holds a lot of posts; in theory these are all posts from all subscriptions dumped into one pile and sorted by date, but in practice, in my case for example, only a little more than half of the subscriptions' posts ended up there. In any case, it will be a large file, it will take a long time to download, and it is unclear what it is for. Or, to put it differently: I don't know why anyone might need it. If you enlighten me in the comments, thanks in advance; maybe it will make sense to download it after all ))
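For completeness, this main feed is just another stream; its historical state URL, as far as I remember (an assumption, not taken from the script), looked like this:

// Sketch: the Reader's combined "reading list" stream.
$reading_list_url = 'https://www.google.com/reader/atom/user/-/state/com.google/reading-list'
                  . '?n=' . $GLOBALS['fetch_count'];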



Conclusion



Well, that's about it. Good luck to everyone, and I hope this homebrew article helps someone. And make some room on your hard drive: in my case it pulled out about a gigabyte of data. For example, the subscription to the main Habr feed currently holds almost 80 thousand entries, the oldest of which are no longer available on Habr itself.



P.S. I can't answer the question of how to import this saved data into any RSS reader. I suspect not every reader will support importing subscription content from external sources at all. For myself, this question doesn't arise, because I am writing my own reader for OS X; I don't know yet whether I will polish it up for everyone or keep it to myself. But since there are authors of online readers here on Habr, they may well add support for importing this data to their services later. Or maybe they will peek at how to pull out the whole history and implement that on their side too; after all, users almost universally complain that when a reader does support importing from GReader, for some reason it only pulls the latest 500-1000 entries per subscription and that's it.

Source: https://habr.com/ru/post/181420/


