
Saving Google Reader Data

As the Google Reader shutdown date approaches, it becomes more urgent not only to move subscriptions to a similar service but also to save all existing records.

The solutions I found, including those on Habr (here and here), did not suit me for two main reasons: they cannot save to a database, and they are slow. So I had to reinvent the wheel and write my own tool, grbackup. The command
 grbackup -e fake@gmail.com -p password -ba -o mongodb://localhost:27017 -w 20 

saved 328,250 records from 102 subscriptions to a local MongoDB database in 20 minutes.

Key features:

The available storage types are provided by extensions (plugins) and are selected with the -o, --output option, whose value has the form type:uri.
At the time of writing, the following extensions are available: json, mongodb, and redis.

Tested on Ubuntu (64-bit) and Windows 7 (64-bit).
Suggestions and comments can be left here.
Below is a detailed description of the utility.
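
As a rough illustration of the type:uri form of the -o option, the sketch below splits an output value at its first colon to get the plugin type. The function name split_output is made up for this example, and grbackup itself may parse the value differently.

```python
def split_output(output):
    """Split an output value of the form type:uri at the first colon.

    Illustrative only: split_output is a hypothetical helper, not part of grbackup.
    """
    plugin_type, sep, rest = output.partition(":")
    if not sep or not plugin_type:
        raise ValueError("expected type:uri, got %r" % (output,))
    return plugin_type, rest


if __name__ == "__main__":
    # The output values are taken from the examples in this article.
    for value in ("json:/tmp/subscriptions.json",
                  "mongodb://localhost:27017",
                  "redis://localhost:6379/3"):
        print(split_output(value))
```

Splitting at the first colon only matters for values such as mongodb://localhost:27017, which contain further colons after the type.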

Installation

 pip install grbackup 

or
 easy_install grbackup 

or
 pip install git+git://github.com/wistful/grbackup.git 

Command line options

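Authorization

The Gmail account and its password are passed with the -e, --email and -p, --password options.

grbackup supports two actions: listing (-l, --list) and backup (-b, --backup).

The action can be performed on one of four data types: all records (-a, --all), subscriptions (-s, --subscriptions), the records of a subscription (-t, --topics), and starred records (-x, --starred).

Additional options: the number of workers (-w, --workers), the output URI (-o, --output), the number of topics read at once (-n, --count), the output encoding (-c, --coding), and verbose output (-v, --verbose).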

All options, as well as plugin descriptions, can be viewed using the -h, --help option:
 grbackup -h 

Output of the previous command:

    Usage: grbackup [options] [args]

    Examples:

      list subscriptions: grbackup -e email@gmail.com -p password -ls
      list topics: grbackup -e email@gmail.com -p password -lt http://feed.com
      list starred: grbackup -e email@gmail.com -p password -lx
      list all items: grbackup -e email@gmail.com -p password -la

      backup subscriptions: grbackup -e email@gmail.com -p password -bs -o json:/tmp/subscriptions.json
      backup topics: grbackup -e email@gmail.com -p password -bt http://myfeed.com -o json:/tmp/myfeed.json
      backup starred into MongoDB: grbackup -e email@gmail.com -p password -bx -o mongodb://localhost:27017
      backup all items into Redis: grbackup -e email@gmail.com -p password -ba -o redis://localhost:6379/3

    Available plugins:

      mongodb: save items into MongoDB
        output scheme: mongodb://[username:password@]hostN[:portN]]][/[db][?opts]]
        output examples: mongodb://localhost:27017
                         mongodb://user:pwd@localhost,localhost:27018/?replicaSet=grbackup

      json: save items into file
        output scheme: json:/path/to/file.json
        output examples: json:/home/grbackup/grbackup.json
                         json:/tmp/grbackup/grbackup.json

      redis: save items into Redis
        output scheme: redis://username:password@host[:port]/dbindex
        output examples: redis://localhost:6379/3
                         redis://user:password@localhost:6379/0

    Options:
      Auth Options:
        -e EMAIL, --email=EMAIL
                            gmail account
        -p PWD, --password=PWD
                            account password

      Command Options:
        -b, --backup        backup items
        -l, --list          list items

      Scope Options:
        -a, --all           processing all items
        -s, --subscriptions
                            processing subscriptions only
        -t, --topics        processing topics only
        -x, --starred       processing starred topics only

      MongoDB Options:
        --mongodb-db=MONGODB_DB
                            the name of the database[default: greader]
        --mongodb-scol=MONGODB_SUBSCRIPTIONS
                            collection name for subscriptions[default: subscriptions]
        --mongodb-tcol=MONGODB_TOPICS
                            collection name for topics[default: topics]
        --mongodb-tstar=MONGODB_STARRED
                            collection name for starred items[default: starred]
        --mongodb-w=MONGODB_W
                            <int> Write operations will block until they have
                            been replicated to the specified number [default: 1]
        --mongodb-j         block until write operations have been committed
                            to the journal [default: False]

      Redis Options:
        --redis-scol-prefix=REDIS_SUBS
                            subscriptions key prefix[default: subscription]
        --redis-tcol-prefix=REDIS_TOPICS
                            topics key prefix[default: topic]
        --redis-xcol-prefix=REDIS_STARRED
                            starred key prefix[default: starred]

      Other Options:
        -w WORKERS, --workers=WORKERS
                            number of workers [default: 1]
        -o OUTPUT, --output=OUTPUT
                            output uri
        -n COUNT, --count=COUNT
                            the number of topics that can be read at once
                            [default: 200]
        -c CODING, --coding=CODING
                            output coding [default: utf8]
        -v, --verbose       verbose output
        -h, --help


Usage

list of subscriptions:
 grbackup -e email@gmail.com -p password -ls 

list of the records of a specific subscription:
  grbackup -e email@gmail.com -p password -lt http://habrahabr.ru/rss/hub/python/ 

list of all starred records:
 grbackup -e email@gmail.com -p password -lx 

list of all entries:
 grbackup -e email@gmail.com -p password -la 

saving subscriptions to json file:
  grbackup -e email@gmail.com -p password -bs -o json:/tmp/subscriptions.json 

saving all records of a specific subscription to a json file:
 grbackup -e email@gmail.com -p password -bt http://habrahabr.ru/rss/hub/python/ -o json:/tmp/python.json 

saving all starred records to MongoDB:
 grbackup -e email@gmail.com -p password -bx -o mongodb://localhost:27017 

saving all records to Redis using 20 workers:
 grbackup -e email@gmail.com -p password -ba -o redis://localhost:6379/3 -w 20 
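
When backups are started from a script, the command lines above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this article; the helper build_command and its parameter names are assumptions made for illustration and are not part of grbackup.

```python
# Short flags as documented in the grbackup help output.
ACTIONS = {"list": "-l", "backup": "-b"}
SCOPES = {"all": "a", "subscriptions": "s", "topics": "t", "starred": "x"}


def build_command(email, password, action, scope,
                  feed_url=None, output=None, workers=None):
    """Return a grbackup argument list (hypothetical helper, illustration only)."""
    flag = ACTIONS[action] + SCOPES[scope]  # e.g. "-b" + "a" gives "-ba"
    cmd = ["grbackup", "-e", email, "-p", password, flag]
    if scope == "topics":
        if feed_url is None:
            raise ValueError("the topics scope needs a feed URL")
        cmd.append(feed_url)  # the feed URL follows -lt / -bt, as in the examples
    if output is not None:
        cmd += ["-o", output]
    if workers is not None:
        cmd += ["-w", str(workers)]
    return cmd


if __name__ == "__main__":
    cmd = build_command("email@gmail.com", "password", "backup", "all",
                        output="redis://localhost:6379/3", workers=20)
    print(" ".join(cmd))
    # To actually run it (grbackup must be installed): subprocess.call(cmd)
```

Building a list instead of a single string avoids shell quoting problems when the password or a feed URL contains special characters.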

Plugins

Json

General URI format: json:/path/to/file.json
Multithreading support: yes
Usage example:
 grbackup -e email@gmail.com -p password -ba -o json:/home/grbackup/grbackup.json 

Records are saved to a separate file as a list of objects.
There are three types of objects: subscriptions, topics, and starred records.


MongoDB

General URI format: mongodb://[username:password@]hostN[:portN]]][/[db][?opts]]
Multithreading support: yes
Usage example:
 grbackup -e email@gmail.com -p password -ba -o mongodb://localhost:27017 -w 20 

Records are stored in three collections: subscriptions, topics, and starred.
The collection names can be changed with the --mongodb-scol, --mongodb-tcol, and --mongodb-tstar options.
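
To check a finished backup, the number of documents in each collection can be counted. This is a minimal sketch, assuming the default database name greader and the default collection names from the help output; count_backup is a made-up helper, and the live usage shown in the comments needs the third-party pymongo package and a running server.

```python
# Default collection names, as listed in the grbackup help output.
COLLECTIONS = ("subscriptions", "topics", "starred")


def count_backup(db, names=COLLECTIONS):
    """Return {collection name: document count} for a database-like object.

    db only has to support db[name].count_documents({}), which a pymongo
    Database does. count_backup is a hypothetical helper for illustration.
    """
    return {name: db[name].count_documents({}) for name in names}


# Usage against a live server (assumes pymongo is installed):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   print(count_backup(client["greader"]))
```

If the collection names were changed on the command line, the same names have to be passed in the names argument.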

Redis

General URI format: redis://username:password@host[:port]/dbindex
Multithreading support: yes
Usage example:
 grbackup -e email@gmail.com -p password -ba -o redis://user:password@localhost:6379/0 -w 20 

Records are stored as Redis hashes.
Keys come in three forms: "subscription:record_id", "starred:record_id", and "topic:record_id", where record_id is the unique identifier of the record.
The key prefixes can be changed with the --redis-scol-prefix, --redis-tcol-prefix, and --redis-xcol-prefix options.
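
The key layout described above can be sketched as two small helpers. The names make_key and split_key and the record identifier in the demo are made up for illustration; the actual identifiers written by grbackup may look different.

```python
def make_key(prefix, record_id):
    """Build a key of the form prefix:record_id (hypothetical helper)."""
    return "%s:%s" % (prefix, record_id)


def split_key(key):
    """Split a key at its first colon into (prefix, record_id).

    Only the first colon is used, so identifiers that themselves contain
    colons (for example URLs) are returned unchanged.
    """
    prefix, _, record_id = key.partition(":")
    return prefix, record_id


if __name__ == "__main__":
    # The identifier below is invented for this example.
    key = make_key("topic", "http://example.com/post/1")
    print(key)
    print(split_key(key))
```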

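Writing your own plugin

The module must be in the grb_plugins package; the module name does not matter.
Module structure, as illustrated by the example below: the module-level names plugin_type, support_threads, and description; an add_option_group(parser) function that registers the plugin's command-line options; and a writer class, used as a context manager, whose __enter__ method returns an object with the methods put_subscription, put_topic, put_starred, and put_all.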

An example of a plugin that uses logging to save records to a file:

    #!/usr/bin/env python
    # coding=utf-8

    from optparse import OptionGroup
    import logging

    plugin_type = "myplugin"
    support_threads = True
    description = """save items using logging
        output scheme: myplugin:/path/to/logfile.log
        output examples: myplugin:/tmp/storage.log
    """


    def add_option_group(parser):
        # Plugin Options
        myplugin_group = OptionGroup(parser, "myplugin Options")
        myplugin_group.add_option("--myplugin-format", dest="format",
                                  type="str",
                                  default="%(asctime)s %(message)s",
                                  help="record format"
                                  "[default: %default]")
        myplugin_group.add_option("--myplugin-datefmt", dest="datefmt",
                                  type="str",
                                  default="%m/%d/%Y %I:%M:%S %p",
                                  help="date format"
                                  "[default: %default]")
        parser.add_option_group(myplugin_group)


    class WriteMyPlugin(object):

        def __init__(self, logger):
            self.logger = logger

        def put_subscription(self, subscription):
            subscription_url = subscription['id'][5:]
            self.logger.warning("write subscription: %s", subscription_url)

        def put_all(self, subscription, topic):
            subscription_url = subscription['id'][5:]
            self.put_subscription(subscription)
            self.put_topic(subscription_url, topic)

        def put_starred(self, topic):
            self.logger.warning("write starred: %s", topic.get('title', ''))

        def put_topic(self, subscription_url, topic):
            self.logger.warning("write topic: %s %s",
                                subscription_url, topic.get('title', ''))


    class writer(object):

        def __init__(self, opt):
            path = opt.output[opt.output.index(":") + 1:]
            self._logger = logging.getLogger("myplugin")
            handler = logging.FileHandler(path)
            handler.setFormatter(logging.Formatter(opt.format, opt.datefmt))
            self._logger.addHandler(handler)

        def __enter__(self):
            return WriteMyPlugin(self._logger)

        def __exit__(self, *exc_info):
            pass
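
To show how such a plugin might be driven, here is a self-contained sketch of a second plugin with the same interface that keeps records in a Python list, followed by a small demo that calls it. The "memory" plugin type, the Options stand-in, the sample feed, and the calling sequence in demo() are all assumptions made for illustration; how grbackup itself invokes a plugin is not shown in this article.

```python
from collections import namedtuple

plugin_type = "memory"  # hypothetical plugin type, not shipped with grbackup
support_threads = False
description = "keep items in a Python list (illustration only)"


def add_option_group(parser):
    """This sketch has no options of its own, so nothing is registered."""


class WriteMemory(object):
    """Collects every record as a tuple in a shared list."""

    def __init__(self, store):
        self.store = store

    def put_subscription(self, subscription):
        # As in the example above, drop the first five characters
        # (here the "feed/" prefix) to obtain the feed URL.
        self.store.append(("subscription", subscription["id"][5:]))

    def put_topic(self, subscription_url, topic):
        self.store.append(("topic", subscription_url, topic.get("title", "")))

    def put_starred(self, topic):
        self.store.append(("starred", topic.get("title", "")))

    def put_all(self, subscription, topic):
        self.put_subscription(subscription)
        self.put_topic(subscription["id"][5:], topic)


class writer(object):
    """Context manager that hands out the object doing the writing."""

    def __init__(self, opt):
        self.store = []

    def __enter__(self):
        return WriteMemory(self.store)

    def __exit__(self, *exc_info):
        pass


def demo():
    """Drive the plugin by hand with invented sample data."""
    Options = namedtuple("Options", "output")  # stand-in for parsed options
    plugin = writer(Options(output="memory:"))
    with plugin as sink:
        sink.put_all({"id": "feed/http://example.com/rss"}, {"title": "Hello"})
        sink.put_starred({"title": "Starred post"})
    return plugin.store


if __name__ == "__main__":
    for record in demo():
        print(record)
```

Keeping the writing methods on a separate object returned by __enter__ mirrors the logging example, where writer only sets up the resource and WriteMyPlugin does the writing.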

Source: https://habr.com/ru/post/182712/

