
vk.com: saving audio recordings, documents, and wall contents

I have long noticed that data on social networks is stored unreliably. For example, a repost you made becomes empty if the author of the original post deletes it. The recent problems with audio recordings on VK were the last straw, and I decided to keep a local copy of all the data that might be of interest, in case of nuclear war. I searched for ready-made solutions but found nothing that suited me, so I wrote a Python script over a few days.

Goals


Save everything possible: audio recordings, documents, the wall. From the wall, all attachments to posts need to be pulled, and comments with all their attachments would not hurt either. At the very least, all posts with music must be preserved, along with the comments where friends sent good tracks or cats. I'll say right away that for my purposes a readable backup of additional information (likes, post creation time, etc.) was not needed.

Let's get to work!


The process of creating such an application has already been described many times on Habr, so I will not repeat all the details; I will outline the steps briefly and say a few words about the problems. To keep the article from being overloaded with source code, there is a link to GitHub at the end.
Considerations in the development



Let's continue

The base code for working with the API is taken from the article by Habr user dzhioev, with handling added for the situations described above. To know what there is to save (when processing the wall), you first need to find out the number of posts:

    # determine posts count
    (response, json_stuff) = call_api("wall.get", [("owner_id", args.id), ("count", 1), ("offset", 0)], args)
    count = response[0]

Next, we request each post separately and parse it:

    for x in xrange(args.wall_start, args.wall_end):
        (post, json_stuff) = call_api("wall.get", [("owner_id", args.id), ("count", 1), ("offset", x)], args)
        process_post(("wall post", x), post, post_parser, json_stuff)
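
The two steps above (get the count, then walk the wall one offset at a time) can be tried offline with a rough sketch like the one below. It is written for Python 3, while the script's own snippets are Python 2. The `fake_call_api` function is a hypothetical stand-in for the script's `call_api`, and the response layout (total count first, then the requested posts) is an assumption inferred from `count = response[0]`.

```python
def fake_call_api(method, params, args):
    """Hypothetical offline stand-in for the article's call_api helper."""
    posts = args["posts"]
    query = dict(params)
    offset, count = query["offset"], query["count"]
    # Assumed response shape: total count first, then the requested posts.
    response = [len(posts)] + posts[offset:offset + count]
    return response, None  # second element mimics the raw JSON text


def iter_wall(owner_id, args, call_api=fake_call_api):
    """Yield (offset, post) pairs, requesting one post per call."""
    base = [("owner_id", owner_id), ("count", 1)]
    response, _ = call_api("wall.get", base + [("offset", 0)], args)
    total = response[0]
    for offset in range(total):
        response, _ = call_api("wall.get", base + [("offset", offset)], args)
        if len(response) > 1:  # the post may have been deleted meanwhile
            yield offset, response[1]


args = {"posts": [{"id": 10, "text": "first"}, {"id": 11, "text": "second"}]}
walked = list(iter_wall(1, args))
```

Requesting one post per call is slow but keeps error handling simple: a failure affects exactly one offset, which can be retried or skipped.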

The result of the request is JSON data, which json.loads() from the standard library converts into standard Python structures. This gives us a dictionary in which some key-value pairs carry the payload and the rest are of no interest. To avoid hard-coding which method handles which field, we use reflection: for each key, we look for a method with the same name.

    for k in raw_data.keys():
        try:
            f = getattr(self, k)
            keys.append(k)
            funcs.append(f)
        except AttributeError:
            logging.warning("Not implemented: {}".format(k))
    logging.info("Saving: {} for {}".format(', '.join(keys), raw_data['id']))
    for (f, k) in zip(funcs, keys):
        f(k, raw_data)
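
For readers who want to try the idea in isolation, here is a minimal self-contained sketch of key-to-method dispatch in Python 3. The `PostParser` class, its methods, and the sample JSON are hypothetical and are not taken from the script; only the use of `json.loads` and `getattr` mirrors the approach described above.

```python
import json
import logging


class PostParser:
    """Hypothetical parser: one method per JSON key it knows how to save."""

    def __init__(self):
        self.saved = []

    def text(self, key, raw_data):
        self.saved.append((key, raw_data[key]))

    def attachments(self, key, raw_data):
        self.saved.append((key, len(raw_data[key])))

    def process(self, raw_data):
        handlers = []
        for key in raw_data:
            # Look up a method named after the key; None if there is none.
            handler = getattr(self, key, None)
            if callable(handler):
                handlers.append((key, handler))
            else:
                logging.warning("Not implemented: %s", key)
        for key, handler in handlers:
            handler(key, raw_data)
        return [key for key, _ in handlers]


raw = json.loads('{"id": 7, "text": "hello", "attachments": [], "likes": 3}')
parser = PostParser()
handled = parser.process(raw)
```

One caveat of this style is that any JSON key that happens to match an unrelated method name (here, `process`) would also be dispatched, so handler names need to be chosen with care.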


Parsing

Now we need to deal with the response fields. The interesting ones are attachments, text, and comments. Attachments is a list of the items attached to a post (audio, pictures, documents, notes), and each type has to be downloadable. The method that processes each attachment is chosen the same way: by the attachment type, we look for a method with a matching name. Here is an example of a downloader for audio:

    def dl_audio(self, data):
        aid = data["aid"]
        owner = data["owner_id"]
        request = "{}_{}".format(owner, aid)
        (audio_data, json_stuff) = call_api("audio.getById", [("audios", request), ], self.args)
        try:
            data = audio_data[0]
            name = u"{artist} - {title}.mp3".format(**data)
            self.save_url(data["url"], name)
        except IndexError:
            # deleted :(
            logging.warning("Deleted track: {}".format(str(data)))
            return
        # store lyrics if any
        try:
            lid = data["lyrics_id"]
        except KeyError:
            return
        (lyrics_data, json_stuff) = call_api("audio.getLyrics", [("lyrics_id", lid), ], self.args)
        text = lyrics_data["text"].encode('utf-8')
        ...
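
Choosing a downloader by attachment type can be sketched in the same spirit. In this Python 3 sketch, the `AttachmentSaver` class and `dl_doc` are hypothetical. The `dl_` prefix follows the `dl_audio` name above, and the attachment layout (a `type` field plus a payload stored under the key named by that type) is an assumption about the response format. Nothing is downloaded; the sketch only records what it would save.

```python
class AttachmentSaver:
    """Hypothetical downloader that records what it would save."""

    def __init__(self):
        self.planned = []

    def dl_audio(self, data):
        name = "{artist} - {title}.mp3".format(**data)
        self.planned.append((data["url"], name))

    def dl_doc(self, data):
        self.planned.append((data["url"], data["title"]))

    def save_attachments(self, attachments):
        skipped = []
        for item in attachments:
            kind = item["type"]
            # Find a method called dl_<type>, e.g. dl_audio for "audio".
            handler = getattr(self, "dl_" + kind, None)
            if handler is None:
                skipped.append(kind)
                continue
            # Assumed layout: the payload lives under the key named by type.
            handler(item[kind])
        return skipped


saver = AttachmentSaver()
skipped = saver.save_attachments([
    {"type": "audio",
     "audio": {"artist": "A", "title": "B", "url": "http://example.invalid/a.mp3"}},
    {"type": "poll", "poll": {"question": "?"}},
])
```

Returning the skipped types makes it easy to see which attachment kinds still lack a handler.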

Unfortunately, recordings removed at the request of copyright holders are no longer available; an empty response is returned for them.

What about the rest?


Methods for processing images, text, notes, uploaded documents, and the rest are on GitHub. I will only say that everything is similar to the examples given. The script also has command-line arguments, but there is no point describing them in the article. Examples and other details are in the README.

Todo

I did not save photo albums, because I have nothing important stored there, and the code from kilonet's article works well. Videos and notes are not saved yet either; that did not seem very necessary to me.

Lastly

The code is far from ideal and is not free of hacks, but it does the job. I hope someone finds my work useful, whether for saving their own posts, documents, and music or for learning.

UPD 12/18/2016

User hiwent reports that as of 12/16/2016, VK has closed API access to audio recordings, so the script's functionality for saving audio recordings no longer works. As a workaround, you can try to "pretend" to be a native VK application, for example the Android version or Kate Mobile. For those, the ability to work with audio recordings will not disappear, although the methods may differ.

Source: https://habr.com/ru/post/184224/

