
Bulk recording from election cameras - 2

Habr is not for politics. This article deals only with the technical aspects of a specific software solution. For the common good, please refrain from political debates, speeches, campaigning and the like in the comments. Also, please do not apply this knowledge for destructive purposes, do not start backing up the entire video archive without real need, and so on. Thank you.

September 8 is the single voting day. This year the general public is invited to observe the election of the capital's mayor over the Internet. A number of citizens find it interesting to record the picture from the cameras: some have a politically motivated interest, while most are simply curious to see themselves and people they know through the eyes of the Internet. This article aims to demonstrate the principles behind the current system and to offer working concepts.


Since the last election the system has changed a little (otherwise there would have been no article), so first let's recall how everything worked before and how it works now. Each camera has a unique uid and a pool of servers from which its video is streamed. Having formed a special request from this data, you can get a link to a piece of video recorded by the selected camera.
To begin with, let's find the data on all existing cameras. The following method seemed easiest to me: search by polling station number, from 1 to 3800. To do this, send a GET to vybory.mos.ru/json/id_search/aaa/bbb.json, where bbb is the station number and aaa is len(bbb). For example: vybory.mos.ru/json/id_search/1/3.json

We get JSON with information about this station, something like this:
[{"id":7933,"name":"   №3","num":"3","location_id":1162,"address":" , 36/9","raw_address":".,   .,  36/9","is_standalone":false,"size":null,"location":{"id":1162,"address":", ,   , 36/9","raw_address":".,   .,  36/9","district_id":1,"area_id":null,"sub_area_id":null,"locality_id":1,"street_id":1590,"lat":55.753266,"lon":37.577301,"max_zoom":17}}] 


Of particular interest here is the id. Send a GET of the form vybory.mos.ru/account/channels?station_id=ID, in this case vybory.mos.ru/account/channels?station_id=7933

In response we get a string of mojibake that my editor complains about, but it contains the camera hashes and the server addresses. We extract the hashes with the regex

\$([0-9a-h]{8}-[0-9a-h]{4}-[0-9a-h]{4}-[0-9a-h]{4}-[0-9a-h]{12})

and the IP addresses with the regex

.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
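A rough sketch of this extraction step (splitting the blob on NUL bytes, as the full script below does; the session cookie is again omitted for brevity):

import re, urllib2

blob = urllib2.urlopen('http://vybory.mos.ru/account/channels?station_id=7933').read()
for record in blob.split('\x00')[1:]:
    uids = re.findall(r'\$([0-9a-h]{8}-[0-9a-h]{4}-[0-9a-h]{4}-[0-9a-h]{4}-[0-9a-h]{12})', record)
    ips = re.findall(r'.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', record)
    if uids:
        print uids[0], ' '.join(ips)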

As a result, we obtain the required information about the cameras of the station in question:
2e9dd8dc-edd4-11e2-9a6b-f0def1c0f84c 188.254.112.2 188.254.112.3 188.254.112.4
2ea32990-edd4-11e2-9a6b-f0def1c0f84c 188.254.112.2 188.254.112.3 188.254.112.4

Next come the nuances. There are three types of cameras: old, new and missing. How they differ, I will tell a little later; first let's figure out how to tell them apart, and it is very simple: send a GET of the form http://SERVER/master.m3u8?cid=UID
A new camera will return something like this:

#EXTM3U
#EXT-X-VERSION:2
#EXT-X-STREAM-INF:PROGRAM-ID=777,BANDWIDTH=3145728
/variant.m3u8?cid=e1164950-0c19-11e3-803b-00163ebf8df9&var=orig


An old camera will return something like this:
#EXTM3U
#EXT-X-MEDIA-SEQUENCE:136
#EXT-X-TARGETDURATION:15
#EXT-X-ALLOW-CACHE:NO
#EXT-X-PROGRAM-DATE-TIME:2013-09-04T12:05:40Z
#EXTINF:15,
/segment.ts?cid=2ea32990-edd4-11e2-9a6b-f0def1c0f84c&var=orig&ts=1378296340.93-1378296355.93
#EXTINF:15,
/segment.ts?cid=2ea32990-edd4-11e2-9a6b-f0def1c0f84c&var=orig&ts=1378296355.93-1378296370.93
#EXTINF:15,
/segment.ts?cid=2ea32990-edd4-11e2-9a6b-f0def1c0f84c&var=orig&ts=1378296370.93-1378296385.93
#EXTINF:15,
/segment.ts?cid=2ea32990-edd4-11e2-9a6b-f0def1c0f84c&var=orig&ts=1378296385.93-1378296400.93


A missing camera will not return anything except 404 CID Was Not Found :)
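Putting the check together, a minimal classifier sketch (the same logic the full script below uses; the server and uid come from the examples above):

import httplib

def cam_type(server, uid):
    conn = httplib.HTTPConnection(server)
    conn.request('GET', '/master.m3u8?cid=%s' % uid)
    body = conn.getresponse().read()
    conn.close()
    if '/segment.ts' in body:
        return 'old'    # plain playlist with short-lived segments
    elif '/variant.m3u8' in body:
        return 'new'    # master playlist pointing at a variant
    return 'nil'        # missing: 404 CID Was Not Found

print cam_type('188.254.112.2', '2ea32990-edd4-11e2-9a6b-f0def1c0f84c')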

Now that we can obtain information about the cameras of a particular station, let's write a multi-threaded parser that will collect all the necessary information for us. I prefer to dump the data into a free MongoLab instance, but a plain shelve would do as well. Knowing that there are 3,500+ polling stations in Moscow, let's run a loop from 1 to 3800. Below is working code, sketched on the knee, but still. You will, of course, need to put in your own cookies and Mongo server credentials.

# -*- coding: utf-8 -*-
import json, re
import httplib
import threading
from time import sleep
import Queue
from pymongo import MongoClient

client = MongoClient('mongodb://admin:@.mongolab.com:43368/elections')
db = client['elections']
data = db['data']
data.drop()

def get_data(uid):
    print uid
    headers = {'Origin': 'vybory.mos.ru',
               'X-Requested-With': 'XMLHttpRequest',
               'User-Agent': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0);',
               'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
               'Accept': '*/*',
               'Referer': 'http://vybory.mos.ru/',
               'Accept-Encoding': 'deflate,sdch',
               'Accept-Language': 'ru-RU,ru;q=0.8,en-US;q=0.6,en;q=0.4',
               'Accept-Charset': 'windows-1251,utf-8;q=0.7,*;q=0.3',
               'Cookie': 'rack.session='}
    try:
        # look the station up by its number
        conn = httplib.HTTPConnection('vybory.mos.ru')
        conn.request('GET', '/json/id_search/%d/%d.json' % (len(str(uid)), uid), None, headers)
        resp = conn.getresponse()
        try:
            content = json.loads(resp.read())[0]
            # fetch the binary blob with the station's cameras
            conn.request('GET', '/account/channels?station_id=%s' % content['id'], None, headers)
            resp = conn.getresponse()
            cont = resp.read()
            cnt = 0
            for i in cont.split('\x00')[1:]:
                cnt += 1
                uid = re.findall(r'\$([0-9a-h]{8}-[0-9a-h]{4}-[0-9a-h]{4}-[0-9a-h]{4}-[0-9a-h]{12})', i)[0]
                ip = re.findall(r'.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', i)
                # probe the first streaming server to classify the camera
                conn2 = httplib.HTTPConnection('%s' % ip[0])
                conn2.request('GET', '/master.m3u8?cid=%s' % (uid), None, headers)
                info = conn2.getresponse().read()
                conn2.close()
                if '/segment.ts' in info:
                    camtype = 'old'
                elif '/variant.m3u8' in info:
                    camtype = 'new'
                else:
                    camtype = 'nil'
                #print content
                data.insert({'name': content['name'],
                             'num': content['num'],
                             'addr': content['address'],
                             'uid': uid,
                             'ip': ip,
                             'cnt': str(cnt),
                             'type': camtype})
        except Exception, e:
            pass
    except Exception, e:
        print e
    conn.close()

queue = Queue.Queue()

def repeat():
    while True:
        try:
            item = queue.get_nowait()
        except Queue.Empty:
            break
        get_data(item)
        sleep(0.01)
        queue.task_done()

for i in xrange(1, 3800):
    queue.put(i)
for i in xrange(10):
    t = threading.Thread(target=repeat)
    t.start()
queue.join()

print data.find().count(), 'all cams'
print data.find({'type': 'nil'}).count(), 'offline cams'
print data.find({'type': 'old'}).count(), 'old cams'
print data.find({'type': 'new'}).count(), 'new cams'


Now we have a fully assembled camera base. At the time of writing there were 544 old cameras; with them, alas, everything will only work the old way.
But there are now 5778 new cameras, and they have one nice feature. Chunks from old cameras go stale after a very short time: you have to keep downloading a fresh playlist, pull the chunk links out of it and fetch them before they expire. New cameras lack this flaw. You can download chunks of arbitrary size for an arbitrary period of time by sending a GET like http://SERVER/segment.ts?cid=UID&var=orig&ts=BEGIN-END, and the interval between BEGIN and END is by no means limited to 15 seconds. I settled on chunks of 5 minutes. You can actually request an hour or more, but in some cases, as far as I can tell, if the broadcast was interrupted within the chunk's bounds, the whole chunk fails to download. Roughly speaking, if you try to download 8 hours of archive in hour-long chunks and the broadcast was down for a few minutes inside one of them, that entire hour does not download. It is therefore reasonable to choose a smaller chunk size. Algorithmization gurus (of whom, as we remember, there are 10%) can write their own binary search so that not a single second of video is lost =)
By the way, to close the question: a missing camera is one that is registered on the portal but does not actually work.
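For illustration, generating 5-minute chunk URLs for the last hour of a new camera's archive might look like this (server and uid are taken from the examples above):

from time import time

server = '188.254.112.2'
uid = '2ea32990-edd4-11e2-9a6b-f0def1c0f84c'
delta = 300  # 5-minute chunks, per the reasoning above

now = int(time())
for begin in xrange(now - 3600, now, delta):
    print 'http://%s/segment.ts?cid=%s&var=orig&ts=%d.00-%d.00' % (server, uid, begin, begin + delta)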

Let's automate the download process. Here one could build yet another multi-threaded Python bicycle, but I decided to use third-party software. We will generate a metafile with chunk links for aria2c and metafiles for tsMuxeR, and run the two tools in sequence.

For example, something like this:
# -*- coding: utf-8 -*-
from time import sleep, time
from pymongo import MongoClient
import os
import subprocess
import shutil

# directory where the dumps are put
directory = 'e:/dumps'
# chunk length, seconds
delta = 300
# polling station number
num = '666'

client = MongoClient('mongodb://:@.mongolab.com:43368/elections')
db = client['elections']
data = db['data']

# start downloading from 8 hours ago
start = int(time()) - 3600 * 8

# create a directory for the station
try:
    os.mkdir('%s/%s' % (directory, num))
except:
    pass

# walk over all cameras of the station and build the metafiles
for i in data.find({'num': num}):
    if i['type'] == 'nil':
        print 'Offline camera', i['uid']
    elif i['type'] == 'old':
        print 'Old camera', i['uid']
    else:
        print 'New camera', i['uid']
        f = open('links-%s-%s.txt' % (num, i['cnt']), 'w')
        # a separate directory per camera
        try:
            os.mkdir('%s/%s/%s' % (directory, num, i['cnt']))
        except:
            pass
        cur = start
        files = ''
        # generate chunk links until we reach the current moment
        while True:
            if cur + delta > time():
                # the final, partial chunk ends right now
                for ip in i['ip']:
                    url = 'http://{0}/segment.ts?cid={1}&var=orig&ts={2}.00-{3}'.format(ip, i['uid'], cur, time())
                    f.write('%s\t' % url)
                f.write('\n dir={0}/{1}/{2}\n out={3}.ts\n'.format(directory, num, i['cnt'], url[-27:]))
                files += '"{0}/{1}/{2}/{3}.ts"+'.format(directory, num, i['cnt'], url[-27:])
                break
            else:
                for ip in i['ip']:
                    url = 'http://{0}/segment.ts?cid={1}&var=orig&ts={2}.00-{3}.00'.format(ip, i['uid'], cur, cur + delta)
                    f.write('%s\t' % url)
                f.write('\n dir={0}/{1}/{2}\n out={3}.ts\n'.format(directory, num, i['cnt'], url[-27:]))
                files += '"{0}/{1}/{2}/{3}.ts"+'.format(directory, num, i['cnt'], url[-27:])
                cur += delta
        # write the tsMuxeR metafile that glues the chunks into one file
        m = open('%s-%s.meta' % (num, i['cnt']), 'w')
        m.write('MUXOPT --no-pcr-on-video-pid --new-audio-pes --vbr --vbv-len=500\n')
        m.write('V_MPEG4/ISO/AVC, %s, fps=23.976, insertSEI, contSPS, track=3300\n' % files[:-1])
        m.write('A_AAC, %s, timeshift=-20ms, track=3301\n' % files[:-1])
        m.close()
        f.close()
        subprocess.Popen('aria2c.exe -i links-%s-%s.txt -d %s -x 16' % (num, i['cnt'], directory), shell=True).communicate()
        subprocess.Popen('tsMuxeR.exe %s-%s.meta %s/%s-%s.ts\n' % (num, i['cnt'], directory, num, i['cnt']), shell=True).communicate()
        shutil.rmtree('%s/%s' % (directory, num))
        os.remove('%s-%s.meta' % (num, i['cnt']))
        os.remove('links-%s-%s.txt' % (num, i['cnt']))
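For reference, a sketch of what the generated files look like (paths and timestamps here are illustrative). An aria2c input file lists the mirror URLs of one chunk on a single tab-separated line, followed by indented dir/out options; a tsMuxeR metafile then glues the downloaded chunks together:

http://188.254.112.2/segment.ts?cid=...&var=orig&ts=1378296000.00-1378296300.00	http://188.254.112.3/segment.ts?cid=...&var=orig&ts=1378296000.00-1378296300.00
 dir=e:/dumps/666/1
 out=1378296000.00-1378296300.00.ts

MUXOPT --no-pcr-on-video-pid --new-audio-pes --vbr --vbv-len=500
V_MPEG4/ISO/AVC, "e:/dumps/666/1/a.ts"+"e:/dumps/666/1/b.ts", fps=23.976, insertSEI, contSPS, track=3300
A_AAC, "e:/dumps/666/1/a.ts"+"e:/dumps/666/1/b.ts", timeshift=-20ms, track=3301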


Again, the code was written solely to test the concept and is no model of PEP8 compliance, but it works fine. Download speed, for obvious reasons, depends on many factors.

UPD: There is an opinion that the old cameras are being systematically replaced with new ones. Last night there were 337 old and 5776 new; this morning, 273 old and 5811 new.

UPD: It turns out there is also webvybory2013.ru, where the picture is available in other variants as well. Everything written in this article applies to it too; only the domain needs to be changed.

UPD: Cameras are constantly changing their status, keep that in mind. Those with the old system are being replaced by new ones.

Source: https://habr.com/ru/post/192590/

