
In my first article,
“Measuring the distance to an object and its speed,” I looked at capturing images from webcams via Video4Linux2 and via DirectX. In the next article,
“Capturing Video from Network Cameras, Part 1,” I covered working with Motion-JPEG network cameras. Now I will describe capturing images from RTSP network cameras, in particular a Motion-JPEG stream over RTSP.
This task is more complicated than Motion-JPEG over HTTP: more steps and more connections are required, but in return we get more flexibility, speed, functionality, and even a certain universality. Frankly, RTSP is overkill for simple tasks, but I have no doubt there are situations where it is necessary.
What is RTSP
RTSP stands for Real Time Streaming Protocol. In essence it is a broadcast control protocol: it lets you issue a handful of commands such as “start,” “stop,” and “seek to a specific time.” The protocol resembles HTTP in its implementation: there are headers, and everything is transmitted as text. Here are the main methods from the
specification:
- OPTIONS - returns a list of supported methods (OPTIONS, DESCRIBE, etc.);
- DESCRIBE - content description request, describes each track in SDP format;
- SETUP - request to establish connections and transport for the streams;
- PLAY - start broadcasting;
- TEARDOWN - stop broadcasting.
A key feature of RTSP is that it does not itself carry the video data we need! The whole protocol exists only to set up the session. There is an analogy with MVC here: the data is separated from its description.
The workhorse is another protocol:
RTP, the Real-time Transport Protocol. This is what actually carries the data we need. It is a pleasant protocol to work with: it helps the client software reassemble data after fragmentation at the link level, and it carries several useful fields: the payload format, a timestamp, and a synchronization field (used, for example, when audio and video are transmitted simultaneously). Although RTP can run over TCP, it is usually used over UDP because it is oriented toward speed. In other words, an RTP packet is a UDP datagram consisting of an RTP header and a payload of media content.
It would seem that nothing else is needed: connect via RTSP, receive via RTP. But no, the clever folks came up with a third protocol:
RTCP, the Real-time Transport Control Protocol. It serves to monitor quality of service; with its help the client and server know how well or poorly the content is being delivered. Based on this data the server can, for example, lower the bitrate or even switch to another codec.
By convention, RTP uses an even port number and RTCP uses the next odd one.
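In code, honoring this even/odd convention just means binding a pair of adjacent UDP ports. Here is a minimal sketch; the helper name allocate_rtp_ports is mine and is not part of the client described below:

```python
import socket

def allocate_rtp_ports(start=41760):
    """Bind an even UDP port for RTP and the next odd port for RTCP."""
    port = start + (start % 2)          # round up to an even number
    while True:
        rtp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        rtcp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            rtp.bind(('', port))        # even port: RTP media
            rtcp.bind(('', port + 1))   # odd port: RTCP reports
            return rtp, rtcp, port
        except OSError:
            rtp.close()                 # pair taken, try the next even pair
            rtcp.close()
            port += 2

rtp_sock, rtcp_sock, rtp_port = allocate_rtp_ports()
```

Binding both sockets up front also reserves the pair before we advertise it to the camera in SETUP.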
RTSP communication example
I have only one source of RTSP streams, the
eVidence APIX Box M1 camera, so all the examples relate to it.
Below is a log of the exchange between the VLC player (it helps me a great deal in my research) and this camera. The first request from VLC goes to port 554 on the camera. Requests and responses are separated by an empty line, and each response begins with "RTSP/1.0".
01: OPTIONS rtsp://192.168.0.254/jpeg RTSP/1.0
02: CSeq: 1
03: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
04: 
05: RTSP/1.0 200 OK
06: CSeq: 1
07: Date: Fri, Apr 23 2010 19:54:20 GMT
08: Public: OPTIONS, DESCRIBE, SETUP, TEARDOWN, PLAY, PAUSE
09: 
10: DESCRIBE rtsp://192.168.0.254/jpeg RTSP/1.0
11: CSeq: 2
12: Accept: application/sdp
13: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
14: 
15: RTSP/1.0 200 OK
16: CSeq: 2
17: Date: Fri, Apr 23 2010 19:54:20 GMT
18: Content-Base: rtsp://192.168.0.254/jpeg/
19: Content-Type: application/sdp
20: Content-Length: 442
21: x-Accept-Dynamic-Rate: 1
22: 
23: v=0
24: o=- 1272052389382023 1 IN IP4 0.0.0.0
25: s=Session streamed by "nessyMediaServer"
26: i=jpeg
27: t=0 0
28: a=tool:LIVE555 Streaming Media v2008.04.09
29: a=type:broadcast
30: a=control:*
31: a=range:npt=0-
32: a=x-qt-text-nam:Session streamed by "nessyMediaServer"
33: a=x-qt-text-inf:jpeg
34: m=video 0 RTP/AVP 26
35: c=IN IP4 0.0.0.0
36: a=control:track1
37: a=cliprect:0,0,720,1280
38: a=framerate:25.000000
39: m=audio 7878 RTP/AVP 0
40: a=rtpmap:0 PCMU/8000/1
41: a=control:track2
42: 
43: 
44: SETUP rtsp://192.168.0.254/jpeg/track1 RTSP/1.0
45: CSeq: 3
46: Transport: RTP/AVP;unicast;client_port=41760-41761
47: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
48: 
49: RTSP/1.0 200 OK
50: CSeq: 3
51: Cache-Control: must-revalidate
52: Date: Fri, Apr 23 2010 19:54:20 GMT
53: Transport: RTP/AVP;unicast;destination=192.168.0.4;source=192.168.0.254;client_port=41760-41761;server_port=6970-6971
54: Session: 1
55: x-Transport-Options: late-tolerance=1.400000
56: x-Dynamic-Rate: 1
57: 
58: SETUP rtsp://192.168.0.254/jpeg/track2 RTSP/1.0
59: CSeq: 4
60: Transport: RTP/AVP;unicast;client_port=7878-7879
61: Session: 1
62: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
63: 
64: RTSP/1.0 200 OK
65: CSeq: 4
66: Cache-Control: must-revalidate
67: Date: Fri, Apr 23 2010 19:54:20 GMT
68: Transport: RTP/AVP;unicast;destination=192.168.0.4;source=192.168.0.254;client_port=7878-7879;server_port=6972-6973
69: Session: 1
70: x-Transport-Options: late-tolerance=1.400000
71: x-Dynamic-Rate: 1
72: 
73: PLAY rtsp://192.168.0.254/jpeg/ RTSP/1.0
74: CSeq: 5
75: Session: 1
76: Range: npt=0.000-
77: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
78: 
79: RTSP/1.0 200 OK
80: CSeq: 5
81: Date: Fri, Apr 23 2010 19:54:20 GMT
82: Range: npt=0.000-
83: Session: 1
84: RTP-Info: url=rtsp://192.168.0.254/jpeg/track1;seq=20730;rtptime=3869319494,url=rtsp://192.168.0.254/jpeg/track2;seq=33509;rtptime=3066362516
85: 
86: #
87: 
88: TEARDOWN rtsp://192.168.0.254/jpeg/ RTSP/1.0
89: CSeq: 6
90: Session: 1
91: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
92: 
93: RTSP/1.0 200 OK
94: CSeq: 6
95: Date: Fri, Apr 23 2010 19:54:25 GMT
First of all, VLC asks the camera:
- What can you do for me? (OPTIONS)
- And hello to you too. You may ask me for any of OPTIONS, DESCRIBE, SETUP, TEARDOWN, PLAY and PAUSE.
- Okay, then tell me what you have at "/jpeg". (DESCRIBE)
- I have video here on the first track, M-JPEG, and the second track is plain audio.
- The video looks interesting. Pour the first track into port 41760 for me, please, and you can drop any service chatter into port 41761. (SETUP track1)
- OK, at your command...
- And I also want to hear the sound: pour it into ports 7878 and 7879. (SETUP track2)
- No problem.
- Well then, let it flow. (PLAY)
Some time later:
- Okay, that's enough, I have seen plenty. (TEARDOWN)
- As you wish.
That ends the little lyrical digression. The first request, "
OPTIONS rtsp://192.168.0.254/jpeg RTSP/1.0
", resembles "
GET /jpeg HTTP/1.1
" in that the conversation starts with it, and HTTP has an OPTIONS method too. Here 192.168.0.254 is the IP address of my camera.
CSeq carries the sequence number of the request; the server's response must contain the same CSeq.
The server's response starts with "
RTSP/1.0 200 OK
", just like "
HTTP/1.1 200 OK
": a sign that everything is fine, the request was received, understood, and executed without problems. The Public header then lists all the methods available to us in plain text.
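Since the framing is plain text, such requests are easy to produce with ordinary string formatting. A sketch under my own naming (rtsp_request is not a function from the client below), shown with the camera's address from the log:

```python
def rtsp_request(method, url, cseq, extra_headers=()):
    """Build an RTSP request: request line, CSeq, extra headers, blank line."""
    lines = ['%s %s RTSP/1.0' % (method, url), 'CSeq: %d' % cseq]
    lines.extend(extra_headers)
    return ('\r\n'.join(lines) + '\r\n\r\n').encode('ascii')

req = rtsp_request('DESCRIBE', 'rtsp://192.168.0.254/jpeg', 2,
                   ['Accept: application/sdp'])

# Against a live camera this would go over TCP port 554, e.g.:
# sock = socket.create_connection(('192.168.0.254', 554))
# sock.sendall(req)
# reply = sock.recv(4096)   # expected to start with b'RTSP/1.0 200 OK'
```

The commented-out lines show the intended usage; they are not run here since they need a real camera.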
Next we gather information about what awaits us at /jpeg, since that is where the link "
rtsp://192.168.0.254/jpeg
" took us. We also indicate that we want the response in SDP form (line 12).
In reply we receive an RTSP header specifying
Content-Type
and
Content-Length
, and after the header, separated by an empty line, the content itself in SDP format:
v=0
o=- 1272052389382023 1 IN IP4 0.0.0.0
s=Session streamed by "nessyMediaServer"
i=jpeg
t=0 0
a=tool:LIVE555 Streaming Media v2008.04.09
a=type:broadcast
a=control:*
a=range:npt=0-
a=x-qt-text-nam:Session streamed by "nessyMediaServer"
a=x-qt-text-inf:jpeg
m=video 0 RTP/AVP 26
c=IN IP4 0.0.0.0
a=control:track1
a=cliprect:0,0,720,1280
a=framerate:25.000000
m=audio 7878 RTP/AVP 0
a=rtpmap:0 PCMU/8000/1
a=control:track2
Everything is pretty obvious here. We need the following lines:
m=video 0 RTP/AVP 26     # video, transported over RTP/AVP, payload type 26 = Motion-JPEG
a=control:track1         # name of the video track
a=cliprect:0,0,720,1280  # frame dimensions
a=framerate:25.000000    # frame rate

m=audio 7878 RTP/AVP 0   # audio on port 7878, payload type 0 = PCM
a=control:track2         # name of the audio track
If we only want to receive video, we ignore everything in the audio data except the track name. We need the name to set up the stream, but nobody forces us to actually accept that stream; the camera, however, refuses to work if the audio is ignored completely (if
SETUP
is done only for the video track).
Honestly, I do not know how other cameras react if we disregard the advertised port number for the audio stream (7878), since we specify our own with the
SETUP
command.
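Pulling the track names, ports, and payload types out of the SDP takes only a few lines of string handling. A sketch (parse_sdp_tracks is my name, not a module from this article), fed with the m= and a=control lines from the camera's answer:

```python
def parse_sdp_tracks(sdp):
    """Collect media type, port, payload type and control name per m= section."""
    tracks = []
    for line in sdp.splitlines():
        if line.startswith('m='):
            # e.g. 'm=video 0 RTP/AVP 26' -> media, port, transport, format
            media, port, _transport, fmt = line[2:].split()[:4]
            tracks.append({'media': media, 'port': int(port), 'payload': int(fmt)})
        elif line.startswith('a=control:') and tracks:
            # attach the control (track) name to the current m= section;
            # the session-level 'a=control:*' line is skipped by the guard above
            tracks[-1]['control'] = line[len('a=control:'):]
    return tracks

sdp = '\n'.join([
    'v=0',
    'a=control:*',
    'm=video 0 RTP/AVP 26',
    'a=control:track1',
    'a=framerate:25.000000',
    'm=audio 7878 RTP/AVP 0',
    'a=rtpmap:0 PCMU/8000/1',
    'a=control:track2',
])
tracks = parse_sdp_tracks(sdp)
```

This is deliberately minimal; a real SDP parser would also keep the a= attributes we discard here.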
Next come two
SETUP
requests specifying the ports on which we would like to receive the video and audio streams. The first number is the port for RTP, the second for RTCP. The camera's response echoes the port information; you can check it to make sure everything is configured correctly. We also need to remember the
Session
ID: it must be specified in all subsequent requests.
After the
PLAY
command, the camera starts transmitting video to port 41760 and audio to port 7878. The
TEARDOWN
command stops the broadcast, and the connection is closed.
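The server_port and session values we need can be fished out of the SETUP response with simple splitting. A sketch (parse_transport is my own helper), using the Transport header from the log above:

```python
def parse_transport(value):
    """Split an RTSP Transport header into a dict of its key=value parts."""
    params = {}
    for part in value.split(';'):
        key, sep, val = part.partition('=')
        if sep:  # skip bare tokens such as 'RTP/AVP' and 'unicast'
            params[key.strip()] = val.strip()
    return params

header = ('RTP/AVP;unicast;destination=192.168.0.4;source=192.168.0.254;'
          'client_port=41760-41761;server_port=6970-6971')
t = parse_transport(header)
rtp_port, rtcp_port = (int(p) for p in t['client_port'].split('-'))
```

Comparing the echoed client_port pair against the ports we asked for is exactly the sanity check mentioned above.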
MJPEG over RTP
RTP packets are now arriving, and we need to decode them. Here is the layout of such a packet, with a description of every field.
+ Bit offset | 0-1 | 2 | 3 | 4-7 | 8 | 9-15 | 16-31 |
| 0 | V | P | X | CC | M | PT | Sequence Number |
| 32 | Timestamp |
| 64 | SSRC Identifier |
| 96 | ... CSRC Identifiers ... |
| 96+(CC × 32) | Extension Header ID | Extension Header Length (EHL) |
| 96+(CC × 32)+(X × 32) | ... Extension Header ... |
| 96+(CC × 32)+(X × 32)+(X × EHL) | Payload |
- V (Version): (2) protocol version; currently version 2.
- P (Padding): (1) set when the RTP packet is padded with empty bytes at the end, for example for encryption algorithms.
- X (Extension): (1) indicates the presence of an application-defined extension header. Not used in our case.
- CC (CSRC Count): (4) the number of CSRC identifiers. We do not use these either.
- M (Marker): (1) application-defined; in our case this bit is set to one when the RTP packet contains the end of a JPEG frame.
- PT (Payload Type): (7) the format of the payload, that is, of the transmitted data. For MJPEG it is 26.
- Sequence Number: (16) the RTP packet number, used to detect lost packets.
- Timestamp: (32) timestamp; in our case the clock runs at 90000 Hz (90000 units = 1 second).
- SSRC (Synchronization Source): (32) the synchronization source identifier, funny as that sounds; it identifies the source of the stream.
- CSRC (Contributing Source): (32) identifiers of additional sources, used when our stream is mixed from several places.
- Extension Header ID: (16) identifies the extension, so the application knows what it is dealing with. Not used in our case.
- Extension Header Length: (16) the length of the extension header in bytes.
- Extension Header: the header itself; its content varies with the context.
- Payload: the payload, our actual JPEG frames. Fragmented, of course.
The fields from CSRC onward are optional; as far as I know, cameras do not use them when transmitting MJPEG.
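The Marker bit and the timestamp are exactly what we need to glue fragments back into frames: payloads sharing a timestamp belong to one frame, and the packet with M set closes it. A simplified sketch of that logic (the FrameAssembler class is mine; the article's client does this work inside rfc2435jpeg.py):

```python
class FrameAssembler:
    """Accumulate RTP payloads until a packet with the Marker bit arrives."""
    def __init__(self):
        self.timestamp = None
        self.parts = []

    def push(self, timestamp, marker, payload):
        if timestamp != self.timestamp:   # new timestamp: a new frame begins
            self.timestamp = timestamp
            self.parts = []
        self.parts.append(payload)
        if marker:                        # Marker set: the frame is complete
            frame = b''.join(self.parts)
            self.parts = []
            return frame
        return None
```

A production version would also use Sequence Number to detect and discard frames with lost fragments.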
Let us climb one level of encapsulation. The task now is to turn the received video data into a complete JPEG image. With MJPEG over HTTP everything was simple: we cut a piece out of the stream and immediately work with it as a JPEG image. With RTP the image is not transmitted in full; the JPEG header is omitted to save traffic, and it must be reconstructed from the accompanying data.
The RTP payload format for MJPEG is specified in
RFC 2435. Here too is a table describing all the fields of the format:
+ Bit offset | 0-7 | 8-15 | 16-23 | 24-31 |
| 0 | Type-specific | Fragment Offset |
| 32 | Type | Q | Width | Height |
| if Type in 64..127 | Restart Marker header |
| if Q in 128..255 | MBZ | Precision | Length |
| | Quantization Table Data |
- Type-specific: (8) the meaning of the field depends on the implementation; not used in our case.
- Fragment Offset: (24) the position of the current fragment within the whole frame.
- Type: (8) determines how the image is to be reconstructed.
- Q (Quality): (8) image quality.
- Width: (8) frame width, in 8-pixel units.
- Height: (8) frame height, likewise.
- Restart Marker header: (32) used when decoding JPEG if RST markers are present. I do not know whether cameras use them or not, but I ignore this header. The field appears only when Type is between 64 and 127.
- Quantization Table Data (quantization tables): if they are present, we do not need to compute them ourselves. They are required to reconstruct the image correctly from the JPEG data; with wrong tables the image comes out with distorted colors and contrast. There should be two tables, luma and chroma, for brightness and chromaticity respectively.
- MBZ, Precision, Length: (32) parameters of the quantization tables. I ignore them and assume Length equals 128: two tables of 64 bytes each. I do not know how to handle anything else.
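The fixed part of this header is eight bytes and can be unpacked directly. A sketch following the table above (parse_jpeg_header is my name; note that Width and Height are stored in 8-pixel units, which is why an 8-bit field is enough for a 1280-pixel-wide frame):

```python
from struct import unpack

def parse_jpeg_header(payload):
    """Unpack the fixed 8-byte header of an RFC 2435 JPEG payload."""
    type_specific, off_hi, off_lo16, jtype, q, width, height = \
        unpack('!BBHBBBB', payload[:8])
    return {
        'fragment_offset': (off_hi << 16) | off_lo16,  # 24-bit field
        'type': jtype,
        'q': q,
        'width': width * 8,     # stored divided by 8
        'height': height * 8,
    }

# The camera's 1280x720 M-JPEG frame would carry Width=160, Height=90
h = parse_jpeg_header(bytes([0, 0, 0, 0, 1, 60, 160, 90]))
```

The JPEG data itself then starts at payload[8:] (plus the optional headers described above, when present).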
The Restart Marker header and the quantization tables may be absent. If the first is missing, all the better, since I do not handle anything beyond that anyway. If the second is missing, the required tables are computed from the Q parameter.
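The computation from Q follows Appendix A of RFC 2435: clamp Q to 1..99, derive a scale factor, and scale the standard JPEG base tables. A sketch showing only the luminance table (the chrominance base table from the RFC is scaled the same way):

```python
# Standard JPEG luminance quantization table (JPEG spec, Table K.1),
# as used by RFC 2435 Appendix A.
JPEG_LUMA_QUANTIZER = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]

def scale_table(base, q):
    """Scale a base quantization table by the Q factor, per RFC 2435."""
    q = min(max(q, 1), 99)
    factor = 5000 // q if q < 50 else 200 - q * 2
    # each entry is scaled, rounded, and clamped to the valid 1..255 range
    return [min(max((b * factor + 50) // 100, 1), 255) for b in base]

luma = scale_table(JPEG_LUMA_QUANTIZER, 50)   # Q=50 reproduces the base table
```

Lower Q gives larger divisors (coarser quantization, worse quality); Q=50 leaves the base table unchanged.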
RTCP packets come in several types: 200 is the sender's report (SR), 201 the receiver's report (RR), 202 the source description (SDES), 203 the goodbye packet (BYE), and 204 is application-defined (APP). We must first receive an SR, then send an RR. The other types are optional, but I handle them as well. A single UDP packet may contain several RTCP packets.
All types have a similar structure. Any RTCP packet starts with the following data:
+ Bit offset | 0-1 | 2 | 3-7 | 8-15 | 16-31 |
| 0 | Version | Padding | SC or RC or Subtype | Packet type | Length |
- Version: (2) the RTP version.
- Padding: (1) the same as in RTP.
- SC or RC or Subtype: (5) depending on the packet type, the number of sources (Sources Count) or receivers (Receivers Count) covered by the sender's or receiver's report, respectively. For an APP packet this field defines the packet's subtype.
- Packet Type: (8) the packet type: 200 is a Sender's Report (SR), 201 a Receiver's Report (RR), 202 a Source Description (SDES), and 204 is application-defined (APP).
- Length: (16) the size of the data following the header, measured in 32-bit words.
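Unpacking this common header takes a single struct call. A small sketch in the spirit of the article's rtcp_datagram.py (but not the actual module):

```python
from struct import unpack

def parse_rtcp_header(data):
    """Unpack the 4-byte common header shared by all RTCP packet types."""
    v_p_count, ptype, length = unpack('!BBH', data[:4])
    return {
        'version': (v_p_count & 0b11000000) >> 6,
        'padding': (v_p_count & 0b00100000) >> 5,
        'count':    v_p_count & 0b00011111,   # SC / RC / Subtype
        'packet_type': ptype,                 # 200..204
        'length': length,                     # in 32-bit words
    }

# A Sender's Report header: V=2, P=0, RC=1, PT=200, Length=6
hdr = parse_rtcp_header(bytes([0x81, 200, 0x00, 0x06]))
```

Because several RTCP packets can share one UDP datagram, the Length field also tells us where the next packet starts.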
I will not list the fields of every subtype here; they can be found in
RFC 3550. Suffice it to say that SR and RR carry statistics about sent and received packets and timing delays, while SDES carries various text fields describing the source, such as its name, email, telephone, location, and so on.
That concludes the introduction.
Python MJPEG over RTSP client
So we have finally reached Python. The client consists of several files.
main.py
contains the callback function that processes the received images; it also starts the machinery of the Twisted networking framework and stores the connection parameters for the camera. All the listings I quote here are abridged; the full version can be downloaded from the link at the end of the article.
main.py
20: def processImage(img):
21:     'This function is invoked by the MJPEG Client protocol'
22: 
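A processImage callback can be as trivial as dumping each frame to disk. A sketch of such a callback (the numbered file-name scheme and the return value are mine, not the article's):

```python
import itertools

_frame_numbers = itertools.count()

def processImage(img):
    """Write each reconstructed JPEG frame to a numbered file."""
    name = 'frame_%05d.jpg' % next(_frame_numbers)
    with open(name, 'wb') as f:
        f.write(img)   # img is the complete JPEG byte string
    return name
```

In the real client this callback receives the images assembled by rfc2435jpeg.py described below.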
In principle, you can get by without implementing RTCP and without receiving the audio data. In that case the camera drops the connection after about a minute, and you have to reconnect all the time; this happens automatically, so it causes no real problems. Still, for this article I implemented part of RTCP and prepared the ground for receiving audio data.
The next important file is
rtsp_client.py
. It is the most tangled of them all, but its goal is clear: to establish the connection correctly, as described above.
rtsp_client.py
012: class RTSPClient(Protocol):
013:     def __init__(self):
014:         self.config = {}
015:         self.wait_description = False
016: 
017:     def connectionMade(self):
018:         self.session = 1
019: 
If there is an audio track, this module also starts
rtp_audio_client.py
and the corresponding RTCP client.
After a successful connection,
rtp_mjpeg_client.py
takes over and processes the incoming data stream.
rtp_mjpeg_client.py
08: class RTP_MJPEG_Client(DatagramProtocol):
09:     def __init__(self, config):
10:         self.config = config
11: 
It is easy to understand. Each time we receive another datagram, we parse it with the
rtp_datagram.py
module and feed the result to the
rfc2435jpeg.py
module, which builds a complete JPEG image. Then we wait for the
rtp_dg.Marker
flag, and as soon as it appears we call the callback function with the reconstructed image.
The RTP datagram parser looks like this:
rtp_datagram.py
26: def parse(self):
27:     Ver_P_X_CC, M_PT, self.SequenceNumber, self.Timestamp, self.SyncSourceIdentifier = unpack('!BBHII', self.Datagram[:12])
28:     self.Version = (Ver_P_X_CC & 0b11000000) >> 6
29:     self.Padding = (Ver_P_X_CC & 0b00100000) >> 5
30:     self.Extension = (Ver_P_X_CC & 0b00010000) >> 4
31:     self.CSRCCount = Ver_P_X_CC & 0b00001111
32:     self.Marker = (M_PT & 0b10000000) >> 7
33:     self.PayloadType = M_PT & 0b01111111
34:     i = 0
35:     for i in range(0, self.CSRCCount * 4, 4):
36:         self.CSRS.append(unpack('!I', self.Datagram[12+i:16+i])[0])
37:     i = self.CSRCCount * 4
38:     if self.Extension:
39:         (self.ExtensionHeaderID, self.ExtensionHeaderLength) = unpack('!HH', self.Datagram[12+i:16+i])
40:         self.ExtensionHeader = self.Datagram[16+i:16+i+self.ExtensionHeaderLength]
41:         i += 4 + self.ExtensionHeaderLength
42:     self.Payload = self.Datagram[12+i:]
The JPEG reconstruction module is rather large, as it contains several tables and a long header-generation function. I will omit those parts here and give only the functions that parse the RTP payload and assemble the final JPEG image.
rfc2435jpeg.py
287: def parse(self):
288:     HOffset = 0
289:     LOffset = 0
290: 
I also implemented the audio receiving module,
rtp_audio_client.py
, but I do not convert the data into a playable form. If anyone needs that, the file sketches how everything should work; only the parsing needs to be added, along the lines of
rfc2435jpeg.py
. Audio data is simpler because it is not fragmented: every packet carries enough data to play. I will not quote this module here, since the article is already very long.
For correct operation we need to receive and send RTCP packets: accept Sender's Reports and send Receiver's Reports. To simplify the task, we send our RR immediately after receiving an SR from the camera and fill it with idealized data saying that everything is fine.
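Packing such an "everything is fine" Receiver's Report takes one struct call per section. A sketch under my own naming (build_rr), with the report-block layout from RFC 3550 and zeroed loss and jitter fields:

```python
from struct import pack

def build_rr(our_ssrc, sender_ssrc, highest_seq):
    """An RTCP Receiver's Report with one report block and ideal statistics."""
    header = pack('!BBH', 0x81, 201, 7)   # V=2, P=0, RC=1; PT=201 (RR);
                                          # length = 7 32-bit words follow
    reporter = pack('!I', our_ssrc)       # SSRC of this receiver
    block = pack('!IIIIII',
                 sender_ssrc,             # source this block reports on
                 0,                       # fraction lost + cumulative lost
                 highest_seq,             # extended highest sequence number
                 0,                       # interarrival jitter
                 0,                       # last SR timestamp (LSR)
                 0)                       # delay since last SR (DLSR)
    return header + reporter + block

rr = build_rr(0x12345678, 0x9ABCDEF0, 20730)   # 32 bytes in total
```

The resulting datagram is sent from our odd RTCP port to the camera's odd server port each time an SR arrives.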
rtcp_client.py
09: class RTCP_Client(DatagramProtocol):
10:     def __init__(self):
11: 
The rtcp_datagram.py module works directly with the RTCP datagrams. It also turned out rather large.
rtcp_datagram.py
049: def parse(self):
050: 
The parsing follows the RFC strictly. I use the unpack function to convert the data into numeric variables and move through the data array with an off variable that holds the current offset.

Here is the link: Python MJPEG over RTSP client.

I did not have the energy to prepare a version of the listings with Russian comments, so forgive me if that is inconvenient for anyone.

Useful reading:
- Multimedia over the Internet
- List of RTP profiles for audio and video
And that is the end of the article. If you made it all the way through, well done!