
In my first article,
“Measuring the distance to an object and its speed,” I looked at capturing images from webcams via Video4Linux2 and via DirectX. In the next article,
“Capturing Video from Network Cameras, Part 1,” I covered working with Motion-JPEG network cameras. Now I will describe capturing images from RTSP network cameras, in particular a Motion-JPEG stream over RTSP.
This task is more complicated than Motion-JPEG over HTTP: more steps and more connections are required, but in return we get more flexibility, speed, functionality, and even a certain universality. Frankly, RTSP is overkill for simple tasks, but I have no doubt there are situations where it is necessary.
What is RTSP
RTSP stands for Real Time Streaming Protocol. In essence it is a broadcast control protocol: it lets you issue a handful of commands such as “start,” “stop,” and “seek to a specific time.” The protocol resembles HTTP in its implementation: there are headers, and everything is transmitted as text. Here are the main methods from the
specification:
- OPTIONS - returns a list of supported methods (OPTIONS, DESCRIBE, etc.);
- DESCRIBE - content description request, describes each track in SDP format;
- SETUP - request to establish connections and transport for the streams;
- PLAY - start broadcasting;
- TEARDOWN - stop broadcasting.
A key feature of RTSP is that it does not itself carry the video data we need! The whole protocol exists only to set up the session. There is an analogy with MVC here: the data is separated from its description.
The workhorse is another protocol:
RTP, the Real-time Transport Protocol. This is what actually carries the data we need. It is a pleasant protocol to work with: it helps the client software reassemble data after fragmentation at the link level, and it carries several useful fields: the payload format, a timestamp, and a synchronization field (used, for example, when audio and video are transmitted simultaneously). Although RTP can run over TCP, it is usually used over UDP because it is oriented toward speed. In other words, an RTP packet is a UDP datagram consisting of an RTP header and a payload of media content.
It would seem that nothing else is needed: connect via RTSP, receive via RTP. But no, the clever folks came up with a third protocol:
RTCP, the Real-time Transport Control Protocol. It serves to monitor quality of service; with its help the client and server know how well or poorly the content is being delivered. Based on this data the server can, for example, lower the bitrate or even switch to another codec.
By convention, RTP uses an even port number and RTCP uses the next odd one.
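In code, honoring this even/odd convention just means binding a pair of adjacent UDP ports. Here is a minimal sketch; the helper name allocate_rtp_ports is mine and is not part of the client described below:

```python
import socket

def allocate_rtp_ports(start=41760):
    """Bind an even UDP port for RTP and the next odd port for RTCP."""
    port = start + (start % 2)          # round up to an even number
    while True:
        rtp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        rtcp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            rtp.bind(('', port))        # even port: RTP media
            rtcp.bind(('', port + 1))   # odd port: RTCP reports
            return rtp, rtcp, port
        except OSError:
            rtp.close()                 # pair taken, try the next even pair
            rtcp.close()
            port += 2

rtp_sock, rtcp_sock, rtp_port = allocate_rtp_ports()
```

Binding both sockets up front also reserves the pair before we advertise it to the camera in SETUP.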
RTSP communication example
I have only one source of RTSP streams, the
eVidence APIX Box M1 camera, so all the examples relate to it.
Below is a log of the exchange between the VLC player (it helps me a great deal in my research) and this camera. The first request from VLC goes to port 554 on the camera. Requests and responses are separated by an empty line, and each response begins with "RTSP/1.0".
01: OPTIONS rtsp://192.168.0.254/jpeg RTSP/1.0
02: CSeq: 1
03: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
04: 
05: RTSP/1.0 200 OK
06: CSeq: 1
07: Date: Fri, Apr 23 2010 19:54:20 GMT
08: Public: OPTIONS, DESCRIBE, SETUP, TEARDOWN, PLAY, PAUSE
09: 
10: DESCRIBE rtsp://192.168.0.254/jpeg RTSP/1.0
11: CSeq: 2
12: Accept: application/sdp
13: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
14: 
15: RTSP/1.0 200 OK
16: CSeq: 2
17: Date: Fri, Apr 23 2010 19:54:20 GMT
18: Content-Base: rtsp://192.168.0.254/jpeg/
19: Content-Type: application/sdp
20: Content-Length: 442
21: x-Accept-Dynamic-Rate: 1
22: 
23: v=0
24: o=- 1272052389382023 1 IN IP4 0.0.0.0
25: s=Session streamed by "nessyMediaServer"
26: i=jpeg
27: t=0 0
28: a=tool:LIVE555 Streaming Media v2008.04.09
29: a=type:broadcast
30: a=control:*
31: a=range:npt=0-
32: a=x-qt-text-nam:Session streamed by "nessyMediaServer"
33: a=x-qt-text-inf:jpeg
34: m=video 0 RTP/AVP 26
35: c=IN IP4 0.0.0.0
36: a=control:track1
37: a=cliprect:0,0,720,1280
38: a=framerate:25.000000
39: m=audio 7878 RTP/AVP 0
40: a=rtpmap:0 PCMU/8000/1
41: a=control:track2
42: 
43: 
44: SETUP rtsp://192.168.0.254/jpeg/track1 RTSP/1.0
45: CSeq: 3
46: Transport: RTP/AVP;unicast;client_port=41760-41761
47: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
48: 
49: RTSP/1.0 200 OK
50: CSeq: 3
51: Cache-Control: must-revalidate
52: Date: Fri, Apr 23 2010 19:54:20 GMT
53: Transport: RTP/AVP;unicast;destination=192.168.0.4;source=192.168.0.254;client_port=41760-41761;server_port=6970-6971
54: Session: 1
55: x-Transport-Options: late-tolerance=1.400000
56: x-Dynamic-Rate: 1
57: 
58: SETUP rtsp://192.168.0.254/jpeg/track2 RTSP/1.0
59: CSeq: 4
60: Transport: RTP/AVP;unicast;client_port=7878-7879
61: Session: 1
62: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
63: 
64: RTSP/1.0 200 OK
65: CSeq: 4
66: Cache-Control: must-revalidate
67: Date: Fri, Apr 23 2010 19:54:20 GMT
68: Transport: RTP/AVP;unicast;destination=192.168.0.4;source=192.168.0.254;client_port=7878-7879;server_port=6972-6973
69: Session: 1
70: x-Transport-Options: late-tolerance=1.400000
71: x-Dynamic-Rate: 1
72: 
73: PLAY rtsp://192.168.0.254/jpeg/ RTSP/1.0
74: CSeq: 5
75: Session: 1
76: Range: npt=0.000-
77: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
78: 
79: RTSP/1.0 200 OK
80: CSeq: 5
81: Date: Fri, Apr 23 2010 19:54:20 GMT
82: Range: npt=0.000-
83: Session: 1
84: RTP-Info: url=rtsp://192.168.0.254/jpeg/track1;seq=20730;rtptime=3869319494,url=rtsp://192.168.0.254/jpeg/track2;seq=33509;rtptime=3066362516
85: 
86: #
87: 
88: TEARDOWN rtsp://192.168.0.254/jpeg/ RTSP/1.0
89: CSeq: 6
90: Session: 1
91: User-Agent: VLC media player (LIVE555 Streaming Media v2008.07.24)
92: 
93: RTSP/1.0 200 OK
94: CSeq: 6
95: Date: Fri, Apr 23 2010 19:54:25 GMT
First of all, VLC asks the camera:
- What can you do for me? (OPTIONS)
- And hello to you too. You may ask me for any of OPTIONS, DESCRIBE, SETUP, TEARDOWN, PLAY and PAUSE.
- Okay, then tell me what you have at "/jpeg". (DESCRIBE)
- I have video here on the first track, M-JPEG, and the second track is plain audio.
- The video looks interesting. Pour the first track into port 41760 for me, please, and you can drop any service chatter into port 41761. (SETUP track1)
- OK, at your command...
- And I also want to hear the sound: pour it into ports 7878 and 7879. (SETUP track2)
- No problem.
- Well then, let it flow. (PLAY)
Some time later:
- Okay, that's enough, I have seen plenty. (TEARDOWN)
- As you wish.
That ends the little lyrical digression. The first request, "
OPTIONS rtsp://192.168.0.254/jpeg RTSP/1.0
", resembles "
GET /jpeg HTTP/1.1
" in that the conversation starts with it, and HTTP has an OPTIONS method too. Here 192.168.0.254 is the IP address of my camera.
CSeq carries the sequence number of the request; the server's response must contain the same CSeq.
The server's response starts with "
RTSP/1.0 200 OK
", just like "
HTTP/1.1 200 OK
": a sign that everything is fine, the request was received, understood, and executed without problems. The Public header then lists all the methods available to us in plain text.
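Since the framing is plain text, such requests are easy to produce with ordinary string formatting. A sketch under my own naming (rtsp_request is not a function from the client below), shown with the camera's address from the log:

```python
def rtsp_request(method, url, cseq, extra_headers=()):
    """Build an RTSP request: request line, CSeq, extra headers, blank line."""
    lines = ['%s %s RTSP/1.0' % (method, url), 'CSeq: %d' % cseq]
    lines.extend(extra_headers)
    return ('\r\n'.join(lines) + '\r\n\r\n').encode('ascii')

req = rtsp_request('DESCRIBE', 'rtsp://192.168.0.254/jpeg', 2,
                   ['Accept: application/sdp'])

# Against a live camera this would go over TCP port 554, e.g.:
# sock = socket.create_connection(('192.168.0.254', 554))
# sock.sendall(req)
# reply = sock.recv(4096)   # expected to start with b'RTSP/1.0 200 OK'
```

The commented-out lines show the intended usage; they are not run here since they need a real camera.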
Next we gather information about what awaits us at /jpeg, since that is where the link "
rtsp://192.168.0.254/jpeg
" took us. We also indicate that we want the response in SDP form (line 12).
In reply we receive an RTSP header specifying
Content-Type
and
Content-Length
, and after the header, separated by an empty line, the content itself in SDP format:
v=0
o=- 1272052389382023 1 IN IP4 0.0.0.0
s=Session streamed by "nessyMediaServer"
i=jpeg
t=0 0
a=tool:LIVE555 Streaming Media v2008.04.09
a=type:broadcast
a=control:*
a=range:npt=0-
a=x-qt-text-nam:Session streamed by "nessyMediaServer"
a=x-qt-text-inf:jpeg
m=video 0 RTP/AVP 26
c=IN IP4 0.0.0.0
a=control:track1
a=cliprect:0,0,720,1280
a=framerate:25.000000
m=audio 7878 RTP/AVP 0
a=rtpmap:0 PCMU/8000/1
a=control:track2
Everything is pretty obvious here. We need the following lines:
m=video 0 RTP/AVP 26     # video, transported over RTP/AVP, payload type 26 = Motion-JPEG
a=control:track1         # name of the video track
a=cliprect:0,0,720,1280  # frame dimensions
a=framerate:25.000000    # frame rate

m=audio 7878 RTP/AVP 0   # audio on port 7878, payload type 0 = PCM
a=control:track2         # name of the audio track
If we only want to receive video, we ignore everything in the audio data except the track name. We need the name to set up the stream, but nobody forces us to actually accept that stream; the camera, however, refuses to work if the audio is ignored completely (if
SETUP
is done only for the video track).
Honestly, I do not know how other cameras react if we disregard the advertised port number for the audio stream (7878), since we specify our own with the
SETUP
command.
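Pulling the track names, ports, and payload types out of the SDP takes only a few lines of string handling. A sketch (parse_sdp_tracks is my name, not a module from this article), fed with the m= and a=control lines from the camera's answer:

```python
def parse_sdp_tracks(sdp):
    """Collect media type, port, payload type and control name per m= section."""
    tracks = []
    for line in sdp.splitlines():
        if line.startswith('m='):
            # e.g. 'm=video 0 RTP/AVP 26' -> media, port, transport, format
            media, port, _transport, fmt = line[2:].split()[:4]
            tracks.append({'media': media, 'port': int(port), 'payload': int(fmt)})
        elif line.startswith('a=control:') and tracks:
            # attach the control (track) name to the current m= section;
            # the session-level 'a=control:*' line is skipped by the guard above
            tracks[-1]['control'] = line[len('a=control:'):]
    return tracks

sdp = '\n'.join([
    'v=0',
    'a=control:*',
    'm=video 0 RTP/AVP 26',
    'a=control:track1',
    'a=framerate:25.000000',
    'm=audio 7878 RTP/AVP 0',
    'a=rtpmap:0 PCMU/8000/1',
    'a=control:track2',
])
tracks = parse_sdp_tracks(sdp)
```

This is deliberately minimal; a real SDP parser would also keep the a= attributes we discard here.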
Next come two
SETUP
requests specifying the ports on which we would like to receive the video and audio streams. The first number is the port for RTP, the second for RTCP. The camera's response echoes the port information; you can check it to make sure everything is configured correctly. We also need to remember the
Session
ID: it must be specified in all subsequent requests.
After the
PLAY
command, the camera starts transmitting video to port 41760 and audio to port 7878. The
TEARDOWN
command stops the broadcast, and the connection is closed.
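The server_port and session values we need can be fished out of the SETUP response with simple splitting. A sketch (parse_transport is my own helper), using the Transport header from the log above:

```python
def parse_transport(value):
    """Split an RTSP Transport header into a dict of its key=value parts."""
    params = {}
    for part in value.split(';'):
        key, sep, val = part.partition('=')
        if sep:  # skip bare tokens such as 'RTP/AVP' and 'unicast'
            params[key.strip()] = val.strip()
    return params

header = ('RTP/AVP;unicast;destination=192.168.0.4;source=192.168.0.254;'
          'client_port=41760-41761;server_port=6970-6971')
t = parse_transport(header)
rtp_port, rtcp_port = (int(p) for p in t['client_port'].split('-'))
```

Comparing the echoed client_port pair against the ports we asked for is exactly the sanity check mentioned above.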
MJPEG over RTP
RTP packets are now arriving, and we need to decode them. Here is the layout of such a packet, with a description of every field.
+ Bit offset | 0-1 | 2 | 3 | 4-7 | 8 | 9-15 | 16-31 |
| 0 | V | P | X | CC | M | PT | Sequence Number |
| 32 | Timestamp |
| 64 | SSRC Identifier |
| 96 | ... CSRC Identifiers ... |
| 96+(CC × 32) | Extension Header ID | Extension Header Length (EHL) |
| 96+(CC × 32)+(X × 32) | ... Extension Header ... |
| 96+(CC × 32)+(X × 32)+(X × EHL) | Payload |
- V (Version): (2) protocol version; currently version 2.
- P (Padding): (1) set when the RTP packet is padded with empty bytes at the end, for example for encryption algorithms.
- X (Extension): (1) indicates the presence of an application-defined extension header. Not used in our case.
- CC (CSRC Count): (4) the number of CSRC identifiers. We do not use these either.
- M (Marker): (1) application-defined; in our case this bit is set to one when the RTP packet contains the end of a JPEG frame.
- PT (Payload Type): (7) the format of the payload, that is, of the transmitted data. For MJPEG it is 26.
- Sequence Number: (16) the RTP packet number, used to detect lost packets.
- Timestamp: (32) timestamp; in our case the clock runs at 90000 Hz (90000 units = 1 second).
- SSRC (Synchronization Source): (32) the synchronization source identifier, funny as that sounds; it identifies the source of the stream.
- CSRC (Contributing Source): (32) identifiers of additional sources, used when our stream is mixed from several places.
- Extension Header ID: (16) identifies the extension, so the application knows what it is dealing with. Not used in our case.
- Extension Header Length: (16) the length of the extension header in bytes.
- Extension Header: the header itself; its content varies with the context.
- Payload: the payload, our actual JPEG frames. Fragmented, of course.
The fields from CSRC onward are optional; as far as I know, cameras do not use them when transmitting MJPEG.
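The Marker bit and the timestamp are exactly what we need to glue fragments back into frames: payloads sharing a timestamp belong to one frame, and the packet with M set closes it. A simplified sketch of that logic (the FrameAssembler class is mine; the article's client does this work inside rfc2435jpeg.py):

```python
class FrameAssembler:
    """Accumulate RTP payloads until a packet with the Marker bit arrives."""
    def __init__(self):
        self.timestamp = None
        self.parts = []

    def push(self, timestamp, marker, payload):
        if timestamp != self.timestamp:   # new timestamp: a new frame begins
            self.timestamp = timestamp
            self.parts = []
        self.parts.append(payload)
        if marker:                        # Marker set: the frame is complete
            frame = b''.join(self.parts)
            self.parts = []
            return frame
        return None
```

A production version would also use Sequence Number to detect and discard frames with lost fragments.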
Let us climb one level of encapsulation. The task now is to turn the received video data into a complete JPEG image. With MJPEG over HTTP everything was simple: we cut a piece out of the stream and immediately work with it as a JPEG image. With RTP the image is not transmitted in full; the JPEG header is omitted to save traffic, and it must be reconstructed from the accompanying data.
The RTP payload format for MJPEG is specified in
RFC 2435. Here too is a table describing all the fields of the format:
+ Bit offset | 0-7 | 8-15 | 16-23 | 24-31 |
| 0 | Type-specific | Fragment Offset |
| 32 | Type | Q | Width | Height |
| if Type in 64..127 | Restart Marker header |
| if Q in 128..255 | MBZ | Precision | Length |
| | Quantization Table Data |
- Type-specific: (8) the meaning of the field depends on the implementation; not used in our case.
- Fragment Offset: (24) the position of the current fragment within the whole frame.
- Type: (8) determines how the image is to be reconstructed.
- Q (Quality): (8) image quality.
- Width: (8) frame width, in 8-pixel units.
- Height: (8) frame height, likewise.
- Restart Marker header: (32) used when decoding JPEG if RST markers are present. I do not know whether cameras use them or not, but I ignore this header. The field appears only when Type is between 64 and 127.
- Quantization Table Data (quantization tables): if they are present, we do not need to compute them ourselves. They are required to reconstruct the image correctly from the JPEG data; with wrong tables the image comes out with distorted colors and contrast. There should be two tables, luma and chroma, for brightness and chromaticity respectively.
- MBZ, Precision, Length: (32) parameters of the quantization tables. I ignore them and assume Length equals 128: two tables of 64 bytes each. I do not know how to handle anything else.
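The fixed part of this header is eight bytes and can be unpacked directly. A sketch following the table above (parse_jpeg_header is my name; note that Width and Height are stored in 8-pixel units, which is why an 8-bit field is enough for a 1280-pixel-wide frame):

```python
from struct import unpack

def parse_jpeg_header(payload):
    """Unpack the fixed 8-byte header of an RFC 2435 JPEG payload."""
    type_specific, off_hi, off_lo16, jtype, q, width, height = \
        unpack('!BBHBBBB', payload[:8])
    return {
        'fragment_offset': (off_hi << 16) | off_lo16,  # 24-bit field
        'type': jtype,
        'q': q,
        'width': width * 8,     # stored divided by 8
        'height': height * 8,
    }

# The camera's 1280x720 M-JPEG frame would carry Width=160, Height=90
h = parse_jpeg_header(bytes([0, 0, 0, 0, 1, 60, 160, 90]))
```

The JPEG data itself then starts at payload[8:] (plus the optional headers described above, when present).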
The Restart Marker header and the quantization tables may be absent. If the first is missing, all the better, since I do not handle anything beyond that anyway. If the second is missing, the required tables are computed from the Q parameter.
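The computation from Q follows Appendix A of RFC 2435: clamp Q to 1..99, derive a scale factor, and scale the standard JPEG base tables. A sketch showing only the luminance table (the chrominance base table from the RFC is scaled the same way):

```python
# Standard JPEG luminance quantization table (JPEG spec, Table K.1),
# as used by RFC 2435 Appendix A.
JPEG_LUMA_QUANTIZER = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]

def scale_table(base, q):
    """Scale a base quantization table by the Q factor, per RFC 2435."""
    q = min(max(q, 1), 99)
    factor = 5000 // q if q < 50 else 200 - q * 2
    # each entry is scaled, rounded, and clamped to the valid 1..255 range
    return [min(max((b * factor + 50) // 100, 1), 255) for b in base]

luma = scale_table(JPEG_LUMA_QUANTIZER, 50)   # Q=50 reproduces the base table
```

Lower Q gives larger divisors (coarser quantization, worse quality); Q=50 leaves the base table unchanged.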
RTCP packets come in several types: 200 is the sender's report (SR), 201 the receiver's report (RR), 202 the source description (SDES), 203 the goodbye packet (BYE), and 204 is application-defined (APP). We must first receive an SR, then send an RR. The other types are optional, but I handle them as well. A single UDP packet may contain several RTCP packets.
All types have a similar structure. Any RTCP packet starts with the following data:
+ Bit offset | 0-1 | 2 | 3-7 | 8-15 | 16-31 |
| 0 | Version | Padding | SC or RC or Subtype | Packet type | Length |
- Version: (2) the RTP version.
- Padding: (1) the same as in RTP.
- SC or RC or Subtype: (5) depending on the packet type, the number of sources (Sources Count) or receivers (Receivers Count) covered by the sender's or receiver's report, respectively. For an APP packet this field defines the packet's subtype.
- Packet Type: (8) the packet type: 200 is a Sender's Report (SR), 201 a Receiver's Report (RR), 202 a Source Description (SDES), and 204 is application-defined (APP).
- Length: (16) the size of the data following the header, measured in 32-bit words.
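Unpacking this common header takes a single struct call. A small sketch in the spirit of the article's rtcp_datagram.py (but not the actual module):

```python
from struct import unpack

def parse_rtcp_header(data):
    """Unpack the 4-byte common header shared by all RTCP packet types."""
    v_p_count, ptype, length = unpack('!BBH', data[:4])
    return {
        'version': (v_p_count & 0b11000000) >> 6,
        'padding': (v_p_count & 0b00100000) >> 5,
        'count':    v_p_count & 0b00011111,   # SC / RC / Subtype
        'packet_type': ptype,                 # 200..204
        'length': length,                     # in 32-bit words
    }

# A Sender's Report header: V=2, P=0, RC=1, PT=200, Length=6
hdr = parse_rtcp_header(bytes([0x81, 200, 0x00, 0x06]))
```

Because several RTCP packets can share one UDP datagram, the Length field also tells us where the next packet starts.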
I will not list the fields of every subtype here; they can be found in
RFC 3550. Suffice it to say that SR and RR carry statistics about sent and received packets and timing delays, while SDES carries various text fields describing the source, such as its name, email, telephone, location, and so on.
That concludes the introduction.
Python MJPEG over RTSP client
So we have finally reached Python. The client consists of several files.
main.py
contains the callback function that processes the received images; it also starts the machinery of the Twisted networking framework and stores the connection parameters for the camera. All the listings I quote here are abridged; the full version can be downloaded from the link at the end of the article.
main.py
20: def processImage(img):
21:     'This function is invoked by the MJPEG Client protocol'
22: 
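A processImage callback can be as trivial as dumping each frame to disk. A sketch of such a callback (the numbered file-name scheme and the return value are mine, not the article's):

```python
import itertools

_frame_numbers = itertools.count()

def processImage(img):
    """Write each reconstructed JPEG frame to a numbered file."""
    name = 'frame_%05d.jpg' % next(_frame_numbers)
    with open(name, 'wb') as f:
        f.write(img)   # img is the complete JPEG byte string
    return name
```

In the real client this callback receives the images assembled by rfc2435jpeg.py described below.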
In principle, you can get by without implementing RTCP and without receiving the audio data. In that case the camera drops the connection after about a minute, and you have to reconnect all the time; this happens automatically, so it causes no real problems. Still, for this article I implemented part of RTCP and prepared the ground for receiving audio data.
The next important file is
rtsp_client.py
. It is the most tangled of them all, but its goal is clear: to establish the connection correctly, as described above.
rtsp_client.py
012: class RTSPClient(Protocol):
013:     def __init__(self):
014:         self.config = {}
015:         self.wait_description = False
016: 
017:     def connectionMade(self):
018:         self.session = 1
019: 
If there is an audio track, this module also starts
rtp_audio_client.py
and the corresponding RTCP client.
After a successful connection,
rtp_mjpeg_client.py
takes over and processes the incoming data stream.
rtp_mjpeg_client.py
08: class RTP_MJPEG_Client(DatagramProtocol):
09:     def __init__(self, config):
10:         self.config = config
11: 
It is easy to understand. Each time we receive another datagram, we parse it with the
rtp_datagram.py
module and feed the result to the
rfc2435jpeg.py
module, which builds a complete JPEG image. Then we wait for the
rtp_dg.Marker
flag, and as soon as it appears we call the callback function with the reconstructed image.
The RTP datagram parser looks like this:
rtp_datagram.py
26: def parse(self):
27:     Ver_P_X_CC, M_PT, self.SequenceNumber, self.Timestamp, self.SyncSourceIdentifier = unpack('!BBHII', self.Datagram[:12])
28:     self.Version = (Ver_P_X_CC & 0b11000000) >> 6
29:     self.Padding = (Ver_P_X_CC & 0b00100000) >> 5
30:     self.Extension = (Ver_P_X_CC & 0b00010000) >> 4
31:     self.CSRCCount = Ver_P_X_CC & 0b00001111
32:     self.Marker = (M_PT & 0b10000000) >> 7
33:     self.PayloadType = M_PT & 0b01111111
34:     i = 0
35:     for i in range(0, self.CSRCCount * 4, 4):
36:         self.CSRS.append(unpack('!I', self.Datagram[12+i:16+i])[0])
37:     i = self.CSRCCount * 4
38:     if self.Extension:
39:         (self.ExtensionHeaderID, self.ExtensionHeaderLength) = unpack('!HH', self.Datagram[12+i:16+i])
40:         self.ExtensionHeader = self.Datagram[16+i:16+i+self.ExtensionHeaderLength]
41:         i += 4 + self.ExtensionHeaderLength
42:     self.Payload = self.Datagram[12+i:]
The JPEG reconstruction module is rather large, as it contains several tables and a long header-generation function. I will omit those parts here and give only the functions that parse the RTP payload and assemble the final JPEG image.
rfc2435jpeg.py
287: def parse(self):
288:     HOffset = 0
289:     LOffset = 0
290: 
I also implemented the audio receiving module,
rtp_audio_client.py
, but I do not convert the data into a playable form. If anyone needs that, the file sketches how everything should work; only the parsing needs to be added, along the lines of
rfc2435jpeg.py
. Audio data is simpler because it is not fragmented: every packet carries enough data to play. I will not quote this module here, since the article is already very long.
For correct operation we need to receive and send RTCP packets: accept Sender's Reports and send Receiver's Reports. To simplify the task, we send our RR immediately after receiving an SR from the camera and fill it with idealized data saying that everything is fine.
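Packing such an "everything is fine" Receiver's Report takes one struct call per section. A sketch under my own naming (build_rr), with the report-block layout from RFC 3550 and zeroed loss and jitter fields:

```python
from struct import pack

def build_rr(our_ssrc, sender_ssrc, highest_seq):
    """An RTCP Receiver's Report with one report block and ideal statistics."""
    header = pack('!BBH', 0x81, 201, 7)   # V=2, P=0, RC=1; PT=201 (RR);
                                          # length = 7 32-bit words follow
    reporter = pack('!I', our_ssrc)       # SSRC of this receiver
    block = pack('!IIIIII',
                 sender_ssrc,             # source this block reports on
                 0,                       # fraction lost + cumulative lost
                 highest_seq,             # extended highest sequence number
                 0,                       # interarrival jitter
                 0,                       # last SR timestamp (LSR)
                 0)                       # delay since last SR (DLSR)
    return header + reporter + block

rr = build_rr(0x12345678, 0x9ABCDEF0, 20730)   # 32 bytes in total
```

The resulting datagram is sent from our odd RTCP port to the camera's odd server port each time an SR arrives.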
rtcp_client.py
09: class RTCP_Client(DatagramProtocol):
10:     def __init__(self):
11: 
The rtcp_datagram.py module works directly with the RTCP datagrams. It also turned out rather large.
rtcp_datagram.py
049: def parse(self):
050: 
The parsing follows the RFC strictly. I use the unpack function to convert the data into numeric variables and move through the data array with an off variable that holds the current offset.

Here is the link: Python MJPEG over RTSP client.

I did not have the energy to prepare a version of the listings with Russian comments, so forgive me if that is inconvenient for anyone.

Useful reading:
- Multimedia over the Internet
- List of RTP profiles for audio and video
And that is the end of the article. If you made it all the way through, well done!