Before the Mirai botnet appeared, only people with a special interest knew what was inside ordinary IP cameras. Most of them run plain Linux, often with a default root password or with none at all: we have such a camera in our office, with firmware from December 2016 and passwordless root over telnet.
But what comes next: what software runs on this Linux? There are some great articles by datacompboy about the search for a bug which is not there, and there is other scattered information, but in general the picture is this: the IP camera runs a specially patched kernel which, through a special library, gives the application access to the hardware that outputs compressed video frames.
The sad reality is that this software is very often poorly written. Suffice it to say that most street-mounted cameras suffer badly from the long distance to the server, because the authors of their firmware have mastered the art of losing data over TCP.
We decided to fix this with our own firmware, and we bet on Rust.
Working conditions
We need to do just a couple of trifles: figure out the SDK, write code that configures the hardware, takes H264 frames and sends them over the network. Just a couple of trifles, especially considering how easy and simple it is to deploy and debug all of this on IP cameras. And one more little thing: we decided to write this code in Rust.
We chose Rust as an experiment because of one remarkable property: compile-time guarantees of memory safety combined with the absence of a runtime. This means we can expect to control memory allocation ourselves, which is very important given the resource constraints.
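To illustrate what controlling allocation can look like in practice, here is a minimal sketch (not our production code) of a frame buffer pool: every buffer is allocated once at startup, and afterwards the hot path only moves buffers in and out of the pool. The `FramePool` type and its methods are hypothetical names made up for this illustration.

```rust
/// Hypothetical fixed-size pool of reusable frame buffers.
/// All memory is allocated once in `new`; `acquire` and `release`
/// only move already-allocated buffers around.
struct FramePool {
    free: Vec<Vec<u8>>,
    buf_capacity: usize,
}

impl FramePool {
    /// Preallocate `count` buffers of at least `buf_capacity` bytes each.
    fn new(count: usize, buf_capacity: usize) -> Self {
        let mut free = Vec::with_capacity(count);
        for _ in 0..count {
            free.push(Vec::with_capacity(buf_capacity));
        }
        FramePool { free, buf_capacity }
    }

    /// Take a buffer, or `None` if the pool is exhausted.
    /// The caller decides what to do then (e.g. drop the frame);
    /// the pool never grows on its own.
    fn acquire(&mut self) -> Option<Vec<u8>> {
        self.free.pop()
    }

    /// Return a buffer: its contents are cleared, its capacity is kept.
    fn release(&mut self, mut buf: Vec<u8>) {
        buf.clear();
        debug_assert!(buf.capacity() >= self.buf_capacity);
        self.free.push(buf);
    }

    /// Number of buffers currently available.
    fn available(&self) -> usize {
        self.free.len()
    }
}
```

When the pool runs dry, `acquire` returns `None` and the caller chooses what to sacrifice, instead of the process growing until the OOM killer steps in. Note that writing more than `buf_capacity` bytes into a buffer would still make `Vec` reallocate, so the capacity should be sized for the largest expected frame.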
Why not Go, Erlang or some Java / C#? Because an IP camera has 8 megabytes of flash and 128 megabytes of RAM, half of which is taken away from the kernel for the needs of video. Cameras differ, of course, but vendors always try to keep the hardware to a minimum so as not to raise the cost unnecessarily. On one camera we did see a 64 MB flash chip, where you could spread out a bit, but tiny flash chips are common enough.
So, here is the typical picture we see on a cheap camera costing 3000 rubles:
Under such conditions, badly written software starts to suffer greatly already at 3-4 connections. The golden rule when working with IP cameras is to try not to open more than one connection (or two, one per quality stream). This is due not only to the narrow channel to the camera, but also to the fact that a fourth client often makes it impossible for the first three to watch. Looking ahead, I will say that we had no problems even with 50 clients.
How the camera works
Before going further, I will tell you a little about how the camera we are currently working with is built.
An SPI flash chip is soldered onto the camera board. This is the same kind of flash chip as the BIOS chip that some lockers flash themselves into. The contents of this SPI flash can be read by attaching a test clip to it, you can write to it (if you're lucky), and the processor loads the data into memory and executes it. Sometimes the flash is NAND rather than SPI, and then things are more complicated: you cannot simply attach a clip to it, so you have to be more careful.
At the very beginning of the flash is uboot. This is a bootloader used in almost all embedded devices: not only cameras, but routers and phones too. So it is probably safe to say that there are more copies of it in the world than copies of Windows.
uboot is open source, but it stores data specific to a particular board. If you copy the flash contents from a camera made by XM onto a camera made by Hikvision, there is a good chance that even uboot will not boot.
So already at this stage we get the fascinating process of keeping a registry of known cameras and tracking them, which is greatly facilitated by our neighbors' delightful habit of shipping exactly what you ordered. A good example is a recent story with one of our customers (the largest national operator in the country): they signed a 3-year contract for the supply of cameras of a particular model and specification, and a week later cameras arrived of a different model with completely different specifications.
But don't worry, all of this is solvable. Moving on.
Then comes the Linux kernel. It would be too easy if we could build one kernel for all possible cameras and then just load the modules. No, that is not possible, so different chipset versions need different kernels: 2.x.y in some places, 3.x.y in others. Why? Because the kernel ships with closed-source modules. In some cases you can work around this, but unifying everything still will not work.
After that comes an ordinary buildroot userland. Here everything is done the way normal people do it.
Next, you need to run tricky scripts that configure the hardware over i2c (and maybe something else), load the right modules and start the purpose-built software.
Video capture
Video capture requires a lot of hardware setup. If you read the ONVIF specification and the manual for an IP camera's SDK, you will notice a lot in common: the software interface mirrors the general structure of most hardware, which looks like this. Video is captured from the sensor, lightly processed, then fed into encoders (hardware ones, of course), and then the software can pick up ready H264 NAL units from a certain place in memory. For the basic scenario, all that remains is to add user management, settings and some network protocol. A full-fledged camera also needs support for all the common configuration mechanisms (discovery, ONVIF, PSIA, etc.) and analytics.
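The last step of that pipeline, picking up NAL units, can be illustrated with a short sketch. It assumes the encoder hands out a buffer in H.264 Annex B format, where each NAL unit is preceded by a `00 00 01` or `00 00 00 01` start code; whether a particular camera SDK does this is an assumption, since some SDKs return NAL units already separated. `split_nal_units` and `nal_type` are hypothetical names.

```rust
/// Split an H.264 Annex B byte stream into NAL units.
/// Scans for the 3-byte start code 00 00 01; the extra leading zero of a
/// 4-byte start code (00 00 00 01) is trimmed from the end of the previous
/// unit, which is safe because a NAL unit never ends with a 0x00 byte.
fn split_nal_units(stream: &[u8]) -> Vec<&[u8]> {
    // Offsets of the first byte after each start code.
    let mut starts = Vec::new();
    let mut i = 0;
    while i + 3 <= stream.len() {
        if stream[i] == 0 && stream[i + 1] == 0 && stream[i + 2] == 1 {
            starts.push(i + 3);
            i += 3;
        } else {
            i += 1;
        }
    }

    let mut units = Vec::new();
    for (k, &start) in starts.iter().enumerate() {
        // A unit ends where the next start code begins (or at end of buffer).
        let mut end = if k + 1 < starts.len() { starts[k + 1] - 3 } else { stream.len() };
        // Drop zero bytes that belong to a 4-byte start code or trailing padding.
        while end > start && stream[end - 1] == 0 {
            end -= 1;
        }
        if end > start {
            units.push(&stream[start..end]);
        }
    }
    units
}

/// The NAL unit type is the low 5 bits of the first byte
/// (e.g. 5 = IDR slice, 7 = SPS, 8 = PPS).
fn nal_type(nal: &[u8]) -> u8 {
    nal[0] & 0x1F
}
```

Scanning for start codes this simply works because the encoder inserts emulation prevention bytes (`00 00 03`), so the sequence `00 00 01` cannot occur inside a NAL unit's payload.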
And what about Rust
Our streamer is exactly the part written in Rust. It is a whole bundle of unsafe code autogenerated from the SDK's C headers with bindgen, a patched libc binding (we will try to get the patch upstream), and on top of that an RTSP implementation on tokio. It is already possible to watch video from the camera in an ordinary browser, an unattainable luxury for Chinese cameras, which all require installing ActiveX.
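For a rough idea of what an RTSP streamer does with those NAL units, here is a sketch of one piece of the job: packing a NAL unit into RTP payloads as described in RFC 6184, sending small units as they are and splitting large ones into FU-A fragments. This is not our tokio code; RTP headers, timestamps and sequence numbers are left out, and `packetize_nal` and `max_payload` are names chosen for this illustration.

```rust
/// Split one H.264 NAL unit into RTP payloads (RFC 6184).
/// A NAL that fits into `max_payload` bytes is sent as a single NAL unit
/// packet; a larger one is split into FU-A fragments.
fn packetize_nal(nal: &[u8], max_payload: usize) -> Vec<Vec<u8>> {
    assert!(!nal.is_empty() && max_payload > 2);
    if nal.len() <= max_payload {
        return vec![nal.to_vec()];
    }

    let header = nal[0];
    // FU indicator: F and NRI bits from the NAL header, type 28 = FU-A.
    let fu_indicator = (header & 0xE0) | 28;
    let nal_type = header & 0x1F;
    // The original NAL header is not sent; the receiver rebuilds it
    // from the FU indicator and the FU header.
    let body = &nal[1..];
    // Each fragment carries 2 bytes of FU indicator + FU header.
    let chunks: Vec<&[u8]> = body.chunks(max_payload - 2).collect();
    let last = chunks.len() - 1;

    chunks
        .into_iter()
        .enumerate()
        .map(|(i, chunk)| {
            let mut fu_header = nal_type;
            if i == 0 {
                fu_header |= 0x80; // S bit: first fragment
            }
            if i == last {
                fu_header |= 0x40; // E bit: last fragment
            }
            let mut pkt = Vec::with_capacity(2 + chunk.len());
            pkt.push(fu_indicator);
            pkt.push(fu_header);
            pkt.extend_from_slice(chunk);
            pkt
        })
        .collect()
}
```

Keeping each payload under the path MTU is the point of fragmentation: a keyframe is usually far larger than one UDP datagram, while SPS/PPS units typically fit into a single packet.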
The structure feels very unusual after Erlang: there are no processes and messages here, there are channels, and with them everything works a little differently. As I wrote above, modern, well-organized code makes it possible to serve video not to 2-3 clients but to more than 50 without any performance degradation.
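Here is a minimal sketch of the channel-based fan-out idea using only `std::sync::mpsc` (our streamer runs on tokio, where `tokio::sync::broadcast` is one ready-made alternative). Each client gets its own bounded queue and the producer never blocks: if a client's queue is full, that client loses the frame, and a client whose receiver is gone is removed. `Hub`, `subscribe` and `publish` are hypothetical names, and this is an illustration of the principle rather than our actual design.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::Arc;

/// Hypothetical fan-out hub: one producer (the encoder loop) publishes
/// frames, every client reads from its own bounded queue.
struct Hub {
    clients: Vec<SyncSender<Arc<Vec<u8>>>>,
    queue_len: usize,
}

impl Hub {
    fn new(queue_len: usize) -> Self {
        Hub { clients: Vec::new(), queue_len }
    }

    /// Register a new client and return its receiving end.
    fn subscribe(&mut self) -> Receiver<Arc<Vec<u8>>> {
        let (tx, rx) = sync_channel(self.queue_len);
        self.clients.push(tx);
        rx
    }

    /// Send one frame to every client without blocking.
    /// Returns how many clients dropped the frame because their queue was full.
    fn publish(&mut self, frame: Vec<u8>) -> usize {
        // One shared copy of the frame, however many clients there are.
        let frame = Arc::new(frame);
        let mut dropped = 0;
        self.clients.retain(|tx| match tx.try_send(Arc::clone(&frame)) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                dropped += 1; // slow client: it loses this frame, nobody waits
                true
            }
            Err(TrySendError::Disconnected(_)) => false, // client gone: forget it
        });
        dropped
    }

    fn client_count(&self) -> usize {
        self.clients.len()
    }
}
```

A real streamer would not drop arbitrary frames: after losing a P-frame, a client typically sees a corrupted picture until the next keyframe, so it is better to skip ahead to the next IDR frame for that client.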
An important point: so far we have not had a single segfault during development. Meanwhile there is a persistent feeling that Rust makes you write the way experienced, gray-haired C programmers who have seen every kind of trouble write anyway. So far we like everything.
We plan to finish the baseline scenario during August, so we have a question for the audience, which you will find in the poll. And feel free to ask any questions you have.