
Measuring the distance to an object and its speed

Among the methods I found for determining the distance to an object in an image, I did not come across the technique I am going to present here. It is neither universal nor complex: the essence is that the visible field (we will assume a video camera is used) is calibrated with a ruler, and the coordinate of an object in the image is then matched against a mark on that ruler. In other words, the measurement is performed along a single line or axis. But we do not need to store a ruler mark for every pixel; the calibration algorithm only needs to know the ruler's length in pixels and in meters, as well as the pixel coordinate of the actual middle of the ruler. The obvious limitation: it works only on flat surfaces.

In addition to the method itself, the article describes its implementation in Python using the OpenCV library, and also covers the specifics of capturing images from webcams under Linux via the video4linux2 API.



In practice, I needed to measure the distance to a car, and its speed, on a straight stretch of road. I stretched a long tape measure along the side of the road, down the middle of the lane, then set up the camera so that the entire tape measure just fit into the camera's field of view and was aligned with the X axis of the image. The next step was to place something bright at the midpoint of the tape measure, fix the camera so that it could not move, and write down the pixel coordinate of that midpoint.
All calculations come down to a single formula:
l = L * K / (W / x - 1 + K), where
l - the desired distance to the object, m;
L - the length of the "ruler", m;
W - the length of the "ruler" in pixels, usually the same as the width of the image;
x - the coordinate of the object in the image;
K = (W - M) / M - a coefficient reflecting the tilt of the camera, where M is the pixel coordinate of the middle of the "ruler".

When deriving this formula, my school trigonometry came in very handy.

The plot of this function is shown in the figure:


The greater the camera tilt, the steeper the curve. In the limiting case, when the camera axis is perpendicular to the plane of the "ruler" (M = W / 2), the graph becomes a straight line.
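For clarity, here is a minimal sketch of this formula in Python; the program keeps this logic in distance_measure.py, and the function name and the guard against x = 0 here are my own:

# Hedged sketch of the distance formula above; the actual
# implementation lives in distance_measure.py and may differ.
def distance_to_object(x, L, W, M):
    """Distance (m) to an object whose image coordinate is x.

    L - length of the ruler, m; W - its length in pixels;
    M - pixel coordinate of the ruler's middle.
    """
    if x == 0:
        raise ValueError("x must be nonzero (x = 0 is the near edge)")
    K = float(W - M) / M                   # camera tilt coefficient
    return L * K / (float(W) / x - 1 + K)

# Example calibration close to the one in main.py: L = 0.5 m, W = 640, M = 500
print(distance_to_object(320, 0.5, 640, 500))   # ~0.11 m

As a sanity check, x = M gives exactly L / 2 (the middle of the ruler) and x = W gives L (its far end).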

But the article would be too short if that were all. So I decided to write a demo program that connects to the computer's webcam and tracks some object, calculating the distance to it and its speed. As the programming language I chose Python, a language with a great many advantages; for the graphical user interface I chose the Tkinter framework, which ships with Python and therefore needs no separate installation. OpenCV is well suited for tracking an object; I use version 2.2, but the repository of the current Ubuntu release (10.10) only has version 2.1, their APIs differ slightly (for the better), and the program will not run under version 2.1. In principle, the entire program could have been built on OpenCV, delegating the graphical interface and image capture to it, but I wanted to keep it separate from the main part of the program, so that the library could be replaced with something else, or simply removed by turning tracking off. I started reworking my old program, deleting everything unnecessary, and to my surprise only a few lines that directly calculate distance and speed were left of it. In hindsight that is logical: the original does not use a graphical interface, follows the car with a different algorithm, and uses a megapixel network camera over an RTSP connection instead of a webcam.

As for capturing images from a webcam, things are not so simple. Under Windows the program connects to the camera through DirectX via the VideoCapture library, and everything is quite straightforward there. But under Linux there are very few intelligible articles about using webcams from Python, and the examples that do exist usually no longer work because of yet another API change. In the past I used ffmpeg for this from a C program, but ffmpeg here is like shooting sparrows with a cannon, and I did not want to burden the final program with extra dependencies. I could have used OpenCV, which itself relies on ffmpeg, but instead I chose to write my own Python wrapper around the video4linux2 API.

The source code was taken from the page of some university department. I quickly stripped out everything I did not need, eventually keeping two edited files: V4L2.cpp and V4L2.h. Together they form the minimum API required to connect to a webcam. While working on the Python wrapper I learned that video4linux2 devices can be accessed in three ways: READ, MMAP and STREAM, but only the MMAP method works with my webcams. As it turned out, the example programs that had not worked for me all used the READ method.

It is also assumed that the webcam delivers the image in YUYV format (YUV422), which differs from RGB in carrying half as much color information. In YUYV two pixels are encoded in 4 bytes, whereas in RGB they take 6, hence the 1.5x saving. Y is the brightness component, and each pixel has its own. U and V are the color-difference components that define the pixel's color, and each pair of pixels shares the same U and V. If you picture the byte stream from the webcam in this notation, it looks like YUYV YUYV YUYV YUYV YUYV YUYV - 12 pixels. You can find out which format your webcam uses with the VLC player: open it as a capture device and then request the codec information; it should look like the one in the figure:


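To make the byte layout concrete, here is a small sketch (not from the original archive) that decodes one 4-byte YUYV group into two RGB pixels, using the same conversion coefficients as the library below:

# Hedged sketch: unpacking one YUYV group (4 bytes -> 2 pixels).
# The coefficients mirror the yuv2rgb() function in main_v4l2.cpp.
def clamp(v):
    return int(max(0, min(255, v)))

def yuv2rgb(y, u, v):
    c, d, e = y - 16, u - 128, v - 128
    return (clamp(c + 1.402 * e),
            clamp(c - (0.344136 * d + 0.714136 * e)),
            clamp(c + 1.772 * d))

def decode_yuyv_group(y0, u, y1, v):
    # Both pixels share the same U and V -- hence the 1.5x saving over RGB
    return yuv2rgb(y0, u, v), yuv2rgb(y1, u, v)

print(decode_yuyv_group(0x51, 0x5A, 0x52, 0xF0))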
Here is the source code of the library for accessing the webcam:
main_v4l2.cpp
#include "V4L2.h"
#include <cstring>
#include <cstdio>    // fprintf (added; the original relied on indirect includes)
#include <cstdlib>   // malloc
#include <iostream>

using namespace std;

extern "C" {

// Specify the video device here
V4L2 v4l2("/dev/video0");

unsigned char *rgbFrame;

float clamp(float num)
{
    if (num < 0) num = 0;
    if (num > 255) num = 255;
    return num;
}

// Convert between YUV and RGB colorspaces
void yuv2rgb(unsigned char y, unsigned char u, unsigned char v,
             unsigned char &r, unsigned char &g, unsigned char &b)
{
    float C = y - 16;
    float D = u - 128;
    float E = v - 128;
    r = (char)clamp(C + (1.402 * E));
    g = (char)clamp(C - (0.344136 * D + 0.714136 * E));
    b = (char)clamp(C + (1.772 * D));
}

unsigned char *getFrame()
{
    unsigned char *frame = (unsigned char *)v4l2.getFrame();
    int i = 0, k = 0;
    unsigned char Y, U, V, R, G, B;

    // Every 4 bytes of YUYV hold two pixels that share the same U and V
    for (i = 0; i < 640*480*2; i += 4) {
        Y = frame[i];
        U = frame[i+1];
        V = frame[i+3];
        yuv2rgb(Y, U, V, R, G, B);
        rgbFrame[k] = R; k++;
        rgbFrame[k] = G; k++;
        rgbFrame[k] = B; k++;

        Y = frame[i+2];
        yuv2rgb(Y, U, V, R, G, B);
        rgbFrame[k] = R; k++;
        rgbFrame[k] = G; k++;
        rgbFrame[k] = B; k++;
    }
    return rgbFrame;
}

void stopCapture()
{
    v4l2.freeBuffers();
}

// Call this before using the device
void openDevice()
{
    // set format
    struct v4l2_format fmt;
    CLEAR(fmt);
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    // Adjust resolution
    fmt.fmt.pix.width = 640;
    fmt.fmt.pix.height = 480;
    fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
    if (!v4l2.set(fmt)) {
        fprintf(stderr, "device does not support used settings.\n");
    }
    v4l2.initBuffers();
    v4l2.startCapture();
    rgbFrame = (unsigned char *)malloc(640*480*3);
}

} // extern "C"

The algorithm is straightforward: first open the device whose name is given at the top ("/dev/video0"), and then on each getFrame request read a frame from the webcam, convert it to RGB format and hand a pointer to the frame back to the caller. I also provide a Makefile for quickly compiling this library, should you need it.

And here is the Python wrapper for this library:
v4l2.py
from ctypes import *
import Image
import time

lib = cdll.LoadLibrary("linux/libv4l2.so")

class VideoDevice(object):
    def __init__(self):
        lib.openDevice()
        lib.getFrame.restype = c_void_p

    def getImage(self):
        buf = lib.getFrame()
        frame = (c_char * (640*480*3)).from_address(buf)
        img = Image.frombuffer('RGB', (640, 480), frame, 'raw', 'RGB', 0, 1)
        return img, time.time()

As you can see, there is absolutely nothing complicated. The library is hooked up using the ctypes module. Writing the wrapper presented no problems, except for the line:

 frame = (c_char * (640*480*3)).from_address(buf) 

which I did not arrive at right away. The thing is, if you read the data from getFrame() as c_char_p, ctypes interprets the data as a zero-terminated string, so as soon as a zero byte is encountered in the stream, reading stops. The construct above lets you state explicitly how many bytes to read; in our case it is always a fixed value, 640 * 480 * 3.
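The difference is easy to demonstrate on a small buffer (a standalone sketch, unrelated to the archive):

# Hedged sketch: why c_char_p is the wrong type for binary frame data.
from ctypes import c_char, c_char_p, create_string_buffer, addressof

buf = create_string_buffer(b"AB\x00CD")   # a buffer with a zero byte inside
addr = addressof(buf)

print(c_char_p(addr).value)                  # b'AB' -- stops at the zero byte
print((c_char * 5).from_address(addr).raw)   # b'AB\x00CD' -- all 5 bytes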

I will not quote here the source code for capturing an image under Windows; it is not complicated either, and it lives in the archive in the windows folder under the name directx.py.

Instead, here is the source code of the object tracking class which, I remind you, is written using OpenCV. I took lkdemo.py, an example supplied with OpenCV, as a basis, and again simplified it for our needs, refactoring it into a class:
tracker.py
import cv   # OpenCV 2.2 Python bindings (missing from the original listing)

class Tracker(object):
    "Simple object tracking class"

    def __init__(self):
        self.grey = None
        self.point = None
        self.WIN_SIZE = 10

    def target(self, x, y):
        "Tell which object to track"
        # It needs to be an array for the optical flow calculation
        self.point = [(x, y)]

    def takeImage(self, img):
        "Loads and processes next frame"
        # Convert it to IPL Image
        frame = cv.CreateImageHeader(img.size, 8, 3)
        cv.SetData(frame, img.tostring())

        if self.grey is None:
            # create the images we need
            self.grey = cv.CreateImage(cv.GetSize(frame), 8, 1)
            self.prev_grey = cv.CreateImage(cv.GetSize(frame), 8, 1)
            self.pyramid = cv.CreateImage(cv.GetSize(frame), 8, 1)
            self.prev_pyramid = cv.CreateImage(cv.GetSize(frame), 8, 1)

        cv.CvtColor(frame, self.grey, cv.CV_BGR2GRAY)

        if self.point:
            # calculate the optical flow
            new_point, status, something = cv.CalcOpticalFlowPyrLK(
                self.prev_grey, self.grey,
                self.prev_pyramid, self.pyramid,
                self.point,
                (self.WIN_SIZE, self.WIN_SIZE), 3,
                (cv.CV_TERMCRIT_ITER | cv.CV_TERMCRIT_EPS, 20, 0.03),
                0)
            # If the point is still alive
            if status[0]:
                self.point = new_point
            else:
                self.point = None

        # swapping
        self.prev_grey, self.grey = self.grey, self.prev_grey
        self.prev_pyramid, self.pyramid = self.pyramid, self.prev_pyramid

First we have to tell it which point we want to follow; that is what the target method is for. Then we feed it frame after frame via the takeImage method, which in turn converts the image into a format OpenCV understands, creates the images the algorithm needs, converts the frame from color to grayscale and passes everything to the CalcOpticalFlowPyrLK function, which computes the optical flow by the pyramidal Lucas-Kanade method. As the output of this function we get the new coordinates of the point we are following. If the point is lost, status[0] will be zero. Optical flow can be calculated for more than one point: run the lkdemo.py program with a webcam and see how well it handles many points at once.
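Hypothetical usage, outside the GUI (the coordinates are arbitrary; in the real program the GUI calls target on a mouse click and takeImage on every frame):

# Hedged sketch: driving Tracker by hand, using the modules shown above.
from v4l2 import VideoDevice
from tracker import Tracker

webcam = VideoDevice()
tracker = Tracker()
tracker.target(320, 240)            # the pixel we want to follow

for _ in range(100):
    img, timestamp = webcam.getImage()
    tracker.takeImage(img)
    if tracker.point is None:       # status[0] was zero -- point lost
        print("point lost")
        break
    print(tracker.point[0])         # its new (x, y) coordinates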

A few more words about converting images from the Python Imaging Library into the OpenCV format. For color images OpenCV uses a different order of color components, BGR, so for a complete conversion you would have to add cv.CvtColor(frame, frame, cv.CV_BGR2RGB). But most tracking algorithms do not care at all whether the color components are swapped, and our example works with grayscale images anyway. Therefore this line can be left out.

I also do not quote in the article the source code of the class that actually calculates the distance, since it contains only the simplest mathematics. It is located in the file distance_measure.py.

It remains only to show the source code of the main script, which builds the graphical interface and loads all the other modules.
main.py
from distance_measure import Calculator
from webcam import WebCam
from tracker import Tracker
from Tkinter import *
import ImageTk as PILImageTk
import time

class GUIFramework(Frame):
    "This is the GUI"

    def __init__(self, master=None):
        Frame.__init__(self, master)
        self.grid(padx=10, pady=10)
        self.distanceLabel = Label(self, text='Distance =')
        self.distanceLabel.grid(row=0, column=0)
        self.speedLabel = Label(self, text='Speed =')
        self.speedLabel.grid(row=0, column=1)
        self.imageLabel = None
        self.cameraImage = None
        self.webcam = WebCam()
        # M = 510, L = 0.5, W = 640
        self.dist_calculator = Calculator(500, 0.5, 640, 1)
        self.tracker = Tracker()
        self.after(100, self.drawImage)

    def updateMeasure(self, x):
        (distance, speed) = self.dist_calculator.calculate(x, time.time())
        self.distanceLabel.config(text='Distance = ' + str(distance))
        # If you want to get km/h instead of m/s just multiply
        # the m/s value by 3.6
        #speed *= 3.6
        self.speedLabel.config(text='Speed = ' + str(speed) + ' m/s')

    def imgClicked(self, event):
        """
        On left mouse button click calculate distance and
        tell tracker which object to track
        """
        self.updateMeasure(event.x)
        self.tracker.target(event.x, event.y)

    def drawImage(self):
        "Load and display the image"
        img, timestamp = self.webcam.getImage()
        # Pass image to tracker
        self.tracker.takeImage(img)
        if self.tracker.point:
            pt = self.tracker.point[0]
            self.updateMeasure(pt[0])
            # Draw rectangle around tracked point
            img.paste((128, 255, 128),
                      (int(pt[0])-2, int(pt[1])-2, int(pt[0])+2, int(pt[1])+2))
        self.cameraImage = PILImageTk.PhotoImage(img)
        if not self.imageLabel:
            self.imageLabel = Label(self, image=self.cameraImage)
            self.imageLabel.bind("<Button-1>", self.imgClicked)
            self.imageLabel.grid(row=1, column=0, columnspan=2)
        else:
            self.imageLabel.config(image=self.cameraImage)
        # 30 FPS refresh rate
        self.after(1000/30, self.drawImage)

if __name__ == '__main__':
    guiFrame = GUIFramework()
    guiFrame.mainloop()

As I said above, I chose the Tkinter library for the graphical interface. I have worked with other toolkits, such as GTK, Qt and, of course, wxPython, but they would all need to be installed separately, while Tkinter works out of the box and is quite easy to use. A complex interface, of course, cannot be built with it, but its capabilities are more than enough for this task. In the class initializer I create a grid to position the other widgets in: two text labels and one image. With Tkinter I did not even have to create separate threads for fetching images from the webcam, because there is the after method, which runs a given function after a specified period of time. The text and image of a Label can be updated with its config method. Very simple! A mouse click on the image is routed to the imgClicked method with bind.
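For readers unfamiliar with after, here is a minimal, self-contained illustration of the same pattern drawImage relies on (Python 2, like the rest of the article; not part of the program):

# Hedged sketch: Tkinter's after() as a frame timer, no threads needed.
from Tkinter import Tk, Label

root = Tk()
label = Label(root, text='0')
label.grid()

state = {'n': 0}
def tick():
    state['n'] += 1
    label.config(text=str(state['n']))  # update the widget via config
    root.after(33, tick)                # re-schedule itself, ~30 times/sec

root.after(100, tick)
root.mainloop()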

The image and its timestamp are returned by the self.webcam.getImage function. The webcam module simply loads the appropriate camera module depending on which operating system the program is running under.
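The archive has its own version of this module; a plausible sketch of such a dispatcher (module and class names are the ones used in main.py, the rest is assumed) could look like this:

# Hedged sketch of webcam.py: pick the platform-specific camera class.
# The real module in the archive may be organized differently.
import sys

if sys.platform.startswith('linux'):
    from v4l2 import VideoDevice
else:
    from directx import VideoDevice    # the Windows implementation

class WebCam(VideoDevice):
    """Uniform name used by main.py regardless of the platform."""
    pass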

Once again, here is the link to the archive with the program: distance-measure.
Required packages for Ubuntu: python, python-imaging, python-imaging-tk, opencv version 2.2, and build-essential to compile the V4L2 wrapper.
The program is started with:
python main.py
To start tracking an object, you need to click on it.

That's all.

Source: https://habr.com/ru/post/115661/

