Ekspozzer - creating a panorama from a video, averaging a video stream
Hi, Habrahabr!
Let me say it up front: there is nothing phenomenal in this article. It is about a program, hacked together on my knee, that creates panoramas from video and averages a video stream (its frames) over time. The program can also be used as a virtual slit camera. The article should interest anyone who is into video and image processing, as well as geek art. A very simple program, a very interesting result. The download link is at the end of the article. Caution, traffic! I have noticed that good and, most importantly, in-demand software is not born out of thin air but out of the need to solve some particular problem. I don't know about others, but for me this happens all the time. This story is no exception.
Once, on a graphics forum, I accidentally got into an argument with a guy who claimed something like this: “It's impossible to get an image of an empty Red Square in the daytime.” Of course, the statement is wildly debatable: yes, there are tons of people there during the day, but you could also ask everyone to leave. Theoretically this is possible. A mere mortal will of course fail, but if you are some Utin, then, by trying hard and calling the right people, you could arrange something like that. And what does an “image” of an empty Red Square even mean? An image is not necessarily a photograph; I can draw an empty Red Square by hand on a piece of paper. I no longer remember all the details of the argument after so many years, but it got wildly heated and we eventually arrived at this refined statement (approximately): “Because of the huge number of people on Red Square, it is impossible, using only photo and video equipment, to obtain a photorealistic image (a photo) of Red Square without a single person in it, taken from the eye level of a standing person, by an ordinary person with no leverage over the government, in the daytime during the warm tourist season (for example, summer).” Now that is a clarification! Downright legalese. Now everything fell into place.
My opponent argued that such a photo can only be made by resorting to Photoshop (or another image editor) to remove all the people from a photo of Red Square. The procedure is long and laborious, and a decent shot would take an experienced retoucher at least three or four hours. Yes, of course he is right, it can be done that way. But if this person (an artist and photographer by profession) knew even a little mathematics or had any idea of what elementary programming can do, he would never have claimed this. Least of all on a bet. And I proved him wrong.
If you have held a camera in your hands at least once and actually taken pictures, you probably know what exposure is. Put very simply (for people who shoot in "auto" mode), it is the length of time over which the frame is captured. Surely you have ended up with photos blurred by motion. There! That is simply the wrong exposure: it was too long. And if a photo suddenly came out too dark, then most likely the exposure was too short. I am deliberately using very childish language so that everyone can follow: I am not touching aperture settings and other subtleties of shooting. It was thinking about exposure that gave me the idea for settling this argument.
I thought: what if you take a very, very long exposure photograph of Red Square, but with the aperture stopped down very, very tightly (so that nothing is overexposed)? Then the people walking through the frame will simply be smeared away, while the permanent details (the buildings, the Kremlin, the square) will stay in place. Yes! This is the key to the solution. I should somehow try it. But how? I have no way of getting to Red Square at all, let alone for the sake of an experiment. Fine, any other square will do for testing; that is not the problem. The problem is that my Canon 550D, which I bought a million years ago, can take pictures with a maximum exposure of 30 seconds, which is far too little for the experiment. I can't buy a new camera for the sake of an experiment either. The exposure has to be really long, about 30 minutes. Why? To increase the chance that the people on the square change position and leave the spots where they were at the start of the frame. Roughly speaking, over 30 minutes each point of the frame should see more of the square than of people. I started thinking about how to solve the shooting problem at minimal cost.
But we do not despair! After all, we hold the most powerful tool there is, available to only a few: programming! I decided that I could create a “virtual” camera that would simply photograph the screen with any arbitrary exposure. You know all those screen capture programs: SnagIt, BandiCam, FRAPS ... Only this one would record not a single frame (a photo) or a sequence of frames (a video); it would accumulate information (as in a long exposure, which is exactly what this is, only electronic) and, at the end of the recording, average everything it received. Then, if you simply play a recording from a camera on the square on the screen, the output will be the required shot! Hooray! The problem is solved ... theoretically. All that remains is to write the necessary software and find a video from a fixed camera that films a square for half an hour.
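The accumulate-then-average idea fits in a few lines. What follows is only a minimal sketch and not the actual Ekspozzer code: it assumes the frames are already available as NumPy arrays (the real program grabs them from the screen), and the names `average_frames` and `synthetic_square` are made up for illustration. The synthetic "video" is a gray square crossed by one bright walker.

```python
import numpy as np

def average_frames(frames):
    """Accumulate frames like a long exposure, then divide by the frame count."""
    total = None
    count = 0
    for frame in frames:
        f = np.asarray(frame, dtype=np.float64)  # float accumulator avoids uint8 overflow
        if total is None:
            total = np.zeros_like(f)
        total += f
        count += 1
    if count == 0:
        raise ValueError("no frames to average")
    return np.clip(np.rint(total / count), 0, 255).astype(np.uint8)

def synthetic_square(n_frames=100, height=4, width=100):
    """Hypothetical test input: a gray background with one bright walker."""
    for i in range(n_frames):
        frame = np.full((height, width), 100, dtype=np.uint8)
        frame[:, i % width] = 255  # the walker stands in a different column each frame
        yield frame

result = average_frames(synthetic_square())
print(result.min(), result.max())  # the walker is diluted almost down to the background
```

The accumulator is kept in floating point on purpose: summing 8-bit frames directly would overflow after a couple of frames.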
Fortunately, the requirements for the program are trivial, and I easily implemented everything needed in one evening.
There were no problems with the videos either, since there are billions of them. Any more or less decent recording from a webcam or surveillance camera will do, since such cameras are mostly fixed and do not move during the video.
Experiment 1. So, the experiments began. Here is a time-lapse video from Red Square. But don't be surprised if you don't recognize the place in the video. There is more than one Red Square in the world (just as with St. Petersburg and other painfully familiar names); there are about twenty of them. The Red Square in the video is located at the University of Washington. It is a very crowded place and a landmark of the University and even of the city. There are always plenty of students, tourists, travelers, applicants, teachers and ordinary passersby on the square. By the way, an interesting fact: our Red Square is “red” because in old times the word “red” meant “beautiful” (and the square itself was originally built white, of white brick), whereas the Red Square at the University of Washington is “red” precisely because it is made of reddish stone.
And here is the irony: in the argument with that character we never specified which Red Square we meant. We simply assumed our own. Since there are several Red Squares in the world, there might be an uncrowded one among them where, at certain moments of the day, there are no people at all. Then the photo could just be taken, and I would win the argument automatically.
Well, enough lyrical digressions and irony. Here is what came out after averaging Red Square:
The video lasts only 17 seconds, but since it is a time lapse, the real time that passed is much longer than 17 seconds. Maybe 5 minutes, maybe 15.
As the result shows, the only people left in the photo are those who sat in the same place for a very long time. Some of them get up and leave during the video, and that produces the so-called "ghosts." In general, the result is almost what we need.
Now compare how many people are in the video and how many are in the photo. Think how much I would have suffered in Photoshop, cutting people out, hunting for frames where the cut areas are free of people, and pasting those fragments into the gaps. Besides, the patches would look ragged, since even the background lighting changes because of cloud shadows, recording artifacts, and so on. My Ekspozzer did it in just 17 seconds, and the result came out smooth and without much effort. Cool? Cool! And this is just the beginning!
Experiment 2. Let's return to our own Red Square. I never found a good and long enough video shot on the square itself with a fixed camera. Not even a time lapse: in those, the authors keep smoothly moving the camera all the time. I found only this video:
Pay attention to the huge number of cars on the Big Stone Bridge.
It turned out very nice and smooth, despite the dancing shadows from the clouds.
And what happened to the cars after averaging? That's right: they disappeared. Look what a neat picture came out. Could you ever catch such a frame during the day? Of course, any method has its errors, and so does mine: every now and then some ghosts of cars or people will remain here and there. The math is actually pretty simple. If a person covers a given spot in 5 frames of a 100-frame video, that person will be 100 - 5 = 95 percent transparent there. That is, 95 percent of the information will come from the square and 5 percent from the person. At that ratio the person is almost invisible. And since people and cars generally keep moving, the percentage is even lower! Sweet!
Experiment 3. Let's go on and take the most densely packed square in the world, Times Square in New York:
Here, everything is literally teeming with people and cars.
And the output contains only a lone parked police car
... and a bunch of ghostly streaks on the street to the left. Well, that is the imperfection of the method.
Experiment 4. One more:
The video lasts only 16 seconds.
Consequently, the result will be worse!
Experiment 5.
Busy Wall Street.
The result is impressive. Clean! It means that almost everyone keeps moving and nobody stands still.
It all makes sense: the shorter the exposure and the slower the objects move, the more pronounced the ghosts. And vice versa: the longer the exposure and the faster the objects move, the better the background shows through. So the ideal input is a long time-lapse video shot with a stationary camera. Clips like that work great.
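To put a number on how visible a ghost will be, the 5-frames-out-of-100 arithmetic can be written down directly. This is just an illustrative sketch: the function names and the brightness values 100 (background) and 200 (person) are my own assumptions, chosen only to make the numbers easy to follow.

```python
def ghost_weight(frames_with_object, total_frames):
    """Fraction of the averaged pixel that comes from the moving object."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if not 0 <= frames_with_object <= total_frames:
        raise ValueError("frames_with_object must lie between 0 and total_frames")
    return frames_with_object / total_frames

def averaged_pixel(background, obj, frames_with_object, total_frames):
    """Plain mean of a pixel that shows `obj` in some frames and `background` in the rest."""
    k, n = frames_with_object, total_frames
    ghost_weight(k, n)  # reuse the argument checks
    return (k * obj + (n - k) * background) / n

weight = ghost_weight(5, 100)             # share contributed by the person
pixel = averaged_pixel(100, 200, 5, 100)  # hypothetical brightness values
print(weight, pixel)
```

A longer exposure increases `total_frames`, and a faster object decreases `frames_with_object`; both shrink the weight, which is exactly why long time lapses of busy places come out cleanest.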
Experiment 6. We start to look at other results. Here is a video from the outdoor surveillance camera at the intersection where the accident occurs:
Averaged from 0:20 to 0:40, just 20 seconds in total.
Nice clean intersection with the ghost of a white car.
Experiment 7. And here is simply the perfect specimen: a long time-lapse video of a crowded street in Arnsberg, Germany:
Notice how flags flap in the wind.
As a result, they look as if they are moving in the averaged photo as well. People are barely visible.
Experiment 8. And how could we do without the Eiffel Tower!
This time lapse covers almost a whole day! Ideal, but how will the averaging behave across the transitions from day to night and back?
It came out quite tolerable and rather mysterious. Some indefinite time of day.
Experiment 9. From here on you can simply play around and average anything. For example, there is a video in which a guy, while traveling, photographs himself every day. Let's see what comes of it.
Interesting video. I want one too!
It came out very psychedelic. Is it just me, or does he look like Jesus? Or is Jesus himself a kind of averaged image?
Experiment 10. Why, with this thing you can even look under water!!! Here is what I mean: when the sea is rippling, the waves refract the pattern of the bottom (if it is visible at all, of course). By taking an averaged picture of the distorted view with a long exposure, we obtain an image of the bottom without the influence of the water. Cool! Let's pick a nice long video in which the bottom is visible through the rippling water and take a look:
Try to catch a moment when the water surface is flat. You won't manage!
The averager averages out the fluctuations of the water surface and thereby averages the refraction. Look at the bottom pattern and the mirror-smooth water surface!
Experiment 11. And if you average motion filmed from the window of a vehicle, you get the effect of rushing forward. Let's take a video shot from a train and average literally 1 second!
Experiment 12.
The effects come out nicely! Fine. I started experimenting with different videos and getting interesting results. But once I had played enough, it felt like something was missing. Cool, but not enough. And then another interesting idea came to me. While watching and averaging videos shot from the windows of moving trains (for the effect of rapid movement), I realized what functionality my little program lacked! Let it shoot panoramas as well!
With panoramas everything is also quite simple. The train moves, the view in the window changes, and you only need to take a sequence of images from the window at a certain offset and glue them one to another, left to right or right to left. You then get a huge panorama showing everything that passed by the window. I started experimenting right away. I wrote a smart seam detector for stitching fragments along their borders, but the result was very bad! Barrel distortion and illumination jumping from one fragment to the next kept getting in the way. I realized that my program was nowhere near giants such as Autopano Giga, and started to cheat. To improvise. How could the stitching be made smooth and continuous? The first idea that came to mind turned out to be decisive: stitch every frame rather than fragments, with each frame adding one column to the resulting picture. Take the first frame and cut out a thin vertical strip of the image; take the next frame, cut out the same thin strip, and glue it to the first one. On the left or on the right, depending on the direction of camera movement, which can be specified explicitly. Since the second frame differs from the first by a certain offset, the image in the two glued strips is something like an unrolled panorama. A kind of cheap analog of a slit camera (one, two, three). Did it work? Let's go!
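The strip-gluing scheme can be sketched as follows. This is an illustration under my own assumptions rather than the real Ekspozzer code: frames are NumPy arrays, the strip is exactly one pixel wide, and `slit_scan`, `slit_x` and `direction` are hypothetical names. The synthetic "video" is a 10-pixel window sliding one pixel per frame over a wider scene, which imitates a camera flying to the right.

```python
import numpy as np

def slit_scan(frames, slit_x, direction="right"):
    """Cut the same one-pixel column out of every frame and glue the columns together.

    direction="right": new columns are appended on the right (camera flies right);
    direction="left":  new columns are appended on the left (camera flies left).
    """
    columns = [np.asarray(f)[:, slit_x:slit_x + 1] for f in frames]
    if not columns:
        raise ValueError("no frames to scan")
    if direction == "left":
        columns.reverse()
    elif direction != "right":
        raise ValueError("direction must be 'left' or 'right'")
    return np.hstack(columns)

# Hypothetical scene: every column of `world` has distinct values, so the
# reconstruction is easy to check.
world = np.arange(200, dtype=np.uint8).reshape(4, 50)
frames = [world[:, i:i + 10] for i in range(40)]  # window moves 1 px per frame

panorama = slit_scan(frames, slit_x=5, direction="right")
print(panorama.shape)
```

Because the window here moves exactly one pixel per frame and the strip is one pixel wide, the panorama reproduces the scene without distortion; any other speed squashes or stretches the result horizontally.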
Experiment 13. First, I need a video in which a fixed camera films a moving object against a fixed background. Then, if the object is long enough, it can be scanned in its entirety! Moving cars and trains suit such videos perfectly. Look what a beauty I assembled:
The train runs from 01:57 to 03:17.
The cars came out with different lengths because the train's speed kept changing. The picture is clickable.
It turned out unexpectedly great! True, the program produces the panorama terribly squashed horizontally, and to restore the correct proportions you have to shrink the resulting image heavily in the vertical direction, which makes it small. This is probably a shortcoming of both the program and the input video: if the train in it moved very slowly, the proportions would be normal.
Experiment 14. Let's assemble another picture, this time from the window of a moving train.
The camera is held fairly steady from 3:25.
The result is a mini-panorama of the city. The picture is clickable.
Here the flaws are obvious: strong distortion of objects. Objects closer to the train fly through the frame faster, and objects farther from the train move more slowly. The law of parallax. This means that foreground objects will be strongly squashed horizontally and background objects strongly stretched. You have to tune to one specific plane (a range of distances from the camera). In this case, the houses in the distance. They came out quite well "assembled." Everything closer (trees, wires, poles) is strongly squashed; everything farther away is stretched. A slit scan cannot produce an image that is perfect in all planes.
Experiment 15. Let's take the next example and assemble the station platform in Kislovodsk:
Here we see disproportionately squashed lamps. I confess, my mistake: since we were assembling the platform and the lamps stand right in the middle of the platform, they should have come out perfectly even. Now let's assemble a panorama from another video:
You can see how unevenly the wires hang. The picture is clickable.
Here I gave up on the trees and tuned to the huts standing in the distance.
Experiment 16. Now one more, with the suburbs of St. Petersburg:
The panorama was assembled up to 02:43.
Distant houses are slightly stretched, nearby ones slightly squashed. You can't beat the system. The picture is clickable.
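Why near objects get squashed and far ones stretched can be estimated with a bit of arithmetic. The sketch below is my own simplification, not something the program computes: it assumes an ideal pinhole camera looking sideways, perpendicular to the direction of travel, and every number in it (focal length in pixels, distance travelled per frame, distances to objects) is invented for the example.

```python
def pixels_per_frame(depth_m, speed_m_per_frame, focal_px):
    """Horizontal image shift per frame of an object at the given distance (pinhole model)."""
    if depth_m <= 0:
        raise ValueError("depth_m must be positive")
    return focal_px * speed_m_per_frame / depth_m

def slit_stretch(depth_m, speed_m_per_frame, focal_px, strip_px=1):
    """Horizontal scale of an object in the slit-scan panorama.

    1.0 means correct proportions, below 1.0 squashed, above 1.0 stretched.
    """
    return strip_px / pixels_per_frame(depth_m, speed_m_per_frame, focal_px)

FOCAL_PX = 1000.0  # assumed focal length in pixels
SPEED = 0.5        # assumed metres travelled by the train between frames

for name, depth in [("poles", 250.0), ("houses", 500.0), ("hills", 1000.0)]:
    print(name, slit_stretch(depth, SPEED, FOCAL_PX))
```

With these made-up numbers the houses at 500 m shift exactly one pixel per frame and keep their proportions, while anything twice as close comes out at half width and anything twice as far at double width. Only one distance can be "in tune" at a time.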
Experiment 17. Why not try cars instead of trains? I found an interesting video of a parade on a square, where a stationary camera films the vehicles driving past:
The video quality is terrible; the camera shoots at a very low FPS.
Hence the quality of the final panorama. Still, the makeup of the parade column reads quite naturally. The picture is clickable.
Having played enough, I started simply fooling around.
Experiment 18.
Mad
Slit-Mad. The picture is clickable.
Experiment 19.
Michael
Slit-Michael. The picture is clickable.
Experiment 20.
Volodya
Slit-Volodya. The picture is clickable.
Charming! =)
Experiment 21. And finally, something even more charming: a panorama that shows the smooth change in the color of the sky in the evening. Here, too, everything is quite simple:
The panorama was built from second 1 to second 38.
We observe the sunrise from left to right.
Now you can play around by downloading this small and cool program.
Controls in averaging mode: select the "averaging" mode. Move the mouse to the upper left corner of the video and hold down "[": the program remembers the upper left corner. The application showing the video does not have to be active at that moment; any application can be active. Move the mouse to the lower right corner of the video and press "]": the program remembers the lower right corner and thus the full coordinates of the video frame. Start the video. Begin averaging at any moment by holding down "/" on the numeric keypad on the right. During averaging, the program reports the number of frames averaged so far. Identical neighboring frames are not averaged but ignored (so when the video freezes, the program does not spoil the result). Average for as long as you need. To finish averaging, hold down "*" on the numeric keypad on the right. The result is written to the same folder as Ekspozzer.
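The rule about ignoring identical neighboring frames could look roughly like this. Again, this is a sketch under assumptions and not the program's real code: frames are taken to be NumPy arrays, "identical" is taken to mean exact pixel-for-pixel equality, and `average_unique_frames` is a made-up name.

```python
import numpy as np

def average_unique_frames(frames):
    """Average frames, skipping any frame identical to the one captured just before it.

    Returns the float mean image and the number of frames actually used.
    """
    total = None
    previous = None
    used = 0
    for frame in frames:
        current = np.asarray(frame)
        if previous is not None and np.array_equal(current, previous):
            continue  # frozen video or repeated capture: ignore the duplicate
        previous = current.copy()  # copy in case the caller reuses the buffer
        if total is None:
            total = np.zeros(current.shape, dtype=np.float64)
        total += current
        used += 1
    if used == 0:
        raise ValueError("no frames captured")
    return total / used, used

# Hypothetical capture: the video "froze" on a dark frame for three grabs.
dark = np.zeros((2, 2), dtype=np.uint8)
bright = np.full((2, 2), 100, dtype=np.uint8)
mean, used = average_unique_frames([dark, dark, dark, bright])
print(used, mean[0, 0])
```

Without the duplicate check, the three frozen dark frames would drag the mean down to 25; with it, the dark and bright states get equal weight.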
Controls in panorama mode: select the "panorama" mode. Select the left arrow if the camera in the video “flies” to the left (that is, the picture moves to the right), or the right arrow if the camera “flies” to the right (that is, the picture moves to the left). Select the width of the panorama in pixels. Move the mouse to the upper edge of the video (approximately in the center) and hold down "[": the program remembers the upper left corner. Move the mouse to the lower edge of the video, slightly further to the right, and press "]": the program remembers the lower right corner and thus the coordinates of the "slit" through which the panorama will be assembled from the video. Start the video. Begin assembling the panorama at any moment by holding down "/" on the numeric keypad on the right. The result is written to the same folder as Ekspozzer.
Please don't judge too harshly: the program was originally written “on the knee” and for myself, so it has no usability to speak of. Everything above is a purely entertaining popular-science experiment.