Seamless splitting and splicing of video with DirectShow

One of our departments does manual testing of automotive multimedia components. Every action performed (pressing buttons, inserting discs and so on) and the system's response are recorded continuously: one of the cameras points at the display. The video serves as proof that the error was actually observed, and it also tells developers exactly which actions were taken and how quickly. You will agree that this is very valuable information for a bug report.

What makes these systems special is that errors can occur spontaneously, at any stage of a test case or even in standby mode, when no video is being recorded. Below I describe the solution I developed for seamlessly splitting and splicing video. Thanks to it, recording runs all day and the video is saved in conveniently small files, which lets us catch and document rare errors and, at the same time, delight developers with dozens of videos of seemingly impossible system behavior.

Why split the video?

Indeed, why? We could record eight-hour files and then cut out the needed pieces in a video editor, and that would be that. This is in fact how it used to be done, but the approach has many drawbacks: long files are awkward to dig through; after cutting, the video has to be recompressed one way or another, which costs time and quality; and disk space is not unlimited. On top of that, sooner or later recording has to be stopped in order to work with the files, and that is exactly when, by Murphy's law, the nastiest and least reproducible errors occur. So the verdict is: the video must be cut at the recording stage!

What does DirectShow offer us?

DirectShow was chosen for the implementation. One of the requirements was that the output be Windows Media files, specifically WMV3, so we decided to compress on the fly, since modern computers handle this easily. The basic idea is this: we need to be able to redirect the incoming audio and video streams to another file at any moment without losing a single frame. Then we can record files of, say, two minutes each and, when necessary, glue them together seamlessly.
Let us build the most ordinary filter graph for recording video with sound in Windows Media format, with a preview. It looks something like this:

How does this graph work? The two source filters, for audio and for video, are served by separate threads that deliver samples to the input pins of the downstream filters. The Smart Tee filter duplicates the incoming video data: one copy goes to the screen via the Video Renderer, and the other goes to the WM ASF Writer filter, which synchronizes the audio and video, compresses them and writes them to the file.

The naive solution, with two writer filters used alternately in the graph while the output file names are changed, does not work.

A DirectShow graph has one peculiarity: until it is stopped, all of its output filters keep their files open and do not finalize them. Moreover, without stopping the graph it is impossible to change the output file names or to connect and disconnect filters. But stopping and restarting the graph costs several frames, or even several seconds! Clearly, the standard tools are not enough.

Independent graphs

One solution is to make the audio and video capture graph (Capture Graph) independent of the recording graph (Record Graph), so that the latter can be stopped to finalize its files. This can be done, for example, with GMFBridge by Geraint Davies, one of the creators of DirectShow. The overall scheme would look roughly like this:

GMFBridge is present in all three graphs at once, which makes it possible to switch the sample streams on the fly between the first and the second Record Graph without losing a single sample. While one of the record graphs is writing our video, we configure the second one (set its output file name), since it can be stopped freely without affecting the others. At the right moment we start the second graph, switch GMFBridge over and stop the first graph. Voilà!

But this solution has obvious drawbacks. First, resources: two copies of the record graph are needed, which hurts overall performance and memory usage. Second, when video and audio are recorded together they are extremely difficult to keep in sync: each graph has its own time base, and the samples themselves are delivered on different threads. All this led to spontaneous freezes of GMFBridge itself at the moment of switching graphs, so I decided to abandon this approach. The source code of the tool is open, of course, and with enough determination one could track down the cause of its unstable behavior, but the wish to save resources prevailed, and I decided to approach the problem from the other side.

Writing our own ASF Writer

~~With preference and courtesans.~~ Exactly! We need a WM ASF Writer that can switch to another file on command without stopping the graph. Then we can take the first, simplest graph, drop our custom filter into it in place of the standard WM ASF Writer, and enjoy life.
We create our own filter and add one new method, StreamToFile, to the standard ones; it will be used to switch between files.

    class CCustomASFWriter : public CBaseFilter
    {
    public:
        // Switches recording to a new output file without stopping the graph.
        STDMETHOD(StreamToFile)(BSTR szFileName);
    };

A note on the code examples

All the code samples below are heavily simplified for clarity: error handling has been dropped entirely, various additional checks have been removed, and so on.

To avoid losing samples at the moment of switching, and to avoid blocking the threads that deliver samples, we add a thread-safe queue for incoming data to our filter. I took an existing implementation and adapted it slightly for multiple-producer, single-consumer use. I decided to use one queue for both video and audio, and here is why. It all comes down to our new file-switching function. It is important to remember that video samples usually arrive much more often than audio samples: for example, 30 Hz for video versus 2 Hz (500 ms per sample) for audio. Accordingly, the switch must be made immediately after an audio sample has been delivered. Since the shared queue preserves the natural order of the samples, this is very convenient to do.
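The shared queue can be pictured roughly as follows. This is only a minimal sketch in portable C++, not the implementation used in the filter: `QueuedSample`, `SampleQueue` and `DemoArrivalOrder` are hypothetical names, and a plain mutex stands in for whatever synchronization the real queue uses.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical stand-in for a queued media sample.
struct QueuedSample {
    bool isAudio;            // true for audio, false for video
    std::int64_t startTime;  // start time in 100-ns units
};

// Minimal thread-safe FIFO: any number of producer threads may call Push(),
// while a single consumer thread drains the queue with TryPop().
class SampleQueue {
public:
    void Push(const QueuedSample& sample) {
        std::lock_guard<std::mutex> lock(mutex_);
        samples_.push_back(sample);
    }

    // Copies the oldest sample into `out`; returns false if the queue is empty.
    bool TryPop(QueuedSample& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (samples_.empty()) return false;
        out = samples_.front();
        samples_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::deque<QueuedSample> samples_;
};

// Usage: samples come out in the order in which they were pushed.
inline void DemoArrivalOrder() {
    SampleQueue queue;
    queue.Push({false, 0});  // a video frame
    queue.Push({true, 0});   // an audio block
    QueuedSample s{};
    assert(queue.TryPop(s) && !s.isAudio);
    assert(queue.TryPop(s) && s.isAudio);
    assert(!queue.TryPop(s));
}
```

Because audio and video share one queue, the consumer sees them interleaved in arrival order, which is what makes "switch right after an audio sample" easy to express.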

With this in mind, our StreamToFile method only signals the filter that, immediately after the next audio sample, it should close the current file and start recording to a new one. While the new file is being prepared, all incoming samples are held in the queue.

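    HRESULT STDMETHODCALLTYPE CCustomASFWriter::StreamToFile(BSTR szFileName)
    {
        // Only record the request; the writer thread performs the actual switch.
        CAutoLock lock(m_pLock);
        wcscpy_s(m_szCurrentFile, szFileName);  // copied under the lock, since the writer thread reads it
        m_bSwitchRequested = TRUE;
        return S_OK;
    }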

The compression itself and the writing to files are done with the Windows Media Format SDK, namely through the IWMWriter interface.

    // Create the Windows Media writer object.
    IWMWriter *pWriter = NULL;
    WMCreateWriter(NULL, &pWriter);

The loop that processes incoming samples runs in a separate thread:

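    while (bRunning)
    {
        // Drain everything that is currently in the queue.
        StreamSamplesToWriter(pWriter);

        // Wait up to 33 ms (about one frame period at 30 Hz), or wake up early on the stop event.
        DWORD dwWaitResult = WaitForSingleObject(hEventStopStreaming, 33);
        if (dwWaitResult == WAIT_OBJECT_0)
        {
            m_pPinVideo->StopQueuingNow();
            m_pPinAudio->StopQueuingNow();
            bRunning = FALSE;
        }
    }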

The most interesting part happens in the StreamSamplesToWriter method. Here the samples are passed to IWMWriter, and the files are switched at the right moment if a switch has been requested through the StreamToFile method.

    STDMETHODIMP CCustomASFWriter::StreamSamplesToWriter(IWMWriter *pWriter)
    {
        BOOL bMustSwitch = FALSE;
        void *pObject = NULL;

        while (m_pSamplesQueue->Pop(pObject))
        {
            CQueuedSample *pSample = (CQueuedSample*)pObject;
            DWORD inputNumber = pSample->MediaType == MEDIATYPE_Video
                ? m_pPinVideo->InputNumber
                : m_pPinAudio->InputNumber;

            // Copy the sample into a buffer allocated by the writer and hand it over.
            INSSBuffer *pBuffer = NULL;
            pWriter->AllocateSample(pSample->DataSize, &pBuffer);
            LPBYTE pbDestBuffer = NULL;
            pBuffer->GetBuffer(&pbDestBuffer);
            CopyMemory(pbDestBuffer, pSample->Data, pSample->DataSize);
            pWriter->WriteSample(inputNumber, pSample->Start,
                pSample->IsDiscontinuity | pSample->IsSyncPoint, pBuffer);
            pBuffer->Release();

            // Files are switched only right after an audio sample.
            if (inputNumber == m_pPinAudio->InputNumber)
            {
                {
                    CAutoLock lock(m_pLock);
                    bMustSwitch = m_bSwitchRequested;
                    m_bSwitchRequested = FALSE;
                }
                if (bMustSwitch)
                {
                    pWriter->EndWriting();
                    pWriter->SetOutputFilename(m_szCurrentFile);
                    pWriter->BeginWriting();
                }
            }
            delete pSample;
        }
        return S_OK;
    }

So, we have achieved the result! By calling the StreamToFile method at arbitrary moments, we get new files without losing a single frame.
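The switching rule can also be modelled without any Windows Media calls. The sketch below is an illustration under stated assumptions, not the filter's code: `FileSwitcher`, `Sample` and `DemoSwitchAfterAudio` are hypothetical names, a file index stands in for the EndWriting / SetOutputFilename / BeginWriting sequence, and `std::atomic<bool>` replaces the lock-protected flag.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical sample: only the media type matters for the switching rule.
struct Sample {
    bool isAudio;
};

// Models the writer: every sample goes to the current file, and a pending
// switch takes effect right after the next audio sample has been written.
class FileSwitcher {
public:
    // May be called from any thread; it only records the request.
    void RequestSwitch() { switchRequested_.store(true); }

    // Called by the single writer thread.
    // Returns the index of the file that received this sample.
    int Write(const Sample& sample) {
        const int target = currentFile_;
        // exchange(false) reads and clears the flag in one atomic step.
        if (sample.isAudio && switchRequested_.exchange(false)) {
            ++currentFile_;  // stands in for closing the file and opening the next one
        }
        return target;
    }

private:
    std::atomic<bool> switchRequested_{false};
    int currentFile_ = 0;
};

// Usage: the request stays pending across video samples.
inline void DemoSwitchAfterAudio() {
    FileSwitcher writer;
    writer.RequestSwitch();
    assert(writer.Write({false}) == 0);  // video: still file 0
    assert(writer.Write({true}) == 0);   // audio: last sample of file 0
    assert(writer.Write({false}) == 1);  // first sample of file 1
}
```

With 500 ms audio samples, a request can therefore wait up to about half a second before it takes effect; no sample is dropped in the meantime, because everything is written to one file or the other.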

Gluing the pieces together

So now we have a pile of two-minute files. But what if we need a four-minute video, and the most interesting moment falls exactly on the switch from one file to the next? No problem: we can simply glue these files into one, and do it without transcoding! The join will be truly seamless, since not a single frame was lost during recording.

For this we use IWMSyncReader and IWMWriterAdvanced.

    IWMWriter *pWriter = NULL;
    IWMWriterAdvanced *pWriterA = NULL;
    WMCreateWriter(NULL, &pWriter);
    pWriter->QueryInterface(IID_IWMWriterAdvanced, (void**)&pWriterA);

    IWMSyncReader *pReader = NULL;
    WMCreateSyncReader(NULL, 0, &pReader);

    for (element = m_oMergeFileList.begin(); element != m_oMergeFileList.end(); ++element)
    {
        pReader->Open(element->FileName);
        IWMProfile *pProfile = NULL;
        pReader->QueryInterface(IID_IWMProfile, (void**)&pProfile);

        // Ask the reader for compressed stream samples, so that nothing is decoded.
        for (WORD i = 0; i < dwStreamCount; i++)
        {
            pProfile->GetStream(i, &pStream);
            pStream->GetStreamNumber(&wStreamNumber);
            pReader->SetReadStreamSamples(wStreamNumber, TRUE);
        }

        // Copy samples until the reader reports an error or the end of the file.
        while (true)
        {
            HRESULT hr = pReader->GetNextSample(0, &pSample, &cnsSampleTime, &cnsDuration,
                &dwFlags, &dwOutputNum, &wStreamNum);
            if (FAILED(hr))
                break;
            // qwSampleTimeToWrite is the sample time on the output timeline (its calculation is omitted here).
            pWriterA->WriteStreamSample(wStreamNum, qwSampleTimeToWrite, 0, cnsDuration, dwFlags, pSample);
            pSample->Release();
        }
        pReader->Close();
    }

As a result, very quickly (in milliseconds!) we get a four-minute file in which the join cannot be detected. Strictly speaking, this is not entirely true: under certain rare conditions the join is not quite so perfect (for example, when a very low frame rate is set on the camera). Under normal conditions, however, the join really is unnoticeable during playback.
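One detail the merge code leaves out is how sample times are placed on the output timeline. A plausible scheme, shown here only as a sketch, is to shift every timestamp of a file by the total length of the files already written. The names `TimedPacket`, `Concatenate` and `DemoConcatenate` are hypothetical, and the sketch assumes that each input file's timestamps start at zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical compressed packet; times are in 100-ns units
// counted from the start of the file the packet was read from.
struct TimedPacket {
    std::uint64_t time;
    std::uint64_t duration;
};

// Puts the packets of all files on one output timeline by shifting each
// timestamp by the total length of the files that came before.
std::vector<TimedPacket> Concatenate(const std::vector<std::vector<TimedPacket>>& files) {
    std::vector<TimedPacket> merged;
    std::uint64_t offset = 0;  // start of the current file on the output timeline
    for (const std::vector<TimedPacket>& file : files) {
        std::uint64_t fileLength = 0;
        for (const TimedPacket& packet : file) {
            merged.push_back(TimedPacket{offset + packet.time, packet.duration});
            fileLength = std::max(fileLength, packet.time + packet.duration);
        }
        offset += fileLength;
    }
    return merged;
}

// Usage: the only packet of the second file lands right after the first file ends.
inline void DemoConcatenate() {
    const std::vector<TimedPacket> first = {TimedPacket{0, 10}, TimedPacket{10, 10}};
    const std::vector<TimedPacket> second = {TimedPacket{0, 10}};
    const std::vector<TimedPacket> merged = Concatenate({first, second});
    assert(merged.size() == 3);
    assert(merged[2].time == 20);
}
```

Only timestamps change; the compressed payload is copied as is, which is why no transcoding is needed.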

I also want to cut and glue!

I did not stop there and decided to go further. A magic button was added that instantly produces a video file of the last minute (the desired duration is actually configurable). The feature turned out to be very popular with testers: see an unexpected error, press the button, get the video.

Let me illustrate the situation to make it clearer. Suppose the button is pressed almost immediately after an automatic switch to the next file has taken place:

Pressing the magic button triggers another switch to a new file, so that File 2 becomes available. But now we need to cut File 1 and then merge the remainder with File 2.

There is nothing complicated here either. Gluing was covered above, and cutting is done in the same way: compressed packets are read without decompression and all unneeded ones are skipped; from a certain point in time onward, every packet read is written to a file with its timestamp corrected so that the first packet is at 00:00:00. The cut point must also be chosen so that the first packet of the new file contains a key frame (I-frame) rather than a predicted delta frame (P-frame). When the picture changes little, key frames in WMV files may occur as rarely as once every half minute, so I had to configure forced key frames. I chose a maximum interval of 1 second between two key frames as a compromise between positioning accuracy when cutting and file size.
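The cutting step can be sketched in portable C++ as well. This is an illustration rather than the application's code: `VideoPacket`, `CutTail` and `DemoCutTail` are hypothetical names, only the video stream is modelled, packets are assumed to be sorted by time, and the choice of the last key frame at or before the requested time is my assumption about how the cut point is picked.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical compressed video packet; time is in 100-ns units.
struct VideoPacket {
    std::uint64_t time;
    bool isKeyFrame;
};

// Keeps the tail of a recording. The cut is moved to the last key frame at or
// before `cutTime` (or to the first key frame after it, if none precedes it),
// and timestamps are rebased so that the first kept packet is at time 0.
std::vector<VideoPacket> CutTail(const std::vector<VideoPacket>& packets, std::uint64_t cutTime) {
    const std::size_t none = packets.size();
    std::size_t start = none;
    for (std::size_t i = 0; i < packets.size(); ++i) {
        if (!packets[i].isKeyFrame) continue;
        if (packets[i].time <= cutTime || start == none) start = i;
        if (packets[i].time > cutTime) break;
    }

    std::vector<VideoPacket> tail;
    if (start == none) return tail;  // no key frame found: nothing can be decoded

    const std::uint64_t base = packets[start].time;
    for (std::size_t i = start; i < packets.size(); ++i) {
        tail.push_back(VideoPacket{packets[i].time - base, packets[i].isKeyFrame});
    }
    return tail;
}

// Usage: a cut requested at time 3 snaps back to the key frame at time 2.
inline void DemoCutTail() {
    const std::vector<VideoPacket> packets = {
        VideoPacket{0, true}, VideoPacket{1, false},
        VideoPacket{2, true}, VideoPacket{3, false}};
    const std::vector<VideoPacket> tail = CutTail(packets, 3);
    assert(tail.size() == 2);
    assert(tail[0].time == 0 && tail[0].isKeyFrame);
    assert(tail[1].time == 1);
}
```

The sketch also shows why key frame spacing matters: the real cut can land up to one key frame interval away from the requested time.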

    IWMVideoMediaProps *pVMProps = NULL;
    pStreamConfig->QueryInterface(IID_IWMVideoMediaProps, (void**)&pVMProps);
    // The spacing is given in 100-nanosecond units: 10,000,000 units = 1 second.
    pVMProps->SetMaxKeyFrameSpacing(10000000i64);
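The literal 10000000 is easier to read once the unit is spelled out: the Windows Media Format SDK expresses times in 100-nanosecond units. A small helper, hypothetical and not part of the SDK, makes the arithmetic explicit.

```cpp
#include <cstdint>

// Number of 100-nanosecond units in one second.
constexpr std::int64_t kHundredNsPerSecond = 10000000;

// Converts milliseconds to 100-nanosecond units.
constexpr std::int64_t MillisecondsToHundredNs(std::int64_t milliseconds) {
    return milliseconds * (kHundredNsPerSecond / 1000);
}

// The one-second key frame spacing used above.
static_assert(MillisecondsToHundredNs(1000) == 10000000, "1 s is 10,000,000 units");
// A 500 ms audio sample, as in the queue discussion.
static_assert(MillisecondsToHundredNs(500) == 5000000, "500 ms is 5,000,000 units");
```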

Hooray! The magic button works!

Conclusion

In conclusion, I want to say that I worked out all the intricacies of DirectShow on my own: MSDN, code samples from the Internet, and trial and error. But that is exactly what made the result so satisfying: the application is in active use for recording video, from several cameras simultaneously. When an error is found, testers happily press the magic button and two seconds later receive finished, synchronized video files from three cameras, with no video editing needed at all! And nothing pleases a developer as much as thanks from users, does it?

Source: https://habr.com/ru/post/208788/
