This article introduces my hobby project, CaptureManager, for the Windows desktop platform. The project is a simple SDK for adding support for a wide range of video and audio sources to an application.
CaptureManager is based on Microsoft Media Foundation, the newer generation of media technology that replaces the outdated DirectShow. Microsoft Media Foundation first shipped in Windows Vista and received support for video and audio sources starting with Windows 7. Its advantages are a new model for processing media data that suits multiprocessor systems, and its continued development and support from Microsoft.
In the CaptureManager project, I wanted to solve a number of problems I encountered when writing applications with Microsoft Media Foundation:
Implementation of COM functionality. Strange as it may sound, with Microsoft Media Foundation Microsoft retreated from its own application model, COM. Of course, all the interfaces in Microsoft Media Foundation still derive from IUnknown and are identified by GUIDs. But the objects themselves are created through direct C function calls exported from system libraries that the application links against at build time. This differs from DirectShow, which requires a call to CoCreateInstance and access through the COM abstraction. In my opinion, this decision by Microsoft is a drawback. First, it makes Microsoft Media Foundation hard to integrate into projects not written in C/C++, for example C# projects, which otherwise interact with COM objects on Windows almost seamlessly by generating the required interface definitions from the TLB. Second, it increases the risk that an application loses compatibility with the next version of Windows when a function migrates from one library to another. With Microsoft Media Foundation this has already happened once, as described in "Library Changes in Windows 7": "Starting in Windows 7, certain Media Foundation functions are exported from different DLL files than previous versions."
In my opinion, Microsoft Media Foundation is overloaded with functions and interfaces. It would be nice to hide most of them behind an additional level of abstraction that streamlines the task of capturing and recording video and audio data.
A significant drawback, in my opinion, is the limited support for video and audio recording in Microsoft Media Foundation. It provides two mechanisms for working with media data: a topology graph, and the SourceReader-SinkWriter pair. The first assembles the required configuration from converter nodes and is very flexible. The second has the application receive portions of media data from a SourceReader and pass them to a SinkWriter in its own code. The topology graph is very convenient, in my opinion, and makes it easy to generate the required recording configuration at the user's request. However, this solution from Microsoft does not handle the recording task. The object that runs a session from a topology, which the MFCreateMediaSession function returns behind the IMFMediaSession interface, is optimized for playing media data and skips a number of operations that recording requires. For example, when writing to a file, metrics such as the average bitrate and the playback duration have to be calculated, but the IMFMediaSession from MFCreateMediaSession does not do this, because for playback such a calculation is meaningless. There is also a problem with timing: the IMFMediaSession from MFCreateMediaSession counts playback from time zero, which is logical when playing a media file. Video and audio sources such as webcams and microphones, however, use the current system time. According to the Microsoft Media Foundation documentation they must start from time zero, but they do not fulfill this requirement.
I think many will agree that these problems are significant and worth solving. They were the reason I started the CaptureManager project (along with the task of capturing video from two webcams and recording it into one media file).
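To make the two recording gaps described above concrete, here is a minimal Python sketch of the bookkeeping a recording session would need: shifting live-source timestamps so that the first sample starts at zero, and deriving the duration and average bitrate of the recording. It is only an illustration of the idea, not CaptureManager code. The `Sample` type and both function names are hypothetical, and timestamps are assumed to be in the 100-nanosecond units that Media Foundation uses.

```python
from dataclasses import dataclass

HNS_PER_SECOND = 10_000_000  # Media Foundation time unit: 100 ns


@dataclass
class Sample:
    """Hypothetical stand-in for a media sample from a live source."""
    timestamp_hns: int  # presentation time, possibly based on the system clock
    duration_hns: int   # how long the sample lasts
    size_bytes: int     # payload size


def rebase_to_zero(samples):
    """Shift timestamps so that the first sample starts at time zero."""
    samples = list(samples)
    if not samples:
        return []
    origin = samples[0].timestamp_hns
    return [
        Sample(s.timestamp_hns - origin, s.duration_hns, s.size_bytes)
        for s in samples
    ]


def recording_metrics(samples):
    """Return (duration in seconds, average bitrate in bits per second)."""
    rebased = rebase_to_zero(samples)
    if not rebased:
        return 0.0, 0.0
    last = rebased[-1]
    duration_s = (last.timestamp_hns + last.duration_hns) / HNS_PER_SECOND
    total_bits = 8 * sum(s.size_bytes for s in rebased)
    bitrate = total_bits / duration_s if duration_s > 0 else 0.0
    return duration_s, bitrate
```

A playback session can skip both steps because a media file already starts at zero and carries its own metrics; a recorder has to produce them itself.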
In short, what CaptureManager is:
A full COM in-process server, or ActiveX, as it is sometimes called. It includes a TLB and can be integrated into C++, C#, and Python projects just like DirectShow.
CaptureManager depends on the Microsoft Media Foundation libraries but uses delayed (run-time) binding: the CaptureManager code loads the Microsoft Media Foundation libraries itself and resolves the functions it needs while the application is running. If a function cannot be found in the libraries, it is replaced by a stub function that returns the error code E_NOTIMPL. Thus, CaptureManager reduces the risk of the target application crashing when functions migrate from one Microsoft Media Foundation library to another.
CaptureManager has a simplified set of interfaces. An important feature is that it describes media sources, codecs, and media containers as XML documents. An XML document is much easier to process than numerous Variant and PropVariant values, especially from high-level frameworks like WPF.
CaptureManager includes a number of video and audio sources not found in the original Microsoft Media Foundation: Screen Capture for capturing images from the display (or several displays), AudioLoop Capture for capturing audio from the audio output, DirectShow-Crossbar Capture for capturing video from video capture cards.
CaptureManager includes a "frame accumulator" that lets you retrieve a series of the most recent frames.
CaptureManager includes its own implementation of the IMFMediaSession interface optimized for the recording task, i.e. it dispenses entirely with calls to the MFCreateMediaSession function.
CaptureManager includes functionality for changing webcam video processor parameters and camera parameters (focus, exposure, etc.).
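The run-time binding with a stub fallback mentioned in the list above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the CaptureManager implementation: the `bind` and `make_stub` helpers and the `ExampleCreate` / `ExampleRemoved` export names are hypothetical, and plain objects stand in for loaded DLLs. Only the value of E_NOTIMPL is the standard HRESULT.

```python
from types import SimpleNamespace

E_NOTIMPL = 0x80004001  # standard HRESULT meaning "not implemented"


def make_stub(name):
    """Build a placeholder that accepts any arguments and reports E_NOTIMPL."""
    def stub(*args, **kwargs):
        return E_NOTIMPL
    stub.__name__ = f"stub_{name}"
    return stub


def bind(libraries, name):
    """Return the first export called `name`, or a stub if no library has it."""
    for library in libraries:
        function = getattr(library, name, None)
        if function is not None:
            return function
    return make_stub(name)


# Hypothetical stand-ins for two loaded DLLs. With ctypes these would be
# library objects, whose missing exports also raise AttributeError.
first_dll = SimpleNamespace()
second_dll = SimpleNamespace(ExampleCreate=lambda: 0)  # 0 == S_OK

create = bind([first_dll, second_dll], "ExampleCreate")    # found in second_dll
missing = bind([first_dll, second_dll], "ExampleRemoved")  # falls back to a stub
```

Because every lookup yields something callable, a function that moved or disappeared turns into an error code the caller can check instead of a failure when the application loads.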
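To show why an XML description of sources is convenient to consume, here is a small Python sketch that reads such a document with the standard library. The element and attribute names (`Sources`, `Source`, `MediaType`, `Width`, and so on) are a hypothetical, simplified layout invented for this example; the actual schema produced by CaptureManager may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified layout; the real CaptureManager schema may differ.
SOURCES_XML = """
<Sources>
  <Source Name="USB Camera" Type="video">
    <MediaType Width="1280" Height="720" FrameRate="30"/>
    <MediaType Width="640" Height="480" FrameRate="30"/>
  </Source>
  <Source Name="Microphone" Type="audio"/>
</Sources>
"""


def list_video_modes(xml_text):
    """Map each video source name to its (width, height, fps) modes."""
    root = ET.fromstring(xml_text)
    modes = {}
    for source in root.findall("Source[@Type='video']"):
        modes[source.get("Name")] = [
            (int(m.get("Width")), int(m.get("Height")), int(m.get("FrameRate")))
            for m in source.findall("MediaType")
        ]
    return modes
```

The same kind of query works from any language with an XML parser, which is the point of exposing the description as a document rather than as a set of typed COM values.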
The CaptureManager functionality is presented in the demo programs available on GitHub in CaptureManager-SDK-Demos:
CPPDemos:
EVRWebCapViewerViaCOMServer is a simple C++ application that demonstrates viewing video sources through the CaptureManager renderer.
OpenGLWebCamViewerViaCOMServer is a simple C++ application that demonstrates viewing video sources through an OpenGL renderer.
TextInjectorDemo is a simple C++ application that demonstrates mixing text into the video stream from a camera.
WaterMarkInjectorDemo is a simple C++ application that demonstrates mixing an image into the video stream from a camera.
EVRVieweingAndRecording is a simple C++ application that demonstrates recording from video and audio sources into one media file.
NativeMediaFoundationPlayer is a simple C++ application that demonstrates playing multiple video files in a common renderer.
CSharpDemos:
WPFMultiSourceRecorder is a simple C# application that demonstrates recording from one, two, or more video and audio sources into one common media file.
WPFMediaFoundationPlayer is a simple C# application that demonstrates playing multiple video files in a common renderer.
WPFVideoAndAudioRecorder is a simple C# application that demonstrates recording from video and audio sources into one media file.
WPFIPCameraMJPEGMultiSourceViewer is a simple C# application that demonstrates capturing video from several IP cameras and playing it in a common renderer.
WPFMultiSourceViewer is a simple C# application that demonstrates capturing video from several sources and playing it in a common renderer.
WPFViewerEVRDisplay is a simple C# application that demonstrates integrating the CaptureManager renderer into a WPF application.
WPFIPCameraMJPEGViewer is a simple C# application that demonstrates capturing video from an IP camera.
WPFImageViewer is a simple C# application that demonstrates capturing an image from a file.
WindowsFormsDemo is a simple C# application that demonstrates viewing and recording video sources.
WPFWebCamSerialShots is a simple C# application that demonstrates the "frame accumulator" functionality.
WPFWebCamShot is a simple C# application that demonstrates capturing a frame from a video source.
WPFRecorder is a simple C# application that demonstrates viewing and recording video sources.
WPFWebViewerEVR is a simple C# application that demonstrates viewing video sources through the CaptureManager renderer.
WPFWebViewerCallback is a simple C# application that demonstrates capturing frames from a video source by copying them from the CaptureManager stream.
WPFWebViewerCall is a simple C# application that demonstrates capturing frames from a video source through direct calls to the CaptureManager methods.
WPFSourceInfoViewer is a simple C# application that demonstrates getting information about the available video and audio sources.
PythonDemos:
CaptureManagerSDKPythonDemo is a simple Python application to demonstrate the functionality of viewing and recording video sources.
QtMinGWDemos:
CaptureManagerSDKQtMinGWDemo is a simple C++ application on Qt that demonstrates viewing and recording video sources.
UnityDemos:
UnityWebCamViewer is a simple application to demonstrate the functionality of working with a video source in Unity3D.