Preparing video for sound design. Which codec to choose

This article reflects the author's personal experience and makes no claim to scientific accuracy. Corrections and additions are welcome. For those who do not want to read further, the short answer to the main question is MJPEG.

Introduction

Clients often prefer archaic ways of handing over material for sound design, and it is not uncommon to receive a 5-minute film compressed to 20 MB as an email attachment. A file meant only for preview becomes the working material, which causes a number of non-obvious problems. The main ones are low image detail (caused by excessive compression) and the use of video codecs that are not suited to audio editing.

Low detail, pixelation and general motion blur hamper the sound engineer from the very start, when the plot and the visual elements that can be given sound are being assessed. This leads to aesthetic mismatches, for example when an object designed as plastic is given the sound of metal or glass.

Poor picture quality also makes it hard to determine where dynamic visual events begin and end, which leads to loose synchronization with the audio. Most often, however, sound lagging behind or running ahead of the picture is caused by the properties of the video codec used during audio editing, which is discussed below.

General information

But first, a few words about what a video file is. In short, it is a container: a metafile that holds several streams of data. Streams can include audio, video, images, subtitles, menus, chapter information, metadata, tags, etc. A container can hold several streams of the same type at once (for example, 2 video tracks, 3 audio tracks, subtitles in several languages), and each of them can be compressed with a different codec. It is worth recalling a few terms here:

mux - packing multiple streams into one container
demux - extracting streams from a container into separate files
remux - replacing one or more streams in a container

All of these operations are lossless, i.e. they do not affect the contents of the streams.
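
As a rough illustration of demuxing, the sketch below builds ffmpeg command lines that copy one stream out of a container without re-encoding it. The helper name demux_command and the file names are made up for this example. The -map and -c copy options are standard ffmpeg options, but whether the chosen output extension can hold the copied codec depends on the file, so treat this as a starting point rather than a recipe.

```python
import shlex


def demux_command(source: str, stream: str, output: str) -> list[str]:
    """Build an ffmpeg call that copies one stream into its own file.

    `stream` is an ffmpeg stream specifier relative to the first input,
    e.g. "v:0" for the first video stream or "a:0" for the first audio stream.
    """
    return [
        "ffmpeg",
        "-i", source,
        "-map", f"0:{stream}",  # pick a single stream from input 0
        "-c", "copy",           # copy the data as-is, no re-encoding
        output,
    ]


# Hypothetical file names, for illustration only.
video_only = demux_command("film.mov", "v:0", "film_video.mov")
audio_only = demux_command("film.mov", "a:0", "film_audio.mov")
print(shlex.join(video_only))
print(shlex.join(audio_only))
```

Because the streams are copied rather than re-encoded, the picture and sound in the resulting files are bit-for-bit the same data that was in the container.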

AVI, FLV, MOV, MP4, MKV, OGG, TS and WebM are not video codecs but containers, so the file extension says almost nothing about the nature of the content. Video codecs are DivX, Xvid, H.264, MPEG, MJPEG, Theora, VP9 and so on, and they come in three types: lossy, lossless and intra-only. It is the codec that determines image quality and suitability for audio editing. Intra-only codecs are discussed below, and the principles of the first two types are well described in this article. In short, the codec divides the video stream into groups of frames (GOP, Group Of Pictures). To reduce file size, only the first frame of each GOP (the i-frame) is stored in full, while the rest (b- and p-frames) contain only information about changes in the picture. As a result, the structure of each GOP looks like this: ibbpbbpbbp. The more heavily the video is compressed, the higher the threshold a change must exceed to be recorded in the b- and p-frames. The longer the GOP, the more problems there will be with scrubbing (stuck frames, etc.), because the decoder has to rebuild each frame starting from the preceding i-frame. Hence the conclusion: for audio editing, lossy and lossless codecs are only conditionally suitable, and only if the video was encoded with a small GOP size.
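
To make the cost of a long GOP more tangible, here is a simplified model, not a description of any particular codec. The names gop_pattern and frames_to_decode are invented for this sketch. It assumes that showing a frame after a jump means decoding everything from the preceding i-frame up to the target, which is an upper bound, since real decoders can skip some b-frames.

```python
def gop_pattern(gop_size: int, b_frames: int = 2) -> str:
    """Return the frame types of one GOP in display order."""
    if gop_size < 1 or b_frames < 0:
        raise ValueError("gop_size must be >= 1 and b_frames >= 0")
    pattern = ["i"]  # every GOP starts with a fully stored frame
    while len(pattern) < gop_size:
        # After each reference frame come `b_frames` b-frames, then one p-frame.
        position_in_group = (len(pattern) - 1) % (b_frames + 1)
        pattern.append("b" if position_in_group < b_frames else "p")
    return "".join(pattern)


def frames_to_decode(target_frame: int, gop_size: int) -> int:
    """Upper bound on frames decoded to display `target_frame` after a seek."""
    if target_frame < 0 or gop_size < 1:
        raise ValueError("target_frame must be >= 0 and gop_size >= 1")
    return target_frame % gop_size + 1


print(gop_pattern(10))              # ibbpbbpbbp
print(frames_to_decode(1234, 250))  # 235 frames for a long GOP
print(frames_to_decode(1234, 12))   # 11 frames for a short GOP
print(frames_to_decode(1234, 1))    # 1 frame when every frame is an i-frame
```

Under this model, jumping to an arbitrary point in a stream with a 250-frame GOP can cost hundreds of decoded frames, while a GOP of 1 always costs exactly one.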

Synchronization

Streams are synchronized through timestamps, which are generated by the codec during encoding and decoding. If an error occurs at that point, the codec skips the affected frames and assigns the timestamp of the problematic packet to the next intact one. As a result, the “broken” stream drifts out of sync with the others. Re-encoding such a file to a lossy or lossless format may make the effect worse.
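
One hedged way to look for this kind of damage is to examine the video frame timestamps, however they were obtained (for example, exported with a tool such as ffprobe), and flag places where the step between neighbouring frames is much larger than one frame duration. The function name find_timestamp_gaps, the constant-frame-rate assumption and the 50% tolerance are all choices made for this sketch.

```python
def find_timestamp_gaps(
    pts_seconds: list[float], fps: float, tolerance: float = 0.5
) -> list[tuple[float, float]]:
    """Return (previous, current) timestamp pairs that are too far apart.

    Assumes a constant frame rate. A step counts as a gap when it exceeds
    one frame duration by more than `tolerance` (0.5 means 50%).
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    frame_duration = 1.0 / fps
    limit = frame_duration * (1.0 + tolerance)
    ordered = sorted(pts_seconds)  # packets may be stored out of display order
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > limit:
            gaps.append((previous, current))
    return gaps


# Made-up timestamps for a 25 fps clip with two frames missing after 0.08 s.
sample = [0.0, 0.04, 0.08, 0.20, 0.24]
print(find_timestamp_gaps(sample, fps=25))
```

A non-empty result does not prove that the sound will drift, but it points to the places in the file that deserve a closer look before work starts.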

Intra-frame

A distinctive feature of intra-frame codecs is that every frame of the stream is a key frame (i-frame). There are no intermediate frames that depend on other frames. One of the most popular codecs of this type is MJPEG (Motion JPEG). It converts a video into a sequence of independently compressed JPEG images.

Pros of MJPEG:

fast conversion
smooth scrubbing
suitable for audio editing

Cons:

file size can be quite large

You can convert any video file to MJPEG with the ffmpeg utility. The command looks something like this:

ffmpeg -i input.avi -c:v mjpeg -q:v 1 -c:a copy output.mov

To avoid typing this command every time, create a script like this (for Windows) and simply drag and drop video files onto it (several at once is fine):

for %%A in (%*) do ffmpeg -i "%%~A" -c:v mjpeg -q:v 1 -c:a copy "%%~nA_mjpeg.mov"

For convenience, a shortcut to this script can be placed in the SendTo folder (in the shortcut properties you will need to clear the “Start in” field).
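
For systems other than Windows, the same idea can be sketched in Python. The function names mjpeg_command and convert_all are made up here, and the ffmpeg options are the ones from the command above. One deliberate difference is that this version writes the result next to the source file instead of into the current directory.

```python
import subprocess
from pathlib import Path


def mjpeg_command(source: str) -> list[str]:
    """Build the ffmpeg call that converts `source` to MJPEG in a MOV container."""
    src = Path(source)
    # Assumption of this sketch: the output goes next to the source file.
    target = src.with_name(src.stem + "_mjpeg.mov")
    return [
        "ffmpeg",
        "-i", str(src),
        "-c:v", "mjpeg",  # intra-only video codec
        "-q:v", "1",      # best quality setting, as in the command above
        "-c:a", "copy",   # keep the original audio untouched
        str(target),
    ]


def convert_all(sources: list[str]) -> None:
    """Convert every file in turn and stop at the first ffmpeg failure."""
    for source in sources:
        subprocess.run(mjpeg_command(source), check=True)
```

Calling convert_all(sys.argv[1:]) from a small wrapper script (after importing sys) would process every path passed on the command line. This requires ffmpeg to be installed and available on the PATH.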

Conclusion

Ask the client in advance to send the video in good quality (lightly compressed lossy, or lossless), then convert whatever video you receive to MJPEG and do the sound work against that file. When the sound is ready, remux the client's video by adding your audio track to the container. Some containers have limitations: for example, MP4 does not support PCM audio streams (WAV / AIFF), so in that case the sound has to be converted to MP3 or ALAC. A detailed compatibility table is available on Wikipedia.
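
The final remux step might look roughly like the sketch below. The function name remux_command and the file names are hypothetical. The rule "ALAC for MP4, plain copy for everything else" is only this example's reading of the limitation described above, and playback support for ALAC inside MP4 varies between players, so check the result on the client's side.

```python
from pathlib import Path


def remux_command(video: str, audio: str, output: str) -> list[str]:
    """Build an ffmpeg call that pairs the client's picture with a new soundtrack."""
    # Assumption of this sketch: MP4 cannot carry PCM, so re-encode to ALAC there;
    # for any other container, pass the audio stream through unchanged.
    audio_codec = "alac" if Path(output).suffix.lower() == ".mp4" else "copy"
    return [
        "ffmpeg",
        "-i", video,          # input 0: the client's original video
        "-i", audio,          # input 1: the finished soundtrack
        "-map", "0:v:0",      # picture from the client's file
        "-map", "1:a:0",      # sound from the new mix
        "-c:v", "copy",       # the picture is not re-encoded
        "-c:a", audio_codec,
        output,
    ]


# Hypothetical file names, for illustration only.
print(remux_command("client.mp4", "final_mix.wav", "client_with_sound.mp4"))
print(remux_command("client.mov", "final_mix.wav", "client_with_sound.mov"))
```

Note that the picture comes from the client's original file, not from the MJPEG working copy, so the delivered video keeps its original quality and size.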

Source: https://habr.com/ru/post/203396/
