Colleagues from Baikal Electronics offered me the chance to work with the Baikal-T1 processor [ L1 ] and write about my impressions. For them, this is a way to tell developers about the capabilities and features of their processor. For me, it is a chance to take a closer look at a system built on a modern processor core and, in the future, to reinvent fewer wheels when adding, for example, new functionality to the MIPSfpga-plus [ L2 ] project. And, of course, there is the usual engineering curiosity...
Today we will discuss MIPS SIMD, the vector extension of the MIPS architecture, which is available in the MIPS Warrior P-class P5600 [ L3 ] cores and is therefore also present in the Baikal-T1 processor. The article is aimed at novice developers.
The development of almost any device (a piece of hardware, software, a hardware-software complex, etc.) involves processing digital and analog signals. The input may include sensor readings, signals from I/O devices, data from a file on disk, etc. The output may be an image on a monitor, sound from speakers, drive control signals, readings of dashboard indicators, etc. And "between" the input and the output lies a set of mathematical operations.
If we briefly list the ways to implement this "mathematics in hardware", we get the following list of tools available to the developer, which can be applied both together and separately:
implementation as an analog circuit
Despite the dominance of digital devices, the world and the human senses remain analog. Whether we like it or not, even when processing information "exclusively" in the digital domain, we still have to put a filter in front of the ADC input. Integrators, differentiators, adders, etc. can likewise be built from analog components. Over the decades when analog electronics dominated, engineers accumulated vast experience in solving all kinds of problems, and a good developer (even of digital electronics) takes this legacy into account [ D1 ];
software implementation on a microcontroller
Are there few signals to process, and is the mathematics simple or undemanding? In that case, a relatively inexpensive microcontroller with its built-in ADC, low operating frequency, and power-saving features is quite an option. If necessary, bottlenecks can be written in assembler;
FPGA implementation
If we have high requirements for speed, parallelism, and scalability, we describe the mathematics as a module in Verilog or VHDL and choose an FPGA that can run at the frequency required for processing. If the solution turns out to be very successful and mass production makes sense, welcome to the world of ASICs [ L3 ];
software and hardware implementation as a system on a chip
Is the system too complicated to describe entirely in Verilog, would you like to program parts of the logic in a high-level language, and, on top of that, manage everything from Linux? In this case, the solution is a system on a chip (SoC): we take a ready-made processor core (Nios II, MIPSfpga, etc.) and surround it with the necessary peripheral modules, one of which will be our own module performing the clever mathematics. Some operations can be exposed as processor instructions [ L4 ]. And yes, in the long run this can also be implemented as an ASIC;
use of a digital signal processor (DSP)
Here we are, in fact, buying a ready-made chip with a processor core, its own set of peripherals, and an instruction set specifically targeted at high-speed digital signal processing. We build our solution around it [ L10 , L5 ];
software implementation on a general purpose processor
Each processor manufacturer offers its own architectural solutions for speeding up certain mathematical operations. The task of the software developer is, when necessary, to use the facilities offered by the manufacturer to speed up the calculations. This is exactly what will be discussed below for MIPS processors.
There are no perfect tools. The best tool is the one that guarantees a solution to the problem in a reasonable time, for which the project team has the necessary skills, and which is either already available or can be acquired at minimal cost. Budget constraints, customer requirements, and sometimes political considerations also influence such decisions.
I hope that this article will be useful as an introduction for readers who need to optimize certain calculations on the Baikal-T1 processor or on another processor based on MIPS core(s) where MIPS SIMD technology is available.
Before proceeding further, consider one of the most common problems of digital signal processing (DSP): filtering. As an example, take a finite impulse response (FIR) filter [ L8 ]. Without going into DSP theory and the underlying mathematics, we note the main thing, the equation that describes this type of digital filter:
where x(n) is the input signal, y(n) is the output signal, P is the filter order, and b_i are the filter coefficients. The same equation can be written as follows:
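For reference, both forms can be sketched in LaTeX as follows. This assumes the standard FIR difference equation and uses only the symbols defined above: first the sum written out term by term, then the same sum in compact notation.

```latex
y(n) = b_0 \, x(n) + b_1 \, x(n-1) + \dots + b_P \, x(n-P)

y(n) = \sum_{i=0}^{P} b_i \, x(n-i)
```

A filter of order P therefore has P + 1 coefficients, and each output sample costs P + 1 multiplications and P additions.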
The nature of the input signal x(n) is not important here. It may be data obtained from an ADC, but it could equally well be read from a file. Since this article is "about calculations" and not "about DSP", we will not dive into the magic of filters but simply use one of the online services to calculate the coefficients [ L9 ]:
Set the desired filter parameters (for example, one of the predefined options: band stop, i.e. a notch filter [ L11 ]) and click the Design Filter button:
The result of the calculation is the frequency response of the filter [ L12 ]:
The set of coefficients and the source code:
    #ifndef SAMPLEFILTER_H_
    #define SAMPLEFILTER_H_

    /*

    FIR filter designed with
    http://t-filter.appspot.com

    sampling frequency: 2000 Hz

    * 0 Hz - 200 Hz
      gain = 1
      desired ripple = 5 dB
      actual ripple = 3.1077303934211127 dB

    * 300 Hz - 500 Hz
      gain = 0
      desired attenuation = -40 dB
      actual attenuation = -42.49314043914754 dB

    * 600 Hz - 1000 Hz
      gain = 1
      desired ripple = 5 dB
      actual ripple = 3.1077303934211127 dB

    */

    #define SAMPLEFILTER_TAP_NUM 25

    typedef struct {
      double history[SAMPLEFILTER_TAP_NUM];
      unsigned int last_index;
    } SampleFilter;

    void SampleFilter_init(SampleFilter* f);
    void SampleFilter_put(SampleFilter* f, double input);
    double SampleFilter_get(SampleFilter* f);

    #endif
    #include "SampleFilter.h"

    static double filter_taps[SAMPLEFILTER_TAP_NUM] = {
      0.037391727827352596,
      -0.03299884552335979,
      0.044230583967321345,
      0.0023050970833628304,
      -0.06768087195950104,
      -0.046347105409124706,
      -0.011717387509232432,
      -0.0707342284185183,
      -0.049766517282999544,
      0.16086413543836361,
      0.21561058688743148,
      -0.10159456907827959,
      0.6638637561392535,
      -0.10159456907827959,
      0.21561058688743148,
      0.16086413543836361,
      -0.049766517282999544,
      -0.0707342284185183,
      -0.011717387509232432,
      -0.046347105409124706,
      -0.06768087195950104,
      0.0023050970833628304,
      0.044230583967321345,
      -0.03299884552335979,
      0.037391727827352596
    };

    void SampleFilter_init(SampleFilter* f) {
      int i;
      for(i = 0; i < SAMPLEFILTER_TAP_NUM; ++i)
        f->history[i] = 0;
      f->last_index = 0;
    }

    void SampleFilter_put(SampleFilter* f, double input) {
      f->history[f->last_index++] = input;
      if(f->last_index == SAMPLEFILTER_TAP_NUM)
        f->last_index = 0;
    }

    double SampleFilter_get(SampleFilter* f) {
      double acc = 0;
      int index = f->last_index, i;
      for(i = 0; i < SAMPLEFILTER_TAP_NUM; ++i) {
        index = index != 0 ? index-1 : SAMPLEFILTER_TAP_NUM-1;
        acc += f->history[index] * filter_taps[i];
      }
      return acc;
    }
Let's look at the filter parameters and the SampleFilter_get function, recall the FIR filter equation above, and note the points most important for us:
Now suppose that, for some objective reasons, the requirements of the task have been refined:
As a result, we get the following frequency response:
What is important for us:
I would not like the reader to focus on the absolute values of the filter parameters given in the example. The main idea I want to convey is that at some point the growth in the computing resources required may go beyond the previously predicted limits. After a small change in the algorithm, its parameters, or the input data, it may suddenly turn out that your processor core does nothing but "shovel" data, and even then cannot keep up, not to mention performing its remaining tasks. Or it copes successfully, but the resulting load or the duration of the calculations no longer meets the system requirements. At that moment, the task of optimization becomes more pressing than ever.
Having seen the problem on a simple example, let's look at its solution. Processor manufacturers resort to a number of tricks to speed up calculations: they raise the clock frequency, increase the number of processor cores, add new instructions, experiment with the pipeline configuration and cache size, and switch to ever faster buses and interfaces.
In addition, no one takes away the developer's right to implement the worst bottlenecks in assembler, experiment with compiler options, or change the algorithm to one that is less resource-intensive or behaves more predictably over the same range of parameters.
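To make the growth of the load tangible, here is a minimal sketch that estimates how many multiply-add operations per second a direct-form FIR filter needs. The function name `fir_macs_per_second` is made up for this illustration; the only assumption is one multiply-add per tap for every input sample, as in SampleFilter_get.

```c
#include <stdint.h>

/* Multiply-add operations per second needed by a direct-form FIR filter:
 * one multiply-add per tap for every incoming sample.
 * (Hypothetical helper, not part of the generated filter code.) */
static uint64_t fir_macs_per_second(uint32_t taps, uint32_t sample_rate_hz)
{
    return (uint64_t)taps * sample_rate_hz;
}
```

For the filter above, `fir_macs_per_second(25, 2000)` gives 50000 operations per second. A purely hypothetical refinement to 250 taps at 48000 Hz already requires 12000000, a 240-fold increase from two seemingly small parameter changes.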
We will concentrate on two ways to increase processing speed, which cannot be used without support from the processor architecture:
Let's return to our filter and carefully look at the code for the SampleFilter_get function:
    double SampleFilter_get(SampleFilter* f) {
      double acc = 0;
      int index = f->last_index, i;
      for(i = 0; i < SAMPLEFILTER_TAP_NUM; ++i) {
        index = index != 0 ? index-1 : SAMPLEFILTER_TAP_NUM-1;
        acc += f->history[index] * filter_taps[i];
      }
      return acc;
    }
And especially at the line:
acc += f->history[index] * filter_taps[i];
Here we see two operations performed one after the other: multiplication by a coefficient and accumulation of the result in an accumulator variable. This combination of multiplication and addition is very common in DSP algorithms. But if these two operations so often appear side by side, why not combine them into a single instruction that executes in one clock cycle? This idea occurred to engineers long ago. Thus appeared the multiply-accumulate (MAC) instruction, which is now present in all digital signal processors:
Let's approach the solution from another side. Why not, instead of working with each x(n) separately, combine several samples into one vector (array) and apply an instruction to the whole vector at once, or more precisely, to every element of the vector simultaneously? In this case, several samples can be processed in a single cycle (not counting memory accesses):
And the larger the maximum vector size, the higher the processing speed. This principle of organizing calculations is called SIMD (single instruction, multiple data) [ L13 ]. It underlies vector processors [ L14 ] and vector extensions to scalar processors: SSE and AVX for the x86 architecture, and MIPS SIMD [ L15 ] for the MIPS architecture.
Now that we understand the basic principles on which vector architecture extensions are built, we can move on to MIPS SIMD itself. An exhaustive description of this extension is given in the documentation [ D2 ]; here we note the main points:
Vector data is processed using 32 128-bit registers, each of which can be treated as a vector of 16 8-bit, 8 16-bit, 4 32-bit, or 2 64-bit elements:
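As a rough illustration, the filter loop can be written as a chain of explicit multiply-accumulate steps. This is a sketch in plain C: the helper names `mac` and `fir_dot` are invented for the example, and whether the compiler maps `mac` to a single hardware instruction depends on the target and the compiler options.

```c
#include <stddef.h>

/* One multiply-accumulate step: returns acc + a * b.
 * (Illustrative helper; on hardware with a MAC instruction the compiler
 * may be able to emit it for this expression.) */
static inline double mac(double acc, double a, double b)
{
    return acc + a * b;
}

/* Dot product of samples and coefficients as a chain of MAC steps. */
static double fir_dot(const double *samples, const double *taps, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc = mac(acc, samples[i], taps[i]);
    return acc;
}
```

Written this way, the cost of one output sample is exactly n MAC steps, which is why a single-cycle MAC instruction pays off so directly in filters.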
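The idea can be tried without any MIPS hardware by using the generic vector extension of GCC and Clang. The sketch below is an assumption-laden illustration, not MIPS SIMD code: the type name `v4f32` and the function `mul_add_f32` are chosen for this example, and the compiler decides which instructions (if any vector ones) are actually generated for the target.

```c
#include <stddef.h>
#include <string.h>

/* Four 32-bit floats packed into one 128-bit vector (GCC/Clang extension). */
typedef float v4f32 __attribute__((vector_size(16)));

/* dst[i] = a[i] * b[i] + c[i], four elements per loop iteration. */
static void mul_add_f32(float *dst, const float *a, const float *b,
                        const float *c, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4f32 va, vb, vc, vr;
        memcpy(&va, a + i, sizeof va);  /* loads that do not assume alignment */
        memcpy(&vb, b + i, sizeof vb);
        memcpy(&vc, c + i, sizeof vc);
        vr = va * vb + vc;              /* one expression, four lanes */
        memcpy(dst + i, &vr, sizeof vr);
    }

    for (; i < n; ++i)                  /* scalar tail for leftover elements */
        dst[i] = a[i] * b[i] + c[i];
}
```

The scalar tail loop is the usual price of vectorization: when the data length is not a multiple of the vector size, the remainder has to be handled separately.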
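One way to picture these views is a C union in which the same 128 bits are accessible as elements of different widths. This is only a mental model: the names `vreg128` and `lanes` are invented here, and the member names simply echo the byte, halfword, word, and doubleword data formats.

```c
#include <stdint.h>

/* A 128-bit vector register modelled as overlapping element views. */
typedef union {
    int8_t  b[16];  /* 16 x 8-bit  (byte)       */
    int16_t h[8];   /*  8 x 16-bit (halfword)   */
    int32_t w[4];   /*  4 x 32-bit (word)       */
    int64_t d[2];   /*  2 x 64-bit (doubleword) */
} vreg128;

/* Number of elements of the given width that fit in one register. */
static unsigned lanes(unsigned element_bits)
{
    return 128u / element_bits;
}
```

The trade-off is visible immediately: halving the element width doubles the number of elements processed by one instruction, so choosing the narrowest data type that still holds the values is the first step of any vector optimization.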
Mnemonic | Instruction Description |
---|---|
ADDV, ADDVI | Add |
ADD_A, ADDS_A | Add and Saturated Add Absolute Values |
ADDS_S, ADDS_U | Signed and Unsigned Saturated Add |
HADD_S, HADD_U | Signed and Unsigned Horizontal Add |
ASUB_S, ASUB_U | Absolute Value of Signed and Unsigned Subtract |
AVE_S, AVE_U | Signed and Unsigned Average |
AVER_S, AVER_U | Signed and Unsigned Average with Rounding |
DOTP_S, DOTP_U | Signed and Unsigned Dot Product |
DPADD_S, DPADD_U | Signed and Unsigned Dot Product Add |
DPSUB_S, DPSUB_U | Signed and Unsigned Dot Product Subtract |
DIV_S, DIV_U | Divide |
MADDV | Multiply-add |
MAX_A, MIN_A | Maximum and Minimum of Absolute Values |
MAX_S, MAXI_S, MAX_U, MAXI_U | Signed and Unsigned Maximum |
MIN_S, MINI_S, MIN_U, MINI_U | Signed and Unsigned Minimum |
MSUBV | Multiply-subtract |
MULV | Multiply |
MOD_S, MOD_U | Signed and Unsigned Remainder (Modulo) |
SAT_S, SAT_U | Signed and Unsigned Saturate |
SUBS_S, SUBS_U | Signed and Unsigned Saturated Subtract |
HSUB_S, HSUB_U | Signed and Unsigned Horizontal Subtract |
SUBSUU_S | Signed Saturated Unsigned Subtract |
SUBSUS_U | Unsigned Saturated Signed Subtract from Unsigned |
SUBV, SUBVI | Subtract |
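The difference between the plain add (ADDV) and the signed saturated add (ADDS_S) from the table above is easiest to see on a single 8-bit element. The following is a scalar sketch of the two behaviours, with function names invented for the illustration; the real instructions apply the same rule to every element of the vector at once.

```c
#include <stdint.h>

/* Modular (wrapping) add of one signed 8-bit element, ADDV-style.
 * The final conversion to int8_t is implementation-defined in C;
 * GCC and Clang wrap modulo 256, which is what this sketch assumes. */
static int8_t add_wrap_s8(int8_t a, int8_t b)
{
    return (int8_t)(uint8_t)((uint8_t)a + (uint8_t)b);
}

/* Saturating add of one signed 8-bit element, ADDS_S-style:
 * the result is clamped to the representable range instead of wrapping. */
static int8_t add_sat_s8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;

    if (sum > INT8_MAX)
        return INT8_MAX;
    if (sum < INT8_MIN)
        return INT8_MIN;
    return (int8_t)sum;
}
```

With a = b = 100 the wrapping version yields -56, while the saturating one yields 127. For pixel or audio data the second result is usually the desired one, which is why saturating variants are part of the instruction set.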
Mnemonic | Instruction Description |
---|---|
AND, ANDI | Logical and |
BCLR, BCLRI | Bit clear |
BINSL, BINSLI, BINSR, BINSRI | Bit Insert Left and Right |
BMNZ, BMNZI | Bit Move If Not Zero |
BMZ, BMZI | Bit Move If Zero |
BNEG, BNEGI | Bit negate |
BSEL, BSELI | Bit select |
BSET, BSETI | Bit set |
NLOC | Leading One Bits Count |
NLZC | Leading Zero Bits Count |
NOR, NORI | Logical Negated Or |
PCNT | Population (Bits Set to 1) Count |
OR, ORI | Logical or |
SLL, SLLI | Shift left |
SRA, SRAI | Shift Right Arithmetic |
SRAR, SRARI | Rounding Shift Right Arithmetic |
SRL, SRLI | Shift right logical |
SRLR, SRLRI | Rounding Shift Right Logical |
XOR, XORI | Logical Exclusive Or |
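The "rounding" shifts in the table above differ from the plain ones in how the discarded bits are treated. Below is a scalar sketch for one 16-bit element, assuming the usual definition of a rounding shift (the most significant discarded bit is added to the result). The function names are invented, and the right shift of a negative value is assumed to be arithmetic, as it is with GCC and Clang.

```c
#include <stdint.h>

/* Plain arithmetic shift right of one 16-bit element (SRA-style). */
static int16_t sra_s16(int16_t x, unsigned n)
{
    return (int16_t)(x >> n);
}

/* Rounding arithmetic shift right (SRAR-style): the most significant
 * discarded bit is added to the shifted value. */
static int16_t srar_s16(int16_t x, unsigned n)
{
    if (n == 0)
        return x;
    return (int16_t)((x >> n) + ((x >> (n - 1)) & 1));
}
```

As long as no overflow occurs, `srar_s16(x, n)` gives the same result as the ROUND_POWER_OF_TWO macro used in the scalar filter example later in the article, so a rounding shift replaces an add and a shift with one operation.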
Mnemonic | Instruction Description |
---|---|
FADD | Floating-point addition |
FDIV | Floating-point division |
FEXP2 | Floating point base 2 exponentiation |
FLOG2 | Floating point base 2 logarithm |
FMADD, FMSUB | Floating-Point Fused Multiply-Add and Multiply-Subtract |
FMAX, FMIN | Floating-Point Maximum and Minimum |
FMAX_A, FMIN_A | Floating-Point Maximum and Minimum of Absolute Values |
FMUL | Floating-point multiplication |
FRCP | Approximate Floating-Point Reciprocal |
FRINT | Floating point round to integer |
FRSQRT | Approximate Floating-Point Reciprocal of Square Root |
FSQRT | Floating point square root |
FSUB | Floating point subtraction |
Mnemonic | Instruction Description |
---|---|
FCLASS | Floating point class mask |
Mnemonic | Instruction Description |
---|---|
FCAF | Floating-Point Quiet Compare Always False |
FCUN | Floating-Point Quiet Compare Unordered |
FCOR | Floating-Point Quiet Compare Ordered |
FCEQ | Floating-Point Quiet Compare Equal |
FCUNE | Floating-Point Quiet Compare Unordered or Not Equal |
FCUEQ | Floating-Point Quiet Compare Unordered or Equal |
FCNE | Floating-Point Quiet Compare Not Equal |
FCLT | Floating Point Quiet Compare Less Than |
FCULT | Floating-Point Quiet Compare Unordered or Less Than |
FCLE | Floating-Point Quiet Compare Less Than or Equal |
FCULE | Floating-Point Quiet Compare Unordered or Less Than or Equal |
FSAF | Floating-Point Signaling Compare Always False |
FSUN | Floating-Point Signaling Compare Unordered |
FSOR | Floating-Point Signaling Compare Ordered |
FSEQ | Floating-Point Signaling Compare Equal |
FSUNE | Floating-Point Signaling Compare Unordered or Not Equal |
FSUEQ | Floating-Point Signaling Compare Unordered or Equal |
FSNE | Floating-Point Signaling Compare Not Equal |
FSLT | Floating-Point Signaling Compare Less Than |
FSULT | Floating-Point Signaling Compare Unordered or Less Than |
FSLE | Floating-Point Signaling Compare Less Than or Equal |
FSULE | Floating-Point Signaling Compare Unordered or Less Than or Equal |
Mnemonic | Instruction Description |
---|---|
FEXDO | Floating Point Down-Convert Interchange Format |
FEXUPL, FEXUPR | Left-Half and Right-Half Floating-Point Up-Convert Interchange Format |
FFINT_S, FFINT_U | Floating-Point Convert from Signed and Unsigned Integer |
FFQL, FFQR | Left-Half and Right-Half Floating-Point Convert from Fixed-Point |
FTINT_S, FTINT_U | Floating-Point Round and Convert to Signed and Unsigned Integer |
FTRUNC_S, FTRUNC_U | Truncate and Convert to Signed and Unsigned Integer |
FTQ | Floating Point Round and Convert to Fixed Point |
Mnemonic | Instruction Description |
---|---|
MADD_Q, MADDR_Q | Fixed-Point Multiply and Add without and with Rounding |
MSUB_Q, MSUBR_Q | Fixed-Point Multiply and Subtract without and with Rounding |
MUL_Q, MULR_Q | Fixed-Point Multiply without and with Rounding |
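For readers who have not met fixed-point arithmetic, here is a scalar sketch of what "multiply without and with rounding" means for 16-bit Q15 numbers (values in the range [-1, 1) stored as integers scaled by 2^15). It is a general model of Q15 multiplication with invented function names, not a bit-exact description of the MUL_Q and MULR_Q instructions; the right shift of a negative value is assumed to be arithmetic.

```c
#include <stdint.h>

/* Multiply two Q15 values; the product is truncated back to Q15. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    /* The only product that does not fit is (-1) * (-1) = +1: clamp it. */
    if (a == INT16_MIN && b == INT16_MIN)
        return INT16_MAX;
    return (int16_t)(((int32_t)a * b) >> 15);
}

/* The same with rounding: half of the discarded weight is added
 * before the shift. */
static int16_t q15_mulr(int16_t a, int16_t b)
{
    if (a == INT16_MIN && b == INT16_MIN)
        return INT16_MAX;
    return (int16_t)(((int32_t)a * b + (1 << 14)) >> 15);
}
```

For example, 16384 represents 0.5, and `q15_mul(16384, 16384)` returns 8192, which represents 0.25. Fixed-point formats like this keep filter coefficients such as the ones above in 16 bits, so eight of them fit into one 128-bit register.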
Mnemonic | Instruction Description |
---|---|
BNZ | Branch If Not Zero |
BZ | Branch If Zero |
CEQ, CEQI | Compare Equal |
CLE_S, CLEI_S, CLE_U, CLEI_U | Compare Less-Than-or-Equal Signed and Unsigned |
CLT_S, CLTI_S, CLT_U, CLTI_U | Compare Less-Than Signed and Unsigned |
Mnemonic | Instruction Description |
---|---|
CFCMSA, CTCMSA | Copy from and Copy to MSA Control Register |
LD | Load Vector |
LDI | Load Immediate |
MOVE | Vector to Vector Move |
SPLAT, SPLATI | Replicate Vector Element |
FILL | Fill Vector from GPR |
INSERT, INSVE | Insert GPR and Vector Element 0 to Vector Element |
COPY_S, COPY_U | Copy element to GPR Signed and Unsigned |
ST | Store Vector |
Mnemonic | Instruction Description |
---|---|
ILVEV, ILVOD | Interleave Even, Odd |
ILVL, ILVR | Interleave Left, Right |
PCKEV, PCKOD | Pack Even and Odd Elements |
SHF | Set Shuffle |
SLD, SLDI | Element Slide |
VSHF | Vector shuffle |
Mnemonic | Instruction Description |
---|---|
LSA | Left-shift add or load/store address calculation |
MIPS SIMD is supported by the gcc compiler; however, this support has its own peculiarities:
    #define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

    static inline unsigned char clip_pixel(int i32Val)
    {
        return ((i32Val) > 255) ? 255u : ((i32Val) < 0) ? 0u : (i32Val);
    }

    void vert_filter_8taps_16width_c(unsigned char *pSrc, // SOURCE POINTER
                                     int SrcStride,       // SOURCE BUFFER PITCH
                                     unsigned char *pDst, // DEST POINTER
                                     int DstStride,       // DEST BUFFER PITCH
                                     char *pFilter,       // POINTER TO FILTER BANK
                                     int Height)          // HEIGHT OF THE BLOCK
    {
        unsigned int Row, Col;
        int FiltSum;
        short Src0, Src1, Src2, Src3, Src4, Src5, Src6, Src7;

        pSrc -= (8 / 2 - 1) * SrcStride; // MOVE INPUT SRC POINTER TO APPROPRIATE POSITION

        // LOOP FOR NUMBER OF COLUMNS-16
        for (Col = 0; Col < 16; ++Col)
        {
            Src0 = pSrc[0 * SrcStride];
            Src1 = pSrc[1 * SrcStride];
            Src2 = pSrc[2 * SrcStride];
            Src3 = pSrc[3 * SrcStride];
            Src4 = pSrc[4 * SrcStride];
            Src5 = pSrc[5 * SrcStride];
            Src6 = pSrc[6 * SrcStride];

            // LOOP FOR NUMBER OF ROWS
            for (Row = 0; Row < Height; Row++)
            {
                Src7 = pSrc[(7 + Row) * SrcStride];

                FiltSum = 0;
                // ACCUMULATED FILTER SUM += PIXEL * FILTER COEFF
                // (the filter bank is the pFilter parameter)
                FiltSum += (Src0 * pFilter[0]);
                FiltSum += (Src1 * pFilter[1]);
                FiltSum += (Src2 * pFilter[2]);
                FiltSum += (Src3 * pFilter[3]);
                FiltSum += (Src4 * pFilter[4]);
                FiltSum += (Src5 * pFilter[5]);
                FiltSum += (Src6 * pFilter[6]);
                FiltSum += (Src7 * pFilter[7]);

                FiltSum = ROUND_POWER_OF_TWO(FiltSum, 7);   // ROUNDING
                pDst[Row * DstStride] = clip_pixel(FiltSum); // CLIP RESULT IN 0-255 (UNSIGNED CHAR)

                // PREPARING FOR NEXT CONVOLUTION - SLIDING WINDOW
                Src0 = Src1;
                Src1 = Src2;
                Src2 = Src3;
                Src3 = Src4;
                Src4 = Src5;
                Src5 = Src6;
                Src6 = Src7;
            }

            pSrc += 1;
            pDst += 1;
        }
    }
    /* MSA VECTOR TYPES */
    #define WRLEN 128              // VECTOR REGISTER LENGTH 128-BIT
    #define NUMWRELEM (WRLEN >> 3)

    typedef signed char IMG_VINT8 __attribute__ ((vector_size(NUMWRELEM)));    // VEC SIGNED BYTES
    typedef unsigned char IMG_VUINT8 __attribute__ ((vector_size(NUMWRELEM))); // VEC UNSIGNED BYTES
    typedef short IMG_VINT16 __attribute__ ((vector_size(NUMWRELEM)));         // VEC SIGNED HALF-WORD

    /* NOTE: the implicit conversions between different vector types below
       may require the -flax-vector-conversions compiler option */
    #define LOAD_UNPACK_VEC(pSrc, SrcStride, vi16VecRight, vi16VecLeft) \
    { \
        IMG_VUINT8 vu8Src; \
        IMG_VINT16 vi16Vec0; \
        IMG_VINT8 vi8Tmp0; \
        /* LOAD INPUT VECTOR */ \
        vu8Src = *((IMG_VINT8 *)(pSrc)); \
        /* RANGE WARPING TO MAINTAIN 16 BIT PRECISION */ \
        vi16Vec0 = __builtin_msa_xori_b(vu8Src, 128); \
        /* CALCULATE SIGN EXTENSION */ \
        vi8Tmp0 = __builtin_msa_clti_s_b(vi16Vec0, 0); \
        /* INTERLEAVE RIGHT TO 16 BIT VEC */ \
        vi16VecRight = __builtin_msa_ilvr_b(vi8Tmp0, vi16Vec0); \
        /* INTERLEAVE LEFT TO 16 BIT VEC */ \
        vi16VecLeft = __builtin_msa_ilvl_b(vi8Tmp0, vi16Vec0); \
        pSrc += SrcStride; \
    }

    void vert_filter_8taps_16width_msa(unsigned char *pSrc, // SOURCE POINTER
                                       int SrcStride,       // SOURCE BUFFER PITCH
                                       unsigned char *pDst, // DEST POINTER
                                       int DstStride,       // DEST BUFFER PITCH
                                       char *pFilter,       // POINTER TO FILTER BANK
                                       int Height)          // HEIGHT OF THE BLOCK
    {
        int u32LoopCnt;
        IMG_VINT16 vi16Vec0Right, vi16Vec1Right, vi16Vec2Right, vi16Vec3Right;
        IMG_VINT16 vi16Vec4Right, vi16Vec5Right, vi16Vec6Right, vi16Vec7Right;
        IMG_VINT16 vi16Vec0Left, vi16Vec1Left, vi16Vec2Left, vi16Vec3Left;
        IMG_VINT16 vi16Vec4Left, vi16Vec5Left, vi16Vec6Left, vi16Vec7Left;
        IMG_VINT16 vi16Temp1Right, vi16Temp1Left;
        IMG_VINT16 vi16Filt0, vi16Filt1, vi16Filt2, vi16Filt3;
        IMG_VINT16 vi16Filt4, vi16Filt5, vi16Filt6, vi16Filt7;

        pSrc -= (3 * SrcStride);

        // PREPARE FILTER COEFF IN VEC REGISTERS
        vi16Filt0 = __builtin_msa_fill_h(*(pFilter));
        vi16Filt1 = __builtin_msa_fill_h(*(pFilter + 1));
        vi16Filt2 = __builtin_msa_fill_h(*(pFilter + 2));
        vi16Filt3 = __builtin_msa_fill_h(*(pFilter + 3));
        vi16Filt4 = __builtin_msa_fill_h(*(pFilter + 4));
        vi16Filt5 = __builtin_msa_fill_h(*(pFilter + 5));
        vi16Filt6 = __builtin_msa_fill_h(*(pFilter + 6));
        vi16Filt7 = __builtin_msa_fill_h(*(pFilter + 7));

        // LOAD 7 INPUT VECTORS
        LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec0Right, vi16Vec0Left)
        LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec1Right, vi16Vec1Left)
        LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec2Right, vi16Vec2Left)
        LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec3Right, vi16Vec3Left)
        LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec4Right, vi16Vec4Left)
        LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec5Right, vi16Vec5Left)
        LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec6Right, vi16Vec6Left)

        // START CONVOLUTION VERTICALLY
        for (u32LoopCnt = Height; u32LoopCnt--; )
        {
            // LOAD 8TH INPUT VECTOR
            LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec7Right, vi16Vec7Left)

            /* FILTER CALC */
            IMG_VINT16 vi16Tmp1, vi16Tmp2;
            IMG_VINT8 vi8Tmp3;

            // 8 TAP VECTORIZED CONVOLUTION FOR RIGHT HALF
            vi16Tmp1 = (vi16Vec0Right * vi16Filt0);
            vi16Tmp1 += (vi16Vec1Right * vi16Filt1);
            vi16Tmp1 += (vi16Vec2Right * vi16Filt2);
            vi16Tmp1 += (vi16Vec3Right * vi16Filt3);
            vi16Tmp2 = (vi16Vec4Right * vi16Filt4);
            vi16Tmp2 += (vi16Vec5Right * vi16Filt5);
            vi16Tmp2 += (vi16Vec6Right * vi16Filt6);
            vi16Tmp2 += (vi16Vec7Right * vi16Filt7);
            vi16Temp1Right = __builtin_msa_adds_s_h(vi16Tmp1, vi16Tmp2);

            // 8 TAP VECTORIZED CONVOLUTION FOR LEFT HALF
            vi16Tmp1 = (vi16Vec0Left * vi16Filt0);
            vi16Tmp1 += (vi16Vec1Left * vi16Filt1);
            vi16Tmp1 += (vi16Vec2Left * vi16Filt2);
            vi16Tmp1 += (vi16Vec3Left * vi16Filt3);
            vi16Tmp2 = (vi16Vec4Left * vi16Filt4);
            vi16Tmp2 += (vi16Vec5Left * vi16Filt5);
            vi16Tmp2 += (vi16Vec6Left * vi16Filt6);
            vi16Tmp2 += (vi16Vec7Left * vi16Filt7);
            vi16Temp1Left = __builtin_msa_adds_s_h(vi16Tmp1, vi16Tmp2);

            // ROUNDING RIGHT SHIFT, RANGE CLIPPING AND NARROWING
            vi16Temp1Right = __builtin_msa_srari_h(vi16Temp1Right, 7);
            vi16Temp1Right = __builtin_msa_sat_s_h(vi16Temp1Right, 7);
            vi16Temp1Left = __builtin_msa_srari_h(vi16Temp1Left, 7);
            vi16Temp1Left = __builtin_msa_sat_s_h(vi16Temp1Left, 7);
            vi8Tmp3 = __builtin_msa_pckev_b(vi16Temp1Left, vi16Temp1Right);
            vi8Tmp3 = __builtin_msa_xori_b(vi8Tmp3, 128);

            // STORE OUTPUT VEC
            *((IMG_VINT8 *)(pDst)) = (vi8Tmp3);
            pDst += DstStride;

            // PREPARING FOR NEXT CONVOLUTION - SLIDING WINDOW
            vi16Vec0Right = vi16Vec1Right;
            vi16Vec1Right = vi16Vec2Right;
            vi16Vec2Right = vi16Vec3Right;
            vi16Vec3Right = vi16Vec4Right;
            vi16Vec4Right = vi16Vec5Right;
            vi16Vec5Right = vi16Vec6Right;
            vi16Vec6Right = vi16Vec7Right;

            vi16Vec0Left = vi16Vec1Left;
            vi16Vec1Left = vi16Vec2Left;
            vi16Vec2Left = vi16Vec3Left;
            vi16Vec3Left = vi16Vec4Left;
            vi16Vec4Left = vi16Vec5Left;
            vi16Vec5Left = vi16Vec6Left;
            vi16Vec6Left = vi16Vec7Left;
        }
    }
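A much smaller sketch may help when taking the first steps with these built-ins. It adds four 32-bit integers either with MSA intrinsics or with a portable fallback, so the same file builds on any machine. The function name `add4_i32` is invented for the example. The MSA branch assumes a GCC-compatible compiler invoked with -mmsa (which defines `__mips_msa`) and the `msa.h` header with its `__msa_ld_w`, `__msa_addv_w`, and `__msa_st_w` wrappers, as described in the GCC documentation.

```c
#include <stdint.h>

#if defined(__mips_msa)
#include <msa.h>   /* MSA vector types and __msa_* intrinsics */
#endif

/* dst[i] = a[i] + b[i] for exactly four 32-bit integers. */
static void add4_i32(int32_t *dst, const int32_t *a, const int32_t *b)
{
#if defined(__mips_msa)
    /* Assumed MSA path: load both operands, add all four lanes with a
     * single instruction, store the result. */
    v4i32 va = __msa_ld_w((void *)a, 0);
    v4i32 vb = __msa_ld_w((void *)b, 0);
    __msa_st_w(__msa_addv_w(va, vb), dst, 0);
#else
    /* Portable fallback used when MSA is not enabled. */
    for (int i = 0; i < 4; ++i)
        dst[i] = a[i] + b[i];
#endif
}
```

Keeping a scalar fallback next to the vector path has a second benefit besides portability: it serves as a reference against which the vectorized results can be checked.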
Initially, I thought of writing a very simple application (a synthetic test) to assess the performance gain from MIPS SIMD. Attractive as it is, such a test is not representative, because it is detached from users' real tasks. Fortunately, engineers from Imagination Technologies and MIPS have made a significant contribution to ffmpeg [ L18 ], a widely used open source application for converting audio and video [ L19 ]. I believe that they know better than anyone how to use the technology in question properly, which means this code should be as efficient as possible.
Thus, if we build ffmpeg in two variants, with and without MIPS SIMD support, we can compare their speed on the same input data and, based on the results, draw some conclusions about the effectiveness of vector calculations.
The build runs on an x86 machine under Linux. The development tools from Imagination Technologies [ L20 ] are used in cross-compilation mode. The tests use the latest stable release at the time of writing, ffmpeg 3.3 [ L19 ].
The ffmpeg configuration for the version with MIPS SIMD support:
./configure --enable-cross-compile --prefix=../ffmpeg-msa --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags="-EL -static" --extra-ldflags="-EL -static" --disable-iconv
And with MIPS SIMD support disabled:
./configure --enable-cross-compile --prefix=../ffmpeg-soft --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags="-EL -static" --extra-ldflags="-EL -static" --disable-iconv --disable-msa
Parameter | Description |
---|---|
--enable-cross-compile | the build is performed on a machine with a different architecture |
--prefix=../ffmpeg-msa | the directory in which files will be placed after the make install command |
--cross-prefix=../mips-mti-linux-gnu- | toolchain path |
--arch=mips | target architecture: MIPS |
--cpu=p5600 | target processor core: p5600 |
--target-os=linux | target OS: Linux |
--extra-cflags="-EL -static" | target system is little endian, use static linking |
--extra-ldflags="-EL -static" | likewise |
--disable-iconv | disable text encoding conversion functionality |
--disable-msa | do not use MIPS SIMD |
If you plan to repeat these steps, note that the ffmpeg 3.3 build with MIPS SIMD support fails with a minor error. To fix it, add the following line to the libavcodec/mips/hevcpred_msa.c file:
#include "libavcodec/hevcdec.h"
The tests are executed on the Baikal-T1 processor:
    # uname -a
    Linux baikal-BFK-18446744073709551615 4.4.41-bfk #0 SMP Tue Apr 25 15:54:24 MSK 2017 mips GNU/Linux
The input data are two videos encoded with x264 [ L21 ] and x265 [ L22 ]. The test task is to decode the video, saving screenshots at regular intervals:
    ./ffmpeg-mips/ffmpeg-msa/bin/ffmpeg -i ./The\ Simpsons\ Movie\ -\ Trailer_x264.mp4 -vf fps=1/10 ./out_img/ffmpeg-msa_x264_%d.jpg -report -benchmark
    ./ffmpeg-mips/ffmpeg-msa/bin/ffmpeg -i ./Tears_400_x265.mp4 -vf fps=1 ./out_img/ffmpeg-msa_x265_%d.jpg -report -benchmark
    ./ffmpeg-mips/ffmpeg-soft/bin/ffmpeg -i ./The\ Simpsons\ Movie\ -\ Trailer_x264.mp4 -vf fps=1/10 ./out_img/ffmpeg-soft_x264_%d.jpg -report -benchmark
    ./ffmpeg-mips/ffmpeg-soft/bin/ffmpeg -i ./Tears_400_x265.mp4 -vf fps=1 ./out_img/ffmpeg-soft_x265_%d.jpg -report -benchmark
Parameter | Description |
---|---|
-i ./Tears_400_x265.mp4 | file to be processed |
-vf fps=1 | how often screenshots are taken (every 1 sec for the short video, every 10 sec for the long one) |
./out_img/ffmpeg-soft_x264_%d.jpg | output file name pattern |
-report | generate a report on the results of the run |
-benchmark | include performance data |
Scenario | Duration (sec) |
---|---|
x264 decoding with MIPS SIMD support | 113 |
x265 decoding with MIPS SIMD support | 22 |
x264 decoding without MIPS SIMD support | 164 |
x265 decoding without MIPS SIMD support | 52 |
Thus, on this task the use of MIPS SIMD gives a speedup of 1.5 to 2.4 times.
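The speedup figures follow directly from the measured durations. As a small sketch (the function name `speedup` is invented, and it is assumed that the 164 sec and 52 sec rows of the table are the runs without MIPS SIMD):

```c
/* Speedup of the optimized build relative to the baseline build:
 * how many times shorter the optimized run is. */
static double speedup(double baseline_sec, double optimized_sec)
{
    return baseline_sec / optimized_sec;
}
```

Here `speedup(164, 113)` is about 1.45 for x264 and `speedup(52, 22)` is about 2.36 for x265, which matches the stated range once rounded to one decimal place.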
The ffmpeg run logs are available on github [ L23 ].
ffmpeg started on 2010-10-18 at 00:30:26 Report written to "ffmpeg-20101018-003026.log" Command line: ./ffmpeg-mips/ffmpeg-msa/bin/ffmpeg -i "./The Simpsons Movie - Trailer_x264.mp4" -vf "fps=1/10" "./out_img/ffmpeg-msa_x264_%d.jpg" -report -benchmark ffmpeg version 3.3 Copyright (c) 2000-2017 the FFmpeg developers built with gcc 4.9.2 (Codescape GNU Tools 2016.05-03 for MIPS MTI Linux) configuration: --enable-cross-compile --prefix=../ffmpeg-msa --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags='-EL -static' --extra-ldflags='-EL -static' --disable-iconv libavutil 55. 58.100 / 55. 58.100 libavcodec 57. 89.100 / 57. 89.100 libavformat 57. 71.100 / 57. 71.100 libavdevice 57. 6.100 / 57. 6.100 libavfilter 6. 82.100 / 6. 82.100 libswscale 4. 6.100 / 4. 6.100 libswresample 2. 7.100 / 2. 7.100 Splitting the commandline. Reading option '-i' ... matched as input url with argument './The Simpsons Movie - Trailer_x264.mp4'. Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'fps=1/10'. Reading option './out_img/ffmpeg-msa_x264_%d.jpg' ... matched as output url. Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'. Reading option '-benchmark' ... matched as option 'benchmark' (add timings for benchmarking) with argument '1'. Finished splitting the commandline. Parsing a group of options: global . Applying option report (generate a report) with argument 1. Applying option benchmark (add timings for benchmarking) with argument 1. Successfully parsed a group of options. Parsing a group of options: input url ./The Simpsons Movie - Trailer_x264.mp4. Successfully parsed a group of options. Opening an input file: ./The Simpsons Movie - Trailer_x264.mp4. 
[file @ 0x1fce0e0] Setting default whitelist 'file,crypto'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] ISO: File Type Major Brand: isom
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Before avformat_find_stream_info() pos: 73516232 bytes read:65587 seeks:1 nb_streams:2
[h264 @ 0x1fcecb0] nal_unit_type: 7, nal_ref_idc: 3
[h264 @ 0x1fcecb0] nal_unit_type: 8, nal_ref_idc: 3
[h264 @ 0x1fcecb0] nal_unit_type: 6, nal_ref_idc: 0
[h264 @ 0x1fcecb0] nal_unit_type: 5, nal_ref_idc: 3
[h264 @ 0x1fcecb0] user data:"x264 - core 54 svn-620M - H.264/MPEG-4 AVC codec - Copyleft 2005 - http://www.videolan.org/x264.html - options: cabac=1 ref=5 deblock=1:0:0 analyse=0x1:0x131 me=umh subme=6 brdo=1 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=1 nr=0 decimate=1 mbaff=0 bframes=1 b_pyramid=0 b_adapt=1 b_bias=0 direct=3 wpredb=0 bime=0 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=4214 ratetol=1.0 rceq='blurCplx^(1-qComp)' qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30"
[h264 @ 0x1fcecb0] Reinit context to 1280x544, pix_fmt: yuv420p
[h264 @ 0x1fcecb0] no picture
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] All info found
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] After avformat_find_stream_info() pos: 94845 bytes read:141348 seeks:2 frames:13
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './The Simpsons Movie - Trailer_x264.mp4':
  Metadata:
    major_brand : isom
    minor_version : 1
    compatible_brands: isomavc1
    creation_time : 2007-02-19T05:03:04.000000Z
  Duration: 00:02:17.30, start: 0.000000, bitrate: 4283 kb/s
    Stream #0:0(und), 12, 1/24000: Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x544, 4221 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      creation_time : 2007-02-19T05:03:04.000000Z
      handler_name : GPAC ISO Video Handler
    Stream #0:1(und), 1, 1/48000: Audio: aac (HE-AAC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 64 kb/s (default)
    Metadata:
      creation_time : 2007-02-19T05:03:08.000000Z
      handler_name : GPAC ISO Audio Handler
Successfully opened the file.
Parsing a group of options: output url ./out_img/ffmpeg-msa_x264_%d.jpg.
Applying option vf (set video filters) with argument fps=1/10.
Successfully parsed a group of options.
Opening an output file: ./out_img/ffmpeg-msa_x264_%d.jpg.
Successfully opened the file.
detected 2 logical cores
[h264 @ 0x20191e0] nal_unit_type: 7, nal_ref_idc: 3
[h264 @ 0x20191e0] nal_unit_type: 8, nal_ref_idc: 3
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[h264 @ 0x20191e0] nal_unit_type: 6, nal_ref_idc: 0
[h264 @ 0x20191e0] nal_unit_type: 5, nal_ref_idc: 3
[h264 @ 0x20191e0] user data:"x264 - core 54 svn-620M - H.264/MPEG-4 AVC codec - Copyleft 2005 - http://www.videolan.org/x264.html - options: cabac=1 ref=5 deblock=1:0:0 analyse=0x1:0x131 me=umh subme=6 brdo=1 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=1 nr=0 decimate=1 mbaff=0 bframes=1 b_pyramid=0 b_adapt=1 b_bias=0 direct=3 wpredb=0 bime=0 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=4214 ratetol=1.0 rceq='blurCplx^(1-qComp)' qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30"
[h264 @ 0x20191e0] Reinit context to 1280x544, pix_fmt: yuv420p
[h264 @ 0x20191e0] no picture
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[h264 @ 0x2026050] nal_unit_type: 1, nal_ref_idc: 2
[h264 @ 0x2060f00] nal_unit_type: 1, nal_ref_idc: 0
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[h264 @ 0x20191e0] nal_unit_type: 1, nal_ref_idc: 2
[Parsed_fps_0 @ 0x202bf50] Setting 'fps' to value '1/10'
[Parsed_fps_0 @ 0x202bf50] fps=1/10
[graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'video_size' to value '1280x544'
[graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'pix_fmt' to value '0'
[graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'time_base' to value '1/24000'
[graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'pixel_aspect' to value '0/1'
[graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'frame_rate' to value '24000/1001'
[graph 0 input from stream 0:0 @ 0x202c6f0] w:1280 h:544 pixfmt:yuv420p tb:1/24000 fr:24000/1001 sar:0/1 sws_param:flags=2
[format @ 0x202c030] compat: called with args=[yuvj420p|yuvj422p|yuvj444p]
[format @ 0x202c030] Setting 'pix_fmts' to value 'yuvj420p|yuvj422p|yuvj444p'
[auto_scaler_0 @ 0x202c660] Setting 'flags' to value 'bicubic'
[auto_scaler_0 @ 0x202c660] w:iw h:ih flags:'bicubic' interl:0
[format @ 0x202c030] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_fps_0' and the filter 'format'
[AVFilterGraph @ 0x202bb80] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed
[auto_scaler_0 @ 0x202c660] picking yuvj420p out of 3 ref:yuv420p alpha:0
[swscaler @ 0x202cf00] deprecated pixel format used, make sure you did set range correctly
[auto_scaler_0 @ 0x202c660] w:1280 h:544 fmt:yuv420p sar:0/1 -> w:1280 h:544 fmt:yuvj420p sar:0/1 flags:0x4
[mjpeg @ 0x1ff5f90] Forcing thread count to 1 for MJPEG encoding, use -thread_type slice or a constant quantizer if you want to use multiple cpu cores
[mjpeg @ 0x1ff5f90] intra_quant_bias = 96 inter_quant_bias = 0
Output #0, image2, to './out_img/ffmpeg-msa_x264_%d.jpg':
  Metadata:
    major_brand : isom
    minor_version : 1
    compatible_brands: isomavc1
    encoder : Lavf57.71.100
    Stream #0:0(und), 0, 10/1: Video: mjpeg, yuvj420p(pc), 1280x544, q=2-31, 200 kb/s, 0.10 fps, 0.10 tbn, 0.10 tbc (default)
    Metadata:
      creation_time : 2007-02-19T05:03:04.000000Z
      handler_name : GPAC ISO Video Handler
      encoder : Lavc57.89.100 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[Parsed_fps_0 @ 0x202bf50] Dropping 1 frame(s).
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
...
[h264 @ 0x2060f00] nal_unit_type: 1, nal_ref_idc: 2
[Parsed_fps_0 @ 0x202bf50] Dropping 1 frame(s).
[Parsed_fps_0 @ 0x202bf50] Dropping 1 frame(s).
[Parsed_fps_0 @ 0x202bf50] Dropping 1 frame(s).
[file @ 0x209dc40] Setting default whitelist 'file,crypto'
[AVIOContext @ 0x2189ab0] Statistics: 0 seeks, 1 writeouts
frame= 15 fps=0.2 q=1.6 size=N/A time=00:02:30.00 bitrate=N/A speed=2.03x
No more output streams to write to, finishing.
frame= 15 fps=0.2 q=1.6 Lsize=N/A time=00:02:30.00 bitrate=N/A speed=2.03x
video:1382kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Input file #0 (./The Simpsons Movie - Trailer_x264.mp4):
  Input stream #0:0 (video): 3288 packets read (72364468 bytes); 3288 frames decoded;
  Input stream #0:1 (audio): 1 packets read (134 bytes);
  Total: 3289 packets (72364602 bytes) demuxed
Output file #0 (./out_img/ffmpeg-msa_x264_%d.jpg):
  Output stream #0:0 (video): 15 frames encoded; 15 packets muxed (1414925 bytes);
  Total: 15 packets (1414925 bytes) muxed
bench: utime=113.070s
3288 frames successfully decoded, 0 decoding errors
bench: maxrss=39264kB
[Parsed_fps_0 @ 0x202bf50] 3288 frames in, 15 frames out; 3273 frames dropped, 0 frames duplicated.
[AVIOContext @ 0x1fd6230] Statistics: 73517562 bytes read, 5 seeks
ffmpeg started on 2010-10-18 at 00:27:58
Report written to "ffmpeg-20101018-002758.log"
Command line:
./ffmpeg-mips/ffmpeg-msa/bin/ffmpeg -i ./Tears_400_x265.mp4 -vf "fps=1" "./out_img/ffmpeg-msa_x265_%d.jpg" -report -benchmark
ffmpeg version 3.3 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 4.9.2 (Codescape GNU Tools 2016.05-03 for MIPS MTI Linux)
  configuration: --enable-cross-compile --prefix=../ffmpeg-msa --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags='-EL -static' --extra-ldflags='-EL -static' --disable-iconv
  libavutil 55. 58.100 / 55. 58.100
  libavcodec 57. 89.100 / 57. 89.100
  libavformat 57. 71.100 / 57. 71.100
  libavdevice 57. 6.100 / 57. 6.100
  libavfilter 6. 82.100 / 6. 82.100
  libswscale 4. 6.100 / 4. 6.100
  libswresample 2. 7.100 / 2. 7.100
Splitting the commandline.
Reading option '-i' ... matched as input url with argument './Tears_400_x265.mp4'.
Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'fps=1'.
Reading option './out_img/ffmpeg-msa_x265_%d.jpg' ... matched as output url.
Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
Reading option '-benchmark' ... matched as option 'benchmark' (add timings for benchmarking) with argument '1'.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option report (generate a report) with argument 1.
Applying option benchmark (add timings for benchmarking) with argument 1.
Successfully parsed a group of options.
Parsing a group of options: input url ./Tears_400_x265.mp4.
Successfully parsed a group of options.
Opening an input file: ./Tears_400_x265.mp4.
[file @ 0x1fce0e0] Setting default whitelist 'file,crypto'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] ISO: File Type Major Brand: iso4
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Before avformat_find_stream_info() pos: 705972 bytes read:32827 seeks:1 nb_streams:1
[hevc @ 0x1fceca0] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1fceca0] Decoding VPS
[hevc @ 0x1fceca0] Main profile bitstream
[hevc @ 0x1fceca0] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1fceca0] Decoding SPS
[hevc @ 0x1fceca0] Main profile bitstream
[hevc @ 0x1fceca0] Decoding VUI
[hevc @ 0x1fceca0] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1fceca0] Decoding PPS
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] All info found
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] After avformat_find_stream_info() pos: 20299 bytes read:65595 seeks:2 frames:1
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './Tears_400_x265.mp4':
  Metadata:
    major_brand : iso4
    minor_version : 1
    compatible_brands: iso4hvc1
    creation_time : 2014-08-25T18:10:46.000000Z
  Duration: 00:00:13.96, start: 0.125000, bitrate: 404 kb/s
    Stream #0:0(und), 1, 1/24000: Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv), 1920x800, 402 kb/s, 24 fps, 24 tbr, 24k tbn, 24 tbc (default)
    Metadata:
      creation_time : 2014-08-25T18:10:46.000000Z
      handler_name : hevc:fps=24@GPAC0.5.1-DEV-rev4807
Successfully opened the file.
Parsing a group of options: output url ./out_img/ffmpeg-msa_x265_%d.jpg.
Applying option vf (set video filters) with argument fps=1.
Successfully parsed a group of options.
Opening an output file: ./out_img/ffmpeg-msa_x265_%d.jpg.
Successfully opened the file.
detected 2 logical cores
[hevc @ 0x1fe5a00] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1fe5a00] Decoding VPS
[hevc @ 0x1fe5a00] Main profile bitstream
[hevc @ 0x1fe5a00] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1fe5a00] Decoding SPS
[hevc @ 0x1fe5a00] Main profile bitstream
[hevc @ 0x1fe5a00] Decoding VUI
[hevc @ 0x1fe5a00] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1fe5a00] Decoding PPS
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> mjpeg (native))
Press [q] to stop, [?] for help
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[hevc @ 0x1fe5a00] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1fe5a00] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1fe5a00] nal_unit_type: 19(IDR_W_RADL), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1fe5a00] Decoding SEI
[hevc @ 0x1fe5a00] Skipped PREFIX SEI 5
[hevc @ 0x1fe5a00] Decoding SEI
[hevc @ 0x1fe5a00] Skipped PREFIX SEI 6
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[hevc @ 0x1ffeba0] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x200c4d0] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x200c4d0] Output frame with POC 0.
[hevc @ 0x1fe5a00] Decoded frame with POC 0.
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[hevc @ 0x1fe5a00] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1fe5a00] Output frame with POC 1.
[hevc @ 0x1ffeba0] Decoded frame with POC 5.
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[hevc @ 0x1ffeba0] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1ffeba0] Output frame with POC 2.
[hevc @ 0x200c4d0] Decoded frame with POC 3.
[Parsed_fps_0 @ 0x201a6c0] Setting 'fps' to value '1'
[Parsed_fps_0 @ 0x201a6c0] fps=1/1
[graph 0 input from stream 0:0 @ 0x201abb0] Setting 'video_size' to value '1920x800'
[graph 0 input from stream 0:0 @ 0x201abb0] Setting 'pix_fmt' to value '0'
[graph 0 input from stream 0:0 @ 0x201abb0] Setting 'time_base' to value '1/24000'
[graph 0 input from stream 0:0 @ 0x201abb0] Setting 'pixel_aspect' to value '0/1'
[graph 0 input from stream 0:0 @ 0x201abb0] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:0 @ 0x201abb0] Setting 'frame_rate' to value '24/1'
[graph 0 input from stream 0:0 @ 0x201abb0] w:1920 h:800 pixfmt:yuv420p tb:1/24000 fr:24/1 sar:0/1 sws_param:flags=2
[format @ 0x201aad0] compat: called with args=[yuvj420p|yuvj422p|yuvj444p]
[format @ 0x201aad0] Setting 'pix_fmts' to value 'yuvj420p|yuvj422p|yuvj444p'
[auto_scaler_0 @ 0x201a350] Setting 'flags' to value 'bicubic'
[auto_scaler_0 @ 0x201a350] w:iw h:ih flags:'bicubic' interl:0
[format @ 0x201aad0] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_fps_0' and the filter 'format'
[AVFilterGraph @ 0x201a2f0] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed
[auto_scaler_0 @ 0x201a350] picking yuvj420p out of 3 ref:yuv420p alpha:0
[swscaler @ 0x21d7da0] deprecated pixel format used, make sure you did set range correctly
[auto_scaler_0 @ 0x201a350] w:1920 h:800 fmt:yuv420p sar:0/1 -> w:1920 h:800 fmt:yuvj420p sar:0/1 flags:0x4
[mjpeg @ 0x1fe2ba0] Forcing thread count to 1 for MJPEG encoding, use -thread_type slice or a constant quantizer if you want to use multiple cpu cores
[mjpeg @ 0x1fe2ba0] intra_quant_bias = 96 inter_quant_bias = 0
Output #0, image2, to './out_img/ffmpeg-msa_x265_%d.jpg':
  Metadata:
    major_brand : iso4
    minor_version : 1
    compatible_brands: iso4hvc1
    encoder : Lavf57.71.100
    Stream #0:0(und), 0, 1/1: Video: mjpeg, yuvj420p(pc), 1920x800, q=2-31, 200 kb/s, 1 fps, 1 tbn, 1 tbc (default)
    Metadata:
      creation_time : 2014-08-25T18:10:46.000000Z
      handler_name : hevc:fps=24@GPAC0.5.1-DEV-rev4807
      encoder : Lavc57.89.100 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
...
No more output streams to write to, finishing.
frame= 15 fps=0.7 q=24.8 Lsize=N/A time=00:00:15.00 bitrate=N/A speed=0.668x
video:1084kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Input file #0 (./Tears_400_x265.mp4):
  Input stream #0:0 (video): 335 packets read (701773 bytes); 335 frames decoded;
  Total: 335 packets (701773 bytes) demuxed
Output file #0 (./out_img/ffmpeg-msa_x265_%d.jpg):
  Output stream #0:0 (video): 15 frames encoded; 15 packets muxed (1109604 bytes);
  Total: 15 packets (1109604 bytes) muxed
bench: utime=22.300s
335 frames successfully decoded, 0 decoding errors
bench: maxrss=72432kB
[Parsed_fps_0 @ 0x201a6c0] 335 frames in, 15 frames out; 320 frames dropped, 0 frames duplicated.
[AVIOContext @ 0x1fd6220] Statistics: 734659 bytes read, 2 seeks
ffmpeg started on 2010-10-18 at 00:28:31
Report written to "ffmpeg-20101018-002831.log"
Command line:
./ffmpeg-mips/ffmpeg-soft/bin/ffmpeg -i "./The Simpsons Movie - Trailer_x264.mp4" -vf "fps=1/10" "./out_img/ffmpeg-soft_x264_%d.jpg" -report -benchmark
ffmpeg version 3.3 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 4.9.2 (Codescape GNU Tools 2016.05-03 for MIPS MTI Linux)
  configuration: --enable-cross-compile --prefix=../ffmpeg-soft --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags='-EL -static' --extra-ldflags='-EL -static' --disable-iconv --disable-msa
  libavutil 55. 58.100 / 55. 58.100
  libavcodec 57. 89.100 / 57. 89.100
  libavformat 57. 71.100 / 57. 71.100
  libavdevice 57. 6.100 / 57. 6.100
  libavfilter 6. 82.100 / 6. 82.100
  libswscale 4. 6.100 / 4. 6.100
  libswresample 2. 7.100 / 2. 7.100
Splitting the commandline.
Reading option '-i' ... matched as input url with argument './The Simpsons Movie - Trailer_x264.mp4'.
Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'fps=1/10'.
Reading option './out_img/ffmpeg-soft_x264_%d.jpg' ... matched as output url.
Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
Reading option '-benchmark' ... matched as option 'benchmark' (add timings for benchmarking) with argument '1'.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option report (generate a report) with argument 1.
Applying option benchmark (add timings for benchmarking) with argument 1.
Successfully parsed a group of options.
Parsing a group of options: input url ./The Simpsons Movie - Trailer_x264.mp4.
Successfully parsed a group of options.
Opening an input file: ./The Simpsons Movie - Trailer_x264.mp4.
[file @ 0x1f4a0e0] Setting default whitelist 'file,crypto'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] ISO: File Type Major Brand: isom
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Before avformat_find_stream_info() pos: 73516232 bytes read:65587 seeks:1 nb_streams:2
[h264 @ 0x1f4acb0] nal_unit_type: 7, nal_ref_idc: 3
[h264 @ 0x1f4acb0] nal_unit_type: 8, nal_ref_idc: 3
[h264 @ 0x1f4acb0] nal_unit_type: 6, nal_ref_idc: 0
[h264 @ 0x1f4acb0] nal_unit_type: 5, nal_ref_idc: 3
[h264 @ 0x1f4acb0] user data:"x264 - core 54 svn-620M - H.264/MPEG-4 AVC codec - Copyleft 2005 - http://www.videolan.org/x264.html - options: cabac=1 ref=5 deblock=1:0:0 analyse=0x1:0x131 me=umh subme=6 brdo=1 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=1 nr=0 decimate=1 mbaff=0 bframes=1 b_pyramid=0 b_adapt=1 b_bias=0 direct=3 wpredb=0 bime=0 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=4214 ratetol=1.0 rceq='blurCplx^(1-qComp)' qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30"
[h264 @ 0x1f4acb0] Reinit context to 1280x544, pix_fmt: yuv420p
[h264 @ 0x1f4acb0] no picture
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] All info found
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] After avformat_find_stream_info() pos: 94845 bytes read:141348 seeks:2 frames:13
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './The Simpsons Movie - Trailer_x264.mp4':
  Metadata:
    major_brand : isom
    minor_version : 1
    compatible_brands: isomavc1
    creation_time : 2007-02-19T05:03:04.000000Z
  Duration: 00:02:17.30, start: 0.000000, bitrate: 4283 kb/s
    Stream #0:0(und), 12, 1/24000: Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x544, 4221 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      creation_time : 2007-02-19T05:03:04.000000Z
      handler_name : GPAC ISO Video Handler
    Stream #0:1(und), 1, 1/48000: Audio: aac (HE-AAC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 64 kb/s (default)
    Metadata:
      creation_time : 2007-02-19T05:03:08.000000Z
      handler_name : GPAC ISO Audio Handler
Successfully opened the file.
Parsing a group of options: output url ./out_img/ffmpeg-soft_x264_%d.jpg.
Applying option vf (set video filters) with argument fps=1/10.
Successfully parsed a group of options.
Opening an output file: ./out_img/ffmpeg-soft_x264_%d.jpg.
Successfully opened the file.
detected 2 logical cores
[h264 @ 0x1f951e0] nal_unit_type: 7, nal_ref_idc: 3
[h264 @ 0x1f951e0] nal_unit_type: 8, nal_ref_idc: 3
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native))
Press [q] to stop, [?] for help
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[h264 @ 0x1f951e0] nal_unit_type: 6, nal_ref_idc: 0
[h264 @ 0x1f951e0] nal_unit_type: 5, nal_ref_idc: 3
[h264 @ 0x1f951e0] user data:"x264 - core 54 svn-620M - H.264/MPEG-4 AVC codec - Copyleft 2005 - http://www.videolan.org/x264.html - options: cabac=1 ref=5 deblock=1:0:0 analyse=0x1:0x131 me=umh subme=6 brdo=1 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=1 nr=0 decimate=1 mbaff=0 bframes=1 b_pyramid=0 b_adapt=1 b_bias=0 direct=3 wpredb=0 bime=0 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=4214 ratetol=1.0 rceq='blurCplx^(1-qComp)' qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30"
[h264 @ 0x1f951e0] Reinit context to 1280x544, pix_fmt: yuv420p
[h264 @ 0x1f951e0] no picture
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[h264 @ 0x1fa2050] nal_unit_type: 1, nal_ref_idc: 2
[h264 @ 0x1fdcf00] nal_unit_type: 1, nal_ref_idc: 0
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[h264 @ 0x1f951e0] nal_unit_type: 1, nal_ref_idc: 2
[Parsed_fps_0 @ 0x1fa7f50] Setting 'fps' to value '1/10'
[Parsed_fps_0 @ 0x1fa7f50] fps=1/10
[graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'video_size' to value '1280x544'
[graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'pix_fmt' to value '0'
[graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'time_base' to value '1/24000'
[graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'pixel_aspect' to value '0/1'
[graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'frame_rate' to value '24000/1001'
[graph 0 input from stream 0:0 @ 0x1fa86f0] w:1280 h:544 pixfmt:yuv420p tb:1/24000 fr:24000/1001 sar:0/1 sws_param:flags=2
[format @ 0x1fa8030] compat: called with args=[yuvj420p|yuvj422p|yuvj444p]
[format @ 0x1fa8030] Setting 'pix_fmts' to value 'yuvj420p|yuvj422p|yuvj444p'
[auto_scaler_0 @ 0x1fa8660] Setting 'flags' to value 'bicubic'
[auto_scaler_0 @ 0x1fa8660] w:iw h:ih flags:'bicubic' interl:0
[format @ 0x1fa8030] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_fps_0' and the filter 'format'
[AVFilterGraph @ 0x1fa7b80] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed
[auto_scaler_0 @ 0x1fa8660] picking yuvj420p out of 3 ref:yuv420p alpha:0
[swscaler @ 0x1fa8f00] deprecated pixel format used, make sure you did set range correctly
[auto_scaler_0 @ 0x1fa8660] w:1280 h:544 fmt:yuv420p sar:0/1 -> w:1280 h:544 fmt:yuvj420p sar:0/1 flags:0x4
[mjpeg @ 0x1f71f90] Forcing thread count to 1 for MJPEG encoding, use -thread_type slice or a constant quantizer if you want to use multiple cpu cores
[mjpeg @ 0x1f71f90] intra_quant_bias = 96 inter_quant_bias = 0
Output #0, image2, to './out_img/ffmpeg-soft_x264_%d.jpg':
  Metadata:
    major_brand : isom
    minor_version : 1
    compatible_brands: isomavc1
    encoder : Lavf57.71.100
    Stream #0:0(und), 0, 10/1: Video: mjpeg, yuvj420p(pc), 1280x544, q=2-31, 200 kb/s, 0.10 fps, 0.10 tbn, 0.10 tbc (default)
    Metadata:
      creation_time : 2007-02-19T05:03:04.000000Z
      handler_name : GPAC ISO Video Handler
      encoder : Lavc57.89.100 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[Parsed_fps_0 @ 0x1fa7f50] Dropping 1 frame(s).
[h264 @ 0x1fa2050] nal_unit_type: 1, nal_ref_idc: 0
...
[AVIOContext @ 0x2229af0] Statistics: 0 seeks, 1 writeouts
No more output streams to write to, finishing.
frame= 15 fps=0.1 q=1.6 Lsize=N/A time=00:02:30.00 bitrate=N/A speed=1.45x
video:1382kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Input file #0 (./The Simpsons Movie - Trailer_x264.mp4):
  Input stream #0:0 (video): 3288 packets read (72364468 bytes); 3288 frames decoded;
  Input stream #0:1 (audio): 1 packets read (134 bytes);
  Total: 3289 packets (72364602 bytes) demuxed
Output file #0 (./out_img/ffmpeg-soft_x264_%d.jpg):
  Output stream #0:0 (video): 15 frames encoded; 15 packets muxed (1414925 bytes);
  Total: 15 packets (1414925 bytes) muxed
bench: utime=164.240s
3288 frames successfully decoded, 0 decoding errors
bench: maxrss=39936kB
[Parsed_fps_0 @ 0x1fa7f50] 3288 frames in, 15 frames out; 3273 frames dropped, 0 frames duplicated.
[AVIOContext @ 0x1f52230] Statistics: 73517562 bytes read, 5 seeks
ffmpeg started on 2010-10-18 at 00:27:14
Report written to "ffmpeg-20101018-002714.log"
Command line:
./ffmpeg-mips/ffmpeg-soft/bin/ffmpeg -i ./Tears_400_x265.mp4 -vf "fps=1" "./out_img/ffmpeg-soft_x265_%d.jpg" -report -benchmark
ffmpeg version 3.3 Copyright (c) 2000-2017 the FFmpeg developers
  built with gcc 4.9.2 (Codescape GNU Tools 2016.05-03 for MIPS MTI Linux)
  configuration: --enable-cross-compile --prefix=../ffmpeg-soft --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags='-EL -static' --extra-ldflags='-EL -static' --disable-iconv --disable-msa
  libavutil 55. 58.100 / 55. 58.100
  libavcodec 57. 89.100 / 57. 89.100
  libavformat 57. 71.100 / 57. 71.100
  libavdevice 57. 6.100 / 57. 6.100
  libavfilter 6. 82.100 / 6. 82.100
  libswscale 4. 6.100 / 4. 6.100
  libswresample 2. 7.100 / 2. 7.100
Splitting the commandline.
Reading option '-i' ... matched as input url with argument './Tears_400_x265.mp4'.
Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'fps=1'.
Reading option './out_img/ffmpeg-soft_x265_%d.jpg' ... matched as output url.
Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'.
Reading option '-benchmark' ... matched as option 'benchmark' (add timings for benchmarking) with argument '1'.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option report (generate a report) with argument 1.
Applying option benchmark (add timings for benchmarking) with argument 1.
Successfully parsed a group of options.
Parsing a group of options: input url ./Tears_400_x265.mp4.
Successfully parsed a group of options.
Opening an input file: ./Tears_400_x265.mp4.
[file @ 0x1f4a0e0] Setting default whitelist 'file,crypto'
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] ISO: File Type Major Brand: iso4
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Unknown dref type 0x206c7275 size 12
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Before avformat_find_stream_info() pos: 705972 bytes read:32827 seeks:1 nb_streams:1
[hevc @ 0x1f4aca0] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f4aca0] Decoding VPS
[hevc @ 0x1f4aca0] Main profile bitstream
[hevc @ 0x1f4aca0] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f4aca0] Decoding SPS
[hevc @ 0x1f4aca0] Main profile bitstream
[hevc @ 0x1f4aca0] Decoding VUI
[hevc @ 0x1f4aca0] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f4aca0] Decoding PPS
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] All info found
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] After avformat_find_stream_info() pos: 20299 bytes read:65595 seeks:2 frames:1
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './Tears_400_x265.mp4':
  Metadata:
    major_brand : iso4
    minor_version : 1
    compatible_brands: iso4hvc1
    creation_time : 2014-08-25T18:10:46.000000Z
  Duration: 00:00:13.96, start: 0.125000, bitrate: 404 kb/s
    Stream #0:0(und), 1, 1/24000: Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv), 1920x800, 402 kb/s, 24 fps, 24 tbr, 24k tbn, 24 tbc (default)
    Metadata:
      creation_time : 2014-08-25T18:10:46.000000Z
      handler_name : hevc:fps=24@GPAC0.5.1-DEV-rev4807
Successfully opened the file.
Parsing a group of options: output url ./out_img/ffmpeg-soft_x265_%d.jpg.
Applying option vf (set video filters) with argument fps=1.
Successfully parsed a group of options.
Opening an output file: ./out_img/ffmpeg-soft_x265_%d.jpg.
Successfully opened the file.
detected 2 logical cores
[hevc @ 0x1f61a00] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f61a00] Decoding VPS
[hevc @ 0x1f61a00] Main profile bitstream
[hevc @ 0x1f61a00] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f61a00] Decoding SPS
[hevc @ 0x1f61a00] Main profile bitstream
[hevc @ 0x1f61a00] Decoding VUI
[hevc @ 0x1f61a00] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f61a00] Decoding PPS
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> mjpeg (native))
Press [q] to stop, [?] for help
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[hevc @ 0x1f61a00] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f61a00] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f61a00] nal_unit_type: 19(IDR_W_RADL), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f61a00] Decoding SEI
[hevc @ 0x1f61a00] Skipped PREFIX SEI 5
[hevc @ 0x1f61a00] Decoding SEI
[hevc @ 0x1f61a00] Skipped PREFIX SEI 6
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[hevc @ 0x1f7aba0] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f884d0] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f884d0] Output frame with POC 0.
[hevc @ 0x1f61a00] Decoded frame with POC 0.
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[hevc @ 0x1f61a00] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f61a00] Output frame with POC 1.
[hevc @ 0x1f7aba0] Decoded frame with POC 5.
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[hevc @ 0x1f7aba0] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f7aba0] Output frame with POC 2.
[hevc @ 0x1f884d0] Decoded frame with POC 3.
[Parsed_fps_0 @ 0x1f966c0] Setting 'fps' to value '1'
[Parsed_fps_0 @ 0x1f966c0] fps=1/1
[graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'video_size' to value '1920x800'
[graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'pix_fmt' to value '0'
[graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'time_base' to value '1/24000'
[graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'pixel_aspect' to value '0/1'
[graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'frame_rate' to value '24/1'
[graph 0 input from stream 0:0 @ 0x1f96bb0] w:1920 h:800 pixfmt:yuv420p tb:1/24000 fr:24/1 sar:0/1 sws_param:flags=2
[format @ 0x1f96ad0] compat: called with args=[yuvj420p|yuvj422p|yuvj444p]
[format @ 0x1f96ad0] Setting 'pix_fmts' to value 'yuvj420p|yuvj422p|yuvj444p'
[hevc @ 0x1f61a00] Decoded frame with POC 1.
[hevc @ 0x1f7aba0] Decoded frame with POC 2.
[auto_scaler_0 @ 0x1f96350] Setting 'flags' to value 'bicubic'
[auto_scaler_0 @ 0x1f96350] w:iw h:ih flags:'bicubic' interl:0
[format @ 0x1f96ad0] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_fps_0' and the filter 'format'
[AVFilterGraph @ 0x1f962f0] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed
[auto_scaler_0 @ 0x1f96350] picking yuvj420p out of 3 ref:yuv420p alpha:0
[swscaler @ 0x2153da0] deprecated pixel format used, make sure you did set range correctly
[auto_scaler_0 @ 0x1f96350] w:1920 h:800 fmt:yuv420p sar:0/1 -> w:1920 h:800 fmt:yuvj420p sar:0/1 flags:0x4
[mjpeg @ 0x1f5eba0] Forcing thread count to 1 for MJPEG encoding, use -thread_type slice or a constant quantizer if you want to use multiple cpu cores
[mjpeg @ 0x1f5eba0] intra_quant_bias = 96 inter_quant_bias = 0
Output #0, image2, to './out_img/ffmpeg-soft_x265_%d.jpg':
  Metadata:
    major_brand : iso4
    minor_version : 1
    compatible_brands: iso4hvc1
    encoder : Lavf57.71.100
    Stream #0:0(und), 0, 1/1: Video: mjpeg, yuvj420p(pc), 1920x800, q=2-31, 200 kb/s, 1 fps, 1 tbn, 1 tbc (default)
    Metadata:
      creation_time : 2014-08-25T18:10:46.000000Z
      handler_name : hevc:fps=24@GPAC0.5.1-DEV-rev4807
      encoder : Lavc57.89.100 mjpeg
    Side data:
      cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[hevc @ 0x1f884d0] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0
[hevc @ 0x1f884d0] Output frame with POC 3.
[Parsed_fps_0 @ 0x1f966c0] Dropping 1 frame(s).
frame= 0 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed= 0x
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[Parsed_fps_0 @ 0x1f966c0] Dropping 1 frame(s).
[hevc @ 0x1f61a00] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[hevc @ 0x1f61a00] Output frame with POC 4.
...
No more output streams to write to, finishing.
frame= 15 fps=0.5 q=24.8 Lsize=N/A time=00:00:15.00 bitrate=N/A speed=0.451x
video:1084kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Input file #0 (./Tears_400_x265.mp4):
  Input stream #0:0 (video): 335 packets read (701773 bytes); 335 frames decoded;
  Total: 335 packets (701773 bytes) demuxed
Output file #0 (./out_img/ffmpeg-soft_x265_%d.jpg):
  Output stream #0:0 (video): 15 frames encoded; 15 packets muxed (1109604 bytes);
  Total: 15 packets (1109604 bytes) muxed
bench: utime=52.330s
335 frames successfully decoded, 0 decoding errors
bench: maxrss=72480kB
[Parsed_fps_0 @ 0x1f966c0] 335 frames in, 15 frames out; 320 frames dropped, 0 frames duplicated.
[AVIOContext @ 0x1f52220] Statistics: 734659 bytes read, 2 seeks
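The four reports above each end with a `bench: utime=` entry, so the effect of MSA can be read off by dividing the time of the `--disable-msa` build by the time of the MSA build for the same clip. Below is a minimal Python sketch of that arithmetic. The helper names `parse_utime` and `speedup` and the `BENCH_LINES` table are illustrative assumptions, not part of ffmpeg; the four values are copied from the reports. Note that `utime` is user-mode CPU time, so this is a CPU-time ratio and not necessarily a wall-clock one.

```python
import re

# "bench: utime=" entries copied from the four reports above.
# Keys are (clip, build); "soft" is the build configured with --disable-msa.
BENCH_LINES = {
    ("x264", "msa"): "bench: utime=113.070s",
    ("x264", "soft"): "bench: utime=164.240s",
    ("x265", "msa"): "bench: utime=22.300s",
    ("x265", "soft"): "bench: utime=52.330s",
}

UTIME_RE = re.compile(r"bench: utime=([0-9.]+)s")


def parse_utime(log_text: str) -> float:
    """Return the user CPU time in seconds found in a -benchmark report."""
    match = UTIME_RE.search(log_text)
    if match is None:
        raise ValueError("no 'bench: utime=' entry found")
    return float(match.group(1))


def speedup(clip: str) -> float:
    """Ratio of soft-build CPU time to MSA-build CPU time for one clip."""
    soft = parse_utime(BENCH_LINES[(clip, "soft")])
    msa = parse_utime(BENCH_LINES[(clip, "msa")])
    return soft / msa


if __name__ == "__main__":
    for clip in ("x264", "x265"):
        print(f"{clip}: soft/MSA CPU time ratio = {speedup(clip):.2f}")
```

With the numbers from the reports this gives a ratio of about 1.45 for the H.264 clip and about 2.35 for the HEVC clip. `parse_utime` can also be pointed at the full text of a `-report` log file instead of the one-line excerpts used here.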
[L1] - Baikal-T1 processor;
[L2] - MIPSfpga-plus github;
[L3] - P-Class P5600 Multiprocessor Core;
[L4] - MIPS, ;
[L5] - Texas Instruments. Digital Signal Processors;
[L6] - GPU;
[L7] - OpenCL.;
[L8] - Wikipedia: ;
[L9] - TFilter. Free online FIR filter design tool;
[L10] - Wikipedia: ;
[L11] - Wikipedia: - ;
[L12] - Wikipedia: - ;
[L13] - Wikipedia: SIMD;
[L14] - Wikipedia: ;
[L15] - MIPS SIMD;
[L16] - GCC: MIPS SIMD Architecture (MSA) Support;
[L17] - GCC: MIPS SIMD Architecture Built-in Functions;
[L18] - ffmpeg github (libavcodec/mips/);
[L19] - FFmpeg multimedia framework;
[L20] - Codescape MIPS SDK;
[L21] - H.264 Demo Clips;
[L22] - x265. Sample HEVC Video Files;
[L23] - ffmpeg
[L24] - ;
[D1] - ., . - ;
[D2] - MIPS Architecture for Programmers Volume IV-j: The MIPS32 SIMD Architecture Module;
[D3] - MIPS SIMD programming. Optimizing multimedia codecs;
[P1] - Baikal-T1 (source: L1);
[P2] - TFilter. 1 ();
[P3] - TFilter. 1;
[P4] - TFilter. 2 ();
[P5] - TFilter. 2;
[P6] - SIMD- (source: D3);
[P7] - MSA Vector registers (source: D2);
[P8] - MADDV Operation description (source: D2);
Source: https://habr.com/ru/post/328566/