
MIPS SIMD technology and Baikal-T1 processor

Colleagues from Baikal Electronics invited me to work with their Baikal-T1 processor [ L1 ] and write about my impressions. For them, it is a way to tell developers about the capabilities and features of the processor. For me, it is a chance to take a closer look at a system built on a modern processor core and, in the future, to reinvent fewer wheels when adding new functionality to, for example, the MIPSfpga-plus [ L2 ] project. And, of course, there is the usual engineering curiosity.


Today we will discuss MIPS SIMD, the vector extension of the MIPS architecture, which is available in MIPS Warrior P-class P5600 [ L3 ] cores and is therefore also present in the Baikal-T1 processor. The article is aimed at novice developers.



Introduction


Developing almost any device (hardware, software, or a combination of both) involves processing digital and analog signals. The input may be sensor readings, signals from I/O devices, data from a file on disk, and so on. The output may be an image on a monitor, sound from speakers, drive control signals, indicator readings on a dashboard, and so on. Between input and output lies a set of mathematical operations.


If we briefly list the ways to implement this "mathematics in hardware", we get the following list of tools available to the developer, which can be applied both together and separately:



The same list, but in more detail
  • implementation as an analog circuit
    Despite the dominance of digital devices, the world and the human senses remain analog. Whether we like it or not, even when we process information purely in the digital domain, we still have to place a filter in front of the ADC input. Integrators, differentiators, adders, and so on can likewise be built from analog components. Over the decades when analog electronics dominated, engineers accumulated vast experience in solving all kinds of problems, and a good developer (even a digital one) takes this legacy into account [ D1 ];


  • software implementation on a microcontroller
    Are there only a few signals to process, and is the mathematics simple or undemanding? Then a relatively inexpensive microcontroller with its built-in ADC, low clock frequency, and power-saving features is a perfectly viable option. If necessary, the bottlenecks can be written in assembly;


  • FPGA implementation
    If we have high requirements for speed, parallelism, and scalability, we describe the mathematics as a module in Verilog or VHDL and choose an FPGA that can run at the frequency the processing requires. If the solution turns out to be very successful and mass production makes sense, welcome to the world of ASICs [ L3 ];


  • software and hardware implementation as a system on a chip
    Is the system too complex to describe entirely in Verilog, would you like to write part of the logic in a high-level language, and manage everything from Linux? In this case the solution is a system on a chip (SoC): we take a ready-made processor core (Nios II, MIPSfpga, etc.) and surround it with the necessary peripheral modules, one of which is our own module that performs the tricky mathematics. Some operations can be exposed as processor instructions [ L4 ]. And yes, in the long run this can also be implemented as an ASIC;


  • use of a digital signal processor (DSP)
    Here we essentially buy a ready-made chip with a processor core, its own set of peripherals, and an instruction set aimed specifically at high-speed digital signal processing. We build our solution around it [ L10 , L5 ];


  • software implementation on a general purpose processor
    Each processor manufacturer offers its own architectural solutions for speeding up certain mathematical operations. The software developer's task is to use these features, when necessary, to accelerate the calculations. This is exactly what is discussed below for MIPS processors;


  • using a graphics controller for computing
    To make the list more complete, we must also mention the option of offloading the most complex and resource-intensive calculations to a video card [ L6 , L7 ].

There are no perfect tools. The best tool is the one that guarantees a solution within a reasonable time, that the project team has the competence to use, and that is either already available or can be acquired at minimal cost. Budget constraints, customer requirements, and sometimes political reasons also influence such decisions.


I hope this article will serve as a useful introduction for readers who need to optimize calculations on the Baikal-T1 processor or on another processor based on a MIPS core (or cores) where MIPS SIMD technology is available.


Computation resource intensity


Before going further, let's consider one of the most common digital signal processing (DSP) tasks: filtering. As an example, take a finite impulse response (FIR) filter [ L8 ]. Without going into DSP theory and mathematical derivations, we note the main thing, the equation that describes this type of digital filter:

$$y(n) = b_0 x(n) + b_1 x(n-1) + \dots + b_P x(n-P)$$

where x(n) is the input signal, y(n) is the output signal, P is the order of the filter, and b_i are the filter coefficients. The same equation can be written as follows:

$$y(n) = \sum_{i=0}^{P} b_i x(n-i)$$


The nature of the input signal x(n) is ignored here. It may be data obtained from an ADC, but it could just as well be read from a file; for our purposes it does not matter. Since this article is about calculations rather than about DSP, we will not dive into the magic of filter design and will simply use one of the online services to calculate the coefficients [ L9 ]:


Set the desired filter parameters (for example, one of the predefined options: band stop, a notch filter [ L11 ]) and click the Design Filter button:



The result of the calculation is the frequency response of the filter [ L12 ]:


Frequency response of the filter


Coefficient set and source code:


SampleFilter.h
#ifndef SAMPLEFILTER_H_
#define SAMPLEFILTER_H_

/*

FIR filter designed with
http://t-filter.appspot.com

sampling frequency: 2000 Hz

* 0 Hz - 200 Hz
  gain = 1
  desired ripple = 5 dB
  actual ripple = 3.1077303934211127 dB

* 300 Hz - 500 Hz
  gain = 0
  desired attenuation = -40 dB
  actual attenuation = -42.49314043914754 dB

* 600 Hz - 1000 Hz
  gain = 1
  desired ripple = 5 dB
  actual ripple = 3.1077303934211127 dB

*/

#define SAMPLEFILTER_TAP_NUM 25

typedef struct {
  double history[SAMPLEFILTER_TAP_NUM];
  unsigned int last_index;
} SampleFilter;

void SampleFilter_init(SampleFilter* f);
void SampleFilter_put(SampleFilter* f, double input);
double SampleFilter_get(SampleFilter* f);

#endif

SampleFilter.c
#include "SampleFilter.h"

static double filter_taps[SAMPLEFILTER_TAP_NUM] = {
  0.037391727827352596,
  -0.03299884552335979,
  0.044230583967321345,
  0.0023050970833628304,
  -0.06768087195950104,
  -0.046347105409124706,
  -0.011717387509232432,
  -0.0707342284185183,
  -0.049766517282999544,
  0.16086413543836361,
  0.21561058688743148,
  -0.10159456907827959,
  0.6638637561392535,
  -0.10159456907827959,
  0.21561058688743148,
  0.16086413543836361,
  -0.049766517282999544,
  -0.0707342284185183,
  -0.011717387509232432,
  -0.046347105409124706,
  -0.06768087195950104,
  0.0023050970833628304,
  0.044230583967321345,
  -0.03299884552335979,
  0.037391727827352596
};

void SampleFilter_init(SampleFilter* f) {
  int i;
  for(i = 0; i < SAMPLEFILTER_TAP_NUM; ++i)
    f->history[i] = 0;
  f->last_index = 0;
}

void SampleFilter_put(SampleFilter* f, double input) {
  f->history[f->last_index++] = input;
  if(f->last_index == SAMPLEFILTER_TAP_NUM)
    f->last_index = 0;
}

double SampleFilter_get(SampleFilter* f) {
  double acc = 0;
  int index = f->last_index, i;
  for(i = 0; i < SAMPLEFILTER_TAP_NUM; ++i) {
    index = index != 0 ? index-1 : SAMPLEFILTER_TAP_NUM-1;
    acc += f->history[index] * filter_taps[i];
  };
  return acc;
}

Let's look at the filter parameters and the SampleFilter_get function, recall the FIR filter equation above, and note the points that matter most to us:



Now suppose that, for some objective reason, the requirements of the task have been refined:



As a result, we get the following frequency response:


Frequency response of the filter


What is important for us:



I would not want the reader to focus on the absolute values of the filter parameters in this example. The main point is that at some moment the computational load may grow beyond previously predicted limits. After a small change in the algorithm, its parameters, or the input data, it may suddenly turn out that the processor core does nothing but shovel data and still cannot keep up, to say nothing of the remaining tasks. Or it copes, but the load it creates or the duration of the calculations no longer meets the system requirements. At that moment the task of optimization becomes more pressing than ever.
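To get a feel for how quickly the load scales, a rough estimate can be sketched in C. The assumption here is that a direct-form FIR filter needs one multiplication and one addition per tap for every input sample; the helper name `fir_macs_per_second` is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: multiply-accumulate steps per second needed by a
 * direct-form FIR filter, assuming one step per tap per input sample. */
static uint64_t fir_macs_per_second(uint32_t taps, uint32_t sample_rate_hz)
{
    assert(taps > 0);
    return (uint64_t)taps * sample_rate_hz;
}
```

For the 25-tap filter at 2000 Hz from SampleFilter.h this gives 50,000 multiply-accumulate steps per second. The load grows linearly with both the number of taps and the sampling frequency, so raising both by a factor of ten multiplies it by a hundred.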


Computation speed


Now that we have seen the problem in a simple example, let's look at how it is solved. Processor manufacturers resort to a number of techniques to increase computation speed: they raise the clock frequency, increase the number of cores, add new instructions, experiment with the pipeline configuration and cache size, and move to ever faster buses and interfaces.


Likewise, nothing prevents the developer from implementing the worst bottlenecks in assembly, experimenting with compiler options, or switching to an algorithm that is less resource-intensive or behaves more predictably over the same range of parameters.


We will concentrate on two ways to increase processing speed, which cannot be used without support from the processor architecture:



Combining arithmetic operations


Let's return to our filter and carefully look at the code for the SampleFilter_get function:


double SampleFilter_get(SampleFilter* f) {
  double acc = 0;
  int index = f->last_index, i;
  for(i = 0; i < SAMPLEFILTER_TAP_NUM; ++i) {
    index = index != 0 ? index-1 : SAMPLEFILTER_TAP_NUM-1;
    acc += f->history[index] * filter_taps[i];
  };
  return acc;
}

And especially on the line:


 acc += f->history[index] * filter_taps[i]; 

Here we see two operations performed in sequence: multiplication by a coefficient and accumulation of the result in an accumulator variable. This combination of multiplication and addition is very common in DSP algorithms. If the two operations so often appear side by side, why not combine them into a single instruction that executes in one clock cycle? The idea occurred to engineers long ago. This is how the multiply-accumulate (MAC) instruction appeared, which is now present in all digital signal processors.
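In fixed-point DSP code the same pattern usually looks like the sketch below. This is an illustration, not code from the filter above: the function name `mac_i16` and the choice of 16-bit samples with a 64-bit accumulator are assumptions. Whether the compiler turns the loop body into a single multiply-accumulate instruction depends on the target core and the compiler options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-point variant of the filter's inner loop: 16-bit samples
 * and coefficients, 64-bit accumulator so the running sum cannot overflow. */
static int64_t mac_i16(const int16_t *x, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    assert(x != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i)
        acc += (int32_t)x[i] * (int32_t)b[i];   /* one multiply-accumulate step */
    return acc;
}
```

The product of two 16-bit values always fits in 32 bits, so the only place where overflow has to be considered is the accumulator, which is why it is made wider than the operands.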



SIMD approach


Let's approach the problem from another side. Instead of working with each x(n) separately, why not combine several samples into one vector (array) and apply an instruction to the whole vector at once, or more precisely, to every element of the vector simultaneously? In that case several samples can be processed in one cycle (not counting memory accesses).



The larger the maximum vector size, the higher the processing speed. This principle of organizing calculations is called SIMD (single instruction, multiple data) [ L13 ]. It underlies vector processors [ L14 ] and the vector extensions of scalar processors: SSE and AVX for the x86 architecture, MIPS SIMD [ L15 ] for the MIPS architecture.
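The idea can be tried out without any target-specific code by using the generic vector extension of GCC and Clang. The sketch below is an assumption-laden illustration: the type name `v4f`, the function name `dot_simd`, and the choice of four 32-bit floats per vector are mine. The compiler decides which instructions (SSE, MIPS SIMD, or plain scalar code) actually implement the vector expression.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC/Clang vector extension: four 32-bit floats packed into 128 bits. */
typedef float v4f __attribute__((vector_size(16)));

/* Dot product that processes four samples per loop iteration.
 * Assumes n is a multiple of 4 to keep the sketch short. */
static float dot_simd(const float *x, const float *b, size_t n)
{
    v4f acc = {0.0f, 0.0f, 0.0f, 0.0f};
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        v4f vx, vb;
        memcpy(&vx, x + i, sizeof vx);   /* load 4 samples (no alignment needed) */
        memcpy(&vb, b + i, sizeof vb);   /* load 4 coefficients */
        acc += vx * vb;                  /* four multiply-adds in one expression */
    }
    return acc[0] + acc[1] + acc[2] + acc[3];   /* sum the four partial results */
}
```

Each lane of `acc` keeps its own partial sum, and the lanes are added together only once, after the loop. This is the usual way to keep the loop body free of cross-lane operations.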


MIPS SIMD


Now that we understand the basic principles on which vector extensions are built, we can move on to MIPS SIMD itself. An exhaustive description of the extension is given in the documentation [ D2 ]; here we note the main points:



Integer operations
Mnemonic | Instruction Description
ADDV, ADDVI | Add
ADD_A, ADDS_A | Add and Saturated Add Absolute Values
ADDS_S, ADDS_U | Signed and Unsigned Saturated Add
HADD_S, HADD_U | Signed and Unsigned Horizontal Add
ASUB_S, ASUB_U | Absolute Value of Signed and Unsigned Subtract
AVE_S, AVE_U | Signed and Unsigned Average
AVER_S, AVER_U | Signed and Unsigned Average with Rounding
DOTP_S, DOTP_U | Signed and Unsigned Dot Product
DPADD_S, DPADD_U | Signed and Unsigned Dot Product Add
DPSUB_S, DPSUB_U | Signed and Unsigned Dot Product Subtract
DIV_S, DIV_U | Divide
MADDV | Multiply-Add
MAX_A, MIN_A | Maximum and Minimum of Absolute Values
MAX_S, MAXI_S, MAX_U, MAXI_U | Signed and Unsigned Maximum
MIN_S, MINI_S, MIN_U, MINI_U | Signed and Unsigned Minimum
MSUBV | Multiply-Subtract
MULV | Multiply
MOD_S, MOD_U | Signed and Unsigned Remainder (Modulo)
SAT_S, SAT_U | Signed and Unsigned Saturate
SUBS_S, SUBS_U | Signed and Unsigned Saturated Subtract
HSUB_S, HSUB_U | Signed and Unsigned Horizontal Subtract
SUBSUU_S | Signed Saturated Unsigned Subtract
SUBSUS_U | Unsigned Saturated Signed Subtract from Unsigned
SUBV, SUBVI | Subtract
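Several entries in this table are saturated operations, which behave differently from ordinary C arithmetic. The scalar model below shows what a signed saturated add does to each 16-bit element, as I understand the description; the function names `adds_s16` and `adds_s16x8` are hypothetical, and the code is a reference model in portable C, not the instruction itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of a signed saturated add: the sum is
 * clamped to [INT16_MIN, INT16_MAX] instead of wrapping around. */
static int16_t adds_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Applies the lane model to eight halfword lanes, the number of 16-bit
 * elements that fit in one 128-bit vector. */
static void adds_s16x8(int16_t dst[8], const int16_t a[8], const int16_t b[8])
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (int i = 0; i < 8; ++i)
        dst[i] = adds_s16(a[i], b[i]);
}
```

Saturation is what makes such instructions convenient for audio and image data: an overflow produces the nearest representable value rather than a value with the opposite sign.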

Bit operations
Mnemonic | Instruction Description
AND, ANDI | Logical And
BCLR, BCLRI | Bit Clear
BINSL, BINSLI, BINSR, BINSRI | Bit Insert Left and Right
BMNZ, BMNZI | Bit Move If Not Zero
BMZ, BMZI | Bit Move If Zero
BNEG, BNEGI | Bit Negate
BSEL, BSELI | Bit Select
BSET, BSETI | Bit Set
NLOC | Leading One Bits Count
NLZC | Leading Zero Bits Count
NOR, NORI | Logical Negated Or
PCNT | Population (Bits Set to 1) Count
OR, ORI | Logical Or
SLL, SLLI | Shift Left
SRA, SRAI | Shift Right Arithmetic
SRAR, SRARI | Rounding Shift Right Arithmetic
SRL, SRLI | Shift Right Logical
SRLR, SRLRI | Rounding Shift Right Logical
XOR, XORI | Logical Exclusive Or

Floating-point arithmetic
Mnemonic | Instruction Description
FADD | Floating-Point Addition
FDIV | Floating-Point Division
FEXP2 | Floating-Point Base 2 Exponentiation
FLOG2 | Floating-Point Base 2 Logarithm
FMADD, FMSUB | Floating-Point Fused Multiply-Add and Multiply-Subtract
FMAX, FMIN | Floating-Point Maximum and Minimum
FMAX_A, FMIN_A | Floating-Point Maximum and Minimum of Absolute Values
FMUL | Floating-Point Multiplication
FRCP | Approximate Floating-Point Reciprocal
FRINT | Floating-Point Round to Integer
FRSQRT | Approximate Floating-Point Reciprocal of Square Root
FSQRT | Floating-Point Square Root
FSUB | Floating-Point Subtraction

Non-arithmetic floating-point operations
Mnemonic | Instruction Description
FCLASS | Floating-Point Class Mask

Floating-point comparisons
Mnemonic | Instruction Description
FCAF | Floating-Point Quiet Compare Always False
FCUN | Floating-Point Quiet Compare Unordered
FCOR | Floating-Point Quiet Compare Ordered
FCEQ | Floating-Point Quiet Compare Equal
FCUNE | Floating-Point Quiet Compare Unordered or Not Equal
FCUEQ | Floating-Point Quiet Compare Unordered or Equal
FCNE | Floating-Point Quiet Compare Not Equal
FCLT | Floating-Point Quiet Compare Less Than
FCULT | Floating-Point Quiet Compare Unordered or Less Than
FCLE | Floating-Point Quiet Compare Less Than or Equal
FCULE | Floating-Point Quiet Compare Unordered or Less Than or Equal
FSAF | Floating-Point Signaling Compare Always False
FSUN | Floating-Point Signaling Compare Unordered
FSOR | Floating-Point Signaling Compare Ordered
FSEQ | Floating-Point Signaling Compare Equal
FSUNE | Floating-Point Signaling Compare Unordered or Not Equal
FSUEQ | Floating-Point Signaling Compare Unordered or Equal
FSNE | Floating-Point Signaling Compare Not Equal
FSLT | Floating-Point Signaling Compare Less Than
FSULT | Floating-Point Signaling Compare Unordered or Less Than
FSLE | Floating-Point Signaling Compare Less Than or Equal
FSULE | Floating-Point Signaling Compare Unordered or Less Than or Equal

Floating-point conversions
Mnemonic | Instruction Description
FEXDO | Floating-Point Down-Convert Interchange Format
FEXUPL, FEXUPR | Left-Half and Right-Half Floating-Point Up-Convert Interchange Format
FFINT_S, FFINT_U | Floating-Point Convert from Signed and Unsigned Integer
FFQL, FFQR | Left-Half and Right-Half Floating-Point Convert from Fixed-Point
FTINT_S, FTINT_U | Floating-Point Round and Convert to Signed and Unsigned Integer
FTRUNC_S, FTRUNC_U | Floating-Point Truncate and Convert to Signed and Unsigned Integer
FTQ | Floating-Point Round and Convert to Fixed-Point

Fixed-point operations
Mnemonic | Instruction Description
MADD_Q, MADDR_Q | Fixed-Point Multiply and Add without and with Rounding
MSUB_Q, MSUBR_Q | Fixed-Point Multiply and Subtract without and with Rounding
MUL_Q, MULR_Q | Fixed-Point Multiply without and with Rounding
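For readers who have not worked with fixed-point formats, here is a scalar sketch of a Q15 multiply with and without rounding. It follows the conventional Q15 definition (a 16-bit value v stands for v / 32768) and is my reading of what such an operation does per element, not a transcription of the instruction specification; the function names are hypothetical. The right shift of a negative value is assumed to be arithmetic, as it is with GCC and Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Q15 multiply without rounding: the Q30 product is shifted back to Q15. */
static int16_t mul_q15(int16_t a, int16_t b)
{
    /* -1.0 * -1.0 = +1.0 is not representable in Q15, so saturate. */
    if (a == INT16_MIN && b == INT16_MIN)
        return INT16_MAX;
    int32_t p = (int32_t)a * (int32_t)b;
    return (int16_t)(p >> 15);          /* assumes arithmetic right shift */
}

/* Q15 multiply with rounding: add half of an output LSB before the shift. */
static int16_t mulr_q15(int16_t a, int16_t b)
{
    if (a == INT16_MIN && b == INT16_MIN)
        return INT16_MAX;
    int32_t p = (int32_t)a * (int32_t)b + (1 << 14);
    return (int16_t)(p >> 15);          /* assumes arithmetic right shift */
}

/* Applies the non-rounding model to n lanes. */
static void mul_q15_lanes(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i)
        dst[i] = mul_q15(a[i], b[i]);
}
```

In this model 0.5 times 0.5 (16384 times 16384) gives 8192, which is 0.25 in Q15. The rounding variant differs only when the discarded low bits amount to half an LSB or more.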

Branch and compare operations
Mnemonic | Instruction Description
BNZ | Branch If Not Zero
BZ | Branch If Zero
CEQ, CEQI | Compare Equal
CLE_S, CLEI_S, CLE_U, CLEI_U | Compare Less-Than-or-Equal Signed and Unsigned
CLT_S, CLTI_S, CLT_U, CLTI_U | Compare Less-Than Signed and Unsigned

Vector load, store, and move operations
Mnemonic | Instruction Description
CFCMSA, CTCMSA | Copy from and to MSA Control Register
LD | Load Vector
LDI | Load Immediate
MOVE | Vector to Vector Move
SPLAT, SPLATI | Replicate Vector Element
FILL | Fill Vector from GPR
INSERT, INSVE | Insert GPR and Vector Element 0 to Vector Element
COPY_S, COPY_U | Copy Element to GPR Signed and Unsigned
ST | Store Vector

Vector element permutation operations
Mnemonic | Instruction Description
ILVEV, ILVOD | Interleave Even and Odd
ILVL, ILVR | Interleave Left and Right
PCKEV, PCKOD | Pack Even and Odd Elements
SHF | Set Shuffle
SLD, SLDI | Element Slide
VSHF | Vector Shuffle

Other operations
Mnemonic | Instruction Description
LSA | Left-Shift Add or Load/Store Address Calculation


Formal command description


Compiler support


MIPS SIMD is supported by the GCC compiler; however, this support has some peculiarities:



Code before optimization
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n) - 1))) >> (n))

static inline unsigned char clip_pixel(int i32Val)
{
    return ((i32Val) > 255) ? 255u : ((i32Val) < 0) ? 0u : (i32Val);
}

void vert_filter_8taps_16width_c(unsigned char *pSrc,   // SOURCE POINTER
                                 int SrcStride,         // SOURCE BUFFER PITCH
                                 unsigned char *pDst,   // DEST POINTER
                                 int DstStride,         // DEST BUFFER PITCH
                                 char *pFilter,         // POINTER TO FILTER BANK
                                 int Height)            // HEIGHT OF THE BLOCK
{
    unsigned int Row, Col;
    int FiltSum;
    short Src0, Src1, Src2, Src3, Src4, Src5, Src6, Src7;

    pSrc -= (8 / 2 - 1) * SrcStride;  // MOVE INPUT SRC POINTER TO APPROPRIATE POSITION

    // LOOP FOR NUMBER OF COLUMNS-16
    for (Col = 0; Col < 16; ++Col)
    {
        Src0 = pSrc[0 * SrcStride];
        Src1 = pSrc[1 * SrcStride];
        Src2 = pSrc[2 * SrcStride];
        Src3 = pSrc[3 * SrcStride];
        Src4 = pSrc[4 * SrcStride];
        Src5 = pSrc[5 * SrcStride];
        Src6 = pSrc[6 * SrcStride];

        // LOOP FOR NUMBER OF ROWS
        for (Row = 0; Row < Height; Row++)
        {
            Src7 = pSrc[(7 + Row) * SrcStride];

            FiltSum = 0;
            // ACCUMULATED FILTER SUM += PIXEL * FILTER COEFF
            // (the original listing read pi8Filter here; the parameter is pFilter)
            FiltSum += (Src0 * pFilter[0]);
            FiltSum += (Src1 * pFilter[1]);
            FiltSum += (Src2 * pFilter[2]);
            FiltSum += (Src3 * pFilter[3]);
            FiltSum += (Src4 * pFilter[4]);
            FiltSum += (Src5 * pFilter[5]);
            FiltSum += (Src6 * pFilter[6]);
            FiltSum += (Src7 * pFilter[7]);

            FiltSum = ROUND_POWER_OF_TWO(FiltSum, 7);    // ROUNDING
            pDst[Row * DstStride] = clip_pixel(FiltSum); // CLIP RESULT IN 0-255(UNSIGNED CHAR)

            // PREPARING FOR NEXT CONVOLUTION- SLIDING WINDOW
            Src0 = Src1;
            Src1 = Src2;
            Src2 = Src3;
            Src3 = Src4;
            Src4 = Src5;
            Src5 = Src6;
            Src6 = Src7;
        }

        pSrc += 1;
        pDst += 1;
    }
}

Code after optimization using MIPS SIMD
/* MSA VECTOR TYPES */
#define WRLEN 128                 // VECTOR REGISTER LENGTH 128-BIT
#define NUMWRELEM (WRLEN >> 3)

typedef signed char   IMG_VINT8  __attribute__ ((vector_size(NUMWRELEM))); //VEC SIGNED BYTES
typedef unsigned char IMG_VUINT8 __attribute__ ((vector_size(NUMWRELEM))); //VEC UNSIGNED BYTES
typedef short         IMG_VINT16 __attribute__ ((vector_size(NUMWRELEM))); //VEC SIGNED HALF-WORD

#define LOAD_UNPACK_VEC(pSrc, SrcStride, vi16VecRight, vi16VecLeft) \
{ \
    IMG_VUINT8 vu8Src; \
    IMG_VINT16 vi16Vec0; \
    IMG_VINT8 vi8Tmp0; \
    /* LOAD INPUT VECTOR */ \
    vu8Src = *((IMG_VINT8 *)(pSrc)); \
    /* RANGE WARPING TO MAINTAIN 16 BIT PRECISION */ \
    vi16Vec0 = __builtin_msa_xori_b(vu8Src, 128); \
    /* CALCULATE SIGN EXTENSION */ \
    vi8Tmp0 = __builtin_msa_clti_s_b(vi16Vec0, 0); \
    /* INTERLEAVE RIGHT TO 16 BIT VEC */ \
    vi16VecRight = __builtin_msa_ilvr_b(vi8Tmp0, vi16Vec0); \
    /* INTERLEAVE LEFT TO 16 BIT VEC */ \
    vi16VecLeft = __builtin_msa_ilvl_b(vi8Tmp0, vi16Vec0); \
    pSrc += SrcStride; \
}

void vert_filter_8taps_16width_msa(unsigned char *pSrc,   // SOURCE POINTER
                                   int SrcStride,         // SOURCE BUFFER PITCH
                                   unsigned char *pDst,   // DEST POINTER
                                   int DstStride,         // DEST BUFFER PITCH
                                   char *pFilter,         // POINTER TO FILTER BANK
                                   int Height)            // HEIGHT OF THE BLOCK
{
    int u32LoopCnt;
    // The original listing declared these as VINT16; the typedef above is IMG_VINT16.
    IMG_VINT16 vi16Vec0Right, vi16Vec1Right, vi16Vec2Right, vi16Vec3Right;
    IMG_VINT16 vi16Vec4Right, vi16Vec5Right, vi16Vec6Right, vi16Vec7Right;
    IMG_VINT16 vi16Vec0Left, vi16Vec1Left, vi16Vec2Left, vi16Vec3Left;
    IMG_VINT16 vi16Vec4Left, vi16Vec5Left, vi16Vec6Left, vi16Vec7Left;
    IMG_VINT16 vi16Temp1Right, vi16Temp1Left;
    IMG_VINT16 vi16Filt0, vi16Filt1, vi16Filt2, vi16Filt3;
    IMG_VINT16 vi16Filt4, vi16Filt5, vi16Filt6, vi16Filt7;

    pSrc -= (3 * SrcStride);

    // PREPARE FILTER COEFF IN VEC REGISTERS
    vi16Filt0 = __builtin_msa_fill_h(*(pFilter));
    vi16Filt1 = __builtin_msa_fill_h(*(pFilter + 1));
    vi16Filt2 = __builtin_msa_fill_h(*(pFilter + 2));
    vi16Filt3 = __builtin_msa_fill_h(*(pFilter + 3));
    vi16Filt4 = __builtin_msa_fill_h(*(pFilter + 4));
    vi16Filt5 = __builtin_msa_fill_h(*(pFilter + 5));
    vi16Filt6 = __builtin_msa_fill_h(*(pFilter + 6));
    vi16Filt7 = __builtin_msa_fill_h(*(pFilter + 7));

    //LOAD 7 INPUT VECTORS
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec0Right, vi16Vec0Left)
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec1Right, vi16Vec1Left)
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec2Right, vi16Vec2Left)
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec3Right, vi16Vec3Left)
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec4Right, vi16Vec4Left)
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec5Right, vi16Vec5Left)
    LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec6Right, vi16Vec6Left)

    // START CONVOLUTION VERTICALLY
    for (u32LoopCnt = Height; u32LoopCnt--; )
    {
        //LOAD 8TH INPUT VECTOR
        LOAD_UNPACK_VEC(pSrc, SrcStride, vi16Vec7Right, vi16Vec7Left)

        /* FILTER CALC */
        IMG_VINT16 vi16Tmp1, vi16Tmp2;
        IMG_VINT8 vi8Tmp3;

        // 8 TAP VECTORIZED CONVOLUTION FOR RIGHT HALF
        vi16Tmp1 = (vi16Vec0Right * vi16Filt0);
        vi16Tmp1 += (vi16Vec1Right * vi16Filt1);
        vi16Tmp1 += (vi16Vec2Right * vi16Filt2);
        vi16Tmp1 += (vi16Vec3Right * vi16Filt3);
        vi16Tmp2 = (vi16Vec4Right * vi16Filt4);
        vi16Tmp2 += (vi16Vec5Right * vi16Filt5);
        vi16Tmp2 += (vi16Vec6Right * vi16Filt6);
        vi16Tmp2 += (vi16Vec7Right * vi16Filt7);
        vi16Temp1Right = __builtin_msa_adds_s_h(vi16Tmp1, vi16Tmp2);

        // 8 TAP VECTORIZED CONVOLUTION FOR LEFT HALF
        vi16Tmp1 = (vi16Vec0Left * vi16Filt0);
        vi16Tmp1 += (vi16Vec1Left * vi16Filt1);
        vi16Tmp1 += (vi16Vec2Left * vi16Filt2);
        vi16Tmp1 += (vi16Vec3Left * vi16Filt3);
        vi16Tmp2 = (vi16Vec4Left * vi16Filt4);
        vi16Tmp2 += (vi16Vec5Left * vi16Filt5);
        vi16Tmp2 += (vi16Vec6Left * vi16Filt6);
        vi16Tmp2 += (vi16Vec7Left * vi16Filt7);
        vi16Temp1Left = __builtin_msa_adds_s_h(vi16Tmp1, vi16Tmp2);

        // ROUNDING RIGHT SHIFT RANGE CLIPPING AND NARROWING
        vi16Temp1Right = __builtin_msa_srari_h(vi16Temp1Right, 7);
        vi16Temp1Right = __builtin_msa_sat_s_h(vi16Temp1Right, 7);
        vi16Temp1Left = __builtin_msa_srari_h(vi16Temp1Left, 7);
        vi16Temp1Left = __builtin_msa_sat_s_h(vi16Temp1Left, 7);
        vi8Tmp3 = __builtin_msa_pckev_b(vi16Temp1Left, vi16Temp1Right);
        vi8Tmp3 = __builtin_msa_xori_b(vi8Tmp3, 128);

        // STORE OUTPUT VEC
        *((IMG_VINT8 *)(pDst)) = (vi8Tmp3);
        pDst += DstStride;

        // PREPARING FOR NEXT CONVOLUTION- SLIDING WINDOW
        vi16Vec0Right = vi16Vec1Right;
        vi16Vec1Right = vi16Vec2Right;
        vi16Vec2Right = vi16Vec3Right;
        vi16Vec3Right = vi16Vec4Right;
        vi16Vec4Right = vi16Vec5Right;
        vi16Vec5Right = vi16Vec6Right;
        vi16Vec6Right = vi16Vec7Right;

        vi16Vec0Left = vi16Vec1Left;
        vi16Vec1Left = vi16Vec2Left;
        vi16Vec2Left = vi16Vec3Left;
        vi16Vec3Left = vi16Vec4Left;
        vi16Vec4Left = vi16Vec5Left;
        vi16Vec5Left = vi16Vec6Left;
        vi16Vec6Left = vi16Vec7Left;
    }
}
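Code that uses MSA built-ins will not compile for other targets, so in portable projects it is common to guard the vector path and keep a scalar fallback. The sketch below assumes that GCC defines the `__mips_msa` macro when MSA code generation is enabled (for example with `-mmsa`) and that `<msa.h>` provides the `v4i32` type and the `__msa_addv_w` intrinsic. The function name `add4_i32` is hypothetical, and only the fallback path can be checked on a non-MIPS machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__mips_msa)
#include <msa.h>   /* GCC MSA intrinsics; assumed available when MSA is enabled */
#endif

/* Adds two arrays of four 32-bit integers with wrap-around on overflow. */
static void add4_i32(int32_t dst[4], const int32_t a[4], const int32_t b[4])
{
    assert(dst != NULL && a != NULL && b != NULL);
#if defined(__mips_msa)
    v4i32 va, vb, vr;
    memcpy(&va, a, sizeof va);        /* load four words into a vector */
    memcpy(&vb, b, sizeof vb);
    vr = __msa_addv_w(va, vb);        /* one vector add for all four lanes */
    memcpy(dst, &vr, sizeof vr);
#else
    for (int i = 0; i < 4; ++i)       /* scalar fallback, one lane at a time */
        dst[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);
#endif
}
```

Keeping both paths behind one function signature also makes it easy to compare their results in a unit test on the target board.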

Performance evaluation


At first I considered writing a very simple application (a synthetic test) to assess the performance gain from MIPS SIMD. Attractive as it is, such a test is not representative because it is detached from the user's real tasks. Fortunately, engineers from Imagination Technologies and MIPS have made a significant contribution to ffmpeg [ L18 ], a widely used open source application for converting audio and video [ L19 ]. I believe that they know better than anyone how to use the technology in question properly, so this code should be as efficient as possible.


Thus, if we build ffmpeg in two versions, with and without MIPS SIMD support, we can compare their speed on the same input data and draw conclusions about the effectiveness of vector calculations.


Build ffmpeg


The build is performed on an x86 machine running Linux. The development tools from Imagination Technologies [ L20 ] are used in cross-compilation mode. The tests use the latest stable release at the time of writing, ffmpeg 3.3 [ L19 ].


ffmpeg configuration for the version with MIPS SIMD support:


 ./configure --enable-cross-compile --prefix=../ffmpeg-msa --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags="-EL -static" --extra-ldflags="-EL -static" --disable-iconv 

And with MIPS SIMD support disabled:


 ./configure --enable-cross-compile --prefix=../ffmpeg-soft --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags="-EL -static" --extra-ldflags="-EL -static" --disable-iconv --disable-msa 

Description of configuration settings
Parameter | Description
--enable-cross-compile | the build is performed on a machine with a different architecture
--prefix=../ffmpeg-msa | the directory where files are placed after the make install command
--cross-prefix=../mips-mti-linux-gnu- | toolchain path
--arch=mips | target architecture: MIPS
--cpu=p5600 | target processor core: P5600
--target-os=linux | target OS: Linux
--extra-cflags="-EL -static" | the target system is little-endian; use static linking
--extra-ldflags="-EL -static" | same as above
--disable-iconv | disable text encoding conversion
--disable-msa | do not use MIPS SIMD

If you plan to repeat these steps, note that the ffmpeg 3.3 build with MIPS SIMD support fails with a minor error. To fix it, add the following line to the libavcodec/mips/hevcpred_msa.c file:


 #include "libavcodec/hevcdec.h" 

Testing


Testing is performed on the Baikal-T1 processor:


# uname -a
Linux baikal-BFK-18446744073709551615 4.4.41-bfk #0 SMP Tue Apr 25 15:54:24 MSK 2017 mips GNU/Linux

The input data are two videos encoded with x264 [ L21 ] and x265 [ L22 ]. The test task is to decode the video while saving screenshots at regular intervals:


./ffmpeg-mips/ffmpeg-msa/bin/ffmpeg -i ./The\ Simpsons\ Movie\ -\ Trailer_x264.mp4 -vf fps=1/10 ./out_img/ffmpeg-msa_x264_%d.jpg -report -benchmark
./ffmpeg-mips/ffmpeg-msa/bin/ffmpeg -i ./Tears_400_x265.mp4 -vf fps=1 ./out_img/ffmpeg-msa_x265_%d.jpg -report -benchmark
./ffmpeg-mips/ffmpeg-soft/bin/ffmpeg -i ./The\ Simpsons\ Movie\ -\ Trailer_x264.mp4 -vf fps=1/10 ./out_img/ffmpeg-soft_x264_%d.jpg -report -benchmark
./ffmpeg-mips/ffmpeg-soft/bin/ffmpeg -i ./Tears_400_x265.mp4 -vf fps=1 ./out_img/ffmpeg-soft_x265_%d.jpg -report -benchmark

Description of launch options
Parameter | Description
-i ./Tears_400_x265.mp4 | file to be processed
-vf fps=1 | how often screenshots are taken (every 1 sec for the short video, every 10 sec for the long one)
./out_img/ffmpeg-soft_x264_%d.jpg | output file name pattern
-report | generate a report on the run
-benchmark | include performance data

ffmpeg results


Scenario | Duration (sec)
x264 decoding with MIPS SIMD support | 113
x265 decoding with MIPS SIMD support | 22
x264 decoding without MIPS SIMD support | 164
x265 decoding without MIPS SIMD support | 52

As we can see, using MIPS SIMD makes decoding 1.5 to 2.4 times faster.
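The ratio can be checked directly from the durations in the results table. The sketch below takes 113 s and 22 s as the runs with MIPS SIMD, and 164 s and 52 s as the runs without it; this pairing is my reading of the table, and it is consistent with the 1.5 to 2.4 range quoted above. The function name `speedup` is hypothetical.

```c
#include <assert.h>

/* Speedup of an optimized run relative to a baseline run:
 * how many times shorter the optimized duration is. */
static double speedup(double baseline_s, double optimized_s)
{
    assert(baseline_s > 0.0 && optimized_s > 0.0);
    return baseline_s / optimized_s;
}
```

With these numbers, `speedup(164, 113)` is about 1.45 for x264 and `speedup(52, 22)` is about 2.36 for x265, which round to the quoted 1.5 and 2.4.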


The ffmpeg run logs are available on GitHub [ L23 ].


Log of x264 decoding with MIPS SIMD
 ffmpeg started on 2010-10-18 at 00:30:26 Report written to "ffmpeg-20101018-003026.log" Command line: ./ffmpeg-mips/ffmpeg-msa/bin/ffmpeg -i "./The Simpsons Movie - Trailer_x264.mp4" -vf "fps=1/10" "./out_img/ffmpeg-msa_x264_%d.jpg" -report -benchmark ffmpeg version 3.3 Copyright (c) 2000-2017 the FFmpeg developers built with gcc 4.9.2 (Codescape GNU Tools 2016.05-03 for MIPS MTI Linux) configuration: --enable-cross-compile --prefix=../ffmpeg-msa --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags='-EL -static' --extra-ldflags='-EL -static' --disable-iconv libavutil 55. 58.100 / 55. 58.100 libavcodec 57. 89.100 / 57. 89.100 libavformat 57. 71.100 / 57. 71.100 libavdevice 57. 6.100 / 57. 6.100 libavfilter 6. 82.100 / 6. 82.100 libswscale 4. 6.100 / 4. 6.100 libswresample 2. 7.100 / 2. 7.100 Splitting the commandline. Reading option '-i' ... matched as input url with argument './The Simpsons Movie - Trailer_x264.mp4'. Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'fps=1/10'. Reading option './out_img/ffmpeg-msa_x264_%d.jpg' ... matched as output url. Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'. Reading option '-benchmark' ... matched as option 'benchmark' (add timings for benchmarking) with argument '1'. Finished splitting the commandline. Parsing a group of options: global . Applying option report (generate a report) with argument 1. Applying option benchmark (add timings for benchmarking) with argument 1. Successfully parsed a group of options. Parsing a group of options: input url ./The Simpsons Movie - Trailer_x264.mp4. Successfully parsed a group of options. Opening an input file: ./The Simpsons Movie - Trailer_x264.mp4. 
[file @ 0x1fce0e0] Setting default whitelist 'file,crypto' [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] ISO: File Type Major Brand: isom [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Unknown dref type 0x206c7275 size 12 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Unknown dref type 0x206c7275 size 12 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Before avformat_find_stream_info() pos: 73516232 bytes read:65587 seeks:1 nb_streams:2 [h264 @ 0x1fcecb0] nal_unit_type: 7, nal_ref_idc: 3 [h264 @ 0x1fcecb0] nal_unit_type: 8, nal_ref_idc: 3 [h264 @ 0x1fcecb0] nal_unit_type: 6, nal_ref_idc: 0 [h264 @ 0x1fcecb0] nal_unit_type: 5, nal_ref_idc: 3 [h264 @ 0x1fcecb0] user data:"x264 - core 54 svn-620M - H.264/MPEG-4 AVC codec - Copyleft 2005 - http://www.videolan.org/x264.html - options: cabac=1 ref=5 deblock=1:0:0 analyse=0x1:0x131 me=umh subme=6 brdo=1 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=1 nr=0 decimate=1 mbaff=0 bframes=1 b_pyramid=0 b_adapt=1 b_bias=0 direct=3 wpredb=0 bime=0 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=4214 ratetol=1.0 rceq='blurCplx^(1-qComp)' qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30" [h264 @ 0x1fcecb0] Reinit context to 1280x544, pix_fmt: yuv420p [h264 @ 0x1fcecb0] no picture [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] All info found [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] After avformat_find_stream_info() pos: 94845 bytes read:141348 seeks:2 frames:13 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './The Simpsons Movie - Trailer_x264.mp4': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1 creation_time : 2007-02-19T05:03:04.000000Z Duration: 00:02:17.30, start: 0.000000, bitrate: 4283 kb/s Stream #0:0(und), 12, 1/24000: Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x544, 4221 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default) Metadata: 
creation_time : 2007-02-19T05:03:04.000000Z handler_name : GPAC ISO Video Handler Stream #0:1(und), 1, 1/48000: Audio: aac (HE-AAC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 64 kb/s (default) Metadata: creation_time : 2007-02-19T05:03:08.000000Z handler_name : GPAC ISO Audio Handler Successfully opened the file. Parsing a group of options: output url ./out_img/ffmpeg-msa_x264_%d.jpg. Applying option vf (set video filters) with argument fps=1/10. Successfully parsed a group of options. Opening an output file: ./out_img/ffmpeg-msa_x264_%d.jpg. Successfully opened the file. detected 2 logical cores [h264 @ 0x20191e0] nal_unit_type: 7, nal_ref_idc: 3 [h264 @ 0x20191e0] nal_unit_type: 8, nal_ref_idc: 3 Stream mapping: Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native)) Press [q] to stop, [?] for help cur_dts is invalid (this is harmless if it occurs once at the start per stream) cur_dts is invalid (this is harmless if it occurs once at the start per stream) [h264 @ 0x20191e0] nal_unit_type: 6, nal_ref_idc: 0 [h264 @ 0x20191e0] nal_unit_type: 5, nal_ref_idc: 3 [h264 @ 0x20191e0] user data:"x264 - core 54 svn-620M - H.264/MPEG-4 AVC codec - Copyleft 2005 - http://www.videolan.org/x264.html - options: cabac=1 ref=5 deblock=1:0:0 analyse=0x1:0x131 me=umh subme=6 brdo=1 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=1 nr=0 decimate=1 mbaff=0 bframes=1 b_pyramid=0 b_adapt=1 b_bias=0 direct=3 wpredb=0 bime=0 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=4214 ratetol=1.0 rceq='blurCplx^(1-qComp)' qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30" [h264 @ 0x20191e0] Reinit context to 1280x544, pix_fmt: yuv420p [h264 @ 0x20191e0] no picture cur_dts is invalid (this is harmless if it occurs once at the start per stream) [h264 @ 0x2026050] nal_unit_type: 1, nal_ref_idc: 2 [h264 @ 0x2060f00] nal_unit_type: 1, nal_ref_idc: 0 cur_dts is invalid (this is harmless if it occurs 
once at the start per stream) [h264 @ 0x20191e0] nal_unit_type: 1, nal_ref_idc: 2 [Parsed_fps_0 @ 0x202bf50] Setting 'fps' to value '1/10' [Parsed_fps_0 @ 0x202bf50] fps=1/10 [graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'video_size' to value '1280x544' [graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'pix_fmt' to value '0' [graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'time_base' to value '1/24000' [graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'pixel_aspect' to value '0/1' [graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'sws_param' to value 'flags=2' [graph 0 input from stream 0:0 @ 0x202c6f0] Setting 'frame_rate' to value '24000/1001' [graph 0 input from stream 0:0 @ 0x202c6f0] w:1280 h:544 pixfmt:yuv420p tb:1/24000 fr:24000/1001 sar:0/1 sws_param:flags=2 [format @ 0x202c030] compat: called with args=[yuvj420p|yuvj422p|yuvj444p] [format @ 0x202c030] Setting 'pix_fmts' to value 'yuvj420p|yuvj422p|yuvj444p' [auto_scaler_0 @ 0x202c660] Setting 'flags' to value 'bicubic' [auto_scaler_0 @ 0x202c660] w:iw h:ih flags:'bicubic' interl:0 [format @ 0x202c030] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_fps_0' and the filter 'format' [AVFilterGraph @ 0x202bb80] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed [auto_scaler_0 @ 0x202c660] picking yuvj420p out of 3 ref:yuv420p alpha:0 [swscaler @ 0x202cf00] deprecated pixel format used, make sure you did set range correctly [auto_scaler_0 @ 0x202c660] w:1280 h:544 fmt:yuv420p sar:0/1 -> w:1280 h:544 fmt:yuvj420p sar:0/1 flags:0x4 [mjpeg @ 0x1ff5f90] Forcing thread count to 1 for MJPEG encoding, use -thread_type slice or a constant quantizer if you want to use multiple cpu cores [mjpeg @ 0x1ff5f90] intra_quant_bias = 96 inter_quant_bias = 0 Output #0, image2, to './out_img/ffmpeg-msa_x264_%d.jpg': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1 encoder : Lavf57.71.100 Stream #0:0(und), 0, 10/1: Video: mjpeg, yuvj420p(pc), 1280x544, q=2-31, 200 
kb/s, 0.10 fps, 0.10 tbn, 0.10 tbc (default) Metadata: creation_time : 2007-02-19T05:03:04.000000Z handler_name : GPAC ISO Video Handler encoder : Lavc57.89.100 mjpeg Side data: cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1 cur_dts is invalid (this is harmless if it occurs once at the start per stream) [Parsed_fps_0 @ 0x202bf50] Dropping 1 frame(s). cur_dts is invalid (this is harmless if it occurs once at the start per stream) ... [h264 @ 0x2060f00] nal_unit_type: 1, nal_ref_idc: 2 [Parsed_fps_0 @ 0x202bf50] Dropping 1 frame(s). [Parsed_fps_0 @ 0x202bf50] Dropping 1 frame(s). [Parsed_fps_0 @ 0x202bf50] Dropping 1 frame(s). [file @ 0x209dc40] Setting default whitelist 'file,crypto' [AVIOContext @ 0x2189ab0] Statistics: 0 seeks, 1 writeouts frame= 15 fps=0.2 q=1.6 size=N/A time=00:02:30.00 bitrate=N/A speed=2.03x No more output streams to write to, finishing. frame= 15 fps=0.2 q=1.6 Lsize=N/A time=00:02:30.00 bitrate=N/A speed=2.03x video:1382kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Input file #0 (./The Simpsons Movie - Trailer_x264.mp4): Input stream #0:0 (video): 3288 packets read (72364468 bytes); 3288 frames decoded; Input stream #0:1 (audio): 1 packets read (134 bytes); Total: 3289 packets (72364602 bytes) demuxed Output file #0 (./out_img/ffmpeg-msa_x264_%d.jpg): Output stream #0:0 (video): 15 frames encoded; 15 packets muxed (1414925 bytes); Total: 15 packets (1414925 bytes) muxed bench: utime=113.070s 3288 frames successfully decoded, 0 decoding errors bench: maxrss=39264kB [Parsed_fps_0 @ 0x202bf50] 3288 frames in, 15 frames out; 3273 frames dropped, 0 frames duplicated. [AVIOContext @ 0x1fd6230] Statistics: 73517562 bytes read, 5 seeks 

x265 with MIPS SIMD (ffmpeg-msa build)
 ffmpeg started on 2010-10-18 at 00:27:58 Report written to "ffmpeg-20101018-002758.log" Command line: ./ffmpeg-mips/ffmpeg-msa/bin/ffmpeg -i ./Tears_400_x265.mp4 -vf "fps=1" "./out_img/ffmpeg-msa_x265_%d.jpg" -report -benchmark ffmpeg version 3.3 Copyright (c) 2000-2017 the FFmpeg developers built with gcc 4.9.2 (Codescape GNU Tools 2016.05-03 for MIPS MTI Linux) configuration: --enable-cross-compile --prefix=../ffmpeg-msa --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags='-EL -static' --extra-ldflags='-EL -static' --disable-iconv libavutil 55. 58.100 / 55. 58.100 libavcodec 57. 89.100 / 57. 89.100 libavformat 57. 71.100 / 57. 71.100 libavdevice 57. 6.100 / 57. 6.100 libavfilter 6. 82.100 / 6. 82.100 libswscale 4. 6.100 / 4. 6.100 libswresample 2. 7.100 / 2. 7.100 Splitting the commandline. Reading option '-i' ... matched as input url with argument './Tears_400_x265.mp4'. Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'fps=1'. Reading option './out_img/ffmpeg-msa_x265_%d.jpg' ... matched as output url. Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'. Reading option '-benchmark' ... matched as option 'benchmark' (add timings for benchmarking) with argument '1'. Finished splitting the commandline. Parsing a group of options: global . Applying option report (generate a report) with argument 1. Applying option benchmark (add timings for benchmarking) with argument 1. Successfully parsed a group of options. Parsing a group of options: input url ./Tears_400_x265.mp4. Successfully parsed a group of options. Opening an input file: ./Tears_400_x265.mp4. 
[file @ 0x1fce0e0] Setting default whitelist 'file,crypto' [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] ISO: File Type Major Brand: iso4 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Unknown dref type 0x206c7275 size 12 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] Before avformat_find_stream_info() pos: 705972 bytes read:32827 seeks:1 nb_streams:1 [hevc @ 0x1fceca0] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fceca0] Decoding VPS [hevc @ 0x1fceca0] Main profile bitstream [hevc @ 0x1fceca0] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fceca0] Decoding SPS [hevc @ 0x1fceca0] Main profile bitstream [hevc @ 0x1fceca0] Decoding VUI [hevc @ 0x1fceca0] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fceca0] Decoding PPS [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] All info found [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1fcd980] After avformat_find_stream_info() pos: 20299 bytes read:65595 seeks:2 frames:1 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './Tears_400_x265.mp4': Metadata: major_brand : iso4 minor_version : 1 compatible_brands: iso4hvc1 creation_time : 2014-08-25T18:10:46.000000Z Duration: 00:00:13.96, start: 0.125000, bitrate: 404 kb/s Stream #0:0(und), 1, 1/24000: Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv), 1920x800, 402 kb/s, 24 fps, 24 tbr, 24k tbn, 24 tbc (default) Metadata: creation_time : 2014-08-25T18:10:46.000000Z handler_name : hevc:fps=24@GPAC0.5.1-DEV-rev4807 Successfully opened the file. Parsing a group of options: output url ./out_img/ffmpeg-msa_x265_%d.jpg. Applying option vf (set video filters) with argument fps=1. Successfully parsed a group of options. Opening an output file: ./out_img/ffmpeg-msa_x265_%d.jpg. Successfully opened the file. 
detected 2 logical cores [hevc @ 0x1fe5a00] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] Decoding VPS [hevc @ 0x1fe5a00] Main profile bitstream [hevc @ 0x1fe5a00] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] Decoding SPS [hevc @ 0x1fe5a00] Main profile bitstream [hevc @ 0x1fe5a00] Decoding VUI [hevc @ 0x1fe5a00] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] Decoding PPS Stream mapping: Stream #0:0 -> #0:0 (hevc (native) -> mjpeg (native)) Press [q] to stop, [?] for help cur_dts is invalid (this is harmless if it occurs once at the start per stream) cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1fe5a00] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] nal_unit_type: 19(IDR_W_RADL), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] Decoding SEI [hevc @ 0x1fe5a00] Skipped PREFIX SEI 5 [hevc @ 0x1fe5a00] Decoding SEI [hevc @ 0x1fe5a00] Skipped PREFIX SEI 6 cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1ffeba0] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x200c4d0] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x200c4d0] Output frame with POC 0. [hevc @ 0x1fe5a00] Decoded frame with POC 0. cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1fe5a00] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1fe5a00] Output frame with POC 1. [hevc @ 0x1ffeba0] Decoded frame with POC 5. cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1ffeba0] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1ffeba0] Output frame with POC 2. [hevc @ 0x200c4d0] Decoded frame with POC 3. 
[Parsed_fps_0 @ 0x201a6c0] Setting 'fps' to value '1' [Parsed_fps_0 @ 0x201a6c0] fps=1/1 [graph 0 input from stream 0:0 @ 0x201abb0] Setting 'video_size' to value '1920x800' [graph 0 input from stream 0:0 @ 0x201abb0] Setting 'pix_fmt' to value '0' [graph 0 input from stream 0:0 @ 0x201abb0] Setting 'time_base' to value '1/24000' [graph 0 input from stream 0:0 @ 0x201abb0] Setting 'pixel_aspect' to value '0/1' [graph 0 input from stream 0:0 @ 0x201abb0] Setting 'sws_param' to value 'flags=2' [graph 0 input from stream 0:0 @ 0x201abb0] Setting 'frame_rate' to value '24/1' [graph 0 input from stream 0:0 @ 0x201abb0] w:1920 h:800 pixfmt:yuv420p tb:1/24000 fr:24/1 sar:0/1 sws_param:flags=2 [format @ 0x201aad0] compat: called with args=[yuvj420p|yuvj422p|yuvj444p] [format @ 0x201aad0] Setting 'pix_fmts' to value 'yuvj420p|yuvj422p|yuvj444p' [auto_scaler_0 @ 0x201a350] Setting 'flags' to value 'bicubic' [auto_scaler_0 @ 0x201a350] w:iw h:ih flags:'bicubic' interl:0 [format @ 0x201aad0] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_fps_0' and the filter 'format' [AVFilterGraph @ 0x201a2f0] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed [auto_scaler_0 @ 0x201a350] picking yuvj420p out of 3 ref:yuv420p alpha:0 [swscaler @ 0x21d7da0] deprecated pixel format used, make sure you did set range correctly [auto_scaler_0 @ 0x201a350] w:1920 h:800 fmt:yuv420p sar:0/1 -> w:1920 h:800 fmt:yuvj420p sar:0/1 flags:0x4 [mjpeg @ 0x1fe2ba0] Forcing thread count to 1 for MJPEG encoding, use -thread_type slice or a constant quantizer if you want to use multiple cpu cores [mjpeg @ 0x1fe2ba0] intra_quant_bias = 96 inter_quant_bias = 0 Output #0, image2, to './out_img/ffmpeg-msa_x265_%d.jpg': Metadata: major_brand : iso4 minor_version : 1 compatible_brands: iso4hvc1 encoder : Lavf57.71.100 Stream #0:0(und), 0, 1/1: Video: mjpeg, yuvj420p(pc), 1920x800, q=2-31, 200 kb/s, 1 fps, 1 tbn, 1 tbc (default) Metadata: creation_time : 2014-08-25T18:10:46.000000Z 
handler_name : hevc:fps=24@GPAC0.5.1-DEV-rev4807 encoder : Lavc57.89.100 mjpeg Side data: cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1 cur_dts is invalid (this is harmless if it occurs once at the start per stream) ... No more output streams to write to, finishing. frame= 15 fps=0.7 q=24.8 Lsize=N/A time=00:00:15.00 bitrate=N/A speed=0.668x video:1084kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Input file #0 (./Tears_400_x265.mp4): Input stream #0:0 (video): 335 packets read (701773 bytes); 335 frames decoded; Total: 335 packets (701773 bytes) demuxed Output file #0 (./out_img/ffmpeg-msa_x265_%d.jpg): Output stream #0:0 (video): 15 frames encoded; 15 packets muxed (1109604 bytes); Total: 15 packets (1109604 bytes) muxed bench: utime=22.300s 335 frames successfully decoded, 0 decoding errors bench: maxrss=72432kB [Parsed_fps_0 @ 0x201a6c0] 335 frames in, 15 frames out; 320 frames dropped, 0 frames duplicated. [AVIOContext @ 0x1fd6220] Statistics: 734659 bytes read, 2 seeks 

x264 without MIPS SIMD (ffmpeg-soft build, --disable-msa)
 ffmpeg started on 2010-10-18 at 00:28:31 Report written to "ffmpeg-20101018-002831.log" Command line: ./ffmpeg-mips/ffmpeg-soft/bin/ffmpeg -i "./The Simpsons Movie - Trailer_x264.mp4" -vf "fps=1/10" "./out_img/ffmpeg-soft_x264_%d.jpg" -report -benchmark ffmpeg version 3.3 Copyright (c) 2000-2017 the FFmpeg developers built with gcc 4.9.2 (Codescape GNU Tools 2016.05-03 for MIPS MTI Linux) configuration: --enable-cross-compile --prefix=../ffmpeg-soft --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags='-EL -static' --extra-ldflags='-EL -static' --disable-iconv --disable-msa libavutil 55. 58.100 / 55. 58.100 libavcodec 57. 89.100 / 57. 89.100 libavformat 57. 71.100 / 57. 71.100 libavdevice 57. 6.100 / 57. 6.100 libavfilter 6. 82.100 / 6. 82.100 libswscale 4. 6.100 / 4. 6.100 libswresample 2. 7.100 / 2. 7.100 Splitting the commandline. Reading option '-i' ... matched as input url with argument './The Simpsons Movie - Trailer_x264.mp4'. Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'fps=1/10'. Reading option './out_img/ffmpeg-soft_x264_%d.jpg' ... matched as output url. Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'. Reading option '-benchmark' ... matched as option 'benchmark' (add timings for benchmarking) with argument '1'. Finished splitting the commandline. Parsing a group of options: global . Applying option report (generate a report) with argument 1. Applying option benchmark (add timings for benchmarking) with argument 1. Successfully parsed a group of options. Parsing a group of options: input url ./The Simpsons Movie - Trailer_x264.mp4. Successfully parsed a group of options. Opening an input file: ./The Simpsons Movie - Trailer_x264.mp4. 
[file @ 0x1f4a0e0] Setting default whitelist 'file,crypto' [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] ISO: File Type Major Brand: isom [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Unknown dref type 0x206c7275 size 12 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Unknown dref type 0x206c7275 size 12 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Before avformat_find_stream_info() pos: 73516232 bytes read:65587 seeks:1 nb_streams:2 [h264 @ 0x1f4acb0] nal_unit_type: 7, nal_ref_idc: 3 [h264 @ 0x1f4acb0] nal_unit_type: 8, nal_ref_idc: 3 [h264 @ 0x1f4acb0] nal_unit_type: 6, nal_ref_idc: 0 [h264 @ 0x1f4acb0] nal_unit_type: 5, nal_ref_idc: 3 [h264 @ 0x1f4acb0] user data:"x264 - core 54 svn-620M - H.264/MPEG-4 AVC codec - Copyleft 2005 - http://www.videolan.org/x264.html - options: cabac=1 ref=5 deblock=1:0:0 analyse=0x1:0x131 me=umh subme=6 brdo=1 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=1 nr=0 decimate=1 mbaff=0 bframes=1 b_pyramid=0 b_adapt=1 b_bias=0 direct=3 wpredb=0 bime=0 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=4214 ratetol=1.0 rceq='blurCplx^(1-qComp)' qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30" [h264 @ 0x1f4acb0] Reinit context to 1280x544, pix_fmt: yuv420p [h264 @ 0x1f4acb0] no picture [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] All info found [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] After avformat_find_stream_info() pos: 94845 bytes read:141348 seeks:2 frames:13 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './The Simpsons Movie - Trailer_x264.mp4': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1 creation_time : 2007-02-19T05:03:04.000000Z Duration: 00:02:17.30, start: 0.000000, bitrate: 4283 kb/s Stream #0:0(und), 12, 1/24000: Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x544, 4221 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default) Metadata: 
creation_time : 2007-02-19T05:03:04.000000Z handler_name : GPAC ISO Video Handler Stream #0:1(und), 1, 1/48000: Audio: aac (HE-AAC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 64 kb/s (default) Metadata: creation_time : 2007-02-19T05:03:08.000000Z handler_name : GPAC ISO Audio Handler Successfully opened the file. Parsing a group of options: output url ./out_img/ffmpeg-soft_x264_%d.jpg. Applying option vf (set video filters) with argument fps=1/10. Successfully parsed a group of options. Opening an output file: ./out_img/ffmpeg-soft_x264_%d.jpg. Successfully opened the file. detected 2 logical cores [h264 @ 0x1f951e0] nal_unit_type: 7, nal_ref_idc: 3 [h264 @ 0x1f951e0] nal_unit_type: 8, nal_ref_idc: 3 Stream mapping: Stream #0:0 -> #0:0 (h264 (native) -> mjpeg (native)) Press [q] to stop, [?] for help cur_dts is invalid (this is harmless if it occurs once at the start per stream) cur_dts is invalid (this is harmless if it occurs once at the start per stream) [h264 @ 0x1f951e0] nal_unit_type: 6, nal_ref_idc: 0 [h264 @ 0x1f951e0] nal_unit_type: 5, nal_ref_idc: 3 [h264 @ 0x1f951e0] user data:"x264 - core 54 svn-620M - H.264/MPEG-4 AVC codec - Copyleft 2005 - http://www.videolan.org/x264.html - options: cabac=1 ref=5 deblock=1:0:0 analyse=0x1:0x131 me=umh subme=6 brdo=1 mixed_ref=0 me_range=16 chroma_me=1 trellis=1 8x8dct=0 cqm=0 deadzone=21,11 chroma_qp_offset=0 threads=1 nr=0 decimate=1 mbaff=0 bframes=1 b_pyramid=0 b_adapt=1 b_bias=0 direct=3 wpredb=0 bime=0 keyint=250 keyint_min=25 scenecut=40 rc=2pass bitrate=4214 ratetol=1.0 rceq='blurCplx^(1-qComp)' qcomp=0.60 qpmin=10 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 pb_ratio=1.30" [h264 @ 0x1f951e0] Reinit context to 1280x544, pix_fmt: yuv420p [h264 @ 0x1f951e0] no picture cur_dts is invalid (this is harmless if it occurs once at the start per stream) [h264 @ 0x1fa2050] nal_unit_type: 1, nal_ref_idc: 2 [h264 @ 0x1fdcf00] nal_unit_type: 1, nal_ref_idc: 0 cur_dts is invalid (this is harmless if it occurs 
once at the start per stream) [h264 @ 0x1f951e0] nal_unit_type: 1, nal_ref_idc: 2 [Parsed_fps_0 @ 0x1fa7f50] Setting 'fps' to value '1/10' [Parsed_fps_0 @ 0x1fa7f50] fps=1/10 [graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'video_size' to value '1280x544' [graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'pix_fmt' to value '0' [graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'time_base' to value '1/24000' [graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'pixel_aspect' to value '0/1' [graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'sws_param' to value 'flags=2' [graph 0 input from stream 0:0 @ 0x1fa86f0] Setting 'frame_rate' to value '24000/1001' [graph 0 input from stream 0:0 @ 0x1fa86f0] w:1280 h:544 pixfmt:yuv420p tb:1/24000 fr:24000/1001 sar:0/1 sws_param:flags=2 [format @ 0x1fa8030] compat: called with args=[yuvj420p|yuvj422p|yuvj444p] [format @ 0x1fa8030] Setting 'pix_fmts' to value 'yuvj420p|yuvj422p|yuvj444p' [auto_scaler_0 @ 0x1fa8660] Setting 'flags' to value 'bicubic' [auto_scaler_0 @ 0x1fa8660] w:iw h:ih flags:'bicubic' interl:0 [format @ 0x1fa8030] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_fps_0' and the filter 'format' [AVFilterGraph @ 0x1fa7b80] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed [auto_scaler_0 @ 0x1fa8660] picking yuvj420p out of 3 ref:yuv420p alpha:0 [swscaler @ 0x1fa8f00] deprecated pixel format used, make sure you did set range correctly [auto_scaler_0 @ 0x1fa8660] w:1280 h:544 fmt:yuv420p sar:0/1 -> w:1280 h:544 fmt:yuvj420p sar:0/1 flags:0x4 [mjpeg @ 0x1f71f90] Forcing thread count to 1 for MJPEG encoding, use -thread_type slice or a constant quantizer if you want to use multiple cpu cores [mjpeg @ 0x1f71f90] intra_quant_bias = 96 inter_quant_bias = 0 Output #0, image2, to './out_img/ffmpeg-soft_x264_%d.jpg': Metadata: major_brand : isom minor_version : 1 compatible_brands: isomavc1 encoder : Lavf57.71.100 Stream #0:0(und), 0, 10/1: Video: mjpeg, yuvj420p(pc), 1280x544, q=2-31, 
200 kb/s, 0.10 fps, 0.10 tbn, 0.10 tbc (default) Metadata: creation_time : 2007-02-19T05:03:04.000000Z handler_name : GPAC ISO Video Handler encoder : Lavc57.89.100 mjpeg Side data: cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1 cur_dts is invalid (this is harmless if it occurs once at the start per stream) [Parsed_fps_0 @ 0x1fa7f50] Dropping 1 frame(s). [h264 @ 0x1fa2050] nal_unit_type: 1, nal_ref_idc: 0 ... [AVIOContext @ 0x2229af0] Statistics: 0 seeks, 1 writeouts No more output streams to write to, finishing. frame= 15 fps=0.1 q=1.6 Lsize=N/A time=00:02:30.00 bitrate=N/A speed=1.45x video:1382kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Input file #0 (./The Simpsons Movie - Trailer_x264.mp4): Input stream #0:0 (video): 3288 packets read (72364468 bytes); 3288 frames decoded; Input stream #0:1 (audio): 1 packets read (134 bytes); Total: 3289 packets (72364602 bytes) demuxed Output file #0 (./out_img/ffmpeg-soft_x264_%d.jpg): Output stream #0:0 (video): 15 frames encoded; 15 packets muxed (1414925 bytes); Total: 15 packets (1414925 bytes) muxed bench: utime=164.240s 3288 frames successfully decoded, 0 decoding errors bench: maxrss=39936kB [Parsed_fps_0 @ 0x1fa7f50] 3288 frames in, 15 frames out; 3273 frames dropped, 0 frames duplicated. [AVIOContext @ 0x1f52230] Statistics: 73517562 bytes read, 5 seeks 

x265 without MIPS SIMD (ffmpeg-soft build, --disable-msa)
 ffmpeg started on 2010-10-18 at 00:27:14 Report written to "ffmpeg-20101018-002714.log" Command line: ./ffmpeg-mips/ffmpeg-soft/bin/ffmpeg -i ./Tears_400_x265.mp4 -vf "fps=1" "./out_img/ffmpeg-soft_x265_%d.jpg" -report -benchmark ffmpeg version 3.3 Copyright (c) 2000-2017 the FFmpeg developers built with gcc 4.9.2 (Codescape GNU Tools 2016.05-03 for MIPS MTI Linux) configuration: --enable-cross-compile --prefix=../ffmpeg-soft --cross-prefix=/home/stas/mipsfpga/toolchain/mips-mti-linux-gnu/2016.05-03/bin/mips-mti-linux-gnu- --arch=mips --cpu=p5600 --target-os=linux --extra-cflags='-EL -static' --extra-ldflags='-EL -static' --disable-iconv --disable-msa libavutil 55. 58.100 / 55. 58.100 libavcodec 57. 89.100 / 57. 89.100 libavformat 57. 71.100 / 57. 71.100 libavdevice 57. 6.100 / 57. 6.100 libavfilter 6. 82.100 / 6. 82.100 libswscale 4. 6.100 / 4. 6.100 libswresample 2. 7.100 / 2. 7.100 Splitting the commandline. Reading option '-i' ... matched as input url with argument './Tears_400_x265.mp4'. Reading option '-vf' ... matched as option 'vf' (set video filters) with argument 'fps=1'. Reading option './out_img/ffmpeg-soft_x265_%d.jpg' ... matched as output url. Reading option '-report' ... matched as option 'report' (generate a report) with argument '1'. Reading option '-benchmark' ... matched as option 'benchmark' (add timings for benchmarking) with argument '1'. Finished splitting the commandline. Parsing a group of options: global . Applying option report (generate a report) with argument 1. Applying option benchmark (add timings for benchmarking) with argument 1. Successfully parsed a group of options. Parsing a group of options: input url ./Tears_400_x265.mp4. Successfully parsed a group of options. Opening an input file: ./Tears_400_x265.mp4. 
[file @ 0x1f4a0e0] Setting default whitelist 'file,crypto' [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Format mov,mp4,m4a,3gp,3g2,mj2 probed with size=2048 and score=100 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] ISO: File Type Major Brand: iso4 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Unknown dref type 0x206c7275 size 12 [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] Before avformat_find_stream_info() pos: 705972 bytes read:32827 seeks:1 nb_streams:1 [hevc @ 0x1f4aca0] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f4aca0] Decoding VPS [hevc @ 0x1f4aca0] Main profile bitstream [hevc @ 0x1f4aca0] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f4aca0] Decoding SPS [hevc @ 0x1f4aca0] Main profile bitstream [hevc @ 0x1f4aca0] Decoding VUI [hevc @ 0x1f4aca0] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f4aca0] Decoding PPS [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] All info found [mov,mp4,m4a,3gp,3g2,mj2 @ 0x1f49980] After avformat_find_stream_info() pos: 20299 bytes read:65595 seeks:2 frames:1 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from './Tears_400_x265.mp4': Metadata: major_brand : iso4 minor_version : 1 compatible_brands: iso4hvc1 creation_time : 2014-08-25T18:10:46.000000Z Duration: 00:00:13.96, start: 0.125000, bitrate: 404 kb/s Stream #0:0(und), 1, 1/24000: Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv), 1920x800, 402 kb/s, 24 fps, 24 tbr, 24k tbn, 24 tbc (default) Metadata: creation_time : 2014-08-25T18:10:46.000000Z handler_name : hevc:fps=24@GPAC0.5.1-DEV-rev4807 Successfully opened the file. Parsing a group of options: output url ./out_img/ffmpeg-soft_x265_%d.jpg. Applying option vf (set video filters) with argument fps=1. Successfully parsed a group of options. Opening an output file: ./out_img/ffmpeg-soft_x265_%d.jpg. Successfully opened the file. 
detected 2 logical cores [hevc @ 0x1f61a00] nal_unit_type: 32(VPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] Decoding VPS [hevc @ 0x1f61a00] Main profile bitstream [hevc @ 0x1f61a00] nal_unit_type: 33(SPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] Decoding SPS [hevc @ 0x1f61a00] Main profile bitstream [hevc @ 0x1f61a00] Decoding VUI [hevc @ 0x1f61a00] nal_unit_type: 34(PPS), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] Decoding PPS Stream mapping: Stream #0:0 -> #0:0 (hevc (native) -> mjpeg (native)) Press [q] to stop, [?] for help cur_dts is invalid (this is harmless if it occurs once at the start per stream) cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1f61a00] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] nal_unit_type: 39(SEI_PREFIX), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] nal_unit_type: 19(IDR_W_RADL), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] Decoding SEI [hevc @ 0x1f61a00] Skipped PREFIX SEI 5 [hevc @ 0x1f61a00] Decoding SEI [hevc @ 0x1f61a00] Skipped PREFIX SEI 6 cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1f7aba0] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f884d0] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f884d0] Output frame with POC 0. [hevc @ 0x1f61a00] Decoded frame with POC 0. cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1f61a00] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f61a00] Output frame with POC 1. [hevc @ 0x1f7aba0] Decoded frame with POC 5. cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1f7aba0] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f7aba0] Output frame with POC 2. [hevc @ 0x1f884d0] Decoded frame with POC 3. 
[Parsed_fps_0 @ 0x1f966c0] Setting 'fps' to value '1' [Parsed_fps_0 @ 0x1f966c0] fps=1/1 [graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'video_size' to value '1920x800' [graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'pix_fmt' to value '0' [graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'time_base' to value '1/24000' [graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'pixel_aspect' to value '0/1' [graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'sws_param' to value 'flags=2' [graph 0 input from stream 0:0 @ 0x1f96bb0] Setting 'frame_rate' to value '24/1' [graph 0 input from stream 0:0 @ 0x1f96bb0] w:1920 h:800 pixfmt:yuv420p tb:1/24000 fr:24/1 sar:0/1 sws_param:flags=2 [format @ 0x1f96ad0] compat: called with args=[yuvj420p|yuvj422p|yuvj444p] [format @ 0x1f96ad0] Setting 'pix_fmts' to value 'yuvj420p|yuvj422p|yuvj444p' [hevc @ 0x1f61a00] Decoded frame with POC 1. [hevc @ 0x1f7aba0] Decoded frame with POC 2. [auto_scaler_0 @ 0x1f96350] Setting 'flags' to value 'bicubic' [auto_scaler_0 @ 0x1f96350] w:iw h:ih flags:'bicubic' interl:0 [format @ 0x1f96ad0] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_fps_0' and the filter 'format' [AVFilterGraph @ 0x1f962f0] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed [auto_scaler_0 @ 0x1f96350] picking yuvj420p out of 3 ref:yuv420p alpha:0 [swscaler @ 0x2153da0] deprecated pixel format used, make sure you did set range correctly [auto_scaler_0 @ 0x1f96350] w:1920 h:800 fmt:yuv420p sar:0/1 -> w:1920 h:800 fmt:yuvj420p sar:0/1 flags:0x4 [mjpeg @ 0x1f5eba0] Forcing thread count to 1 for MJPEG encoding, use -thread_type slice or a constant quantizer if you want to use multiple cpu cores [mjpeg @ 0x1f5eba0] intra_quant_bias = 96 inter_quant_bias = 0 Output #0, image2, to './out_img/ffmpeg-soft_x265_%d.jpg': Metadata: major_brand : iso4 minor_version : 1 compatible_brands: iso4hvc1 encoder : Lavf57.71.100 Stream #0:0(und), 0, 1/1: Video: mjpeg, yuvj420p(pc), 1920x800, q=2-31, 200 kb/s, 1 
fps, 1 tbn, 1 tbc (default) Metadata: creation_time : 2014-08-25T18:10:46.000000Z handler_name : hevc:fps=24@GPAC0.5.1-DEV-rev4807 encoder : Lavc57.89.100 mjpeg Side data: cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: -1 cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1f884d0] nal_unit_type: 0(TRAIL_N), nuh_layer_id: 0, temporal_id: 0 [hevc @ 0x1f884d0] Output frame with POC 3. [Parsed_fps_0 @ 0x1f966c0] Dropping 1 frame(s). frame= 0 fps=0.0 q=0.0 size=N/A time=00:00:00.00 bitrate=N/A speed= 0x cur_dts is invalid (this is harmless if it occurs once at the start per stream) [Parsed_fps_0 @ 0x1f966c0] Dropping 1 frame(s). [hevc @ 0x1f61a00] nal_unit_type: 1(TRAIL_R), nuh_layer_id: 0, temporal_id: 0 cur_dts is invalid (this is harmless if it occurs once at the start per stream) [hevc @ 0x1f61a00] Output frame with POC 4. ... No more output streams to write to, finishing. frame= 15 fps=0.5 q=24.8 Lsize=N/A time=00:00:15.00 bitrate=N/A speed=0.451x video:1084kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown Input file #0 (./Tears_400_x265.mp4): Input stream #0:0 (video): 335 packets read (701773 bytes); 335 frames decoded; Total: 335 packets (701773 bytes) demuxed Output file #0 (./out_img/ffmpeg-soft_x265_%d.jpg): Output stream #0:0 (video): 15 frames encoded; 15 packets muxed (1109604 bytes); Total: 15 packets (1109604 bytes) muxed bench: utime=52.330s 335 frames successfully decoded, 0 decoding errors bench: maxrss=72480kB [Parsed_fps_0 @ 0x1f966c0] 335 frames in, 15 frames out; 320 frames dropped, 0 frames duplicated. [AVIOContext @ 0x1f52220] Statistics: 734659 bytes read, 2 seeks 
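The `bench: utime=` lines of the four reports can be compared directly. Below is a minimal sketch, not part of the original measurements: the numbers are copied from the logs above, while the names `UTIME`, `parse_utime` and `speedup` are hypothetical helpers introduced only for illustration. Note that `utime` is the user CPU time reported by `-benchmark` for a single run of each build, so the ratios are only a rough indication and differ from the `speed=` figures in the same logs.

```python
import re

# User CPU times in seconds, copied from the "bench: utime=" lines of the
# four reports above ("msa" = ffmpeg-msa build, "soft" = --disable-msa build).
UTIME = {
    ("x264", "msa"): 113.070,
    ("x264", "soft"): 164.240,
    ("x265", "msa"): 22.300,
    ("x265", "soft"): 52.330,
}


def parse_utime(report_text):
    """Return the utime value (seconds) found in a report, or None if absent."""
    match = re.search(r"bench: utime=([0-9.]+)s", report_text)
    return float(match.group(1)) if match else None


def speedup(codec):
    """Ratio of user CPU time without MSA to user CPU time with MSA."""
    return UTIME[(codec, "soft")] / UTIME[(codec, "msa")]


if __name__ == "__main__":
    for codec in ("x264", "x265"):
        print(f"{codec}: {speedup(codec):.2f}x less user CPU time with MSA")
```

With these inputs the ratio is roughly 1.45 for the x264 clip and roughly 2.35 for the x265 clip. `parse_utime` can be applied to the text of a saved `ffmpeg-*.log` report instead of copying the values by hand.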

Conclusions



Links


[L1] - Baikal-T1 processor;
[L2] - MIPSfpga-plus on GitHub;
[L3] - P-Class P5600 Multiprocessor Core;
[L4] - MIPS, ;
[L5] - Texas Instruments. Digital Signal Processors;
[L6] - GPU ;
[L7] - OpenCL. ;
[L8] - Wikipedia: ;
[L9] - TFilter. Free online FIR filter design tool;
[L10] - Wikipedia: ;
[L11] - Wikipedia: - ;
[L12] - Wikipedia: - ;
[L13] - Wikipedia: SIMD;
[L14] - Wikipedia: ;
[L15] - MIPS SIMD;
[L16] - GCC: MIPS SIMD Architecture (MSA) Support;
[L17] - GCC: MIPS SIMD Architecture Built-in Functions;
[L18] - ffmpeg on GitHub (libavcodec/mips/);
[L19] - FFmpeg multimedia framework;
[L20] - Codescape MIPS SDK;
[L21] - H.264 Demo Clips;
[L22] - x265. Sample HEVC Video Files;
[L23] - ffmpeg;
[L24] - ;


Documentation


[D1] - ., . - ;
[D2] - MIPS Architecture for Programmers Volume IV-j: The MIPS32 SIMD Architecture Module;
[D3] - MIPS SIMD programming. Optimizing multimedia codecs;


Pictures


[P1] - Baikal-T1 (source: L1);
[P2] - TFilter. 1 ();
[P3] - TFilter. 1 ;
[P4] - TFilter. 2 ();
[P5] - TFilter. 2 ;
[P6] - SIMD- (source: D3);
[P7] - MSA Vector registers (source: D2);
[P8] - MADDV Operation description (source: D2);


Source: https://habr.com/ru/post/328566/

