Delphi: Fast JPEG Encoding and Decoding with libjpeg-turbo

Once, while profiling a library for remote desktop monitoring, I found that JPEG encoding and decoding took a lot of time and resources. After looking at third-party options for speeding this up, I decided to use libjpeg-turbo.

Below the cut: plenty of Delphi code and a description of the pitfalls of using the library.

What is all this for?

The standard jpeg.pas unit is a wrapper over libjpeg. libjpeg-turbo was designed as a drop-in replacement for libjpeg: it exposes a compatible API while being much faster.

The link shows a comparison of libjpeg, libjpeg-turbo and Intel IPP. In short, libjpeg-turbo is 3 times faster than libjpeg and about as fast as Intel IPP, but free.
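To check the speed-up on your own images, a rough timing harness can be used. This is a sketch, assuming the DecodeJpegTurbo function and the FastDIB unit used later in this article, the standard jpeg.pas unit, and TStopwatch from the Diagnostics unit (available since Delphi 2010). The file name test.jpg and the iteration count are placeholders, and the jpeg.pas timing also includes some TJPEGImage overhead, so treat the numbers as approximate.

```pascal
program JpegBenchmark;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, Diagnostics, jpeg, FastDIB, suJpegTurboUnit;

const
  Iterations = 100;       // placeholder iteration count
  FileName = 'test.jpg';  // placeholder input file

var
  Data: TMemoryStream;
  Jpg: TJPEGImage;
  Dib: TFastDIB;
  Watch: TStopwatch;
  I: Integer;
begin
  Data := TMemoryStream.Create;
  try
    Data.LoadFromFile(FileName);

    // Standard jpeg.pas (libjpeg): DIBNeeded forces the actual decoding
    Watch := TStopwatch.StartNew;
    for I := 1 to Iterations do
    begin
      Data.Position := 0;
      Jpg := TJPEGImage.Create;
      try
        Jpg.LoadFromStream(Data);
        Jpg.DIBNeeded;
      finally
        Jpg.Free;
      end;
    end;
    Writeln('jpeg.pas:      ', Watch.ElapsedMilliseconds, ' ms');

    // libjpeg-turbo through DecodeJpegTurbo (HQ = True by default)
    Watch := TStopwatch.StartNew;
    for I := 1 to Iterations do
    begin
      Dib := DecodeJpegTurbo(Data.Memory, Integer(Data.Size));
      Dib.Free;
    end;
    Writeln('libjpeg-turbo: ', Watch.ElapsedMilliseconds, ' ms');
  finally
    Data.Free;
  end;
end.
```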

The delphi-jpeg-turbo project

Before reinventing the wheel, I searched Google and came across the delphi-jpeg-turbo project. The project is certainly useful, but its implementation turned out not to suit me:

So the project itself did not fit, but its headers proved very useful as a basis for my own implementation.

What I ended up with

I won't describe the libjpeg-turbo API: I didn't study it in depth and, after what I saw, I hope I never have to dig any deeper. The source of the Jpeg.pas unit shipped with Delphi, which uses libjpeg with a compatible API, was very helpful for exploring it. If something in my implementation is wrong, please correct me :)

So, that's what I got:
unit suJpegTurboUnit;

interface

uses
  Windows, SysUtils, FastDIB;

type
  // Callback that receives the encoded JPEG buffer
  TOnEncodedJpegBuffer = reference to procedure(ABuffer: Pointer; ABufferSize: LongWord);

// Decode a JPEG buffer into a TFastDIB
function DecodeJpegTurbo(ABuffer: Pointer; ABufferLen: Integer; HQ: Boolean = True): TFastDIB;
// Encode a TFastDIB into JPEG
procedure EncodeJpegTurbo(Source: TFastDIB; Quality: Integer; OnEncodedBuffer: TOnEncodedJpegBuffer);

implementation

uses
  suJpegTurboHeadersUnit, suJpegTurboMemDestUnit;

var
  _LibInitialized: LongBool = False;

// Fatal error handler: turn a libjpeg error into a Delphi exception
procedure ErrorExit(cinfo: j_common_ptr); cdecl;
var
  Msg: AnsiString;
begin
  // Format the error message text
  SetLength(Msg, JMSG_LENGTH_MAX);
  cinfo^.err^.format_message(cinfo, PAnsiChar(Msg));
  // Cast Msg to PAnsiChar so the text ends at the #0 terminator
  raise Exception.CreateFmt('JPEG error #%d (%s)', [cinfo^.err^.msg_code, PAnsiChar(Msg)]);
end;

// Swallow warning and trace messages
procedure OutputMessage(cinfo: j_common_ptr); cdecl;
begin
end;

// Load LibJpeg-Turbo
procedure InitLib;
begin
  if _LibInitialized then
    Exit;
  // Load the library
  if not init_libJPEG then
    raise Exception.Create('initialization of libJPEG failed.');
  // Mark it as initialized; if another thread already did so,
  // release the extra reference we just took
  if InterlockedCompareExchange(Integer(_LibInitialized), Integer(True), Integer(False)) = Integer(True) then
    quit_libJPEG;
end;

// Decoding
function DecodeJpegTurbo(ABuffer: Pointer; ABufferLen: Integer; HQ: Boolean): TFastDIB;
var
  Loop: Integer;
  JpegErr: jpeg_error_mgr;
  Jpeg: jpeg_decompress_struct;
begin
  // Make sure the library is loaded
  InitLib;
  FillChar(Jpeg, SizeOf(Jpeg), 0);
  FillChar(JpegErr, SizeOf(JpegErr), 0);
  // Install the error handlers BEFORE creating the decompressor, since
  // jpeg_create_decompress already reports errors through Jpeg.err.
  // By default LibJPEG terminates the process on a fatal error and may
  // show messages in a MessageBox, so both handlers are replaced.
  Jpeg.err := jpeg_std_error(@JpegErr);
  JpegErr.error_exit := ErrorExit;
  JpegErr.output_message := OutputMessage;
  // Create the decompressor
  jpeg_create_decompress(@Jpeg);
  try
    jpeg_mem_src(@Jpeg, ABuffer, ABufferLen);
    // Read the header, which fills in the image parameters
    jpeg_read_header(@Jpeg, False);
    // Output BGR to match the 24-bit DIB layout
    Jpeg.out_color_space := JCS_EXT_BGR;
    // Scale 1:1
    Jpeg.scale_num := 1;
    Jpeg.scale_denom := 1;
    // Quality vs. speed
    if HQ then
    begin
      Jpeg.do_block_smoothing := 1;
      Jpeg.do_fancy_upsampling := 1;
      Jpeg.dct_method := JDCT_ISLOW;
    end
    else
    begin
      Jpeg.do_block_smoothing := 0;
      Jpeg.do_fancy_upsampling := 0;
      Jpeg.dct_method := JDCT_IFAST;
    end;
    // Start decompression
    jpeg_start_decompress(@Jpeg);
    Result := TFastDIB.Create(Jpeg.output_width, Jpeg.output_height, 24);
    try
      // JPEG rows arrive top-down; fill the DIB rows in reverse order
      for Loop := 0 to Jpeg.output_height - 1 do
        jpeg_read_scanlines(@Jpeg, @Result.Scanlines[Result.Height - 1 - Loop], 1);
      // Finish only after all scanlines were read: on an error path
      // jpeg_finish_decompress would raise again and mask the original error
      jpeg_finish_decompress(@Jpeg);
    except
      FreeAndNil(Result);
      raise;
    end;
  finally
    // Release the decompressor (this also aborts an unfinished decompression)
    jpeg_destroy_decompress(@Jpeg);
  end;
end;

// Encoding
procedure EncodeJpegTurbo(Source: TFastDIB; Quality: Integer; OnEncodedBuffer: TOnEncodedJpegBuffer);
var
  ScanLine: JSAMPROW;
  CompressedBuff: Pointer;
  CompressedSize: LongWord;
  JpegErr: jpeg_error_mgr;
  Jpeg: jpeg_compress_struct;
begin
  // Make sure the library is loaded
  InitLib;
  FillChar(Jpeg, SizeOf(Jpeg), 0);
  FillChar(JpegErr, SizeOf(JpegErr), 0);
  // Install the error handlers before creating the compressor (see DecodeJpegTurbo)
  Jpeg.err := jpeg_std_error(@JpegErr);
  JpegErr.error_exit := ErrorExit;
  JpegErr.output_message := OutputMessage;
  // Create the compressor
  jpeg_create_compress(@Jpeg);
  try
    CompressedSize := 0;
    CompressedBuff := nil;
    // Our own jpeg_mem_dest: the buffer is allocated with GetMemory
    suJpegTurboMemDestUnit.jpeg_mem_dest(@Jpeg, @CompressedBuff, @CompressedSize);
    try
      Jpeg.image_width := Source.Width;
      Jpeg.image_height := Source.Height;
      // JCS_EXT_BGR expects 3 bytes per pixel, i.e. a 24-bit DIB
      Jpeg.input_components := Source.Info.Header.BitCount div 8;
      Jpeg.in_color_space := JCS_EXT_BGR;
      // Setting defaults
      jpeg_set_defaults(@Jpeg);
      // Compression quality
      jpeg_set_quality(@Jpeg, Quality, True);
      // Start compression
      jpeg_start_compress(@Jpeg, True);
      // The DIB is stored in reverse row order, so write it from the last row
      while Jpeg.next_scanline < Jpeg.image_height do
      begin
        ScanLine := JSAMPROW(Source.Scanlines[Jpeg.image_height - Jpeg.next_scanline - 1]);
        jpeg_write_scanlines(@Jpeg, @ScanLine, 1);
      end;
      // Finish only after all scanlines were written (see DecodeJpegTurbo)
      jpeg_finish_compress(@Jpeg);
      // Hand the result to the caller
      if Assigned(OnEncodedBuffer) then
        OnEncodedBuffer(CompressedBuff, CompressedSize);
    finally
      // Free the buffer allocated by jpeg_mem_dest
      FreeMemory(CompressedBuff);
    end;
  finally
    // Release the compressor
    jpeg_destroy_compress(@Jpeg);
  end;
end;

initialization

finalization
  // Unload the library if it was loaded
  if _LibInitialized then
    quit_libJPEG;

end.
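A usage sketch of the two functions, assuming the units above are on the search path. Note that EncodeJpegTurbo frees the encoded buffer with FreeMemory as soon as the callback returns, so the callback has to copy the data if it is needed later. The program name, the 64x64 image size and the quality value 80 are arbitrary choices for illustration.

```pascal
program JpegTurboRoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils, FastDIB, suJpegTurboUnit;

var
  Src, Decoded: TFastDIB;
  Encoded: TBytes;
begin
  // 24-bit source image; its pixel contents do not matter for this sketch
  Src := TFastDIB.Create(64, 64, 24);
  try
    // The buffer is only valid inside the callback, so copy it out
    EncodeJpegTurbo(Src, 80,
      procedure(ABuffer: Pointer; ABufferSize: LongWord)
      begin
        SetLength(Encoded, ABufferSize);
        Move(ABuffer^, Encoded[0], ABufferSize);
      end);
  finally
    Src.Free;
  end;

  // Decode it back; HQ = False trades quality for speed
  Decoded := DecodeJpegTurbo(@Encoded[0], Length(Encoded), False);
  try
    Writeln('Encoded: ', Length(Encoded), ' bytes, decoded: ',
      Decoded.Width, 'x', Decoded.Height);
  finally
    Decoded.Free;
  end;
end.
```

Errors from either call surface as ordinary Delphi exceptions raised by ErrorExit, so a try..except around them is enough to handle corrupt input.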

While writing this, I ran into a problem with jpeg_mem_dest: it turned out to allocate memory internally with the msvcrt.dll allocator (malloc), so the buffer has to be released manually with the matching function (free) from the same msvcrt.dll.
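For comparison, this is roughly what that option looks like: use the library's own jpeg_mem_dest and release the buffer with free from msvcrt.dll. It is a sketch that is only valid if the libjpeg-turbo DLL really allocates from the shared msvcrt.dll heap. The unit name, EncodeWithLibraryMemDest and the msvcrt_free alias are hypothetical; jpeg_mem_dest here is the function pointer from suJpegTurboHeadersUnit. To stay short, it keeps libjpeg's default error handler (which terminates the process) and does not free the buffer on an error path.

```pascal
unit suJpegTurboMsvcrtSketch; // hypothetical unit name

interface

uses
  SysUtils, FastDIB, suJpegTurboHeadersUnit;

// Hypothetical: encodes a 24-bit DIB using the library's own jpeg_mem_dest
function EncodeWithLibraryMemDest(Source: TFastDIB; Quality: Integer): TBytes;

implementation

// free() from the shared C runtime; it must belong to the same runtime
// whose malloc() the libjpeg-turbo DLL used, otherwise the heap is corrupted
procedure msvcrt_free(P: Pointer); cdecl; external 'msvcrt.dll' name 'free';

function EncodeWithLibraryMemDest(Source: TFastDIB; Quality: Integer): TBytes;
var
  Jpeg: jpeg_compress_struct;
  JpegErr: jpeg_error_mgr;
  Buff: Pointer;
  Size: LongWord;
  Row: JSAMPROW;
begin
  // Assumes init_libJPEG has already succeeded
  FillChar(Jpeg, SizeOf(Jpeg), 0);
  FillChar(JpegErr, SizeOf(JpegErr), 0);
  Jpeg.err := jpeg_std_error(@JpegErr);
  jpeg_create_compress(@Jpeg);
  try
    Buff := nil;
    Size := 0;
    // With Buff = nil the library allocates the buffer itself via malloc()
    jpeg_mem_dest(@Jpeg, @Buff, @Size);
    Jpeg.image_width := Source.Width;
    Jpeg.image_height := Source.Height;
    Jpeg.input_components := 3;
    Jpeg.in_color_space := JCS_EXT_BGR;
    jpeg_set_defaults(@Jpeg);
    jpeg_set_quality(@Jpeg, Quality, True);
    jpeg_start_compress(@Jpeg, True);
    while Jpeg.next_scanline < Jpeg.image_height do
    begin
      Row := JSAMPROW(Source.Scanlines[Jpeg.image_height - Jpeg.next_scanline - 1]);
      jpeg_write_scanlines(@Jpeg, @Row, 1);
    end;
    jpeg_finish_compress(@Jpeg);
    // Copy the result into Delphi-managed memory, then return the
    // buffer to the C runtime that allocated it
    SetLength(Result, Size);
    Move(Buff^, Result[0], Size);
    msvcrt_free(Buff);
  finally
    jpeg_destroy_compress(@Jpeg);
  end;
end;

end.
```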

That didn't work for me, because my libjpeg-turbo build links msvcrt statically and does not export the memory release function. So I had to write my own implementation of jpeg_mem_dest that uses the standard Delphi memory allocator:
{
  Replacement for jpeg_mem_dest from JpegTurbo.
  The library's version allocates the output buffer with the C RTL, and it has
  to be released with that RTL's free, which is not exported when msvcrt is
  linked statically. This version allocates and frees memory with
  GetMemory/FreeMemory instead, so the buffer returned through jpeg_mem_dest
  must be released with FreeMemory.
  Delphi port of jdatadst.c.
}
unit suJpegTurboMemDestUnit;

interface

uses
  suJpegTurboHeadersUnit;

procedure jpeg_mem_dest(cinfo: j_compress_ptr; outbuffer: PPointer; outsize: PLongWord);

implementation

const
  OUTPUT_BUF_SIZE = 4096; // choose an efficiently fwrite'able size

type
  my_mem_destination_mgr = record
    pub: jpeg_destination_mgr; // public fields
    outbuffer: PPointer;       // target buffer
    outsize: PLongWord;
    newbuffer: Pointer;        // newly allocated buffer
    buffer: JOCTET_ptr;        // start of buffer
    bufsize: LongWord;
  end;
  my_mem_dest_ptr = ^my_mem_destination_mgr;

// Initialize destination --- called by jpeg_start_compress
// before any data is actually written.
procedure init_mem_destination(cinfo: j_compress_ptr); cdecl;
begin
  // no work necessary here
end;

{
  Empty the output buffer --- called whenever buffer fills up.

  In typical applications, this should write the entire output buffer
  (ignoring the current state of next_output_byte & free_in_buffer),
  reset the pointer & count to the start of the buffer, and return TRUE
  indicating that the buffer has been dumped.

  In applications that need to be able to suspend compression due to output
  overrun, a FALSE return indicates that the buffer cannot be emptied now.
  In this situation, the compressor will return to its caller (possibly with
  an indication that it has not accepted all the supplied scanlines). The
  application should resume compression after it has made more room in the
  output buffer. Note that there are substantial restrictions on the use of
  suspension --- see the documentation.

  When suspending, the compressor will back up to a convenient restart point
  (typically the start of the current MCU). next_output_byte & free_in_buffer
  indicate where the restart point will be if the current call returns FALSE.
  Data beyond this point will be regenerated after resumption, so do not
  write it out when emptying the buffer externally.
}
function empty_mem_output_buffer(cinfo: j_compress_ptr): Boolean; cdecl;
var
  nextsize: LongWord;
  dest: my_mem_dest_ptr;
  nextbuffer: JOCTET_ptr;
begin
  dest := my_mem_dest_ptr(cinfo^.dest);
  // Try to allocate new buffer with double size
  nextsize := dest^.bufsize * 2;
  nextbuffer := GetMemory(nextsize);
  if nextbuffer = nil then
    ERREXIT1(j_common_ptr(cinfo), JERR_OUT_OF_MEMORY, 10);
  Move(dest^.buffer^, nextbuffer^, dest^.bufsize);
  if dest^.newbuffer <> nil then
    FreeMemory(dest^.newbuffer);
  dest^.newbuffer := nextbuffer;
  // Keep the caller's pointer current: if compression fails later, the
  // caller then frees this buffer instead of the one released just above
  dest^.outbuffer^ := nextbuffer;
  dest^.pub.next_output_byte := JOCTET_ptr(PByte(nextbuffer) + dest^.bufsize);
  dest^.pub.free_in_buffer := dest^.bufsize;
  dest^.buffer := nextbuffer;
  dest^.bufsize := nextsize;
  Result := True;
end;

procedure term_mem_destination(cinfo: j_compress_ptr); cdecl;
var
  dest: my_mem_dest_ptr;
begin
  dest := my_mem_dest_ptr(cinfo^.dest);
  dest^.outbuffer^ := dest^.buffer;
  dest^.outsize^ := dest^.bufsize - dest^.pub.free_in_buffer;
end;

{
  Prepare for output to a memory buffer.
  The caller may supply an own initial buffer with appropriate size.
  Otherwise, or when the actual data output exceeds the given size,
  the library adapts the buffer size as necessary.
  The standard library functions GetMemory/FreeMemory are used for allocating
  larger memory, so the buffer is available to the application after
  finishing compression, and then the application is responsible for
  freeing the requested memory.
}
procedure jpeg_mem_dest(cinfo: j_compress_ptr; outbuffer: PPointer; outsize: PLongWord);
var
  dest: my_mem_dest_ptr;
begin
  if (outbuffer = nil) or (outsize = nil) then
    ERREXIT(j_common_ptr(cinfo), JERR_BUFFER_SIZE);

  if cinfo^.dest = nil then // first time for this JPEG object?
    cinfo^.dest := cinfo^.mem.alloc_small(j_common_ptr(cinfo), JPOOL_PERMANENT,
      SizeOf(my_mem_destination_mgr));

  dest := my_mem_dest_ptr(cinfo^.dest);
  dest^.pub.init_destination := init_mem_destination;
  dest^.pub.empty_output_buffer := empty_mem_output_buffer;
  dest^.pub.term_destination := term_mem_destination;
  dest^.outbuffer := outbuffer;
  dest^.outsize := outsize;
  dest^.newbuffer := nil;

  if (outbuffer^ = nil) or (outsize^ = 0) then
  begin
    // Allocate initial buffer
    outbuffer^ := GetMemory(OUTPUT_BUF_SIZE);
    dest^.newbuffer := outbuffer^;
    if dest^.newbuffer = nil then
      ERREXIT1(j_common_ptr(cinfo), JERR_OUT_OF_MEMORY, 10);
    outsize^ := OUTPUT_BUF_SIZE;
  end;

  dest^.buffer := outbuffer^;
  dest^.pub.next_output_byte := dest^.buffer;
  dest^.bufsize := outsize^;
  dest^.pub.free_in_buffer := dest^.bufsize;
end;

end.

As for the headers, they live in the suJpegTurboHeadersUnit.pas unit. These are the headers from delphi-jpeg-turbo with a couple of improvements:
{ Known color spaces. }
J_COLOR_SPACE = (
  JCS_UNKNOWN,    { error/unspecified }
  JCS_GRAYSCALE,  // monochrome
  JCS_RGB,        // red/green/blue as specified by the RGB_RED, RGB_GREEN,
                  // RGB_BLUE, and RGB_PIXELSIZE macros
  JCS_YCbCr,      // Y/Cb/Cr (also known as YUV)
  JCS_CMYK,       // C/M/Y/K
  JCS_YCCK,       // Y/Cb/Cr/K
  JCS_EXT_RGB,    // red/green/blue
  JCS_EXT_RGBX,   // red/green/blue/x
  JCS_EXT_BGR,    // blue/green/red
  JCS_EXT_BGRX,   // blue/green/red/x
  JCS_EXT_XBGR,   // x/blue/green/red
  JCS_EXT_XRGB,   // x/red/green/blue
  // When out_color_space is set to JCS_EXT_RGBX, JCS_EXT_BGRX,
  // JCS_EXT_XBGR, or JCS_EXT_XRGB during decompression, the X byte is
  // undefined, and in order to ensure the best performance,
  // libjpeg-turbo can set that byte to whatever value it wishes. Use
  // the following colorspace constants to ensure that the X byte is set
  // to 0xFF, so that it can be interpreted as an opaque alpha
  // channel.
  JCS_EXT_RGBA,   // red/green/blue/alpha
  JCS_EXT_BGRA,   // blue/green/red/alpha
  JCS_EXT_ABGR,   // alpha/blue/green/red
  JCS_EXT_ARGB    // alpha/red/green/blue
);
...
{ Standard data source and destination managers: stdio streams. }
{ Caller is responsible for opening the file before and closing after. }
// jpeg_stdio_dest: procedure(cinfo: j_compress_ptr; FILE * outfile); cdecl;
// jpeg_stdio_src: procedure(cinfo: j_decompress_ptr; FILE * infile); cdecl;
jpeg_mem_src: procedure(cinfo: j_decompress_ptr; inbuffer: Pointer; insize: LongWord); cdecl;
jpeg_mem_dest: procedure(cinfo: j_compress_ptr; outbuffer: Pointer; outsize: PLongWord); cdecl;
...
function init_libJPEG(): Boolean;
...
  @jpeg_mem_src := GetProcAddress(libJPEG_Handle, 'jpeg_mem_src');
  @jpeg_mem_dest := GetProcAddress(libJPEG_Handle, 'jpeg_mem_dest');
...
// Declared as a constant: {$DEFINE} cannot carry a value, and the
// {$IF Declared(...)} checks below need a real constant
const
  JPEG_LIB_VERSION = 62; // Version 6b

type
  J_MESSAGE_CODE = (
    JMSG_NOMESSAGE,
    {$IF Declared(JPEG_LIB_VERSION) and (JPEG_LIB_VERSION < 70)}
    JERR_ARITH_NOTIMPL,
    {$IFEND}
    JERR_BAD_ALIGN_TYPE, JERR_BAD_ALLOC_CHUNK, JERR_BAD_BUFFER_MODE, JERR_BAD_COMPONENT_ID,
    {$IF Declared(JPEG_LIB_VERSION) and (JPEG_LIB_VERSION >= 70)}
    JERR_BAD_CROP_SPEC,
    {$IFEND}
    JERR_BAD_DCT_COEF, JERR_BAD_DCTSIZE,
    {$IF Declared(JPEG_LIB_VERSION) and (JPEG_LIB_VERSION >= 70)}
    JERR_BAD_DROP_SAMPLING,
    {$IFEND}
    JERR_BAD_HUFF_TABLE, JERR_BAD_IN_COLORSPACE, JERR_BAD_J_COLORSPACE, JERR_BAD_LENGTH,
    JERR_BAD_LIB_VERSION, JERR_BAD_MCU_SIZE, JERR_BAD_POOL_ID, JERR_BAD_PRECISION,
    JERR_BAD_PROGRESSION, JERR_BAD_PROG_SCRIPT, JERR_BAD_SAMPLING, JERR_BAD_SCAN_SCRIPT,
    JERR_BAD_STATE, JERR_BAD_STRUCT_SIZE, JERR_BAD_VIRTUAL_ACCESS, JERR_BUFFER_SIZE,
    JERR_CANT_SUSPEND, JERR_CCIR601_NOTIMPL, JERR_COMPONENT_COUNT, JERR_CONVERSION_NOTIMPL,
    JERR_DAC_INDEX, JERR_DAC_VALUE, JERR_DHT_INDEX, JERR_DQT_INDEX, JERR_EMPTY_IMAGE,
    JERR_EMS_READ, JERR_EMS_WRITE, JERR_EOI_EXPECTED, JERR_FILE_READ, JERR_FILE_WRITE,
    JERR_FRACT_SAMPLE_NOTIMPL, JERR_HUFF_CLEN_OVERFLOW, JERR_HUFF_MISSING_CODE,
    JERR_IMAGE_TOO_BIG, JERR_INPUT_EMPTY, JERR_INPUT_EOF, JERR_MISMATCHED_QUANT_TABLE,
    JERR_MISSING_DATA, JERR_MODE_CHANGE, JERR_NOTIMPL, JERR_NOT_COMPILED,
    {$IF Declared(JPEG_LIB_VERSION) and (JPEG_LIB_VERSION >= 70)}
    JERR_NO_ARITH_TABLE,
    {$IFEND}
    JERR_NO_BACKING_STORE, JERR_NO_HUFF_TABLE, JERR_NO_IMAGE, JERR_NO_QUANT_TABLE,
    JERR_NO_SOI, JERR_OUT_OF_MEMORY, JERR_QUANT_COMPONENTS, JERR_QUANT_FEW_COLORS,
    JERR_QUANT_MANY_COLORS, JERR_SOF_DUPLICATE, JERR_SOF_NO_SOS, JERR_SOF_UNSUPPORTED,
    JERR_SOI_DUPLICATE, JERR_SOS_NO_SOF, JERR_TFILE_CREATE, JERR_TFILE_READ,
    JERR_TFILE_SEEK, JERR_TFILE_WRITE, JERR_TOO_LITTLE_DATA, JERR_UNKNOWN_MARKER,
    JERR_VIRTUAL_BUG, JERR_WIDTH_OVERFLOW, JERR_XMS_READ, JERR_XMS_WRITE
  );

procedure ERREXIT(cinfo: j_common_ptr; code: J_MESSAGE_CODE);
procedure ERREXIT1(cinfo: j_common_ptr; code: J_MESSAGE_CODE; p1: Integer);
...
// Ported from jerror.h
// Fatal errors (print message and exit)
procedure ERREXIT(cinfo: j_common_ptr; code: J_MESSAGE_CODE);
begin
  cinfo^.err^.msg_code := Ord(code);
  cinfo^.err^.error_exit(j_common_ptr(cinfo));
end;

procedure ERREXIT1(cinfo: j_common_ptr; code: J_MESSAGE_CODE; p1: Integer);
begin
  cinfo^.err^.msg_code := Ord(code);
  cinfo^.err^.msg_parm.i[0] := p1;
  cinfo^.err^.error_exit(j_common_ptr(cinfo));
end;
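EncodeJpegTurbo derives input_components from the DIB bit depth but always sets in_color_space to JCS_EXT_BGR, which, as far as I can tell from libjpeg-turbo's color converter, only accepts 3 bytes per pixel, so a 32-bit DIB ends with a JPEG error. A hypothetical helper that picks the extended color space matching the in-memory layout of a Windows DIB could look like this; the unit and function names are my own, and only the J_COLOR_SPACE values come from the headers above.

```pascal
unit suJpegTurboColorSpace; // hypothetical helper unit

interface

uses
  SysUtils, suJpegTurboHeadersUnit;

// Hypothetical helper: libjpeg-turbo input color space for a DIB bit depth
function DibColorSpace(BitCount: Integer): J_COLOR_SPACE;

implementation

function DibColorSpace(BitCount: Integer): J_COLOR_SPACE;
begin
  case BitCount of
    24: Result := JCS_EXT_BGR;   // B, G, R: 3 bytes per pixel
    32: Result := JCS_EXT_BGRX;  // B, G, R, unused: 4 bytes per pixel
  else
    raise Exception.CreateFmt('Unsupported DIB bit depth: %d', [BitCount]);
  end;
end;

end.
```

In EncodeJpegTurbo the fixed assignment would then become Jpeg.in_color_space := DibColorSpace(Source.Info.Header.BitCount). On the decoding side, JCS_EXT_BGRA together with a 32-bit TFastDIB would give an opaque alpha byte, as the comment in the color-space list describes.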

Written and tested on Delphi 2010.
All sources can be downloaded from the link.

