
Getting the difference between binary files using VCDIFF

[Images: tortoise.jpg and tortoise_bad.jpg]


I needed to figure out where and how a JPEG file was getting corrupted during transfer.


VCDIFF is a format and algorithm for delta encoding, described in RFC 3284.

Delta encoding is a way of representing data as the difference (delta) between successive pieces of data rather than as the data itself.
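A minimal illustration of the idea on a list of numbers (a sketch, not the VCDIFF format): keep the first value, then store only the difference from the previous value; a running sum restores the original. The function names are my own.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each value as its difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas):
    """Undo delta_encode: a running sum of the deltas restores the original values."""
    return list(accumulate(deltas))


readings = [1000, 1002, 1005, 1005, 1010]
encoded = delta_encode(readings)  # [1000, 2, 3, 0, 5]
assert delta_decode(encoded) == readings
```

The small deltas are what make the representation compact; VCDIFF applies the same principle to byte strings, describing the target in terms of the source.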

For the examples I use text files encoded in Windows-1251, for clarity, but binary files work just as well.


Source files:


"  " ( source.txt ) "  " ( target.txt ) 

The goal is to get the difference between the files:


 "  " ( source.txt -> target.txt ) "  " ( target.txt -> source.txt ) 

I use xdelta3, but I expect any program that works with the VCDIFF format will do.


How to get the difference


We need one more file, filled with spaces:


 " " ( spaces.txt ) 

Its size must be greater than or equal to the size of the source file (source.txt).
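One way to produce spaces.txt with exactly the right size is a small helper like the one below. It is a sketch: the name make_filler and its parameters are hypothetical, and it simply writes one fill byte per byte of the source.

```python
import os


def make_filler(source_path, filler_path, fill=b" "):
    """Write `fill` repeated once per byte of `source_path`; return the size written."""
    if len(fill) != 1:
        raise ValueError("fill must be a single byte")
    size = os.path.getsize(source_path)
    with open(filler_path, "wb") as f:
        # Same length as the source, which satisfies the "greater than or equal" rule.
        f.write(fill * size)
    return size
```

Usage: `make_filler("source.txt", "spaces.txt")`. For very large sources you would write in chunks instead of building the whole byte string in memory.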


Command:


 xdelta3 -e -A -n -s source.txt target.txt | xdelta3 -d -s spaces.txt 

Result:


   

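Flags used:
-e - encode (create a delta)
-A - omits xdelta3's application header
-n - disables the checksum (otherwise the checksum check would prevent applying the delta to a different source)
-s <file> - the source file against which the target is compared (when encoding) and from which it is restored (when decoding)
-d - decode (reconstruct the target file from the delta and the source)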
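The same encode | decode pipeline can be driven from Python with subprocess. This is a sketch that assumes xdelta3 is on PATH and, as in the shell pipeline above, writes the delta and the decoded result to stdout when they are piped; the helper names are hypothetical.

```python
import subprocess


def encode_cmd(source, target):
    # -e encode, -A no application header, -n no checksum, -s source file.
    return ["xdelta3", "-e", "-A", "-n", "-s", source, target]


def decode_cmd(filler):
    # -d decode the delta read from stdin against the filler file.
    return ["xdelta3", "-d", "-s", filler]


def masked_diff(source, target, filler="spaces.txt"):
    """Run encode, then decode against the filler; return the decoded bytes."""
    delta = subprocess.run(encode_cmd(source, target),
                           capture_output=True, check=True).stdout
    return subprocess.run(decode_cmd(filler), input=delta,
                          capture_output=True, check=True).stdout
```

Returning bytes rather than printing keeps binary output (such as the broken JPEG bytes) intact for further processing.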


How it works


If you run the command:


 xdelta3 -e -A -n -s source.txt target.txt | xdelta3 printdelta 

then, after all the headers, you will see the VCDIFF instructions:


  Offset  Code  Type1  Size1  @Addr1  + Type2  Size2  @Addr2
  000000  025   CPY_0  9      S@0
  000009  010   ADD    9
  000018  025   CPY_0  9      S@14

VCDIFF is fundamentally very simple: it has just three instructions.


COPY - copies data from the source or from the already-reconstructed part of the target.
ADD - writes data stored in the delta itself (unique data that is not in the source) to the target file.
RUN - repeats a single byte from the delta a given number of times.
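To make the three instructions concrete, here is a toy interpreter. It is a sketch under an assumed, simplified representation (plain Python tuples, not the binary RFC 3284 encoding or any xdelta3 API). The example instructions mirror the printdelta output above (COPY 9 from S@0, ADD 9, COPY 9 from S@14), but the source text is made up.

```python
def apply_delta(source, instructions):
    """Rebuild a target from `source` and simplified VCDIFF-style instructions.

    Hypothetical tuple forms:
      ("ADD", data)        append literal bytes stored in the delta
      ("RUN", byte, size)  append one byte repeated `size` times
      ("COPY", addr, size) copy `size` bytes from address `addr`; addresses
                           below len(source) point into the source, higher
                           ones into the target built so far
    """
    target = bytearray()
    for inst in instructions:
        op = inst[0]
        if op == "ADD":
            target += inst[1]
        elif op == "RUN":
            _, byte, size = inst
            target += bytes([byte]) * size
        elif op == "COPY":
            _, addr, size = inst
            for i in range(size):
                pos = addr + i
                # Byte by byte, so a copy from the target may overlap its own output.
                if pos < len(source):
                    target.append(source[pos])
                else:
                    target.append(target[pos - len(source)])
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return bytes(target)


source = b"Hi there, old; goodbye!"  # made-up 23-byte source
delta = [("COPY", 0, 9), ("ADD", b" new text"), ("COPY", 14, 9)]
assert apply_delta(source, delta) == b"Hi there, new text goodbye!"

# Same delta, source swapped for a file of spaces: copied data turns into spaces.
masked = apply_delta(b" " * len(source), delta)
assert masked == b" " * 9 + b" new text" + b" " * 9
```

Only the ADD data survives the swap, which is exactly why the spaces.txt trick exposes the unique bytes.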


The delta stores only the unique data; everything else is copied from the source. If you run:


 xdelta3 -e -A -n -s source.txt target.txt > target.vcdiff 

then the only data stored in the delta is the word "changes", which appears only in the target file:


 D0A6D093D094200102011720131B2009 0302190D0A19200E 

(JSON does not handle special characters well, so I converted them to hex.)


Applying the delta to the source (source.txt) gives us the target file (target.txt):


 xdelta3 -d -s source.txt target.vcdiff    

If we replace the source (source.txt) with a file filled with spaces (spaces.txt), every byte the delta copies from the source comes out as a space. All data shared by the source and the target is replaced with spaces, and only the unique data remains visible:


 xdelta3 -d -s spaces.txt target.vcdiff  

Any other character will do in spaces.txt. The only requirement is that spaces.txt is at least as large as the source file, because the COPY instructions refer to offsets in the source.
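If xdelta3 is not at hand, the same masking idea can be approximated in pure Python with difflib.SequenceMatcher. Note that this swaps in a different matching algorithm, so the blocks it finds may differ from what xdelta3 emits; it is a sketch of the idea, not a VCDIFF implementation, and mask_common is a name I made up.

```python
from difflib import SequenceMatcher


def mask_common(source, target, filler=b" "):
    """Return `target` with bytes matched in `source` replaced by `filler`.

    Uses difflib's matching blocks (not VCDIFF), so results can differ from xdelta3.
    """
    fill = filler[0]
    out = bytearray()
    # autojunk=False: don't let frequent bytes (common in binary files) be ignored.
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out += bytes([fill]) * (j2 - j1)
        else:
            # 'replace' and 'insert' keep target bytes; 'delete' has j1 == j2.
            out += target[j1:j2]
    return bytes(out)
```

SequenceMatcher is fine for files of a few tens of kilobytes like the test JPEGs, but it scales far worse than xdelta3 on large inputs.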


This is how I actually compared the JPEG files:


 xdelta3 -e -A -n -s bad_image.jpg good_image.jpg | xdelta3 -d -s spaces.txt 

The result of comparing these files:


  F488A2 F2AB 

The output is mostly spaces plus the bytes that were "broken". The broken bytes are shown here in hex.
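Instead of eyeballing a screen full of spaces, the decoded output can be scanned for runs of non-space bytes and printed in hex with their offsets (positions in the decoded target). This is a sketch with hypothetical helper names; the sample input reuses the bytes shown above with made-up spacing.

```python
def visible_runs(masked, filler=0x20):
    """Yield (offset, chunk) for each run of non-filler bytes in `masked`.

    Caveat: a genuine 0x20 byte inside the changed data looks like filler,
    so it would split a run or hide that byte.
    """
    start = None
    for i, b in enumerate(masked):
        if b != filler and start is None:
            start = i
        elif b == filler and start is not None:
            yield start, masked[start:i]
            start = None
    if start is not None:
        yield start, masked[start:]


def hex_report(masked, filler=0x20):
    """Format each run as 'offset: HEX' with a decimal offset."""
    return [f"{off}: {chunk.hex().upper()}" for off, chunk in visible_runs(masked, filler)]


sample = b"   \xf4\x88\xa2  \xf2\xab "
print(hex_report(sample))  # ['3: F488A2', '8: F2AB']
```

Offsets make it easy to jump to the damaged spot in a hex editor, which the plain output does not give you.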


Test JPEG files you can use to try out comparison methods:


Magnet links: tortoise.jpg (18,821 bytes), tortoise_bad.jpg (18,829 bytes)

 xdelta3 -e -A -n -s tortoise_bad.jpg tortoise.jpg | xdelta3 -d -s spaces.txt 

The result of comparing these files:


  F1BF F0B786 F39BAF F3BD94 

The broken bytes are shown in hex.



Source: https://habr.com/ru/post/419883/

