
Fast algorithm for calculating checksum for large JAR files

Task


Write a fast algorithm in Java for calculating the checksum of a large JAR file (> 1 GB), preferably without third-party libraries.

Standard way


Compute an MD5 digest over the entire contents of the file.

    MessageDigest digest = MessageDigest.getInstance("MD5");
    byte[] buf = new byte[1024];
    int len;
    // try-with-resources closes the stream even if reading fails
    try (InputStream stream = new BufferedInputStream(new FileInputStream("/path/to/jar/file"))) {
        // read() returns -1 at the end of the stream
        while ((len = stream.read(buf)) != -1) {
            digest.update(buf, 0, len);
        }
    }
    byte[] md5sum = digest.digest();


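However, we can take advantage of the fact that a JAR already stores a CRC for each file in the archive.
These values are read from the archive's entry metadata, so no file data has to be read or decompressed. It is therefore enough to compute the MD5 digest over the sequence of CRC values.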
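The claim that each entry's CRC is already stored in the archive can be checked with a small sketch. It writes a one-entry JAR to a temporary file, reads the recorded CRC back through `JarEntry.getCrc()`, and compares it with a CRC-32 computed directly over the same bytes. The class name `StoredCrcDemo`, its helper methods, and the entry name `hello.txt` are hypothetical and exist only for this illustration.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.zip.CRC32;

public class StoredCrcDemo {

    // Writes a one-entry JAR to a temporary file (the entry name is arbitrary).
    static File writeJar(String entryName, byte[] content) throws IOException {
        File file = File.createTempFile("crc-demo", ".jar");
        file.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(file))) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(content);
            out.closeEntry();
        }
        return file;
    }

    // Returns the CRC recorded for the entry in the archive metadata.
    static long storedCrc(File jarFile, String entryName) throws IOException {
        try (JarFile jar = new JarFile(jarFile)) {
            return jar.getJarEntry(entryName).getCrc();
        }
    }

    // Computes CRC-32 directly over the raw bytes, for comparison.
    static long computedCrc(byte[] content) {
        CRC32 crc = new CRC32();
        crc.update(content);
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        File jar = writeJar("hello.txt", content);
        // Both values should be the same CRC-32 of the entry content.
        System.out.println(storedCrc(jar, "hello.txt") == computedCrc(content));
    }
}
```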

Final version


    import java.io.*;
    import java.math.*;
    import java.security.*;
    import java.util.*;
    import java.util.jar.*;

    public class JarFileChecksum {
        private final File jarFile;

        public JarFileChecksum(File jarFile) {
            this.jarFile = jarFile;
        }

        public String getChecksum() throws IOException, NoSuchAlgorithmException {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[4];
            // try-with-resources closes the archive even if an exception is thrown
            try (JarFile jar = new JarFile(jarFile)) {
                for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements(); ) {
                    JarEntry entry = e.nextElement();
                    // CRC-32 fits in 32 bits, so the cast to int keeps all of its bits
                    int crc = (int) entry.getCrc();
                    // split the CRC into four bytes, most significant byte first
                    buf[0] = (byte) ((crc >> 24) & 0xFF);
                    buf[1] = (byte) ((crc >> 16) & 0xFF);
                    buf[2] = (byte) ((crc >> 8) & 0xFF);
                    buf[3] = (byte) (crc & 0xFF);
                    digest.update(buf);
                }
            }
            byte[] md5sum = digest.digest();
            // convert to a hex string; %032x keeps leading zeros,
            // which BigInteger.toString(16) would drop
            return String.format("%032x", new BigInteger(1, md5sum));
        }
    }
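To see how a checksum built from CRC values reacts to changes, the following self-contained sketch may help. It re-implements the same idea in a static method so that it runs on its own, builds small JARs in temporary files, and compares the resulting digests. The class name `CrcDigestDemo`, the helpers `writeJar` and `crcDigest`, and the entry names are hypothetical and are not part of the original article.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class CrcDigestDemo {

    // Writes the given name -> content pairs to a temporary JAR, in map order.
    static File writeJar(Map<String, String> entries) throws Exception {
        File file = File.createTempFile("crc-digest-demo", ".jar");
        file.deleteOnExit();
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(file))) {
            for (Map.Entry<String, String> e : entries.entrySet()) {
                out.putNextEntry(new JarEntry(e.getKey()));
                out.write(e.getValue().getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return file;
    }

    // MD5 over the big-endian CRC values of all entries, as a 32-character hex string.
    static String crcDigest(File jarFile) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        try (JarFile jar = new JarFile(jarFile)) {
            for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements(); ) {
                int crc = (int) e.nextElement().getCrc();
                // ByteBuffer is big-endian by default: most significant byte first
                digest.update(ByteBuffer.allocate(4).putInt(crc).array());
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> original = new LinkedHashMap<>();
        original.put("a.txt", "first file");
        original.put("b.txt", "second file");

        Map<String, String> modified = new LinkedHashMap<>(original);
        modified.put("b.txt", "second file, edited");

        String same1 = crcDigest(writeJar(original));
        String same2 = crcDigest(writeJar(original));
        String changed = crcDigest(writeJar(modified));

        System.out.println("identical contents match: " + same1.equals(same2));
        System.out.println("edited contents differ:   " + !same1.equals(changed));
    }
}
```

Only the CRC values feed the digest here, so entry names and timestamps do not influence the result.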


Tests


A 1.6 GB JAR file containing 37,000 files and 1,500 directories was used for testing.

First method (MD5 of the entire file contents): 140 seconds
Second method (MD5 of the CRC sequence): 0.5 seconds

Source: https://habr.com/ru/post/91699/

