
How to make MD5 in Java and PHP get along for UTF-8 strings

Preamble


At some point my company decided to integrate a forum written in PHP with our employee management system written in Java. Integration here means updating an employee's forum account whenever his data changes in our system. The job was handed to me (the PHP part) and to my colleague Ivan (the Java part). I write a small web API; he writes a function that, whenever an employee's data changes in our system, calls the API and updates the employee's forum account. The task is small: three days at most to write and debug everything. Naturally, we didn't want to bother with encryption, since there is nothing secret about a name, position, work phone number, or other employee data. But we still had to protect ourselves against someone else calling the API and changing an employee's data. We decided to sign each message with a magic phrase, namely MD5(login + position + salt), where salt is a constant string. We implemented all of this, started testing, and found that the MD5 calculated for an employee in PHP did not match the one calculated for the same employee in our Java system. The data on both sides was in UTF-8. So I decided to figure out what was wrong.


Formulation of the problem


Given: a UTF-8 encoded string whose MD5 hash is needed.
Required: determine why the MD5 hashes calculated by Java and PHP differ.

The process of finding a solution


Let's take the classic string from the manuals, "Привет, мир!" ("Hello, world!" in Russian), encoded in UTF-8, and compare its hashes in PHP and Java.

Writing the PHP script


This part is simple. Create a file in UTF-8 encoding (I used Notepad++ for this) and put the following code in it:

    <?php
    header("Content-Type: text/html; charset=UTF-8");
    $utf8string = "Привет, мир!";
    echo '<pre>' . $utf8string . '</pre>';
    echo '<pre>' . md5($utf8string) . '</pre>';
    ?>


I added header("Content-Type: text/html; charset=UTF-8"); so that I would not have to switch the encoding in the browser (my Apache from Denwer defaulted to windows-1251, of course).

Here is what the browser shows:
Привет, мир!
c446a2994f35689482651b7c7ba8b56c



Writing a console program in Java


Similarly, create a file in UTF-8 encoding and write the following code:

    public class Md5Tester {
        public static void main(String[] args)
                throws java.io.UnsupportedEncodingException, java.security.NoSuchAlgorithmException {
            // Print through a UTF-8 stream so that the console output is not mangled.
            java.io.PrintStream sysout = new java.io.PrintStream(System.out, true, "UTF-8");
            String utf8_string = "Привет, мир!";
            sysout.println(utf8_string);
            java.security.MessageDigest md5 = java.security.MessageDigest.getInstance("MD5");
            // getBytes() with no argument uses the JVM default charset.
            byte[] md5_byte_array = md5.digest(utf8_string.getBytes());
            // Decodes the 16 raw digest bytes as if they were text.
            String md5_string = new String(md5_byte_array);
            sysout.println(md5_string);
        }
    }


Run (I did this in IntelliJ IDEA):
C:\Sun\SDK\jdk\bin\java -Didea.launcher.port=7552 "-Didea.launcher.bin.path=C:\Program Files (x86)\JetBrains\IntelliJ IDEA 8.1.3\bin" -Dfile.encoding=UTF-8 ...
Привет, мир!
F O5h e | { l


(I dropped everything after -Dfile.encoding=UTF-8 in the launch command so as not to clutter the example.)
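The -Dfile.encoding=UTF-8 flag in the launch command matters more than it looks: getBytes() with no argument encodes the string in the JVM default charset, so the same program may produce a different hash when the JVM is started differently. Below is a minimal sketch of the effect. It assumes Java 7 or later for StandardCharsets, and the class name CharsetMd5Sketch and the helper md5Hex are made up for illustration; they are not part of the original program.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class: shows that MD5 depends on the bytes, and the bytes on the charset.
final class CharsetMd5Sketch {

    /** MD5 of the text encoded in the given charset, as 32 lowercase hex characters. */
    static String md5Hex(String text, Charset charset) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(text.getBytes(charset));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so every byte becomes exactly two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available in this JVM", e);
        }
    }

    public static void main(String[] args) {
        // "Привет" written with Unicode escapes, so the encoding of this source file does not matter.
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442";
        System.out.println(md5Hex(text, StandardCharsets.UTF_8));    // hash of the UTF-8 bytes
        System.out.println(md5Hex(text, StandardCharsets.UTF_16BE)); // different bytes, different hash
    }
}
```

Passing the charset explicitly, as in text.getBytes(StandardCharsets.UTF_8), removes the dependency on how the JVM was started.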

The console shows the MD5 hash, but as raw bytes rather than as a hexadecimal string. My first idea was to use BigInteger to get the hexadecimal string:

    ...
    java.math.BigInteger md5_biginteger = new java.math.BigInteger(1, md5_byte_array);
    sysout.println(md5_biginteger.toString(16));

Result:
Привет, мир!
c446a2994f35689482651b7c7ba8b56c

It looks like we got what we wanted. But let's not rush, and compare the hashes of another string, one whose hash has a leading 0: "rbablord5". Checking:
rbablord5
9736a8436e10bf1991927f2ffc76c12

Whereas the correct hash is 09736a8436e10bf1991927f2ffc76c12. BigInteger treats the digest as a number, so leading zero digits are dropped when it is printed. This mistake is remarkably common; it even occurred in MySQL once (I found a bug report in their tracker: bugs.mysql.com/bug.php?id=27623 ). At that point I realized that I was obviously reinventing the wheel, and after a little searching I found the commons.apache.org/codec library. Once it is added to the project, you can simply write:
    String md5_string = DigestUtils.md5Hex(utf8_string);

And get the desired result. If you don't want to add an extra library to the project for the sake of a single MD5 function (although the library contains plenty of other useful things, see commons.apache.org/codec/api-release/index.html ), you can borrow its encodeHex function:

    private static final char[] DIGITS_LOWER = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};
    private static final char[] DIGITS_UPPER = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};

    protected static String encodeHex(byte[] data, char[] toDigits) {
        int l = data.length;
        char[] out = new char[l << 1];
        // two characters form the hex value.
        for (int i = 0, j = 0; i < l; i++) {
            out[j++] = toDigits[(0xF0 & data[i]) >>> 4];
            out[j++] = toDigits[0x0F & data[i]];
        }
        return new String(out);
    }
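For comparison, the BigInteger approach can also be repaired by padding the result on the left with zeros. Here is a minimal sketch that is not taken from the original code base: the class and method names are hypothetical, and the input "a" is used because its MD5 in the RFC 1321 test suite happens to start with a zero digit.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class: reproduces the lost leading zero and one way to restore it.
final class LeadingZeroSketch {

    /** Raw 16-byte MD5 digest of the UTF-8 bytes of the text. */
    static byte[] md5(String text) {
        try {
            return MessageDigest.getInstance("MD5").digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available in this JVM", e);
        }
    }

    /** The BigInteger conversion as is: a number has no leading zeros, so they vanish. */
    static String unpaddedHex(byte[] digest) {
        return new BigInteger(1, digest).toString(16);
    }

    /** The same conversion, left-padded with zeros to two hex digits per byte. */
    static String paddedHex(byte[] digest) {
        return String.format("%0" + (digest.length * 2) + "x", new BigInteger(1, digest));
    }

    public static void main(String[] args) {
        byte[] digest = md5("a");
        System.out.println(unpaddedHex(digest)); // cc175b9c0f1b6a831c399e269772661 (31 characters)
        System.out.println(paddedHex(digest));   // 0cc175b9c0f1b6a831c399e269772661
    }
}
```

Both this and encodeHex yield the fixed-width lowercase string that PHP's md5() returns; encodeHex simply never goes through a number, so there is nothing to pad.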



Conclusion


When setting up data exchange between two systems built on different technologies or programming languages, stay vigilant: do not assume that functions implementing the same algorithm agree on their input and output formats. More often than not you will have to put in some effort to reconcile the formats. But there is no need to reinvent the wheel (as I did): if both systems are reasonably widespread, someone has already solved this problem before you.
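As an illustration of this advice, the signature from the preamble, MD5(login + position + salt), could be computed on the Java side roughly as follows. This is a sketch under assumptions rather than the code of the real system: the class ForumSignatureSketch, the method sign, and the sample values are hypothetical. The two format decisions are spelled out, namely UTF-8 for the input bytes and 32 lowercase hex characters for the output.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: signs a message the way the preamble describes.
final class ForumSignatureSketch {

    /** MD5(login + position + salt) over UTF-8 bytes, as 32 lowercase hex characters. */
    static String sign(String login, String position, String salt) {
        String message = login + position + salt;
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5").digest(message.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available in this JVM", e);
        }
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            // High nibble first, then low nibble; forDigit yields lowercase 'a'..'f'.
            hex.append(Character.forDigit((b >> 4) & 0xF, 16));
            hex.append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Sample values are invented for the example.
        System.out.println(sign("ivanov", "engineer", "constant-salt"));
    }
}
```

On the PHP side the counterpart would be md5($login . $position . $salt), provided all three strings are stored in UTF-8.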

P.S. Unfortunately, I lost the piece of code that our Java system used before. I remember it was a homegrown contraption built on BigInteger, with some checks that were not very clear (to me, at least).

Source: https://habr.com/ru/post/73952/

