
Java programmer cheat sheet 7.1 Typical tasks: The optimal way to convert an InputStream to a string


I have a hobby: I collect solutions to typical Java tasks that I find on the internet and try to pick the best one in terms of size, performance, and elegance, with performance first. Let's take a task that comes up often in Java programming, converting an InputStream to a string, and look at the various ways to solve it.


Let's also see what limitations each solution has (a required library or Java version, correct handling of Unicode, and so on). The English version of this article can be found in my answer on Stack Overflow. The tests are in my GitHub project.



Convert InputStream to String


A very common task, let's consider in what ways it can be done (there will be 11 of them):


  1. Using IOUtils.toString from the Apache Commons library. One of the shortest one-liners.


     String result = IOUtils.toString(inputStream, StandardCharsets.UTF_8); 

  2. Using CharStreams from the guava library. Also pretty short code.


     try (InputStreamReader reader = new InputStreamReader(inputStream, Charsets.UTF_8)) {
         String result = CharStreams.toString(reader);
     }

  3. Using Scanner (JDK). The solution is short, clever, and pure JDK, but it is more of a hack that will puzzle anyone who does not know the trick.


     try (Scanner s = new Scanner(inputStream).useDelimiter("\\A")) {
         String result = s.hasNext() ? s.next() : "";
     }

  4. Using the Stream Api (Java 8). Warning: it replaces different line breaks (such as \r\n) with \n, which can sometimes be critical.


     try (BufferedReader br = new BufferedReader(new InputStreamReader(inputStream))) {
         String result = br.lines().collect(Collectors.joining("\n"));
     }

  5. Using the parallel Stream Api (Java 8). Warning: like solution 4, it replaces different line breaks (such as \r\n) with \n.


     try (BufferedReader br = new BufferedReader(new InputStreamReader(inputStream))) {
         String result = br.lines().parallel().collect(Collectors.joining("\n"));
     }

  6. Using InputStreamReader and StringBuilder from a regular JDK


     final int bufferSize = 1024;
     final char[] buffer = new char[bufferSize];
     final StringBuilder out = new StringBuilder();
     try (Reader in = new InputStreamReader(inputStream, "UTF-8")) {
         for (; ; ) {
             int rsz = in.read(buffer, 0, buffer.length);
             if (rsz < 0) break;
             out.append(buffer, 0, rsz);
         }
         return out.toString();
     }

  7. Using StringWriter and IOUtils.copy from Apache Commons


     try (StringWriter writer = new StringWriter()) {
         IOUtils.copy(inputStream, writer, "UTF-8");
         return writer.toString();
     }

  8. Using ByteArrayOutputStream and inputStream.read from JDK


     try (ByteArrayOutputStream result = new ByteArrayOutputStream()) {
         byte[] buffer = new byte[1024];
         int length;
         while ((length = inputStream.read(buffer)) != -1) {
             result.write(buffer, 0, length);
         }
         return result.toString("UTF-8");
     }

  9. Using BufferedReader from the JDK. Warning: this solution replaces different line breaks (such as \r\n) with the line.separator system property (for example, "\r\n" on Windows).


     String newLine = System.getProperty("line.separator");
     try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream))) {
         StringBuilder result = new StringBuilder();
         String line;
         boolean flag = false;
         while ((line = reader.readLine()) != null) {
             result.append(flag ? newLine : "").append(line);
             flag = true;
         }
         return result.toString();
     }

  10. Using BufferedInputStream and ByteArrayOutputStream from JDK


     try (BufferedInputStream bis = new BufferedInputStream(inputStream);
          ByteArrayOutputStream buf = new ByteArrayOutputStream()) {
         int result = bis.read();
         while (result != -1) {
             buf.write((byte) result);
             result = bis.read();
         }
         return buf.toString();
     }

  11. Using inputStream.read() and StringBuilder (JDK). Warning: this solution does not work with Unicode, for example with Russian text, because each byte is cast directly to a char and multi-byte characters get corrupted.


     int ch;
     StringBuilder sb = new StringBuilder();
     while ((ch = inputStream.read()) != -1) {
         sb.append((char) ch);
     }
     return sb.toString();
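
Solution 3 is the least obvious one in the list, so a short sketch of why it works may help. The pattern "\\A" matches only at the very beginning of the input, so the Scanner never finds a second delimiter and hands back the whole stream as a single token. The names `ScannerTrickDemo` and `readAll` are made up for illustration, and the explicit UTF-8 charset is an addition here: the snippet in the list relies on the platform default charset.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class ScannerTrickDemo {

    // Hypothetical helper: reads the whole stream as one Scanner token.
    // "\\A" matches only at the start of the input, so no further delimiter
    // is ever found and next() returns everything up to the end of the stream.
    static String readAll(InputStream in) {
        try (Scanner s = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
            // hasNext() is false for an empty stream, hence the fallback to "".
            return s.hasNext() ? s.next() : "";
        }
    }

    public static void main(String[] args) {
        byte[] data = "first line\r\nsecond line".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(data));

        // Line breaks are preserved because the text is never split into lines.
        System.out.println(text.equals("first line\r\nsecond line"));
        System.out.println(readAll(new ByteArrayInputStream(new byte[0])).isEmpty());
    }
}
```

Closing the Scanner also closes the underlying stream, which may or may not be what the caller wants.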


Notes on usage:


  1. Solutions 4, 5 and 9 convert different line breaks into one.


  2. Solution 11 does not work with Unicode text.


  3. Solutions 1 and 7 require the Apache Commons library, solution 2 requires the Guava library, and solutions 4 and 5 require Java 8 or higher.
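
The first two notes are easy to check on a small input. The sketch below feeds the same UTF-8 bytes (a Russian word followed by Windows-style line breaks) through simplified versions of solutions 8, 4 and 11. The class and method names are hypothetical, IOException is wrapped in UncheckedIOException only to keep the demo short, and the line-based variant passes an explicit charset, which the snippet in the list does not.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class ConversionCaveatsDemo {

    // Solution 8 style: copy the raw bytes and decode them once at the end.
    static String viaByteArray(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int length;
            while ((length = in.read(buffer)) != -1) {
                out.write(buffer, 0, length);
            }
            return out.toString("UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Solution 4 style: split into lines, then rejoin them with "\n".
    static String viaLines(InputStream in) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return br.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Solution 11 style: each byte is cast to a char, so a character that
    // takes two bytes in UTF-8 turns into two unrelated chars.
    static String viaByteCast(InputStream in) {
        try {
            StringBuilder sb = new StringBuilder();
            int ch;
            while ((ch = in.read()) != -1) {
                sb.append((char) ch);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static InputStream utf8(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // A six-letter Russian word (written with Unicode escapes), then CRLF line breaks.
        String text = "\u041F\u0440\u0438\u0432\u0435\u0442\r\nworld\r\n";

        System.out.println(viaByteArray(utf8(text)).equals(text)); // true: bytes decoded as-is
        System.out.println(viaLines(utf8(text)).equals(text));     // false: CRLF became LF
        System.out.println(viaByteCast(utf8(text)).equals(text));  // false: Cyrillic is mangled
    }
}
```

Note that the line-based variant also drops the trailing line break, since the joiner only puts separators between lines.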

Performance measurements


Warning: performance measurements always depend heavily on the system, the measurement conditions, and so on. I measured on two different computers: one running Windows 8.1 (Intel i7-4790 CPU 3.60GHz * 2, 16Gb), the other running Linux Mint 17.2 (Celeron Dual-Core T3500 2.10Ghz * 2, 6Gb). I cannot guarantee that the results are absolutely accurate, but you can always repeat the tests (test1 and test2) on your own system.
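
For a quick sanity check without the benchmark project, a naive timing loop along the following lines may be enough to see the rough ordering on your own machine. This is only a sketch and not the harness used for the numbers in this article: it relies on System.nanoTime with a simple warm-up, it times the creation of the ByteArrayInputStream together with the conversion, and all names in it (`NaiveTimingSketch`, `averageMicros`, `sampleData`) are invented for illustration. A dedicated benchmarking harness will give far more trustworthy results.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

class NaiveTimingSketch {

    // Solution 8 style: buffered byte copy, decoded once at the end.
    static String viaByteArray(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int length;
            while ((length = in.read(buffer)) != -1) {
                out.write(buffer, 0, length);
            }
            return out.toString("UTF-8");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Solution 11 style: one read() call per byte (only safe for ASCII input).
    static String viaSingleByteReads(InputStream in) {
        try {
            StringBuilder sb = new StringBuilder();
            int ch;
            while ((ch = in.read()) != -1) {
                sb.append((char) ch);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Average time per conversion in microseconds, measured after a warm-up phase.
    static double averageMicros(Function<InputStream, String> converter,
                                byte[] data, int warmup, int iterations) {
        long sink = 0; // accumulates result lengths so the work is not optimized away
        for (int i = 0; i < warmup; i++) {
            sink += converter.apply(new ByteArrayInputStream(data)).length();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += converter.apply(new ByteArrayInputStream(data)).length();
        }
        long elapsedNanos = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("length sum cannot be negative");
        }
        return elapsedNanos / 1_000.0 / iterations;
    }

    // ASCII-only sample text of exactly the requested length.
    static byte[] sampleData(int length) {
        StringBuilder sb = new StringBuilder();
        while (sb.length() < length) {
            sb.append("lorem ipsum dolor sit amet\n");
        }
        sb.setLength(length);
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (int length : new int[] {175, 50100}) {
            byte[] data = sampleData(length);
            System.out.printf("length=%d byteArray=%.3f us/op singleByte=%.3f us/op%n",
                    length,
                    averageMicros(NaiveTimingSketch::viaByteArray, data, 5_000, 5_000),
                    averageMicros(NaiveTimingSketch::viaSingleByteReads, data, 5_000, 5_000));
        }
    }
}
```

The two sample lengths simply mirror the string lengths used in the measurements in this section; the iteration counts are arbitrary.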


Performance measurements for short strings (length = 175); the tests can be found on GitHub (mode = average execution time (AverageTime), system = Linux Mint 17.2, Celeron Dual-Core T3500 2.10Ghz * 2, 6Gb; lower is better, 1,343 is the best):


  Benchmark                                      Mode  Cnt     Score      Error  Units
  8. ByteArrayOutputStream and read (JDK)        avgt   10     1,343 ±    0,028  us/op
  6. InputStreamReader and StringBuilder (JDK)   avgt   10     6,980 ±    0,404  us/op
  10.BufferedInputStream, ByteArrayOutputStream  avgt   10     7,437 ±    0,735  us/op
  11.InputStream.read() and StringBuilder (JDK)  avgt   10     8,977 ±    0,328  us/op
  7. StringWriter and IOUtils.copy (Apache)      avgt   10    10,613 ±    0,599  us/op
  1. IOUtils.toString (Apache Utils)             avgt   10    10,605 ±    0,527  us/op
  3. Scanner (JDK)                               avgt   10    12,083 ±    0,293  us/op
  2. CharStreams (guava)                         avgt   10    12,999 ±    0,514  us/op
  4. Stream Api (Java 8)                         avgt   10    15,811 ±    0,605  us/op
  9. BufferedReader (JDK)                        avgt   10    16,038 ±    0,711  us/op
  5. parallel Stream Api (Java 8)                avgt   10    21,544 ±    0,583  us/op

Performance measurements for long strings (length = 50100); the tests can be found on GitHub (mode = average execution time (AverageTime), system = Linux Mint 17.2, Celeron Dual-Core T3500 2.10Ghz * 2, 6Gb; lower is better, 200,715 is the best):


  Benchmark                                      Mode  Cnt     Score      Error  Units
  8. ByteArrayOutputStream and read (JDK)        avgt   10   200,715 ±   18,103  us/op
  1. IOUtils.toString (Apache Utils)             avgt   10   300,019 ±    8,751  us/op
  6. InputStreamReader and StringBuilder (JDK)   avgt   10   347,616 ±  130,348  us/op
  7. StringWriter and IOUtils.copy (Apache)      avgt   10   352,791 ±  105,337  us/op
  2. CharStreams (guava)                         avgt   10   420,137 ±   59,877  us/op
  9. BufferedReader (JDK)                        avgt   10   632,028 ±   17,002  us/op
  5. parallel Stream Api (Java 8)                avgt   10   662,999 ±   46,199  us/op
  4. Stream Api (Java 8)                         avgt   10   701,269 ±   82,296  us/op
  10.BufferedInputStream, ByteArrayOutputStream  avgt   10   740,837 ±    5,613  us/op
  3. Scanner (JDK)                               avgt   10   751,417 ±   62,026  us/op
  11.InputStream.read() and StringBuilder (JDK)  avgt   10  2919,350 ± 1101,942  us/op

[Figure: average time versus string length, Windows 8.1, Intel i7-4790 CPU 3.60GHz 3.60GHz, 16Gb]


Average time as a function of string length, Windows 8.1, Intel i7-4790 CPU 3.60GHz 3.60GHz, 16Gb (columns are string lengths):


                182      546     1092     3276     9828    29484    58968
  test8        0.38    0.938    1.868    4.448   13.412   36.459   72.708
  test4       2.362    3.609    5.573   12.769    40.74   81.415  159.864
  test5       3.881    5.075    6.904   14.123   50.258  129.937  166.162
  test9       2.237    3.493    5.422   11.977    45.98   89.336   177.39
  test6       1.261     2.12     4.38   10.698   31.821   86.106  186.636
  test7       1.601    2.391    3.646    8.367   38.196  110.221  211.016
  test1       1.529    2.381    3.527    8.411   40.551   105.16  212.573
  test3       3.035    3.934    8.606   20.858   61.571  118.744  235.428
  test2       3.136    6.238   10.508    33.48   43.532  118.044  239.481
  test10      1.593    4.736    7.527   20.557   59.856  162.907  323.147
  test11      3.913   11.506    23.26   68.644  207.591  600.444 1211.545

Conclusions


  1. The fastest solution in all cases and on all systems was test 8: using ByteArrayOutputStream and inputStream.read from the JDK


     ByteArrayOutputStream result = new ByteArrayOutputStream();
     byte[] buffer = new byte[1024];
     int length;
     while ((length = inputStream.read(buffer)) != -1) {
         result.write(buffer, 0, length);
     }
     return result.toString("UTF-8");

  2. A short and very quick solution would be to use Apache Commons IOUtils.toString


     String result = IOUtils.toString(inputStream, StandardCharsets.UTF_8); 

  3. Java 8's Stream Api shows middling times, and parallel streams only make sense when the string is fairly large; otherwise they are very slow (which was to be expected).


  4. Solution 11 is best avoided altogether: it is slow (by far the slowest on long strings) and does not work with Unicode.

PS


  1. The English version of this article can be found in my answer on Stack Overflow, and the tests are in my GitHub project. If you liked this article, I would be grateful for an upvote on Stack Overflow.
  2. I would be very grateful for any comments, corrections, or other ways to convert an InputStream to a string.
  3. I also recommend taking a look at my open-source project useful-java-links, perhaps the most comprehensive collection of useful Java libraries, frameworks, and Russian-language instructional videos. There is also a similar English version of this project, and I am starting an open-source sub-project, Hello world, to build a collection of simple examples for different Java libraries in one Maven project (I would be grateful for any help).



Source: https://habr.com/ru/post/278233/

