Java programmer cheat sheet 7.2 Typical tasks: Traversing the Map, counting the number of occurrences of a substring

I have a hobby: I collect various solutions of typical tasks in Java, which I find in the internet, and I try to choose the most optimal in size / performance / elegance. First of all in performance. Let's consider such typical tasks that are often found in Java programming as “traversing Mapes” and counting the number of occurrences of strings, different versions of their solutions (including “beautiful” and not so much) and their performance.

English versions can be found on Stackoverflow: by traversing the map and by counting the occurrences of the substrings .
I also advise you to look at my opensource project useful-java-links - perhaps the most comprehensive collection of useful Java libraries and frameworks.

General table of contents 'cheat sheets'

1. JPA and Hibernate in questions and answers
2. Three hundred fifty most popular non-mobile Java opensource projects on github
3. Java collections (standard, guava, apache, trove, gs-collections, and others)
4. Java Stream API
5. Two hundred and fifty Russian-language teaching videos of lectures and reports on Java.
6. List of useful links for Java programmer
7 Typical tasks
7.1 Optimum way to convert an InputStream to a string
7.2 The most productive way to bypass the Map, count the number of occurrences of the substring
8. Libraries for working with Json (Gson, Fastjson, LoganSquare, Jackson, JsonPath and others)

1. Traversing the Map

So, if you search the Internet there are a dozen different ways to crawl the map, in general, when we are important and keys and values (of course, the crawling of just keys or values is done elementary). Some of them differ only in syntactic sugar, some fundamentally different. In fact, of course, which solutions are slow and which have been known for a long time, but it’s still interesting how much we can do in parrots.

For completeness, we will consider not only the bypass of ordinary Map, but also their counterparts from IterableMap from Apache Collections and MutableMap of Eclipse (CS) collections .

Let's look at the options:

The conditions of the problem, let us have a Map of numbers, we will look for the sum of all keys and all values of this Map. On the one hand, such a task is simplified to the maximum, on the other hand, it guarantees that the Java optimizer will not be able to throw out part of the code.

Using iterator , Map.Entry and while loop. Not the most beautiful option, in fact, an analogue of option (2), but let's see how they differ.

 long i = 0; Iterator<Map.Entry<Integer, Integer>> it = map.entrySet().iterator(); while (it.hasNext()) { Map.Entry<Integer, Integer> pair = it.next(); i += pair.getKey() + pair.getValue(); }

Using Map.Entry and foreach loop . The classic version of the map bypass.

 long i = 0; for (Map.Entry<Integer, Integer> pair : map.entrySet()) { i += pair.getKey() + pair.getValue(); }

Using foreach from Java 8 . Let's see how productive the new method, which appeared in Java 8.
```
 final long[] i = {0}; map.forEach((k, v) -> i[0] += k + v); 
```
Using keySet and foreach . It is believed that this method is very slow, the only question is how much.
```
 long i = 0; for (Integer key : map.keySet()) { i += key + map.get(key); } 
```

Using keySet and iterator . In fact, option 4 only without syntactic sugar.

 long i = 0; Iterator<Integer> itr2 = map.keySet().iterator(); while (itr2.hasNext()) { Integer key = itr2.next(); i += key + map.get(key); }

Using for and Map.Entry . Analogue 1 and 2, only in the "old style"

 long i = 0; for (Iterator<Map.Entry<Integer, Integer>> entries = map.entrySet().iterator(); entries.hasNext(); ) { Map.Entry<Integer, Integer> entry = entries.next(); i += entry.getKey() + entry.getValue(); }

Using Java 8 and Stream Api
```
 final long[] i = {0}; map.entrySet().stream().forEach(e -> i[0] += e.getKey() + e.getValue()); //    
```
7a. Using Java 8 and Stream Api with mapToLong and sum

map.entrySet (). stream (). mapToLong (e -> e.getKey () + e.getValue ()). sum ();

Using Java 8 and Parallel Stream Api

  final long[] i = {0}; map.entrySet().stream().parallel().forEach(e -> i[0] += e.getKey() + e.getValue()); //   ,      ,     ,   8

8a. Using Java 8 and parallel Stream Api with mapToLong and sum

  map.entrySet().parallelStream().mapToLong(e -> e.getKey() + e.getValue()).sum();

Using IterableMap from Apache Collections . This map'a differs an unusual way around.

 long i = 0; MapIterator<Integer, Integer> it = iterableMap.mapIterator(); while (it.hasNext()) { i += it.next() + it.getValue(); }

Using the MutableMap from Eclipse collections (formerly GS collections). This map received its forEach method much earlier than Java 8 appeared.
```
 final long[] i = {0}; mutableMap.forEachKeyValue((key, value) -> { i[0] += key + value; }); 
```

Performance tests

Typical warning : All results are given as they are, performance measurements are complicated, all numbers in your system may be completely different, so you shouldn’t believe the source codes on github , you can always get the results yourself. If you think that I made a mistake somewhere when measuring performance (and this is always possible), I will be grateful if you write in a personal or comments.

Mode = average time (AverageTime), system = Win 8.1 64-bit, Intel i7-4790 3.60GHz 3.60GHz, 16 GB, the lower the score the better)

1) For small maps (100 items), score 0.312 is the best value.

  Benchmark Size Mode Cnt Score Error Units 3. ForEachAndJava8 100 avgt 100 0.312 ± 0.003 us/op 10. EclipseMap 100 avgt 100 0.354 ± 0.003 us/op 2. ForEachAndMapEntry 100 avgt 100 0.403 ± 0.005 us/op 1. WhileAndMapEntry 100 avgt 100 0.427 ± 0.006 us/op 6. ForAndIterator 100 avgt 100 0.427 ± 0.006 us/op 7a Java8StreamApi2 100 avgt 100 0.509 ± 0.036 us/op 7. Java8StreamApi 100 avgt 100 0.529 ± 0.004 us/op 9. ApacheIterableMap 100 avgt 100 0.585 ± 0.008 us/op 4. KeySetAndForEach 100 avgt 100 0.937 ± 0.011 us/op 5. KeySetAndIterator 100 avgt 100 0.94 ± 0.011 us/op 8. Java8StreamApiParallel 100 avgt 100 6.066 ± 0.051 us/op 8a. Java8StreamApiparallel2 100 avgt 5 9.468 ± 0.333 us/op

2) For a medium map with 10,000 elements, score 35.301 is the best value.

  Benchmark Size Mode Cnt Score Error Units 10. EclipseMap 10000 avgt 100 35.301 ± 0.697 us/op 3. ForEachAndJava8 10000 avgt 100 39.797 ± 0.309 us/op 2. ForEachAndMapEntry 10000 avgt 100 43.149 ± 0.313 us/op 1. WhileAndMapEntry 10000 avgt 100 43.295 ± 0.314 us/op 6. ForAndIterator 10000 avgt 100 44.009 ± 0.379 us/op 7. Java8StreamApi 10000 avgt 100 49.378 ± 0.415 us/op 5. KeySetAndIterator 10000 avgt 100 97.844 ± 0.896 us/op 4. KeySetAndForEach 10000 avgt 100 99.317 ± 0.862 us/op 8. Java8StreamApiParallel 10000 avgt 100 112.364 ± 0.167 us/op 9. ApacheIterableMap 10000 avgt 100 138.379 ± 1.387 us/op

3) For a map with 30000 elements, score 122.277 is the best value.

  Benchmark Size Mode Cnt Score Error Units 8a. Java8StreamApiparallel2 30000 avgt 100 95.444 ± 13.706 us/op 10. EclipseMap 30000 avgt 100 122.277 ± 3.896 us/op 3. ForEachAndJava8 30000 avgt 100 136.906 ± 2.392 us/op 2. ForEachAndMapEntry 30000 avgt 100 145.845 ± 1.895 us/op 1. WhileAndMapEntry 30000 avgt 100 149.186 ± 2.621 us/op 6. ForAndIterator 30000 avgt 100 149.353 ± 2.427 us/op 7a. Java8StreamApi2 30000 avgt 100 180.803 ± 53.062 us/op 7. Java8StreamApi 30000 avgt 100 181.114 ± 3.272 us/op 8. Java8StreamApiParallel 30000 avgt 100 342.546 ± 1.206 us/op 5. KeySetAndIterator 30000 avgt 100 350.564 ± 8.662 us/op 4. KeySetAndForEach 30000 avgt 100 364.362 ± 9.416 us/op 9. ApacheIterableMap 30000 avgt 100 536.749 ± 25.819 us/op

Please note case 8a (parallel stream with mapToInt and sum) showed good results precisely because of working with numerical data, it is far from always applicable in real problems.

Schedule (tests depending on the size of the map)

Table (tests depending on the size of the map)

  Benchmark 100 500 900 1300 1700 2100 2500 10. EclipseMap 0.354 1.384 3.816 3.304 6.68 7.427 8.712 3. ForEachAndJava8 0.312 1.692 3.143 4.265 6.506 8.343 9.821 6. ForAndIterator 0.427 2.089 3.746 4.776 7.407 9.091 10.753 2. ForEachAndMapEntry 0.403 2.536 3.951 5.028 7.503 9.211 10.918 1. WhileAndMapEntry 0.427 2.026 3.4815 4.937 7.511 9.217 12.22 7. Java8StreamApi 0.529 2.343 4.264 5.399 8.826 12.633 12.918 9. ApacheIterableMap 0.585 2.725 5.574 9.292 13.01 17.719 23.882 5. KeySetAndIterator 0.94 4.592 8.24 12.496 16.077 21.012 24.389 4. KeySetAndForEach 0.937 4.572 8.251 12.522 14.831 20.502 24.881 8. Java8StreamApiParallel 6.066 12.152 16.563 18.512 25.987 28.813 33.336

All tests on github

Warning : Bypassing HashMap, EclipseMap and ApacheIterableMap was considered only for a typical case when there are few or no collisions, in the case of many conflicts, the results can be quite different. The cases of adding elements, random access, etc. are also not considered, that is, it is impossible to judge by this test which map is faster and better overall;

2. Counting the number of occurrences of a substring in a string

So take the following substring and find all the points in it:

  String testString = "abcd";

I wanted to warn you right away: I took all the options suggested in the StackOverflow topic for testing, common sense tells me that some of them are far from optimal nailing with a favorite microscope, but I was just curious about what happens in each case. Therefore, keep in mind that not all options below should be used in a real application !

1) Using Apache Commons

 int apache = StringUtils.countMatches(testString, "."); System.out.println("apache = " + apache);

2) Using Spring Framework's

 int spring = org.springframework.util.StringUtils.countOccurrencesOf(testString, "."); System.out.println("spring = " + spring);

3) Using replace

 int replace = testString.length() - testString.replace(".", "").length(); System.out.println("replace = " + replace);

4) Using replaceAll (case 1)

 int replaceAll = testString.replaceAll("[^.]", "").length(); System.out.println("replaceAll = " + replaceAll);

5) Using replaceAll (case 2)

 int replaceAllCase2 = testString.length() - testString.replaceAll("\\.", "").length(); System.out.println("replaceAll (second case) = " + replaceAllCase2);

6) Using split

 int split = testString.split("\\.",-1).length-1; System.out.println("split = " + split);

7) Using Java8 (case 1). Please note that unlike other options, you can only search for characters, not substrings.

 long java8 = testString.chars().filter(ch -> ch =='.').count(); System.out.println("java8 = " + java8);

8) Using Java8 (case 2), it is perhaps somewhat better in the case of unicode strings than case 1. Note that unlike other options, only characters can be searched for, not substrings.

 long java8Case2 = testString.codePoints().filter(ch -> ch =='.').count(); System.out.println("java8 (second case) = " + java8Case2);

9) Using StringTokenizer

 int stringTokenizer = new StringTokenizer(" " +testString + " ", ".").countTokens()-1; System.out.println("stringTokenizer = " + stringTokenizer);

The latter option is somewhat incorrect, since two points in a row will be counted as one, that is, for a ... bc ... d or ... abcd or a .... b ...... c ... ..d ... several points will be counted as one.

Typical warning : All results are given as they are, performance measurements are complicated, all numbers in your system may be completely different, so you shouldn’t believe the source codes on github , you can always get the results yourself. If you think that I made a mistake somewhere when measuring performance (and this is always possible), I will be grateful if you write in a personal or comments.

All tests on github

The results of measurements on the short line (using JMH , mode = AverageTime, a value of 0.010 better than 0.351 ):

 Benchmark Mode Cnt Score Error Units 1. countMatches avgt 5 0.010 ± 0.001 us/op 2. countOccurrencesOf avgt 5 0.010 ± 0.001 us/op 3. stringTokenizer avgt 5 0.028 ± 0.002 us/op 4. java8_1 avgt 5 0.077 ± 0.005 us/op 5. java8_2 avgt 5 0.078 ± 0.003 us/op 6. split avgt 5 0.137 ± 0.009 us/op 7. replaceAll_2 avgt 5 0.302 ± 0.047 us/op 8. replace avgt 5 0.303 ± 0.034 us/op 9. replaceAll_1 avgt 5 0.351 ± 0.045 us/op

The results of measurements on a long string of 2142 characters (using JMH , mode = AverageTime, a value of 0.010 better than 0.351 ):

 Benchmark Mode Cnt Score Error Units 1. countMatches avgt 5 2.392 ± 0.172 us/op 2. countOccurrencesOf avgt 5 2.362 ± 0.060 us/op 3. stringTokenizer avgt 5 5.931 ± 0.112 us/op 4. java8 avgt 5 9.626 ± 0.463 us/op 5. java8_1 avgt 5 8.586 ± 0.251 us/op 6. split avgt 5 21.201 ± 1.037 us/op 7. replaceAll2 avgt 5 26.614 ± 1.026 us/op 8. replaceAll1 avgt 5 31.505 ± 1.046 us/op 9. replace avgt 5 33.462 ± 2.329 us/op

PS English versions can be found on Stackoverflow: by traversing the map and counting the occurrences of substrings , the source codes in my project useful-java-links . I would be grateful for the pluses in SO or github, if you liked the article and think that they are deserved.
PPS I also advise you to look at my opensource project useful-java-links - perhaps the most comprehensive collection of useful Java libraries, frameworks and Russian-language instructional videos. There is also a similar English version of this project and I’m starting the opensource sub-project of Hello world to prepare a collection of simple examples for different Java libraries in one maven project (I will be grateful for any help).

General table of contents 'cheat sheets'

Source: https://habr.com/ru/post/278313/

All Articles

Java programmer cheat sheet 7.2 Typical tasks: Traversing the Map, counting the number of occurrences of a substring

1. Traversing the Map

Performance tests

2. Counting the number of occurrences of a substring in a string

More articles: