
All about String.intern()

I think many Java developers know, or have at least heard, of the String.intern() method. But not everyone uses it in their applications or has a clear idea of when it is useful, if at all. That was true of me until I ran into this method in one of my projects. I wanted to understand what it does and how to use it, and came across a very interesting article by Ethan Nicholas, a lead developer at Yahoo!. I now want to share a translation of it with the members of the Habr community who care about Java.

If you know this method only by hearsay, welcome below the cut.


Strings are a fundamental part of any modern programming language and are just as important as numbers. You would therefore expect Java programmers to have a firm grasp of them, but unfortunately this is not always the case.
Today I looked at the Xerces source code (XML parser included in Java) and came across a line that surprised me a lot:

com.sun.org.apache.xerces.internal.impl.XMLScanner: 395
protected final static String fVersionSymbol = "version".intern();

Then I found a few more strings defined like this one, and each of them was interned. So what is intern()? Well, as you undoubtedly know, there are two different ways to compare objects in Java. You can use the == operator, or you can use the equals() method. The == operator checks whether two references refer to the same object, while equals() checks whether two objects contain the same data.

One of the first lessons you learn when learning Java is that you should usually use equals() rather than == to compare two strings. If we compare, say, new String("Hello") == new String("Hello"), the result is false, because these are two different instances of the class. If you use equals(), you get true, as expected. Unfortunately, equals() can be quite slow, since it compares the strings character by character.
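The difference between the two checks can be sketched as follows. The class and method names are invented for this illustration and do not come from the original article.

```java
// Illustrative sketch; the class and method names are made up for this example.
final class StringComparisonDemo {

    // Identity check: true only if both references point to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Content check: true if both strings hold the same character sequence.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String first = new String("Hello");
        String second = new String("Hello");

        System.out.println(sameObject(first, second));  // false: two distinct instances
        System.out.println(sameContent(first, second)); // true: same characters
    }
}
```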

Because the == operator checks for identity, all it has to do is compare two pointers, and obviously that is much faster than equals(). So if you are going to compare the same strings many times, you can get a significant performance advantage by checking object identity instead of comparing characters.

The basic algorithm is:

1) Create a set of strings (a hash set)
2) Check whether the string (as a sequence of characters) you are dealing with is already in the set
3) If it is, use the string from the set
4) Otherwise, add this string to the set and then use it
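As a rough sketch, the four steps above could be written by hand like this. The SimpleStringPool class and its canonical() method are hypothetical names chosen for this example. The sketch uses a map from each string to itself rather than a plain set, on the assumption that we need to get the stored instance back, which java.util.Set does not offer directly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hand-rolled pool following the four steps above.
// This is not how the JVM implements String.intern(); it only mirrors the idea.
final class SimpleStringPool {

    // Step 1: the collection of known strings. Each string maps to itself
    // so that a lookup can return the instance already stored.
    private final Map<String, String> pool = new HashMap<>();

    String canonical(String s) {
        // Step 2: look the string up by its character content (equals/hashCode).
        String existing = pool.get(s);
        if (existing != null) {
            // Step 3: already known, so use the pooled instance.
            return existing;
        }
        // Step 4: first occurrence, so remember this instance and use it.
        pool.put(s, s);
        return s;
    }

    public static void main(String[] args) {
        SimpleStringPool pool = new SimpleStringPool();
        String a = pool.canonical(new String("Hello"));
        String b = pool.canonical(new String("Hello"));
        System.out.println(a == b); // true: both calls return the first pooled instance
    }
}
```

A real pool would also need to think about thread safety and about letting unused entries be garbage collected; both are left out here to keep the sketch short.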

This algorithm guarantees that if two strings are identical sequences of characters, they are the same instance. This means that you can safely compare strings using == instead of equals(), while gaining a significant performance advantage in repeated comparisons.

Fortunately, Java already includes an implementation of this algorithm. It is the intern() method of the java.lang.String class. The expression new String("Hello").intern() == new String("Hello").intern() returns true, while the same comparison without intern() returns false.
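A minimal sketch of that behaviour is shown below. The InternDemo class and its helper method are illustrative names, not part of the original article.

```java
// Illustrative sketch; InternDemo and sameAfterIntern are made-up names.
final class InternDemo {

    // Interns both arguments and compares the results by identity.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String first = new String("Hello");
        String second = new String("Hello");

        System.out.println(first == second);                // false: distinct instances
        System.out.println(sameAfterIntern(first, second)); // true: one pooled instance
    }
}
```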

So why was I so surprised to see
protected final static String fVersionSymbol = "version".intern();
in the Xerces source code? Obviously, this string will be used in many comparisons. Does it make sense to intern it?

Of course it does. That is why Java already does it. All constant strings that appear in a class are interned automatically. This includes both your own constants (for example, the string "version" above) and other strings that are part of the class file format: class names, method signatures, and so on. It even applies to constant expressions: "Hel" + "lo" is processed by javac in the same way as "Hello", so "Hel" + "lo" == "Hello" returns true.
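The following sketch contrasts a constant expression with a concatenation performed at run time. The class and method names are invented for this example. It relies on the Java Language Specification rule that a string concatenation which is not a constant expression produces a newly created String object.

```java
// Illustrative sketch; the class and method names are made up for this example.
final class ConstantFoldingDemo {

    // Both operands are literals, so javac folds the expression to "Hello"
    // at compile time and the result is the interned constant.
    static boolean constantExpression() {
        return "Hel" + "lo" == "Hello";
    }

    // The parameter is not a compile-time constant, so the concatenation
    // happens at run time and yields a new String object.
    static boolean runtimeConcatenation(String prefix) {
        return prefix + "lo" == "Hello";
    }

    // Interning the run-time result returns the pooled constant again.
    static boolean internedRuntimeConcatenation(String prefix) {
        return (prefix + "lo").intern() == "Hello";
    }

    public static void main(String[] args) {
        System.out.println(constantExpression());                  // true
        System.out.println(runtimeConcatenation("Hel"));           // false
        System.out.println(internedRuntimeConcatenation("Hel"));   // true
        System.out.println("version" == "version".intern());       // true
    }
}
```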

Thus, the result of calling intern() on a constant string such as "version" will, by definition, be exactly the same object that you declared. In other words, "version" == "version".intern() is always true. You need to intern strings when they are not constants and you want to be able to compare them quickly with other interned strings.

Interning strings can also save memory: only one instance of a given character sequence is stored, no matter how many times you refer to that string. This is the main reason why the string constants of a class file are interned: think about how many classes refer to, for example, java.lang.Object. The class name java.lang.Object has to appear in each of those classes, but thanks to the magic of intern() it exists in memory as a single instance.
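One way to see the memory effect is to count how many distinct String objects are kept alive with and without interning. The sketch below is an illustration with invented names. It counts objects by identity and does not measure actual heap usage.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Illustrative sketch; the class and method names are made up for this example.
final class DistinctInstanceCounter {

    // Creates `copies` equal strings and counts how many distinct objects remain
    // when they are (or are not) passed through intern().
    static int distinctInstances(int copies, boolean intern) {
        // An identity-based set: elements are compared with ==, not equals().
        Set<String> instances = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < copies; i++) {
            String s = new String("java.lang.Object");
            instances.add(intern ? s.intern() : s);
        }
        return instances.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(1000, false)); // 1000 separate objects
        System.out.println(distinctInstances(1000, true));  // 1 shared object
    }
}
```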

Conclusion? intern() is a useful method and can make life easier, but make sure you use it properly.

From the translator
Please forgive me for departing from the original text a couple of times to make it easier to understand (or so it seemed to me).

Many thanks to Habr user nolled, who invited me to the Habr community.

Update
I think the following information, which I learned from other sources, is worth adding here:

1. The string pool is stored in the PermGen area, which is reserved for non-user JVM objects (classes, etc.). If you do not take this into account, you may unexpectedly get an OutOfMemoryError. (This applies to HotSpot JVMs before Java 7; since Java 7 interned strings live in the main heap.)
2. Interned strings are not stored forever. Strings that are no longer referenced are removed by the garbage collector.
3. In most cases you will not get a significant performance gain from intern(), especially if string comparison is not the main (or a very frequent) operation of your application and the strings being compared differ in length.

Source: https://habr.com/ru/post/79913/

