Soft references guarding available memory, or how to save memory correctly

Sooner or later, every Java developer runs into the notorious OutOfMemoryError.

After that encounter, we start to treat memory more carefully and try to save it. Java 1.2 introduced the java.lang.ref package with the classes SoftReference, WeakReference, and PhantomReference. Below I explain how these classes help in the fight against OutOfMemoryError and, more interestingly, give real examples of their use. Let's start.

General description

First, a little general theory. Let us recall, in general terms, how the garbage collector (hereinafter GC) works. Leaving out the details, the algorithm is simple: when the collector runs, the virtual machine starts from every thread, recursively finds all objects reachable in memory, and marks them in some way. In the next step, the GC removes all unmarked objects from memory. Thus, after collection, only the objects that may still be useful to the program remain in memory. Let's move on.
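The mark-and-remove idea described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: ToyHeap, Node, roots and collect() are made-up names, and a real JVM collector is far more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of mark-and-sweep; all names here are illustrative, not JVM internals.
class ToyHeap {
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing strong references
        Node(String name) { this.name = name; }
    }

    final List<Node> objects = new ArrayList<>(); // everything allocated so far
    final List<Node> roots = new ArrayList<>();   // stands in for thread stacks, statics

    Node allocate(String name) {
        Node n = new Node(name);
        objects.add(n);
        return n;
    }

    // Mark phase: walk the graph from the roots and remember every reachable node.
    private Set<Node> mark() {
        Set<Node> marked = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.pop();
            if (marked.add(n)) {
                pending.addAll(n.refs);
            }
        }
        return marked;
    }

    // Sweep phase: drop every unmarked node and return the names of the removed ones.
    List<String> collect() {
        Set<Node> marked = mark();
        List<String> removed = new ArrayList<>();
        objects.removeIf(n -> {
            boolean dead = !marked.contains(n);
            if (dead) {
                removed.add(n.name);
            }
            return dead;
        });
        return removed;
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Node a = heap.allocate("a");
        Node b = heap.allocate("b");
        Node c = heap.allocate("c");
        Node d = heap.allocate("d");
        heap.roots.add(a);
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c); // c and d reference each other but are unreachable from the roots
        System.out.println(heap.collect()); // prints [c, d]
    }
}
```

Note that the cycle between c and d does not save them: what matters is reachability from the roots, not whether anything at all points to the object.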
Java has several types of references. The most common are strong references, the ordinary ones we create every day.

StringBuilder builder = new StringBuilder();

Here builder is a strong reference to a StringBuilder object.
There are also three "special" reference types: SoftReference, WeakReference, and PhantomReference. In fact, the only difference between the reference types is how the GC treats the objects they refer to. We will discuss each type in more detail later; for now the following is enough:

SoftReference - if the GC sees that an object is reachable only through a chain of soft references, it will remove the object from memory. Later. Maybe.
WeakReference - if the GC sees that an object is reachable only through a chain of weak references, it will remove the object from memory.
PhantomReference - if the GC sees that an object is reachable only through a chain of phantom references, it will remove the object from memory, but only after several GC runs.

If the difference is not clear yet, do not worry; everything will soon fall into place. This is only the overview, and the details come next.
All three reference types inherit from a single parent, Reference, from which they take their public methods.

    StringBuilder builder = new StringBuilder();
    SoftReference<StringBuilder> softBuilder = new SoftReference<>(builder);

After these two lines execute, we have two kinds of references to one StringBuilder object:

builder - a strong reference
softBuilder - a soft reference (formally, a strong reference to a SoftReference object, but for simplicity I will call it a soft reference)

If, during execution, the variable builder becomes unreachable while the SoftReference object referenced by softBuilder is still reachable, then when the GC runs, the StringBuilder object will be marked as reachable only through a chain of soft references.
Consider the available methods:
softBuilder.get() - returns a strong reference to the StringBuilder object if the GC has not removed it from memory; otherwise returns null.
softBuilder.clear() - clears the reference to the StringBuilder object (that is, the soft reference no longer points to it).
The same works for WeakReference and PhantomReference, except that PhantomReference.get() always returns null; more on that later.
There is also the ReferenceQueue class. It lets you track the moment when the GC determines that an object is no longer needed and can be removed. The Reference object ends up in this queue once the GC has made that decision about the object it refers to. When creating a Reference, we can pass its constructor a ReferenceQueue, into which the reference will then be placed.
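As a small sketch of the methods just listed, the following demo exercises get(), clear(), and a ReferenceQueue without relying on the GC at all. To stay deterministic it calls enqueue() by hand, which is an assumption of this demo; in real code the GC does the enqueueing. The class name ReferenceApiDemo is made up for illustration.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

class ReferenceApiDemo {
    public static void main(String[] args) {
        StringBuilder builder = new StringBuilder("data");
        ReferenceQueue<StringBuilder> queue = new ReferenceQueue<>();
        SoftReference<StringBuilder> softBuilder = new SoftReference<>(builder, queue);

        // While the referent is alive, get() hands back a strong reference to it.
        System.out.println(softBuilder.get() == builder); // prints true

        // clear() drops the referent by hand; it does not put the reference in the queue.
        softBuilder.clear();
        System.out.println(softBuilder.get()); // prints null
        System.out.println(queue.poll());      // prints null

        // enqueue() is called manually here only to imitate what the GC would do.
        boolean enqueued = softBuilder.enqueue();
        Reference<? extends StringBuilder> polled = queue.poll();
        System.out.println(enqueued && polled == softBuilder); // prints true

        // A phantom reference never exposes its referent.
        PhantomReference<StringBuilder> phantom = new PhantomReference<>(builder, queue);
        System.out.println(phantom.get()); // prints null
    }
}
```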

SoftReference details

GC Features

So how exactly does the GC behave when it sees that an object is reachable only through a chain of soft references? Let's look at the GC's work in more detail:
The GC starts its work and walks through all objects on the heap. If an object on the heap is a Reference, the GC places it in a special queue that holds all Reference objects. After passing over all the objects, the GC takes this queue and decides, for each reference, whether to remove its referent from memory. How exactly that decision is made depends on the JVM, but the general contract is this: the GC is guaranteed to remove from the heap all objects reachable only through soft references before an OutOfMemoryError is thrown.
SoftReference is our tool for caching objects in memory: in a critical situation, when available memory runs out, the GC removes the unused objects and thereby tries to save the JVM from going down. Isn't that wonderful?
Here is how HotSpot decides whether to clear a SoftReference. If you look at the implementation of SoftReference, you will see two variables in the class: private static long clock and private long timestamp. Each time the GC runs, it stores the current time in clock. Each time a SoftReference is created, the current value of clock is written to its timestamp. The timestamp is also updated on each call to get() (each time we create a strong reference to the object). This makes it possible to calculate how long a soft reference has existed since it was last accessed. Let I denote this interval, let F denote the amount of free heap space in megabytes, and let the constant MSPerMB denote the number of milliseconds a soft reference is kept alive for each free megabyte of heap.
The rest is simple: if I <= F * MSPerMB, the object is not removed; if I is greater, it is removed.
To change MSPerMB, use the -XX:SoftRefLRUPolicyMSPerMB flag. The default value is 1000 ms, which means that a soft reference survives (after the last strong reference is gone) for 1 second per megabyte of free heap. Keep in mind that these are approximate figures, since the soft reference is actually cleared only when the GC runs.
Note that for an object to be removed, I must be strictly greater than F * MSPerMB. It follows that a newly created SoftReference survives at least one GC run (if it is not clear why, consider it homework).
In IBM's VM, the lifetime of a soft reference is tied not to time but to the number of GC runs it has survived.
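The HotSpot rule described above fits into a one-line predicate. This is a simplified model of the rule as stated in the text, not HotSpot's actual source; SoftRefPolicy and shouldClear are hypothetical names.

```java
class SoftRefPolicy {
    // Returns true when the interval I (ms since last access) is strictly
    // greater than F (free heap, MB) multiplied by MSPerMB.
    static boolean shouldClear(long intervalMs, long freeHeapMb, long msPerMb) {
        return intervalMs > freeHeapMb * msPerMb;
    }

    public static void main(String[] args) {
        long msPerMb = 1000; // the default value of -XX:SoftRefLRUPolicyMSPerMB

        // 100 MB free gives a threshold of 100 000 ms.
        System.out.println(shouldClear(50_000, 100, msPerMb));  // prints false
        System.out.println(shouldClear(100_000, 100, msPerMb)); // prints false (not strictly greater)
        System.out.println(shouldClear(100_001, 100, msPerMb)); // prints true

        // With I = 0 the comparison fails even when no heap is free at all.
        System.out.println(shouldClear(0, 0, msPerMb));         // prints false
    }
}
```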

Application

The main perk of SoftReference is that the JVM itself monitors whether an object should be removed from memory. If memory runs short, the object is removed. This is exactly what we need for caching. Caching with SoftReference can be useful in systems that are sensitive to the amount of available memory, such as image processing. The first example is somewhat contrived but illustrative:
Our system processes images. Suppose there is a huge image somewhere in the file system, and this image never changes. Sometimes a user wants to combine this image with another one. Here is our first implementation of this concatenation:

    import java.io.InputStream;

    public class ImageProcessor {
        private static final String IMAGE_NAME = "bigImage.jpg";

        public InputStream concatenateImegeWithDefaultVersion(InputStream userImageAsStream) {
            // Reloads the default image on every call.
            InputStream defaultImage = this.getClass().getResourceAsStream(IMAGE_NAME);
            // calculate and return concatenated image
        }
    }

This approach has many shortcomings; one of them is that we have to load the image from the file system every time, which is not the fastest operation. So let's cache the loaded image. Here is the second version:

    import java.io.InputStream;

    public class CachedImageProcessor {
        private static final String IMAGE_NAME = "bigImage.jpg";
        // Strong reference: once loaded, the image stays in memory.
        private InputStream defaultImage;

        public InputStream concatenateImegeWithDefaultVersion(InputStream userImageAsStream) {
            if (defaultImage == null) {
                defaultImage = this.getClass().getResourceAsStream(IMAGE_NAME);
            }
            // calculate and return concatenated image
        }
    }

This version is better, but the problem is still there. The image is large and takes up a lot of memory. Our application works with many images, and the next time a user tries to process one, we can easily get an OutOfMemoryError. What can be done about it? It looks like we have to choose between speed and stability. But we know about SoftReference. It lets us keep caching and, in critical situations, drop the cached data to free memory. Moreover, we do not have to detect the critical situation ourselves. This is what our third implementation looks like:

    import java.io.InputStream;
    import java.lang.ref.SoftReference;

    public class SoftCachedImageProcessor {
        private static final String IMAGE_NAME = "bigImage.jpg";
        private SoftReference<InputStream> defaultImageRef = new SoftReference<>(loadImage());

        public InputStream concatenateImegeWithDefaultVersion(InputStream userImageAsStream) {
            InputStream defaultImage;
            if (defaultImageRef.get() == null) { // 1
                defaultImage = loadImage();
                defaultImageRef = new SoftReference<>(defaultImage);
            }
            defaultImage = defaultImageRef.get(); // 2
            // calculate and return concatenated image
        }

        private InputStream loadImage() {
            return this.getClass().getResourceAsStream(IMAGE_NAME);
        }
    }

This version is not perfect, but it shows how easily we can control the size of the cache, or rather hand that control over to the virtual machine. The danger of this implementation is the following. On line 1 we check for null; what we really want to know is whether the GC has removed the data from memory. Suppose it has not. But before line 2 executes, the GC may run and remove the data, in which case line 2 yields defaultImage = null. To check safely that the object is still in memory, we must first create a strong reference to it: defaultImage = defaultImageRef.get(). Here is what the final implementation looks like:

    import java.io.InputStream;
    import java.lang.ref.SoftReference;

    public class SoftCachedImageProcessor {
        private static final String IMAGE_NAME = "bigImage.jpg";
        private SoftReference<InputStream> defaultImageRef = new SoftReference<>(loadImage());

        public InputStream concatenateImegeWithDefaultVersion(InputStream userImageAsStream) {
            // Take a strong reference first, then test it: the GC cannot clear it in between.
            InputStream defaultImage = defaultImageRef.get();
            if (defaultImage == null) {
                defaultImage = loadImage();
                defaultImageRef = new SoftReference<>(defaultImage);
            }
            // calculate and return concatenated image
        }

        private InputStream loadImage() {
            return this.getClass().getResourceAsStream(IMAGE_NAME);
        }
    }

Let's go further. java.lang.Class also uses SoftReference for caching: it caches data about the constructors, methods, and fields of a class. It is instructive to look at what exactly is cached. Once you decide to use SoftReference for caching, you need to decide what to cache. Suppose we need to cache a list. We can use either List<SoftReference> or SoftReference<List>. The second option is preferable. Remember that the GC applies special logic when processing Reference objects, so memory is freed faster if we have one SoftReference rather than a list of them. This is what we see in the implementation of Class: the developers created a soft reference to an array of constructors, fields, and methods.
As for performance, people often mistakenly build a cache on WeakReference where SoftReference is the right choice. The result is a poorly performing cache: in practice, weakly referenced objects are removed from memory soon after the strong references to them disappear, and when we actually need to pull an object from the cache, it is no longer there.
Here is another example of a SoftReference-based cache. Google Guava has a MapMaker class. It helps build a ConcurrentMap whose keys and values can be wrapped in a WeakReference or SoftReference. Suppose our application has data that a user can request and that comes from the database through a very expensive query, for example the user's purchase list for the past year. We can create a cache that stores the values (purchase lists) through soft references; if a value is missing from the cache, it is loaded from the database. The key is the user ID. Here is what the implementation might look like:

    ConcurrentMap<Long, List<Product>> oldProductsCache = new MapMaker()
            .softValues()
            .makeComputingMap(new Function<Long, List<Product>>() {
                @Override
                public List<Product> apply(Long userId) {
                    // The key is the user ID, so the function takes a Long.
                    return loadProductsFromDb(userId);
                }
            });
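For readers without Guava on the classpath, the same idea can be sketched with the JDK alone. This is a minimal illustration, not how Guava implements it: SoftValueCache and simulateEviction are hypothetical names, two threads may load the same key twice, and keys whose values were cleared are never purged.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// A keyed cache whose values are held through soft references.
class SoftValueCache<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SoftValueCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = map.get(key);
        // Take a strong reference before checking, so the GC cannot clear it in between.
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key); // e.g. the expensive database query
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    // Imitates the GC clearing a soft reference under memory pressure (for demos and tests).
    void simulateEviction(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref != null) {
            ref.clear();
        }
    }
}
```

A possible usage: new SoftValueCache<Long, String>(id -> "products-of-" + id).get(42L) loads the value on the first call and serves it from the cache afterwards, until the GC (or simulateEviction) clears it.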

WeakReference

GC Features

Now let's take a closer look at WeakReference. When the GC determines that an object is reachable only through weak references, the object is removed from memory "immediately". Here it is worth recalling ReferenceQueue and following the order in which an object is removed from memory. As a reminder, for WeakReference and SoftReference the algorithm for getting into the ReferenceQueue is the same. So, the GC has run and determined that an object is reachable only through weak references. The object was created as follows:

    StringBuilder AAA = new StringBuilder();
    ReferenceQueue<StringBuilder> queue = new ReferenceQueue<>();
    WeakReference<StringBuilder> weakRef = new WeakReference<>(AAA, queue);

First, the GC clears the weak reference, so weakRef.get() returns null. Then weakRef is added to the queue, and queue.poll() accordingly returns weakRef. That is all I wanted to say about how the GC handles WeakReference. Now let's see how this can be used.
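The sequence above can be observed with a small experiment. Keep in mind that System.gc() is only a request, so this sketch retries a few times and tolerates the case where nothing is collected; WeakQueueDemo and awaitEnqueued are made-up names.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class WeakQueueDemo {
    // Requests a GC several times and waits for some reference to reach the queue.
    // Returns null if nothing was enqueued, because System.gc() is only a hint.
    static Reference<?> awaitEnqueued(ReferenceQueue<?> queue) {
        try {
            for (int attempt = 0; attempt < 10; attempt++) {
                System.gc();
                Reference<?> ref = queue.remove(200); // wait up to 200 ms
                if (ref != null) {
                    return ref;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return null;
    }

    public static void main(String[] args) {
        ReferenceQueue<StringBuilder> queue = new ReferenceQueue<>();
        StringBuilder builder = new StringBuilder("payload");
        WeakReference<StringBuilder> weakRef = new WeakReference<>(builder, queue);

        builder = null; // drop the only strong reference

        Reference<?> enqueued = awaitEnqueued(queue);
        if (enqueued != null) {
            // By the time the reference is in the queue, it has already been cleared.
            System.out.println("same reference: " + (enqueued == weakRef));
            System.out.println("referent: " + weakRef.get());
        } else {
            System.out.println("nothing was collected during this run");
        }
    }
}
```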

Application

WeakHashMap, of course. It is an implementation of Map<K, V> that holds its keys through weak references. When the GC removes a key from memory, the whole entry is removed from the Map. It is not hard to see how this works. When a new <key, value> pair is added, a WeakReference is created for the key, and a ReferenceQueue is passed to its constructor. When the GC removes a key from memory, the ReferenceQueue returns the corresponding WeakReference, after which the corresponding Entry is removed from the Map. Quite simple. But I want to draw attention to a few details.

WeakHashMap is not intended to be used as a cache. The WeakReference is created for the key, not for the value, so the data is removed only once the program holds no strong references to the key, regardless of the value. In most cases this is not what you want from a cache.
Data is not removed from a WeakHashMap immediately after the GC discovers that a key is reachable only through weak references. The actual cleanup happens the next time the WeakHashMap is accessed.
WeakHashMap is intended primarily for keys whose equals method tests object identity (uses the == operator). Once access to such a key is lost, it can no longer be re-created.

Well, then when is WeakHashMap convenient? Suppose we need to build an XML document for a user. The document is assembled by several services, each of which receives an org.w3c.dom.Node as input and adds the necessary elements to it. The services also need a lot of information about the user from the database. We will store this data in the UserInfo class. A UserInfo takes up a lot of memory and is relevant only while a specific XML document is being built, so caching UserInfo makes no sense. We only need to associate it with the document and, preferably, have it removed from memory when the program no longer uses the document. All we need is:

    private static final Map<Node, UserInfo> NODE_TO_USER_MAP = new WeakHashMap<>();

Creating an XML document will look something like this:

    Node mainDocument = createBaseNode();
    NODE_TO_USER_MAP.put(mainDocument, loadUserInfo());

And here is how the data is read:

    UserInfo userInfo = NODE_TO_USER_MAP.get(mainDocument);
    if (userInfo != null) {
        // …
    }

The UserInfo stays in the WeakHashMap until the GC notices that only weak references to mainDocument remain.
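To watch an entry leave the map, here is a rough experiment in which plain Object and String stand in for Node and UserInfo (both substitutions are assumptions of the demo). Since System.gc() is only a request, the helper retries and reports honestly if the entry is still there; WeakHashMapDemo and awaitEmpty are hypothetical names.

```java
import java.util.Map;
import java.util.WeakHashMap;

class WeakHashMapDemo {
    // Requests a GC a few times until the map reports itself empty.
    // isEmpty() is itself an access to the map, so it also purges stale entries.
    static boolean awaitEmpty(Map<?, ?> map) {
        for (int attempt = 0; attempt < 10; attempt++) {
            if (map.isEmpty()) {
                return true;
            }
            System.gc();
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return map.isEmpty();
    }

    public static void main(String[] args) {
        Map<Object, String> docToUser = new WeakHashMap<>();
        Object document = new Object();          // stands in for the XML Node
        docToUser.put(document, "user info");    // stands in for UserInfo
        System.out.println(docToUser.get(document)); // prints user info

        document = null; // the program no longer uses the document
        System.out.println(awaitEmpty(docToUser) ? "entry removed" : "entry still present");
    }
}
```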
Here is another example that uses WeakHashMap. Many people know the String.intern() method. With WeakReference you can build something similar. (Let us leave aside, within this article, whether this solution is advisable, and simply accept that it has some advantages over intern().) So, we have a great many strings, and we know that they repeat. To save memory, we want to reuse existing objects instead of creating new objects for identical strings. This is how WeakHashMap helps:

    private static final Map<String, WeakReference<String>> stringPool = new WeakHashMap<>();

    public String getFromPool(String value) {
        WeakReference<String> stringRef = stringPool.get(value);
        // Take a strong reference once, so the GC cannot clear it before we return.
        String pooled = (stringRef == null) ? null : stringRef.get();
        if (pooled == null) {
            stringPool.put(value, new WeakReference<>(value));
            pooled = value;
        }
        return pooled;
    }

Finally, I will add that WeakReference is used in many JDK classes: Thread, ThreadLocal, ObjectOutputStream, Proxy, LogManager. You can look at their implementations to understand when WeakReference can help you.

PhantomReference

GC Features

This type of reference has two distinctive features. The first is that its get() method always returns null, which is why it makes sense to use PhantomReference only together with a ReferenceQueue. The second is that, unlike SoftReference and WeakReference, a phantom reference is added to the ReferenceQueue by the GC only after the finalize() method has run. In other words, unlike with SoftReference and WeakReference, the object is still in memory at that point.

Practice

At first glance it is not clear how this type of reference can be used. To explain, let us first look at the problems with the finalize() method. Overriding this method lets us release the resources associated with an object: when the GC determines that the object is no longer reachable, it runs this method before removing the object from memory. Here are the associated problems:

The GC runs unpredictably, so we cannot know when finalize() will be executed.
The finalize() methods run one after another in a single thread, and until the method has run, the object cannot be removed from memory.
There is no guarantee that the method will be called at all: the JVM may exit before the object ever becomes unreachable.
While finalize() is running, a strong reference to the object can be created, so the object is not removed; but the next time the GC sees that the object is unreachable, finalize() will not be executed again.

Let's return to PhantomReference. Combined with a ReferenceQueue, this type of reference lets us find out when an object is no longer reachable and no other references to it remain. That allows us to clean up the resources used by the object at the application level. Unlike with finalize(), we control the cleanup process ourselves. We can also control the creation of new objects. Suppose we have a factory that returns HdImage objects. We can control how many such objects are loaded in memory:

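    import java.lang.ref.PhantomReference;
    import java.lang.ref.Reference;
    import java.lang.ref.ReferenceQueue;
    import java.util.HashSet;
    import java.util.Set;

    public class HdImageFabric {
        public static final int IMAGE_LIMIT = 10;
        public static int count = 0;
        public static ReferenceQueue<HdImage> queue = new ReferenceQueue<>();
        // The phantom references themselves must stay reachable, or they are never enqueued.
        private static final Set<Reference<? extends HdImage>> refs = new HashSet<>();

        public HdImage loadHdImage(String imageName) throws InterruptedException {
            while (true) {
                if (count < IMAGE_LIMIT) {
                    return wrapImage(loadImage(imageName));
                } else {
                    // Wait up to 500 ms for the GC to report a collected image.
                    Reference<? extends HdImage> ref = queue.remove(500);
                    if (ref != null) {
                        refs.remove(ref);
                        count--;
                        System.out.println("remove old image");
                    }
                }
            }
        }

        private HdImage wrapImage(HdImage image) {
            refs.add(new PhantomReference<>(image, queue));
            count++;
            return image; // hand out the image itself, not the reference
        }
    }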

This example is not thread-safe and has other drawbacks, but it shows how PhantomReference can be used in practice.
Because get() always returns null, it is unclear how to tell which object was removed. To solve this, create your own class that extends PhantomReference and carries a descriptor that later helps determine which resources need to be cleaned up.
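A minimal sketch of such a descriptor-carrying reference might look as follows. ResourceReference, ResourceTracker and the string descriptor are hypothetical names chosen for illustration; a real implementation would release a file, socket, or native buffer where this one only records the descriptor.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// A phantom reference that remembers which resource belongs to the tracked object.
class ResourceReference extends PhantomReference<Object> {
    final String resourceId; // hypothetical descriptor, e.g. a temp-file name

    ResourceReference(Object owner, ReferenceQueue<Object> queue, String resourceId) {
        super(owner, queue);
        this.resourceId = resourceId;
    }
}

class ResourceTracker {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps the references strongly reachable; otherwise they could be collected
    // together with their owners and would never reach the queue.
    private final Set<ResourceReference> pending = ConcurrentHashMap.newKeySet();

    ResourceReference track(Object owner, String resourceId) {
        ResourceReference ref = new ResourceReference(owner, queue, resourceId);
        pending.add(ref);
        return ref;
    }

    // Drains the queue and returns the descriptors whose owners are gone.
    List<String> drain() {
        List<String> released = new ArrayList<>();
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            ResourceReference r = (ResourceReference) ref;
            pending.remove(r);
            released.add(r.resourceId); // real code would free the resource here
        }
        return released;
    }
}
```

The application decides when to call drain(): periodically, before allocating a new resource, or from a dedicated thread that blocks on the queue.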
When using PhantomReference, keep the following in mind:

The contract guarantees that the reference appears in the queue after the GC notices that the object is reachable only through phantom references and before the object is removed from memory. The contract does not guarantee that these events happen one right after the other; in reality, some time may pass between them. Therefore, do not rely on PhantomReference to clean up critical resources.
The execution of finalize() and the enqueueing of the phantom reference in the ReferenceQueue happen in different GC runs. Therefore, if the object overrides finalize(), removing it takes 3 GC runs; if the method is not overridden, it takes at least 2 GC runs.

In conclusion, java.lang.ref gives us good tools for working with JVM memory, and we should not ignore these classes; they can be of great help. Using them is error-prone, and you need to be extremely careful to get the desired result. But have such difficulties ever stopped us? That's all. Thanks to everyone who read to the end. I will try to answer in the comments the questions that this article did not cover.

Source: https://habr.com/ru/post/169883/
