
Memoization in Java

Memoization is a form of caching: a table of results is kept for a function, and once the function has been computed for particular argument values, the result is stored in that table. Subsequent calls with the same arguments take the result from the table instead of recomputing it.
This technique trades additional memory for faster program execution. It is most common in functional programming languages, but it can be used in imperative ones as well. This article discusses memoization in the Java language, presents several ways of implementing it, and concludes with a small performance comparison of these approaches.

Introduction


Naturally, memoization cannot be applied to every function. In fact, it is only applicable to pure functions, i.e. functions that are: a) deterministic (given the same arguments, the function always returns the same value), and b) free of side effects (the function does not affect the state of the system). Such a function should be a "black box" whose only interaction with the outside world happens through its arguments and return value. Particular implementations of memoization may also impose additional restrictions specific to that implementation (or to the language).
Another limitation is that memoization is only effective for functions with a discrete domain. Although in a computer's representation all data is discrete, floating-point numbers cause trouble because of rounding errors. For example, `float f1 = 0.1f; float f2 = 0.7f; System.out.print(f1 + f2 == 0.8);` prints false, because the sum of 0.1 and 0.7 differs in binary representation from 0.8. As a result, even with a memoized function, `foo(f1 + f2)` and `foo(0.8)` would each be computed separately, although the arguments are mathematically equal.
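To make the pitfall concrete, here is a minimal demonstration (not from the original article; the class name is mine) using doubles, where the effect is the same: two mathematically equal values differ bit-for-bit, so a cache keyed on them would miss.

```java
// Demonstrates why floating-point arguments make poor memoization keys:
// the computed sum 0.1 + 0.2 is not bit-identical to the literal 0.3,
// so a cache would treat them as two distinct keys.
public class FloatKeyPitfall {
    public static boolean sumsMatch() {
        double computed = 0.1 + 0.2; // actually 0.30000000000000004
        double literal = 0.3;
        return computed == literal;  // false
    }

    public static void main(String[] args) {
        System.out.println(sumsMatch()); // prints "false"
    }
}
```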

General form


Now we turn directly to the implementation of this technique. Let's start with a general form of memoization for a single function (here `K` and `R` stand for the argument and result types):

 private static final Map<K, R> cache = new HashMap<K, R>();

 R bar(K k) {
     R result;
     synchronized (cache) {
         result = cache.get(k);
         // a null result may itself be cached; hence the containsKey check
         if (result == null && !cache.containsKey(k)) {
             result = /* the real computation of bar(k) */;
             cache.put(k, result);
         }
     }
     return result;
 }

As you can see, we need a storage object that provides get, put and contains operations keyed by the function's argument.


In Java the most suitable candidate for the role of storage is the Map interface; in some cases (described below) faster data structures can be used. The amortized complexity of the get, put and contains operations on a HashMap is O(1), which bounds the overhead that memoization adds to each call. Since we must guarantee thread safety when several threads access the storage during the whole get-check-put sequence, we declare a critical section around that code; consequently, using an inherently thread-safe structure such as Hashtable gains nothing, because it only synchronizes individual operations.
UPD. As habrausers gribozavr, remal and gvsmirnov correctly noted in the comments, this code has multithreading problems. Moreover, gvsmirnov proposed a solution that can be found in the book Java Concurrency in Practice. I would add that in some cases it is enough to synchronize only the put operation; in that case, on the first calls, several threads may redundantly compute the same function. Note also that for a very expensive operation (a network or database call), even a fully synchronized version of the function may still pay off. Since the thread-safe version was proposed after the post was written, the examples below keep the original form.
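On Java 8 and later (which postdates this article), the concurrency concern above can be addressed without an explicit critical section. The following sketch (class name is mine) uses ConcurrentHashMap.computeIfAbsent, which computes each value at most once per key:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A thread-safe memoizer sketch built on ConcurrentHashMap.computeIfAbsent.
// Unlike a single synchronized block, threads requesting different keys do
// not block each other. Caveats: the wrapped function must not return null,
// and must not recursively call back into the same memoizer (computeIfAbsent
// forbids re-entrant modification of the same map).
public class ConcurrentMemoizer<K, R> {
    private final Map<K, R> cache = new ConcurrentHashMap<>();
    private final Function<K, R> function;

    public ConcurrentMemoizer(Function<K, R> function) {
        this.function = function;
    }

    public R invoke(K key) {
        return cache.computeIfAbsent(key, function);
    }
}
```

Usage: `ConcurrentMemoizer<Integer, Integer> square = new ConcurrentMemoizer<>(x -> x * x);` — the second `square.invoke(7)` is served from the cache.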

Example implementation (Fibonacci numbers):
     private static final Map<Integer, BigInteger> fibTable =
             new HashMap<Integer, BigInteger>();

     public static BigInteger fib(Integer k) {
         if (fibTable.containsKey(k)) {
             return fibTable.get(k);
         } else {
             BigInteger r;
             if (k < 1) return BigInteger.ZERO;
             else if (k == 1) return BigInteger.ONE;
             else r = fib(k - 1).add(fib(k - 2));
             fibTable.put(k, r);
             return r;
         }
     }
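For completeness, here is the same example as a self-contained, runnable class (the class name and main method are mine):

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Memoized Fibonacci: each fib(k) is computed once and then served
// from fibTable, turning the exponential naive recursion into a
// linear number of actual computations.
public class Fib {
    private static final Map<Integer, BigInteger> fibTable =
            new HashMap<Integer, BigInteger>();

    public static BigInteger fib(Integer k) {
        if (fibTable.containsKey(k)) {
            return fibTable.get(k);
        }
        BigInteger r;
        if (k < 1) {
            return BigInteger.ZERO;
        } else if (k == 1) {
            return BigInteger.ONE;
        } else {
            r = fib(k - 1).add(fib(k - 2));
        }
        fibTable.put(k, r);
        return r;
    }

    public static void main(String[] args) {
        System.out.println(fib(10));  // 55
        System.out.println(fib(100)); // fast: only ~100 new table entries
    }
}
```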

As a result, we have a simple, working memoized solution for one fixed function.


Generalizing the solution


Now, having solved this particular problem, let us generalize it to a whole class of functions. To do so, we use the Callable interface with its call() method, which takes no arguments; all of the function's arguments must therefore be passed through fields of the class implementing the interface. This complicates the programmer's work somewhat, but makes such methods uniform to handle. We also require the class to be immutable, with hashCode and equals depending only on those parameters: this is necessary for lookups in the cache table to be correct. If the class were mutable, the parameters could change without the hash changing, and we would fetch a wrong value; conversely, objects constructed with the same parameters must not be treated as distinct keys.

If all these assumptions hold, we can construct the following base class for memoized functions:

 public class MemoizedFunction<R> {
     private final Map<Callable<R>, R> cache;

     public MemoizedFunction(Map<Callable<R>, R> cache) {
         this.cache = cache;
     }

     public R invoke(Callable<R> function) throws Exception {
         R result;
         synchronized (cache) {
             result = cache.get(function);
             if (result == null && !cache.containsKey(function)) {
                 result = function.call();
                 cache.put(function, result);
             }
         }
         return result;
     }
 }


Note that one could also add to this class a method

  public Object memoize(Callable func, final Map<Callable, Object> cache) {

similar to invoke but working for arbitrary functions, not only for descendants of MemoizedFunction. In that case, however, we deliberately lose the type safety of generics, and whether such a method is needed at all is debatable, which I would rather not get into in this article.

So, in this example we abstracted away the choice of the specific cache and function, leaving it to the programmer: when using this class, we can easily swap the cache implementation and plug in various computations. Not everything is smooth, however; consider an example (the same Fibonacci numbers).

To do this, we create a subclass, passing a reference to the cache to the MemoizedFunction constructor:
 public class Fibber extends MemoizedFunction<BigInteger> {
     static HashMap<Callable<BigInteger>, BigInteger> cache =
                 new HashMap<Callable<BigInteger>, BigInteger>();

     public Fibber() {
         super(cache);
     }

     public BigInteger fib(int k) {
         BigInteger result = null;
         try {
             result = this.invoke(new Calculation(k));
         } catch (Exception ex) { /* omitted to shorten the code */ }
         return result;
     }

     class Calculation implements Callable<BigInteger> {
         private final int k;

         Calculation(int k) { this.k = k; }

         public boolean equals(Object obj) { /* omitted to shorten the code */ }
         public int hashCode() { return k; }

         public BigInteger call() throws Exception {
             if (k < 1) { return BigInteger.ZERO;
             } else if (k == 1) { return BigInteger.ONE;
             } else { return Fibber.this.fib(k - 1).add(Fibber.this.fib(k - 2)); } // <- attention (!)
         }
     }
 }

It is important to note that for memoization of the recursive calls to work, the Callable needs access to the enclosing class (the `Fibber.this.fib(...)` calls marked above). This limits the applicability of this kind of memoization to recursive functions.

As a result, we get a generic interface that lets the caller choose both the cache implementation and the computed function.


NB. With the advent of closures in Java, this scaffolding can be simplified considerably (examples can be found in Mark Mahieu's blog).

Memoization using reflection



At the cost of some performance, we can achieve more transparent memoization using the Proxy class.
(The implementation of the method is taken from an O'Reilly book which, incidentally, contains a more advanced version of the memoizer. You can get acquainted with the principles of proxy classes in Java Reflection in Action: Using Java's Dynamic Proxy or on java.sun.com.) In short, an intercepting object is created that receives all calls to the "hidden" object and passes them to a handler.

 public class Memoizer implements InvocationHandler {
   public static Object memoize(Object object) {
       return Proxy.newProxyInstance(
               object.getClass().getClassLoader(),
               object.getClass().getInterfaces(),
               new Memoizer(object));
   }

   private final Object object;
   private final Map caches = new HashMap();

   private Memoizer(Object object) {
       this.object = object;
   }

   public Object invoke(Object proxy, Method method,
                        Object[] args) throws Throwable {
       if (method.getReturnType().equals(Void.TYPE)) {
           // don't cache void methods
           return method.invoke(object, args);
       } else {
           Map cache = getCache(method);
           List key = Arrays.asList(args);
           Object value = cache.get(key);
           if (value == null && !cache.containsKey(key)) {
               value = method.invoke(object, args);
               cache.put(key, value);
           }
           return value;
       }
   }

   private synchronized Map getCache(Method m) {
       Map cache = (Map) caches.get(m);
       if (cache == null) {
           cache = Collections.synchronizedMap(new HashMap());
           caches.put(m, cache);
       }
       return cache;
   }
 }


In this case, to use memoization we need an additional interface describing our class (and the methods to memoize):
 public interface IFibber {
     public BigInteger fib(int k);
 }

 public class Fibber implements IFibber {
     public BigInteger fib(int k) {
         if (k < 1) { return BigInteger.ZERO; }
         if (k == 1) { return BigInteger.ONE; }
         return this.fib(k - 1).add(this.fib(k - 2));
     }
 }

Creating the memoized object is then very simple: instead of
  IFibber f = new Fibber(); 
we write
  IFibber f = (IFibber) Memoizer.memoize(new Fibber()); 
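Putting the pieces together as a single runnable sketch (the outer class name and helper method are mine; the rest follows the article). Note one subtlety: the recursive calls inside Fibber go through `this`, not through the proxy, so intermediate results are not cached — only distinct top-level calls hit the cache.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.math.BigInteger;
import java.util.*;

public class ProxyMemoDemo {
    public interface IFibber {
        BigInteger fib(int k);
    }

    public static class Fibber implements IFibber {
        public BigInteger fib(int k) {
            if (k < 1) return BigInteger.ZERO;
            if (k == 1) return BigInteger.ONE;
            // NB: these calls bypass the proxy, so inner results are NOT cached
            return fib(k - 1).add(fib(k - 2));
        }
    }

    // Per-method caches keyed by the argument list, as in the article.
    public static class Memoizer implements InvocationHandler {
        public static Object memoize(Object object) {
            return Proxy.newProxyInstance(
                    object.getClass().getClassLoader(),
                    object.getClass().getInterfaces(),
                    new Memoizer(object));
        }

        private final Object object;
        private final Map<Method, Map<List<Object>, Object>> caches = new HashMap<>();

        private Memoizer(Object object) { this.object = object; }

        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            if (method.getReturnType().equals(Void.TYPE)) {
                return method.invoke(object, args); // don't cache void methods
            }
            Map<List<Object>, Object> cache = getCache(method);
            List<Object> key = Arrays.asList(args);
            Object value = cache.get(key);
            if (value == null && !cache.containsKey(key)) {
                value = method.invoke(object, args);
                cache.put(key, value);
            }
            return value;
        }

        private synchronized Map<List<Object>, Object> getCache(Method m) {
            return caches.computeIfAbsent(m,
                    k -> Collections.synchronizedMap(new HashMap<>()));
        }
    }

    public static IFibber memoizedFibber() {
        return (IFibber) Memoizer.memoize(new Fibber());
    }
}
```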

This approach keeps the original class untouched, at the cost of reflection overhead on every call.


Performance



Unfortunately, the only valid way to find out whether memoization improves speed is to profile. It helps that memoization is easy to add and remove. A possible strategy:
  1. Select a class/function to memoize.
  2. Profile the class/function (and the system as a whole).
  3. If the performance is acceptable, go to step 6; otherwise add memoization.
  4. Profile the memoized function/class (and the system as a whole).
  5. If the performance has not changed, or has changed little, remove the memoization.
  6. Repeat if necessary.
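For quick before/after comparisons in steps 2 and 4, a crude timing helper can be enough (this sketch is mine, not from the article). System.nanoTime() alone ignores JIT warm-up and GC pauses, so treat the numbers as rough; for serious measurements use a real profiler or a dedicated benchmark harness.

```java
import java.util.function.Supplier;

// Crude wall-clock timer: runs the task once, prints the elapsed
// time, and returns the task's result so correctness can be checked
// alongside the timing.
public class CrudeTimer {
    public static <T> T timed(String label, Supplier<T> task) {
        long start = System.nanoTime();
        T result = task.get();
        long elapsedNanos = System.nanoTime() - start;
        System.out.printf("%s took %.3f ms%n", label, elapsedNanos / 1e6);
        return result;
    }
}
```

Usage: `CrudeTimer.timed("fib(1000) memoized", () -> Fib.fib(1000))` run before and after adding the cache gives a first impression of the gain.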


Now a few words about the performance of the solutions above. The speed of computing Fibonacci numbers was measured. The plain recursive variant and the reflection-based variant do not cache recursive inner calls and therefore have exponential complexity, in contrast to the polynomial complexity of variants 1 and 2:

 n          20             1000        2000
 option 1   3.8 ± 0.5 ms   14 ± 4 ms   27.3 ± 2.0 ms
 option 2   3.4 ± 0.2 ms   13 ± 3 ms   15.0 ± 1.5 ms


Formally, the following parameters influence the speed: the size of the domain and the distribution of arguments over it, the execution time of the function, and the number of threads simultaneously accessing the storage. One could in principle carry out a multi-parameter analysis of these dependencies, but that is beyond the scope of this article.

Bonuses


The question of memoizing a function of several arguments remains open; there are different solutions: a) creating a special composite key with its own hash function, b) currying and, accordingly, several hash tables, c) using the argument list as a key, as in the reflection version, d) other options.
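Option (b) can be sketched as a table of tables: the outer map is keyed by the first argument, each inner map by the second (the class and names here are mine, not from the article).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Curried memoization of a two-argument function: cache.get(a) yields
// the partial table for the first argument, which is then keyed by the
// second argument. No composite key object is needed.
public class CurriedMemoizer<A, B, R> {
    private final Map<A, Map<B, R>> cache = new HashMap<>();
    private final BiFunction<A, B, R> function;

    public CurriedMemoizer(BiFunction<A, B, R> function) {
        this.function = function;
    }

    public R invoke(A a, B b) {
        Map<B, R> inner = cache.get(a);
        if (inner == null) {
            inner = new HashMap<>();
            cache.put(a, inner);
        }
        if (inner.containsKey(b)) {
            return inner.get(b);
        }
        R result = function.apply(a, b);
        inner.put(b, result);
        return result;
    }
}
```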

Here is an example of creating your own composite key with a precomputed hash:
     private static HashMap<Pair<Integer, Integer>, Integer> cache =
             new HashMap<Pair<Integer, Integer>, Integer>();

     static class Pair<T1, T2> {
         final T1 p1;
         final T2 p2;
         final int hash;

         Pair(T1 p1, T2 p2) {
             this.p1 = p1;
             this.p2 = p2;
             int h = 7;
             h = 23 * h + Objects.hashCode(p1);
             h = 23 * h + Objects.hashCode(p2);
             hash = h;
         }

         public int hashCode() {
             return hash;
         }

         public boolean equals(Object obj) { /* omitted; can be generated by an IDE */ }
     }

Then, when working with such a key, we create a key object `Pair<Integer, Integer> p = new Pair<Integer, Integer>(i, j);` and use it to access the storage.
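As a complete worked example of option (a), here is the binomial coefficient C(n, k) memoized with a composite key built on the same 7/23 hashing scheme as the Pair above (the function choice and class names are mine, not from the article).

```java
import java.util.HashMap;
import java.util.Map;

// Memoized binomial coefficients via Pascal's rule, keyed by an (n, k)
// pair with a precomputed hash, analogous to the generic Pair above.
public class Binomial {
    static final class Key {
        final int n, k;
        Key(int n, int k) { this.n = n; this.k = k; }
        @Override public int hashCode() { return 23 * (23 * 7 + n) + k; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).n == n && ((Key) o).k == k;
        }
    }

    private static final Map<Key, Long> cache = new HashMap<>();

    public static long c(int n, int k) {
        if (k < 0 || k > n) return 0;
        if (k == 0 || k == n) return 1;
        Key key = new Key(n, k);
        Long cached = cache.get(key);
        if (cached != null) return cached;
        long result = c(n - 1, k - 1) + c(n - 1, k); // Pascal's rule
        cache.put(key, result);
        return result;
    }
}
```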

There are also various further optimization techniques, which are not covered here.


This concludes a brief introduction to the technique of memoization; any comments, corrections and clarifications are welcome.

UPD. Added a few words about multithreading.
UPD2. Added a link to Java Concurrency in Practice.
UPD3. Many commenters pointed out the open-source library Ehcache, which implements memoization and much more; interested readers may want to take a look at it.



Source: https://habr.com/ru/post/97513/

