Making a simple Spring-based Circuit Breaker

This article is for those who already use an effective cache in their application and simply want to add stability, not only to the application itself but to the whole environment around it.

If you recognize yourself, read on.

What is a Circuit Breaker

The topic is as old as the hills, so I will not bore you by repeating what has been said many times. In my opinion, Martin Fowler explained it best, but I will try to fit the definition into one sentence:
functionality that keeps requests that are doomed to fail away from an unavailable service, giving the service a chance to get back on its feet and resume normal operation.
Ideally, while blocking doomed requests, the Circuit Breaker (hereinafter CB) should not break your application. Instead, it is good practice to return data that may not be the freshest but is still relevant (not "rotten") or, if that is not possible, some default value.

Goals

The main goals:

Let the data source recover by stopping requests to it for a while.
While requests to the target service are stopped, return data that may not be the freshest but is still relevant.
If the target service is unavailable and there is no relevant data, apply a fallback strategy (return a default value or do whatever else suits the particular case).

Implementation mechanism

Case: service available (first request)

1. We go to the cache and look up the CRT by key (the CRT is explained below). There is nothing in the cache.
2. We go to the target service and get the value.
3. We save the value in the cache with a TTL that covers the longest expected outage of the target service but does not exceed the period during which you are willing to serve this data to the client when the connection to the target service is lost.
4. For the value from step 3, we also save its Cache Refresh Time (CRT) in the cache: the time after which we should try to go to the target service and update the value.
5. We return the value from step 2 to the user.
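The first-request steps can be sketched in plain Java. This is only an illustration under stated assumptions: the `FirstRequestSketch` class, its `Entry` type and the in-memory `HashMap` are hypothetical stand-ins for the real KV store, and the first CRT is assumed to be the current time plus the refresh period.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical in-memory stand-in for the KV store; not the article's Redis setup.
class FirstRequestSketch {

    // A stored value together with the moment its TTL runs out.
    static final class Entry {
        final Object value;
        final Instant expiresAt;

        Entry(Object value, Instant expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    static final String CRT_POSTFIX = "_crt";

    final Map<String, Entry> store = new HashMap<>();

    // Walks through the five steps of the "first request" case.
    String firstRequest(String key, Supplier<String> targetService,
                        Duration ttl, Duration refreshPeriod, Instant now) {
        // 1. Look up the CRT by key: on a first request nothing is there
        if (store.get(key + CRT_POSTFIX) != null) {
            throw new IllegalStateException("not a first request");
        }
        // 2. Ask the target service for the value
        String value = targetService.get();
        // 3. Save the value together with its TTL
        Instant expiresAt = now.plus(ttl);
        store.put(key, new Entry(value, expiresAt));
        // 4. Save the CRT: the moment after which a refresh should be attempted
        store.put(key + CRT_POSTFIX, new Entry(now.plus(refreshPeriod), expiresAt));
        // 5. Return the value to the user
        return value;
    }
}
```

Note that the TTL (for example a day) is deliberately much longer than the refresh period (for example a minute): the first bounds how old the data may get, the second only controls how often the target service is asked.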

Case: CRT has not expired

1. We go to the cache and find the CRT by key. It has not expired yet.
2. We get the corresponding value from the cache.
3. We return the value to the user.

Case: CRT expired, target service available

1. We go to the cache and find the CRT by key. It has expired.
2. We go to the target service and get the value.
3. We update the value in the cache and its TTL.
4. We update the CRT for it by adding the Cache Refresh Period (CRP), the value that is added to the CRT to get the next CRT.
5. We return the value to the user.
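The CRT arithmetic from this case can be shown in a few lines. This is a minimal sketch: the `CrtSketch` class and its method names are hypothetical, and the expiry check mirrors the `isBefore` comparison used in the implementation later in the article.

```java
import java.time.LocalDateTime;

// Hypothetical helper illustrating the CRT / CRP arithmetic.
class CrtSketch {

    // The CRT has expired once "now" is no longer before it.
    static boolean isExpired(LocalDateTime crt, LocalDateTime now) {
        return !now.isBefore(crt);
    }

    // Next CRT = current CRT + Cache Refresh Period (in seconds).
    static LocalDateTime nextCrt(LocalDateTime crt, long crpInSeconds) {
        return crt.plusSeconds(crpInSeconds);
    }
}
```

For example, with a CRT of 10:00:00 and a CRP of 60 seconds, a request at 10:00:05 finds the CRT expired, refreshes the value and moves the CRT to 10:01:00; requests before that moment are served from the cache.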

Case: CRT expired, target service unavailable

1. We go to the cache and find the CRT by key. It has expired.
2. We go to the target service. It is unavailable.
3. We get the value from the cache. It is not the freshest (its CRT has expired), but it is still relevant because its TTL has not yet expired.
4. We return it to the user.

Case: CRT expired, target service unavailable, nothing in cache

1. We go to the cache and find the CRT by key. It has expired.
2. We go to the target service. It is unavailable.
3. We try to get the value from the cache. It is not there.
4. We try to apply a special strategy for such cases, for example returning a default value for the field in question, or a special value such as "This information is currently unavailable". In general, if possible, it is better to return something than to break the application. If that is not possible, throw an exception and respond to the user with the error quickly.
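The last two cases form a fallback chain: target service first, then the stale cached value, then the special strategy. The sketch below shows that order in isolation. `FallbackSketch` and `resolve` are hypothetical names, and catching any `RuntimeException` is a simplifying assumption; real code would catch only the exceptions that signal unavailability.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical illustration of the fallback order described above.
class FallbackSketch {

    static <T> T resolve(Supplier<T> targetService,
                         Supplier<Optional<T>> staleCache,
                         Supplier<T> lastResort) {
        try {
            // Normal path: the target service answers
            return targetService.get();
        } catch (RuntimeException serviceUnavailable) {
            // Service is down: prefer a stale but still relevant cached value,
            // and only if there is none, fall back to the last-resort strategy
            return staleCache.get().orElseGet(lastResort);
        }
    }
}
```

The last-resort supplier is evaluated lazily, so a strategy that throws only does so when both the service and the cache have nothing to offer.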

What will we use

I use Spring Boot 1.5 in my project; I still have not found the time to upgrade to version 2.

To keep the article from becoming twice as long, I will use Lombok.

As the key-value store (hereinafter KV) I use Redis 5.0.3, but I am sure that Hazelcast or an equivalent will work too. The main thing is that there is an implementation of the CacheManager interface. In my case, this is RedisCacheManager from spring-boot-starter-data-redis.

Implementation

Above, in the "Implementation mechanism" section, two important terms came up: CRT and CRP. I will describe them again in more detail, because they are essential for understanding the code that follows:

Cache Refresh Time (CRT) is a separate entry in the KV store (key + the postfix "_crt") that holds the time at which we should go to the target service for fresh data. Unlike TTL, reaching the CRT does not mean that your data is "rotten", only that there is a chance to get fresher data from the target service. If we get fresh data, good; if not, the current data will do.

Cache Refresh Period (CRP) is the value that is added to the CRT after polling the target service (whether successfully or not). Thanks to it, the remote service gets a chance to "catch its breath" and recover after a failure.

So, as usual, we start by designing the main interface. This is what you work with whenever you need the CB logic on top of the cache. It should be as simple as possible:
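The difference between the CRT and the TTL can be made explicit with a small classifier. This is a sketch under assumptions: the `FreshnessSketch` class, its `State` names and the explicit `ttlExpiry` argument are hypothetical (a real KV store such as Redis removes expired entries itself, so the third state would simply show up as a missing value).

```java
import java.time.Instant;

// Hypothetical helper showing how CRT and TTL divide the life of a cached value.
class FreshnessSketch {

    enum State {
        FRESH,        // CRT not reached: serve from the cache, do not call the service
        REFRESH_DUE,  // CRT reached, TTL not: try the service, the cached value is a fallback
        EXPIRED       // TTL reached: the value is "rotten" and must not be served
    }

    static final String CRT_POSTFIX = "_crt";

    // The CRT lives next to the value under "<key>_crt".
    static String crtKey(String objectCacheKey) {
        return objectCacheKey + CRT_POSTFIX;
    }

    static State classify(Instant now, Instant crt, Instant ttlExpiry) {
        if (!now.isBefore(ttlExpiry)) {
            return State.EXPIRED;
        }
        if (now.isBefore(crt)) {
            return State.FRESH;
        }
        return State.REFRESH_DUE;
    }
}
```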

    public interface CircuitBreakerService {

        <T> T getStableValue(StableValueParameter parameter);

        void evictValue(EvictValueParameter parameter);
    }

The interface parameters:

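    import java.util.function.Supplier;

    import lombok.AllArgsConstructor;
    import lombok.Getter;

    @Getter
    @AllArgsConstructor
    public class StableValueParameter<T> {
        private String cachePrefix;              // cache name passed to CacheManager.getCache
        private String objectCacheKey;           // key of the object inside that cache
        private long crpInSeconds;               // Cache Refresh Period
        private Supplier<T> targetServiceAction; // call that fetches the value from the target service
        // Disaster strategy: CRT expired, target service unavailable, nothing in the cache
        private DisasterStrategy disasterStrategy;

        // Shorter constructor: throws an exception when disaster strikes
        public StableValueParameter(
                String cachePrefix,
                String objectCacheKey,
                long crpInSeconds,
                Supplier<T> targetServiceAction
        ) {
            this.cachePrefix = cachePrefix;
            this.objectCacheKey = objectCacheKey;
            this.crpInSeconds = crpInSeconds;
            this.targetServiceAction = targetServiceAction;
            this.disasterStrategy = new ThrowExceptionDisasterStrategy();
        }
    }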

    @Getter
    @AllArgsConstructor
    public class EvictValueParameter {
        private String cachePrefix;
        private String objectCacheKey;
    }

This is how we will use it:

    public AccountDataResponse findAccount(String accountId) {
        final StableValueParameter<?> parameter = new StableValueParameter<>(
                ACCOUNT_CACHE_PREFIX,
                accountId,
                properties.getCrpInSeconds(),
                () -> bankClient.findById(accountId)
        );
        return circuitBreakerService.getStableValue(parameter);
    }

If you need to clear the cache, then:

    public void evictAccount(String accountId) {
        final EvictValueParameter parameter = new EvictValueParameter(
                ACCOUNT_CACHE_PREFIX,
                accountId
        );
        circuitBreakerService.evictValue(parameter);
    }

Now for the most interesting part, the implementation (explained in the code comments):

    @Override
    public <T> T getStableValue(StableValueParameter parameter) {
        final Cache cache = cacheManager.getCache(parameter.getCachePrefix());
        if (cache == null) {
            return logAndThrowUnexpectedCacheMissing(parameter.getCachePrefix(), parameter.getObjectCacheKey());
        }

        // Go to the cache. Build the key of the CRT entry
        final String crtKey = parameter.getObjectCacheKey() + CRT_CACHE_POSTFIX;
        // If there is no CRT in the cache, treat it as already expired
        final LocalDateTime crt = Optional.ofNullable(cache.get(crtKey, LocalDateTime.class))
                .orElseGet(() -> DateTimeUtils.now().minusSeconds(1));

        if (DateTimeUtils.now().isBefore(crt)) {
            // The CRT has not expired yet: return the value from the cache if it is there
            final Optional<T> valueFromCache = getFromCache(parameter, cache);
            if (valueFromCache.isPresent()) {
                return valueFromCache.get();
            }
        }

        // The CRT has expired or the value is missing: go to the target service
        return getFromTargetServiceAndUpdateCache(parameter, cache, crtKey, crt);
    }

    private static <T> Optional<T> getFromCache(StableValueParameter parameter, Cache cache) {
        return (Optional<T>) Optional.ofNullable(cache.get(parameter.getObjectCacheKey()))
                .map(Cache.ValueWrapper::get);
    }

If the target service is unavailable, we try to retrieve still relevant data from the cache:

    private <T> T getFromTargetServiceAndUpdateCache(
            StableValueParameter parameter,
            Cache cache,
            String crtKey,
            LocalDateTime crt
    ) {
        T result;
        try {
            result = getFromTargetService(parameter);
        }
        /* Circuit breaker exceptions */
        catch (WebServiceIOException ex) {
            log.warn(
                    "[CircuitBreaker] Service responded with error: {}. Trying to get from cache {}: {}",
                    ex.getMessage(),
                    parameter.getCachePrefix(),
                    parameter.getObjectCacheKey());
            // May throw if the disaster strategy throws; then the cache is left untouched
            result = getFromCacheOrDisasterStrategy(parameter, cache);
        }

        cache.put(parameter.getObjectCacheKey(), result);
        // The next CRT is computed from the previous CRT, not from the current time
        cache.put(crtKey, crt.plusSeconds(parameter.getCrpInSeconds()));
        return result;
    }

    private static <T> T getFromTargetService(StableValueParameter parameter) {
        return (T) parameter.getTargetServiceAction().get();
    }

If there is no relevant data in the cache (it was removed when its TTL expired, and the target service is still unavailable), we use the DisasterStrategy:

    private <T> T getFromCacheOrDisasterStrategy(StableValueParameter parameter, Cache cache) {
        return (T) getFromCache(parameter, cache)
                .orElseGet(() -> parameter.getDisasterStrategy().getValue());
    }

There is nothing interesting about evicting from the cache; I include it here only for completeness:

    @Override
    public void evictValue(EvictValueParameter parameter) {
        final Cache cache = cacheManager.getCache(parameter.getCachePrefix());
        if (cache == null) {
            logAndThrowUnexpectedCacheMissing(parameter.getCachePrefix(), parameter.getObjectCacheKey());
            return;
        }

        // Only the CRT entry is evicted: the next request goes to the target service,
        // while the cached value itself stays available as a fallback
        final String crtKey = parameter.getObjectCacheKey() + CRT_CACHE_POSTFIX;
        cache.evict(crtKey);
    }

Disaster strategy

This is the logic that kicks in when the CRT has expired, the target service is unavailable, and there is nothing in the cache.

I wanted to describe this logic separately, because many people never get around to thinking about how to implement it. Yet this is exactly what makes a system truly resilient.

Wouldn't you like to feel proud of your creation when everything that could fail has failed and your system still works? Suppose the "price" field shows "currently being clarified" instead of the actual price of the product. That is still far better than the response "500 service is unavailable", because you did return the other 10 fields, such as the product description. How much does that improve the quality of the service? My plea is to pay more attention to details like this and make them better.

That concludes the lyrical digression. The strategy interface looks like this:

 public interface DisasterStrategy<T> { T getValue(); }

Choose the implementation to suit the specific case. For example, if you can return some default value, you can do something like this:

    public class DefaultValueDisasterStrategy implements DisasterStrategy<String> {
        @Override
        public String getValue() {
            return "This information is currently unavailable";
        }
    }

Or, if there is nothing meaningful to return in a particular case, you can throw an exception:

    public class ThrowExceptionDisasterStrategy implements DisasterStrategy<Object> {
        @Override
        public Object getValue() {
            throw new CircuitBreakerNullValueException("Oops! Service is down and there's null value in cache");
        }
    }

In that case, the CRT will not be incremented, and the next request will go to the target service again.
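Why the CRT stays untouched when the strategy throws can be seen in a stripped-down model of the update step. This is only a sketch: `CrtOnFailureSketch`, the `HashMap` cache and the use of `IllegalStateException` as the "service is down" signal are all hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model: the cache writes happen only after a value was produced.
class CrtOnFailureSketch {

    final Map<String, Object> cache = new HashMap<>();

    Object get(String key, Supplier<Object> targetService,
               Supplier<Object> disasterStrategy, long nextCrt) {
        Object result;
        try {
            result = targetService.get();
        } catch (IllegalStateException serviceDown) {
            Object cached = cache.get(key);
            // If the strategy throws here, the method exits before any cache write
            result = cached != null ? cached : disasterStrategy.get();
        }
        // Reached only when a value (fresh, cached or default) is available
        cache.put(key, result);
        cache.put(key + "_crt", nextCrt);
        return result;
    }
}
```

With a strategy that returns a default value, the same code does write the default and the new CRT into the cache, so the target service gets its pause; with a throwing strategy every request keeps probing the service.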

Conclusion

My view is this: if you can use a ready-made solution instead of reinventing the wheel, even a simple wheel like the one in this article, do so. Use this article to understand the principles of operation, not as a guide to action.

There are many ready-made solutions, such as Hystrix, especially if you are using Spring Boot 2.

The most important thing to understand is that this solution is built on the cache, and its effectiveness equals the effectiveness of the cache. If the cache is ineffective (few hits, many misses), this Circuit Breaker will be just as ineffective: every cache miss is followed by a request to the target service, which at that moment may be in its death throes, struggling to get back up.

Before applying this approach, be sure to measure the effectiveness of your cache. You can do this with the cache hit rate = hits / (hits + misses), which should tend to 1, not 0.

And yes, nothing prevents you from keeping several kinds of CB in your project at once and applying the one that best solves a specific problem.
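The hit rate formula is easy to track with two counters. A minimal sketch, assuming you can call `recordHit` and `recordMiss` from your own cache access code; the `CacheHitRateSketch` class is hypothetical and not part of Spring or Redis.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counter pair for measuring cache effectiveness.
class CacheHitRateSketch {

    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit() {
        hits.increment();
    }

    void recordMiss() {
        misses.increment();
    }

    // hits / (hits + misses); reported as 0.0 while nothing has been recorded
    double hitRate() {
        long h = hits.sum();
        long total = h + misses.sum();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```

Nine hits and one miss give a hit rate of 0.9, which means only one request in ten reaches the target service through a miss.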

Source: https://habr.com/ru/post/451858/
