📜 ⬆️ ⬇️

Debugging a bug that does not play

On October 10, 2018, our team released a new version of the application on React Native. We are pleased and proud of it.

But what a horror: after a few hours, the number of crashes under Android suddenly increases.


10,000 crashes under Android
')
Our Sentry glitch monitoring tool is going crazy.

In all cases, we see an error like JSApplicationIllegalArgumentException Error while updating property 'left' in shadow node of type: RCTView" .

In React Native, this usually happens if you set a property with the wrong type. But why didn't the error manifest itself during testing? With us, each developer thoroughly tests new releases on several devices.

Also, the errors seem quite random, as if they fall on any combination of properties and the type of shadow node. For example, here are the first three:


It seems that the error occurs on any device and in any version of Android, judging by the report Sentry.


Most crashes under Android 8.0.0 are falling, but this is consistent with our user base.

Let's replay!


So, the first step before fixing a bug is to reproduce it, right? Fortunately, thanks to the Sentry logs, we can find out what users are doing before a crash occurs.

So, let's see ...



Hmm, in the overwhelming majority of cases, users simply open the application and - boom, fails.

Well, try again. Install the application on six Android devices, open it and exit several times. No failure! Moreover, it is impossible to reproduce it locally in dev-mode.

Well, that seems pointless. Failures are still quite random and occur in 10% of cases. It seems that you have a chance of 1 out of 10, that the application will fall at startup.

Stack trace analysis


To reproduce this failure, let's try to understand where it comes from ...


As mentioned earlier, we have several different bugs. And they all have similar, but slightly different traces.

Well, take the first one:

 java.lang.ArrayIndexOutOfBoundsException: length=10; index=-1 at android.support.v4.util.Pools$SimplePool.release(Pools.java:116) at com.facebook.react.bridge.DynamicFromMap.recycle(DynamicFromMap.java:40) at com.facebook.react.uimanager.LayoutShadowNode.setHeight(LayoutShadowNode.java:168) at java.lang.reflect.Method.invoke(Method.java) ... java.lang.reflect.InvocationTargetException: null at java.lang.reflect.Method.invoke(Method.java) ... com.facebook.react.bridge.JSApplicationIllegalArgumentException: Error while updating property 'height' in shadow node of type: RNSVGSvgView at com.facebook.react.uimanager.ViewManagersPropertyCache$PropSetter.updateShadowNodeProp(ViewManagersPropertyCache.java:113) ... 

So, the problem is in android/support/v4/util/Pools.java .

Hmm, we are very deep in the Android support library, you can hardly get any benefit here.

Find another way


Another way to find the root cause of the error is to check out the new changes made to the latest release. Especially those that affect native Android code. Two hypotheses arise:


At the moment we can not reproduce the error, so the best strategy:

  1. Roll back one of the two libraries. Roll it out for 10% of users, which is trivially done in the Play Store. Check with several users if the crash persists. Thus, we will confirm or disprove the hypothesis.


    But how to choose a library for rollback? Of course, you can throw a coin, but is this the best option?


    Getting to the bottom


    Let's analyze the previous trace more carefully. Perhaps this will help determine the library.

     /** * Simple (non-synchronized) pool of objects. * * @param The pooled type. */ public static class SimplePool implements Pool { private final Object[] mPool; private int mPoolSize; ... @Override public boolean release(T instance) { if (isInPool(instance)) { throw new IllegalStateException("Already in the pool!"); } if (mPoolSize < mPool.length) { mPool[mPoolSize] = instance; mPoolSize++; return true; } return false; } 

    There was a failure. Java.lang.ArrayIndexOutOfBoundsException error java.lang.ArrayIndexOutOfBoundsException: length=10; index=-1 java.lang.ArrayIndexOutOfBoundsException: length=10; index=-1 means that mPool is an array of size 10, but mPoolSize=-1 .

    Okay, how mPoolSize=-1 ? In addition to the recycle method above, the only place to change mPoolSize is the acquire method of the class SimplePool :

     public T acquire() { if (mPoolSize > 0) { final int lastPooledIndex = mPoolSize - 1; T instance = (T) mPool[lastPooledIndex]; mPool[lastPooledIndex] = null; mPoolSize--; return instance; } return null; } 

    Therefore, the only way to get a negative mPoolSize value is to reduce it when mPoolSize=0 . But how is this possible with the condition mPoolSize > 0 ?

    Put a breakpoint in Android Studio and take a look at what happens when you start the application. I mean, here is an if condition, this code should work fine!

    Finally, a revelation!



    See in DynamicFromMap static link to SimplePool .

     private static final Pools.SimplePool<DynamicFromMap> sPool = new Pools.SimplePool<>(10); 

    After several dozen clicks of the Play button with carefully placed breakpoints, we see that the mqt_native_modules streams access the SimplePool.acquire and SimplePool.release functions using React Native to control the style properties of the React component (below the component width property)



    But they are also accessed by the main thread main !



    Above, we see that they are used to update the fill property in the main thread, usually for the react-native-svg component! Indeed, the react-native-svg library began using DynamicFromMap only from version 7 to improve the performance of native svg animations.

    And-and-and ... the function can be called from two threads, but DynamicFromMap does not use SimplePool thread-safe way. “Thread safe,” you say?

    Thread safety, some theory


    In single-threaded JavaScript, developers usually do not need to deal with thread safety.

    On the other hand, Java supports the concept of parallel or multi-threaded programs. Multiple threads can run within the same program and can potentially access the overall data structure, which sometimes leads to unexpected results.

    Take a simple example: the image below shows that streams A and B are parallel:

    • read integer;
    • increase its value;
    • return it.


    Stream B can potentially access the data value before Stream A updates it. We expected two separate steps to give a final value of 19 . Instead, we can get 18 . This situation, where the final state of the data depends on the relative order of the flow operations, is called the race state. The problem is that this state does not necessarily occur all the time. Perhaps in the above case, stream B has another job before increasing the value, which gives sufficient time for stream A to update the value. This explains the chance and inability to reproduce the failure.

    The data structure is considered thread-safe if operations can be performed simultaneously by multiple threads without the risk of a race condition.

    When one thread reads for a particular data element, another thread should not have the right to modify or delete this element (this is called atomicity). In the previous example, if the update cycles were atomic, the race conditions could have been avoided. Flow B will wait for thread A to complete the operation, and then start itself.

    In our case, this can happen:



    Because DynamicFromMap contains a static link to SimplePool , several DynamicFromMap calls come from different streams, simultaneously calling the acquire method in SimplePool .

    In the illustration above, flow A calls the method, evaluating the condition as true , but it has not yet managed to reduce the value of mPoolSize (which is used in conjunction with flow B), while flow B also calls this method and also evaluates the condition as true . Subsequently, each call will reduce the value of mPoolSize , as a result of which the “impossible” value is obtained.

    Correction


    Studying the options for correction, we found a pull-request for react-native , which has not yet joined the branch - and it provides thread safety in this case.



    Then we rolled out the corrected version of React Native for users. Crash finally fixed, hurray!


    So, thanks to the help of Jenik Duplessi (contributor to the core of React Native) and Michael Sand (maintainer react-native-svg maintainer), the patch is included in the next minor version of React Native 0.57 .

    Fixing this bug required some effort, but it was a great opportunity to dig deeper into react-native and react-native-svg. A good debugger and some well-located breakpoints are important. I hope you learned something from this story too!

Source: https://habr.com/ru/post/432426/


All Articles