We study and implement the algorithm of the correct observer pattern for react components

So, we continue to develop the observer-pattern. In the previous article from the old and very simple "observer" pattern, in small steps we came to mobx and wrote its mini-version. In this article, we will write a full-fledged version of mobx that implements the algorithm for updating dependencies in the correct order to avoid unnecessary calculations. I must say that attempts to describe this algorithm in Habré have been made earlier in the articles of Comrade vintage about atoms here , here , and here, but the last “correct” update procedure is not fully described there and this will be discussed in this article.

So in the last article, in order for the components of the reactor to automatically subscribe to the data that they render and when changing, the rerender of only the necessary components was called, we came to this modification of the observer pattern

let CurrentObservables = null; class Observable { listeners = new Set(); constructor(value){ this.value = value } get(){ if(CurrentObservables) CurrentObservables.add(this); return this.value; } set(newValue){ if(newValue !== this.value){ this.notify(); } } subscribe(listener){ this.listeners.add(listener) } unsubscribe(listener){ this.listeners.delete(listener) } notify(){ for(const listener of this.listeners){ listener(); } } } function connect(target){ return class extends (React.Component.isPrototypeOf(target) ? target : React.Component) { stores = new Set(); listener = ()=> this.setState({}) render(){ this.stores.forEach(store=>store.unsubscribe(this.listener)); this.stores.clear(); const prevObservables = CurrentObservables; CurrentObservables = this.stores; cosnt rendered = target instanceof React.Component ? super.render() : target(this.props); this.stores = CurrentObservables; CurrentObservables = prevObservables; this.stores.forEach(store=>store.subscribe(this.listener)); return rendered; } componentWillUnmount(){ this.stores.forEach(store=>store.unsubscribe(this.listener)); } } }

Let's reconsider a bit - let's take the logic of installing a global array inside the observer itself. This can be represented as, for example, a table cell in Google-Dox - there is a cell that simply stores the value and there is a cell that stores not only the value (which will be cached), but also the formula (function) for its recalculation. And at the same time, in addition to the formula for the recalculation function, we will add another function parameter to perform side effects, such as calling setState ({}) on the component when the value changes. As a result, we get the following Cell class.

 let CurrentObserver = null class Cell { reactions = new Set(); dependencies = new Set(); tracked = false; constructor(value, fn = null, reactionFn = null) { this.value = value; this.fn = fn; this.reactionFn = reactionFn } get() { if (this.fn && !this.tracked) this.run(); if (CurrentObserver) { this.reactions.add(CurrentObserver); CurrentObserver.dependencies.add(this); } return this.value; } set(newValue) { if (newValue !== this.value) { this.value = newValue; for (const reaction of this.reactions) { reaction.run(); } return true; } else { return false; } } run() { if(!this.fn) return; const currentObserver = CurrentObserver; CurrentObserver = this; const oldDependencies = this.dependencies; this.dependencies = new Set(); const newValue = this.fn(); CurrentObserver = currentObserver; for(const dep of oldDependencies){ if(!this.dependencies.has(dep)){ dep.reactions.delete(this); } } const changed = this.set(newValue); if (changed && this.tracked && this.reactionFn){ const currentObserver = CurrentObserver; CurrentObserver = null; this.reactionFn(); CurrentObserver = currentObserver; } this.tracked = true; } unsubscribe(){ for(const dep of this.dependencies){ dep.reactions.delete(this); } this.tracked = false; } } function connect(target){ return class extends (React.Component.isPrototypeOf(target) ? target : React.Component) { constructor(...args){ super(...args); this._cell = new Cell(null, ()=>{ return React.Component.isPrototypeOf(target) ? super.render() : target(this.props); }, ()=>{ this.forceUpdate(); //  setState({})  forceUpdate()           }); } render(){ return this._cell.get(); } componentWillUnmount(){ this._cell.unsubscribe(); } } }

Now find out the update modes of our observer. In the example above, we still have all the observers active - after the first time .get he subscribed to his dependencies and will be called every time a dependency changes its value. This mode is convenient for components that need to be updated every time the data on which they are subscribed changes, but there are so-called "cached" or "memoized" functions for which this behavior is undesirable. For example, there is an observatory const fullName = new Cell(()=>firstName.get() + lastName.get()) which should calculate the full name when either the first name or the last name changes. But what if after it is calculated to the fullName in the application, under some conditions, it is not necessary to apply? We will get an extra calculation and to avoid this, we can make it so that the component is not calculated immediately, but only when it is not addressed to it - when calling .get() .

The extra calculations are generally the key point when comparing libraries based on the model of "cells and formulas in a table". Unnecessary calculations may appear if the algorithm for determining which dependencies need to be called after the value has changed in the case of the rhomboid dependency diagram (when there are cycles in the dependency graph), which is wrong (as in our example above)

Let's consider this situation - there are four cells - firstName , lastName , fullName (which calculates the full name) and label (which displays either the name if it is long, otherwise the full name)

 const firstName = new Cell("fff"); const lastName = new Cell("lll"); const fullName = new Cell("", ()=>firstName.get() + " " + lastName.get()); const label = new Cell("", ()=>firstName.get().length <= 3 ? fullName.get() : firstName.get()))

Here, the simplest version of diamond-shaped dependencies is on fullName , the label depends on fullName , but the label also depends on firstName and it turns out like a cycle.
It is necessary to clarify that in the process we are interested in recalculating only the values of the label cell (for example, you need to render in the component), so if you suddenly need to calculate the fullName value for the label , then you don’t need to calculate it.

And here is the first bug - when changing firstName - in our Cell implementation, when we cycle, we call subscribers the label component will be calculated twice - the first time the firstName will call the label because it is signed directly, and the second time the label calculated when fullName changes its value. The first label calculation is not necessary because it contains temporary data - the new name and the old fullName . Accordingly, we need to get rid of unnecessary calculations and we can only do this by calling the subscribers in the correct order - first fullName and then label .

How can we do this? If you think that there are a couple of options.

One option is the "dirty-clean" method, which is described by the mobx author in his report on the mobx device ( https://www.youtube.com/watch?v=TfxfRkNCnmk ) (funny that the author actually lied because mobx doesn’t implement this is a more correct algorithm but more on that later).

In short, the algorithm consists of a method of distributing a function call according to a dependency graph and identifying the value of the "depth" of each dependency through the increment-decrement of the counter and then calling them in the order of increasing depth. Suppose that when a name is changed, the firstName cell will not immediately call subscribers in the loop, but set the value 1 to each of the listeners and call them so that each set the value of its subscribers to 1 more than that of itself. And so recursively. The fullName cell will receive the values 1 and the label cell will receive the value 2 because the counter is first labeled firstName cell and then the fullName cell. Now, after the recursive call has ended, the fistName cell calls the reverse procedure - reducing the counter recursively at its subscribers. And now the moment - after the counter reduction code has been called up, it is necessary to check if the value has returned to zero, then only then the cell should be recalculated. So, the label counter decreases from 2 to 1 (but is not calculated because it is not 0) then the fullName counter decreases from 1 to 0 and calculates fullName and only then the label itself is calculated because fullName after calculation causes the label counter to decrease from 1 to 0.

Thus, we obtained label calculation only once after all dependent cells update themselves and are of current importance.

Another option (which in fact is an optimized version of the first) would be the idea to call subscribers in order of increasing depth. Under the cell depth, we take the maximum depth of our dependent cells + 1 and a cell without a formula that has no dependencies will have a depth of 0. We get that firstName and lastName will have a value of 0, fullName will have a value of 1 and the label will have a value of 2 because the maximum value for subscribers ( fullName and firstName ) is 1, we get +1 and we get 2.

Now when the fistName cell updates its value, it should call its subscribers in increasing depth order — first fullName and then label . An array can be sorted each time it is called, or it can be optimized and inserted into a sorted array at the moment of adding a new dependency.

The depth value also needs to be updated each time a new subscriber is added by comparing its value with the current cell value.

In this way, we will get a call to subscribers in the correct order and avoid unnecessary calculations. Nearly...

In both cases, there is one very inconspicuous bug. The label cell formula does not just depend on firstName and fullName — it depends on them under certain conditions. If the value is firstName.get().length <= 3 then we output fullName but if the value is greater than 3, then we depend only on firstName . Now let's think about what happens when the firstName value changes from 4 to 3. The firstName cell updates its value and should call subscribers in depth order — there will be a fullName call that calculates its value and then a label call that calculates its value already having the fullName actual value. At first glance, everything seems to be correct. But if you think that the calculation of fullName is not really necessary here - because the value of fistName will be 3, which means that when the label last called, it will not need to call fullName.get() because the if branch will simply not be executed. Moreover, the next time when you need to call fullName its value will be irrelevant because between its call the lastName can be updated as many times as you like. Here you have a bug with the extra calculation. As a result, our algorithm with calling subscribers in the order of their depth does not work in the general case.

So, there is the “right” algorithm, which under no circumstances and tricky dependencies will not cause a double cell calculation. To begin with, I’ll give you a code that, in combination, is almost a full-fledged version of mobx (with the exception of an array and decorators) just 85 lines

 class Cell { reactions = new Set(); dependencies = new Set(); runned = false; constructor(value, fn = null, reactionFn = null, active = false) { this.value = value; this.fn = fn; this.reactionFn = reactionFn; this.state = fn ? "dirty" : "actual"; } get() { if (this.state !== "actual") this.actualize(); if (CurrentObserver) { this.reactions.add(CurrentObserver); CurrentObserver.dependencies.add(this); } return this.value; } set(newValue) { if (newValue !== this.value) { this.value = newValue; for (const reaction of this.reactions) { reaction.mark(true); } runPendingCells() return true; } else { return false; } } mark(dirty = false) { this.state = dirty ? "dirty" : "check"; for (const reaction of this.reactions) { if(reaction.state === "actual") reaction.mark(); } if (this.active) PendingCells.push(this); } actualize() { if (this.state === "check") { for (const dep of this.dependencies) { if(this.state === "dirty") break; dep.actualize(); } if(this.state === "dirty"){ this.run(); } else { this.state = "actual" } } else if(this.state === "dirty"){ this.run(); } } run() { if (!this.fn) return; const currentObserver = CurrentObserver; CurrentObserver = this; const oldDependencies = this.dependencies; this.dependencies = new Set(); const newValue = this.fn(); CurrentObserver = currentObserver; for (const dep of oldDependencies) { if (!this.dependencies.has(dep)) dep.reactions.delete(this); } const changed = this.set(newValue); this.state = "actual"; if (changed && this.reactionFn) { const currentObserver = CurrentObserver; CurrentObserver = null; this.reactionFn(!this.runned); if(!this.runned) this.runned = true; CurrentObserver = currentObserver; } } unsubscribe() { for (const dep of this.dependencies) { dep.reactions.delete(this); if(dep.reactions.size === 0) dep.unsubscribe(); } this.state = "dirty"; } } function runPendingCells() { for (const cell of PendingCells) { cell.actualize(); } }

And now the description:
Let the cell have three states - "actual" (which means that the value of the formula is relevant), "dirty" (which will mean that as soon as get() called, the cell should be recalculated) and "check". Now, as soon as the cell changes its value, it will not immediately trigger the calculation of subscribers in any order, but will mark its subscribers as "dirty". And those, in turn, will also mark their subscribers, but only with the value "check" and those, in turn, will also mark their subscribers with the value "check", and so on recursively to the end. That is, only subscribers of the cells that have changed will have the value "dirty" and all the rest until the end of the tree - the value of "check", and so that we do not get stuck in a recursive call, we need to cause recursion only for those cells that have not been marked ( actual ")

Further, when reaching the end of the tree - that is, the cell that has no more subscribers and is “active”, you must add such a cell to some global array of PendingCells . An "active" is a cell that represents not some kind of memosized function (the value of which may not be needed right now), but a reaction (for example, a component of the reactor) that should be launched each time any of the dependent cells changes its value.

As a result, when the cell has changed its value and caused this recursive process for its subscribers, we will have in the global array PendingCells some root cells that have no dependencies but which can directly or indirectly depend and appropriately or be recalculated (if all of the intermediate cells in the chain will change their value) or will not (if someone in this chain does not change its value when recomputing)

Now go to the second stage. A cell that has changed and caused a recursive process for its subscribers, calls a certain global function flush() which takes the cells that have accumulated in the global array PendingCells and will call their actualize() function. This function will be recursive and will do this — if the cell value is "dirty", it will recalculate its formula (and we remember that the value of "dirty" will be only cells that are direct subscribers to the cell that has changed, and all the rest will be set to "check"). If the value is "check", the cell will ask its dependent cells to be updated (call the actualize() method) and then check its value again and if it is equal to "check", then we change the value to "actual" and do not call recalculation if if it is "dirty", then we must respectively recalculate. At the same time, checking for "dirty" is necessary after calling "actualize ()" on each dependent cell, because if the cell is set to "dirty" it makes no sense to call for updating other cells and you can immediately interrupt the cycle and recalculate. And the fact that the other cells are not updated, it does not matter, because if the cell is accessed to get the value of the formula in the .get() method, the cell should check its value and if it is "check", then it should call this actualize() method and if "dirty" then recalculate appropriately. That's all, the end of the algorithm.

So, at first glance, the algorithm may seem complicated, but it is quite simple - when a cell changes its value, we have only 2 stages - the first stage is a recursive descent to mark as dirty (for the first level) and check for everyone else and the second stage is a recursive ascent which is the actualization of values.

Now I will clarify some unobvious moments.

The first is how does one avoid that bug with extra recalculation? This is because we do not have a hard condition to cause a recomputation of dependent cells in a cell that has changed. Dependent cells will be marked as dirty, and that’s all - they are calculated only when you need to know their value somewhere. That is, in the bug example, the fullName cell will simply be marked as "dirty" and then its value will not be computed since the label fulfills the condition firstName.get().length === 3 and the label will no longer depend on fullName .

The second is why such a strange action — inside the actualize() method — to check — if the value is "check", then call actualize() on the dependent cells both in the process and again after the cycle, re-check the value and if "dirty" then interrupt the cycle and cause recalculation and if "check" then reset after the cycle to "actual" and do nothing? The thing is that in the process of calling actualize() on dependent cells, some of them may have the value "dirty" and as we know they need to recalculate. And in the calculation there is a condition - if the cell has changed its value, then it should mark its listeners as "dirty". And so the cell that was previously “check” can, after updating its dependent cells, change the value itself when any of them change and, accordingly, need to check the condition again. But only in this case, if no dependent cells have changed their value, it means that the cell itself doesn’t have a meaning to be calculated and we change the value from "check" to "actual"

Well, now we can check the effect of this algorithm on our bug example. The firstName changes, the label and fullName cells are marked as "dirty" and only the label goes into the global array PendingCells because fullName not an "active" cell as a label but simply memorates its value and will be updated only when it is accessed and not immediately. Further, the label because "dirty" will immediately recalculate, but since firstName.get().length === 3 we do not need the fullName value and we will thus avoid unnecessary recalculation.

Honestly, the description of the algorithm takes up much more space than its implementation. This typescript code as well as the example with the reactant and the test for the calculation bug are in the repository ( https://github.com/bgnx/xmob )

Source: https://habr.com/ru/post/349022/

All Articles

We study and implement the algorithm of the correct observer pattern for react components

More articles: