📜 ⬆️ ⬇️

Say a word about poor recursion, or all that you did not know and do not want to know about it

Recursion: see recursion.

All programmers are divided into 11 2 categories: who does not understand recursion, who already understood, and who learned to use it. In general, the guiller of me is exclusively cardboard, so you, the reader, will still have to comprehend the Tao Recursion on your own, I will only try to give a few magic pendels in the right direction.

Applied programming is always engaged in solving applied problems by applying the efforts of a programmer to achieve results in non-ideal conditions. It is on the basis of the imperfect nature of this world and the limited resources that the need for programmers develops: after all, some people need to help theorists shove their harmonious and beautiful theory into practice.

- How is it folded?
- Excellent! Only the hand sticks out of the suitcase a little.

It is precisely trying to place the harmonious theory of the algorithm into a hard backpack of real resources and you have to constantly cut to the living, repack, and instead of the beautiful and slim Fibonacci definitions:
')
def fib(n): if n<0: raise Exception("fib(n) defined for n>=0") if n>1: return fib(n-1) + fib(n-2) return n 

you have to fence all sorts of dirty khaki, ranging from:

  @memoized def fib(n): if n<0: raise Exception("fib(n) defined for n>=0") if n>1: return fib(n-1) + fib(n-2) return n 

And ending in general:

  def fib(n): if n<0: raise Exception("fib(n) defined for n>=0") n0 = 0 n1 = 1 for k in range(n): n0, n1 = n1, n0+n1 return n0 


So what is recursion?


Recursion, in fact, is proof by induction. We tell how to get a result for a certain state, assuming that we have a result for a different set of states, and also tell how to get a result in those states to which everything slides one way or another.

If you are waiting for guests and suddenly noticed a stain on your suit, do not be upset. This is fixable.
For example, vegetable oil stains are easily displayed with gasoline. Gasoline stains are easily removed with an alkali solution.
Stains from alkali disappear from the vinegar essence. Traces of vinegar essences must be rubbed with sunflower oil.
Well, how to remove the stains from sunflower oil, you already know ...

Now let's take a classic example: traversing a tree in depth. But no, let's consider another example: we need to print a tree of expressions in the form of a reverse Polish record. That is, for tree 1 we want to print “2 2 +” and for tree 2 “2 2 + 2 2 + *”.

tex
 \begin{tikzpicture} \node (is-root) {+} child { node {2} } child { node {2} }; \path (is-root) +(0,+0.5\tikzleveldistance) node {\textit{Tree 1}}; \end{tikzpicture} \begin{tikzpicture} \node (is-root) {*} child { node {+} [sibling distance=1cm] child {node{2}} child {node{2}} } child { node {+} [sibling distance=1cm] child {node{2}} child {node{2}} }; \path (is-root) +(0,+0.5\tikzleveldistance) node {\textit{Tree 2}}; \end{tikzpicture} 


As it is easy to see, the task turns into a simple “tree-by-depth walkthrough”: for each node we display the contents of all its child elements, after which we output the node itself. That is, the code will be:

 class TreeNode(object): def __init__(self, value=None, children=[]): self.value = value self.children = children def printTree(node): for child in node.children: printTree(child) print node.value, def main(): tree1 = TreeNode("+", [ TreeNode(2), TreeNode(2) ]) tree2 = TreeNode("*", [ TreeNode("+", [ TreeNode(2), TreeNode(2) ]), TreeNode("+", [ TreeNode(2), TreeNode(2) ]) ]) print "Tree1:", printTree(tree1) print print "Tree2:", printTree(tree2) print if __name__ == "__main__": main() 


It would seem that everything is fine! The code works fine as long as the tree meets the requirements: any node has an array of children (possibly empty) and some value. Who will say what else requirement for this tree?



I will not torment. Requirement: not very large depth of the tree. How so? That's how:

 def buildTree(depth): root = TreeNode("1") node = root for k in range(depth): node = TreeNode("--", [ node ]) return node def depthTest(depth): tree = buildTree(depth) print "Tree of depth", depth, ":", printTree(tree) def main(): for d in range(10000): depthTest(d) 

Launch, and oops! "Tree of depth 997: RuntimeError: maximum recursion depth exceeded." We climb into the documentation, and find the function sys.getrecursionlimit . And now let's move away from the world of interpreted languages, and move on to the world of languages ​​that run directly on the processor. For example, in C ++.

Mentally rewrite 1-in-1 this code in C ++ (I will leave this task to the reader as a warm-up), and try to find the limit when the application rests on the restriction ...

for the lazy
 #include <string> #include <vector> #include <iostream> using namespace std; class TreeNode { public: string value_; vector<TreeNode*> children_; TreeNode(const string& value, const vector<TreeNode*>& children): value_(value), children_(children) {} virtual ~TreeNode() {} }; void printTree(const TreeNode* node) { for(auto i : node->children_) printTree(i); cout << node->value_ << " "; } TreeNode* buildTree(const int depth) { auto root = new TreeNode("1", {}); auto node = root; for(int i = 0; i<depth; i++) { vector<TreeNode*> children; children.push_back(node); node = new TreeNode("--", children); } return node; } void depthTest(const int depth) { auto tree = buildTree(depth); cout << "Tree of depth " << depth << ": "; printTree(tree); cout << endl; } int main() { for(int d=60000;; d+=16384) { depthTest(d); } } 

Run ... "Bus error (core dumped)". Judging by gdb, at the moment of dropping a stack with a depth of 104790 frames. And what happens if we want to print not just in a row through spaces, but also output "{" and "}" around expressions? Well, that is, for tree 1, for the result to be {2 2 +} and for tree 2 - {{2 2 + {2 2 +} *}? Rewrite ...

 def printTree(node): opened = False for child in node.children: if not opened: print "{", opened = True printTree(child) print node.value, if opened: print "}", 


Nothing has changed, still falls when trying to print a tree with a depth of 997. And now the same thing, but on the pros ... Oops. Depth of the stack in the fall - 87327. Stop. We just added one local variable that does not affect the algorithm and the essence of what is happening, and the tree size limit was reduced by 17%! And now the funniest thing - it all strongly depends on the compiler options, on which platform it runs, in which OS and with which settings.

But this is not the funny thing. Let's imagine that this function is used by another function. All is well, if it is the only one - we can calculate how much the actual steps are less than the maximum depth. And if this function is used from another recursive? Then the capabilities of this function will depend on the depth of the other function.

In this way, our beautiful simple algorithm stops suddenly getting into our imperfect suitcase. Let the reader imagine how good it is to have such limitations in the service, which is launched in production and provides some service to unsuspecting hackers, who only do what they poke into this service with their dirty fuzzy testers.

So what's the problem?


The use of a recursive algorithm implies the use of an almost uncontrollable resource: a call stack .

First, we can’t know exactly how much has been used. Secondly, we can not know exactly how much is left. Thirdly, we cannot guarantee the availability of a certain size of this resource for each challenge. In the 4th, we can not fix the flow of this resource. Thus, we become dependent on the resource, which is damn difficult to control and distribute. As a result, we cannot guarantee any characteristics of this function / service. Well, if our service works in a managed context: java, python, .net, and so on. It’s bad if the service works in an uncontrolled environment: javascript (which is generally bad ). Even worse, if the service runs on C ++, and the depth of the recursion depends on the data transmitted by the user.

What to do?


If we are not working on a microcontroller, we can not worry about the size of the stack: for the usual chain of calls it should be enough. Provided, of course, that we care about the hygiene of local variables: large objects and arrays are allocated using memory (new / malloc). However, the use of recursion implies that instead of a limited number of calls, we will have them simply countable.

So, to get rid of the problems created by recursion, you can do the following (from simple to complex):
- Hard to limit the maximum size / format / number in the incoming data. Hi, zip bombs and their ilk - sometimes even a small incoming package can make a big fuss.
- Hard to limit the maximum depth of calls to some number. It is important to remember that this number should be VERY small. That is about hundreds. And be sure to add tests that verify that the program with this maximum number does not break. And with the maximum number on all possible execution branches (hello to allocating local variables on demand). And don't forget to check this test on different compilation options and after each build.
- Tightly limit the amount of stack used. Using complex bypass maneuvers and knowledge of the practical implementation of execution in hardware, you can get the stack size that is used now (such as taking the address of a local volatile variable). In some cases (for example, through libunwind in linux), you can also get the available stack volume for the current thread, and take a difference between them. When using this method, it is important to have tests that verify that the cut-off is guaranteed and with all the input data options — for example, it can be fun if the test is performed in one method that is recursive after 3-4 others. And it may fall in the intermediate ... But only in the release mode, after the inline of some functions, for example. However, tests for the maximum allowable complexity are also important here, so as not to accidentally cut off some of the valid input requests that clients actually use.
- The best way: get rid of recursion .

And do not lie, that you are free and holy - you are captivated and not free.
I opened the sky before you!
Times change their course - Look at the palms ...

Infinite sweetness of freedom
Reject freedom
Sergey Kalugin

Yes Yes. After comprehending the Tao of recursion, you also comprehend the Tao of non-recursion. Almost all recursive algorithms have non-recursive counterparts. Starting from more efficient (see above Fibonacci), and ending with equivalent, using a queue in memory instead of a call stack.

Canonical non-recursive implementation of traversing the tree in depth:
 def printTree(node): stack = [ (node, False, False) ] while len(stack)>0: i = len(stack)-1 node, visited, opened = stack[i] if not visited: for child in reversed(node.children): if not opened: print "{", opened = True stack.append( (child, False, False) ) visited = True stack[i] = (node, visited, opened) else: print node.value, if opened: print "}", del stack[i] 

It is easy to see that the algorithm has not changed, but instead of using the call stack, the stack array is used in memory, which stores both the processing context (in our case, the opened flag) and the processing context (in our case, before or after the children are processed). In cases when you need to do something between each of the recursive calls, or add processing phases. Note: this is an optimized algorithm that pushes all the children onto the stack at once, and that is why it is folding in the reverse order. This ensures that the same order as the original non-recursive algorithm is preserved.

Here is the same code, only written "in the forehead", keeping the context (at the same time, outputting commas between elements):

 def printTree(node): stack = [ (node, 0) ] while len(stack)>0: i = len(stack)-1 node, phase = stack[i] if phase < len(node.children): child = node.children[phase] if phase == 0: print "{", if phase > 0: print ",", stack.append( (child, 0) ) stack[i] = (node, phase+1) else: print node.value, if phase>0: print "}", del stack[i] 

Yes, the transition to non-recursive technologies is not completely free: we pay periodically more expensive - dynamic memory allocation for the organization of the stack. However, it pays off: not all local variables are saved to the “manual stack”, but only the minimum necessary context, the size of which can already be controlled. The second item of expenditure: code readability. The code written in a non-recursive form is somewhat more difficult for perception due to branching from the current state. The solution to this problem lies already in the area of ​​code organization: making steps into separate functions and competently naming them.

Misadventure


Despite the existence of a certain “de-taxation tax”, I personally consider it obligatory to be paid in any place where data is processed, in one way or another, received from the user.

And how do you not use recursion?

Source: https://habr.com/ru/post/256351/


All Articles