Closures and objects in JavaScript. Reinventing the interpreter

Concepts and programming paradigms are usually explained in one of two ways: descriptively, by “chewing over” new ideas in simple words, or metaphorically, by likening them to familiar objects and concepts. Neither method gives as accurate and complete a picture of the subject as a look at its low-level implementation.

When you reach the non-trivial parts of a language, it can be useful to shift the level of abstraction in order to understand how everything actually works. After all, any construct of an arbitrarily high-level language is ultimately reduced to good old machine code. You can write in an object-oriented or functional style in pure C, and even in assembly language. Roughly speaking, any high-level language is a set of syntactic sugar built into the compiler or interpreter. Raising the level of abstraction lets you write more complex programs with less effort, but at the beginning of the path, when you are trying to understand what exactly inheritance or a closure is, how it works and why, it is much easier to figure out how it is all implemented.

JavaScript, like no other language, needs just such an explanation. The functional nature hidden behind its C-like syntax and its unusual prototype model of inheritance are very confusing at first. Let's mentally lower JavaScript to the level of a simple procedural language like C. Starting from this “sub-language”, we will reinvent functional and object-oriented programming.

First-class functions

When the interpreter parses the program text, it builds structures in memory that hold the code and its data in a form that can be executed directly. At run time, the interpreter's basic data structures are the call stack and the heap. The heap stores the program's data, and the stack stores so-called stack frames (also known as activation frames, activation records or activation objects). As a rule, a stack frame contains the return address, references to the function's arguments, and references to the memory blocks allocated for local variables. When the function finishes, the stack shrinks by one frame, and the heap blocks that were referenced only from that frame become available to the garbage collector. But it is not quite that simple.
Consider the following code:

// Listing 1
var x = 10;
var foo = function () {
    var y = 20;
    var bar = function () {
        var z = 30;
        return x + y + z;
    };
    return bar;
};
var baz = foo();
console.log(baz()); // 60

The variables x, foo and baz are global and are therefore available everywhere, regardless of the stack depth. When foo is called, a frame with the local variables y and bar appears at the top of the stack; when foo exits, this frame is lost, and by the time baz is called, its activation frame contains only z. Where does the interpreter get y from? In standard C this problem is solved very harshly: declaring nested functions is simply prohibited. Pascal, on the contrary, has nested functions, but a function cannot be returned as a result. This is what people mean when they say that functions in imperative languages are not first-class objects. Functional languages, on the other hand, let you do whatever you want with functions (the example above is valid JavaScript). How do they do it?

When the nested function bar is returned from the outer function foo, the stack frame created by the call to foo is stored in the variable baz as part of the execution context of bar. This forms a chain of scopes that is completely independent of the call stack. A real implementation of this mechanism may, of course, differ greatly from this simplified description, but the main point is that the variable y exists as long as baz exists, even though the reference to it disappears from the stack when foo exits.
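The mechanism described above can be modelled by hand. The following is only a sketch under stated assumptions: createEnv, lookupVar, invoke, barCode and fooModel are hypothetical names invented for this illustration, and a real interpreter is free to organize its scope chain quite differently.

```javascript
// Hypothetical model: an environment holds its own variables plus a link
// to the environment in which the function was created.
function createEnv(parent) {
    return { vars: {}, parent: parent };
}

// Walk up the scope chain until the variable is found.
function lookupVar(env, name) {
    for (var e = env; e !== null; e = e.parent) {
        if (Object.prototype.hasOwnProperty.call(e.vars, name)) {
            return e.vars[name];
        }
    }
    throw new ReferenceError(name + ' is not defined');
}

// A "closure" in this model is the code of a function plus its saved environment.
function invoke(closure) {
    var frame = createEnv(closure.env); // new frame, linked to the saved one
    return closure.code(frame);
}

// Listing 1, rewritten with explicit environments.
var globalEnv = createEnv(null);
globalEnv.vars.x = 10;

function barCode(env) {
    env.vars.z = 30;
    return lookupVar(env, 'x') + lookupVar(env, 'y') + lookupVar(env, 'z');
}

function fooModel() {
    var fooEnv = createEnv(globalEnv); // the "frame" of foo lives on the heap
    fooEnv.vars.y = 20;
    return { code: barCode, env: fooEnv }; // survives the return from fooModel
}

var bazModel = fooModel();
var modelResult = invoke(bazModel);
console.log(modelResult); // 60
```

The frame of fooModel stays reachable through bazModel.env, which is why y can still be found after fooModel has returned.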

All the scope chains that exist at a given moment (one chain for each function defined in the current execution context) together form something like a three-dimensional tree. If we return several nested functions, the branches of the tree diverge sideways, sharing the variables of the outer function; if we call the outer function several times, the tree grows upward, creating independent copies of the free variables:

// Listing 2
var x = 'I am global!';
var foo = function (y) {
    var z = 'unchanged';
    var getXYZ = function () {
        return 'x: ' + x + ' y: ' + y + ' z: ' + z;
    };
    var setZ = function (newZ) {
        z = newZ;
    };
    return [getXYZ, setZ];
};
var a = foo('Alice'); // *1*
var b = foo('Bob');   // *2*
console.log(a[0]()); // x: I am global! y: Alice z: unchanged
a[1]('changed');     // *3*
console.log(a[0]()); // x: I am global! y: Alice z: changed *4*
console.log(b[0]()); // x: I am global! y: Bob z: unchanged
x = 'Everybody can see me!'; // *5*
console.log(a[0]()); // x: Everybody can see me! y: Alice z: changed
console.log(b[0]()); // x: Everybody can see me! y: Bob z: unchanged

At point *1* the scope tree forks at the foo node, since we returned two nested functions. They can communicate with each other through the variables y and z, as point *3* shows. At point *2* we enter foo a second time and the tree grows upward: copies of all the local variables of foo are created. The getXYZ and setZ returned by the second call also communicate through y and z in their own foo node, but they know nothing about y and z in the foo node of the first call, as is clearly seen at *4*. At the same time, the variable x one level above is visible to all the leaves of the scope tree (*5*).

Thus, when writing the program we define the structure of the future scope trees. At the moment a branch is actually created we can set some of its variables and leave them fixed (after calling foo('Alice') or foo('Bob') there is no way to change the value of y from outside). And for as long as the branch exists, we can control its state only insofar as its leaves allow.

The need for these additional interpreter data structures is one of the main differences between the internals of functional and imperative languages. Since such structures are created implicitly, without the programmer's direct participation, only the interpreter itself can free their memory. That is why functional languages cannot do without garbage collectors, while imperative languages can. Incidentally, the first garbage collector was written in 1959 for Lisp. Oh yes, I almost forgot: a reference to a function together with its scope chain is called a closure.
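The earlier remark that the state of a branch can be controlled only as far as its leaves allow can be illustrated with a small sketch. The createAccount function and its fields are hypothetical names chosen for this example only.

```javascript
// Hypothetical example: the variables owner and balance live only in the
// scope of one createAccount call and can be reached solely through the
// functions returned from it.
function createAccount(owner) {
    var balance = 0;
    return {
        getOwner: function () { return owner; },
        getBalance: function () { return balance; },
        deposit: function (amount) {
            if (amount > 0) {       // the "leaf" decides what is allowed
                balance += amount;
            }
            return balance;
        }
    };
}

var alice = createAccount('Alice');
var bob = createAccount('Bob');

alice.deposit(50);
alice.deposit(-10);               // rejected, balance stays the same

console.log(alice.getBalance()); // 50
console.log(bob.getBalance());   // 0, an independent copy of balance
console.log(alice.balance);      // undefined, no direct access from outside
```

Each call creates its own branch with its own balance, and owner can no longer be changed at all, because no returned function assigns to it.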

Encapsulation and inheritance

So, by slightly modifying the interpreter, we have turned a primitive procedural language into a full-fledged functional one and understood how closures work. Now it would be nice to teach it to work with objects. Although OOP itself is more familiar to most people than closures, it causes many more problems here. In other languages there is a rigidly fixed set of constructs that uniquely defines the style and flavor of the object paradigm in that particular language, whereas in JavaScript you can cobble together anything you like. And people do... Any self-respecting author of a JavaScript book considers it his duty to present at least four different ways of organizing object hierarchies in order to demonstrate “the power and expressiveness of the language”. This is certainly cool, but it makes the brain of anyone used to Java, Ruby or C# boil. Until I ran into JavaScript, I never felt any need to figure out exactly how all these object things work: they just worked, as the book said. With JavaScript, that trick does not work.

Let's “forget” that JavaScript already has objects: we will call them structures, as in C, and call their properties members of structures. For the time being we will also abandon dot notation and access structure members only through square brackets. Since we already have first-class functions, which can be handled as freely as any other variable, nothing seems simpler than building a structure that contains both data and a function for processing it:

// Listing 3
var obj = {
    x: 10,
    y: 20,
    foo: function () { return x + y; }
};
console.log(obj['foo']()); // error!

In fact, everything is a little more complicated. This example does not work, because there is no x or y in the activation object (as a stack frame is called in JavaScript) of our foo() function. They are not further up the scope chain either; only the variable obj is there. To get to them, we have to refer to them as obj['x'] and obj['y']:

// Listing 4
var obj = {
    x: 10,
    y: 20,
    foo: function () { return obj['x'] + obj['y']; }
};
console.log(obj['foo']()); // 30

It works! We have put the data and the function that processes it into one structure. But very often we need several objects with the same layout and the same functions but different variable values. Let's create a function that generates such structures:

// Listing 5
function createObj(x, y) {
    var obj = {};
    obj['x'] = x;
    obj['y'] = y;
    obj['foo'] = function () { return obj['x'] + obj['y']; };
    return obj;
}
var obj1 = createObj(1, 2);
var obj2 = createObj(3, 4);
console.log(obj1['foo']()); // 3
console.log(obj2['foo']()); // 7

Since createObj() returns a nested function as part of the obj object, each call creates a closure that holds its own independent copies of x, y and foo (the scope tree grows upward). What we have built already looks very much like a full-fledged object. To mark the occasion, we will switch to the more concise dot notation in the following listings. But OOP is not OOP without inheritance. How do we organize it? We could write a function that copies all the properties of a parent object into a descendant object. Such inheritance is called cascading but, strictly speaking, it is cloning rather than inheritance. Changes in the parent's implementation will not affect the descendant; moreover, if every descendant holds copies of the parent's methods, memory is wasted. It is probably better to simply store a reference to the parent in one of the child's properties. We will also need a function that searches for properties up the inheritance chain:

// Listing 6
function createObj(x, y) {
    var obj = {};
    obj.x = x;
    obj.y = y;
    obj.foo = function () { return obj.x + obj.y; };
    return obj;
}
function createChild(parent) {
    var child = {};
    child.__parent__ = parent;
    return child;
}
function lookupProperty(obj, prop) {
    if (prop in obj)
        return obj[prop];
    else if (obj.__parent__)
        return lookupProperty(obj.__parent__, prop);
}
var a = createObj(1, 2);
var b = createChild(a);
console.log(lookupProperty(b, 'y'));     // 2
console.log(lookupProperty(b, 'foo')()); // 3

Everything seems to be in order, but if we change the object b, for example with b.x = 10, we will see that nothing really works: the foo() method still refers to the properties of its own object, not of the descendant. If we want to reuse methods through inheritance, we must teach them to work with the properties of other objects. We can pass the method an argument that points to the current object. The method must also use lookupProperty() internally, because we do not know in advance whether the x and y properties are defined in the current object or have to be looked up through the inheritance chain. The functions createChild() and lookupProperty() remain unchanged:

// Listing 7
function createObj(x, y) {
    var obj = {};
    obj.x = x;
    obj.y = y;
    obj.foo = function (currentObj) {
        return lookupProperty(currentObj, 'x') + lookupProperty(currentObj, 'y');
    };
    return obj;
}
function createChild(parent) {
    var child = {};
    child.__parent__ = parent;
    return child;
}
function lookupProperty(obj, prop) {
    if (prop in obj)
        return obj[prop];
    else if (obj.__parent__)
        return lookupProperty(obj.__parent__, prop);
}
var a = createObj(1, 2);
var b = createChild(a);
b.x = 10; // the descendant overrides x
console.log(lookupProperty(b, 'y'));      // 2
console.log(lookupProperty(b, 'foo')(b)); // 12

So, we have just implemented delegating inheritance without using any of JavaScript's built-in facilities for it. Our scheme is quite liberal: an object can be inherited from another or created from scratch, and a descendant can break the inheritance chain at any time and attach itself to a different object.
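The last point, that a descendant can attach itself to a different parent, can be tried out with the functions from Listing 7. This is only an illustrative sketch; the variable names first, second, child, before and after are made up for the example.

```javascript
// The three helper functions are repeated from Listing 7 so that the
// sketch runs on its own.
function createObj(x, y) {
    var obj = {};
    obj.x = x;
    obj.y = y;
    obj.foo = function (currentObj) {
        return lookupProperty(currentObj, 'x') + lookupProperty(currentObj, 'y');
    };
    return obj;
}
function createChild(parent) {
    var child = {};
    child.__parent__ = parent;
    return child;
}
function lookupProperty(obj, prop) {
    if (prop in obj)
        return obj[prop];
    else if (obj.__parent__)
        return lookupProperty(obj.__parent__, prop);
}

var first = createObj(1, 2);
var second = createObj(100, 200);
var child = createChild(first);

var before = lookupProperty(child, 'foo')(child); // delegates to first
child.__parent__ = second;                        // re-attach to another parent
var after = lookupProperty(child, 'foo')(child);  // now delegates to second

console.log(before); // 3
console.log(after);  // 300
```

Nothing is copied into child at any point: both the data and the method are found through the __parent__ link at the moment of the call.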

Let's change our interpreter to support OOP. Since inheritance is a good thing, and it almost always makes sense to inherit something from something, lookupProperty() should be made completely transparent by hiding it inside the interpreter. We will not see it again, but we will remember that it is there.

Next, let's combine the createObj() and createChild() functions, since they are quite similar: both create a temporary object on entry and return it on exit. We will attach the combined function to the root Object. It will take two arguments: the parent object and an object describing the differences between the descendant and the parent (this approach is called differential inheritance).
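Before the interpreter takes over, such a combined function could be sketched in terms of our self-made model. The name createFrom and the shape of the diff argument are assumptions made for this illustration; this is not how the built-in method is implemented.

```javascript
// lookupProperty is repeated from Listing 7.
function lookupProperty(obj, prop) {
    if (prop in obj)
        return obj[prop];
    else if (obj.__parent__)
        return lookupProperty(obj.__parent__, prop);
}

// Hypothetical combined function: a parent (or null) plus the differences.
function createFrom(parent, diff) {
    var obj = {};
    if (parent) {
        obj.__parent__ = parent;
    }
    for (var key in diff) {
        if (Object.prototype.hasOwnProperty.call(diff, key)) {
            obj[key] = diff[key]; // only the differences are stored in the descendant
        }
    }
    return obj;
}

var base = createFrom(null, {
    x: 1,
    y: 2,
    foo: function (currentObj) {
        return lookupProperty(currentObj, 'x') + lookupProperty(currentObj, 'y');
    }
});
var derived = createFrom(base, { x: 10 });

var sum = lookupProperty(derived, 'foo')(derived);
console.log(sum); // 12
```

The descendant stores only x; y and foo are still found in the parent through lookupProperty().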

Finally, so as not to pass the current object as an argument with every method call, the interpreter will automatically supply a reference to the object on which the method was called. Let's name it this, following OOP tradition:

// Listing 8
var a = Object.create(null, {
    x: {value: 1},
    y: {value: 2},
    foo: {value: function () { return this.x + this.y; }}
});
var b = Object.create(a, {x: {value: 10}});
console.log(b.x + ', ' + b.y + ', ' + b.foo()); // 10, 2, 12
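One detail of the second argument of Object.create() is worth a small sketch: it takes property descriptors, and attributes that are left out (writable, enumerable, configurable) default to false. The variable names fixed, open and fixedDesc below are made up for the illustration.

```javascript
// A property described only by its value is read-only, non-enumerable
// and non-configurable.
var fixed = Object.create(null, { x: { value: 1 } });
var fixedDesc = Object.getOwnPropertyDescriptor(fixed, 'x');
console.log(fixedDesc.writable, fixedDesc.enumerable, fixedDesc.configurable);
// false false false

// To get an ordinary property, the attributes have to be spelled out.
var open = Object.create(null, {
    x: { value: 1, writable: true, enumerable: true, configurable: true }
});
open.x = 5;
console.log(open.x); // 5

console.log(Object.keys(fixed).length, Object.keys(open).length); // 0 1
```

So the properties x, y and foo defined with a bare value, as in Listing 8, cannot be reassigned later unless writable: true is added to their descriptors.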

We have now brought our inheritance into line with the ECMAScript 5 standard. Unfortunately, the new standard is not supported everywhere and is not very fast. In addition, millions of lines of code have already been written in which inheritance is done the old way, via new. That scheme relies on constructors and prototypes. It is said that Brendan Eich introduced it into the language so that the simplicity and straightforwardness of the scheme described above would not shock programmers accustomed to classical inheritance. Eich would probably have made a good smuggler: he managed to sneak a functional language into mainstream programming, where imperative languages had ruled until then, by disguising it with a C-like syntax and by tangling up prototype inheritance until it looked like the classical kind.

In classical OOP there is a rigid boundary between classes and instances. In prototype-based OOP there is none at all, since there are no classes and any object can serve as a prototype. To soften this difference, a special kind of function was introduced: the constructor. The hierarchy of constructors exists in parallel with the hierarchy of objects; each constructor “hangs” above its objects like a class above its instances. A constructor is not the prototype of the objects it creates, and a constructor's own prototype has nothing to do with the prototype of those objects.
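The relationships stated in the last sentence can be checked directly with Object.getPrototypeOf(). The Point constructor below is a hypothetical example, not something from the listings.

```javascript
// Hypothetical constructor used only for this check.
function Point(x, y) {
    this.x = x;
    this.y = y;
}
var p = new Point(1, 2);

// The prototype of the instance is the object stored in Point.prototype...
var instanceUsesPrototypeProperty = Object.getPrototypeOf(p) === Point.prototype;
// ...and not the constructor itself.
var constructorIsNotPrototype = Object.getPrototypeOf(p) !== Point;
// The constructor's own prototype is Function.prototype,
var constructorProto = Object.getPrototypeOf(Point) === Function.prototype;
// which is a different object from Point.prototype.
var twoDifferentObjects = Object.getPrototypeOf(Point) !== Point.prototype;

console.log(instanceUsesPrototypeProperty); // true
console.log(constructorIsNotPrototype);     // true
console.log(constructorProto);              // true
console.log(twoDifferentObjects);           // true
```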

Let's return to Listing 7. To fit constructors into the inheritance scheme, recall that functions in JavaScript are objects too. This means we can add properties to functions just as to ordinary objects. Let's add a prototype property that points to the prototype of the object created when the function is called as a constructor. Then, as in the transition to Listing 8, we rename the temporary variables obj, child and currentObj to this, hide its declaration and return inside the interpreter, and move lookupProperty() there as well. Since each constructor creates one specific type of object, using a generic method like Object.create() makes no sense here, so we will name constructors after the type of object they create, but with a capital letter, so as not to confuse them with ordinary functions. To tell the interpreter that we want to call a function as a constructor, we put the keyword new in front of its name. Here is what we get:

// Listing 9
function A(x, y) {
    this.x = x;
    this.y = y;
    this.foo = function () { return this.x + this.y; };
}
function B() {}
B.prototype = new A(1, 2);
var b = new B();
console.log(b.x + ', ' + b.y + ', ' + b.foo()); // 1, 2, 3

The constructor + prototype pair (in this example, B() + B.prototype) plays the same role as a class in classical OOP. Notice that in Listing 8 the object a serves solely as something for b to inherit from, and in Listing 9 we got rid of the variable a entirely, which means we do not need the A() constructor either. The x and y properties, which differ from object to object, can be defined in the constructor B, and the foo() method, common to all objects, in the prototype:

// Listing 10
function B(x, y) {
    this.x = x;
    this.y = y;
}
B.prototype.foo = function () { return this.x + this.y; };
var b = new B(1, 2);
console.log(b.x + ', ' + b.y + ', ' + b.foo()); // 1, 2, 3

The same with Object.create():

// Listing 11
var B = {
    foo: function () { return this.x + this.y; }
};
var b = Object.create(B, {x: {value: 10}, y: {value: 20}});
console.log(b.x + ', ' + b.y + ', ' + b.foo()); // 10, 20, 30
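To connect the two styles, the work that the interpreter hides behind the keyword new can be approximated with Object.create(). The construct function below is a hypothetical, simplified sketch: it ignores some details of the real operator, for example what happens when the prototype property does not hold an object.

```javascript
// Hypothetical sketch of what "new Ctor(...)" roughly does.
function construct(Ctor, args) {
    // 1. Create an object whose parent is Ctor.prototype.
    var obj = Object.create(Ctor.prototype);
    // 2. Run the constructor with this bound to the new object.
    var result = Ctor.apply(obj, args);
    // 3. If the constructor explicitly returned an object, use it instead.
    var returnedObject =
        (typeof result === 'object' && result !== null) || typeof result === 'function';
    return returnedObject ? result : obj;
}

// The constructor and prototype from Listing 10.
function B(x, y) {
    this.x = x;
    this.y = y;
}
B.prototype.foo = function () { return this.x + this.y; };

var viaNew = new B(1, 2);
var viaSketch = construct(B, [1, 2]);

console.log(viaSketch.foo()); // 3
console.log(Object.getPrototypeOf(viaSketch) === Object.getPrototypeOf(viaNew)); // true
console.log(viaSketch instanceof B); // true
```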

Conclusion

1. The execution of a JavaScript program rests on three main data structures of the interpreter: the call stack, the scope chain and the inheritance chain. The stack is the oldest and most primitive of them. It is strictly linear (although variations are possible), but it is fast and simple. The other two structures are more like trees than chains, but since only one function executes at a time, and it can only move up these trees, from its point of view they are indeed chains.

2. The structure of the scope tree is static and is fixed when the program is written. The root of the tree is the global object. A scope chain grows just like the stack, whenever the next function is called. But on returning to the caller the stack always shrinks, whereas the scope chain can live on if a reference to a nested function is returned, and can even start branching if there are several such functions or if the outer function is executed several times. Such long-lived nodes form closures.

3. The inheritance tree grows from the Object object (UPD: in fact it grows from null. Since that object is, to put it mildly, not very informative, it is usually not counted. Thanks to azproduction for the correction). The main purpose of the inheritance tree is to look up properties along the prototype chain when a property is not present in the object itself, which is what the lookupProperty() function did in our examples. In the ECMAScript standard this is the [[Get]] method. The reference to the parent object (__parent__ in Listings 6 and 7) is called __proto__ in many implementations and is accessible to the programmer, but using it is considered bad form. The language standard does not provide a way to change an object's parent, as our self-made implementation of inheritance does. Object.prototype and Object.__proto__ are completely different things. Object.prototype is used only when the function is called as a constructor and sets the prototype of the returned object.

4. JavaScript has no private, protected or public modifiers for hiding an object's implementation. Such hiding can, however, be implemented with closures, and in more than one way. This is a rather dubious practice, though: such code is harder to read and test. In most modern dynamic languages private is just a convention, and private properties can be accessed from outside if desired. In JavaScript it is customary to mark private properties with a leading underscore: _private. In addition, modules are often used; they are a very convenient and practically standard alternative to private methods and properties.

5. The keyword this refers to the current object, which is quite obvious in the case of constructors and object methods. In a function that is not called as a method of an object, this defaults to the global object (in non-strict code). Although the creation, passing and returning of the objects that this points to are hidden inside the interpreter, the ears of the currentObj variable from Listing 7 still stick out of the call() and apply() methods: their first argument becomes visible inside the function as this.
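The last point of the list can be seen in a few lines: with call() and apply() the current object is passed explicitly, just like currentObj in Listing 7. The functions sum and addTo and the objects p1 and p2 are hypothetical names for this sketch.

```javascript
// A free-standing function that relies on this.
function sum() {
    return this.x + this.y;
}
// A function that takes ordinary arguments in addition to this.
function addTo(dx, dy) {
    return (this.x + dx) + (this.y + dy);
}

var p1 = { x: 1, y: 2 };
var p2 = { x: 10, y: 20 };

// The first argument of call() and apply() becomes this inside the function.
var s1 = sum.call(p1);
var s2 = sum.apply(p2);

// The remaining arguments are passed one by one to call() and as an array to apply().
var t1 = addTo.call(p1, 10, 20);
var t2 = addTo.apply(p1, [10, 20]);

console.log(s1, s2); // 3 30
console.log(t1, t2); // 33 33
```

The same function serves any object that has x and y, without being stored in either of them.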


Source: https://habr.com/ru/post/125306/
