How-To: Inserting asynchronous HTML/JS code using JS

The main task of an ad network management system is to insert the code of those networks into the code of end-user sites. More generally, such systems can solve a wide range of tasks: from A/B testing the effectiveness of different ad formats, to placing several types of advertising material on several sites in parallel, to adding extra effects (mainly animation) to them. All of this serves to simplify ad management on sites and to tune the analytics process, which ultimately increases revenue from online advertising when selling traffic.

At the same time, in the general case it is practically impossible to tell which code belongs to an advertising system and which does not. So in this article we will consider a more general task: injecting arbitrary HTML/JS code into the code of an end-user site.

Task

Suppose we have a container on the page and need to load into it code that is received from the server and contains HTML markup and JS scripts (which may be either asynchronous or synchronous).

The solution must also work for n such containers that exist on the same page at the same time.

Let's explain by example:

Inserted code:

 <script id=1>
 document.write("<" + "script id=2>" +
   "document.write(\"<\" + \"div>someresult1<\" + \"/div>\");" +
   "document.write(\"<\" + \"div>someresult2<\" + \"/div>\");" +
   "<" + "/script>");
 document.write("<" + "script id=3 src='somescript1.js'>" + "<" + "/script>");
 document.write("<" + "script id=4 src='somescript1.js'>" + "<" + "/script>");
 document.write("<" + "script id=5>" +
   "document.write(\"<\" + \"div>someresult3<\" + \"/div>\");" +
   "document.write(\"<\" + \"div>someresult4<\" + \"/div>\");" +
   "<" + "/script>");
 </script>

somescript1.js:

 document.write("<" + "div>someresult1.1<" + "/div>");
 document.write("<" + "script class='loaded' src='somescript2.js'>" +
   "document.write(\"<\" + \"div>someresult1.2<\" + \"/div>\")" +
   "<" + "/script>");

somescript2.js:

 document.write("<" + "div>someresult2.1<" + "/div>");

Here the scripts somescript1.js and somescript2.js are examples of scripts at the first and second level of nesting, respectively. In addition, somescript1.js models how the system behaves when a loaded script tag also has code in its body.

The system to be developed should produce the following code in the container:

 <script id="1">
 document.write("<" + "script id=2>" +
   "document.write(\"<\" + \"div>someresult1<\" + \"/div>\");" +
   "document.write(\"<\" + \"div>someresult2<\" + \"/div>\");" +
   "<" + "/script>");
 document.write("<" + "script id=3 src='somescript1.js'>" + "<" + "/script>");
 document.write("<" + "script id=4 src='somescript1.js'>" + "<" + "/script>");
 document.write("<" + "script id=5>" +
   "document.write(\"<\" + \"div>someresult3<\" + \"/div>\");" +
   "document.write(\"<\" + \"div>someresult4<\" + \"/div>\");" +
   "<" + "/script>");
 </script>
 <script id="2">
 document.write("<" + "div>someresult1<" + "/div>");
 document.write("<" + "div>someresult2<" + "/div>");
 </script>
 <div>someresult1</div>
 <div>someresult2</div>
 <script id="3" src="somescript1.js"></script>
 <div>someresult1.1</div>
 <script class="loaded" src="somescript2.js">
 document.write("<" + "div>someresult1.2<" + "/div>")
 </script>
 <div>someresult2.1</div>
 <script id="4" src="somescript1.js"></script>
 <div>someresult1.1</div>
 <script class="loaded" src="somescript2.js">
 document.write("<" + "div>someresult1.2<" + "/div>")
 </script>
 <div>someresult2.1</div>
 <script id="5">
 document.write("<" + "div>someresult3<" + "/div>");
 document.write("<" + "div>someresult4<" + "/div>");
 </script>
 <div>someresult3</div>
 <div>someresult4</div>

Solution

To simplify further explanation, we introduce some terminology:

Anonymous script: a JS script whose “src” attribute is absent or empty.
Downloadable script: a JS script that is loaded from a third-party server and therefore has a non-empty “src” attribute.
Output pointer: the element after which content generated by document.write should be inserted.
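
The distinction between the first two terms can be sketched as a small predicate. This is only an illustration: the name isDownloadable is hypothetical, and the sketch assumes its argument exposes the standard getAttribute method of a script element.

```javascript
// Hypothetical helper (not from the original article): returns true for a
// downloadable script (non-empty src) and false for an anonymous one
// (src missing or empty).
function isDownloadable(script) {
  const src = script.getAttribute('src');
  return typeof src === 'string' && src.trim() !== '';
}
```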

Inserting HTML code causes no problems, but inserting JS scripts reveals a number of pitfalls:

Synchronous scripts may contain document.write, which does not work in asynchronous mode.
Scripts may be downloadable, which makes it impossible for us to analyze their text.
Downloadable scripts can refer only to global objects, so any configuration of their environment can only be global.
Scripts can spawn other scripts. In particular, synchronous scripts can generate other synchronous scripts, which retain the ability to use document.write.

A fairly obvious way out is to replace the global document.write with a custom function that works in a similar way.

 document.write = function(html){ … };

Here everything is clear in general, except for one thing: where exactly should our function insert the code it produces?

JavaScript is single-threaded, which suggests the following solution for anonymous synchronous scripts: when the response arrives from the server, set a global output pointer for the container, which the replacement function will use. If all scripts are synchronous and anonymous, they are inserted and executed one after another, and as long as the output pointer is shifted correctly, the result is correct.

As a result, the entire processing cycle of the inserted code runs without breaks in the execution flow and raises no questions.
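
The idea of an output pointer that moves forward with every write can be sketched as follows. This is a minimal sketch under stated assumptions, not the article's implementation: createWriter and getPointer are hypothetical names, and parseHtml is an assumed callback that turns an HTML string into an array of DOM nodes (in a browser it could be built on a template element).

```javascript
// Hypothetical factory for a document.write replacement bound to one
// output pointer. `parseHtml` is an assumed callback: HTML string -> nodes.
function createWriter(initialPointer, parseHtml) {
  let pointer = initialPointer; // output pointer: new content goes after it

  return {
    write(html) {
      for (const node of parseHtml(html)) {
        // Insert right after the pointer, then advance the pointer so that
        // later output lands after this node rather than before it.
        pointer.parentNode.insertBefore(node, pointer.nextSibling);
        pointer = node;
      }
    },
    getPointer() {
      return pointer;
    },
  };
}
```

In a page, the replacement itself would then be something like document.write = function (html) { writer.write(html); }. Note that script elements produced by parsing an HTML string are not executed by the browser, so a real implementation has to re-create them; that part is left out of the sketch.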

Houston, we have a problem

Things are not so simple when we encounter a downloadable script. In that situation it seems logical to suspend all other scripts until the current downloadable script has loaded and finished executing. This scheme is guaranteed to work thanks to the onload event, but it is rather slow, so a better way is needed.

Such a solution exists, though it works only in browsers of the Internet Explorer family: the onreadystatechange event. It lets us wrap the downloadable script in a handler that moves the output pointer of our replaced document.write to the location of the script before it starts and, if necessary, restores the original output pointer after the script completes. Unfortunately, this route is closed in any browser other than IE, since nothing except Microsoft's brainchild supports an event that fires after a script has loaded but before it is executed.

Only one way remains: to make our replacement for document.write able to determine for itself which script called it. In most modern browsers (IE11, Firefox, Chrome, recent versions of Opera) this is possible for downloadable scripts, albeit with some reservations. Because such scripts execute in the global namespace, it is impossible to create a separate copy of the function for each loaded script. It would seem, then, that the place to insert the output of document.write can be determined only from its input parameter, the string.

That is so only at first glance. In fact, in all the browsers mentioned above it is possible to obtain the address from which the script that called our replaced document.write was loaded. The address is extracted from the call stack and is then used to identify the corresponding script.
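
One way to read the calling script's address from the stack is sketched below. It is an assumption-laden illustration: the stack property of Error objects is non-standard and its text format differs between browsers, so the regular expression here only covers the common Chrome-style and Firefox-style frames. The names FRAME_URL and callerScriptUrl, and the ownUrl parameter (the address of the file that contains the document.write replacement), are hypothetical.

```javascript
// Matches the URL part of one stack frame, for example:
//   "    at f (http://host/a.js:10:5)"   (Chrome style)
//   "f@http://host/a.js:10:5"            (Firefox style)
const FRAME_URL = /(https?:\/\/[^\s()]+?):\d+(?::\d+)?(?=\)?\s*$)/gm;

// Hypothetical helper: returns the URL of the first stack frame that does
// not belong to `ownUrl`, or null when the stack holds no such frame.
function callerScriptUrl(stack, ownUrl) {
  for (const match of String(stack).matchAll(FRAME_URL)) {
    if (match[1] !== ownUrl) return match[1];
  }
  return null;
}
```

Inside the replacement function the call would look like callerScriptUrl(new Error().stack, ownUrl); the returned address can then be compared with the src of the scripts that have been inserted.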

Another difficulty

It seems that everything is fine and we have found an excellent solution, but difficulties arise again. First, if there are several identical scripts, we have to ensure that they execute sequentially in a known order. Second, if a script contains several document.write calls, we still have to guarantee that each of them produces the right result: by default each call would write its data right after its own script, rather than after the last element created by the previous document.write call from that script.

It follows that, among other things, after each call the function must also store a reference to the last element created from that script.
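
Both requirements (a known order for identical scripts and a remembered last element per script) can be sketched as a small registry. This is a hypothetical structure for illustration only; createScriptRegistry and its method names are not from the original article.

```javascript
// Hypothetical bookkeeping for downloadable scripts, keyed by their src.
function createScriptRegistry() {
  const queues = new Map();          // src -> script elements, in insertion order
  const lastWritten = new WeakMap(); // script element -> last node it produced

  return {
    // Called when the replacement document.write inserts a downloadable script.
    register(src, script) {
      if (!queues.has(src)) queues.set(src, []);
      queues.get(src).push(script);
    },
    // The script currently running for this src is the oldest unfinished one.
    current(src) {
      const queue = queues.get(src);
      return queue && queue.length > 0 ? queue[0] : null;
    },
    // Where the next write from `script` should go: after its previous
    // output, or after the script itself on its first write.
    insertionPoint(script) {
      return lastWritten.get(script) || script;
    },
    // Remember the last node created on behalf of `script`.
    recordWrite(script, node) {
      lastWritten.set(script, node);
    },
    // Called from the script's onload handler.
    finish(src) {
      const queue = queues.get(src);
      if (queue) queue.shift();
    },
  };
}
```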

Final stage

One more scenario remains: anonymous and downloadable scripts mixed in one piece of code. Under normal conditions, without any substitution, document.write can be used only in the synchronous flow. So for the inserted code to produce the same result as ordinary sequential execution, we need to ensure that all the scripts are loaded and executed in turn.

For anonymous scripts this obviously happens by itself. For downloadable scripts, we have to interrupt our document.write replacement while waiting for the load and resume it on the onload event.
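
The interrupt-and-resume behaviour can be sketched with a simple sequencer. It is a minimal sketch, assuming each item is either an anonymous script with a run function or a downloadable script with a src; runInOrder, the item shape, and the loadScript(src, onload) callback are all hypothetical names introduced for the illustration.

```javascript
// Hypothetical sequencer. `loadScript(src, onload)` is an assumed callback
// that starts loading a script and calls `onload` once it has loaded and run.
function runInOrder(items, loadScript, done) {
  let index = 0;

  function next() {
    while (index < items.length) {
      const item = items[index++];
      if (item.src) {
        // Downloadable script: stop here and resume from its onload handler.
        loadScript(item.src, next);
        return;
      }
      // Anonymous script: runs synchronously, so just keep going.
      item.run();
    }
    if (done) done();
  }

  next();
}
```

In a browser, loadScript would create a script element, set its onload handler to the supplied callback, and insert the element at the current output pointer.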

Let's walk through the example from the beginning of the article to understand the sequence of actions.

It is logical to use our own document.write replacement as the means of inserting the code, since by this point it already knows where to insert the results. This gives the following order of execution:

1. The inserted script (script 1) calls document.write to create script 2, which will create the first two test divs.
2. Script 2 calls document.write to create someresult1 and someresult2.
3. Script 2 finishes and control returns to the original document.write call. Because the substitution is global, the output pointer now points at the newly created someresult2, so script 1 continues creating elements after it. Script 3 is created next and, since it is downloadable, execution of document.write is interrupted until the onload of script 3 fires. Before that, document.write checks all other scripts for the same loading path and marks them.
4. Script 3 loads and calls document.write, which determines its output pointer by one of the methods described above (depending on the browser). In IE, the output pointer is set after the code is loaded but before it is executed; in modern browsers, it is found from the stack at the moment document.write is called; in the rest, the insertion point is known because the execution order of the scripts is predictable (blocking). document.write inserts someresult1.1 and records it as the output pointer for script 3.
5. Script 3 calls document.write again. It identifies the calling script and, following the mark left by the previous call, shifts the output pointer, then creates the loaded script and someresult1.2. Execution is interrupted until the loaded script has loaded and run.
6. The loaded script loads and calls document.write, which determines the output pointer and creates someresult2.1.
7. The onload event of the loaded script fires, returning control to the document.write processing code of script 3, which in turn finishes and triggers the onload event of script 3, returning control to script 1.
8. Script 1 creates script 4. Because document.write is global, by the time control returns the output pointer has been updated to account for everything the function has inserted, so script 4 appears at the end of the code created so far. Execution of document.write is interrupted, with a preliminary mark that the not-yet-created script 3 is being executed.
9. For script 4, the whole procedure already described for script 3 (steps 4–8) is repeated.
10. Control returns to script 1, which creates script 5.
11. Script 5 calls document.write to create someresult3 and someresult4.
12. Control returns to script 1.
13. Script 1 finishes.

At a quick glance nothing in this sequence looks difficult, but remember that the execution flow was interrupted 6 times:

On the loading of scripts 3 and 4 (seemingly instantaneous, but formally this is still a gap, and something can wedge into it).
Inside scripts 3 and 4 (in the example there is no gap between their calls, but there may well be one, because this is a downloadable script whose structure is unknown in general).
On the two loads of the nested loaded script; the second, although formal, still leaves a gap in execution.

The main trick is precisely that the correct output pointer is used at every call to document.write.

Conclusion

Now consider the final addition, designed for inserting n codes simultaneously. In principle, the algorithm described has no obvious obstacles to running in parallel: the only caveat is that each container must have its own structures storing script chains and the current output pointer. So what we substitute for document.write is not just a function but a dispatcher, which prepares the context and only then calls our analogue of document.write.

Accordingly, there are two implementation schemes to choose from: either our document.write analogue is an object and the dispatcher manages n instances of it, or we store an array of n contexts and the dispatcher simply points the analogue at the current context.
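
The second scheme (an array of contexts plus a pointer to the current one) might look roughly like this. It is a sketch under assumptions: createDispatcher and activate are hypothetical names, and each context is assumed to expose a write(html) method that inserts at that container's own output pointer.

```javascript
// Hypothetical dispatcher: one shared entry point and an array of
// per-container contexts, each assumed to have its own write(html).
function createDispatcher(contexts) {
  let current = null;

  return {
    // Select the container whose script is about to run.
    activate(index) {
      if (index < 0 || index >= contexts.length) {
        throw new RangeError('unknown container ' + index);
      }
      current = contexts[index];
    },
    // This is the function that would be assigned to document.write.
    write(html) {
      if (current === null) throw new Error('no active container');
      current.write(html);
    },
  };
}
```

The first scheme differs mainly in where the state lives: each context would be a full writer object, and the dispatcher would forward the call to the selected instance in the same way.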

Thus, if there are two containers into which we are inserting the example code, the order of execution will be almost the same, except that the second container will wedge in at the break points, causing a switch of context or working object. For example, after step 3 for the first container, steps 1 and 2 of the second container follow. At step 3, the algorithm should detect that a script with exactly the same src is already being loaded, and pause to wait for it to execute. The first container then runs through step 5 inclusive, after which it returns control to the waiting second container, which continues from step 3.

After that, either the first or the second container continues to its next break point (depending on which finishes earlier). The rest of the sequence has already been covered and brings nothing new.
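
The waiting on a shared src described above can be sketched as a per-src lock. This is an illustration only: createSrcLock, acquire, and release are hypothetical names, and the resume callback is assumed to restart the container that had to wait.

```javascript
// Hypothetical per-src lock shared by all containers.
function createSrcLock() {
  const waiting = new Map(); // src -> callbacks of containers waiting for it

  return {
    // Returns true if the caller may load `src` now; otherwise queues `resume`.
    acquire(src, resume) {
      if (!waiting.has(src)) {
        waiting.set(src, []);
        return true;
      }
      waiting.get(src).push(resume);
      return false;
    },
    // Called when the script with this src has finished. Wakes the next
    // waiter, which then owns the lock and must call release in its turn.
    release(src) {
      const queue = waiting.get(src);
      if (!queue) return;
      if (queue.length === 0) {
        waiting.delete(src);
      } else {
        queue.shift()();
      }
    },
  };
}
```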

That's all for today! Thank you all for your attention, we will be happy to answer questions in the comments.

Source: https://habr.com/ru/post/219713/
