Continuing the theme of client optimization and merging resources.
As was shown earlier, it is advantageous to divide resources into two groups: “core” (loaded on all pages) and “page resources” (loaded only on pages that use them). The problem is choosing which resources to include in the core: a core that is too large can significantly increase the initial load time of the site.
There are three main types of page access:
1. initial load (both the core and the resources of the selected page are loaded);
2. loading a new page (the core is in the cache, only the resources of the selected page are loaded);
3. loading a previously visited page (both the core and the page resources are in the cache).
In the third case, we cannot influence the download speed. In the other two, the goals conflict: to speed up the initial load it is advantageous to shrink the core, but this moves resources into the page resources, so optimizing case (1) worsens the load time in case (2), and vice versa. What to do?
We use the following notation:
- n_p - the number of pages;
- n_r - the number of resources;
- l - the network delay;
- 1_r - a column vector of length n_r consisting of ones;
- 1_p - a row vector of length n_p consisting of ones;
- d - data transfer time;
- a - a column vector of length n_r whose i-th element corresponds to the i-th resource: a_i = 0 if the resource is included in the core, and a_i = 1 if it is a page resource;
- B - the matrix of resource usage by pages. Rows correspond to pages and columns to resources: b_{ij} = 1 if page i uses resource j, and 0 otherwise;
- S - the diagonal matrix of resource weights (s_{ii} is the weight of resource i);
- p_1, p_2 - row vectors of page selection probabilities in the first and second cases.
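To make the notation concrete, here is a minimal Python sketch for a made-up site with three pages and four resources. The sizes, the usage matrix, and the helper names `core_size` and `page_sizes` are illustrative assumptions, not something taken from a real site.

```python
# Toy data, invented for illustration only.
sizes = [40, 10, 25, 5]  # diagonal of S: weight of each resource, e.g. in KB
B = [
    [1, 1, 0, 0],  # page 0 uses resources 0 and 1
    [1, 0, 1, 0],  # page 1 uses resources 0 and 2
    [1, 0, 0, 1],  # page 2 uses resources 0 and 3
]
a = [0, 1, 1, 1]  # resource 0 goes to the core, the rest are page resources


def core_size(sizes, a):
    """Total weight of the resources placed in the core (a_i = 0)."""
    return sum(s * (1 - ai) for s, ai in zip(sizes, a))


def page_sizes(B, sizes, a):
    """Weight of the page resources (a_j = 1) needed by each page."""
    return [sum(b * s * ai for b, s, ai in zip(row, sizes, a)) for row in B]


print(core_size(sizes, a))      # 40
print(page_sizes(B, sizes, a))  # [10, 25, 5]
```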
Denote the ratio of the number of cases (1) to the total number of cases (1) and (2) as p_r (this value can be calculated from the web server logs, or estimated as 1 / n_p).
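One possible way to obtain p_r, together with the row vectors p_1 and p_2 from the notation list, is to count classified visits. The sketch below assumes the log has already been reduced to (page index, case number) pairs; that record format and the function name `estimate_probabilities` are hypothetical.

```python
from collections import Counter


def estimate_probabilities(visits, n_pages):
    """Estimate p_r, p_1 and p_2 from a list of (page_index, case) pairs.

    `case` is 1, 2 or 3, matching the three types of page access.
    """
    first = Counter(page for page, case in visits if case == 1)
    second = Counter(page for page, case in visits if case == 2)
    n1, n2 = sum(first.values()), sum(second.values())
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one visit for each of cases (1) and (2)")
    p_r = n1 / (n1 + n2)  # visits of case (3) are ignored
    p_1 = [first[i] / n1 for i in range(n_pages)]
    p_2 = [second[i] / n2 for i in range(n_pages)]
    return p_r, p_1, p_2


# Invented sample: 2 initial loads, 6 new-page loads, 1 fully cached visit.
visits = [(0, 1), (1, 1), (0, 2), (0, 2), (1, 2), (2, 2), (2, 2), (2, 2), (1, 3)]
p_r, p_1, p_2 = estimate_probabilities(visits, n_pages=3)
print(p_r)  # 0.25
print(p_1)  # [0.5, 0.5, 0.0]
```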
Denote the probability of choosing page i and its loading time as p_{1i} and t_{1i} in the first case, and as p_{2i} and t_{2i} in the second (p_{1i} and p_{2i} can also be calculated from the web server logs). The expected resource load time of an arbitrary page is then:
M = \sum_i (p_r t_{1i} p_{1i} + (1 - p_r) t_{2i} p_{2i}).
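As a quick illustration of this expectation, the following sketch evaluates M for invented probabilities and load times (in milliseconds). The function name `expected_load_time` and all numbers are assumptions made for the example.

```python
def expected_load_time(p_r, p_1, p_2, t_1, t_2):
    """M = sum_i (p_r * t_1i * p_1i + (1 - p_r) * t_2i * p_2i)."""
    return sum(
        p_r * t1 * p1 + (1 - p_r) * t2 * p2
        for p1, p2, t1, t2 in zip(p_1, p_2, t_1, t_2)
    )


M = expected_load_time(
    p_r=0.25,
    p_1=[0.5, 0.25, 0.25],
    p_2=[0.25, 0.25, 0.5],
    t_1=[400, 480, 360],  # case (1): core plus page resources
    t_2=[120, 200, 80],   # case (2): page resources only
)
print(M)  # 192.5
```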
In our case, t consists of two parts: the network delay l (in which we include the [almost] fixed time costs, such as the time to establish a connection, send an HTTP request and receive the response headers) and the data transfer time d. Splitting resources into the core (index c) and page resources (index p), we get
t_{1i} = l_{ci} + l_{pi} + d_{ci} + d_{pi},
t_{2i} = l_{pi} + d_{pi}.
Since core resources are shared across all pages,
l_{ci} = l_c = const,
d_{ci} = d_c (does not depend on the page).
In addition, l_{pi} = l_p = const, since all resources of a page are merged together. We get that
M = \sum_i (p_r p_{1i} (l_c + l_p + d_c + d_{pi}) + (1 - p_r) p_{2i} (l_p + d_{pi})) =
= p_r (l_c + l_p) + (1 - p_r) l_p +
+ p_r d_c +
+ \sum_i (p_r p_{1i} d_{pi} + (1 - p_r) p_{2i} d_{pi})
(here we used \sum_i p_{1i} = \sum_i p_{2i} = 1).
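The expansion of M can be checked numerically. The sketch below compares the direct sum with the expanded form on invented latencies and transfer times; every number is an assumption chosen only for the check.

```python
import math

p_r = 0.25
p_1 = [0.5, 0.25, 0.25]
p_2 = [0.25, 0.25, 0.5]
l_c, l_p = 100, 100   # network delays of the core and page requests, ms
d_c = 160             # transfer time of the core, ms
d_p = [40, 100, 20]   # transfer time of each page's resources, ms

# Direct form: sum over pages of the two weighted load times.
direct = sum(
    p_r * p1 * (l_c + l_p + d_c + dp) + (1 - p_r) * p2 * (l_p + dp)
    for p1, p2, dp in zip(p_1, p_2, d_p)
)

# Expanded form: constant delay terms, core term, page-dependent term.
expanded = (
    p_r * (l_c + l_p) + (1 - p_r) * l_p
    + p_r * d_c
    + sum((p_r * p1 + (1 - p_r) * p2) * dp for p1, p2, dp in zip(p_1, p_2, d_p))
)

print(math.isclose(direct, expanded))  # True
```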
We cannot change l, so we have to minimize the expression
p_r d_c + \sum_i (p_r p_{1i} d_{pi} + (1 - p_r) p_{2i} d_{pi})
The value of d is obviously proportional to the resource size s (fortunately, resources do not affect each other's download speed), so up to a constant factor we minimize:
p_r s_c + \sum_i (p_r p_{1i} s_{pi} + (1 - p_r) p_{2i} s_{pi})
Writing the expression in matrix form, we get:
p_r 1_r^T S (1_r - a) + p_r p_1 B S a + (1 - p_r) p_2 B S a =
= p_r 1_r^T S 1_r + (p_r p_1 B + (1 - p_r) p_2 B - p_r 1_r^T) S a
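To see that the two sides of the matrix identity agree, one can evaluate both for every possible vector a on a small example. This is a plain-Python sketch with invented data (three pages, five resources); the helper names are assumptions.

```python
import math
from itertools import product

p_r = 0.5
p_1 = [0.5, 0.25, 0.25]
p_2 = [0.25, 0.25, 0.5]
sizes = [40, 10, 25, 5, 20]  # diagonal of S
B = [
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1],
]
n_r = len(sizes)


def row_times_matrix(row, matrix):
    """Row vector times matrix, e.g. p_1 B."""
    return [sum(r * m[j] for r, m in zip(row, matrix)) for j in range(len(matrix[0]))]


p1B = row_times_matrix(p_1, B)
p2B = row_times_matrix(p_2, B)


def left_side(a):
    core = sum(s * (1 - ai) for s, ai in zip(sizes, a))          # 1_r^T S (1_r - a)
    first = sum(q * s * ai for q, s, ai in zip(p1B, sizes, a))   # p_1 B S a
    second = sum(q * s * ai for q, s, ai in zip(p2B, sizes, a))  # p_2 B S a
    return p_r * core + p_r * first + (1 - p_r) * second


def right_side(a):
    constant = p_r * sum(sizes)                                  # p_r 1_r^T S 1_r
    c = [(p_r * q1 + (1 - p_r) * q2 - p_r) * s for q1, q2, s in zip(p1B, p2B, sizes)]
    return constant + sum(cj * aj for cj, aj in zip(c, a))


identity_holds = all(
    math.isclose(left_side(a), right_side(a)) for a in product((0, 1), repeat=n_r)
)
print(identity_holds)  # True
```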
Obviously, the term p_r 1_r^T S 1_r is constant. Thus, the final expression that we must minimize is:
(p_r p_1 B + (1 - p_r) p_2 B - p_r 1_r^T) S a = c a
The value c = (p_r p_1 B + (1 - p_r) p_2 B - p_r 1_r^T) S can be computed directly, since it contains no unknowns; the result is a row vector. Since a by definition can contain only zeros and ones, the task becomes trivial: a_i = 1 if c_i <= 0, and a_i = 0 otherwise.
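The whole procedure fits in a few lines of code. The following is a minimal sketch, assuming the probabilities, the matrix B and the resource sizes are already known; the function names and the toy data are invented, and the brute-force search is there only to confirm the result on this small example.

```python
import math
from itertools import product


def choose_split(p_r, p_1, p_2, B, sizes):
    """Return (c, a): the row vector c and the 0/1 vector a minimizing c a."""
    c = []
    for j in range(len(sizes)):
        q1 = sum(p * row[j] for p, row in zip(p_1, B))  # (p_1 B)_j
        q2 = sum(p * row[j] for p, row in zip(p_2, B))  # (p_2 B)_j
        c.append((p_r * q1 + (1 - p_r) * q2 - p_r) * sizes[j])
    a = [1 if cj <= 0 else 0 for cj in c]  # 1 = page resource, 0 = core
    return c, a


def cost(a, p_r, p_1, p_2, B, sizes):
    """p_r * core size + expected size of the page resources."""
    core = sum(s * (1 - ai) for s, ai in zip(sizes, a))
    pages = [sum(b * s * ai for b, s, ai in zip(row, sizes, a)) for row in B]
    return p_r * core + sum(
        (p_r * p1 + (1 - p_r) * p2) * pg for p1, p2, pg in zip(p_1, p_2, pages)
    )


# Toy data, invented for illustration.
p_r = 0.5
p_1 = [0.5, 0.25, 0.25]
p_2 = [0.25, 0.25, 0.5]
sizes = [40, 10, 25, 5, 20]
B = [
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1],
]

c, a = choose_split(p_r, p_1, p_2, B, sizes)
print(c)  # [20.0, -1.25, -6.25, -0.625, 5.0]
print(a)  # [0, 1, 1, 1, 0]

# Exhaustive check over all 2^5 splits; feasible only for tiny examples.
best = min(cost(x, p_r, p_1, p_2, B, sizes) for x in product((0, 1), repeat=len(sizes)))
print(math.isclose(cost(a, p_r, p_1, p_2, B, sizes), best))  # True
```

In this example the resource shared by all pages and the one shared by two popular pages stay in the core, while the three single-page resources become page resources: a resource ends up in the core exactly when its probability-weighted usage, p_r (p_1 B)_j + (1 - p_r) (p_2 B)_j, exceeds p_r.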