
Statistics on the use of JavaScript libraries and CDNs

Have you ever thought about questions like these:


I have.
And I didn't just think about them; I did a little research.
I also wrote a small extension for Chromium which may make life better, or may break the Internet.
The results are below.


Conclusions for the lazy, or TL;DR


  1. About 10% of the 300,000 most popular sites use WordPress.
  2. More and more popular sites that use jQuery load it from a CDN, and their number grows every year.
  3. The most popular jQuery versions in the world are 1.7.x, 1.8.x, 1.9.1, and 1.10.2.
  4. jQuery 1.7.x leads by a wide margin: roughly every fourth jQuery include is version 1.7.1 or 1.7.2.
  5. Google, jQuery (code.jquery.com), and Cloudflare are the most popular CDNs.
  6. Of the sites that load anything from Google CDN, 89% load jQuery.


How it all began, or the prelude



I started wondering: why don't browsers ship popular JS libraries as part of their distributions? A CDN is very good: one URL per resource, caching, and so on. But it is even better not to download static files at all and have them already in the browser.

In response to this injustice of fate, I built this prototype extension, which is designed to speed up the Internet.

But you can't just put forward a couple of hypotheses, push out a prototype, and rest on your laurels: the brain demands evidence, facts, and some fun (yes, that is how I see interesting research, although there was little fun in preparing the data).

Why investigate?


So, I had a few ideas:


These ideas needed confirmation. And while implementing the extension, I ran into additional questions:


What are we researching?


Initially, I wanted to use the Common Crawl dataset. But since this beast weighs 81 TB, and given the time and money its analysis would take, I left it alone.

A little later, I came across a wonderful article whose author had explored the Internet on exactly the topic I needed.
The article did not contain the answers I was after, but it did point me to the right tools!

Study


To get my answers, I used httparchive (HTTP Archive). It is a crawler dataset built by polling the top 300,000 sites ranked by Alexa, i.e., a huge sample of the most popular sites on the Internet.

I downloaded the freshest dataset: the crawl results for March 1, 2014.
Below are the results of the study and the queries I used to obtain them.
You can compare my results with those obtained a year earlier.

Number of sites loading jQuery from Google CDN

SELECT "jquery" AS name, count(distinct(pageid)) AS count, (100*count(distinct(pageid))/290835) AS percent
FROM requests
WHERE pageid <= 14802750 AND pageid >= 14489007
  AND url LIKE "%//ajax.googleapis.com/ajax/libs/jquery/%"


Name      Count    %
jquery    59977    20.6223
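The query counts distinct pageid values, so a page that requests jQuery from Google CDN several times is counted once. Below is a minimal sketch of the same logic on an in-memory SQLite table. It assumes a `requests` table with only `pageid` and `url` columns, and the rows are invented, not httparchive data. It also swaps the hard-coded denominator 290835 for the number of pages in the toy table, and uses single-quoted literals because SQLite may reject double-quoted strings.

```python
import sqlite3

# Toy stand-in for the httparchive `requests` table (invented rows, not real data).
rows = [
    (1, "https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"),
    (1, "https://ajax.googleapis.com/ajax/libs/jqueryui/1.10.3/jquery-ui.min.js"),
    (2, "//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"),
    (2, "//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.js"),  # same page again
    (3, "https://code.jquery.com/jquery-1.10.2.min.js"),            # not Google CDN
    (4, "https://example.com/app.js"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (pageid INTEGER, url TEXT)")
conn.executemany("INSERT INTO requests VALUES (?, ?)", rows)

# Denominator: all pages in the toy table (the article hard-codes 290835).
total_pages = conn.execute("SELECT COUNT(DISTINCT pageid) FROM requests").fetchone()[0]

# Same shape as the article's query: distinct pages whose URL contains the jQuery path.
count, percent = conn.execute(
    """
    SELECT COUNT(DISTINCT pageid),
           100.0 * COUNT(DISTINCT pageid) / ?
    FROM requests
    WHERE url LIKE '%//ajax.googleapis.com/ajax/libs/jquery/%'
    """,
    (total_pages,),
).fetchone()

print(count, percent)  # 2 50.0
```

Counting matching rows instead of distinct pages would report 3 here, because page 2 loads jQuery twice. The `jqueryui/` URL does not match, since the pattern requires a slash right after `jquery`.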

Every year, more and more sites load jQuery from a CDN. Progress does not stand still, and people are recognizing how good this approach is.

Popularity of different versions of jQuery from Google CDN


Here I modified the original query. My goal is to measure the share of each jQuery version among all sites that load jQuery from Google CDN. The articles by other authors have small problems that make their results less clear:


select
  -- "/libs/jquery/" is 13 characters long, so "+ 13" points at the first character of the version
  SUBSTRING(
    url
    FROM POSITION("/libs/jquery/" IN url) + 13
    FOR LOCATE("/jquery", url, POSITION("/libs/jquery/" IN url) + 13) - (POSITION("/libs/jquery/" IN url) + 13)
  ) as version,
  count(distinct(pageid)) as count,
  (100*count(distinct(pageid))/59977) as percent
from requests
where pageid >= 14489007 and pageid <= 14802750
  and url LIKE "%//ajax.googleapis.com/ajax/libs/jquery/%.min.js"
group by version
order by count desc;


Version   Number of inclusions   %
1.7.2     8938                   14.9024
1.7.1     6842                   11.4077
1.8.3     5670                   9.4536
1.9.1     5533                   9.2252
1.10.2    5244                   8.7434
1.8.2     3832                   6.3891
1.4.2     3673                   6.1240
1.3.2     2519                   4.1999
1.5.2     2297                   3.8298
1.6.4     1987                   3.3129
1.4.4     1985                   3.3096
1.6.2     1644                   2.7411
1.6.1     1395                   2.3259
1.5.1     1160                   1.9341
1.9.0     964                    1.6073
1.8.1     880                    1.4672
1.10.1    868                    1.4472
1.8.0     803                    1.3388
2.0.3     508                    0.8470
1.2.6     449                    0.7486
1.7.0     403                    0.6719
1.4.1     382                    0.6369
1.11.0    363                    0.6052
1.4.3     357                    0.5952
2.0.0     246                    0.4102
1.6.0     204                    0.3401
1.6.3     193                    0.3218
1.3.1     112                    0.1867
1.5.0     104                    0.1734
1.4.0     83                     0.1384
1.10.0    79                     0.1317
2.0.2     74                     0.1234
2.1.0     68                     0.1134
1.3.0     42                     0.0700
2.0.1     19                     0.0317
1.2.3     13                     0.0217
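The SUBSTRING/POSITION/LOCATE expression is easier to follow when written out step by step. Below is a minimal Python sketch of the same extraction and grouping. The function names `jquery_version` and `version_shares` are hypothetical, and the sample URLs are invented. Python's `str.find` is 0-based where MySQL's POSITION is 1-based, and the substring test is case-sensitive, unlike MySQL's default LIKE collation. `total_sites` plays the role of the hard-coded 59977.

```python
from collections import defaultdict
from typing import Optional

MARKER = "/libs/jquery/"  # 13 characters, hence the "+ 13" in the SQL


def jquery_version(url: str) -> Optional[str]:
    """Return the text between "/libs/jquery/" and the next "/jquery", or None."""
    start = url.find(MARKER)             # POSITION("/libs/jquery/" IN url), but 0-based
    if start == -1:
        return None
    start += len(MARKER)                 # first character of the version string
    end = url.find("/jquery", start)     # LOCATE("/jquery", url, start)
    if end == -1:
        return None
    return url[start:end]                # SUBSTRING(url FROM start FOR end - start)


def version_shares(requests, total_sites):
    """Group by version, counting distinct pages per version, sorted by count."""
    pages_by_version = defaultdict(set)
    for pageid, url in requests:
        # Rough equivalent of LIKE "%//ajax.googleapis.com/ajax/libs/jquery/%.min.js"
        if "//ajax.googleapis.com/ajax/libs/jquery/" in url and url.endswith(".min.js"):
            version = jquery_version(url)
            if version is not None:
                pages_by_version[version].add(pageid)
    rows = [(v, len(p), 100 * len(p) / total_sites) for v, p in pages_by_version.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


requests = [
    (1, "https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"),
    (2, "https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"),
    (3, "//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"),
    (3, "//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"),  # repeated on one page
    (4, "//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.js"),        # not minified: skipped
]
for row in version_shares(requests, total_sites=4):
    print(row)
```

As in the article's query, page 4 counts toward the denominator but not toward any version, because only `.min.js` URLs are classified.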

There is an interesting trend in the jQuery world: version 1.7.x leads by a wide margin year after year.
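The "every fourth include" figure can be checked against the table with a little arithmetic. The sketch below sums per-version page counts for a major.minor branch, using the counts copied from the table and the article's denominator of 59977. The helper `branch_share` is hypothetical. Summing per-version counts assumes that no page loads two different builds from the same branch, so the branch totals are upper bounds.

```python
# Counts copied from the version table (Google CDN, minified builds only).
counts = {
    "1.7.2": 8938, "1.7.1": 6842, "1.7.0": 403,
    "1.8.3": 5670, "1.8.2": 3832, "1.8.1": 880, "1.8.0": 803,
}
TOTAL_JQUERY_SITES = 59977  # denominator used by the article's query


def branch_share(branch):
    """Sum page counts for versions whose major.minor prefix equals `branch`."""
    n = sum(c for v, c in counts.items() if v.rsplit(".", 1)[0] == branch)
    return n, 100 * n / TOTAL_JQUERY_SITES


n_171_172 = counts["1.7.1"] + counts["1.7.2"]
print(round(100 * n_171_172 / TOTAL_JQUERY_SITES, 2))  # 26.31
for branch in ("1.7", "1.8"):
    n, pct = branch_share(branch)
    print(branch, n, round(pct, 2))
```

1.7.1 and 1.7.2 alone account for about 26.3% of jQuery-loading sites, which matches the TL;DR's "every fourth".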

Most popular CDNs distributing JS libraries

Parameter                              Number   % of all sites
Sites loading libraries from a CDN     78160    26.8743

select "Google" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/78160) as percent
  from requests where pageid >= 14489007 and pageid <= 14802750 and url LIKE "%//ajax.googleapis.com/ajax/libs/%"
UNION
select "Yandex" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/78160) as percent
  from requests where pageid >= 14489007 and pageid <= 14802750 and url LIKE "%//yandex.st/%"
UNION
select "Microsoft" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/78160) as percent
  from requests where pageid >= 14489007 and pageid <= 14802750 and url LIKE "%//ajax.aspnetcdn.com/ajax/%"
UNION
select "JsDelivr" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/78160) as percent
  from requests where pageid >= 14489007 and pageid <= 14802750 and url LIKE "%//cdn.jsdelivr.net/%"
UNION
select "Cloudflare" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/78160) as percent
  from requests where pageid >= 14489007 and pageid <= 14802750 and url LIKE "%//cdnjs.cloudflare.com/ajax/libs/%"
UNION
select "jQuery" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/78160) as percent
  from requests where pageid >= 14489007 and pageid <= 14802750 and url LIKE "%//code.jquery.com/%"
group by name order by count desc;


CDN          Count    Percent
Google       67671    86.5801
jQuery       9222     11.7989
Cloudflare   3996     5.1126
Yandex       2379     3.0438
Microsoft    1300     1.6633
Jsdelivr     324      0.4145

As we can see, the lion's share of sites load their libraries from Google CDN. The percentages add up to more than 100% because some sites use more than one CDN.
Now let's look at what sites load from Google CDN. It is interesting, though the result is predictable.
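Because each CDN is counted as a set of distinct pages, a page that uses two CDNs appears in both rows. Below is a minimal sketch of that classification, reusing the URL substrings from the query's LIKE patterns. It assumes the denominator (78160 in the article) is the number of pages that use at least one of these CDNs. The rows are invented, and `cdn_shares` is a hypothetical helper.

```python
from collections import defaultdict

# URL substrings copied from the article's LIKE patterns.
CDN_PATTERNS = {
    "Google": "//ajax.googleapis.com/ajax/libs/",
    "jQuery": "//code.jquery.com/",
    "Cloudflare": "//cdnjs.cloudflare.com/ajax/libs/",
    "Yandex": "//yandex.st/",
    "Microsoft": "//ajax.aspnetcdn.com/ajax/",
    "JsDelivr": "//cdn.jsdelivr.net/",
}


def cdn_shares(requests):
    """Percent of CDN-using pages that load something from each CDN."""
    pages_by_cdn = defaultdict(set)
    for pageid, url in requests:
        for name, needle in CDN_PATTERNS.items():
            if needle in url:
                pages_by_cdn[name].add(pageid)
    # Assumption: denominator = pages that use at least one of the listed CDNs.
    cdn_pages = set().union(*pages_by_cdn.values())
    if not cdn_pages:
        return {}
    return {name: 100 * len(pages) / len(cdn_pages) for name, pages in pages_by_cdn.items()}


requests = [
    (1, "https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"),
    (1, "https://code.jquery.com/jquery-migrate-1.2.1.min.js"),
    (2, "//ajax.googleapis.com/ajax/libs/jqueryui/1.10.3/jquery-ui.min.js"),
    (3, "https://ajax.googleapis.com/ajax/libs/swfobject/2.2/swfobject.js"),
    (4, "https://cdnjs.cloudflare.com/ajax/libs/underscore.js/1.6.0/underscore-min.js"),
    (5, "https://example.com/app.js"),  # no CDN at all, so not in the denominator
]
shares = cdn_shares(requests)
print(shares)
print(sum(shares.values()))  # 125.0: page 1 is counted under both Google and jQuery
```
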

Profile of scripts loaded from Google CDN


select "jquery" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/67198) as percent
  from requests WHERE pageid <= 14802750 AND pageid >= 14489007 and url like "%//ajax.googleapis.com/ajax/libs/jquery/%"
UNION
select "jquerymobile" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/67198) as percent
  from requests WHERE pageid <= 14802750 AND pageid >= 14489007 and url like "%//ajax.googleapis.com/ajax/libs/jquerymobile/%"
UNION
select "angularjs" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/67198) as percent
  from requests WHERE pageid <= 14802750 AND pageid >= 14489007 and url like "%//ajax.googleapis.com/ajax/libs/angularjs/%"
UNION
select "chrome-frame" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/67198) as percent
  from requests WHERE pageid <= 14802750 AND pageid >= 14489007 and url like "%//ajax.googleapis.com/ajax/libs/chrome-frame/%"
UNION
select "dojo" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/67198) as percent
  from requests WHERE pageid <= 14802750 AND pageid >= 14489007 and url like "%//ajax.googleapis.com/ajax/libs/dojo/%"
UNION
select "ext-core" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/67198) as percent
  from requests WHERE pageid <= 14802750 AND pageid >= 14489007 and url like "%//ajax.googleapis.com/ajax/libs/ext-core/%"
UNION
select "jqueryui" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/67198) as percent
  from requests WHERE pageid <= 14802750 AND pageid >= 14489007 and url like "%//ajax.googleapis.com/ajax/libs/jqueryui/%"
UNION
select "mootools" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/67198) as percent
  from requests WHERE pageid <= 14802750 AND pageid >= 14489007 and url like "%//ajax.googleapis.com/ajax/libs/mootools/%"
UNION
select "prototype" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/67198) as percent
  from requests WHERE pageid <= 14802750 AND pageid >= 14489007 and url like "%//ajax.googleapis.com/ajax/libs/prototype/%"
UNION
select "scriptaculous" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/67198) as percent
  from requests WHERE pageid <= 14802750 AND pageid >= 14489007 and url like "%//ajax.googleapis.com/ajax/libs/scriptaculous/%"
UNION
select "swfobject" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/67198) as percent
  from requests WHERE pageid <= 14802750 AND pageid >= 14489007 and url like "%//ajax.googleapis.com/ajax/libs/swfobject/%"
UNION
select "webfontloader" as name, count(distinct(pageid)) as count, (100*count(distinct(pageid))/67198) as percent
  from requests WHERE pageid <= 14802750 AND pageid >= 14489007 and url like "%//ajax.googleapis.com/ajax/libs/webfont/%"
order by count desc;


Script          Count    Percent
jquery          59977    89.2541
jqueryui        12437    18.5080
webfontloader   4624     6.8812
swfobject       2347     3.4927
prototype       993      1.4777
scriptaculous   787      1.1712
mootools        445      0.6622
angularjs       353      0.5253
dojo            186      0.2768
chrome-frame    75       0.1116
ext-core        16       0.0238
jquerymobile    1        0.0015

jQuery really is the most popular script, ahead of the other libraries by an order of magnitude!
Notice the intriguing result? jQuery Mobile is loaded by only one site!
This is not a mistake; I checked three times :)
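Instead of one UNION branch per library, the library name can be read from the first path segment after `/ajax/libs/`. This also keeps `jquery/` separate from `jqueryui/` and `jquerymobile/`, which the LIKE patterns achieve with the trailing slash. Below is a minimal sketch under these assumptions. The helpers `google_library` and `library_profile` are hypothetical, and the rows are invented. Unlike the article's query, which lists twelve libraries explicitly, this version reports every library name it finds. The article hard-codes 67198 as the denominator; here the denominator is simply the number of toy pages that load anything from Google CDN.

```python
from collections import defaultdict
from typing import Optional

PREFIX = "//ajax.googleapis.com/ajax/libs/"
# The article reports the webfont/ path under the name "webfontloader".
ALIASES = {"webfont": "webfontloader"}


def google_library(url: str) -> Optional[str]:
    """Library name = first path segment after /ajax/libs/, or None."""
    i = url.find(PREFIX)
    if i == -1:
        return None
    name, sep, _ = url[i + len(PREFIX):].partition("/")
    if not sep or not name:
        return None
    return ALIASES.get(name, name)


def library_profile(requests):
    """Percent of Google-CDN pages that load each library (distinct pages)."""
    pages_by_lib = defaultdict(set)
    for pageid, url in requests:
        lib = google_library(url)
        if lib is not None:
            pages_by_lib[lib].add(pageid)
    google_pages = set().union(*pages_by_lib.values())
    if not google_pages:
        return {}
    return {lib: 100 * len(p) / len(google_pages) for lib, p in pages_by_lib.items()}


requests = [
    (1, "https://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js"),
    (1, "https://ajax.googleapis.com/ajax/libs/jqueryui/1.10.3/jquery-ui.min.js"),
    (2, "//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"),
    (2, "//ajax.googleapis.com/ajax/libs/webfont/1.5.0/webfont.js"),
    (3, "https://ajax.googleapis.com/ajax/libs/swfobject/2.2/swfobject.js"),
    (4, "https://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"),
]
print(library_profile(requests))
```
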

Approximate WordPress impact

While analyzing the data, I noticed a recurring pattern that introduces noise into the results: a mysterious ?ver=xxx parameter on requests for static files.
As it turned out, this is mostly WordPress's doing: it appends a version parameter to static resources.
There are a few more characteristic patterns as well: some sites add cache busting to all resources, including static files served from a CDN.
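Here is one concrete way such parameters distort the numbers. The version query's pattern ends in `.min.js`, so a URL like `.../jquery.min.js?ver=...` does not match it. Below is a minimal sketch of normalizing URLs before classification by dropping the query string and fragment, using Python's standard `urllib.parse`. The helper `strip_query` and the sample URL are hypothetical. Stripping every query string is itself an assumption, because it would also discard parameters that genuinely select content.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment, e.g. a ?ver=... cache-busting suffix."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


raw = "//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js?ver=3.8.1"
clean = strip_query(raw)
print(raw.endswith(".min.js"))    # False: a LIKE "%.min.js" filter would miss this URL
print(clean)                      # //ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js
print(clean.endswith(".min.js"))  # True
```

Protocol-relative URLs (starting with `//`) keep their host, because `urlsplit` puts it in `netloc` and `urlunsplit` puts the `//` back.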

Back to WordPress. I found some interesting patterns that allow a simple heuristic for estimating how widespread WordPress is:

Applying this heuristic gives the following.
select count(distinct(pageid)) as count, (100*count(distinct(pageid))/290835) as percent
from requests
where pageid >= 14489007 and pageid <= 14802750
  and (url LIKE "%jquery-migrate%.js\\?ver=%" or url LIKE "%jquery-migrate%.js\\?v=%");


Number of sites   % of the total
29819             10.2529

By this heuristic, more than 10% of the most visited sites in the world use WordPress.
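When a heuristic combines a pageid range with two alternative URL patterns, operator precedence matters. In SQL, AND binds more tightly than OR, so a filter written as `range AND p1 OR p2` is parsed as `(range AND p1) OR p2`, and the range then restricts only the first pattern. The sketch below demonstrates this on an in-memory SQLite table with invented rows. The range 10..20 is a hypothetical stand-in for one crawl's pageid range, and page 99 stands in for a page outside it. SQLite's LIKE has no escape character by default, so `?` is written as-is instead of in the escaped form used in the MySQL query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (pageid INTEGER, url TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?)",
    [
        (10, "https://example.org/wp-includes/js/jquery/jquery-migrate.min.js?ver=1.2.1"),
        (11, "https://example.net/wp-includes/js/jquery/jquery-migrate.min.js?v=1.2.1"),
        (99, "https://example.com/js/jquery-migrate.js?v=1.1.0"),  # outside the pageid range
    ],
)

# Hypothetical pageid range standing in for one crawl.
IN_RANGE = "pageid >= 10 AND pageid <= 20"
# Inside string literals '?' is plain text, not a bind placeholder.
VER = "url LIKE '%jquery-migrate%.js?ver=%'"
V = "url LIKE '%jquery-migrate%.js?v=%'"


def count_pages(where: str) -> int:
    sql = f"SELECT COUNT(DISTINCT pageid) FROM requests WHERE {where}"
    return conn.execute(sql).fetchone()[0]


# AND binds tighter than OR: this is (IN_RANGE AND VER) OR V, so page 99 slips in.
unparenthesized = count_pages(f"{IN_RANGE} AND {VER} OR {V}")
# Parentheses keep the range filter on both patterns.
parenthesized = count_pages(f"{IN_RANGE} AND ({VER} OR {V})")
print(unparenthesized, parenthesized)  # 3 2
```

If the table holds only a single crawl, both forms return the same count. The parentheses matter once several crawls share one table.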

P.S. No sites were harmed during this study. The extension, however, may break something. If you decide to use it and run into such behavior, send me a private message.
P.P.S. If you have interesting questions, ask them in the comments. I will update the article with the answers.

Source: https://habr.com/ru/post/215173/

