Opera, the company behind the browser of the same name, has published the first results of a study of the structure of web content. For this purpose, the company created an application called MAMA (Metadata Analysis and Mining Application): working as a spider, it indexes markup and some other data from more than 3.5 million pages.
Statistical analysis of the data collected by MAMA allowed Opera engineers to draw conclusions about emerging trends in web development and about how standards-based web technologies are applied on the web. Opera plans to take the project further by building a search engine on top of the data already indexed. Web designers, browser developers and web engineers will then be able to easily obtain information about how web technologies are actually used on the Internet.
Preliminary data published by the company provides interesting information on the use of specific HTML elements. Among the pages analyzed by MAMA, the most popular elements are head, title, html, body, a, meta, img and table. The least commonly used elements are var, del and bdo.
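The kind of per-element statistic described above can be illustrated with a short sketch. It is not MAMA's actual implementation, which the article does not describe; it assumes the pages are already downloaded as strings, uses Python's standard html.parser module, and the names ElementCounter, pages_using_each_element and sample_pages are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class ElementCounter(HTMLParser):
    """Collects the set of distinct element names used on one page."""

    def __init__(self):
        super().__init__()
        self.elements = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        self.elements.add(tag)


def pages_using_each_element(pages):
    """Return a Counter mapping element name -> number of pages that use it."""
    usage = Counter()
    for html in pages:
        parser = ElementCounter()
        parser.feed(html)
        parser.close()
        # Each page contributes at most 1 per element, however often it repeats.
        usage.update(parser.elements)
    return usage


# Hypothetical sample input, standing in for crawled pages.
sample_pages = [
    "<html><head><title>A</title></head><body><a href='#'>x</a></body></html>",
    "<html><head><title>B</title></head><body><table><tr><td>"
    "<var>n</var></td></tr></table></body></html>",
]
usage = pages_using_each_element(sample_pages)
```

Counting pages rather than raw occurrences keeps a single page with hundreds of table cells from dominating the ranking; whether MAMA counts this way is an assumption.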

The company also studied the prevalence of rich web applications, which are mainly associated with AJAX technology. The study showed that Adobe Flash is used on approximately 35% of all sites analyzed. It is most common in China (67% of sites) and least common in Denmark (25% of sites). XMLHttpRequest, the core mechanism behind AJAX, is used on 3.2% of all sites. Norway set something of a record here: the mechanism was found on 10% of its sites.
The study also showed that CSS is used quite widely: it was found in one form or another on almost 80% of the resources. The most popular CSS properties relate to color and fonts. JavaScript is close behind CSS and is used by 75% of web resources.
Compliance?
Among other things, Opera decided to check the indexed pages with the W3C validation tools to determine how many of them conform to the standards.

The results showed that only 4.13% of all pages are valid. Another striking finding is that about 50% of the pages displaying the W3C compliance badge are invalid. Presumably, the markup of such pages initially met the standards but later lost this property (for example, when new content was added to the page).
The company's engineers tried to find out whether there was any connection between the development tool and the validity of the pages. To do this, the pages' meta tags were analyzed. It turned out that pages created with Apple iWeb are valid in 81% of cases. For comparison, only 3.4% of the pages created in Adobe Dreamweaver meet the standards.
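A comparison of this kind can be sketched roughly as follows. The article says only that meta tags were analyzed; reading the authoring tool from a meta element named "generator" is an assumption here, as is supplying each page's validity as a ready-made boolean from a separate validation step. The names GeneratorFinder, find_generator, validity_rate_by_tool, ToolA and ToolB are hypothetical, and the sample data is invented for illustration only.

```python
from collections import defaultdict
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Remembers the content of the first <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.generator is not None:
            return
        attributes = dict(attrs)
        # Attribute values may be None for valueless attributes.
        if (attributes.get("name") or "").lower() == "generator":
            self.generator = attributes.get("content")


def find_generator(html):
    """Return the declared authoring tool, or None if the page names none."""
    finder = GeneratorFinder()
    finder.feed(html)
    finder.close()
    return finder.generator


def validity_rate_by_tool(pages):
    """pages: iterable of (html, is_valid) pairs.

    is_valid is assumed to come from an external validator run.
    Returns a dict mapping tool name -> share of its pages that are valid.
    """
    totals = defaultdict(int)
    valid = defaultdict(int)
    for html, is_valid in pages:
        tool = find_generator(html)
        if tool is None:
            continue  # pages without a generator tag are skipped
        totals[tool] += 1
        if is_valid:
            valid[tool] += 1
    return {tool: valid[tool] / totals[tool] for tool in totals}


# Invented sample data; ToolA and ToolB are placeholder tool names.
sample = [
    ('<meta name="generator" content="ToolA">', True),
    ('<meta name="Generator" content="ToolA">', False),
    ('<meta name="generator" content="ToolB">', False),
    ("<p>no generator tag</p>", True),
]
rates = validity_rate_by_tool(sample)
```

A real analysis would also need to normalize version strings in the content attribute (so that different releases of one tool are grouped together), which this sketch leaves out.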
The results of the research are very interesting, but the potential of the system has not yet been fully realized. Opera's attempt to build a search engine on MAMA data opens up further possibilities for analysis that other projects can use in their own research and development.
“The Internet is fragmented, complex and prone to continuous growth. MAMA provides us with information on how intensively various web technologies are used,” says Opera vice president Snorre M. Grimsby. “We can use this information to test and ensure high compatibility, reliability and performance of our products. We want to share this technology with our colleagues so that they too can benefit from it.”