
Statistics on sellers on Yandex.Market

Hello!

The statistical analysis of VKontakte published in a recent post encouraged me to share some results of my research into seller accounts on Yandex.Market.

While collecting the statistics, I analyzed 21052 accounts registered as of July 4, 2011. Here are the results.
Caution: heavy traffic (many images).
Goals:

1. Gain experience parsing sites with phpQuery and XPath;
2. Count the number of accounts registered on Yandex.Market;
3. Get the seller's website address (if any);
4. Determine Google PR, Yandex TIC (thematic citation index), Yandex VIC (weighted citation index), Alexa rank, the IP address, and the created: and paid-till: fields of the domain's whois record;
5. Analyze the collected information.

Obstacles:

I ran into Yandex's per-IP access restrictions. Attempts to work around them through proxies failed, so the information was collected in parts from different IP addresses: those of local ISPs, friends' servers, and the mobile internet of the Ukrainian CDMA operator PEOPLEnet. In total, 7 IPs were "banned".
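Collecting the data in parts from several IPs amounts to a resumable scan: stop as soon as the current IP is banned, remember the next id, and continue from that id on another IP. Below is a minimal Python sketch. It assumes a caller-supplied `fetch(shop_id)` function (hypothetical) that returns `None` once a ban is detected, for example an HTTP 403 or a captcha page. The article does not say how bans were actually detected.

```python
from typing import Callable, Dict, Optional, Tuple


def scan(start_id: int, last_id: int,
         fetch: Callable[[int], Optional[str]]) -> Tuple[Dict[int, str], int]:
    """Fetch pages for ids start_id..last_id inclusive.

    Returns the pages collected so far and the id to resume from.
    fetch() is a hypothetical helper; returning None is the assumed
    ban signal, after which the scan stops so that the same call can
    be repeated later from another IP with the returned id.
    """
    pages: Dict[int, str] = {}
    shop_id = start_id
    while shop_id <= last_id:
        page = fetch(shop_id)
        if page is None:  # assumed ban signal: stop and report where to resume
            break
        pages[shop_id] = page
        shop_id += 1
    return pages, shop_id
```

For example, if the first IP is banned at id 4, the next machine simply calls `scan(4, last_id, fetch)` and continues from there.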

Progress:

I wrote a bot that requested URLs of the form market.yandex.ru/shop-info.xml?shop-id=xxx and analyzed the returned content. The id ranged from 0 to 68545 (as of July 4, 2011, this was the last account; the value was determined during parsing).
For example, id = 155 is ozon.ru, with the site address given in the name, while id = 156 returns nothing (an invalid id).
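The per-id logic can be sketched as building the URL and classifying the shop name found on the page. This sketch assumes an `http://` scheme and that the shop name has already been extracted from the page, since the article does not describe the page markup. Treating a name that contains something domain-like as "has a site" is a simplifying assumption, not necessarily the author's rule.

```python
import re
from typing import Optional

# URL pattern from the article; the http:// scheme is an assumption.
URL_TEMPLATE = "http://market.yandex.ru/shop-info.xml?shop-id={}"

# Rough domain pattern: dot-separated labels ending in a 2+ letter zone.
DOMAIN_RE = re.compile(r"\b((?:[a-z0-9-]+\.)+[a-z]{2,})\b", re.IGNORECASE)


def shop_url(shop_id: int) -> str:
    """Build the shop-info URL for one id."""
    return URL_TEMPLATE.format(shop_id)


def classify(shop_name: Optional[str]) -> str:
    """Classify an id by the shop name extracted from its page.

    None or empty -> 'invalid' (no shop under this id),
    name containing a domain -> 'site', anything else -> 'name_only'.
    """
    if not shop_name:
        return "invalid"
    return "site" if DOMAIN_RE.search(shop_name) else "name_only"
```

With this, `classify("ozon.ru")` gives `'site'`, while an id whose page yields no name is counted as invalid.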

As a result, 21052 (31%) valid ids were found.
Of these valid ids, 14220 (68%) list a site address; the rest give only the name of the store or company.
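The two percentages follow directly from the raw counts. Note that ids 0 to 68545 inclusive give 68546 candidate ids:

```python
def share(part: int, whole: int) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


total_ids = 68545 - 0 + 1  # ids 0..68545 inclusive
valid_ids = 21052
with_site = 14220

print(f"{share(valid_ids, total_ids):.1%}")  # 30.7% of all scanned ids are valid
print(f"{share(with_site, valid_ids):.1%}")  # 67.5% of valid ids list a site
```

Rounded to whole percent, these are the 31% and 68% quoted above.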

Domain Name Analysis:

Found 211 duplicate sites registered under different ids, of which:
17 duplicates - test.yandex.ru (some of them);
15 duplicates - sotmarket.ru (some of them);
6 duplicates - techhome.ru, teramir.ru;
5 duplicates - assistavto.ru, kubanpc.ru, ulmart.ru;
4 duplicates - dostavka.ru, h2odesign.ru, kupitswimtraner.ru, originalam.net;
3 duplicates - dsbw.ru, flamingo.ru, holodilnik.ru, kupithexbug.ru, superplayer.ru, techport.ru;
2 duplicates - 15 sites;
1 duplicate - 87 sites;
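Finding duplicates comes down to normalizing each site address and counting how often it occurs. A sketch under two assumptions: the normalization rules (drop the scheme, path, and a leading `www.`) are illustrative, and "N duplicates" is read as N extra registrations beyond the first, which the article does not state explicitly.

```python
from collections import Counter
from typing import Dict, Iterable


def normalize(address: str) -> str:
    """Reduce a site address to a bare lower-case domain (assumed rules)."""
    d = address.strip().lower()
    for prefix in ("http://", "https://"):
        if d.startswith(prefix):
            d = d[len(prefix):]
    d = d.split("/", 1)[0]  # drop any path
    if d.startswith("www."):
        d = d[4:]
    return d


def duplicates(addresses: Iterable[str]) -> Dict[str, int]:
    """Map each domain registered more than once to its extra registrations."""
    counts = Counter(normalize(a) for a in addresses)
    return {domain: n - 1 for domain, n in counts.items() if n > 1}


def duplicate_histogram(dups: Dict[str, int]) -> Counter:
    """How many domains have 1, 2, 3, ... extra registrations."""
    return Counter(dups.values())
```

`duplicate_histogram` produces exactly the "N duplicates - M sites" breakdown used in the list above.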

This raises the question: why register more than once? Perhaps someone in the audience can explain, but the fact remains.

Distribution of domains by zone:

* Note: others - fm, eu, lv, am, cx, uz, lt, cc, ws, in.
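Grouping by zone only needs the last label of each domain. A sketch, assuming that "zone" means the top-level domain (so `tehnotrade.com.ua` counts as `ua`) and that rare zones are folded into "others" below a threshold; the `min_count` value is chosen purely for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def zone(domain: str) -> str:
    """Top-level label of a domain, e.g. 'shop.tut.by' -> 'by'."""
    return domain.strip().rstrip(".").lower().rsplit(".", 1)[-1]


def zone_distribution(domains: Iterable[str], min_count: int = 2) -> Dict[str, int]:
    """Count domains per zone, folding zones rarer than min_count into 'others'."""
    counts = Counter(zone(d) for d in domains)
    result: Dict[str, int] = {}
    others = 0
    for z, n in counts.most_common():  # most frequent zones first
        if n >= min_count:
            result[z] = n
        else:
            others += n
    if others:
        result["others"] = others
    return result
```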

PR, TIC, VIC, Alexa rank, whois

Google PR, Yandex TIC, and Yandex VIC were determined using the site seop.ru, submitting 10 addresses at a time. The results were parsed with XPath.
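Submitting addresses 10 at a time and reading the results back with XPath can be sketched as below. The result markup is an assumption: a table with `id="results"` and one row per site (domain, PR, TIC, VIC). Python's `xml.etree.ElementTree` supports only a subset of XPath and needs well-formed markup, so a real HTML page would call for an HTML-tolerant parser such as `lxml.html`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, Sequence

BATCH_SIZE = 10  # the article submitted 10 addresses per request


def batches(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Split the address list into consecutive chunks of `size`."""
    for i in range(0, len(items), size):
        yield list(items[i:i + size])


def parse_results(markup: str) -> Dict[str, Dict[str, str]]:
    """Parse a hypothetical results table into {domain: {pr, tic, vic}}."""
    root = ET.fromstring(markup)
    results: Dict[str, Dict[str, str]] = {}
    # ElementTree XPath subset: descendant table with a given id, then its rows
    for row in root.iterfind(".//table[@id='results']/tr"):
        cells = [(td.text or "").strip() for td in row.findall("td")]
        if len(cells) >= 4:  # header rows (th only) are skipped
            domain, pr, tic, vic = cells[:4]
            results[domain] = {"pr": pr, "tic": tic, "vic": vic}
    return results
```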

Number of sites at each Google PR level:

* Note: n/a - not determined.
Google PR 8 - laptopshop.ru;
Google PR 7 - ozon.ru, tehnotrade.com.ua, biblioclub.ru, shop.tut.by, s7.ru
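Counting sites per level, with undetermined values reported as n/a, is a plain histogram. A sketch assuming each site's level is stored as an integer, or `None` when the service returned nothing; the same function applies to PR, TIC, or VIC.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def level_histogram(levels: Iterable[Optional[int]]) -> Dict[str, int]:
    """Count sites per level; None (not determined) is reported as 'n/a'."""
    counts = Counter("n/a" if lvl is None else str(lvl) for lvl in levels)
    # numeric levels in ascending order, 'n/a' last
    keys = sorted((k for k in counts if k != "n/a"), key=int)
    if "n/a" in counts:
        keys.append("n/a")
    return {k: counts[k] for k in keys}
```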

Top 20 sites by Yandex TIC:

Notably, the top of the list is dominated by hosting companies.

Sites with a Yandex TIC between 0 and 100 number 13,205 (94%) and are distributed as follows:


Number of sites at each Yandex VIC level:

* Note: n/a - not determined.
Surprisingly, there were no sites with level 1; this is probably a calculation error on the service's side. I am simply reporting data obtained from public sources.
VIC 6 - hw.ru, hosting.rbc.ru, sport.lgg.ru, hc.ru, ozon.ru, peterhost.ru, host.ru, 3206080.ru, all-hotels.ru, host.ru.

Top 20 sites by Alexa rank (lower is better). The values were taken directly from the Alexa site:
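Both top-20 lists reduce to picking the best N known values: the largest for TIC, the smallest for Alexa rank, with undetermined values skipped. A sketch with `heapq`; the site-to-value dictionary is an assumed input format.

```python
import heapq
from typing import Dict, List, Optional, Tuple


def top_n(values: Dict[str, Optional[int]], n: int = 20,
          smaller_is_better: bool = False) -> List[Tuple[str, int]]:
    """Return the n best (site, value) pairs, ignoring missing values."""
    known = [(site, v) for site, v in values.items() if v is not None]
    pick = heapq.nsmallest if smaller_is_better else heapq.nlargest
    return pick(n, known, key=lambda item: item[1])
```

For Alexa rank the call is `top_n(ranks, 20, smaller_is_better=True)`; for TIC the default ordering applies.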


Whether a domain resolves to an IP address was determined with the PHP function gethostbyname:
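The same check in Python uses `socket.gethostbyname`. One difference matters here: PHP's `gethostbyname()` returns the unmodified hostname on failure, while Python's version raises an exception, so the sketch converts a failure into `None`.

```python
import socket
from typing import Optional


def resolve_ipv4(domain: str) -> Optional[str]:
    """Return the domain's IPv4 address, or None if it does not resolve."""
    try:
        return socket.gethostbyname(domain)
    except (OSError, UnicodeError):  # socket.gaierror is a subclass of OSError
        return None
```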


Whether the created: and paid-till: fields were present in whois was determined using the open-source code of the phpwhois project:
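phpwhois is a PHP library; the idea behind it can be sketched in Python as a raw WHOIS query (RFC 3912: TCP port 43, one query line ending in CRLF) followed by extracting the two fields from the text. The whois server is left as a parameter, and the field format (`created:` / `paid-till:` at the start of a line) is assumed from the field names quoted in the article.

```python
import re
import socket
from typing import Dict

# Assumed record format: "created:" / "paid-till:" at the start of a line.
FIELD_RE = re.compile(r"^[ \t]*(created|paid-till):[ \t]*(\S+)",
                      re.IGNORECASE | re.MULTILINE)


def query_whois(server: str, domain: str, timeout: float = 10.0) -> str:
    """Send a raw WHOIS query and return the server's full text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("idna") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def whois_dates(raw: str) -> Dict[str, str]:
    """Extract the created: and paid-till: values from raw whois text."""
    found: Dict[str, str] = {}
    for key, value in FIELD_RE.findall(raw):
        found.setdefault(key.lower(), value)  # keep the first occurrence
    return found
```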


Conclusion

These are the statistics I ended up with. The study grew out of a freelance task that I took on and completed, and out of my own curiosity about who is registered on Yandex.Market. Note that new sellers register every day, so the id values keep growing.

I hope the information was useful, or at least interesting. This is my first post and my first attempt at writing an IT article, so I welcome constructive criticism and would be glad to hear your thoughts on the subject.
Thanks for your attention!

PS

At readers' request, here is a link to the file with the collected database.

Source: https://habr.com/ru/post/124740/

