
Web analytics: analyze it! Part 2. Data Collection

Before analyzing statistics, you need to understand how they were collected, which of them may be inaccurate, and why.



A server on the Internet receives requests from the user's browser and sends data in response. For each page view, the server receives one request for the body of the page and then several more for images, scripts, style sheets, and other resources needed to display it. Scripts on the page can also generate requests, including requests to a separate statistics server.



The web server links requests from the same user by means of sessions. When a new user accesses the server, the server creates a new session identifier, which the user's browser then sends back with each new page load. Typically, the session ID is transmitted to the server in a cookie, a small piece of data that the browser can store for a particular site.
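The session mechanism described above can be sketched in Python. This is a minimal illustration, not the behavior of any particular web server: the cookie name `session_id` and the function `resolve_session` are assumptions made for the example. If the browser sent the cookie, the identifier is reused; otherwise a new one is generated and returned as a `Set-Cookie` value.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "session_id"  # hypothetical cookie name chosen for this sketch


def resolve_session(cookie_header: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (session_id, set_cookie_value).

    set_cookie_value is None when the browser already sent a session cookie,
    otherwise it is the value for a Set-Cookie response header.
    """
    incoming = SimpleCookie()
    if cookie_header:
        incoming.load(cookie_header)

    # Returning visitor: the browser reported its existing identifier.
    if COOKIE_NAME in incoming:
        return incoming[COOKIE_NAME].value, None

    # New visitor: create an identifier and ask the browser to store it.
    session_id = secrets.token_hex(16)
    outgoing = SimpleCookie()
    outgoing[COOKIE_NAME] = session_id
    outgoing[COOKIE_NAME]["path"] = "/"
    outgoing[COOKIE_NAME]["httponly"] = True
    return session_id, outgoing[COOKIE_NAME].OutputString()
```

On the first request `resolve_session(None)` produces a fresh identifier together with a cookie to set; on later requests the browser's `Cookie` header is passed in and the same identifier comes back.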



What can the server know about the user?

There are three main types of statistics systems. The first, “passive” type is based on the analysis of server logs, that is, records of the requests made to the server. Each time a user accesses the server, the server writes a set of data about the request to its log. Typically, the server keeps such a log for its own needs anyway, so collecting statistics adds no extra load, and not a single request goes unrecorded. However, the standard server configuration is not enough to obtain all the necessary data.
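As an illustration of log analysis, here is a hedged Python sketch that parses one line in the Apache "combined" log format, which extends the common format with the referrer and the user agent. The format of your own server log may differ; the regular expression, the function name `parse_log_line`, and the sample line are assumptions made for this example.

```python
import re
from typing import Dict, Optional

# Common Log Format fields, optionally followed by the two extra
# "combined" fields: referrer and user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)")?'
)


def parse_log_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record: Dict[str, object] = match.groupdict()
    record["status"] = int(match.group("status"))
    return record


# Made-up sample line for demonstration only.
SAMPLE_LINE = (
    '203.0.113.7 - - [10/Aug/2009:13:55:36 +0400] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"http://example.com/start" "Mozilla/5.0"'
)
```

A line written in the plain common format still parses, but its `referrer` and `user_agent` fields come back as `None`, which shows why the default configuration may not provide everything an analyst needs.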


The second type of statistics system adds extra code to the page shown to the user, and this code sends a request to the statistics system. When the Internet was young and browsers were very primitive, images were used for this: to display the image on the page, the browser made a request to the statistics server. That server made an entry in its own log and then returned an image with numbers on it. This is where the Runet tradition of “hanging counters” 88x31 pixels in size comes from. This method is worse than the first in every respect and was used only because site owners did not have access to their server logs.



When browsers everywhere learned to execute JavaScript (small programs embedded in the page), external counters became much smarter. In addition to the standard request, JavaScript could pass a lot of new data to the statistics system: the screen resolution and color depth, and the parameters of the operating system. Counters again got access to the referrer and could set cookies on behalf of the visited site, which made it easier to track user sessions. The most popular advanced counter of this kind is Google Analytics.
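On the receiving side, such a counter request is just a URL with the collected data in its query string. The Python sketch below shows how a statistics server might decode it. The parameter names (`page`, `ref`, `sr`, `cd`) and the function `parse_hit` are hypothetical and do not reproduce the request format of Google Analytics or any other real system.

```python
from typing import Dict, Optional
from urllib.parse import parse_qs, urlsplit


def _to_int(text: str) -> Optional[int]:
    """Convert a string of digits to int, or return None for anything else."""
    return int(text) if text.isdigit() else None


def parse_hit(request_url: str) -> Dict[str, object]:
    """Decode the data a JavaScript counter put into its request URL."""
    query = parse_qs(urlsplit(request_url).query)

    def first(name: str) -> str:
        # parse_qs maps each name to a list of values; take the first one.
        return query.get(name, [""])[0]

    # Screen resolution is assumed to arrive as "WIDTHxHEIGHT".
    width, _, height = first("sr").partition("x")
    return {
        "page": first("page"),
        "referrer": first("ref"),
        "screen_width": _to_int(width),
        "screen_height": _to_int(height),
        "color_depth": _to_int(first("cd")),
    }
```

Missing or malformed parameters simply yield empty strings or `None`, since a counter cannot rely on every browser reporting every value.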



An external JavaScript counter also has flaws:

- it cannot track file downloads from the site,

- it records only visits to pages that loaded completely (otherwise the code does not have time to run),

- it requires a modern browser with scripts allowed to run,

- it does not work in mobile browsers (except Opera Mini and modern smartphones),

- to record the parameters of the site's internal “kitchen”, such as user account data, all of this data has to be passed into the counter code, which is usually unsafe, difficult and, as a result, pointless. Imagine a dating site where each user has a profile with a bunch of parameters. To analyze the behavior of users with different profile parameters, you need to link the profiles with the requests.



In such difficult cases, site developers build their own system for recording statistics and add all the features they need to it. The advantage of such a system is its unlimited flexibility. The same flexibility leads to its major drawback: the data analysis tools for such a system also have to be written by hand. So developers whose needs are met by ready-made systems tend to use those.
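A rough Python sketch of what a record in such a custom system might look like, using the dating site example: the request is stored together with selected profile parameters, so the two can be analyzed jointly later. The function `build_event`, the field names, and the list of allowed profile fields are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict

# Hypothetical whitelist: only these profile parameters go into statistics,
# so that private account data is not copied into the event log.
ALLOWED_PROFILE_FIELDS = ("age_group", "city")


def build_event(session_id: str, path: str, profile: Dict[str, Any]) -> str:
    """Serialize one page view, linked to profile data, as a JSON line."""
    event = {
        "time": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "path": path,
        "profile": {
            key: profile[key] for key in ALLOWED_PROFILE_FIELDS if key in profile
        },
    }
    return json.dumps(event, ensure_ascii=False)
```

Each line produced this way could be appended to a file or sent to a database; the analysis tools that read these records still have to be written separately, which is the drawback noted above.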

| Feature | Server logs | External statistics | Custom statistics |
|---|---|---|---|
| Session tracking | - (rather difficult to implement) | + | + |
| Recording all visits | + | - (only browsers with JS enabled and fully loaded pages) | + |
| Tracking file downloads | + | - | + |
| Tracking search engine and other bots | + | - | + |
| Linking visits with internal site data | - | - | + |
| Comparison with data from other sites | - | + | - |
| Ability to track transactions and sales funnels | - | + | + |
| Tracking events that do not lead to requests to the server | - | + | + |


The first and most important thing to remember when working with statistics from the Internet: accurate and complete data is often very difficult to obtain. I will point out the inaccuracies as the metrics are introduced. A key skill for an analyst is the ability to tell important limitations from unimportant ones.
For example, external statistics systems based on JavaScript will not work for users with very old browsers or with scripts disabled for security reasons. In most cases this is acceptable: the share of such users is small (less than one percent). However, if you collect data on a corporate intranet in a company that disables JS for its employees, or if you want to measure the percentage of users with scripts disabled, this method is no longer suitable.
General limitations of statistics collection systems:





Standard, popular systems most often use the second method. We will mostly talk about Google Analytics, and in the next part we will look at the main metrics available to its users.

Source: https://habr.com/ru/post/66404/


