
How to track file downloads from your WordPress site



I needed to track file downloads from a site (images, documents, videos, distributions, ...), because regular statistics services cannot do this without changing the URLs of the files. The statistics also had to be visible in the usual place (for example, Google Analytics or Firebase).

After going through several plugins (most of them have the words Download and Manager in the title), I found that all of them rely on manually compiling a list of files to monitor. Many of them also implement protection against unauthorized downloads, which is redundant for this task. They could be used, but with many files, compiling that list by hand becomes impractical.

As a result, I wrote my own implementation as a WordPress plugin: you simply specify a directory (as a path relative to the site), and the plugin monitors downloads of its contents.

For those who need nothing more than the above, the free plugin is linked here. Below are sample statistics and details of the technical implementation.

Where statistics are sent


For now, the two most basic destinations for aggregated statistics are supported.

Google Analytics


Statistics are sent as Events. The Event Category is set in the plugin settings, the Event Action holds the URI of the file, and the Event Label holds the request parameters if the corresponding setting is enabled. As a result, you can conveniently follow the download dynamics of each file in the directory in the Google Analytics console.
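To make the mapping concrete, here is a minimal sketch of how such an Event could be encoded. It assumes the hit is sent through the Google Analytics (Universal Analytics) Measurement Protocol, which the article does not state explicitly. The plugin itself is written in PHP; Python is used here only as a compact stand-in. The function name build_event_hit, the tracking ID, the client ID, and the file path are all hypothetical.

```python
from urllib.parse import urlencode


def build_event_hit(tracking_id, client_id, category, file_uri, query_string=""):
    """Return URL-encoded Measurement Protocol parameters for one download event."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID (placeholder value in the demo below)
        "cid": client_id,    # anonymous client ID (placeholder value below)
        "t": "event",        # hit type
        "ec": category,      # Event Category, taken from the plugin settings
        "ea": file_uri,      # Event Action: URI of the downloaded file
    }
    if query_string:
        # Event Label: request parameters, only when that setting is enabled
        params["el"] = query_string
    return urlencode(params)


# Hypothetical values, for illustration only.
hit = build_event_hit("UA-XXXXX-Y", "555", "Downloads", "/mypath/manual.pdf", "ver=2")
print(hit)
```

The resulting string would be posted as the request body to the Measurement Protocol collection endpoint; the sending step is left out so the sketch stays free of network calls.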


Table in the WordPress database


This is mainly for debugging. It simply counts the number of downloads; changes over time are not visible. Table fields: IP, file URI, request parameters (if any), and a counter. The data can be viewed with any SQL editor (for example, phpMyAdmin).

Each entry is assigned an ID so that entries can be deleted individually if necessary.
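The table described above can be sketched as follows. This is an illustration only: the plugin stores its data in the WordPress (MySQL) database, while the sketch uses an in-memory SQLite database so that it runs on its own. The table name dl_stat, the column names, and the helper functions are assumptions, not the plugin's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dl_stat (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- used to delete single entries
        ip      TEXT    NOT NULL,
        uri     TEXT    NOT NULL,
        params  TEXT    NOT NULL DEFAULT '',
        counter INTEGER NOT NULL DEFAULT 0,
        UNIQUE (ip, uri, params)
    )
    """
)


def register_download(ip, uri, params=""):
    """Increment the counter for (ip, uri, params), creating the row if needed."""
    cur = conn.execute(
        "UPDATE dl_stat SET counter = counter + 1 WHERE ip = ? AND uri = ? AND params = ?",
        (ip, uri, params),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO dl_stat (ip, uri, params, counter) VALUES (?, ?, ?, 1)",
            (ip, uri, params),
        )
    conn.commit()


def delete_entry(entry_id):
    """Delete a single entry by its ID."""
    conn.execute("DELETE FROM dl_stat WHERE id = ?", (entry_id,))
    conn.commit()


# Two downloads of the same file from the same (documentation-range) address.
register_download("203.0.113.7", "/mypath/manual.pdf")
register_download("203.0.113.7", "/mypath/manual.pdf")
rows = conn.execute("SELECT id, ip, uri, params, counter FROM dl_stat").fetchall()
print(rows)
```

Because only a counter is kept per row, the sketch shows why the table gives totals but no picture of how downloads change over time.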


Intercepting file access


Files are served by the Apache web server itself, so a handler is added to .htaccess that redirects the requests to a PHP script.

It looks like this:

<FilesMatch "\.(.*)$">
<IfModule mod_rewrite.c>
RewriteEngine On
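# Skip system file types
RewriteCond %{REQUEST_URI} !\.(htaccess|php|js|css)$
# Only requests inside the monitored directory (mypath stands for its site-relative path)
RewriteCond %{REQUEST_URI} ^/mypath/(.*)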
RewriteRule ^(.*) /index\.php\?seraph_dlstat_api=Get&uri=%{REQUEST_URI} [L,QSA]
</IfModule>
</FilesMatch>


Exceptions are deliberately made for system files of the types htaccess, php, js, and css.
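To see which requests these conditions select, the rules can be approximated outside Apache. The sketch below is only an emulation in Python under simplifying assumptions: it treats REQUEST_URI as the path without the query string and applies the FilesMatch pattern to the last path segment. The name is_intercepted is hypothetical, and mypath is the same placeholder as in the rules.

```python
import re

TRACKED_DIR = "mypath"  # placeholder for the monitored directory

HAS_EXTENSION = re.compile(r"\.(.*)$")                 # FilesMatch "\.(.*)$"
EXCLUDED_TYPES = re.compile(r"\.(htaccess|php|js|css)$")
IN_TRACKED_DIR = re.compile(r"^/" + re.escape(TRACKED_DIR) + r"/(.*)")


def is_intercepted(request_uri):
    """Return True if the request would be passed to the tracking script."""
    filename = request_uri.rsplit("/", 1)[-1]
    return (
        HAS_EXTENSION.search(filename) is not None
        and EXCLUDED_TYPES.search(request_uri) is None
        and IN_TRACKED_DIR.search(request_uri) is not None
    )


for uri in ("/mypath/manual.pdf", "/mypath/app.js", "/other/manual.pdf"):
    print(uri, is_intercepted(uri))
```

Only the first of the three sample paths is intercepted: the second is an excluded type and the third lies outside the monitored directory.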

To minimize the response time, the script is invoked through the seraph_dlstat_api parameter of index.php, which is checked almost immediately after all the WordPress scripts needed for processing have been loaded. The check is done on the do_parse_request filter hook, the very first callback after the entire working environment has been loaded (that is, after wp-load.php has run).

Next, the script processes and records the URI and returns the contents of the file using PHP's readfile function. Partial downloads via HTTP_RANGE (the HTTP Range request header) are also supported; in that case the file is read in blocks.
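The partial-download path can be illustrated with a short sketch. It is not the plugin's code (which is PHP); it is a minimal Python illustration that handles a single bytes range only, and the names parse_range and read_blocks and the block size are assumptions.

```python
import io


def parse_range(header, size):
    """Return (start, end), inclusive, for one 'bytes=' range, or None if unusable."""
    if not header or not header.startswith("bytes=") or "," in header:
        return None
    first, sep, last = header[len("bytes="):].partition("-")
    if not sep:
        return None
    try:
        if first == "":
            # Suffix form "bytes=-N": the last N bytes of the file.
            length = int(last)
            if length <= 0:
                return None
            start, end = max(size - length, 0), size - 1
        else:
            start = int(first)
            end = int(last) if last else size - 1
    except ValueError:
        return None
    end = min(end, size - 1)
    if start < 0 or start > end:
        return None
    return start, end


def read_blocks(stream, start, end, block_size=8192):
    """Yield the bytes from start to end (inclusive) in blocks of block_size."""
    stream.seek(start)
    remaining = end - start + 1
    while remaining > 0:
        chunk = stream.read(min(block_size, remaining))
        if not chunk:
            break
        remaining -= len(chunk)
        yield chunk


data = io.BytesIO(b"0123456789")
byte_range = parse_range("bytes=2-5", 10)
body = b"".join(read_blocks(data, *byte_range, block_size=3))
print(byte_range, body)
```

Reading in blocks keeps memory use bounded for large files, which is presumably why the ranged path does not read the whole file at once.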

Deferred data sending


To minimize the response time, asynchronous sending of statistics is supported. When a file is accessed, an entry is created in the database and the file is immediately returned to the client. Later, when the WordPress scheduler (WP Cron) fires, the data is taken from the table and the statistics are sent.

This works for Google Analytics because it accepts messages sent after the fact, as long as the delay time is specified.
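As an illustration of the delay time, the sketch below turns queued entries into hits that carry their age. It assumes the Universal Analytics Measurement Protocol, where the qt (queue time) parameter gives the delay in milliseconds between the event and the moment it is sent; the article does not name the parameter. The function pending_to_hits, the row layout, and all values are hypothetical.

```python
from urllib.parse import urlencode


def pending_to_hits(rows, now, base_params):
    """Build one URL-encoded hit per queued row.

    rows: iterable of (file_uri, created_at) with created_at in seconds.
    now: time of sending, in seconds.
    """
    hits = []
    for file_uri, created_at in rows:
        params = dict(base_params, ea=file_uri)
        # Queue time: how long ago the download happened, in milliseconds.
        params["qt"] = str(max(0, int((now - created_at) * 1000)))
        hits.append(urlencode(params))
    return hits


# Hypothetical common parameters and queued downloads.
base = {"v": "1", "tid": "UA-XXXXX-Y", "cid": "555", "t": "event", "ec": "Downloads"}
queued = [("/mypath/a.zip", 1000.0), ("/mypath/b.pdf", 1090.5)]
hits = pending_to_hits(queued, now=1100.0, base_params=base)
print(hits)
```

In the deferred scheme, a function like this would run from the scheduled task, after which the sent rows would be removed from the queue table.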

By default, WP Cron is triggered when any page loads. You can instead trigger WP Cron from the system scheduler to further reduce the response time.

Conclusion


As a result, for the client a file download is indistinguishable from standard handling by the web server, yet it can now be tracked.

I would appreciate any feedback.

Source: https://habr.com/ru/post/352270/

