
Google Analytics - bypass sampling and collect raw data

Hello!

If you have a high-traffic site (more than 500 thousand sessions during the reporting period), or you build complex reports in the interface (segmentation, additional parameters, frequent changes of the reporting period), Google Analytics starts saving its resources and turns on data sampling. The details are well described in the official help. In other words, to prepare a report for you, GA takes not all of the data but only a part of it, for example 30%, then scales the metrics up proportionally to 100% and displays them in your report.
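
To make the arithmetic concrete, here is a minimal sketch of how a figure counted in a sample is scaled up. The function name and the numbers are assumptions chosen for illustration; this is not GA's actual algorithm.

```javascript
// Hypothetical illustration of sample extrapolation; not GA's real implementation.
function extrapolate(sampledCount, sampleRate) {
  if (sampleRate <= 0 || sampleRate > 1) {
    throw new RangeError('sampleRate must be in (0, 1]');
  }
  // A count observed in a 30% sample is divided by 0.3 to estimate the full total.
  return Math.round(sampledCount / sampleRate);
}

// Assume the site really had 1,000 transactions, 312 of which landed in a 30% sample.
const realTransactions = 1000;
const estimated = extrapolate(312, 0.3);
console.log('estimate in the report:', estimated);
console.log('discrepancy:', estimated - realTransactions);
```

The estimate depends on which sessions happen to fall into the sample, which is why the report and the database rarely agree exactly.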

Naturally, in such cases the number of payments, the transaction amounts and the number of conversions will diverge from the real figures. This is easy to check: compare them with the numbers from your database or CRM.
Avoiding the problem is easy but expensive: connect Google Analytics 360.
Let's learn how to collect raw data using the free version of Google Analytics.

// This guide is not a solution to all your problems; we are just getting acquainted with the technology!


How reports are built in Google Analytics


For reports to appear in the Google Analytics interface (hereafter GA), three things happen: data collection, data processing and report generation.


Data collection
GA collects information about all interactions (page views, user events, transactions) via the Measurement Protocol.

Data processing
GA processes the information it has received about interactions (page views, other events).

Report generation
When you open a report through the GA web interface or the API, the system retrieves the data for the selected report from storage and returns it.

How GA collects data - technical side


You add the code that GA provides to your site, or create a tag in Google Tag Manager.

When this code runs in the user's browser, a ga object with a tracker is created. The tracker then records the interaction, in this case a page view.
Once the interaction is recorded, information about it is sent to the Google Analytics server using the Measurement Protocol.

To simplify as much as possible: the information is sent to the GA server as a GET request of the following form:

https://www.google-analytics.com/collect?v=1&_v=j67&a=1998834664&t=pageview&_s=1&dl=https%3A%2F%2Fhabrahabr.ru%2Ftop%2F&ul=en-us&de=UTF-8&dt=%D0%9B%D1%83%D1%87%D1%88%D0%B8%D0%B5%20%D0%BF%D1%83%D0%B1%D0%BB%D0%B8%D0%BA%D0%B0%D1%86%D0%B8%D0%B8%20%D0%B7%D0%B0%20%D1%81%D1%83%D1%82%D0%BA%D0%B8%20%2F%20%D0%A5%D0%B0%D0%B1%D1%80%D0%B0%D1%85%D0%B0%D0%B1%D1%80&sd=24-bit&sr=1920x1080&vp=1841x341&je=0&_u=SCCAgEADQ~&jid=&gjid=&cid=2098823486.1505375017&tid=UA-726094-1&_gid=1797204180.1524028566&cd1=habrauser&cd2=other&cd4=no&z=1479651106 


You can open your browser's developer tools, go to the Network tab, filter by the word “collect” and see the details of the request.


That is, the data is transmitted via the Query String to Google Analytics:

    v: 1
    _v: j50
    a: 643761009
    t: pageview
    _s: 1
    dl: https://habrahabr.ru/
    ul: en-us
    de: UTF-8
    dt: (page title)
    sd: 24-bit
    sr: 1920x1080
    vp: 1109x966
    je: 0
    fl: 25.0 r0
    _u: QCCAgEAB~
    jid: 1630561303
    cid: 774042187.1492148509
    tid: UA-726094-1
    cd1: guest
    cd4: no
    cd5: other
    z: 1998272259
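
The same breakdown can be done in code. Below is a small sketch that splits a hit URL into its parameters using the standard URL API. The function name parseHit is an assumption, and the URL is a shortened version of the request shown earlier.

```javascript
// Parse a Measurement Protocol hit URL into a plain object of parameters.
function parseHit(hitUrl) {
  const params = new URL(hitUrl).searchParams;
  const hit = {};
  for (const [name, value] of params) {
    hit[name] = value; // searchParams already percent-decodes the values
  }
  return hit;
}

// Shortened example request (assumed for illustration).
const exampleHit = parseHit(
  'https://www.google-analytics.com/collect?v=1&t=pageview' +
  '&dl=https%3A%2F%2Fhabrahabr.ru%2Ftop%2F&cid=2098823486.1505375017' +
  '&tid=UA-726094-1&cd1=habrauser'
);
console.log(exampleHit.t, exampleHit.dl, exampleHit.cd1);
```

Each key of the resulting object is a Measurement Protocol parameter name, so the object maps directly onto columns in whatever storage you choose.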

Each request also carries the IP address, the referrer and information about the user agent.

Any other interactions, such as events and transactions, are sent through the same tracker. That is, there is a single tracker, and information about all interactions (the standard ones plus those you set up yourself) goes through it.

Collect raw data


We have figured out how GA sends its data. It would be great to duplicate this data and save it to your own storage.

Write a parser that collects every parameter Google Analytics collects, hook into every event... No, let's not reinvent the wheel!

Before sending the information, the GA script runs a series of tasks. Sending the information to the server is just one of them. Fortunately for us, these tasks can be modified so that the data is sent not only to Google but also to an arbitrary URL.
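
The idea of wrapping a task can be tried in isolation first. The sketch below uses a stand-in tracker object (createFakeTracker and duplicateHits are hypothetical names, not part of analytics.js) that only mimics the get and set methods of the real tracker, so it can run outside a browser.

```javascript
// Stand-in for the analytics.js tracker (hypothetical, for illustration only).
function createFakeTracker(tasks) {
  return {
    get: (name) => tasks[name],
    set: (name, fn) => { tasks[name] = fn; },
  };
}

// Wrap the existing sendHitTask so that a second function also sees every hit.
function duplicateHits(tracker, forward) {
  const originalSendHitTask = tracker.get('sendHitTask');
  tracker.set('sendHitTask', function (model) {
    originalSendHitTask(model);        // keep sending to Google as before
    forward(model.get('hitPayload'));  // and pass the same payload elsewhere
  });
}

const sentToGoogle = [];
const sentToUs = [];
const fakeTracker = createFakeTracker({
  sendHitTask: (model) => sentToGoogle.push(model.get('hitPayload')),
});
duplicateHits(fakeTracker, (payload) => sentToUs.push(payload));

// Simulate one hit passing through the task.
const fakeModel = { get: (key) => (key === 'hitPayload' ? 'v=1&t=pageview' : undefined) };
fakeTracker.get('sendHitTask')(fakeModel);
console.log(sentToGoogle, sentToUs);
```

Calling the original task first means the hit still reaches Google even if forwarding the copy fails.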

How to do it


Choose the option that matches how your Google Analytics counter is installed:

Analytics.js
The standard analytics.js installation code is:
    <script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-XXXXXXXXX-X', 'auto');
    ga('send', 'pageview');
    </script>


Add a customTask to it, and the result is:
    <script>
    (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
    (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
    m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
    })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

    ga('create', 'UA-XXXXXXXXX-X', 'auto');

    // Set a customTask
    ga('set', 'customTask', function(tracker) {
      // Keep a reference to the original sendHitTask.
      var originalSendHitTask = tracker.get('sendHitTask');

      // Replace sendHitTask with our own function.
      tracker.set('sendHitTask', function(model) {
        // Send the hit to www.google-analytics.com/collect as usual.
        originalSendHitTask(model);

        // Send a copy of the hit to our own endpoint.
        var custom_tracking_url = 'YOUR_TRACKING_URL', // placeholder: URL of your receiving script
            hitPayLoad = '?' + model.get('hitPayload'),
            user_agent = '&user_agent=' + encodeURIComponent(navigator.userAgent),
            referrer = '&referrer=' + encodeURIComponent(document.referrer);
        var final_tracking_url = custom_tracking_url + hitPayLoad + user_agent + referrer;
        document.createElement("img").src = final_tracking_url;
      });
    });

    ga('send', 'pageview');
    </script>



Google Tag Manager
You need to create a Custom JavaScript variable named customTask:

    function () {
      return function(tracker) {
        // Keep a reference to the original sendHitTask.
        var originalSendHitTask = tracker.get('sendHitTask');

        // Replace sendHitTask with our own function.
        tracker.set('sendHitTask', function(model) {
          // Send the hit to www.google-analytics.com/collect as usual.
          originalSendHitTask(model);

          // Send a copy of the hit to our own endpoint.
          var custom_tracking_url = 'YOUR_TRACKING_URL', // placeholder: URL of your receiving script
              hitPayLoad = '?' + model.get('hitPayload'),
              user_agent = '&user_agent=' + encodeURIComponent(navigator.userAgent),
              referrer = '&referrer=' + encodeURIComponent(document.referrer);
          var final_tracking_url = custom_tracking_url + hitPayLoad + user_agent + referrer;
          document.createElement("img").src = final_tracking_url;
        });
      }
    }


Now add a customTask field with the value {{customTask}} to your Universal Analytics tag (under More Settings > Fields to Set).




As a result, we have added a new task to the Google Analytics tracker, and with each interaction the information is sent not only to Google Analytics but also to your own endpoint.

Storage Configuration


For simplicity, I'll use Google Sheets as the storage. Of course, this is not an option at all for large amounts of data. But we are only getting acquainted with the technology here, so it will do as an example.

Create a spreadsheet and name its columns. The first column holds the timestamp; the other column names must match the names of the query string parameters sent by the Google Analytics tracker (for example t, dl, cid).


Open the script editor.



Add a script that parses the query string of each GET request and appends the values to the table:

    // Entry point for GET requests to the web app.
    function doGet(e) {
      record_data(e);
      // Return an empty response so the web app has something to send back.
      return ContentService.createTextOutput('');
    }

    var SCRIPT_PROP = PropertiesService.getScriptProperties();

    // Run once manually: stores the spreadsheet ID in the script properties.
    function setup() {
      var doc = SpreadsheetApp.getActiveSpreadsheet();
      SCRIPT_PROP.setProperty("key", doc.getId());
    }

    function record_data(e) {
      try {
        var doc = SpreadsheetApp.openById(SCRIPT_PROP.getProperty("key"));
        var sheet = doc.getSheetByName('Sheet1'); // select the responses sheet
        var headers = sheet.getRange(1, 1, 1, sheet.getLastColumn()).getValues()[0];
        var nextRow = sheet.getLastRow() + 1; // get next row
        var row = [ new Date() ]; // first element in the row should always be a timestamp
        // loop through the header columns
        for (var i = 1; i < headers.length; i++) { // start at 1 to avoid Timestamp column
          if (headers[i].length > 0) {
            // parameters missing from the request are written as empty cells
            row.push(e.parameter[headers[i]] || '');
          }
        }
        sheet.getRange(nextRow, 1, 1, row.length).setValues([row]);
      } catch (error) {
        Logger.log(error); // log the error itself, not the request
      }
    }
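
The way column names are matched to parameters can be checked outside Apps Script. Here is a minimal sketch of just the row-building step. The function name buildRow and the sample headers are assumptions, and the timestamp is passed in so that the result is predictable.

```javascript
// Build one spreadsheet row from the sheet headers and the request parameters.
function buildRow(headers, parameters, now) {
  const row = [now]; // the first column is always the timestamp
  for (let i = 1; i < headers.length; i++) { // start at 1 to skip the Timestamp column
    if (headers[i].length > 0) {
      const value = parameters[headers[i]];
      row.push(value === undefined ? '' : value); // unknown parameter -> empty cell
    }
  }
  return row;
}

// Assumed example: four tracked parameters, one of which is absent from the hit.
const sampleHeaders = ['Timestamp', 't', 'dl', 'cid', 'cd1'];
const sampleParameters = { t: 'pageview', dl: 'https://habrahabr.ru/', cid: '774042187.1492148509' };
console.log(buildRow(sampleHeaders, sampleParameters, new Date(0)));
```

A parameter that has no matching column is silently dropped, so add a column for every parameter you want to keep.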


Run the setup() function and grant the script the access it requests. Then publish the script as a web app.



In the “Who has access to the app” option, select “Anyone, even anonymous”.

As a result, you will receive a link to your web app.



Copy the link and paste it into the customTask script as the value of the custom_tracking_url variable.

Now, for every interaction you have set up, the data will go not only to GA but also to your storage.



See how it works in real time:
  1. Open the table.
  2. Open the test site.
  3. Browse the site and watch the table.


Not all data


Some data (for example, the IP address) arrives not in the GET parameters but in the request headers, so you can extract it on the side of the receiving script.
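
As a rough sketch of what that could look like, suppose the receiving script is a self-hosted Node.js endpoint rather than the spreadsheet script above. That setup is an assumption, as is the function name clientInfo. The request object is assumed to have the shape of a Node.js http.IncomingMessage, and the x-forwarded-for header is assumed to be set by a proxy in front of the endpoint.

```javascript
// Extract client details that travel with the HTTP request rather than in the query string.
// `req` is assumed to look like a Node.js http.IncomingMessage.
function clientInfo(req) {
  const forwarded = req.headers['x-forwarded-for'];
  // Behind a proxy the header holds "client, proxy1, proxy2"; take the first entry.
  const ip = forwarded
    ? forwarded.split(',')[0].trim()
    : req.socket.remoteAddress;
  return {
    ip: ip,
    userAgent: req.headers['user-agent'] || '',
    referrer: req.headers['referer'] || '',
  };
}

// Fake request object used only to demonstrate the function.
const fakeRequest = {
  headers: { 'x-forwarded-for': '203.0.113.7, 10.0.0.1', 'user-agent': 'ExampleBrowser/1.0' },
  socket: { remoteAddress: '10.0.0.1' },
};
console.log(clientInfo(fakeRequest));
```

Without a proxy the header is absent, and the function falls back to the address of the connection itself.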

You can also work with source / medium: extract them from the page URL and split them into separate columns.
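
For instance, if campaigns are tagged with UTM parameters, the page URL from the dl parameter can be split like this. The function name sourceMedium is an assumption. The fallback labels imitate the ones GA shows for direct traffic, and the sketch ignores the referrer for brevity.

```javascript
// Pull the UTM tags out of the page URL carried in the `dl` parameter.
function sourceMedium(pageUrl) {
  const params = new URL(pageUrl).searchParams;
  return {
    source: params.get('utm_source') || '(direct)', // assumed fallback label
    medium: params.get('utm_medium') || '(none)',   // assumed fallback label
  };
}

console.log(sourceMedium('https://example.com/?utm_source=newsletter&utm_medium=email'));
console.log(sourceMedium('https://example.com/pricing'));
```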

We will not dwell on this; I think the idea is clear.

What is all this for?


Source: https://habr.com/ru/post/353836/

