
Developing an analytics system

This post opens a series of articles on developing an analytics system for monitoring user actions. In this first article we talk about how to collect the necessary data from mobile applications for Android and iOS.

package Birdy::Stat::Stalin;
# [ASCII-art banner]


We are Surfingbird, a recommendation service. The better we understand the user, the more relevant the recommendations we generate.

Google Analytics, Flurry, AppsFlyer: you can smear yourself from head to toe with existing analytics systems. You can build a magnificent dashboard showing DAU, MAU, DNU, ARPU, K-factor and a dozen more indicators, but all of this will be only shadows on the wall of the cave. No analytics system will tell you WHY a user left the application or what exactly prompted the departure; it will only record the fact that the user left. You cannot even write him a farewell email :) So we decided that to answer this and similar questions we need to know everything about our users: in what sequence and at what intervals they pressed which buttons on which screens; how many seconds they spent on which article before turning around, spitting and leaving; what the reading histogram of an article looks like; how much time was spent on each pixel and in which variant of an A/B test. At some point, we realized that we needed Stalin.
First of all, we agreed on a data structure for transmitting the tracked events. This structure is the same for the web, for mobile and, looking ahead, for the back-end databases (yes, there are a lot of them).

Events consist of the following basic components:
  1. Action - user action, answers the question what he did
  2. Screen - screen, answers the question on which screen
  3. ContentType - content type, answers the question with what type of content was the interaction

And:



The measure is count, the number of events. It defaults to one, but it can be used to pre-aggregate events of the same type that do not need to be analyzed over time.

This is the basic set we start from.
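As a rough illustration of the pre-aggregation mentioned above, here is a minimal Java sketch. The `CountedEvent` and `PreAggregator` classes are hypothetical and simplified (only the three dimensions plus the count); they are not taken from the application code. Events that share all three dimensions are merged into one event whose count is the sum.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified event: the three dimensions plus the count measure.
final class CountedEvent {
    final String action;
    final String screen;
    final String contentType;
    final int count;

    CountedEvent(String action, String screen, String contentType, int count) {
        this.action = action;
        this.screen = screen;
        this.contentType = contentType;
        this.count = count;
    }
}

final class PreAggregator {
    // Merges events that share all three dimensions, summing their counts.
    // The order of first appearance is preserved.
    static List<CountedEvent> aggregate(List<CountedEvent> events) {
        Map<String, CountedEvent> merged = new LinkedHashMap<>();
        for (CountedEvent e : events) {
            String key = e.action + "|" + e.screen + "|" + e.contentType;
            CountedEvent prev = merged.get(key);
            int total = (prev == null ? 0 : prev.count) + e.count;
            merged.put(key, new CountedEvent(e.action, e.screen, e.contentType, total));
        }
        return new ArrayList<>(merged.values());
    }
}
```

With this sketch, two `page_seen` events on the same screen and content type collapse into a single event with count 2, at the cost of losing their individual timestamps.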

Each dimension, in turn, can be represented by a fixed set of values, for example:

  public enum Action {
      none,
      install,
      hit,
      clickon_surfbutton,
      clickon_volumebutton,

      // opening screens
      open_surf,
      open_feed,
      open_popular,
      open_dayDigest,
      open_profile,
      open_settings,
      open_comment,

      // registration
      registrationBegin_vk, //done
      registrationSignIn_vk, //done
      registrationSignUp_vk, //done
      registrationBegin_fb, //done
      registrationSignIn_fb, //done
      registrationSignUp_fb, //done
      registrationBegin_email, //done
      registrationComplete_email, //done

      // page events
      page_seen,
      page_click,
      page_open,
      page_read,

      // sharing
      share_fb, //done
      share_vk, //done
      share_sms, //done
      share_email, //done
      share_pocket, //done
      share_copyLink, //done
      share_saveImage, //done
      share_twitter, //done
      share_other, //done

      // reactions
      like, //done
      dislike, //done
      favorite, //done
      addToCollection, //done

      // push notifications
      openPush, //done
      deliveredPush, //done
      //and so on
  }


Note that the naming of dimension values also builds in the possibility of pre-aggregating similar values, to simplify further analysis in OLAP. That is, while staying flat at the data collection level, a dimension can be expanded into a two-level hierarchy at the cube level.
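To make the two-level idea concrete, here is a small Java sketch. It assumes the convention is simply "group, underscore, detail" (for example `share_fb` gives group `share` and detail `fb`); the `DimensionHierarchy` class is hypothetical, and the real expansion happens at the cube level rather than in application code.

```java
// Hypothetical helper: splits a flat dimension value into a two-level hierarchy.
final class DimensionHierarchy {
    // Returns {group, detail}. Values without an underscore get an empty detail.
    static String[] split(String value) {
        int idx = value.indexOf('_');
        if (idx < 0) {
            return new String[] { value, "" };
        }
        return new String[] { value.substring(0, idx), value.substring(idx + 1) };
    }
}
```

Grouping by the first element then aggregates all `share_*` actions together, while the full value is still available for drilling down.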

If you look at the data model, for example on Android, any event can be represented by the following class:

  public ClassEvent(Action action, Screen screen, ContentType contentType, String contentID,
                    String abTest1, String abTest2, String description, int count) {
      this.abTest1 = abTest1;
      this.abTest2 = abTest2;
      //and so on
      this.contentType = contentType;
      this.contentID = contentID;
      this.time = System.currentTimeMillis() / 1000;
      this.deviceID = SurfingbirdApplication.getInstance().getDeviceId();
      this.deviceType = "ANDROID";
      String loginToken = SurfingbirdApplication.getInstance().getSettings().getLoginToken();
      this.userToken = loginToken == null ? "" : loginToken;
      this.clientVersion = SurfingbirdApplication.getInstance().getAppVersion();
  }

  @Override
  public String toString() {
      JSONObject jsonObject = new JSONObject();
      try {
          jsonObject.put("clientVersion", clientVersion);
          jsonObject.put("action", action.toString());
          jsonObject.put("screen", screen);
          jsonObject.put("contentType", contentType);
          jsonObject.put("contentID", contentID);
          jsonObject.put("time", time);
          jsonObject.put("deviceID", deviceID);
          jsonObject.put("deviceType", deviceType);
          jsonObject.put("userToken", userToken);
          jsonObject.put("abTest1_id", abTest1);
          jsonObject.put("abTest1_value", abTest2);
          jsonObject.put("description", description);
          jsonObject.put("count", count);
      } catch (JSONException e) {
          AQUtility.debug("EVENTERROR", e.toString());
      }
      return jsonObject.toString();
  }


What does this look like in the application itself?

Any action on any screen is recorded as an event.
The easiest way to see this is to look at a fragment of my own session in tabular form.


I started a session, clicked on surf, loaded a page about some 5 text editors, read it for a few seconds, then went to the popular tab and started reading about why the iPhone is three times more expensive than Android. Yes, that was last night and, by the way, I still did not understand why :)

Here is roughly how the same data looks after processing in OLAP:
image

But that is beside the point. The next task to solve is integration with other analytics systems (incidentally, we found out who is lying and by roughly how much, but that is a separate story) and batching events into packs.

On Android, we batch events in packs of 50 and, when an event is generated, also send it to Google Analytics for cross-checking:

  public void newEvent(ClassEvent.Action action, ClassEvent.Screen screen,
                       ClassEvent.ContentType contentType, String contentId) {
      registerEvent(new ClassEvent(action, screen, contentType, contentId));
  }

  public void newEvent(ClassEvent.Action action, ClassEvent.Screen screen,
                       ClassEvent.ContentType contentType, String contentId,
                       String abTest1, String abTest2, String description, int count) {
      registerEvent(new ClassEvent(action, screen, contentType, contentId,
              abTest1, abTest2, description, count));
  }

  public void registerEvent(ClassEvent event) {
      // Mirror the event to Google Analytics for cross-checking
      Tracker t = getTracker(SurfingbirdApplication.TrackerName.GLOBAL_TRACKER);
      t.setScreenName(event.screen.toString());
      Map<String, String> hits = new HitBuilders.EventBuilder()
              .setCategory("event")
              .setAction(event.action.toString())
              .setLabel(event.action.toString())
              .build();
      t.send(hits);

      if (TextUtils.equals("", event.userToken) || TextUtils.equals("null", event.userToken)) {
          // No user token yet: send this single event immediately with basic authorization
          String eventsString = "[";
          eventsString += event.toString();
          eventsString += "]";
          events.clear();
          aq.ajax(UtilsApi.eventsCallBackBasic(this, "some_method", eventsString));
      } else {
          // Otherwise queue the event and send the pack once it is large enough
          events.add(event);
          if (events.size() > 50) {
              sendEvents();
          }
      }
  }

  public void sendEvents() {
      if (events.size() > 0) {
          String eventsString = "[";
          for (ClassEvent event : events) {
              if (!eventsString.equals("[")) eventsString += ",";
              eventsString += event.toString();
          }
          eventsString += "]";
          events.clear();
          aq.ajax(UtilsApi.eventsCallBack(this, "nop", eventsString));
      }
  }


A very small share of events is sent immediately, using basic authorization. All the rest are batched into packs and sent either as they accumulate or, forcibly, when the application exits.
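The "send as they accumulate or force on exit" policy can be sketched independently of the Android specifics. The `EventBatcher` class below is a hypothetical illustration, not the application's code: the transport is passed in as a `Consumer`, events are plain serialized strings, and the threshold of 50 used in the application becomes a constructor parameter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical batcher: queues serialized events and hands them to a sender in packs.
final class EventBatcher {
    private final int threshold;
    private final Consumer<List<String>> sender; // assumed transport, e.g. an HTTP call
    private final List<String> pending = new ArrayList<>();

    EventBatcher(int threshold, Consumer<List<String>> sender) {
        this.threshold = threshold;
        this.sender = sender;
    }

    // Queues one event; sends the pack once it holds more than `threshold` events.
    synchronized void add(String eventJson) {
        pending.add(eventJson);
        if (pending.size() > threshold) {
            flush();
        }
    }

    // Sends whatever has accumulated. Intended to be called when the app exits.
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> pack = new ArrayList<>(pending);
        pending.clear();
        sender.accept(pack);
    }
}
```

Copying the queue before clearing it means the sender receives a stable pack even if new events arrive while the request is in flight.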

This is what instrumenting events actually looks like on Android:

  SurfingbirdApplication.getInstance().newEvent(ClassEvent.Action.install,
          ClassEvent.Screen.none, ClassEvent.ContentType.none, "");
  SurfingbirdApplication.getInstance().newEvent(ClassEvent.Action.openPush,
          ClassEvent.Screen.page_parsed, ClassEvent.ContentType.siteShort, shortUrl);
  SurfingbirdApplication.getInstance().newEvent(ClassEvent.Action.registrationBegin_email,
          ClassEvent.Screen.start, ClassEvent.ContentType.none, "");


On iOS, we tried a slightly different logic:

Events also accumulate in a queue, and on any subsequent request the array of accumulated events is attached to it and rides along. If there are more than 50 events, we force a request with the system method nop. Likewise, if a tracked event needs to be sent as soon as possible, you can force a nop request.

  // In a subclass of AFHTTPRequestOperationManager
  - (void)POST:(NSString *)path
    parameters:(NSMutableDictionary *)parameters
       success:(void (^__strong)(AFHTTPRequestOperation *__strong, __strong id))success
       failure:(void (^__strong)(AFHTTPRequestOperation *__strong, NSError *__strong))failure
  {
      // Attach the accumulated events to the outgoing request and empty the queue
      SBEvents *events = [SBEventTracker sharedTracker].events;
      if (events.count > 0) {
          parameters[@"_events"] = [events jsonString];
          [[SBEventTracker sharedTracker] clearEvents];
      }
      [super POST:path parameters:parameters success:^(AFHTTPRequestOperation *operation, id json) {
          // ...
      } failure:^(AFHTTPRequestOperation *operation, NSError *error) {
          // ...
      }];
  }


On the back end, events arrive at a module written in Perl, which writes the records into a table. That is not its only function: it also checks the integrity of the data. If an event arrives from a client with a value unknown to Stalin, he puts it in a separate table that is processed later, once the inconsistency has been resolved (for example, after a new value has been added to the appropriate enum).
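For readers who would rather not read Perl, the routing rule described above can be sketched in Java. This is a simplified illustration under stated assumptions: the `EventValidator` class is hypothetical, only two dimensions are checked, and the value lists are abbreviated. A missing required field gives FAILURE, a value outside the known enum gives UNKNOWN (parked for later reprocessing), and everything else gives SUCCESS.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical validator mirroring the success / failure / unknown split.
final class EventValidator {
    enum Status { SUCCESS, FAILURE, UNKNOWN }

    // Abbreviated value lists, for illustration only.
    private static final Set<String> ACTIONS =
            new HashSet<>(Arrays.asList("none", "install", "hit", "like", "page_read"));
    private static final Set<String> SCREENS =
            new HashSet<>(Arrays.asList("none", "start", "surf", "feed", "popular"));

    static Status classify(Map<String, String> event) {
        String action = event.get("action");
        String screen = event.get("screen");
        if (action == null || screen == null) {
            return Status.FAILURE; // a required field is missing
        }
        if (!ACTIONS.contains(action) || !SCREENS.contains(screen)) {
            return Status.UNKNOWN; // value not in the enum yet: park it for later
        }
        return Status.SUCCESS;
    }
}
```

Keeping UNKNOWN separate from FAILURE is what allows a new client release to start sending a new action before the server-side enum has been updated, without losing those events.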

Warning: Perl code ahead. In unprepared readers it may cause bleeding from the eyes and premature pregnancy.
  package Birdy::Stat::Stalin;

  use feature qw(state switch);  # required for state and given/when

  use constant {
      SUCCESS           => 'success',
      FAILURE           => 'failure',
      UNKNOWN           => 'unknown',
      CONTENT_TYPE_NONE => 'none',
  };

  sub track_events {
      my $params = shift;
      return unless ref $params eq 'ARRAY';
      return unless @$params;

      my ($s_events, $f_events, $u_events) = ([], [], []);

      foreach (@$params) {
          my $event = __PACKAGE__->new($_);
          $event->parse;

          # sort the event into a bucket according to its status
          given ($event->status) {
              when (SUCCESS) { push @$s_events, $event; }
              when (FAILURE) { push @$f_events, $event; }
              when (UNKNOWN) { push @$u_events, $event; }
          }
      }

      __PACKAGE__->_track_success_events($s_events);
      __PACKAGE__->_track_failure_events('failure', $f_events);
      __PACKAGE__->_track_failure_events('unknown', $u_events);
  }

  state $enums = {
      'action' => [qw/
          install hit
          open_surf open_feed open_popular open_dayDigest
          open_profile open_settings open_comment
          registrationBegin_email registrationComplete_email
          page_seen page_click page_open
          share_fb share_vk share_sms share_email share_pocket
          share_copyLink share_saveImage share_twitter share_other
          like dislike favorite addToCollection
          openPush deliveredPush openDayDigestFromLocalPush
          error page_read none
      /],
      'screen' => [qw/
          none start similar surf feed popular dayDigest profile settings
          page_parsed page_image siteTag
          actionBar actionBar_profile actionBar_page actionBar_channel
          profile_channel profile_add profile_like profile_favorite profile_collection
      /],
      'deviceType'  => ['IPAD', 'IPHONE', 'ANDROID'],
      'contentType' => [CONTENT_TYPE_NONE, 'siteShort', 'userShort', 'siteTag'],
  };

  state $fields = [
      sort (keys %$enums,
            qw/time deviceID clientVersion userId userLogin contentID shortUrl count description/)
  ];

  sub parse {
      my ($self) = @_;
      my $event_param = {};

      {
          my $required = [keys %$enums];
          my $optional = [];

          # all enum fields must be present;
          # if not, the event is marked as a failure
          unless ( $self->_check_params($required) ) {
              $self->status(FAILURE);
              return;
          }

          # a value outside the known enums marks the event as unknown,
          # so it can be reprocessed after the enum has been extended
          unless ( $self->_check_enum_params($required) ) {
              $self->status(UNKNOWN);
              return;
          }

          my $params = $self->_parse_params([@$required, @$optional]);
          $event_param = { %$params, %$event_param };
      }

      {
          my $required = ['time', 'deviceID', 'clientVersion'];
          my $optional = ['userToken', 'count', 'description'];

          # contentID is optional when contentType eq 'none'
          push @{ $event_param->{'contentType'} eq CONTENT_TYPE_NONE ? $optional : $required },
              'contentID';

          # required fields must be present;
          # if not, the event is marked as a failure
          unless ( $self->_check_params($required) ) {
              $self->status(FAILURE);
              return;
          }

          my $params = $self->_parse_params([@$required, @$optional]);
          $event_param = {
              # optional fields that were not sent default to undef
              (map { $_ => undef } @$optional),
              %$params,
              %$event_param,
          };
      }

      $event_param->{'time'} = Birdy::TimeUtils::unix2date( $event_param->{'time'} );

      $self->status(SUCCESS);
      $self->params($event_param);
      return;
  }

  # returns a hashref with the requested keys; missing or false values are skipped
  sub _parse_params {
      my ($self, $params) = @_;
      $params = [] if ref $params ne 'ARRAY';

      my $result = {};
      foreach my $key (@$params) {
          my $value = $self->params->{$key};
          next unless $value;
          $result->{$key} = $value;
      }
      return $result;
  }


During implementation we discovered some strange things. For example, some events arrived from the distant future and some from the past. It is easy to guess that all of these users were happy owners of Android smartphones. But overall, everything worked out. The system steadily collects statistics, and we barely have time to make sense of them.
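The post does not say how events with such timestamps were handled. One possible way to at least detect them, shown here purely as an assumption, is to compare the client-reported time with the server clock and flag anything outside a tolerance window. The `TimestampCheck` class and the tolerance value are hypothetical.

```java
// Hypothetical check: flags client timestamps that are implausibly far from server time.
final class TimestampCheck {
    // Both times are Unix seconds, matching the `time` field sent by the clients.
    // Returns true when the client time is within `toleranceSeconds` of the server time.
    static boolean isPlausible(long clientTime, long serverTime, long toleranceSeconds) {
        return Math.abs(clientTime - serverTime) <= toleranceSeconds;
    }
}
```

Because events are batched and may be sent well after they happened, the tolerance would have to be generous; a day or more is probably a safer starting point than a few minutes.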

In the next articles, we plan to go into detail on how we analyze content consumption and how to build a DWH/OLAP system out of sticks and duct tape, and to tell you more about farewell emails and the funny results they lead to.

Source: https://habr.com/ru/post/250517/

