Predictive analytics on the SCP platform

This is the third post in a series intended to help participants of the “SAP Coder-2017” competition.

Every enterprise generates a significant amount of data in the course of its operations, whether that data is “big” or not. This data can often be used to gain new knowledge, which in turn can significantly affect business development strategy or tactics in specific areas of work. With the development of computing technology and the growth in the volume of accumulated data, numerical methods have advanced greatly, making it possible to extract useful information from arrays of “raw” data and use it in various business scenarios.

Along with its other built-in services, the SAP Cloud Platform offers a predictive analytics toolkit that lets you build models and use them in business applications created on the platform (and beyond). At the time of writing, the service includes the following tools:

Clustering - classic cluster analysis: segmenting a database of objects described by a large number of attributes;
Forecasts - building forecasts based on time series;
Key Influencers - finding the parameters that most affect the target variable;
Outliers - finding non-standard patterns in a data set (fraud detection, input errors, etc.);
Recommendation - building product recommendation models based on purchase history (receipts);
Scoring Equation - building and extracting an equation that computes the target value analytically, so that you can embed it in your own application;
What If - "what-if" analysis, which estimates the consequences of particular actions based on the object's behavior history.

The current list of methods and their descriptions is available in the service documentation.
One of the SAP Coder competition tasks involves a recommendation model, so here we explain how to build such a model with the SCP Predictive service. The first step is to prepare the data for “training” the model. For the Recommendation service, training means finding pairs of products that are sold together and building lists of recommendations for specific customers (for example, loyalty program participants).
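To make “pairs of products sold together” concrete, here is a minimal Python sketch that counts how often each pair of SKUs appears on the same receipt. It only illustrates the idea of co-occurrence; it is not the algorithm the Predictive service actually uses, and the baskets are made-up data.

```python
from collections import Counter
from itertools import combinations


def count_item_pairs(baskets):
    """Count how often each unordered pair of SKUs appears on the same receipt."""
    pairs = Counter()
    for basket in baskets:
        # Deduplicate and sort so that (a, b) and (b, a) count as one pair.
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs


# Made-up receipts, each a list of SKUs.
baskets = [["A", "B", "C"], ["A", "B"], ["C", "D"]]
print(count_item_pairs(baskets).most_common(1))  # [(('A', 'B'), 2)]
```

Pairs that occur often are natural candidates for “customers who bought X also bought Y” rules.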

Initial data

The initial data for building the model is simple: the store's receipts. They should contain the following parameters:

userID - loyalty program participant number (unique identifier of the buyer)
itemID - product code (SKU)
purchaseDate - transaction (receipt) date
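As an illustration, the three fields above could be written to a CSV file with Python's standard csv module. This is a sketch under assumptions: the column order (userID, itemID, purchaseDate), the semicolon separator and the timestamp format are our choices, not requirements of the service, and the user ID 1001 is hypothetical. SKUs are kept as strings so that leading zeros survive.

```python
import csv
import io
from datetime import datetime


def write_receipts_csv(rows, out, delimiter=";"):
    """Write (user_id, sku, purchase_datetime) tuples as delimiter-separated lines."""
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for user_id, sku, purchased_at in rows:
        writer.writerow([user_id, sku, purchased_at.strftime("%Y-%m-%d %H:%M:%S")])


buf = io.StringIO()
write_receipts_csv([(1001, "5000267097428", datetime(2017, 3, 1, 18, 30))], buf)
print(buf.getvalue(), end="")  # 1001;5000267097428;2017-03-01 18:30:00
```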

The easiest way to upload this data without access to the server's file system is the Import / Export function in HANA Studio (Eclipse). To do this:

Prepare the data in a CSV file.
Create a new HANA MDC database on SCP. In our case, it is called h1.
Create a schema in which to run our experiments. We created a schema named PROBA.
Create a table for the source data, here PROBA.SALES_DATA. The table needs several key fields; in our case they are RID (loyalty program participant number), RDATE, TRIME and RDATETIME (the date and time of the transaction in different formats; only RDATETIME matters) and SKU (product article number).

Additionally, our database contains a PRODUCTS table with two fields: the product code and the product name.
Export the SALES_DATA table to a local disk using the Export function. Navigate down the exported directory structure to the files that describe the export.
Put the data in CSV format into the data.csv file.
In the data.ctl file, change the CSV field separator to the one your file uses.
In the data.info file, update the size of the data.csv file and the number of lines in it.
Load the data using the Import function, replacing the existing object in the database.
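The data.info step needs two numbers: the size of data.csv and its line count. A small Python sketch can compute both. The article does not show the data.info layout, so the sketch stops at the numbers; whether a header line should be counted is also left to you. The temporary file stands in for your real data.csv.

```python
import os
import tempfile


def csv_stats(path):
    """Return (size_in_bytes, line_count) for a file such as data.csv."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        lines = sum(1 for _ in f)
    return size, lines


# Demo on a throwaway file with two made-up receipt lines.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"1001;A;2017-03-01 18:30:00\n1002;B;2017-03-02 10:00:00\n")
    path = tmp.name
stats = csv_stats(path)
os.remove(path)
print(stats)  # (54, 2)
```

Reading in binary mode makes the byte count and the line count refer to the same raw file contents, independent of the platform's newline handling.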

Service setup

Before building a model, you need to configure the Predictive service. The service is disabled by default, so the first step is to enable it.

The system asks whether to install updates; answer yes. Then deploy the service to your account by entering the login and password you use to sign in to SCP.

After deploying the service, click on the Java Dashboard link.

Assign both roles, C4PA-User and C4PA-Admin, to our development user.

The next step is to link the SCP Predictive service to our database.

To do this, it is advisable to create a technical user in the database; in our case, PROBA_U.

Grant this user the privileges required to run the predictive service. When a user is created, HANA requires the initial password to be changed at first login, and since the Predictive service will log in to the database as this user automatically, you need to log in once as this user yourself. To do this, create a new connection to the cloud data source in HANA Studio and log in, changing the initial password.

After creating a technical user (or deciding to use an existing one), we bind the service to a specific database schema.

Here we enter the technical user's credentials and leave the Data Source at its default value.

Once the service has been deployed to the developer's account and the database has been linked, restart the service by pressing Stop and then Start.

After the restart, a link to the Java application appears; it lets you manage and monitor the service and use it in application development.

Following the link opens two panels: one for development and one for monitoring the service.

Clicking the Administration panel displays a lot of monitoring information about the Predictive service, which can be useful for analysis, but for our purposes the main panel is “Predictive Services API Documentation”.

Building and using a recommendation model

First, let's look at our receipts in a more human-readable form. To do this, create the following view:

-- Read-only view: receipt lines joined with product names
CREATE VIEW "PROBA"."SALESWPROD" (
  "RID",
  "USER_ID",
  "RDATE",
  "RTIMESTAMP",
  "ITEMS",
  "SKU_ID",
  "SKU_NAME"
) AS SELECT
  T0."RID",
  T0."USER_ID",
  T0."RDATE",
  TO_TIMESTAMP(T0."RDATETIME"),
  T0."ITEMS",
  T0."SKU",
  T1."SKU_NAME"
FROM "PROBA"."SALES_DATA" T0
INNER JOIN "PROBA"."PRODUCTS" T1 ON T0."SKU" = T1."SKU_ID"
WITH READ ONLY

We do not need this view to solve our problem, but it lets us see receipts together with their line items (SKUs).
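If you pull rows from a view like SALESWPROD into an application, you usually want to regroup them into receipts. Here is a minimal Python sketch. It assumes that one receipt corresponds to one user at one timestamp; the view itself does not enforce this, and the rows are made-up data.

```python
from collections import defaultdict


def group_receipts(rows):
    """Group (user_id, timestamp, sku_id, sku_name) rows into receipts.

    Assumption: a receipt is identified by (user_id, timestamp).
    """
    receipts = defaultdict(list)
    for user_id, ts, sku_id, sku_name in rows:
        receipts[(user_id, ts)].append((sku_id, sku_name))
    return dict(receipts)


rows = [
    (1001, "2017-03-01 18:30:00", "A", "Product A"),
    (1001, "2017-03-01 18:30:00", "B", "Product B"),
    (1002, "2017-03-02 10:00:00", "A", "Product A"),
]
receipts = group_receipts(rows)
for (user_id, ts), items in receipts.items():
    print(user_id, ts, [sku for sku, _ in items])
```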

Open Predictive Services API Documentation. This page of the Predictive Service application lists all the mathematical methods the service includes, along with the access points (endpoints) for each of them.

Start by creating a data source for the predictive model: click the POST tab of the /api/analytics/dataset endpoint and change the hanaURL parameter in the JSON template (all parameters are passed to the Predictive service as JSON). Press POST and wait for a server response with status 200. The server also returns JSON describing the connected source (number of rows, number and types of fields, etc.) and, most importantly, the dataset ID. Remember this ID: we will need it later to create the recommendation model. Close the form with the close button in the upper right corner.
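The same call can be prepared outside the documentation page. The Python sketch below only builds the request and parses a response; it does not send anything. Several points are assumptions: the base URL is hypothetical, hanaURL is assumed to take the form "SCHEMA/TABLE" (check the JSON template on the documentation page), the dataset ID is assumed to come back under the key "ID", and authentication against SCP is omitted.

```python
import json
import urllib.request

# Hypothetical base URL of the deployed Predictive service Java application.
BASE_URL = "https://example.invalid/c4pa"


def dataset_request(schema, table, base_url=BASE_URL):
    """Build (but do not send) a POST request registering SCHEMA.TABLE as a dataset."""
    # Assumption: hanaURL has the form "SCHEMA/TABLE".
    body = json.dumps({"hanaURL": f"{schema}/{table}"}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/analytics/dataset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def dataset_id_from_response(response_text, key="ID"):
    """Extract the dataset ID; the key name "ID" is an assumption."""
    return json.loads(response_text)[key]


req = dataset_request("PROBA", "SALES_DATA")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# Made-up response, for illustration only.
print(dataset_id_from_response('{"ID": 7, "numberOfRows": 100}'))
```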

Return to the main page of the Predictive Service and move on to creating the model. Click the POST tab of the /api/analytics/recommendations/recommender endpoint. The recommendation model setup page opens. The parameters for the model are specified in JSON format; all available parameters are described in the documentation at https://help.hana.ondemand.com/c4pa/frameset.htm?ee805144d197482abef88bfad8d895da.html.

In the JSON, we set the following parameters:

userColumn - field with the loyalty program participant number
itemColumn - SKU
dateColumn - transaction date
startDate - start date of the data used for the calculation
endDate - end date of the data used for the calculation
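Here is a sketch of the request body with these parameters, written in Python. The parameter names come from the list above. Three things are assumptions: the key "datasetID" for the dataset ID, the YYYY-MM-DD date format, and the dataset ID 7, which is a made-up value. The column defaults follow the PROBA.SALES_DATA description. Model-specific mathematical parameters are left out.

```python
import json
from datetime import date


def recommender_request_body(dataset_id, start, end,
                             user_col="RID", item_col="SKU", date_col="RDATETIME"):
    """Build the JSON body for POST /api/analytics/recommendations/recommender."""
    if start > end:
        raise ValueError("startDate must not be after endDate")
    return json.dumps({
        "datasetID": dataset_id,         # assumed key name for the dataset ID
        "userColumn": user_col,          # loyalty program participant number
        "itemColumn": item_col,          # SKU
        "dateColumn": date_col,          # transaction date
        "startDate": start.isoformat(),  # assumed date format: YYYY-MM-DD
        "endDate": end.isoformat(),
    })


body = recommender_request_body(7, date(2017, 1, 1), date(2017, 3, 31))
print(body)
```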

Additionally, you can change the parameters that control the model's mathematics. For our model, we use the following parameters:

Start building the model and wait for the response. The key thing is to note down the model ID it returns.

The /api/analytics/recommendations/recommender/{jobID} endpoint shows the status of the model being built; in our case the ID is 15. For our model, the status is as follows:
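Model building runs as a job, so an application would typically poll this endpoint until the job finishes. The Python sketch below shows that pattern. The HTTP call is replaced by an injected function, the base URL is hypothetical, and the status strings ("NEW", "PROCESSING", "SUCCESSFUL", "FAILED") are assumptions; compare them with what the endpoint actually returns.

```python
import time


def recommender_status_url(base_url, job_id):
    """URL of the status endpoint for a recommender job."""
    return f"{base_url.rstrip('/')}/api/analytics/recommendations/recommender/{job_id}"


def wait_for_job(fetch_status, job_id, done_states=("SUCCESSFUL", "FAILED"),
                 interval_s=5.0, max_polls=60):
    """Call fetch_status(job_id) until it returns one of done_states."""
    for _ in range(max_polls):
        status = fetch_status(job_id)
        if status in done_states:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")


# Fake fetcher standing in for a real HTTP GET against the status URL.
responses = iter(["PROCESSING", "PROCESSING", "SUCCESSFUL"])
print(recommender_status_url("https://example.invalid/c4pa", 15))
print(wait_for_job(lambda job_id: next(responses), 15, interval_s=0))
```

Injecting fetch_status keeps the polling logic separate from authentication and HTTP details, which depend on how your application talks to SCP.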

We see that 2777 SKUs occur in the same basket as other SKUs, and 9633 recommendation rules were derived from them. The /api/analytics/recommendations endpoint lets you test the resulting model. Enter the following parameters:

itemList - SKUs already in the cart
maxItems - the maximum number of recommendations returned
recommenderID - ID of the model built in the previous step
userID - loyalty program member number

You can specify both itemList and userID, or only one of them. If you specify only one, the system issues a warning but lets you continue.
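The rule that at least one of itemList and userID must be present can be enforced on the client side. Here is a minimal Python sketch of the request body. The parameter names come from the list above. The default of 5 for maxItems, the list-of-strings form of itemList and the example user ID 1001 are assumptions; 15 is the model ID from this article.

```python
import json


def recommendation_request_body(recommender_id, item_list=None, user_id=None, max_items=5):
    """Build the JSON body for POST /api/analytics/recommendations.

    At least one of item_list / user_id is required; giving only one is
    allowed (the service then issues a warning).
    """
    if not item_list and user_id is None:
        raise ValueError("specify item_list, user_id, or both")
    body = {"recommenderID": recommender_id, "maxItems": max_items}
    if item_list:
        body["itemList"] = list(item_list)
    if user_id is not None:
        body["userID"] = user_id
    return json.dumps(body)


print(recommendation_request_body(15, item_list=["5000267097428"]))
print(recommendation_request_body(15, item_list=["5000267097428"], user_id=1001))
```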

Let's check the model with SKU 5000267097428:

In response we get:

Let's see what these products are:

So when a customer buys whiskey, it makes sense to recommend dry wine as well.
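Checking “what it is” amounts to looking up the returned SKUs in the PRODUCTS table. If the lookup happens in application code instead of SQL, a dictionary built from PRODUCTS (SKU_ID to SKU_NAME) is enough. In the Python sketch below, the SKUs and names are made-up placeholders, not the article's actual results.

```python
def name_recommendations(recommended_skus, products):
    """Pair each recommended SKU with its name from a PRODUCTS-style mapping."""
    return [(sku, products.get(sku, "<unknown SKU>")) for sku in recommended_skus]


# Made-up PRODUCTS rows (SKU_ID -> SKU_NAME), for illustration only.
products = {"SKU_X": "dry wine (made-up entry)", "SKU_Y": "cheese (made-up entry)"}
print(name_recommendations(["SKU_X", "SKU_Z"], products))
# [('SKU_X', 'dry wine (made-up entry)'), ('SKU_Z', '<unknown SKU>')]
```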

A recommendation model can also be run in batch mode to generate a table of recommendations for all loyalty program users. To do this, click the POST tab of the /api/analytics/recommendations/batch endpoint.

Then specify the table in which to store the recommendations.

Then run the calculation. The service creates the table and, for each user, computes the product that user is most likely to buy.
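The article does not show the layout of the batch output table. If you post-process candidate recommendations in an application, selecting the single most likely product per user looks like the Python sketch below. The rows are assumed to be (user, SKU, score) tuples, where the score column is hypothetical, and the data is made up.

```python
def top_item_per_user(rows):
    """For each user, keep the (sku, score) pair with the highest score."""
    best = {}
    for user_id, sku, score in rows:
        if user_id not in best or score > best[user_id][1]:
            best[user_id] = (sku, score)
    return best


rows = [(1001, "A", 0.4), (1001, "B", 0.7), (1002, "A", 0.9)]
print(top_item_per_user(rows))  # {1001: ('B', 0.7), 1002: ('A', 0.9)}
```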

In short, the Predictive service lets you quickly set up and use some of the most common mathematical methods for building predictive models that are suitable for real business use.

Source: https://habr.com/ru/post/327086/
