This is the third part of an experiment in building a system for detecting fraudulent payments (an antifraud system). The goal is an antifraud service that is affordable to develop and own, and that helps the participants in online payments (merchants, aggregators, payment systems, banks) reduce the risk of fraudulent payments (fraud) through their sites.
In the previous part, we focused on the functional and non-functional requirements for the antifraud service. In this part we look at the software architecture of the service, its modular structure, and the key details of its implementation.
Infrastructure
The service consists of several applications running in Microsoft Azure. Hosting on a cloud platform instead of on-premises not only makes it possible to build, in a short time, a service that meets all the requirements listed in the second part under “Non-functional requirements -> Quality attributes”, but also significantly reduces the initial cost of hardware and software.
The antifraud service consists of the following systems:
- Antifraud API Service is a REST service that provides an API for interacting with the Fraud Predictor ML service.
- Fraud Predictor ML is a fraudulent payment detection service based on machine learning algorithms.
- Transactions Log (the transaction log) is a NoSQL store of transaction information.
In addition, the service has numerous software clients (Clients): merchant web applications or JS widgets that call the REST endpoints of the Antifraud API Service.
The interaction of these systems is shown in the schematic diagram above.
Architectural patterns used
Infrastructure, like the subject area and legislation, can impose a large number of restrictions that must be taken into account at the architectural level. We discussed the domain and legal restrictions in the previous parts of the article; below we discuss the advantages and limitations that come with choosing the Microsoft Azure cloud platform.
The Azure services used by the antifraud system (Cloud Services for web and worker roles, Azure Table, Azure Queue, Azure ML, and others) provide, in addition to almost zero initial infrastructure costs, the following advantages out of the box:
- high availability: an SLA of no less than 99.95%;
- storage reliability: storage systems with high redundancy;
- storage security: ISO 27001/27002 certification and others, including PCI DSS 3.0;
- fault tolerance: all worker nodes can (and are recommended to) run in multiple instances;
- scalability: the number of worker nodes can be scaled automatically depending on the load, and NoSQL storage tables are partitioned by PartitionKey.
As a bonus, I would add:
- convenient monitoring of the application;
- deep integration with Visual Studio.
But these advantages became usable only because the antifraud service architecture was tailored to the cloud, as follows:
- web and worker roles are stateless;
- structured and semi-structured data is stored with horizontal partitioning (Sharding Pattern [1]);
- network interactions are asynchronous only and always use retry policies (Retry Pattern [1]);
- message queues are used to level the load and guarantee that tasks are processed (Queue-Based Load Leveling Pattern [1]).
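As an illustration of the Retry Pattern from the list above, here is a minimal sketch of a retry policy with exponential backoff. It does not use the Azure SDK: `TransientError`, `call_with_retry` and the default delays are hypothetical names and values chosen for the example, and the article does not say which retry policy the service actually uses.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def call_with_retry(operation, max_attempts=4, base_delay=0.2, sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            # The delay doubles on every attempt; random jitter keeps many
            # clients from retrying at exactly the same moment.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable only so that the policy can be exercised without real waiting; non-transient errors are deliberately not caught and propagate immediately.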
In addition, the antifraud service is a near real-time system, so when implementing it:
- we use data-parallel algorithms (the simplest and one of the most efficient is MapReduce);
- we use the Push'n'Forget approach for operations such as saving a single record to the transaction log (losing one record out of 10K successfully saved ones will not noticeably affect the accuracy of the machine learning algorithm);
- we avoid locks on the transaction log (and on any shared resource), which is achieved by adding a timestamp field to the transaction information;
- we "kill" long-running requests (or at least handle them in some way).
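To make the data-parallel idea concrete, here is a small MapReduce-style sketch that counts payments per card. It is only an illustration: the `card_hash` field, the chunk size and the use of a thread pool as a stand-in for worker nodes are assumptions of this example, not details taken from the service.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(transactions):
    """Map step: count payments per card hash inside one chunk."""
    return Counter(t["card_hash"] for t in transactions)


def payments_per_card(transactions, chunk_size=2, workers=4):
    """Split the data into chunks, map them in parallel, then merge the partial counts."""
    chunks = [
        transactions[i:i + chunk_size]
        for i in range(0, len(transactions), chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    # Reduce step: Counter addition merges the partial counts key by key.
    return reduce(lambda left, right: left + right, partial_counts, Counter())
```

Because each chunk is processed independently and the partial results are merged with an associative operation, the same structure carries over to real worker nodes that each handle one partition of the data.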
Keep in mind as well that all cloud services have limitations:
- technical ones: the most common are the maximum number of requests per second and the maximum message size;
- technological ones: the most serious concern which protocols are supported for interacting with PaaS services.
Interaction between service components
For a merchant, the service is a REST service reachable over HTTPS: the Antifraud API Service. The Antifraud API Service runs in a cluster of several stateless web roles (in Azure, a web role is the application tier that acts as the web application).
The following sequence diagram describes the merchant's possible interactions with all subsystems of the antifraud service.
- Step 1. Submitting a request with payment information.
- Step 2. Transformation of the Model (in terms of MVC).
- Step 3. Sending a request to the service to predict the result of the payment.
- Step 4. Returning the result - whether the payment will be successful.
- Step 5. Saving data.
- Step 6. Return the result to the client.
- Steps 7-8. Recalculating and updating the training set, retraining the model.
- Steps 9-12 (optional). The client sends a request with information about the actual result of the payment (when the predicted result differs from the actual result of the payment reported in the request).
Let us consider each step in more detail.
The request from the merchant arrives at the controller (in MVC terms) (step 1). The resulting model (in MVC terms) then goes through:
- transformation from the controller model into a domain object;
- a request to external geolocation services (Azure Marketplace) to determine the country from the payer's postal code and the country from the IP address of the host that sent the request to charge the card;
- the global filtering phase;
- the phase that checks the validity of the billing data;
- preliminary analysis of the received transaction: we compute heuristics for timeframes of 5 seconds, 1 minute and 24 hours;
- concealment of the buyer's personal data and payment data: the cardholder name, the name of the account owner on the merchant's website, the payer's address, phone number and email are hashed;
- removal of unnecessary data: for example, the card's expiry date is no longer needed after step 4.
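The concealment and cleanup steps from the list above can be sketched roughly as follows. The field names, the keyed HMAC-SHA256 hash and the normalization (trimming and lowercasing) are assumptions made for this example; the article only says that these values are hashed and that data such as the card's expiry date is dropped once it is no longer needed.

```python
import hashlib
import hmac

# Hypothetical field names for the personal data listed above.
SENSITIVE_FIELDS = ("cardholder_name", "account_name", "payer_address", "phone", "email")
# Data that is not needed once the prediction has been made.
DROPPED_FIELDS = ("card_expiry",)


def conceal(transaction, secret_key):
    """Return a copy with personal fields replaced by keyed hashes and unneeded fields removed."""
    cleaned = {}
    for field, value in transaction.items():
        if field in DROPPED_FIELDS:
            continue
        if field in SENSITIVE_FIELDS and value is not None:
            # Normalizing first makes " John Doe " and "john doe" hash to the same value.
            normalized = str(value).strip().lower().encode("utf-8")
            cleaned[field] = hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
        else:
            cleaned[field] = value
    return cleaned
```

Equal inputs still produce equal hashes, so statistics such as "how many payments came from this cardholder" can be computed on the concealed data without storing the original values.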
Heuristics, global filters, and validity of payment information were discussed in detail in the previous part of the article.
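As a reminder of what a timeframe heuristic looks like, here is a minimal sketch that counts earlier payments from the same card in the 5-second, 1-minute and 24-hour windows. The function name, the window labels and the use of Unix timestamps are assumptions of this example; the actual heuristics are the ones described in the previous part.

```python
# Window lengths in seconds for the three timeframes.
TIMEFRAMES = {"5s": 5, "1m": 60, "24h": 24 * 60 * 60}


def payments_in_timeframes(previous_timestamps, now):
    """Count earlier payments of one card that fall inside each window ending at `now`.

    Both `previous_timestamps` and `now` are Unix timestamps in seconds.
    """
    return {
        name: sum(1 for ts in previous_timestamps if now - length <= ts <= now)
        for name, length in TIMEFRAMES.items()
    }
```

For example, a card with payments 2 seconds, 30 seconds and 1 hour before the current one yields one payment in the 5-second window, two in the 1-minute window and three in the 24-hour window.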
In step 2, the domain object is transformed into a DTO, which is then handled as follows:
- it is sent to the Fraud Predictor ML service (step 3);
- after the response from Fraud Predictor ML arrives (step 4), information about the transaction and its predicted result is saved to the transaction log (step 5; more on the log below);
- the predicted result of the payment (fraudulent or not) is returned to the client (step 6).
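Steps 2 to 6 can be sketched as a single handler. Everything named here is hypothetical: the `PaymentDto` fields, the `predict` callable standing in for the Fraud Predictor ML call, and the list standing in for the transaction log. A dictionary plays the role of the domain object.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class PaymentDto:
    transaction_id: str
    merchant_id: str
    amount: float
    currency: str


def handle_payment(domain_payment, predict, transaction_log):
    """Map the domain object to a DTO, get a prediction, log it, and build the response."""
    # Step 2: keep only the fields the predictor needs.
    dto = PaymentDto(**{f.name: domain_payment[f.name] for f in fields(PaymentDto)})
    # Steps 3-4: ask the predictor whether the payment looks fraudulent.
    predicted_fraud = predict(dto)
    # Step 5: save the transaction together with the prediction.
    transaction_log.append({**asdict(dto), "predicted_fraud": predicted_fraud})
    # Step 6: return the prediction to the client.
    return {"transaction_id": dto.transaction_id, "predicted_fraud": predicted_fraud}
```

Passing the predictor and the log in as arguments keeps the handler itself stateless, which matches the requirement that web roles hold no state between requests.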
To improve the quality of the prediction algorithm, the client has access to an API for clarifying transaction results. If the actual result of a payment differs from the value returned by our antifraud service, the merchant can report this by sending a request that clarifies the transaction result (step 9). Such requests:
- have the format <transaction_id, transaction_result, last_update_time>;
- are processed by the Merchant API Service and, after validation, placed in an Azure Queue (a fault-tolerant queue service).
Requests are taken from the queue by one of the workers, each of which is a stateless worker role (in Azure, a worker role is the application tier that acts as a background processor).
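The clarification flow can be sketched with the standard library queue standing in for Azure Queue and a dictionary standing in for the transaction log. The `Clarification` class mirrors the <transaction_id, transaction_result, last_update_time> format; the result values, the record layout and the rule that a newer timestamp wins are assumptions of this example.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class Clarification:
    transaction_id: str
    transaction_result: str  # hypothetical values, e.g. "fraud" or "ok"
    last_update_time: float  # Unix timestamp set by the sender


def process_clarifications(messages, transaction_log):
    """Drain the queue and apply each message only if it is newer than the stored record."""
    applied = 0
    while True:
        try:
            message = messages.get_nowait()
        except queue.Empty:
            return applied
        record = transaction_log.get(message.transaction_id)
        # Unknown transactions and stale updates are skipped; the timestamp
        # decides which update wins, so messages may arrive in any order.
        if record is not None and message.last_update_time > record["last_update_time"]:
            record["result"] = message.transaction_result
            record["last_update_time"] = message.last_update_time
            applied += 1
```

Because the worker keeps no state of its own, any number of identical workers could drain the same queue; in a real store the timestamp comparison would have to be done as part of the write itself.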
Transaction Store
Both the transaction information and the additional data derived from it (mainly statistics) are stored in the transaction log, a long-term store based on Azure Table (a fault-tolerant key-value NoSQL storage service).
The transaction log consists of two tables:
- TransactionsInfo, a table of facts about transactions: transaction id (Row Key), merchant id, hash of the cardholder name (if available), amount and currency of the payment, etc.;
- Statistics, a table of calculated statistical metrics: how many times this card was used to pay (over several timeframes), from how many IP addresses, the time interval between payments, how long ago the buyer registered with the merchant, how many successful payments the buyer has made, etc.
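To show how a key-value table addressed by partition and row keys behaves, here is an in-memory stand-in (it is not the Azure SDK). The article states only that the transaction id is the Row Key; using the merchant id as the partition key, and the property names below, are assumptions of this example.

```python
class InMemoryTable:
    """Stand-in for a key-value table addressed by (PartitionKey, RowKey)."""

    def __init__(self):
        self._partitions = {}

    def upsert(self, entity):
        """Insert the entity or overwrite the one with the same keys."""
        partition = self._partitions.setdefault(entity["PartitionKey"], {})
        partition[entity["RowKey"]] = entity

    def get(self, partition_key, row_key):
        """Point lookup; returns None when the entity does not exist."""
        return self._partitions.get(partition_key, {}).get(row_key)

    def query_partition(self, partition_key):
        """Return every entity stored in one partition."""
        return list(self._partitions.get(partition_key, {}).values())


def transaction_entity(merchant_id, transaction_id, amount, currency, cardholder_hash=None):
    """Build a TransactionsInfo-like entity (hypothetical property names)."""
    return {
        "PartitionKey": merchant_id,   # assumption: one partition per merchant
        "RowKey": transaction_id,      # the transaction id, as in the table above
        "Amount": amount,
        "Currency": currency,
        "CardholderNameHash": cardholder_hash,
    }
```

With this layout a lookup by both keys touches a single partition, while the choice of partition key determines how evenly the load spreads across storage nodes.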
In steps 7-8, the model is retrained. The training set is taken from the transaction log, since the log contains the latest information on payments and their results. Retraining can be triggered on a schedule, when a fixed number of new entries has appeared in the transaction log, or when the share of incorrect predictions exceeds a certain threshold.
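The three retraining triggers can be expressed as a small decision function. The thresholds and names below are placeholders chosen for this sketch, and measuring the error rate against the number of new records is likewise an assumption; the article does not give concrete values.

```python
from dataclasses import dataclass


@dataclass
class RetrainPolicy:
    max_age_seconds: float = 24 * 60 * 60  # schedule: retrain at least once a day
    max_new_records: int = 10_000          # retrain after this many new log entries
    max_error_rate: float = 0.05           # retrain when too many predictions were wrong


def should_retrain(policy, seconds_since_training, new_records, wrong_predictions):
    """Return the name of the trigger that fired, or None if no retraining is needed."""
    if seconds_since_training >= policy.max_age_seconds:
        return "schedule"
    if new_records >= policy.max_new_records:
        return "new_records"
    if new_records > 0 and wrong_predictions / new_records > policy.max_error_rate:
        return "error_rate"
    return None
```

Returning the trigger name rather than a bare boolean makes it easy to log why a particular retraining run started.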
We will cover the details of training the fraud detection model in the next, final part.
Conclusion of the 3rd part
In this part, we discussed the architecture of the antifraud service, identified its functional parts (Antifraud API Service, Fraud Predictor ML, Transactions Log), and defined their areas of responsibility and the ways they interact.
With the right approach to architecture, deploying the antifraud service in the Microsoft Azure cloud significantly reduces the initial infrastructure costs, as well as the time spent on system scalability, reliable data storage and high availability of services.
In the next, final part, we will continue building an antifraud service that is much cheaper to develop and own than its counterparts: we will develop the Fraud Predictor ML service, which is based on the Azure Machine Learning service and is the analytical core of the antifraud service.
Useful sources
[1] Cloud Design Patterns: Prescriptive Architecture Guidance for Cloud Applications, MSDN.