The first part of this series described why fraudulent payments (fraud) are an acute problem for all participants of the online payments market, what difficulties have to be overcome to build your own system for monitoring fraudulent payments (an antifraud system), and why for most merchants such systems are an expensive pleasure that they are not always willing to pay for.
Another factor that complicates the development of such systems is that an antifraud system is business-critical: its downtime stops the business process (payment acceptance), and its incorrect operation increases the risk of financial and reputational losses for the company (online store, bank).
Therefore, the practices and approaches listed in the article are applicable not only on the merchant side, but also on the side of other participants of Internet acquiring: aggregators, payment systems, banks. Moreover, in these organizations such approaches are often kept closed from the community rather than shared as best practices.
This part describes the requirements for an antifraud system that significantly influence its software architecture.
Non-functional requirements
Quality attributes
About the selection of quality attributes: I will not stretch out the description by explaining why I included these particular quality attributes and not others, since the explanation is obvious given the type of system being designed: business-critical.
In addition, I deliberately give no specific figures for the availability of the antifraud system or for other quality attributes, since the article does not aim to discuss one particular system. Instead, it describes a set of approaches and principles underlying such systems.
Quality attributes:
- distribution;
- resiliency;
- high scalability;
- reliability.
Legislative restrictions
Legislative limitations are one of the important factors determining the software architecture of the antifraud system.
Thus, according to the requirements of the PCI DSS standard, you cannot store the full card number (PAN)* or the security code (CVV). It is allowed to store the first six and last four digits of the card number. Also, nothing prevents generating an internal unique identifier for customer cards. The cardholder's name and the card's expiration date may be transmitted only via secure channels.
* About PAN storage: in fact, at a high level of PCI DSS certification (somewhere around level 80 :) it is allowed to store the PAN in encrypted form.
In addition to the requirements of the PCI DSS standard, it is necessary to comply with the provisions of the Law on Personal Data (152-FZ).
Discussing the whole variety of technical and bureaucratic procedures (with the ensuing legal subtleties) required merely to store and process a client's first and last name would probably take 10 pages of instructions and 1.5 months of work to implement them (just a joke). Therefore, the best way to avoid creating extra work for yourself in complying with 152-FZ is not to fall under its scope.
In the antifraud system being designed, all software modules will work with depersonalized data.
Summing up the legal restrictions, we add the following requirements to the system:
- do not store the card PAN or CVV in any form;
- store other payment data only in a protected form;
- transfer information between the merchant (software client) and the antifraud system only through secure communication channels;
- work only with depersonalized data.
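As an illustration of the first and last requirements, here is a minimal Python sketch of how a card number could be reduced to the permitted "first six, last four" form and to an internal card identifier before anything is stored. The function names and the choice of a keyed hash (HMAC-SHA256) for the identifier are assumptions made for this example, not something the standard or this article prescribes, and the secret key is assumed to be kept separately from the data.

```python
import hashlib
import hmac


def _digits(pan: str) -> str:
    """Strip spaces and dashes that users often type into card numbers."""
    return "".join(ch for ch in pan if ch in "0123456789")


def mask_pan(pan: str) -> str:
    """Keep only the first six and last four digits of a card number."""
    digits = _digits(pan)
    if len(digits) < 13:
        raise ValueError("card number is too short")
    return digits[:6] + "*" * (len(digits) - 10) + digits[-4:]


def card_token(pan: str, secret_key: bytes) -> str:
    """Derive a stable internal card identifier without keeping the PAN.

    Assumption: a keyed hash is an acceptable identifier here; the key
    must live outside the antifraud database.
    """
    return hmac.new(secret_key, _digits(pan).encode("ascii"), hashlib.sha256).hexdigest()
```

The same card always maps to the same identifier, so the heuristics described later can group payments by card without the system ever holding the full number.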
Functional requirements
API requirements
To begin, consider the system requirements from the point of view of the outside world, i.e. software clients (merchants). Software clients interact with the antifraud system in accordance with the following API requirements:
Functional:
- provide the client with an API for sending payment information;
- return to the client the prediction of whether the payment is fraudulent;
- provide the client with an API for correcting the payment result.
Non-functional:
- provide a public protocol for client interaction;
- interact with the client only via secure communication channels.
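To make the three functional requirements more concrete, the sketch below models the messages they imply as Python data classes. All type and field names here are hypothetical and chosen only for illustration; the article does not fix a message format at this point.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Verdict(Enum):
    LEGITIMATE = "legitimate"
    FRAUD = "fraud"


@dataclass(frozen=True)
class PaymentInfo:
    """What a client sends: payment data without PAN, CVV or personal data."""
    payment_id: str
    card_token: str    # internal card identifier (hypothetical field)
    amount_minor: int  # amount in minor currency units
    currency: str
    client_ip: str


@dataclass(frozen=True)
class Prediction:
    """What the service returns for a payment."""
    payment_id: str
    verdict: Verdict
    fraud_probability: float


@dataclass(frozen=True)
class Correction:
    """What a client sends later if the prediction turned out to be wrong."""
    payment_id: str
    actual_verdict: Verdict


def prediction_to_json(prediction: Prediction) -> str:
    """Serialize a prediction for the response body."""
    body = asdict(prediction)
    body["verdict"] = prediction.verdict.value
    return json.dumps(body)
```

The correction message is what allows the service to learn from its mistakes: it supplies the real outcome of a payment that was predicted earlier.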
Business requirements
From the point of view of the internal logic of the antifraud system, let us single out just one essential business requirement: predict from the payment data whether the transaction will be successful.
In implementing this requirement, we will try to prove that the payment will not go through. Consider the main reasons for refusing a transaction: the payment data is malformed or the transaction is fraudulent. Below we analyze the verification methods for each of these reasons.
Validating Billing Information
Do not count on the merchant to validate the billing details properly. Regardless of whether it was a user input error or a malicious action, identifying errors in the payment details at an early stage saves CPU cycles and keeps noise out of the learning model (discussed later).
It is necessary to check that the cardholder's name contains at least 2 letters (dashes and digits in the name are acceptable), that the card is valid (has not passed its expiration date), and that the card number passes the Luhn algorithm check.
Luhn algorithm: an algorithm for calculating the check digit of a plastic card number. It is designed to detect errors caused by inadvertent data corruption, and it allows judging the absence of errors in the card number only with some degree of confidence.
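The three checks above can be sketched in Python as follows. The function names are illustrative, and the rule that a card remains valid through the end of its expiration month is an assumption of this example.

```python
from datetime import date


def luhn_valid(card_number: str) -> bool:
    """Check a card number with the Luhn algorithm."""
    cleaned = card_number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or any(ch not in "0123456789" for ch in cleaned):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, ch in enumerate(reversed(cleaned)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


def name_valid(holder_name: str) -> bool:
    """The name must contain at least two letters; other characters are tolerated."""
    return sum(ch.isalpha() for ch in holder_name) >= 2


def not_expired(exp_month: int, exp_year: int, today: date) -> bool:
    """Assumption: the card is valid through the last day of its expiration month."""
    return (exp_year, exp_month) >= (today.year, today.month)
```

A payment that fails any of these checks can be rejected before the more expensive fraud checks are run.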
Check whether the transaction is fraudulent
There are a large number of heuristics for identifying signs that a payment is fraudulent. Some companies boast of nearly 200 heuristics. I immediately suspect, though, that some of these heuristics are either not supported by anything, or are a consequence of some other heuristic, or are a crutch that fits the result better to the training sample while giving no effect on real data. A large number of heuristics yields only an overfitted model, incorrect recognition of whether a transaction is fraudulent, and decreased application performance.
Therefore, I will list only the main and, in general, the most effective heuristics:
- one card - many IPs, and the opposite case: one IP - many cards;
- one card - many purchases / unsuccessful attempts;
- one client - many cards (especially ones issued by different banks);
- one client - many postal codes, emails;
- the client's name does not match the name of the account owner on the merchant's site (if any);
- the client's country does not match the country of the account owner on the merchant's site (if any);
- the payment is made at night (according to the client's local time).
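Most of the "one X - many Y" heuristics above boil down to counting distinct values per key over some period. A minimal Python sketch of such a counter is shown below; the class name is hypothetical, the window length is deliberately left as a parameter, and events are assumed to arrive in non-decreasing time order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class DistinctValueCounter:
    """Counts distinct values seen per key within a sliding time window."""

    def __init__(self, window: timedelta):
        self.window = window
        self._events = defaultdict(deque)  # key -> deque of (timestamp, value)

    def add(self, key: str, value: str, at: datetime) -> int:
        """Record an event and return the number of distinct values in the window."""
        events = self._events[key]
        events.append((at, value))
        # Drop events older than the window (relies on ordered timestamps).
        while events and at - events[0][0] > self.window:
            events.popleft()
        return len({value for _, value in events})
```

One instance keyed by card and fed with IP addresses covers "one card - many IPs"; another keyed by IP and fed with card identifiers covers the opposite case. The returned number is only a count, not a verdict.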
But how many is "many"? Over what time period (5 seconds or 2 weeks)? And how do we deal with the fact that the weight of filter x1 is not equal to the weight of filter x2, and that the values of their weights should change dynamically while the application is running?
Often, the main approach is to naively assign a fixed value to each of the filters and then process these conditions in constructs like the following (this is pseudo-code, not 1C):
if (distinct_ip_count > 4) {
    is_fraud = true;
    return;
} else {
    if (attempts_in_1_hour > 5) {
        is_fraud = true;
        return;
    } else {
        ...
I don't even want to begin listing the shortcomings of this approach and the final cost of such code, which is made up of losses from false positives that reject "decent" payments and from fraud that slips through after a small change in the fraudsters' strategy.
Therefore, the only correct solution is to develop a system in which the heuristic filters are capable of self-learning both on the accumulated payment history and on new payments. Here several machine learning algorithms are available to us at once: logistic regression, support vector machines, neural networks.
Global filters
By global filters I mean lists such that, if the payer is found in one of them, conducting all other checks (validity of billing data, fraud check) is meaningless. Among such lists I include blacklists of bank cards, IPs, countries, and merchants.
Global filters can be either static or dynamic, and can be associated both with business rules (the merchant does not accept payments from the Arctic) and with the detection of anomalous activity (an IP address).
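A global filter of this kind can be sketched in Python as a set of blacklists that is consulted before any other check. The class and field names are hypothetical; static entries would come from business rules, while dynamic entries would be added at run time when anomalous activity is detected.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class GlobalFilters:
    """Blacklists checked before billing validation and fraud prediction."""
    blocked_cards: Set[str] = field(default_factory=set)
    blocked_ips: Set[str] = field(default_factory=set)
    blocked_countries: Set[str] = field(default_factory=set)
    blocked_merchants: Set[str] = field(default_factory=set)

    def rejection_reason(self, card_token: str, ip: str,
                         country: str, merchant_id: str) -> Optional[str]:
        """Return the name of the list that matched, or None if none did."""
        if card_token in self.blocked_cards:
            return "card"
        if ip in self.blocked_ips:
            return "ip"
        if country in self.blocked_countries:
            return "country"
        if merchant_id in self.blocked_merchants:
            return "merchant"
        return None
```

For example, a static rule is expressed by creating the filter with a country already in `blocked_countries`, and a dynamic one by calling `blocked_ips.add(...)` once an address starts behaving anomalously.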
Conclusion of the 2nd part
In the first two parts, we examined the main aspects of a predominantly non-technical nature, which should be considered when designing and developing a system for recognizing fraudulent payments.
We are going to create a fault-tolerant, highly scalable, reliable antifraud service that on the "outside" is open to software clients via a REST API (HTTPS), and on the "inside" contains logic based on machine learning methods. To add even more intrigue, I will say that the service will run on one of the public cloud platforms.
In the next part, we will finally get down to business: we will consider the software architecture of an antifraud service, its modular structure, and the key implementation details of such a service.