⬆️ ⬇️

Market basket analysis and association rules

In the continuation of the topic of Data Mining let's talk about how it all began. It all began with a market basket analysis.



From the BaseGroup glossary :



Market basket analysis is the process of finding the most typical shopping patterns in supermarkets. It is produced by analyzing the transactional database in order to identify combinations of products that are interconnected. In other words, goods are being detected, whose presence in a transaction affects the likelihood of the appearance of other products or their combinations.

')

The results obtained by analyzing the market basket, allow you to optimize the range of products and stocks, placing them in the trading floors, increase sales by offering customers related products. For example, if as a result of the analysis it is established that the joint purchase of pasta and ketchup is a typical pattern, then placing these goods on the same showcase can be “provoked” by the buyer to jointly purchase them.







Association Rules



To solve the problem of analyzing the market basket, associative rules of the form “if ... then ...” are used. For example, "if a customer has bought a beer, then he will buy chips." Each purchase is referred to as a “transaction”, based on a larger set of such transactions and builds a study of customer behavior.



Associative rules are a very simple and convenient form of knowledge recording. Once again I want to clarify that information about transactions is the original data, but the association rules obtained are the knowledge that helped big supermarkets save big money in the 80s.



Some metrics are used to characterize the rule:



Rule X-> Y has support s (support), if s transactions from D, contain the intersection of sets X and Y. The reliability of the rule shows what is the probability that X follows Y. Rule X-> Y holds with confidence c (confidence) if c transactions from D containing X also contain Y, conf (X-> Y) = supp (X-> Y) / supp (X).



For example: “75% of transactions involving bread also contain milk. 3% of the total number of all transactions contain both goods ". 75% is the confidence of the rule, 3% is support (support), or “Bread” -> “Milk” with a probability of 75% and support 3%.



As a rule, the obvious rules have high support and reliability (60% and more), but they are not de facto knowledge. The focus should be on the rules with the support of 5-10%, they can be the source of the idea of ​​a promotion or service.



Apriori algorithm



The basic algorithm that is used to obtain associative rules is the apriori algorithm. Its author is Rakesh Agrawal ( Rakesh Agrawal , now an employee of Microsoft Research).



The Apriori algorithm is designed to search for all the frequent feature sets. It is level by level, uses a wide search strategy and performs it bottom-up.



The search algorithm is as follows ( source ):

image

image



The main feature of the algorithm is the anti-monotony property ( source ):

image



Apriori uses one of the support properties, which states that the support of any element set cannot exceed the minimum support of any of its subsets. For example, the support of the 3-element set {Bread, Butter, Milk} will always be less than or equal to the support of the 2-element set {Bread, Butter}, {Bread, Milk}, {Butter, Milk}. The fact is that any transaction containing {Bread, Butter, Milk} must also contain {Bread, Butter}, {Bread, Milk}, {Butter, Milk}, and the reverse is not true.



Due to this property, the search is not “greedy” and allows you to process large amounts of information in seconds.

The classic apriori algorithm has already been modified several times, and work is underway to improve speed.



In addition to apriori, other algorithms are used:

Read more about them here .



Solving other problems



The classical task of analyzing the market basket has been transformed into new tasks that need to be solved in our time, namely:



Software implementations





Literature



Source: https://habr.com/ru/post/66016/



All Articles