Association Rule Learning ~ using Apriori and Eclat (R Studio) to predict Shopping Behavior
Apriori
Association Rule Learning: Apriori is one of the most powerful algorithms for understanding associations among products. Take the example of a supermarket where most people who buy eggs also buy milk and baking soda. Probably the reason is that they want to bake a cake for New Year's Eve.
So we can see there is an association between eggs, milk, and baking soda. Once we know about such an association, we can simply put all 3 items together on the shelf, and that is likely to increase our sales.
Let’s perform Apriori with the help of an example. First we will try it in R, then we will move on to Python.
Install and load the ‘arules’ library, which contains the ‘apriori’ function.
install.packages('arules')
library(arules)
dataset = read.csv(file.choose(), header = FALSE)
If we import our data set with the regular ‘read.csv’ function, we will not be able to use ‘itemFrequencyPlot’, so we will use ‘read.transactions’ instead.
dataset = read.transactions(file.choose(), sep = ',', rm.duplicates = TRUE)
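To get a feel for what the transaction format represents, here is a rough base-R sketch. It only illustrates the idea (one row per basket, one column per item). It is not how arules stores the data, since arules uses a sparse matrix internally. The basket contents are made up.

```r
# Toy basket file contents: one transaction per line, items separated by commas.
# (Made-up data for illustration only.)
basket_lines <- c(
  "milk,eggs,baking soda",
  "milk,bread",
  "beer,snacks,milk",
  "eggs,milk"
)

# Split each line into its items and drop duplicates within a basket,
# similar in spirit to rm.duplicates = TRUE.
baskets <- lapply(strsplit(basket_lines, ",", fixed = TRUE),
                  function(items) unique(trimws(items)))

# All distinct items across the baskets become the columns.
all_items <- sort(unique(unlist(baskets)))

# Logical incidence matrix: rows are transactions, columns are items.
incidence <- t(sapply(baskets, function(b) all_items %in% b))
colnames(incidence) <- all_items

print(incidence)
# Item frequency = share of transactions containing each item.
print(colMeans(incidence))
```

The column means are the kind of numbers an item frequency plot would chart.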
If we filter our dataset to ‘milk’, we can see the items sold with ‘milk’.
This doesn't mean we can simply do a COUNT and figure out the associations of milk.
That is simply not appropriate when it comes to multivariate analysis.
Okay, assume you have figured out the associations of milk: people who buy milk also buy item 1, item 3, and item 5. You got there just by filtering and doing a time-consuming count calculation.
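The manual filter-and-count approach described above could look roughly like this in base R. The baskets are made up, and this is only a sketch of the idea, not part of the arules workflow.

```r
# Made-up baskets for illustration only.
baskets <- list(
  c("milk", "eggs", "baking soda"),
  c("milk", "bread"),
  c("beer", "snacks", "milk"),
  c("eggs", "milk"),
  c("beer", "snacks")
)

# Keep only the baskets that contain milk.
with_milk <- Filter(function(b) "milk" %in% b, baskets)

# Count every other item that appears alongside milk.
co_items  <- unlist(lapply(with_milk, function(b) setdiff(b, "milk")))
co_counts <- sort(table(co_items), decreasing = TRUE)
print(co_counts)
```

This answers the question for one item only. We would have to repeat it for every item and every combination of items, which is exactly the work Apriori automates.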
What about beer? People who buy beer also buy milk and other products. So will you be able to find an association between both this way? Nope.
This is where Association Rule Learning: Apriori comes into the picture. Apriori helps us find hidden patterns like this one: people who buy milk, eggs, and baking soda to bake a cake will also buy beer and snacks because there is a festival coming up.
Association Rule Learning has lots of applications for understanding and predicting trends. One of the most commonly used examples is Movie Recommendation.
itemFrequencyPlot(dataset, topN = 10)
summary(dataset)
topN = 10 means the plot shows the 10 most frequent items.
The summary gives brief details of our data set's attributes.
Now, to apply Apriori, we need to understand 3 concepts: support, confidence, and lift.
Support: transactions containing the item / total transactions
Confidence: for a rule A -> B, transactions containing both A and B / transactions containing A
Another example, in Movie Recommendation:
Number of users who watched both movie A and movie B / number of users who watched movie A
Lift: confidence of the rule A -> B / support of B. It tells us how much more likely consumers who buy milk are to also buy eggs, baking soda, and beer, compared with a random new sample of the population.
So the lift is the improvement over the original prediction.
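To make the three measures concrete, here is a small worked example in base R. The baskets and the helper names (support, confidence, lift) are made up for illustration. The formulas are the standard definitions for a rule lhs -> rhs.

```r
# Made-up baskets for illustration only.
baskets <- list(
  c("milk", "eggs", "baking soda"),
  c("milk", "eggs"),
  c("milk", "bread"),
  c("beer", "snacks"),
  c("eggs", "bread")
)

# Share of baskets that contain every item in `items`.
support <- function(items) {
  mean(sapply(baskets, function(b) all(items %in% b)))
}

# Confidence of the rule lhs -> rhs: among baskets with lhs, the share that also has rhs.
confidence <- function(lhs, rhs) support(c(lhs, rhs)) / support(lhs)

# Lift of the rule lhs -> rhs: confidence relative to how common rhs is overall.
lift <- function(lhs, rhs) confidence(lhs, rhs) / support(rhs)

print(support("milk"))             # 3 of the 5 baskets contain milk
print(confidence("milk", "eggs"))  # 2 of the 3 milk baskets also contain eggs
print(lift("milk", "eggs"))        # above 1: milk buyers pick eggs more often than average
```

A lift above 1 suggests the two sides of the rule appear together more often than chance would predict, a lift near 1 suggests no real association, and a lift below 1 suggests they tend to avoid each other.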
#Training Apriori on the data set
rules = apriori(data = dataset, parameter = list(support = 0.004, confidence = 0.3))
Support: 5 * 7 / 7500 ≈ 0.0047, i.e. a product that is purchased at least 5 times a day x 7 days a week, divided by the total number of transactions. In the code above we use 0.004 as the minimum support.
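The same arithmetic as a tiny R snippet, in case you want to plug in your own numbers. The variable names are just for illustration.

```r
# Numbers taken from the text: 5 purchases a day, 7 days a week, 7500 transactions.
purchases_per_day  <- 5
days               <- 7
total_transactions <- 7500

# Minimum support = how often the item must appear / total number of transactions.
min_support <- purchases_per_day * days / total_transactions
print(round(min_support, 4))
```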
Confidence depends entirely on the business problem. However, if we set the confidence too low, we will get inappropriate rules, like people who buy milk also buy washing powder. If we are not able to get any rules at all, try reducing the confidence level, as there might not be enough data for Apriori to generate rules at the higher level.
#The results
inspect(sort(rules, by = 'lift')[1:10])
Here, with inspect, we are looking at the first 10 rules, but because we sort by ‘lift’ first, these are the 10 highest/strongest rules.
If people buy pasta, they will also buy escalope in 37% of the cases; and if people buy olive oil and tomatoes, they will also buy spaghetti in 61% of the cases.
Amazing insights, aren't they?
Some rules may show no real correlation; this is because of the ‘support’ value. Items that are sold many times each day appear in a large share of the baskets, so they are reflected in the output anyway.
Of course, we need all of our products to be in the rules. To fix this, we can increase the confidence value, and that’s it!
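Here is a small base-R illustration of how a very popular item can end up in a rule without a real association behind it. The baskets and the helper rule_stats are made up for this sketch. Milk is in almost every basket, while pasta and escalope are rarer but go together.

```r
# Made-up baskets: milk is in almost every basket, pasta and escalope are rarer.
baskets <- list(
  c("milk", "pasta", "escalope"),
  c("milk", "pasta", "escalope"),
  c("milk", "eggs"),
  c("milk", "beer"),
  c("milk", "eggs", "beer"),
  c("milk", "snacks"),
  c("milk", "pasta"),
  c("milk", "snacks", "eggs"),
  c("eggs", "beer"),
  c("milk", "beer", "snacks")
)

# Share of baskets that contain every item in `items`.
support <- function(items) mean(sapply(baskets, function(b) all(items %in% b)))

# Support, confidence, and lift for the rule lhs -> rhs.
rule_stats <- function(lhs, rhs) {
  conf <- support(c(lhs, rhs)) / support(lhs)
  c(support = support(c(lhs, rhs)), confidence = conf, lift = conf / support(rhs))
}

popular <- rule_stats("eggs", "milk")       # consequent is the best seller
niche   <- rule_stats("pasta", "escalope")  # consequent is a rarer item

print(round(rbind("eggs -> milk" = popular, "pasta -> escalope" = niche), 3))
```

In this toy data the rule eggs -> milk has the higher confidence, yet its lift is below 1, while pasta -> escalope has a lift well above 1. That is why the results above are sorted by lift.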
Alright, let’s put all of the pieces together:
#Apriori
#install.packages('arules')
library(arules)
dataset = read.csv(file.choose(), header = FALSE)
dataset = read.transactions(file.choose(), sep = ',', rm.duplicates = TRUE)
summary(dataset)
itemFrequencyPlot(dataset, topN = 10,col='blue')
#Training Apriori on the dataset
rules = apriori(data = dataset, parameter = list(support = 0.004, confidence = 0.3))
#The results
inspect(sort(rules, by = 'lift')[1:10])
Eclat
Eclat is a quicker, simplified alternative to Apriori.
In Eclat we only need the support and the minimum number of items per set (minlen). For example, minlen = 2 keeps only sets of at least 2 items that have been purchased together. In the code below we use minlen = 3.
# Training Eclat on the dataset
rules = eclat(data = dataset, parameter = list(support = 0.004, minlen = 3))
One interesting fact, if you have noticed: here we have 267 sets, whereas before, in Apriori, we had rules. Even though Eclat is considered Association Rule Learning, it actually returns sets of items.
In our case, each set contains at least 3 items that were purchased together.
inspect(sort(rules, by = 'support')[1:10])
Finally, since this time we don’t have ‘lift’ in our inspect output, we sort our results by ‘support’ in descending (high to low) order.
So in Eclat we get the different sets of items that are frequently purchased together.
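If you are curious how Eclat finds these sets, here is a bare-bones base-R sketch of the core idea: store, for each item, the list of baskets it appears in, and intersect those lists to get the support of larger sets. This is only an illustration on made-up baskets, not the arules implementation. The names min_count, min_len, and grow are my own, and the threshold is an absolute count rather than a fraction.

```r
# Made-up baskets for illustration only.
baskets <- list(
  c("milk", "eggs", "baking soda"),
  c("milk", "eggs", "baking soda", "beer"),
  c("milk", "eggs"),
  c("beer", "snacks"),
  c("milk", "beer", "snacks")
)
n         <- length(baskets)
min_count <- 2   # an itemset must appear in at least 2 baskets
min_len   <- 2   # keep only sets with at least 2 items

# Vertical layout: for each item, the ids of the baskets that contain it.
items    <- sort(unique(unlist(baskets)))
tidlists <- lapply(items, function(it) which(sapply(baskets, function(b) it %in% b)))
names(tidlists) <- items

found <- list()

# Depth-first search: extend `prefix` with each later item and intersect the id lists.
grow <- function(prefix, prefix_tids, candidates) {
  for (i in seq_along(candidates)) {
    item <- candidates[i]
    tids <- intersect(prefix_tids, tidlists[[item]])
    if (length(tids) >= min_count) {
      itemset <- c(prefix, item)
      if (length(itemset) >= min_len) {
        found[[length(found) + 1]] <<- list(items = itemset, support = length(tids) / n)
      }
      # Only items after the current one are tried, so no set is generated twice.
      grow(itemset, tids, candidates[-seq_len(i)])
    }
  }
}

grow(character(0), seq_len(n), items)

# Print the frequent itemsets, highest support first.
found <- found[order(-sapply(found, function(s) s$support))]
for (s in found) cat("{", paste(s$items, collapse = ", "), "} support =", s$support, "\n")
```

Notice that the output is a list of item sets with their support only. There is no left-hand side, right-hand side, confidence, or lift.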
Let’s put all the code together:
# Eclat
# install.packages('arules')
library(arules)
dataset = read.csv(file.choose(), header = FALSE)
dataset = read.transactions(file.choose(), sep = ',', rm.duplicates = TRUE)
summary(dataset)
itemFrequencyPlot(dataset, topN = 10)
#Training Eclat on the dataset
rules = eclat(data = dataset, parameter = list(support = 0.004, minlen = 3))
inspect(sort(rules, by = 'support')[1:10])
Done, that’s it!
Thanks for taking the time to read to the end. I tried my best to keep it short and simple, keeping in mind that we want to use this code in our daily life.
I hope you enjoyed it.
Feel Free to ask because “Curiosity Leads To Perfection”
Some of my alternative internet presences are Facebook, Instagram, Udemy, Blogger, Issuu, and more.
Also available on Quora @ https://www.quora.com/profile/Bob-Rupak-Roy
Stay tuned for more updates! Have a good day.