Analysis in R


Market Basket Analysis 






Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. For example, if you are in an English pub and you buy a pint of beer and don't buy a bar meal, you are more likely to buy chips at the same time than somebody who didn't buy beer.

Inspect the data:

getwd()
setwd("C:/Users/student.ASH05.000/Desktop/R")

data <- readLines("groceries.csv")
summary(data)
str(data)
                          
# Start with a single placeholder row so that both columns exist
transactions <- data.frame(
  ID = NA,
  Product = NA
)
# Add data to transactions by traversing each line and each item on it
rownum <- nrow(transactions)
for (i in seq_along(data)) {
  for (item in strsplit(data[i], ",")[[1]]) {
    transactions[rownum + 1, ] <- list(i, item)
    rownum <- rownum + 1
  }
}

# Drop the placeholder NA row
transactions <- transactions[-1, ]

str(transactions)
summary(transactions)
View(transactions)
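
Growing a data frame one row at a time can get slow with several thousand baskets. A vectorized alternative is sketched below on a small made-up input; the toy_lines vector and its contents are hypothetical and are not taken from groceries.csv. It should produce the same ID/Product layout in a single step:

```r
# Hypothetical stand-in for the lines read from the CSV file
toy_lines <- c("beer,chips", "milk", "beer,milk,bread")

# Split every line into its items (one character vector per basket)
toy_items <- strsplit(toy_lines, ",", fixed = TRUE)

# Repeat each basket number once per item, then flatten the items
toy_transactions <- data.frame(
  ID = rep(seq_along(toy_items), lengths(toy_items)),
  Product = unlist(toy_items),
  stringsAsFactors = FALSE
)

print(toy_transactions)
```

Applied to the real data, toy_lines would simply be replaced by the data vector read with readLines().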




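# Smallest and largest number of items in a single transaction
range(lengths(strsplit(data, split = ",")))
# [1]  1 32

# Number of transactions
length(data)
# [1] 9835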

Top 20 most sold products:

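# One row per item sold
products <- transactions[, "Product"]
products <- as.data.frame(products)
# install.packages("dplyr")  # run once if dplyr is not installed yet
library("dplyr")
# Count how often each product appears, then sort from most to least sold
products <- products %>% group_by(products) %>% summarise(n = n())
products <- products %>% arrange(desc(n))
head(products, 20)
View(products)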



Share of total sales each product accounts for:

# Total number of items sold across all transactions
total_products <- sum(products$n)
# Each product's share of all items sold, as a percentage
products$salesAccountPercent <- products$n / total_products * 100
colnames(products) <- c("products", "count", "salesAccountPercent")
View(products)
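
As a cross-check that needs no extra package, the same counts and percentages can be sketched in base R with table() and prop.table(). The toy_products vector below is a hypothetical example, not the grocery data:

```r
# Hypothetical product column, not taken from groceries.csv
toy_products <- c("milk", "beer", "milk", "bread", "milk", "beer")

# Count each product and sort from most to least sold
counts <- sort(table(toy_products), decreasing = TRUE)

# Convert the counts to percentages of all items sold
share <- round(100 * prop.table(counts), 1)

print(counts)
print(share)
```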



# Bar chart of the 20 best-selling products (base graphics, no extra package needed)
# Give the chart file a name.
png(file = "barchart.png")

# Plot the chart.
barplot(products$count[1:20], names.arg = products$products[1:20],
        xlab = "Products", ylab = "Count", col = "blue",
        main = "Sales chart", border = "white")

# Save the file.
dev.off()

# Pie chart of the same 20 products
# Give the chart file a name.
png(file = "piechart.png")

# Plot the chart.
pie(products$count[1:20], labels = products$products[1:20])

# Save the file.
dev.off()

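# Apriori
# Group the products by transaction ID: one character vector per basket
trans <- split(transactions$Product, transactions$ID)
head(trans)
# install.packages("arules")  # run once if arules is not installed yet
library("arules")
# Convert the list of baskets to the arules transactions class
trans <- as(trans, "transactions")
# Mine rules with exactly 3 items, support >= 0.1% and confidence >= 80%
rules <- apriori(trans, parameter = list(support = 0.001, confidence = 0.8, maxlen = 3, minlen = 3))
# Sort by lift, strongest rules first, and show the top 10
rules <- sort(rules, by = "lift", decreasing = TRUE)
inspect(head(rules, n = 10))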




Conclusion

Market basket analysis is an unsupervised machine learning technique for finding patterns in transactional data, which makes it a powerful tool for analyzing consumers' purchasing patterns. The algorithm most commonly used for it is Apriori. Its three key statistical measures are support, confidence, and lift. Support measures how often an itemset appears in the transactional data set; confidence measures how often the right-hand side of a rule appears in the transactions that contain its left-hand side; and lift measures how much more likely the right-hand side is to be purchased given the left-hand side, relative to its typical purchase rate. In our example, we examined the transactional patterns of grocery purchases and discovered both obvious and not-so-obvious patterns in certain transactions.
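
To make the three measures concrete, here is a minimal base R sketch that computes them by hand for a single rule, {beer} => {chips}. The baskets list and the support() helper are hypothetical and exist only for this illustration; they are not part of arules or of the grocery data:

```r
# Hypothetical toy baskets, not taken from groceries.csv
baskets <- list(
  c("beer", "chips"),
  c("beer", "chips", "bread"),
  c("beer", "milk"),
  c("milk", "bread"),
  c("chips", "milk")
)

# Illustrative helper: share of baskets that contain every item in `items`
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Rule under test: {beer} => {chips}
supp_rule  <- support(c("beer", "chips"))   # beer and chips together
confidence <- supp_rule / support("beer")   # chips among baskets with beer
lift       <- confidence / support("chips") # compared with chips overall

cat("support:", supp_rule, "confidence:", confidence, "lift:", lift, "\n")
```

With these five baskets the rule has a support of 0.4, a confidence of about 0.67 and a lift of about 1.11. A lift above 1 suggests that the two items appear together more often than they would if they were bought independently.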






