Compressing Rows
Parag Verma
6th Nov, 2022
Introduction
When we are dealing with Customer level data, at times it becomes difficult to get a high level view of whats happening. For instance, if we are trying to analyse the customer shopping journey based on transaction level data, then just by looking at individual records wont help a lot. What we require here is a compressed version of his shopping history across different days.
In this blog we will look at how dplyr and toString function can be used to generate such summaries
package.name<-c("dplyr","stringr")
for(i in package.name){
if(!require(i,character.only = T)){
install.packages(i)
}
library(i,character.only = T)
}
Step 1: Creating a data frame
Lets create a dummy data frame with details of shopping history for a customer.
df<-data.frame(Customer_Num=rep("Customer 1",5),
Day=c("Day 1","Day 5","Day 7","Day 12","Day 13"),
Product_Name=c("A","B","AB","C","X")
)
head(df)
Customer_Num Day Product_Name
1 Customer 1 Day 1 A
2 Customer 1 Day 5 B
3 Customer 1 Day 7 AB
4 Customer 1 Day 12 C
5 Customer 1 Day 13 X
Just by looking at the data we cant get a hang of whats happening.What we require here is a high level view of things. Maybe something like the below might give us better idea.
As can be seen, it becomes very easy to wrap around our head as to what exactly is cutomer doing.
Now lets try and get the same summary using dplyr and toString
Step 2: Summarizing records at Customer Level
df2<-df%>%
group_by(Customer_Num)%>%
summarise(Journey = toString(Product_Name))%>%
ungroup()
`summarise()` ungrouping output (override with `.groups` argument)
df2
# A tibble: 1 x 2
Customer_Num Journey
<chr> <chr>
1 Customer 1 A, B, AB, C, X
Lets remove the , with the | operator
final.df<-df2%>%
mutate(Journey2=gsub("[,]"," | ",Journey))%>%
select(-Journey)%>%
rename(Journey =Journey2)
final.df
# A tibble: 1 x 2
Customer_Num Journey
<chr> <chr>
1 Customer 1 A | B | AB | C | X
Thus we saw how easy it is to add value to our analysis using simple changes to how we want to summarize the data.In the next blog, we will look at how to decompress records into individual rows