Machine Learning Made Easy: Identify customer visit information

Customer Visit Information

Introduction

Many a times we are given messy data and asked to derive customer visit information from it.For instance, we might be provided with a data dump with no date field.In such cases, identification of visit information becomes a challenge.In this blog I have tried to demonstrate how we can be a little smarter in deriving such information.

Step 1: Importing the libraries

package.name<-c("tidyverse","data.table")

for(i in package.name){

  if(!require(i,character.only = T)){

    install.packages(i)
  }
  library(i,character.only = T)

}

Loading required package: tidyverse

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Loading required package: data.table


Attaching package: 'data.table'


The following objects are masked from 'package:lubridate':

    hour, isoweek, mday, minute, month, quarter, second, wday, week,
    yday, year


The following objects are masked from 'package:dplyr':

    between, first, last


The following object is masked from 'package:purrr':

    transpose

Step 2:Creating the illustrative data

dummy.data<-data.frame(Name=c("Roger","Roger","Roger",
                              "Nadal","Nadal",
                              "Roger","Roger"),
                       Item=c("Naan","Lassi","Gulabjamun",
                              "Roti","Daal",
                              "Lemon Juice","Ice Cream"))

dummy.data

   Name        Item
1 Roger        Naan
2 Roger       Lassi
3 Roger  Gulabjamun
4 Nadal        Roti
5 Nadal        Daal
6 Roger Lemon Juice
7 Roger   Ice Cream

Lets say that the above data represents items orderered by different people of various visits.By looking at the data, we can observe the following things:

In one of Roger’s visit, he order three items
Nadal order two items in his visit
In his second visit, Roger order 2 items

Now we know that First visit was done by Roger on lets say day1. Nadal might have also visited the same day or other day.And then finally, Roger visited again on a separate day.So Roger visited twice while Nadal only visited once.While it is easy to deduce this for few observations, but what if the data runs into several thousands of records.In such case we would need a logic to identify visit info and eventually create visit profile.

Step 3: Creating the logic

Difference of Row Number at customer level

row.diff<-dummy.data%>%
  mutate(Row_Num=1:n())%>%
  arrange(Row_Num)%>%
  group_by(Name)%>%
  mutate(Lag_Row_Num=lag(Row_Num))%>%
  ungroup()%>%
  mutate(Diff = Row_Num - Lag_Row_Num)

row.diff

# A tibble: 7 × 5
  Name  Item        Row_Num Lag_Row_Num  Diff
  <chr> <chr>         <int>       <int> <int>
1 Roger Naan              1          NA    NA
2 Roger Lassi             2           1     1
3 Roger Gulabjamun        3           2     1
4 Nadal Roti              4          NA    NA
5 Nadal Daal              5           4     1
6 Roger Lemon Juice       6           3     3
7 Roger Ice Cream         7           6     1

If Diff is NA or greater than 1, then it represents new visit

new.visit<-row.diff%>%
  mutate(Cntr = ifelse(Diff == 1,0,"New Visit"))%>%
  mutate(Cntr2=ifelse(is.na(Cntr),"New Visit",Cntr))%>%
  mutate(Cntr3=ifelse(Cntr2==0,NA,Cntr2))%>%
  dplyr::select(-Cntr,-Lag_Row_Num)


new.visit

# A tibble: 7 × 6
  Name  Item        Row_Num  Diff Cntr2     Cntr3    
  <chr> <chr>         <int> <int> <chr>     <chr>    
1 Roger Naan              1    NA New Visit New Visit
2 Roger Lassi             2     1 0         <NA>     
3 Roger Gulabjamun        3     1 0         <NA>     
4 Nadal Roti              4    NA New Visit New Visit
5 Nadal Daal              5     1 0         <NA>     
6 Roger Lemon Juice       6     3 New Visit New Visit
7 Roger Ice Cream         7     1 0         <NA>

Creating Index based on Name and Cntr3

index.df<-new.visit%>%
  group_by(Name,Cntr3)%>%
  mutate(Index=1:n())%>%
  mutate(Visit_Info=ifelse(Cntr3==0,0,Index))%>%
  dplyr::select(-Index,-Cntr2,-Cntr3)%>%
  ungroup()

Adding missing grouping variables: `Cntr3`

index.df

# A tibble: 7 × 6
  Cntr3     Name  Item        Row_Num  Diff Visit_Info
  <chr>     <chr> <chr>         <int> <int>      <int>
1 New Visit Roger Naan              1    NA          1
2 <NA>      Roger Lassi             2     1         NA
3 <NA>      Roger Gulabjamun        3     1         NA
4 New Visit Nadal Roti              4    NA          1
5 <NA>      Nadal Daal              5     1         NA
6 New Visit Roger Lemon Juice       6     3          2
7 <NA>      Roger Ice Cream         7     1         NA

Filling up visit info(similar to ctrl + D in excel)

fill.df<-index.df%>%
  group_by(Name)%>%
  fill(Visit_Info)%>%
  mutate(Visit_Details=str_c("Visit # ",Visit_Info))%>%
  ungroup()%>%
  arrange(Row_Num)

fill.df

# A tibble: 7 × 7
  Cntr3     Name  Item        Row_Num  Diff Visit_Info Visit_Details
  <chr>     <chr> <chr>         <int> <int>      <int> <chr>        
1 New Visit Roger Naan              1    NA          1 Visit # 1    
2 <NA>      Roger Lassi             2     1          1 Visit # 1    
3 <NA>      Roger Gulabjamun        3     1          1 Visit # 1    
4 New Visit Nadal Roti              4    NA          1 Visit # 1    
5 <NA>      Nadal Daal              5     1          1 Visit # 1    
6 New Visit Roger Lemon Juice       6     3          2 Visit # 2    
7 <NA>      Roger Ice Cream         7     1          2 Visit # 2

Cleaning the data a little

visit.clean<-fill.df%>%
  select(-Cntr3,-Diff)

visit.clean

# A tibble: 7 × 5
  Name  Item        Row_Num Visit_Info Visit_Details
  <chr> <chr>         <int>      <int> <chr>        
1 Roger Naan              1          1 Visit # 1    
2 Roger Lassi             2          1 Visit # 1    
3 Roger Gulabjamun        3          1 Visit # 1    
4 Nadal Roti              4          1 Visit # 1    
5 Nadal Daal              5          1 Visit # 1    
6 Roger Lemon Juice       6          2 Visit # 2    
7 Roger Ice Cream         7          2 Visit # 2

Machine Learning Made Easy

Monday, March 10, 2025

Identify customer visit information

Customer Visit Information

2025-03-11

Introduction

Step 1: Importing the libraries

Step 2:Creating the illustrative data

Step 3: Creating the logic

Difference of Row Number at customer level

No comments:

Post a Comment

Price Elasticity Model in Python