When a certain promotion activity is initiated, we are required to often report the first instance of sales post launch of the marketing activity. If Day 1 is available, then we can directly take the sales corresponding to that day.However, when there is no clear cut Day 1 available, we need to identify date that is closest to the program launch date and get the corresponding sales.¶
In [20]:
import pandas as pd
import numpy as np
import datetime
from datetime import timedelta
from datetime import date
Step 1:Creating Dummy data set¶
In [79]:
resp_id=[1,1,1,1,2,2,2,3,3]
launch_date=["2022-04-22","2022-04-22","2022-04-22","2022-04-22","2022-03-10","2022-03-10","2022-03-10","2022-06-20","2022-06-20"]
sales=np.random.rand(9)*100
event_day=["Prelaunch","Day1","Day20","Day50","Day3","Day6","Day10","Day5","Day10"]
event_date=["2022-04-15","2022-04-22","2022-05-12","2022-06-12","2022-03-05","2022-03-16","2022-03-20","2022-06-25","2022-06-30"]
In [80]:
df=pd.DataFrame([resp_id,launch_date,event_day,event_date,sales]).T
df.columns=['Respondent','Launch_Date','Event_Day','Event_Date','Sales']
df
Out[80]:
Respondent | Launch_Date | Event_Day | Event_Date | Sales | |
---|---|---|---|---|---|
0 | 1 | 2022-04-22 | Prelaunch | 2022-04-15 | 46.648147 |
1 | 1 | 2022-04-22 | Day1 | 2022-04-22 | 56.779029 |
2 | 1 | 2022-04-22 | Day20 | 2022-05-12 | 87.195115 |
3 | 1 | 2022-04-22 | Day50 | 2022-06-12 | 0.026134 |
4 | 2 | 2022-03-10 | Day3 | 2022-03-05 | 13.818841 |
5 | 2 | 2022-03-10 | Day6 | 2022-03-16 | 41.424889 |
6 | 2 | 2022-03-10 | Day10 | 2022-03-20 | 79.518794 |
7 | 3 | 2022-06-20 | Day5 | 2022-06-25 | 87.119789 |
8 | 3 | 2022-06-20 | Day10 | 2022-06-30 | 37.970452 |
Step 2: Following is the requirement¶
- For each respondent, if Day1 in Event_Day is available, then get corresponding Sales
- If Day1 is not available, then get the sales for an event_date which is closest to Launch Date(but less than the launch date)
Converting dates into proper format¶
In [81]:
df['Launch_Date']=pd.to_datetime(df['Launch_Date'])
df['Event_Date']=pd.to_datetime(df['Event_Date'])
df.dtypes
Out[81]:
Respondent object Launch_Date datetime64[ns] Event_Day object Event_Date datetime64[ns] Sales object dtype: object
Subtracting Event_Date from Launch_Date¶
In [82]:
df['Diff_Days']= (df['Launch_Date'] - df['Event_Date']).apply(lambda x:x.days)
df
Out[82]:
Respondent | Launch_Date | Event_Day | Event_Date | Sales | Diff_Days | |
---|---|---|---|---|---|---|
0 | 1 | 2022-04-22 | Prelaunch | 2022-04-15 | 46.648147 | 7 |
1 | 1 | 2022-04-22 | Day1 | 2022-04-22 | 56.779029 | 0 |
2 | 1 | 2022-04-22 | Day20 | 2022-05-12 | 87.195115 | -20 |
3 | 1 | 2022-04-22 | Day50 | 2022-06-12 | 0.026134 | -51 |
4 | 2 | 2022-03-10 | Day3 | 2022-03-05 | 13.818841 | 5 |
5 | 2 | 2022-03-10 | Day6 | 2022-03-16 | 41.424889 | -6 |
6 | 2 | 2022-03-10 | Day10 | 2022-03-20 | 79.518794 | -10 |
7 | 3 | 2022-06-20 | Day5 | 2022-06-25 | 87.119789 | -5 |
8 | 3 | 2022-06-20 | Day10 | 2022-06-30 | 37.970452 | -10 |
Getting Event date closest to Lauch date¶
In [85]:
df2=df.sort_values(['Respondent','Diff_Days'])
df2['Flag']=np.where(df2['Diff_Days'] > 0,1,0)
df2['Index']=df2.groupby(['Respondent']).Flag.cumsum()
df2
Out[85]:
Respondent | Launch_Date | Event_Day | Event_Date | Sales | Diff_Days | Flag | Index | |
---|---|---|---|---|---|---|---|---|
3 | 1 | 2022-04-22 | Day50 | 2022-06-12 | 0.026134 | -51 | 0 | 0 |
2 | 1 | 2022-04-22 | Day20 | 2022-05-12 | 87.195115 | -20 | 0 | 0 |
1 | 1 | 2022-04-22 | Day1 | 2022-04-22 | 56.779029 | 0 | 0 | 0 |
0 | 1 | 2022-04-22 | Prelaunch | 2022-04-15 | 46.648147 | 7 | 1 | 1 |
6 | 2 | 2022-03-10 | Day10 | 2022-03-20 | 79.518794 | -10 | 0 | 0 |
5 | 2 | 2022-03-10 | Day6 | 2022-03-16 | 41.424889 | -6 | 0 | 0 |
4 | 2 | 2022-03-10 | Day3 | 2022-03-05 | 13.818841 | 5 | 1 | 1 |
8 | 3 | 2022-06-20 | Day10 | 2022-06-30 | 37.970452 | -10 | 0 | 0 |
7 | 3 | 2022-06-20 | Day5 | 2022-06-25 | 87.119789 | -5 | 0 | 0 |
In [90]:
df2['Flag0']=np.where(df2['Event_Day']=="Day1",1,0)
Day1_Flag=df2.groupby(['Respondent']).apply(lambda x:x.Flag0.sum()).reset_index()
Day1_Flag.columns=['Respondent','Day1_Flag']
df3=df2.merge(Day1_Flag,on="Respondent",how="left")
df3
Out[90]:
Respondent | Launch_Date | Event_Day | Event_Date | Sales | Diff_Days | Flag | Index | Flag0 | Day1_Flag | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2022-04-22 | Day50 | 2022-06-12 | 0.026134 | -51 | 0 | 0 | 0 | 1 |
1 | 1 | 2022-04-22 | Day20 | 2022-05-12 | 87.195115 | -20 | 0 | 0 | 0 | 1 |
2 | 1 | 2022-04-22 | Day1 | 2022-04-22 | 56.779029 | 0 | 0 | 0 | 1 | 1 |
3 | 1 | 2022-04-22 | Prelaunch | 2022-04-15 | 46.648147 | 7 | 1 | 1 | 0 | 1 |
4 | 2 | 2022-03-10 | Day10 | 2022-03-20 | 79.518794 | -10 | 0 | 0 | 0 | 0 |
5 | 2 | 2022-03-10 | Day6 | 2022-03-16 | 41.424889 | -6 | 0 | 0 | 0 | 0 |
6 | 2 | 2022-03-10 | Day3 | 2022-03-05 | 13.818841 | 5 | 1 | 1 | 0 | 0 |
7 | 3 | 2022-06-20 | Day10 | 2022-06-30 | 37.970452 | -10 | 0 | 0 | 0 | 0 |
8 | 3 | 2022-06-20 | Day5 | 2022-06-25 | 87.119789 | -5 | 0 | 0 | 0 | 0 |
Index = 1 is the row where Event date is closest to the Launch day and also it is less than the launch date¶
In [91]:
df3['Event_Day2']=np.where((df3['Event_Day']=="Day1") & (df3['Day1_Flag']==1),"Day1",
np.where((df3['Index']==1) & (df3['Day1_Flag']==0),"Day1","None"))
df3
Out[91]:
Respondent | Launch_Date | Event_Day | Event_Date | Sales | Diff_Days | Flag | Index | Flag0 | Day1_Flag | Event_Day2 | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 2022-04-22 | Day50 | 2022-06-12 | 0.026134 | -51 | 0 | 0 | 0 | 1 | None |
1 | 1 | 2022-04-22 | Day20 | 2022-05-12 | 87.195115 | -20 | 0 | 0 | 0 | 1 | None |
2 | 1 | 2022-04-22 | Day1 | 2022-04-22 | 56.779029 | 0 | 0 | 0 | 1 | 1 | Day1 |
3 | 1 | 2022-04-22 | Prelaunch | 2022-04-15 | 46.648147 | 7 | 1 | 1 | 0 | 1 | None |
4 | 2 | 2022-03-10 | Day10 | 2022-03-20 | 79.518794 | -10 | 0 | 0 | 0 | 0 | None |
5 | 2 | 2022-03-10 | Day6 | 2022-03-16 | 41.424889 | -6 | 0 | 0 | 0 | 0 | None |
6 | 2 | 2022-03-10 | Day3 | 2022-03-05 | 13.818841 | 5 | 1 | 1 | 0 | 0 | Day1 |
7 | 3 | 2022-06-20 | Day10 | 2022-06-30 | 37.970452 | -10 | 0 | 0 | 0 | 0 | None |
8 | 3 | 2022-06-20 | Day5 | 2022-06-25 | 87.119789 | -5 | 0 | 0 | 0 | 0 | None |
Dropping additional columns¶
In [92]:
df3.drop(columns=['Event_Day','Diff_Days','Flag','Index','Flag0','Day1_Flag'],inplace=True)
df3
Out[92]:
Respondent | Launch_Date | Event_Date | Sales | Event_Day2 | |
---|---|---|---|---|---|
0 | 1 | 2022-04-22 | 2022-06-12 | 0.026134 | None |
1 | 1 | 2022-04-22 | 2022-05-12 | 87.195115 | None |
2 | 1 | 2022-04-22 | 2022-04-22 | 56.779029 | Day1 |
3 | 1 | 2022-04-22 | 2022-04-15 | 46.648147 | None |
4 | 2 | 2022-03-10 | 2022-03-20 | 79.518794 | None |
5 | 2 | 2022-03-10 | 2022-03-16 | 41.424889 | None |
6 | 2 | 2022-03-10 | 2022-03-05 | 13.818841 | Day1 |
7 | 3 | 2022-06-20 | 2022-06-30 | 37.970452 | None |
8 | 3 | 2022-06-20 | 2022-06-25 | 87.119789 | None |
Renaming Event_Day2 as Event_Day¶
In [93]:
df3.rename(columns={'Event_Day2':'Event_Day'},inplace=True)
df3
Out[93]:
Respondent | Launch_Date | Event_Date | Sales | Event_Day | |
---|---|---|---|---|---|
0 | 1 | 2022-04-22 | 2022-06-12 | 0.026134 | None |
1 | 1 | 2022-04-22 | 2022-05-12 | 87.195115 | None |
2 | 1 | 2022-04-22 | 2022-04-22 | 56.779029 | Day1 |
3 | 1 | 2022-04-22 | 2022-04-15 | 46.648147 | None |
4 | 2 | 2022-03-10 | 2022-03-20 | 79.518794 | None |
5 | 2 | 2022-03-10 | 2022-03-16 | 41.424889 | None |
6 | 2 | 2022-03-10 | 2022-03-05 | 13.818841 | Day1 |
7 | 3 | 2022-06-20 | 2022-06-30 | 37.970452 | None |
8 | 3 | 2022-06-20 | 2022-06-25 | 87.119789 | None |