Sunday, December 18, 2022

Price Volume Analysis in R

Price Volume Analysis in R


Introduction

In almost all the business scenarios, we are often asked about the extent to which we can alter the input such as price, marketing, sales rep visit etc to control the output(Sales, Prescription,etc).

For instance, consider the price reduction as a tactic. We all know that reducing the price increase the product purchase volume. But can be say that the price can be decreased indefinitely to get the same increase in volume ?…Answer would be a NO . In this blog we will try to establish a relationship between Price and Volume and how Price acts as a lever on Volume.This will help us answer the ‘What if’ question.

To get a hang of things, we can look at the Demand curve which depicts the relationship between price of a certain commodity and quantity that is demanded at that price.



Demand curve are often used with the supply curve to find the equilibrium price. This is the seller’s price at which the buyer is willing to purchase the goods.So both parties are willing and agree on the price point and hence the equilibrium.

The above curve holds for price quantity relationship for an individual consumer(individual demand curve) or for all consumers in a particular market(market demand curve).

One important thing to consider here is that the relationship between price and quantity is seldom linear and doesnt follow straight line as shown in the figure.We will use certain models that can factor in this non-linearity in the relationship.

In this blog, we will create a framework to do the following:

  • Establish a relationship between Price and Volume using regression model
  • Introduce non-linear relationship to get the yield curves
  • Estimate Volume for different values of Price

For our analysis, we will be using the cheese data set from bayesm library.


Step 0: Installing libraries

package.name<-c("dplyr","data.table","stats","bayesm",
                "ggplot2","hrbrthemes","plotly")

for(i in package.name){

  if(!require(i,character.only = T)){

    install.packages(i)
  }
  library(i,character.only = T)

}


Step 1: Creating a Simple Illustration

Importing the cheese data set

data("cheese")
df<-cheese
head(df)
              RETAILER VOLUME       DISP    PRICE
1  LOS ANGELES - LUCKY  21374 0.16200000 2.578460
2 LOS ANGELES - RALPHS   6427 0.12411257 3.727867
3   LOS ANGELES - VONS  17302 0.10200000 2.711421
4   CHICAGO - DOMINICK  13561 0.02759109 2.651206
5      CHICAGO - JEWEL  42774 0.09061273 1.986674
6       CHICAGO - OMNI   4498 0.00000000 2.386616

The attributes are as follows:

  • RETAILER:List of Retailers
  • VOLUME:Number of units sold
  • DISP:A Measure of advertising display activity
  • PRICE:Unit Price in Dollars

For each retailer, the data captures weekly sales of cheese. Here the dependent variable \(y\) is the units sold or VOLUME and PRICE is independent variable \(x\). For sake of explanation, lets look at regression model for only one of the retailers such as ATLANTA - KROGER CO.

If you unable to run the below piece of code, then you can download the cheese dataset form my github repo repository.


Step 2: Relationship between Price and Volume for each retailer

In theory, as price decreases volume increases.Since our dataset is at a retailer level, hence it makes sense to look at the correlation between PRICE and VOLUME at a retailer level first.

Lets get the correlation coefficient of PRICE and VOLUME for individual retailers

cor.df<-df%>%
  group_by(RETAILER)%>%
  summarise(correlation_coefficient=cor(VOLUME,PRICE))%>%
  ungroup()%>%
  arrange(correlation_coefficient)
`summarise()` ungrouping output (override with `.groups` argument)
head(cor.df)
# A tibble: 6 x 2
  RETAILER                         correlation_coefficient
  <fct>                                              <dbl>
1 SACRAMENTO - RALEYS                               -0.937
2 BUFFALO/ROCHESTER - TOPS MARKETS                  -0.930
3 CHICAGO - DOMINICK                                -0.927
4 BALTI/WASH - GIANT FOOD INC                       -0.909
5 CHICAGO - OMNI                                    -0.903
6 COLUMBUS,OH - BIG BEAR                            -0.888

We can see that “SACRAMENTO - RALEYS” has the most negative correlation between PRICE and VOLUME. Hence to understand the concept further, we will take the data for this retailer into consideration

Step 2: Creating log of Price for SACRAMENTO - RALEYS

Lets filter the data for SACRAMENTO - RALEYS and create log of price

retailer<-"SACRAMENTO - RALEYS"


interim.df<-df%>%
  filter(RETAILER==retailer)%>%
  mutate(log_price=ifelse(PRICE==0,0,log(PRICE)))%>%
  select(-RETAILER,-DISP)

head(interim.df)
  VOLUME    PRICE log_price
1   1538 3.719766  1.313661
2   1302 3.777266  1.329000
3   1551 3.823340  1.341124
4   1506 3.839973  1.345465
5   1643 3.853317  1.348934
6   1477 3.852404  1.348697


Reason for Taking Log - In the next section you will observe from the scatterplot that PRICE and VOLUME are not linearly related(we cant draw a striaght line to define their relationship).Moreover,in most practical cases they are not linearly related.

Log transfformation has two utilitites:

  • Model non-linearity in relationship
  • Perform sensitivity/diminishing return analyses


Step 3: Basic Summary Stats for PRICE and VOLUME

Lets look at some of the basic stats for PRICE using box plot

retailer<-"SACRAMENTO - RALEYS"

ggplot(interim.df%>%
         mutate(Retailer=retailer), aes(x=Retailer, y=PRICE)) +
    geom_boxplot(color="red",
        fill="orange",
        alpha=0.2,
        notch=FALSE)+
  stat_summary(fun="mean",color="blue")+
  theme_bw()+
  theme(axis.line = element_line(colour = "black"),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank()) +
   coord_cartesian(ylim = c(2, 4))
Warning: Removed 1 rows containing missing values (geom_segment).


Observations from the Plot:

  • Median value is around 3.7
  • Mean value is around 3.5(blue dot)
  • Most of the PRICE values are between 3.3 and 3.8
  • There are few lower values around 2.5


Basic stats for VOLUME using box plot

retailer<-"SACRAMENTO - RALEYS"

ggplot(interim.df%>%
         mutate(Retailer=retailer), aes(x=Retailer, y=VOLUME)) +
    geom_boxplot(color="black",
        fill="blue",
        alpha=0.2,
        notch=FALSE)+
  stat_summary(fun="mean",color="red")+
  theme_bw()+
  theme(axis.line = element_line(colour = "black"),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    panel.background = element_blank())+
  scale_y_continuous(n.breaks = 10)
Warning: Removed 1 rows containing missing values (geom_segment).


Observations from the Plot:

  • Median value is around 1700 units
  • Mean value is around 2000(red dot)
  • Most of the PRICE values are between 1500 and 2300
  • There are few values greater than 3500

Step 4: Scatterplot of PRICE Vs VOLUME

In order to model the relationship between Price and Volume, we need to first visualize this using a scatterplot

ggplot(interim.df, aes(x=VOLUME, y=PRICE)) + 
  geom_point(color='red')+
  geom_smooth()+
  theme_classic()
`geom_smooth()` using method = 'loess' and formula 'y ~ x'

It can be seen that there is a negative relationship between PRICE and VOLUME.We can draw the following observations

  • As PRICE increase, VOLUME decreases
  • The relationship is non-linear, meaning that it is not possible to fit a straight line in the scatterplot. Instead we can draw a smooth curve to define the relationship


Based on the above chart, we can say that instead of establishing a relationship of PRICE Vs VOLUME, we should focus on log(PRICE) vs VOLUME.


Step 5: Key steps to develop the Model

  • We will create a regression model of the type y ~ log(x)
  • y will be VOLUME
  • x will be log_price
  • Aim should be to get a good \(R^2\) so that the model explains the variance in VOLUME to a larger extent
  • once the model is created,we will get a non-linear relationship between VOLUME and PRICE.This is the key to analyzing the diminishing return of PRICE reduction
  • To get the diminishing return point,We will then decrease the PRICE by 1% in each step and subsequently record the mean value of VOLUME. This will be plotted to get the demand and supply curves


Step 6: Developing the Regression Model

model1 <- lm(VOLUME  ~ log_price, data=interim.df)
summary(model1)


Output of Regression:

  • Equation obtained from the model is
    VOLUME = 9430 - 5907*log_price
  • Model is significant (p value at the bottom is less than 0.05)
  • \(R^2\) of 88.9% is very good
  • Impact of log_price on VOLUME is negative and significant indicated by Pr(>|t|) less than 0.05


As next steps, we will focus on doing the sensitivity analyses.Here we will reduce the value of PRICE(and hence log_price) by 1% all the way up-to 90% and get the corresponding predicted VOLUME based on the equation( VOLUME = 9430 - 5907*log_price) obtained.


Step 7: Sensitivity Analyses: An Illustrative Example

Lets look at how we plan to decrease the value of PRICE in steps of 1% each all the way up-to 90% and record changes in VOLUME.To get a hang of things, lets start by decreasing the PRICE by 1% for the first iteration and get the predicted VOLUME using the model

perc<-1/100

test.df<-interim.df%>%
  select(-log_price)%>%# removing the original log_price
  mutate(decreased_PRICE= (1 - perc)*PRICE)%>% # reducing by 1%
  mutate(log_price=ifelse(decreased_PRICE==0,0,log(decreased_PRICE))) # taking log


head(test.df)
  VOLUME    PRICE decreased_PRICE log_price
1   1538 3.719766        3.682568  1.303610
2   1302 3.777266        3.739493  1.318950
3   1551 3.823340        3.785107  1.331074
4   1506 3.839973        3.801573  1.335415
5   1643 3.853317        3.814784  1.338884
6   1477 3.852404        3.813880  1.338647


Now lets apply the model on this test.df with PRICE reduced by 1% to get predicted VOLUME

test_prediction<-predict(model1,test.df)
test_prediction[1:10]
       1        2        3        4        5        6        7        8 
1729.060 1638.435 1566.809 1541.163 1520.669 1522.069 2400.203 1519.777 
       9       10 
1696.512 1728.311 


Getting the mean of PRICE and predicted VOLUME

test_collate<-data.frame(Mean_PRICE=mean(test.df$decreased_PRICE),
                         Mean_VOLUME=mean(test_prediction))

test_collate
  Mean_PRICE Mean_VOLUME
1   3.471449    2138.214

So we can summaries the above steps:

  • Decrease the PRICE by 1%
  • Get the new log_price
  • Get Predicted VOLUME using the regression model
  • Get the mean of PRICE(decreased) and predicted VOLUME
  • Repeat the above 4 steps till we reduce the PRICE by 90%


Step 8: Sensitivity Analyses: Complete Iteration

Having discussed the sensitivity analyses usinng a small exmaple, lets now create a framework to reduce PRICE from 1% till 90% and do the process repeatedly

l1<-list()

for(i in 1:90){

  # print(i)
  perc<-i/100

  test.df<-interim.df%>%
    select(-log_price)%>%# removing the original log_price
    mutate(decreased_PRICE= (1 - perc)*PRICE)%>% # reducing by 1%
    mutate(log_price=ifelse(decreased_PRICE==0,0,log(decreased_PRICE))) # taking log

  # Prediction
  test_prediction<-predict(model1,test.df)
  
  # Collating the mean of PRICE and VOLUME
  test_collate<-data.frame(Perc=i,
                          Mean_PRICE=mean(test.df$decreased_PRICE),
                         Mean_VOLUME=mean(test_prediction))

l1[[i]]<-test_collate 
  
}

final.df<-do.call(rbind.data.frame,l1)
head(final.df)
  Perc Mean_PRICE Mean_VOLUME
1    1   3.471449    2138.214
2    2   3.436384    2198.193
3    3   3.401319    2258.787
4    4   3.366254    2320.009
5    5   3.331189    2381.872
6    6   3.296124    2444.390

Lets write the file to a csv in the drive

write.csv(final.df,"final_file_SACRAMENTO - RALEYS.csv",row.names = F)


Step 9: Plotting the output results

Lets plot the mean price and mean volume.

p <- final.df %>%
  ggplot( aes(x=Mean_PRICE,y=Mean_VOLUME )) +
    geom_area(fill="#69b3a2", alpha=0.5) +
    geom_line(color="#69b3a2") +
    ggtitle("Predicted VOLUME for different values of PRICE") +
    ylab("Mean_VOLUME") +
    theme_ipsum()
ggplotly(p)


Key Observations

  • As Price decreases, the Volume increases(non-linearly) as shown in the figure
  • One can argue that reducing the price to a very low value will lead to increase in Volume but that would increase the cost of producing one unit of Cheese and hence that would not be feasible.If we had the cost data, we could have obtained an optimal price point which makes the operation of producing cheese profitable
  • The model enables us to vary Price and obtain the corresponding value of Volume and therefore provides a sense of different possibilities of using Price as a lever on Volume

Finding Optimal Price Point

Since we have the relationship between Price and Volume, the next business question would be identify the optimal price at which we can have the maximum profit. Now Profit calculation would require us to input Cost/Quantity sold information into the equation. For the sake of the analyses, lets assume that the Cost/Quantity sold is $1. Now based on this assumption, lets calculate the optimal price point

Note that the final.df that we created in the above analysis has different volume information for various price points. We can use the same data frame and create an additional columns for:

  • Total Cost
  • Total Profit
profit.df<-final.df%>%
  mutate(Total_Cost=1*Mean_VOLUME)%>%
  mutate(Total_Profit= Mean_PRICE * Mean_VOLUME  - Total_Cost)%>%
  filter(Total_Profit > 0)# removing negative values of Profit


head(profit.df)
  Perc Mean_PRICE Mean_VOLUME Total_Cost Total_Profit
1    1   3.471449    2138.214   2138.214     5284.488
2    2   3.436384    2198.193   2198.193     5355.643
3    3   3.401319    2258.787   2258.787     5424.068
4    4   3.366254    2320.009   2320.009     5489.730
5    5   3.331189    2381.872   2381.872     5552.593
6    6   3.296124    2444.390   2444.390     5612.621


Now lets plot Mean_PRICE and Total_Profit to get a sense of the optimal price point

p2 <- profit.df %>%
  ggplot( aes(x=Mean_PRICE,y=Total_Profit )) +
    geom_area(fill="#69b3a2", alpha=0.5) +
    geom_line(color="#69b3a2") +
    ggtitle("Total Profit for different values of PRICE") +
    ylab("Total_Profit") +
    theme_ipsum()
ggplotly(p2)


Getting the price point for maximum profit

profit.df%>%
  mutate(Optimal_Price=Mean_PRICE[which.max(Total_Profit)])%>%
  mutate(Flag=ifelse(Optimal_Price == Mean_PRICE,1,0))%>%
  filter(Flag==1)%>%
  select(-Flag)
  Perc Mean_PRICE Mean_VOLUME Total_Cost Total_Profit Optimal_Price
1   24   2.664951    3700.173   3700.173     6160.607      2.664951

There are few important things that we can infer from the above table:

  • Optimal price point which results in max profit is $2.664/Unit
  • Max profit that can be achieved is $6160.60
  • Total cheese units sold as a result is 3700.17
  • The effective discount that results in maximum impact is 24%


My Channel

Web Scraping Tutorial 4- Getting the busy information data from Popular time page from Google

Popular Times Popular Times In this blog we will try to scrape the ...