Price Volume Analysis in R
Parag Verma
18th Dec, 2022
Introduction
In almost all the business scenarios, we are often asked about the extent to which we can alter the input such as price, marketing, sales rep visit etc to control the output(Sales, Prescription,etc).
For instance, consider the price reduction as a tactic. We all know that reducing the price increase the product purchase volume. But can be say that the price can be decreased indefinitely to get the same increase in volume ?…Answer would be a NO . In this blog we will try to establish a relationship between Price and Volume and how Price acts as a lever on Volume.This will help us answer the ‘What if’ question.
To get a hang of things, we can look at the Demand curve which depicts the relationship between price of a certain commodity and quantity that is demanded at that price.
Demand curve are often used with the supply curve to find the equilibrium price. This is the seller’s price at which the buyer is willing to purchase the goods.So both parties are willing and agree on the price point and hence the equilibrium.
The above curve holds for price quantity relationship for an individual consumer(individual demand curve) or for all consumers in a particular market(market demand curve).
One important thing to consider here is that the relationship between price and quantity is seldom linear and doesnt follow straight line as shown in the figure.We will use certain models that can factor in this non-linearity in the relationship.
In this blog, we will create a framework to do the following:
- Establish a relationship between Price and Volume using regression model
- Introduce non-linear relationship to get the yield curves
- Estimate Volume for different values of Price
For our analysis, we will be using the cheese data set from bayesm library.
Step 0: Installing libraries
package.name<-c("dplyr","data.table","stats","bayesm",
"ggplot2","hrbrthemes","plotly")
for(i in package.name){
if(!require(i,character.only = T)){
install.packages(i)
}
library(i,character.only = T)
}
Step 1: Creating a Simple Illustration
Importing the cheese data set
data("cheese")
df<-cheese
head(df)
RETAILER VOLUME DISP PRICE
1 LOS ANGELES - LUCKY 21374 0.16200000 2.578460
2 LOS ANGELES - RALPHS 6427 0.12411257 3.727867
3 LOS ANGELES - VONS 17302 0.10200000 2.711421
4 CHICAGO - DOMINICK 13561 0.02759109 2.651206
5 CHICAGO - JEWEL 42774 0.09061273 1.986674
6 CHICAGO - OMNI 4498 0.00000000 2.386616
The attributes are as follows:
- RETAILER:List of Retailers
- VOLUME:Number of units sold
- DISP:A Measure of advertising display activity
- PRICE:Unit Price in Dollars
For each retailer, the data captures weekly sales of cheese. Here the dependent variable \(y\) is the units sold or VOLUME and PRICE is independent variable \(x\). For sake of explanation, lets look at regression model for only one of the retailers such as ATLANTA - KROGER CO.
If you unable to run the below piece of code, then you can download the cheese dataset form my github repo repository.
Step 2: Relationship between Price and Volume for each retailer
In theory, as price decreases volume increases.Since our dataset is at a retailer level, hence it makes sense to look at the correlation between PRICE and VOLUME at a retailer level first.
Lets get the correlation coefficient of PRICE and VOLUME for individual retailers
cor.df<-df%>%
group_by(RETAILER)%>%
summarise(correlation_coefficient=cor(VOLUME,PRICE))%>%
ungroup()%>%
arrange(correlation_coefficient)
`summarise()` ungrouping output (override with `.groups` argument)
head(cor.df)
# A tibble: 6 x 2
RETAILER correlation_coefficient
<fct> <dbl>
1 SACRAMENTO - RALEYS -0.937
2 BUFFALO/ROCHESTER - TOPS MARKETS -0.930
3 CHICAGO - DOMINICK -0.927
4 BALTI/WASH - GIANT FOOD INC -0.909
5 CHICAGO - OMNI -0.903
6 COLUMBUS,OH - BIG BEAR -0.888
We can see that “SACRAMENTO - RALEYS” has the most negative correlation between PRICE and VOLUME. Hence to understand the concept further, we will take the data for this retailer into consideration
Step 2: Creating log of Price for SACRAMENTO - RALEYS
Lets filter the data for SACRAMENTO - RALEYS and create log of price
retailer<-"SACRAMENTO - RALEYS"
interim.df<-df%>%
filter(RETAILER==retailer)%>%
mutate(log_price=ifelse(PRICE==0,0,log(PRICE)))%>%
select(-RETAILER,-DISP)
head(interim.df)
VOLUME PRICE log_price
1 1538 3.719766 1.313661
2 1302 3.777266 1.329000
3 1551 3.823340 1.341124
4 1506 3.839973 1.345465
5 1643 3.853317 1.348934
6 1477 3.852404 1.348697
Reason for Taking Log - In the next section you
will observe from the scatterplot that PRICE and VOLUME are not linearly
related(we cant draw a striaght line to define their
relationship).Moreover,in most practical cases they are not linearly
related.
Log transfformation has two utilitites:
- Model non-linearity in relationship
- Perform sensitivity/diminishing return analyses
Step 3: Basic Summary Stats for PRICE and VOLUME
Lets look at some of the basic stats for PRICE using box plot
retailer<-"SACRAMENTO - RALEYS"
ggplot(interim.df%>%
mutate(Retailer=retailer), aes(x=Retailer, y=PRICE)) +
geom_boxplot(color="red",
fill="orange",
alpha=0.2,
notch=FALSE)+
stat_summary(fun="mean",color="blue")+
theme_bw()+
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank()) +
coord_cartesian(ylim = c(2, 4))
Warning: Removed 1 rows containing missing values (geom_segment).
Observations from the Plot:
- Median value is around 3.7
- Mean value is around 3.5(blue dot)
- Most of the PRICE values are between 3.3 and 3.8
- There are few lower values around 2.5
Basic stats for VOLUME using box plot
retailer<-"SACRAMENTO - RALEYS"
ggplot(interim.df%>%
mutate(Retailer=retailer), aes(x=Retailer, y=VOLUME)) +
geom_boxplot(color="black",
fill="blue",
alpha=0.2,
notch=FALSE)+
stat_summary(fun="mean",color="red")+
theme_bw()+
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank())+
scale_y_continuous(n.breaks = 10)
Warning: Removed 1 rows containing missing values (geom_segment).
Observations from the Plot:
- Median value is around 1700 units
- Mean value is around 2000(red dot)
- Most of the PRICE values are between 1500 and 2300
- There are few values greater than 3500
Step 4: Scatterplot of PRICE Vs VOLUME
In order to model the relationship between Price and Volume, we need to first visualize this using a scatterplot
ggplot(interim.df, aes(x=VOLUME, y=PRICE)) +
geom_point(color='red')+
geom_smooth()+
theme_classic()
`geom_smooth()` using method = 'loess' and formula 'y ~ x'
It can be seen that there is a negative relationship between PRICE and VOLUME.We can draw the following observations
- As PRICE increase, VOLUME decreases
- The relationship is non-linear, meaning that it is not possible to fit a straight line in the scatterplot. Instead we can draw a smooth curve to define the relationship
Based on the above chart, we can say that instead of establishing a relationship of PRICE Vs VOLUME, we should focus on log(PRICE) vs VOLUME.
Step 5: Key steps to develop the Model
- We will create a regression model of the type y ~ log(x)
- y will be VOLUME
- x will be log_price
- Aim should be to get a good \(R^2\) so that the model explains the variance in VOLUME to a larger extent
- once the model is created,we will get a non-linear relationship between VOLUME and PRICE.This is the key to analyzing the diminishing return of PRICE reduction
- To get the diminishing return point,We will then decrease the PRICE by 1% in each step and subsequently record the mean value of VOLUME. This will be plotted to get the demand and supply curves
Step 6: Developing the Regression Model
model1 <- lm(VOLUME ~ log_price, data=interim.df)
summary(model1)
Output of Regression:
- Equation obtained from the model is
VOLUME = 9430 - 5907*log_price - Model is significant (p value at the bottom is less than 0.05)
- \(R^2\) of 88.9% is very good
- Impact of log_price on VOLUME is negative and significant indicated by Pr(>|t|) less than 0.05
As next steps, we will focus on doing the sensitivity analyses.Here we will reduce the value of PRICE(and hence log_price) by 1% all the way up-to 90% and get the corresponding predicted VOLUME based on the equation( VOLUME = 9430 - 5907*log_price) obtained.
Step 7: Sensitivity Analyses: An Illustrative Example
Lets look at how we plan to decrease the value of PRICE in steps of 1% each all the way up-to 90% and record changes in VOLUME.To get a hang of things, lets start by decreasing the PRICE by 1% for the first iteration and get the predicted VOLUME using the model
perc<-1/100
test.df<-interim.df%>%
select(-log_price)%>%# removing the original log_price
mutate(decreased_PRICE= (1 - perc)*PRICE)%>% # reducing by 1%
mutate(log_price=ifelse(decreased_PRICE==0,0,log(decreased_PRICE))) # taking log
head(test.df)
VOLUME PRICE decreased_PRICE log_price
1 1538 3.719766 3.682568 1.303610
2 1302 3.777266 3.739493 1.318950
3 1551 3.823340 3.785107 1.331074
4 1506 3.839973 3.801573 1.335415
5 1643 3.853317 3.814784 1.338884
6 1477 3.852404 3.813880 1.338647
Now lets apply the model on this test.df with PRICE reduced by 1% to get predicted VOLUME
test_prediction<-predict(model1,test.df)
test_prediction[1:10]
1 2 3 4 5 6 7 8
1729.060 1638.435 1566.809 1541.163 1520.669 1522.069 2400.203 1519.777
9 10
1696.512 1728.311
Getting the mean of PRICE and predicted VOLUME
test_collate<-data.frame(Mean_PRICE=mean(test.df$decreased_PRICE),
Mean_VOLUME=mean(test_prediction))
test_collate
Mean_PRICE Mean_VOLUME
1 3.471449 2138.214
So we can summaries the above steps:
- Decrease the PRICE by 1%
- Get the new log_price
- Get Predicted VOLUME using the regression model
- Get the mean of PRICE(decreased) and predicted VOLUME
- Repeat the above 4 steps till we reduce the PRICE by 90%
Step 8: Sensitivity Analyses: Complete Iteration
Having discussed the sensitivity analyses usinng a small exmaple, lets now create a framework to reduce PRICE from 1% till 90% and do the process repeatedly
l1<-list()
for(i in 1:90){
# print(i)
perc<-i/100
test.df<-interim.df%>%
select(-log_price)%>%# removing the original log_price
mutate(decreased_PRICE= (1 - perc)*PRICE)%>% # reducing by 1%
mutate(log_price=ifelse(decreased_PRICE==0,0,log(decreased_PRICE))) # taking log
# Prediction
test_prediction<-predict(model1,test.df)
# Collating the mean of PRICE and VOLUME
test_collate<-data.frame(Perc=i,
Mean_PRICE=mean(test.df$decreased_PRICE),
Mean_VOLUME=mean(test_prediction))
l1[[i]]<-test_collate
}
final.df<-do.call(rbind.data.frame,l1)
head(final.df)
Perc Mean_PRICE Mean_VOLUME
1 1 3.471449 2138.214
2 2 3.436384 2198.193
3 3 3.401319 2258.787
4 4 3.366254 2320.009
5 5 3.331189 2381.872
6 6 3.296124 2444.390
Lets write the file to a csv in the drive
write.csv(final.df,"final_file_SACRAMENTO - RALEYS.csv",row.names = F)
Step 9: Plotting the output results
Lets plot the mean price and mean volume.
p <- final.df %>%
ggplot( aes(x=Mean_PRICE,y=Mean_VOLUME )) +
geom_area(fill="#69b3a2", alpha=0.5) +
geom_line(color="#69b3a2") +
ggtitle("Predicted VOLUME for different values of PRICE") +
ylab("Mean_VOLUME") +
theme_ipsum()
ggplotly(p)
Key Observations
- As Price decreases, the Volume increases(non-linearly) as shown in the figure
- One can argue that reducing the price to a very low value will lead to increase in Volume but that would increase the cost of producing one unit of Cheese and hence that would not be feasible.If we had the cost data, we could have obtained an optimal price point which makes the operation of producing cheese profitable
- The model enables us to vary Price and obtain the corresponding value of Volume and therefore provides a sense of different possibilities of using Price as a lever on Volume
Finding Optimal Price Point
Since we have the relationship between Price and Volume, the next business question would be identify the optimal price at which we can have the maximum profit. Now Profit calculation would require us to input Cost/Quantity sold information into the equation. For the sake of the analyses, lets assume that the Cost/Quantity sold is $1. Now based on this assumption, lets calculate the optimal price point
Note that the final.df that we created in the above analysis has different volume information for various price points. We can use the same data frame and create an additional columns for:
- Total Cost
- Total Profit
profit.df<-final.df%>%
mutate(Total_Cost=1*Mean_VOLUME)%>%
mutate(Total_Profit= Mean_PRICE * Mean_VOLUME - Total_Cost)%>%
filter(Total_Profit > 0)# removing negative values of Profit
head(profit.df)
Perc Mean_PRICE Mean_VOLUME Total_Cost Total_Profit
1 1 3.471449 2138.214 2138.214 5284.488
2 2 3.436384 2198.193 2198.193 5355.643
3 3 3.401319 2258.787 2258.787 5424.068
4 4 3.366254 2320.009 2320.009 5489.730
5 5 3.331189 2381.872 2381.872 5552.593
6 6 3.296124 2444.390 2444.390 5612.621
Now lets plot Mean_PRICE and Total_Profit to get a sense of the optimal price point
p2 <- profit.df %>%
ggplot( aes(x=Mean_PRICE,y=Total_Profit )) +
geom_area(fill="#69b3a2", alpha=0.5) +
geom_line(color="#69b3a2") +
ggtitle("Total Profit for different values of PRICE") +
ylab("Total_Profit") +
theme_ipsum()
ggplotly(p2)
Getting the price point for maximum profit
profit.df%>%
mutate(Optimal_Price=Mean_PRICE[which.max(Total_Profit)])%>%
mutate(Flag=ifelse(Optimal_Price == Mean_PRICE,1,0))%>%
filter(Flag==1)%>%
select(-Flag)
Perc Mean_PRICE Mean_VOLUME Total_Cost Total_Profit Optimal_Price
1 24 2.664951 3700.173 3700.173 6160.607 2.664951
There are few important things that we can infer from the above table:
- Optimal price point which results in max profit is $2.664/Unit
- Max profit that can be achieved is $6160.60
- Total cheese units sold as a result is 3700.17
- The effective discount that results in maximum impact is 24%