Running counter within groupby in R
Parag Verma
7th March, 2022
Introduction
There are certain situation where we are required to create a running counter either in ascending order or descending order to filter out certain records in a group by condition.For example:
- Identify the top N salaries of each department in a company
- Identify first and the last value from each group
In this blog we will look at how dplyr can be used to generate such summaries
package.name<-c("dplyr","stringr")
for(i in package.name){
if(!require(i,character.only = T)){
install.packages(i)
}
library(i,character.only = T)
}
Step 1: Creating a data frame
We are creating a dummy data frame with details of daily Sales for 100 products
df<-data.frame(Product_Line=sample(c("Prod1","Prod2","Prod3"),10,T),
Sales=rnorm(10,15,6)
)
head(df)
Product_Line Sales
1 Prod2 27.86048
2 Prod2 11.10242
3 Prod1 24.76522
4 Prod3 18.75677
5 Prod3 13.61866
6 Prod1 19.36591
Step 2: Creating a running counter based on Product groupby and Sales
df2<-df%>%
arrange(Product_Line,Sales)%>%
group_by(Product_Line)%>%
mutate(Index1=1:n(),
Index2=n():1)
df2
# A tibble: 10 x 4
# Groups: Product_Line [3]
Product_Line Sales Index1 Index2
<chr> <dbl> <int> <int>
1 Prod1 19.4 1 2
2 Prod1 24.8 2 1
3 Prod2 11.1 1 5
4 Prod2 16.6 2 4
5 Prod2 17.7 3 3
6 Prod2 19.1 4 2
7 Prod2 27.9 5 1
8 Prod3 13.6 1 3
9 Prod3 15.0 2 2
10 Prod3 18.8 3 1
Index 1 is counter based on Product and increasing sales whereas Index2 is a counter based on Product and decreasing sales.
Step 3: Highest Sales for each Product
sales_highest<-df2%>%
filter(Index2==1)
sales_highest
# A tibble: 3 x 4
# Groups: Product_Line [3]
Product_Line Sales Index1 Index2
<chr> <dbl> <int> <int>
1 Prod1 24.8 2 1
2 Prod2 27.9 5 1
3 Prod3 18.8 3 1
Step 4: Lowest Sales for each Product
sales_lowest<-df2%>%
filter(Index1==1)
sales_lowest
# A tibble: 3 x 4
# Groups: Product_Line [3]
Product_Line Sales Index1 Index2
<chr> <dbl> <int> <int>
1 Prod1 19.4 1 2
2 Prod2 11.1 1 5
3 Prod3 13.6 1 3