Summarise Multiple Columns
Parag Verma
20th Nov, 2022
Introduction
Lets say we have an excel file with multiple columns for various weeks. In various scenarios, we are required to summarize these columns and report mean/sum/mode values.In this blog we will look at how to create such summaries using combination of across and everything functions
package.name<-c("dplyr")
for(i in package.name){
if(!require(i,character.only = T)){
install.packages(i)
}
library(i,character.only = T)
}
Step 1: Creating a data frame
Lets create a dummy data frame with details of Product Sales across various weeks .
df<-data.frame(Item_Num = c(1:5),
Product = c("A","A","A","B","B"),
Week1_Sales = c(1:5),
Week2_Sales = c(11:15)
)
head(df)
Item_Num Product Week1_Sales Week2_Sales
1 1 A 1 11
2 2 A 2 12
3 3 A 3 13
4 4 B 4 14
5 5 B 5 15
We need to created summary at a Product Level.So for each Product, we need mean sales for each of the three weeks.
Now lets try and get the summary using dplyr and a combination of across and everything functions.
Step 2: Summarizing information at Product Level
Lets try and get the total sales for Product A and B for the three weeks.
df2<-df%>%
select(-Item_Num)%>%
group_by(Product)%>%
summarise(across(everything(),list(sum)))%>%
ungroup()
`summarise()` ungrouping output (override with `.groups` argument)
df2
# A tibble: 2 x 3
Product Week1_Sales_1 Week2_Sales_1
<chr> <int> <int>
1 A 6 36
2 B 9 29
Now lets try and also get the mean of the sales
df2<-df%>%
select(-Item_Num)%>%
group_by(Product)%>%
summarise(across(everything(),list(sum,mean)))%>%
ungroup()
`summarise()` ungrouping output (override with `.groups` argument)
df2
# A tibble: 2 x 5
Product Week1_Sales_1 Week1_Sales_2 Week2_Sales_1 Week2_Sales_2
<chr> <int> <dbl> <int> <dbl>
1 A 6 2 36 12
2 B 9 4.5 29 14.5
The output summary has _1(represent sum which is the first argument in the list function) and _2(represent mean which is the second argument in the list function) extensions. The column names requires some clean up.
Cleaning up ‘1’ suffixes
nm<-colnames(df2)
nm2<-gsub("_1","_Sum",nm)
nm2
[1] "Product" "Week1_Sales_Sum" "Week1_Sales_2" "Week2_Sales_Sum"
[5] "Week2_Sales_2"
Cleaning up ‘2’ suffixes
nm3<-gsub("_2","_Mean",nm2)
nm3
[1] "Product" "Week1_Sales_Sum" "Week1_Sales_Mean" "Week2_Sales_Sum"
[5] "Week2_Sales_Mean"
Assigning nm3 to the colnames for df2
colnames(df2)<-nm3
head(df2)
# A tibble: 2 x 5
Product Week1_Sales_Sum Week1_Sales_Mean Week2_Sales_Sum Week2_Sales_Mean
<chr> <int> <dbl> <int> <dbl>
1 A 6 2 36 12
2 B 9 4.5 29 14.5