Friday, April 1, 2022

Combine multiple data frames from a List into a single data frame


Introduction

There are situations where the results of intermediate steps are stored in a list as data frames and then need to be combined into a single consolidated data frame. Typical scenarios include frequency profiling and analyzing product performance at the end of each week. Let's look at a simple example of how do.call and rbind.data.frame can be used to do this.
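
As a quick illustration of the idea before the full example, here is a minimal sketch in base R. The list pieces and its weekly product figures are made up for illustration and are not part of the starwars example.

```r
# Two small data frames with identical columns, stored in a named list
# (hypothetical toy data).
pieces <- list(
  week1 = data.frame(product = c("A", "B"), units = c(10, 20)),
  week2 = data.frame(product = c("A", "B"), units = c(15, 5))
)

# do.call() passes each list element as a separate argument, so this is
# equivalent to rbind.data.frame(week1 = pieces$week1, week2 = pieces$week2).
combined <- do.call(rbind.data.frame, pieces)

nrow(combined)
```

Because do.call builds the call from the list, the same line works whether the list holds two data frames or two hundred.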


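# Packages used in this post
package.name <- c("dplyr", "stringr")

# Install any package that is missing, then attach it
for (i in package.name) {
  if (!requireNamespace(i, quietly = TRUE)) {
    install.packages(i)
  }
  library(i, character.only = TRUE)
}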


Step 1: Creating the data frame

For this post, we will use the starwars dataset that ships with the dplyr package.

df<-dplyr::starwars
head(df)
# A tibble: 6 x 14
  name     height  mass hair_color  skin_color eye_color birth_year sex   gender
  <chr>     <int> <dbl> <chr>       <chr>      <chr>          <dbl> <chr> <chr> 
1 Luke Sk~    172    77 blond       fair       blue            19   male  mascu~
2 C-3PO       167    75 <NA>        gold       yellow         112   none  mascu~
3 R2-D2        96    32 <NA>        white, bl~ red             33   none  mascu~
4 Darth V~    202   136 none        white      yellow          41.9 male  mascu~
5 Leia Or~    150    49 brown       light      brown           19   fema~ femin~
6 Owen La~    178   120 brown, grey light      blue            52   male  mascu~
# ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#   vehicles <list>, starships <list>


Step 2: Frequency Profile of eye_color and gender

# Variables to profile, and the common name for the column holding their levels
nm <- c("gender", "eye_color")
base.variable <- "Level"
l1 <- list()


Iterating through variables in nm and storing the results in l1

l1 <- lapply(nm, function(x) {

  interim.df <- df %>%
    select(all_of(x)) %>%                       # all_of() makes the external-vector selection explicit
    group_by(!!!syms(x)) %>%                    # group by the column named in x
    summarise(Total_Count = n()) %>%            # frequency of each level
    mutate(Feature = x) %>%                     # record which variable the rows describe
    rename(!!base.variable := all_of(x)) %>%    # give the level column a common name
    select(Feature, all_of(base.variable), Total_Count)

  interim.df

})
names(l1)<-nm

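If the !!! operator in the code above is unfamiliar: syms(x) converts the column names held in x from strings into symbols, and !!! splices those symbols into group_by as if they had been typed directly. To know more about it, visit this Link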
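
To make the splice operator a little more concrete, here is a small sketch that groups by two columns at once. The vector cols and the choice of gender and sex are assumptions made for illustration. The first pipeline builds the grouping from strings, the second types the columns by hand.

```r
library(dplyr)

cols <- c("gender", "sex")   # hypothetical choice of two grouping columns

# syms() turns the strings into symbols; !!! splices them into the call,
# so this behaves like group_by(gender, sex).
spliced <- starwars %>%
  group_by(!!!syms(cols)) %>%
  summarise(Total_Count = n(), .groups = "drop")

# The same grouping written out by hand, for comparison.
typed <- starwars %>%
  group_by(gender, sex) %>%
  summarise(Total_Count = n(), .groups = "drop")

all.equal(spliced, typed)
```

The spliced form is what makes it possible to drive the grouping from a character vector such as nm.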


Step 3: Combining individual data frames in l1 into a single data frame

final.df<-do.call(rbind.data.frame,l1)
final.df
# A tibble: 18 x 3
   Feature   Level         Total_Count
 * <chr>     <chr>               <int>
 1 gender    feminine               17
 2 gender    masculine              66
 3 gender    <NA>                    4
 4 eye_color black                  10
 5 eye_color blue                   19
 6 eye_color blue-gray               1
 7 eye_color brown                  21
 8 eye_color dark                    1
 9 eye_color gold                    1
10 eye_color green, yellow           1
11 eye_color hazel                   3
12 eye_color orange                  8
13 eye_color pink                    1
14 eye_color red                     5
15 eye_color red, blue               1
16 eye_color unknown                 3
17 eye_color white                   1
18 eye_color yellow                 11
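
As an aside, dplyr offers bind_rows as an alternative to do.call with rbind.data.frame. This is a different technique from the one used in this post, sketched here only for comparison. The list small.list is a hypothetical, cut-down stand-in for l1 that reuses a few counts copied from the output above and leaves out the Feature column, so that bind_rows can create it from the list names.

```r
library(dplyr)

# A tiny list shaped like l1 (illustrative subset, without the Feature column).
small.list <- list(
  gender    = tibble(Level = c("feminine", "masculine"), Total_Count = c(17L, 66L)),
  eye_color = tibble(Level = c("pink", "red"),           Total_Count = c(1L, 5L))
)

# bind_rows() accepts a list directly; .id stores each element's name
# in a new column, here called Feature.
combined <- bind_rows(small.list, .id = "Feature")
combined
```

One practical difference: bind_rows fills columns that are missing from some elements with NA, whereas rbind.data.frame stops with an error when the columns do not match.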


Parting Comments

In this post we looked at a very simple example of how do.call and rbind.data.frame can be used to combine multiple data frames in a list into a single consolidated data frame.
