Pivot Multiple Columns in R
Parag Verma
31st May, 2021
Introduction
There are certain cases where we need to create a pivot/widen the data based on multiple features.This cant be done simply and requires a two step approach.We will look at this using Medicaid dataset present in the AER package.
package.name<-c("dplyr","tidyr","stringr","AER")
for(i in package.name){
if(!require(i,character.only = T)){
install.packages(i)
}
library(i,character.only = T)
}
data("Medicaid1986")
df<-Medicaid1986
head(df)
visits exposure children age income health1 health2 access married gender
1 0 100 1 24 14.500 0.495 -0.854 0.50 no female
2 1 90 3 19 6.000 0.520 -0.969 0.17 no female
3 0 106 4 17 8.377 -1.227 0.317 0.42 no female
4 0 114 2 29 6.000 -1.524 0.457 0.33 no female
5 11 115 1 26 8.500 0.173 -0.599 0.67 no female
6 3 102 1 22 6.000 -0.905 0.062 0.25 no female
ethnicity school enroll program
1 cauc 13 yes afdc
2 cauc 11 yes afdc
3 cauc 12 yes afdc
4 cauc 12 yes afdc
5 cauc 16 yes afdc
6 other 12 yes afdc
The following columns are categorical in nature:
- married
- gender
- ethnicity
- enroll
- program
Lets say we want to pivot the dataset based on the above features.In order to do this, we need to use the following two step approach:
- Step 1: unpivot the data on the mentioned categorical data
- Step 2: then pivot the data based on the key:value pair obtained
Step 1: unpivot the data on the mentioned categorical data
interim.df<-df%>%
mutate(Index=1:n())%>%
gather(key = "attributes",value = "values",
c("married","gender","ethnicity","enroll","program"))%>%
mutate(key_value=str_c(attributes ,"_",values))%>%
select(Index,everything())%>%
unique()
Warning: attributes are not identical across measure variables;
they will be dropped
head(interim.df)
Index visits exposure children age income health1 health2 access school
1 1 0 100 1 24 14.500 0.495 -0.854 0.50 13
2 2 1 90 3 19 6.000 0.520 -0.969 0.17 11
3 3 0 106 4 17 8.377 -1.227 0.317 0.42 12
4 4 0 114 2 29 6.000 -1.524 0.457 0.33 12
5 5 11 115 1 26 8.500 0.173 -0.599 0.67 16
6 6 3 102 1 22 6.000 -0.905 0.062 0.25 12
attributes values key_value
1 married no married_no
2 married no married_no
3 married no married_no
4 married no married_no
5 married no married_no
6 married no married_no
Step 2: then pivot the data based on the key:value pair obtained
final.df<-interim.df%>%
mutate(key_value=str_c(attributes ,"_",values))%>%
mutate(TotalCount=1)%>%
select(-attributes,-values)%>% # Most important step
spread(key_value,TotalCount,fill=NA)
head(final.df)
Index visits exposure children age income health1 health2 access school
1 1 0 100 1 24 14.500 0.495 -0.854 0.50 13
2 2 1 90 3 19 6.000 0.520 -0.969 0.17 11
3 3 0 106 4 17 8.377 -1.227 0.317 0.42 12
4 4 0 114 2 29 6.000 -1.524 0.457 0.33 12
5 5 11 115 1 26 8.500 0.173 -0.599 0.67 16
6 6 3 102 1 22 6.000 -0.905 0.062 0.25 12
enroll_no enroll_yes ethnicity_cauc ethnicity_other gender_female gender_male
1 NA 1 1 NA 1 NA
2 NA 1 1 NA 1 NA
3 NA 1 1 NA 1 NA
4 NA 1 1 NA 1 NA
5 NA 1 1 NA 1 NA
6 NA 1 NA 1 1 NA
married_no married_yes program_afdc program_ssi
1 1 NA 1 NA
2 1 NA 1 NA
3 1 NA 1 NA
4 1 NA 1 NA
5 1 NA 1 NA
6 1 NA 1 NA
final.df contains the pivotted dataset or the wide form dataset.
Final Note
We saw how it is extremely easy to create a pivoted or a wide form dataset using multiple keys in R
Link to Previous Blogs
My Youtube Channel
List of Datasets for Practise
https://vincentarelbundock.github.io/Rdatasets/datasets.html
No comments:
Post a Comment