Line Plots
Parag Verma
Introduction
When analysing datasets, it is important to represent summary statistics with appropriate graphs. In this series, we will look at how to create the commonly used line plot with the ggplot2 library. Real-world scenarios are used to work through the details of the implementation.
Installing the libraries: dplyr, tidyr, Ecdat and ggplot2
# Install any missing packages, then attach them
package.name <- c("dplyr", "tidyr", "Ecdat", "ggplot2")
for (i in package.name) {
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
  }
  library(i, character.only = TRUE)
}
# Ecdat package has the 'Health Insurance and Hours Worked By Wives' data
data(HI)
df<-HI
head(df)
  whrswk hhi whi hhi2  education  race hispanic experience kidslt6 kids618
1      0  no  no   no 13-15years white       no       13.0       2       1
2     50  no yes   no 13-15years white       no       24.0       0       1
3     40 yes  no  yes    12years white       no       43.0       0       0
4     40  no yes  yes 13-15years white       no       17.0       0       1
5      0 yes  no  yes  9-11years white       no       44.5       0       0
6     40 yes yes  yes    12years white       no       32.0       0       0
   husby       region   wght
1 11.960 northcentral 214986
2  1.200 northcentral 210119
3 31.275 northcentral 219955
4  9.000 northcentral 210317
5  0.000 northcentral 219955
6 15.690 northcentral 208148
Step 1: Let's calculate the average experience across different regions
interim.df <- df %>%
  select(region, experience) %>%
  group_by(region) %>%
  summarise(AverageExperience = mean(experience))
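To see what the group_by() and summarise() pair does, here is a minimal sketch on a small made-up data frame. The name toy.df and its values are hypothetical and are not taken from the HI data; they are chosen so the group means are easy to check by hand.

```r
library(dplyr)

# Hypothetical toy data, not part of the HI dataset
toy.df <- data.frame(
  region     = c("north", "north", "south", "south", "south"),
  experience = c(10, 20, 5, 10, 15)
)

# One row per region, holding the mean of experience within that region
toy.summary <- toy.df %>%
  group_by(region) %>%
  summarise(AverageExperience = mean(experience))

print(toy.summary)
```

The north rows average to 15 and the south rows to 10, so the summary has one row per region, which is the shape the line plot expects.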
Line Plot to represent the above information
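# group = 1 puts all regions in one group so that geom_line() joins the points with a single line
ggplot(data = interim.df, aes(x = region, y = AverageExperience, group = 1)) +
  geom_line(linetype = "dashed") +
  geom_point(color = "red")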
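The plot can be made easier to read by adding axis labels and a title with labs(). The sketch below rebuilds the summary so that it runs on its own. The label text is only a suggestion, and the "years" unit assumes that experience is measured in years, as the Ecdat documentation for HI describes.

```r
library(dplyr)
library(ggplot2)
library(Ecdat)

data(HI)

# Average experience per region, as in Step 1
interim.df <- HI %>%
  group_by(region) %>%
  summarise(AverageExperience = mean(experience))

# Same line plot, with descriptive labels added (label text is an assumption)
p <- ggplot(data = interim.df, aes(x = region, y = AverageExperience, group = 1)) +
  geom_line(linetype = "dashed") +
  geom_point(color = "red") +
  labs(x = "Region",
       y = "Average experience (years)",
       title = "Average experience by region")

print(p)
```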
Final Comments
The plot above shows how the average of a continuous metric such as experience differs across the levels of a categorical feature such as region. The same approach applies to other features, such as gender or ethnicity.
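As a rough illustration of applying the same pattern to another feature, the sketch below swaps region for the race column that appears in the head(df) output. The names race.df and race.plot are introduced here for illustration only.

```r
library(dplyr)
library(ggplot2)
library(Ecdat)

data(HI)

# Average experience for each level of race
race.df <- HI %>%
  group_by(race) %>%
  summarise(AverageExperience = mean(experience))

# Same plot layers as before, with race on the x axis
race.plot <- ggplot(data = race.df, aes(x = race, y = AverageExperience, group = 1)) +
  geom_line(linetype = "dashed") +
  geom_point(color = "red")

print(race.plot)
```

Only the grouping column and the x aesthetic change; the rest of the pipeline stays the same.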
List of Datasets for Practice
https://hofmann.public.iastate.edu/data_in_r_sortable.html
https://vincentarelbundock.github.io/Rdatasets/datasets.html