barplot using plotly library
Parag Verma
Introduction
Strong visualization is very important in any data science activity. Conveying the results to the target audience and ensuring that key insights come out in the form of a story is very decisive and can determine the effectiveness of the model building process.In this regard, I introduce you to a very powerful visualization package in R by the name plotly. There are tonnes of features in the library that enhances the look and feel of the visualization and adds value to interpretation. In this blog, we will look at how to create a simple barplot using plotly library
Installing libraries
Lets install plotly and other libraries used to create the plot
package.name<-c("dplyr","tidyr","Ecdat","plotly")
for(i in package.name){
if(!require(i,character.only = T)){
install.packages(i)
}
library(i,character.only = T)
}
# Ecdat package has the 'Health Insurance and Hours Worked By Wives' data
data(HI)
df<-HI
head(df)
whrswk hhi whi hhi2 education race hispanic experience kidslt6 kids618
1 0 no no no 13-15years white no 13.0 2 1
2 50 no yes no 13-15years white no 24.0 0 1
3 40 yes no yes 12years white no 43.0 0 0
4 40 no yes yes 13-15years white no 17.0 0 1
5 0 yes no yes 9-11years white no 44.5 0 0
6 40 yes yes yes 12years white no 32.0 0 0
husby region wght
1 11.960 northcentral 214986
2 1.200 northcentral 210119
3 31.275 northcentral 219955
4 9.000 northcentral 210317
5 0.000 northcentral 219955
6 15.690 northcentral 208148
Step 1:Frequency Profile of the variables
Lets look at the count of records for different levels of categorical variables
interim.df<-df%>%
select(hhi,whi,hhi2,education,race,hispanic,kidslt6,kids618,region)
l1<-lapply(colnames(interim.df),function(x){
z<-interim.df%>%
select(x)%>%
mutate(Feature=x)
colnames(z)<-c("Level","Feature")
z1<-z%>%
group_by(Feature,Level)%>%
summarise(Total=n())
z1["Level"]<-sapply(z1["Level"],as.character)
return(z1)
})
df.final<-do.call(rbind.data.frame,l1)%>%
as.data.frame()
row.names(df.final)<-NULL
head(df.final)
Feature Level Total
1 hhi no 11219
2 hhi yes 11053
3 whi no 13961
4 whi yes 8311
5 hhi2 no 8696
6 hhi2 yes 13576
Step 2:Dataset for ‘education’ variable
df.interim<-df.final%>%
filter(Feature=="education")%>%
select(-Feature)
df.interim
Level Total
1 <9years 1122
2 9-11years 1771
3 12years 8677
4 13-15years 5790
5 16years 3472
6 >16years 1440
Step 3:Initialising the plotly object
barplt <- df.interim %>% plot_ly()
barplt <- barplt %>% add_trace(x = df.interim$Level, y = df.interim$Total, type = 'bar',text=paste0(round(df.interim$Total,2)),textposition="Outside",
marker = list(color = 'Orange',
line = list(color = 'Orange', width = 1.5)))
barplt <- barplt %>% layout(title = "<b>Frequency Profile of Education",
barmode = 'group',
xaxis = list(title = "Age Bracket"),
yaxis = list(title = "Record Count"),
autosize=F,width = 500,
margin = list(l = 50, r = 50, b = 50, t = 50, pad = 4))
Warning: Specifying width/height in layout() is now deprecated.
Please specify in ggplotly() or plot_ly()
barplt
Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
Please use `arrange()` instead.
See vignette('programming') for more help
This warning is displayed once every 8 hours.
Call `lifecycle::last_warnings()` to see where this warning was generated.
Link to Previous R Blogs
List of Datasets for Practise
https://hofmann.public.iastate.edu/data_in_r_sortable.html
https://vincentarelbundock.github.io/Rdatasets/datasets.html