Monday, February 28, 2022

Replace NA values with column mean in R

Replace NA values in R


Introduction

There are cases where we need to replace NA values in a data frame with the mean of the corresponding column. The following strategy can be used to replace unwanted values with a preferred value for a given column:

  • Step 1: Convert all unwanted values to NA in every column
  • Step 2: Then use a custom function to replace the NA values

package.name <- c("dplyr", "tidyr", "stringr")

# Install any package that is missing, then load it
for (i in package.name) {
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
    library(i, character.only = TRUE)
  }
}
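The first step of the strategy, converting unwanted values to NA, can be sketched in base R as follows. This is only an illustration: the data frame `raw` and the sentinel values `-999` and `0` are assumptions made up for the example, not values from this post.

```r
# Hypothetical data: -999 and 0 stand in for "unwanted" values
raw <- data.frame(Var1 = c(1, -999, 3, 0),
                  Var2 = c(5, 2, -999, 4))

# Assumed list of values that should be treated as missing
unwanted <- c(-999, 0)

# Replace every unwanted value with NA, column by column;
# assigning into raw[] keeps raw as a data frame
raw[] <- lapply(raw, function(col) {
  col[col %in% unwanted] <- NA
  col
})

raw
```

Once the unwanted values are NA, a single NA-replacement function can handle all of them in the same way.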

Step 1: Creating a data frame with NA values

df<-data.frame(Var1=c(1,2,3,NA,NA,4,5),
               Var2=c(5,2,1,NA,NA,4,NA))
head(df)
  Var1 Var2
1    1    5
2    2    2
3    3    1
4   NA   NA
5   NA   NA
6    4    4


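Step 2: Creating the function that replaces NA with the column mean

replace_NA <- function(x) {
  # Positions of the missing values
  pos <- which(is.na(x))
  # Fill them with the mean of the non-missing values
  x[pos] <- mean(x, na.rm = TRUE)
  return(x)
}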

# Creating a vector for testing
x<-c(1,2,3,NA)

# Testing the function on x
replace_NA(x)
[1] 1 2 3 2


Step 3: Using the apply family of functions to replace NA values across all columns

# Note: apply() returns a matrix, not a data frame
df2 <- apply(df, 2, replace_NA)
df2
     Var1 Var2
[1,]    1    5
[2,]    2    2
[3,]    3    1
[4,]    3    3
[5,]    3    3
[6,]    4    4
[7,]    5    3
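The row labels `[1,]`, `[2,]`, ... in the output show that `apply()` has turned the result into a matrix. If a data frame is needed instead, one possible approach is `lapply()` with assignment into `df3[]`. The sketch below repeats the data frame and function from this post so that it runs on its own; the name `df3` is only an illustrative choice.

```r
# Same data and function as in the steps above
df <- data.frame(Var1 = c(1, 2, 3, NA, NA, 4, 5),
                 Var2 = c(5, 2, 1, NA, NA, 4, NA))

replace_NA <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# apply() coerces its result to a matrix
is.matrix(apply(df, 2, replace_NA))

# lapply() works column by column; assigning into df3[]
# keeps the data frame structure and column names
df3 <- df
df3[] <- lapply(df3, replace_NA)

is.data.frame(df3)
df3
```

This version assumes every column is numeric. For a data frame with character or factor columns, `mean()` would not give a usable value, so those columns would need to be skipped or handled separately.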
