Replace NA values in R
Parag Verma
28th Feb, 2022
Introduction
Sometimes we need to replace the NA values in a data frame with the mean of the corresponding column. The following strategy can be used to replace unwanted values with a preferred value for a given column:
- Step 1: Convert all unwanted values to NA in every column
- Step 2: Apply a custom function that replaces the NA values
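The first step of this strategy can be sketched as follows. The sentinel value -999 and the data frame name raw are hypothetical, chosen only for illustration; substitute whatever counts as an unwanted value in your own data.

```r
# Hypothetical data: -999 stands in for an "unwanted" value
raw <- data.frame(Var1 = c(1, -999, 3),
                  Var2 = c(-999, 2, 4))

# raw == -999 builds a logical matrix that flags every matching cell;
# indexing with it sets those cells to NA in all columns at once
raw[raw == -999] <- NA

raw
```

Once the unwanted values are NA, any NA-handling function can treat them uniformly.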
# Install any missing packages, then load them all
package.name <- c("dplyr", "tidyr", "stringr")
for (i in package.name) {
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
  }
  library(i, character.only = TRUE)
}
Step 1: Creating a data frame with NA values
df <- data.frame(Var1 = c(1, 2, 3, NA, NA, 4, 5),
                 Var2 = c(5, 2, 1, NA, NA, 4, NA))
head(df)
  Var1 Var2
1    1    5
2    2    2
3    3    1
4   NA   NA
5   NA   NA
6    4    4
Step 2: Creating a function that replaces NA with the column mean
replace_NA <- function(x) {
  # Positions of the missing values
  pos <- which(is.na(x))
  # Fill them with the mean of the non-missing values
  x[pos] <- mean(x, na.rm = TRUE)
  return(x)
}
# Creating a vector for testing
x<-c(1,2,3,NA)
# Testing the function on x
replace_NA(x)
[1] 1 2 3 2
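One edge case is worth knowing about: when every value in a vector is NA, mean(x, na.rm = TRUE) returns NaN, so replace_NA would fill the vector with NaN. A minimal sketch of a guarded variant follows; the name replace_NA_safe is hypothetical, and returning the vector unchanged is just one possible way to handle this case.

```r
replace_NA_safe <- function(x) {
  # With no observed values there is no mean to fill with, so return x unchanged
  if (all(is.na(x))) {
    return(x)
  }
  # Otherwise fill the missing positions with the mean of the observed values
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

replace_NA_safe(c(1, 2, 3, NA))          # the NA is filled with the mean of 1, 2, 3
replace_NA_safe(c(NA_real_, NA_real_))   # stays all NA instead of becoming NaN
```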
Step 3: Using the apply family of functions to replace NA across all columns
# MARGIN = 2 applies replace_NA to each column
df2 <- apply(df, 2, replace_NA)
df2
     Var1 Var2
[1,]    1    5
[2,]    2    2
[3,]    3    1
[4,]    3    3
[5,]    3    3
[6,]    4    4
[7,]    5    3
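Note that apply() returns a matrix, as the [1,] row labels in the output show, not a data frame. If the result should stay a data frame, one option is lapply() with empty-bracket assignment. The sketch below repeats the data and the helper so it runs on its own; df3 is a hypothetical name used only here.

```r
# Same data and helper as above, repeated so the snippet is self-contained
df <- data.frame(Var1 = c(1, 2, 3, NA, NA, 4, 5),
                 Var2 = c(5, 2, 1, NA, NA, 4, NA))

replace_NA <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Assigning into df3[] replaces the column contents but keeps the data.frame class
df3 <- df
df3[] <- lapply(df3, replace_NA)

class(df3)
df3
```

With dplyr loaded, mutate(df, across(everything(), replace_NA)) should give an equivalent data frame.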