In this blog post, we will look at how blank values in a data frame can be replaced with the mean of the corresponding column
Creating the data frame
In [1]:
import numpy as np
import pandas as pd
In [55]:
df=pd.DataFrame(np.array([np.random.rand(5),
np.random.rand(5)])).T
df.columns=['Var1','Var2']
df
Out[55]:
| | Var1 | Var2 |
|---|---|---|
| 0 | 0.792497 | 0.938109 |
| 1 | 0.738229 | 0.040128 |
| 2 | 0.977057 | 0.700718 |
| 3 | 0.844219 | 0.327364 |
| 4 | 0.218864 | 0.942333 |
Introducing blank values
In [56]:
df2=pd.DataFrame(np.array(['']*2*3).reshape(2,3)).T
df2.columns=df.columns
df2
Out[56]:
| | Var1 | Var2 |
|---|---|---|
| 0 | | |
| 1 | | |
| 2 | | |
In [57]:
df3=pd.concat([df,df2])
df3
Out[57]:
| | Var1 | Var2 |
|---|---|---|
| 0 | 0.792497 | 0.938109 |
| 1 | 0.738229 | 0.040128 |
| 2 | 0.977057 | 0.700718 |
| 3 | 0.844219 | 0.327364 |
| 4 | 0.218864 | 0.942333 |
| 0 | | |
| 1 | | |
| 2 | | |
Replacing blank values with NaN
In [58]:
# Anchor the pattern with $ so that only empty or whitespace-only strings match
df4=df3.replace(r'^\s*$',np.nan,regex=True)
df4
Out[58]:
| | Var1 | Var2 |
|---|---|---|
| 0 | 0.792497 | 0.938109 |
| 1 | 0.738229 | 0.040128 |
| 2 | 0.977057 | 0.700718 |
| 3 | 0.844219 | 0.327364 |
| 4 | 0.218864 | 0.942333 |
| 0 | NaN | NaN |
| 1 | NaN | NaN |
| 2 | NaN | NaN |
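After this step the columns still hold a mix of Python objects, which is why later outputs report `dtype: object`. As a minimal sketch of one alternative, `pd.to_numeric` with `errors="coerce"` can turn blank strings into NaN and produce numeric columns in a single pass. The frame `raw` below is a hypothetical stand-in for `df3`, not the data used in this post.

```python
import pandas as pd

# Hypothetical sample mirroring df3: numbers mixed with blank strings.
raw = pd.DataFrame({"Var1": [0.5, 1.5, "", ""],
                    "Var2": [2.0, "", 4.0, ""]})

# errors="coerce" maps every value that cannot be parsed as a number to NaN,
# and the resulting columns are numeric rather than object.
numeric = raw.apply(pd.to_numeric, errors="coerce")

print(numeric.dtypes)
print(numeric)
```

Whether this is preferable to the regex replacement depends on the data: coercion also turns any other non-numeric text into NaN, which may or may not be wanted.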
Creating the function that replaces missing/NaN values with the column mean
In [59]:
s1=df4['Var1']
s1
Out[59]:
0    0.792497
1    0.738229
2    0.977057
3    0.844219
4    0.218864
0         NaN
1         NaN
2         NaN
Name: Var1, dtype: object
In [60]:
np.mean(s1)
Out[60]:
0.714173345604612
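The mean above comes out as a number even though `s1` contains NaN. A likely explanation is that `np.mean` on a pandas Series hands the work to `Series.mean`, which skips NaN by default, whereas the same call on a plain NumPy array does not. The sketch below illustrates the difference on a small hypothetical Series `s`.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value.
s = pd.Series([1.0, 3.0, np.nan])

# On a Series, the mean is computed over the non-missing values only.
mean_of_series = np.mean(s)

# On the underlying NumPy array, the NaN propagates into the result.
mean_of_array = np.mean(s.to_numpy())

print(mean_of_series, mean_of_array)
```

This matters when the function is reused on NumPy arrays directly; there, `np.nanmean` would be the NaN-skipping counterpart.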
In [61]:
s2=np.where(s1.isnull(),np.mean(s1),s1)
s2
Out[61]:
array([0.7924972356442854, 0.7382294219780061, 0.9770568854898789, 0.8442190494351298, 0.21886413547576056, 0.714173345604612, 0.714173345604612, 0.714173345604612], dtype=object)
In [62]:
# Now creating the function
def Replace_NaN(x):
    # x is a single column (Series); fill its NaN entries with the column mean
    y=np.where(x.isnull(),np.mean(x),x)
    return y
Applying the function on the entire dataset
In [64]:
df5=df4.apply(Replace_NaN)
df5
Out[64]:
| | Var1 | Var2 |
|---|---|---|
| 0 | 0.792497 | 0.938109 |
| 1 | 0.738229 | 0.040128 |
| 2 | 0.977057 | 0.700718 |
| 3 | 0.844219 | 0.327364 |
| 4 | 0.218864 | 0.942333 |
| 0 | 0.714173 | 0.58973 |
| 1 | 0.714173 | 0.58973 |
| 2 | 0.714173 | 0.58973 |
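For comparison, pandas also offers a built-in route to the same kind of result. The following is a minimal sketch, not the method used above: it assumes the columns are already numeric, and the frame `sample` is a hypothetical example rather than the data from this post.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame with one missing value per column.
sample = pd.DataFrame({"Var1": [1.0, 3.0, np.nan],
                       "Var2": [2.0, np.nan, 6.0]})

# DataFrame.mean() skips NaN by default and returns one mean per column;
# fillna then matches those means to the columns by label.
filled = sample.fillna(sample.mean())

print(filled)
```

Here the missing entry in `Var1` is filled with the mean of 1.0 and 3.0, and the one in `Var2` with the mean of 2.0 and 6.0.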