Sometimes, when a dataset is imported from a CSV or Excel file, a numeric column is read as strings. In that case, we need to convert those columns to a numeric data type before we can do arithmetic with them. In this blog, we will look at how to do that.
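To see how this happens, here is a minimal sketch that reads a small in-memory CSV. The data and the `-` placeholder are made up for illustration. A single non-numeric token is enough to make pandas keep the whole column as strings. Passing `na_values` to `read_csv` tells it to treat that token as missing, so the column is parsed as numbers at import time.

```python
import io

import pandas as pd

# Hypothetical CSV: the "-" placeholder in the amount column is not a number
csv_text = "id,amount\n1,10\n2,-\n3,30\n"

# Default parsing: "-" is not one of pandas' default missing-value markers,
# so the whole amount column is kept as strings
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.dtypes)

# Declaring "-" as a missing-value marker lets pandas parse the column as numbers
parsed = pd.read_csv(io.StringIO(csv_text), na_values=["-"])
print(parsed.dtypes)
```

This only helps when you know the offending token in advance; otherwise the conversion shown in the steps below is the more general fix.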
Step 1: Importing Libraries

```python
import pandas as pd
import numpy as np
```
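Step 2: Creating the dataset

```python
# Two columns of digits stored as strings.
# np.nan inside a string array is stored as the text 'nan'.
df = pd.DataFrame([np.array(["1", "2", "3"]),
                   np.array(["4", "5", np.nan])]).T
df.columns = ['Value1', 'Value2']
df
```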
| | Value1 | Value2 |
|---|---|---|
| 0 | 1 | 4 |
| 1 | 2 | 5 |
| 2 | 3 | nan |
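One detail in this dataset is easy to miss: `np.array` has to pick a single dtype, so putting `np.nan` next to strings turns it into the text `'nan'`. That is why the table shows a lowercase `nan`, and why pandas does not yet count it as missing. A quick check:

```python
import numpy as np
import pandas as pd

# Mixing strings and np.nan makes NumPy choose a string dtype for the whole array
arr = np.array(["4", "5", np.nan])
print(arr.dtype.kind)  # 'U' means a Unicode string dtype

# The third element is now the text 'nan', not a floating-point NaN
print(arr[2], type(arr[2]))

# pandas therefore does not treat it as a missing value
print(pd.isna(arr[2]))
```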
Step 3: Checking the data types of the columns
```python
df.dtypes
```

```
Value1    object
Value2    object
dtype: object
```
We can see that even though the Value1 and Value2 columns contain numbers, they are stored as strings (the `object` dtype). If we want to add these two columns, we first need to convert them to a numeric data type.
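To make this concrete, here is what `+` does while the columns are still strings: it concatenates the text instead of adding the numbers. The sketch rebuilds the same values from plain Python lists rather than NumPy arrays, only to keep it short.

```python
import pandas as pd

# Same values as the example dataset, stored as strings
df = pd.DataFrame({"Value1": ["1", "2", "3"], "Value2": ["4", "5", "nan"]})

# String "addition" joins the text of each row instead of summing
concatenated = df["Value1"] + df["Value2"]
print(concatenated.tolist())  # ['14', '25', '3nan']
```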
Step 4: Converting to a numeric data type
```python
cols = ['Value1', 'Value2']
# Parse each column as numbers; errors='coerce' turns unparseable values into NaN
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
df
```
| | Value1 | Value2 |
|---|---|---|
| 0 | 1 | 4.0 |
| 1 | 2 | 5.0 |
| 2 | 3 | NaN |
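The `errors='coerce'` argument matters when a column contains text that cannot be parsed as a number. With the default `errors='raise'`, `pd.to_numeric` stops with a `ValueError`; with `'coerce'`, the offending entries become `NaN`. A small sketch with a made-up bad value (`'abc'`):

```python
import pandas as pd

s = pd.Series(["1", "2", "abc"])

# Default behaviour (errors="raise"): an unparseable value raises ValueError
raised = False
try:
    pd.to_numeric(s)
except ValueError as exc:
    raised = True
    print("raise:", exc)

# errors="coerce": unparseable values are replaced with NaN
coerced = pd.to_numeric(s, errors="coerce")
print(coerced.tolist())
```

Coercing is convenient, but it also silently hides typos in the data, so it can be worth counting the new `NaN` values after the conversion.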
```python
df.dtypes
```

```
Value1      int64
Value2    float64
dtype: object
```
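Notice that Value2 became `float64` rather than `int64`: a NumPy integer column cannot hold `NaN`, so pandas upcasts a column with missing values to float. If you want whole numbers together with a missing marker, one option (not used in this walkthrough) is pandas' nullable `Int64` dtype:

```python
import numpy as np
import pandas as pd

# Value2 after Step 4: float64 because of the missing entry
value2 = pd.Series([4.0, 5.0, np.nan])

# Nullable integer dtype: keeps the integers and marks the gap as <NA>
value2_int = value2.astype("Int64")
print(value2_int)
print(value2_int.dtype)
```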
Let's add the two columns now.
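```python
def func_1(a, b):
    # Return the sum of the two values
    return a + b

# Apply func_1 row by row (axis=1) and store the result in a new column
df['Value3'] = df.apply(lambda x: func_1(x.Value1, x.Value2), axis=1)
df
```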
| | Value1 | Value2 | Value3 |
|---|---|---|---|
| 0 | 1 | 4.0 | 5.0 |
| 1 | 2 | 5.0 | 7.0 |
| 2 | 3 | NaN | NaN |
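The row-wise `apply` works, but it calls a Python function once per row. Adding the columns directly is vectorized and gives the same result. If you would rather skip missing values than propagate them, `sum(axis=1)` does that by default. A sketch that rebuilds the converted columns (the `Value3_skipna` column name is just for illustration):

```python
import numpy as np
import pandas as pd

# The converted columns from Step 4
df = pd.DataFrame({"Value1": [1, 2, 3], "Value2": [4.0, 5.0, np.nan]})

# Vectorized column addition: NaN propagates, as with func_1 and apply
df["Value3"] = df["Value1"] + df["Value2"]

# Row-wise sum that skips NaN (skipna=True is the default)
df["Value3_skipna"] = df[["Value1", "Value2"]].sum(axis=1)
print(df)
```

Which behaviour is right depends on the data: propagating `NaN` flags incomplete rows, while skipping it treats a missing value as zero.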