Slicing and assignment in a pandas DataFrame can be done easily with .loc. Moreover, a single .loc call is typically faster than chained indexing such as df[mask][col], which becomes important when we are dealing with huge datasets. In this blog post, we will look at how to use .loc to slice and dice data.
In [1]:
import pandas as pd
import numpy as np
Step 1: Creating the data frame
In [50]:
df = pd.DataFrame([np.array(['A', 'B', 'C']),
                   np.array([1, 2, 3]),
                   np.array([4, 5, np.nan])
                   ]).T
df.columns = ['Label', 'Value1', 'Value2']
df
Out[50]:
| | Label | Value1 | Value2 |
|---|---|---|---|
| 0 | A | 1 | 4.0 |
| 1 | B | 2 | 5.0 |
| 2 | C | 3 | NaN |
In [52]:
df.dtypes
Out[52]:
Label     object
Value1    object
Value2    object
dtype: object
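Every column comes out as object here because the frame is built row by row from arrays of different types and then transposed. As a minimal sketch of an alternative, assuming the same toy data, building the frame from a dict of columns lets pandas infer a dtype per column. The name df_alt is made up for this example and is not used in the rest of the post.

```python
import numpy as np
import pandas as pd

# Hypothetical alternative to the transposed construction above:
# one dict entry per column, so each column gets its own inferred dtype.
df_alt = pd.DataFrame({
    'Label': ['A', 'B', 'C'],
    'Value1': [1, 2, 3],
    'Value2': [4, 5, np.nan],
})

# Value1 is inferred as an integer column and Value2 as a float column.
print(df_alt.dtypes)
```

With this construction the explicit numeric conversion would not be needed for this particular data.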
In [54]:
# Convert Value1 and Value2 to numeric; values that cannot be parsed become NaN
df[['Value1', 'Value2']] = df[['Value1', 'Value2']].apply(pd.to_numeric,
                                                          errors="coerce")
In [56]:
df.dtypes
Out[56]:
Label      object
Value1      int64
Value2    float64
dtype: object
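The errors="coerce" argument matters when a column contains values that are not numbers. The sketch below uses a made-up Series (raw is a hypothetical name, not part of the data frame above) to show what happens to an entry that cannot be parsed:

```python
import pandas as pd

# Hypothetical input with one value that cannot be parsed as a number.
raw = pd.Series(['1', '2', 'oops'])

# errors="coerce" replaces unparseable entries with NaN instead of raising.
converted = pd.to_numeric(raw, errors='coerce')

print(converted)
```

Because NaN is a float, the converted Series ends up with a float dtype even though the parseable values look like integers.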
Using .loc to filter on certain rows
In [34]:
df.loc[1:2]
Out[34]:
| | Label | Value1 | Value2 |
|---|---|---|---|
| 1 | B | 2 | 5.0 |
| 2 | C | 3 | NaN |
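Note that the slice 1:2 returned two rows. Slices passed to .loc work on index labels and include the end label, unlike position-based slicing with .iloc. A small sketch of the difference, using a rebuilt copy of the toy frame (df_demo, by_label and by_position are names made up for this example):

```python
import numpy as np
import pandas as pd

# Rebuild the toy frame so that this snippet runs on its own.
df_demo = pd.DataFrame({
    'Label': ['A', 'B', 'C'],
    'Value1': [1, 2, 3],
    'Value2': [4.0, 5.0, np.nan],
})

by_label = df_demo.loc[1:2]      # labels 1 and 2: the end label is included
by_position = df_demo.iloc[1:2]  # position 1 only: the end position is excluded

print(len(by_label), len(by_position))
```

The two happen to look similar here only because the default index labels are the integers 0, 1, 2.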
Using .loc to filter on certain columns
In [35]:
df.loc[:,['Value1','Value2']]
Out[35]:
| | Value1 | Value2 |
|---|---|---|
| 0 | 1 | 4.0 |
| 1 | 2 | 5.0 |
| 2 | 3 | NaN |
In [36]:
df.loc[:,['Value1']]
Out[36]:
| | Value1 |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
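The square brackets around 'Value1' make a difference: a list of column names returns a DataFrame, while a single name without the list returns a Series. A short sketch on a rebuilt copy of the toy frame (df_demo, as_frame and as_series are names made up for this example):

```python
import numpy as np
import pandas as pd

# Rebuild the toy frame so that this snippet runs on its own.
df_demo = pd.DataFrame({
    'Label': ['A', 'B', 'C'],
    'Value1': [1, 2, 3],
    'Value2': [4.0, 5.0, np.nan],
})

as_frame = df_demo.loc[:, ['Value1']]  # list of names -> DataFrame with one column
as_series = df_demo.loc[:, 'Value1']   # single name   -> Series

print(type(as_frame).__name__, type(as_series).__name__)
```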
Using .loc to filter on certain rows and columns
In [38]:
df.loc[0:1, ['Label', 'Value2']]
Out[38]:
| | Label | Value2 |
|---|---|---|
| 0 | A | 4.0 |
| 1 | B | 5.0 |
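The row selector passed to .loc does not have to be a slice. As a small sketch on a rebuilt copy of the toy frame (df_demo and non_missing are names made up for this example), a boolean Series can pick the rows while a list picks the columns:

```python
import numpy as np
import pandas as pd

# Rebuild the toy frame so that this snippet runs on its own.
df_demo = pd.DataFrame({
    'Label': ['A', 'B', 'C'],
    'Value1': [1, 2, 3],
    'Value2': [4.0, 5.0, np.nan],
})

# Keep only the rows where Value2 is present, and only two of the columns.
non_missing = df_demo.loc[df_demo['Value2'].notna(), ['Label', 'Value2']]

print(non_missing)
```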
Using .loc to assign values to rows and columns
In [39]:
df
Out[39]:
| | Label | Value1 | Value2 |
|---|---|---|---|
| 0 | A | 1 | 4.0 |
| 1 | B | 2 | 5.0 |
| 2 | C | 3 | NaN |
In [60]:
df.loc[df['Value1'] > 1, ['Value1', 'Value2']] = 100
df
Out[60]:
| | Label | Value1 | Value2 |
|---|---|---|---|
| 0 | A | 1 | 4.0 |
| 1 | B | 100 | 100.0 |
| 2 | C | 100 | 100.0 |
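The condition can also be stored in a variable and reused, which helps when different columns need different values. A minimal sketch on a rebuilt copy of the toy frame (df_demo, mask and the value 200.0 are made up for this example):

```python
import numpy as np
import pandas as pd

# Rebuild the toy frame so that this snippet runs on its own.
df_demo = pd.DataFrame({
    'Label': ['A', 'B', 'C'],
    'Value1': [1, 2, 3],
    'Value2': [4.0, 5.0, np.nan],
})

# Compute the condition once, before any values are changed.
mask = df_demo['Value1'] > 1

# Assign a different value to each column for the selected rows.
df_demo.loc[mask, 'Value1'] = 100
df_demo.loc[mask, 'Value2'] = 200.0

print(df_demo)
```

Computing the mask up front matters here: once Value1 has been overwritten, re-evaluating the condition could select a different set of rows.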