There are instances where we are required to filter a data frame using multiple conditions on various columns.In this blog, we will look at how to accomplish that¶
In [2]:
import pandas as pd
import numpy as np
Step 1: Creating the data frame¶
In [5]:
day = ['Day1','Day1','Day1','Day1','Day2','Day2','Day2']
product=['A','B','C','D','A','B','C']
sales=np.random.rand(7)*10
In [6]:
df=pd.DataFrame([day,product,sales]).T
df.columns=['Day','Product','Sales']
df
Out[6]:
Day | Product | Sales | |
---|---|---|---|
0 | Day1 | A | 6.856995 |
1 | Day1 | B | 1.237694 |
2 | Day1 | C | 0.958126 |
3 | Day1 | D | 4.275079 |
4 | Day2 | A | 5.684071 |
5 | Day2 | B | 0.302415 |
6 | Day2 | C | 6.963567 |
In [ ]:
df.dtypes
Step 2:Converting Sales into numeric data type¶
In [7]:
df['Sales']=df['Sales'].apply(pd.to_numeric,errors= "coerce")
Step 3: Filter on Day 1, Product A and Sales greater than 5¶
In [13]:
df.loc[(df['Day']=="Day1") & (df['Product'] == "A") & (df['Sales'] > 5),]
Out[13]:
Day | Product | Sales | |
---|---|---|---|
0 | Day1 | A | 6.856995 |
Step 4: Filter on Day 1, Product A and B¶
In [16]:
df.loc[(df['Day']=="Day1") & (df['Product'].isin(["A","B"])) , ]
Out[16]:
Day | Product | Sales | |
---|---|---|---|
0 | Day1 | A | 6.856995 |
1 | Day1 | B | 1.237694 |