There are instances where we have to filter a data frame using multiple criteria.In this blog we will look at how to perform this using numpy logical and operation¶
Step 1: Installing libraries¶
In [3]:
import pandas as pd
import numpy as np
Step 2: Creating dummy data frames¶
In [4]:
df1=pd.DataFrame([["Day1","Week 1",np.random.rand()],
["Day2","Week 1",np.random.rand()],
["Day3","Week 1",np.random.rand()],
["Day4","Week 1",np.random.rand()],
["Day5","Week 1",np.random.rand()]])
df1.columns=['Day','Week Number','Sales_Vs_Inventory']
df1
Out[4]:
Day | Week Number | Sales_Vs_Inventory | |
---|---|---|---|
0 | Day1 | Week 1 | 0.703584 |
1 | Day2 | Week 1 | 0.239418 |
2 | Day3 | Week 1 | 0.885111 |
3 | Day4 | Week 1 | 0.699228 |
4 | Day5 | Week 1 | 0.233243 |
In [5]:
df2=pd.DataFrame([["Day1","Week 2",np.random.rand()],
["Day2","Week 2",np.random.rand()],
["Day3","Week 2",np.random.rand()],
["Day4","Week 2",np.random.rand()],
["Day5","Week 2",np.random.rand()]])
df2.columns=['Day','Week Number','Sales_Vs_Inventory']
df2
Out[5]:
Day | Week Number | Sales_Vs_Inventory | |
---|---|---|---|
0 | Day1 | Week 2 | 0.326464 |
1 | Day2 | Week 2 | 0.967262 |
2 | Day3 | Week 2 | 0.191519 |
3 | Day4 | Week 2 | 0.634987 |
4 | Day5 | Week 2 | 0.133107 |
Step 3:Storing df1 and df2 in a list¶
In [6]:
l1=[df1,df2]
l1
Out[6]:
[ Day Week Number Sales_Vs_Inventory 0 Day1 Week 1 0.703584 1 Day2 Week 1 0.239418 2 Day3 Week 1 0.885111 3 Day4 Week 1 0.699228 4 Day5 Week 1 0.233243, Day Week Number Sales_Vs_Inventory 0 Day1 Week 2 0.326464 1 Day2 Week 2 0.967262 2 Day3 Week 2 0.191519 3 Day4 Week 2 0.634987 4 Day5 Week 2 0.133107]
Step 4: Merging the data frames within l1¶
In [7]:
interim_df=pd.concat(l1,axis=0)
interim_df
Out[7]:
Day | Week Number | Sales_Vs_Inventory | |
---|---|---|---|
0 | Day1 | Week 1 | 0.703584 |
1 | Day2 | Week 1 | 0.239418 |
2 | Day3 | Week 1 | 0.885111 |
3 | Day4 | Week 1 | 0.699228 |
4 | Day5 | Week 1 | 0.233243 |
0 | Day1 | Week 2 | 0.326464 |
1 | Day2 | Week 2 | 0.967262 |
2 | Day3 | Week 2 | 0.191519 |
3 | Day4 | Week 2 | 0.634987 |
4 | Day5 | Week 2 | 0.133107 |
Step 5: Filtering on multiple criteria¶
In [20]:
# Lets filter the data frame based on Day1, week2 and Sales_Vs_Inventory value of less than 0.30
pos1= list(interim_df['Day']=="Day5")
pos1
Out[20]:
[False, False, False, False, True, False, False, False, False, True]
In [21]:
pos2= list(interim_df['Week Number']=="Week 2")
pos2
Out[21]:
[False, False, False, False, False, True, True, True, True, True]
In [50]:
pos3= list(interim_df['Sales_Vs_Inventory'] < 0.40)
pos3
Out[50]:
[False, True, False, False, True, True, False, True, False, True]
Merging the logical values to create a consolidated flag¶
In [51]:
pos= np.array(pos1) & np.array(pos2) & np.array(pos3)
pos
Out[51]:
array([False, False, False, False, False, False, False, False, False, True])
In [53]:
interim_df[pos.tolist()]
Out[53]:
Day | Week Number | Sales_Vs_Inventory | |
---|---|---|---|
4 | Day5 | Week 2 | 0.133107 |