There are instances when a lot of columns are present in our data frame and we have to lets say subtract or add 2 columns in groups.In such cases we can use string match to identify set of columns that are to be subtracted/added and repeat the operation across all the columns¶
In [ ]:
import numpy as np
import pandas as pd
Lets look at the PE ratios of some of the banks in India.Again this is a dummy data and just for illustration purpose.The main goal is to use regular expression to perform columnar operation¶
In [130]:
Week_Num=['Week1','Week2','Week3','Week4','Week5']
ICICI_CurrentWeek = pd.Series(np.random.rand(5))
ICICI_LastWeek = pd.Series(np.random.rand(5))
HDFC_CurrentWeek = pd.Series(np.random.rand(5))
HDFC_LastWeek = pd.Series(np.random.rand(5))
AXIS_CurrentWeek = pd.Series(np.random.rand(5))
AXIS_LastWeek = pd.Series(np.random.rand(5))
SBI_CurrentWeek = pd.Series(np.random.rand(5))
SBI_LastWeek = pd.Series(np.random.rand(5))
df=pd.DataFrame([Week_Num,
ICICI_CurrentWeek,ICICI_LastWeek,
HDFC_CurrentWeek,HDFC_LastWeek,
AXIS_CurrentWeek,AXIS_LastWeek,
SBI_CurrentWeek,SBI_LastWeek]).T
df.columns=['Week_Num','ICICI_CurrentWeek','ICICI_LastWeek',
'HDFC_CurrentWeek','HDFC_LastWeek',
'AXIS_CurrentWeek','AXIS_LastWeek',
'SBI_CurrentWeek','SBI_LastWeek']
df
Out[130]:
Week_Num | ICICI_CurrentWeek | ICICI_LastWeek | HDFC_CurrentWeek | HDFC_LastWeek | AXIS_CurrentWeek | AXIS_LastWeek | SBI_CurrentWeek | SBI_LastWeek | |
---|---|---|---|---|---|---|---|---|---|
0 | Week1 | 0.537536 | 0.621686 | 0.025014 | 0.81958 | 0.741462 | 0.172415 | 0.949251 | 0.600982 |
1 | Week2 | 0.217225 | 0.805213 | 0.918463 | 0.621928 | 0.148297 | 0.775873 | 0.586337 | 0.319201 |
2 | Week3 | 0.24441 | 0.695069 | 0.789786 | 0.521547 | 0.739275 | 0.767115 | 0.550886 | 0.346773 |
3 | Week4 | 0.184179 | 0.371545 | 0.91638 | 0.346759 | 0.743964 | 0.568355 | 0.078098 | 0.35065 |
4 | Week5 | 0.616801 | 0.214047 | 0.092583 | 0.441851 | 0.561522 | 0.259982 | 0.364596 | 0.390632 |
The goal here is to subtract ICICI current week numebrs from last week.This has to be repeated for other banks as well.Lets set up Identifying patterns for column name¶
In [14]:
col_nm_pattern=['ICICI','HDFC','AXIS','SBI']
For each value in the above pattern, we need to identify Current week and last week columns and need to subtract them¶
In [131]:
l1=[]
lower_pos='LastWeek'
upper_pos='CurrentWeek'
for i in col_nm_pattern:
pos=[j for j in df.columns if i in j] # Identifies column names specific to only one Bank
v2 = df[[k for k in pos if upper_pos in k][0]] # Identifies current week values for the bank
v1 = df[[k for k in pos if lower_pos in k][0]] # Identifies last week values for the bank
diff_val=v2 - v1 # SUbtract the two columns
interim_df=pd.DataFrame([df['Week_Num'],diff_val]).T
interim_df.columns = ['Week_Num',i + ' Numbers wrt ' + lower_pos]
l1.append(interim_df)
Consolidating all the values into a single data frame¶
In [132]:
final_df = pd.concat(l1,axis=1)
final_df
Out[132]:
Week_Num | ICICI Numbers wrt LastWeek | Week_Num | HDFC Numbers wrt LastWeek | Week_Num | AXIS Numbers wrt LastWeek | Week_Num | SBI Numbers wrt LastWeek | |
---|---|---|---|---|---|---|---|---|
0 | Week1 | -0.08415 | Week1 | -0.794566 | Week1 | 0.569047 | Week1 | 0.348269 |
1 | Week2 | -0.587988 | Week2 | 0.296535 | Week2 | -0.627576 | Week2 | 0.267136 |
2 | Week3 | -0.450659 | Week3 | 0.268239 | Week3 | -0.027841 | Week3 | 0.204113 |
3 | Week4 | -0.187366 | Week4 | 0.569621 | Week4 | 0.175609 | Week4 | -0.272551 |
4 | Week5 | 0.402753 | Week5 | -0.349268 | Week5 | 0.30154 | Week5 | -0.026036 |
Removing multiple instances of Week_Num column¶
In [133]:
final_df=final_df.loc[:,~final_df.columns.duplicated()]
final_df
Out[133]:
Week_Num | ICICI Numbers wrt LastWeek | HDFC Numbers wrt LastWeek | AXIS Numbers wrt LastWeek | SBI Numbers wrt LastWeek | |
---|---|---|---|---|---|
0 | Week1 | -0.08415 | -0.794566 | 0.569047 | 0.348269 |
1 | Week2 | -0.587988 | 0.296535 | -0.627576 | 0.267136 |
2 | Week3 | -0.450659 | 0.268239 | -0.027841 | 0.204113 |
3 | Week4 | -0.187366 | 0.569621 | 0.175609 | -0.272551 |
4 | Week5 | 0.402753 | -0.349268 | 0.30154 | -0.026036 |
Merging all the data into a single data frame¶
In [134]:
final_consolidated=df.merge(final_df,on="Week_Num")
final_consolidated
Out[134]:
Week_Num | ICICI_CurrentWeek | ICICI_LastWeek | HDFC_CurrentWeek | HDFC_LastWeek | AXIS_CurrentWeek | AXIS_LastWeek | SBI_CurrentWeek | SBI_LastWeek | ICICI Numbers wrt LastWeek | HDFC Numbers wrt LastWeek | AXIS Numbers wrt LastWeek | SBI Numbers wrt LastWeek | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Week1 | 0.537536 | 0.621686 | 0.025014 | 0.81958 | 0.741462 | 0.172415 | 0.949251 | 0.600982 | -0.08415 | -0.794566 | 0.569047 | 0.348269 |
1 | Week2 | 0.217225 | 0.805213 | 0.918463 | 0.621928 | 0.148297 | 0.775873 | 0.586337 | 0.319201 | -0.587988 | 0.296535 | -0.627576 | 0.267136 |
2 | Week3 | 0.24441 | 0.695069 | 0.789786 | 0.521547 | 0.739275 | 0.767115 | 0.550886 | 0.346773 | -0.450659 | 0.268239 | -0.027841 | 0.204113 |
3 | Week4 | 0.184179 | 0.371545 | 0.91638 | 0.346759 | 0.743964 | 0.568355 | 0.078098 | 0.35065 | -0.187366 | 0.569621 | 0.175609 | -0.272551 |
4 | Week5 | 0.616801 | 0.214047 | 0.092583 | 0.441851 | 0.561522 | 0.259982 | 0.364596 | 0.390632 | 0.402753 | -0.349268 | 0.30154 | -0.026036 |
In [135]:
final_consolidated = final_consolidated.reindex(sorted(final_consolidated.columns), axis=1)
final_consolidated
Out[135]:
AXIS Numbers wrt LastWeek | AXIS_CurrentWeek | AXIS_LastWeek | HDFC Numbers wrt LastWeek | HDFC_CurrentWeek | HDFC_LastWeek | ICICI Numbers wrt LastWeek | ICICI_CurrentWeek | ICICI_LastWeek | SBI Numbers wrt LastWeek | SBI_CurrentWeek | SBI_LastWeek | Week_Num | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.569047 | 0.741462 | 0.172415 | -0.794566 | 0.025014 | 0.81958 | -0.08415 | 0.537536 | 0.621686 | 0.348269 | 0.949251 | 0.600982 | Week1 |
1 | -0.627576 | 0.148297 | 0.775873 | 0.296535 | 0.918463 | 0.621928 | -0.587988 | 0.217225 | 0.805213 | 0.267136 | 0.586337 | 0.319201 | Week2 |
2 | -0.027841 | 0.739275 | 0.767115 | 0.268239 | 0.789786 | 0.521547 | -0.450659 | 0.24441 | 0.695069 | 0.204113 | 0.550886 | 0.346773 | Week3 |
3 | 0.175609 | 0.743964 | 0.568355 | 0.569621 | 0.91638 | 0.346759 | -0.187366 | 0.184179 | 0.371545 | -0.272551 | 0.078098 | 0.35065 | Week4 |
4 | 0.30154 | 0.561522 | 0.259982 | -0.349268 | 0.092583 | 0.441851 | 0.402753 | 0.616801 | 0.214047 | -0.026036 | 0.364596 | 0.390632 | Week5 |
Shifting Week_Num to the left¶
In [157]:
final_consolidated=final_consolidated[['Week_Num'] + final_consolidated.columns.tolist()]
final_consolidated
Out[157]:
Week_Num | AXIS Numbers wrt LastWeek | AXIS_CurrentWeek | AXIS_LastWeek | HDFC Numbers wrt LastWeek | HDFC_CurrentWeek | HDFC_LastWeek | ICICI Numbers wrt LastWeek | ICICI_CurrentWeek | ICICI_LastWeek | SBI Numbers wrt LastWeek | SBI_CurrentWeek | SBI_LastWeek | Week_Num | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Week1 | 0.569047 | 0.741462 | 0.172415 | -0.794566 | 0.025014 | 0.81958 | -0.08415 | 0.537536 | 0.621686 | 0.348269 | 0.949251 | 0.600982 | Week1 |
1 | Week2 | -0.627576 | 0.148297 | 0.775873 | 0.296535 | 0.918463 | 0.621928 | -0.587988 | 0.217225 | 0.805213 | 0.267136 | 0.586337 | 0.319201 | Week2 |
2 | Week3 | -0.027841 | 0.739275 | 0.767115 | 0.268239 | 0.789786 | 0.521547 | -0.450659 | 0.24441 | 0.695069 | 0.204113 | 0.550886 | 0.346773 | Week3 |
3 | Week4 | 0.175609 | 0.743964 | 0.568355 | 0.569621 | 0.91638 | 0.346759 | -0.187366 | 0.184179 | 0.371545 | -0.272551 | 0.078098 | 0.35065 | Week4 |
4 | Week5 | 0.30154 | 0.561522 | 0.259982 | -0.349268 | 0.092583 | 0.441851 | 0.402753 | 0.616801 | 0.214047 | -0.026036 | 0.364596 | 0.390632 | Week5 |