There are situations where we need to merge data frames that have different columns. This can happen when we combine sales datasets from different regions that cover different products, or datasets from different clinical studies. In this blog, we will look at a very simple way to do that.
Step 1: Import libraries
In [1]:
import pandas as pd
import numpy as np
Step 2: Create dummy data frames
In [5]:
# Build df1 from two rows, then transpose so that each row becomes a column
df1 = pd.DataFrame([np.array([1, 2, 3, 4]),
                    ['PL1', 'PL2', 'PL3', 'PL4']]).T
df1.columns = ['Item_Num', 'Product_Line']
df1
Out[5]:
| | Item_Num | Product_Line |
|---|---|---|
| 0 | 1 | PL1 |
| 1 | 2 | PL2 |
| 2 | 3 | PL3 |
| 3 | 4 | PL4 |
In [6]:
# Build df2 the same way, with a Region column instead of Product_Line
df2 = pd.DataFrame([np.array([5, 6, 7]),
                    ['East', 'West', 'North']]).T
df2.columns = ['Item_Num', 'Region']
df2
Out[6]:
| | Item_Num | Region |
|---|---|---|
| 0 | 5 | East |
| 1 | 6 | West |
| 2 | 7 | North |
We can see that df1 and df2 differ in their columns: df1 has a Product_Line column that df2 lacks, while df2 has a Region column that df1 lacks. Both share the Item_Num column.
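The column difference can also be checked programmatically instead of by eye. The following is a minimal sketch, assuming the two frames are rebuilt from dictionaries (an alternative construction, not the one used in this post). It compares the column labels with `Index.difference` and `Index.intersection`; the variable names `only_in_df1`, `only_in_df2` and `shared` are illustrative.

```python
import pandas as pd

# Rebuild the two dummy frames from dictionaries (assumed alternative construction)
df1 = pd.DataFrame({"Item_Num": [1, 2, 3, 4],
                    "Product_Line": ["PL1", "PL2", "PL3", "PL4"]})
df2 = pd.DataFrame({"Item_Num": [5, 6, 7],
                    "Region": ["East", "West", "North"]})

# Columns present in one frame but not in the other
only_in_df1 = df1.columns.difference(df2.columns).tolist()
only_in_df2 = df2.columns.difference(df1.columns).tolist()

# Columns present in both frames
shared = df1.columns.intersection(df2.columns).tolist()

print(only_in_df1)  # ['Product_Line']
print(only_in_df2)  # ['Region']
print(shared)       # ['Item_Num']
```

A check like this is handy with real datasets, where the frames may have dozens of columns and the mismatches are not obvious from a printout.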
Step 3: Merging df1 and df2 together
In [7]:
# Stack the frames row-wise; a column missing from one frame is filled with NaN
pd.concat([df1, df2], axis=0, ignore_index=True)
Out[7]:
| | Item_Num | Product_Line | Region |
|---|---|---|---|
| 0 | 1 | PL1 | NaN |
| 1 | 2 | PL2 | NaN |
| 2 | 3 | PL3 | NaN |
| 3 | 4 | PL4 | NaN |
| 4 | 5 | NaN | East |
| 5 | 6 | NaN | West |
| 6 | 7 | NaN | North |
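The NaN values appear because `pd.concat` keeps the union of all columns by default (`join="outer"`) and fills in whatever a frame does not provide. The sketch below shows two common follow-ups: keeping only the shared columns with `join="inner"`, and replacing the gaps with a placeholder via `fillna`. It assumes the frames are rebuilt from dictionaries, and the `"Unknown"` label is an arbitrary choice made for illustration.

```python
import pandas as pd

# Rebuild the two dummy frames from dictionaries (assumed alternative construction)
df1 = pd.DataFrame({"Item_Num": [1, 2, 3, 4],
                    "Product_Line": ["PL1", "PL2", "PL3", "PL4"]})
df2 = pd.DataFrame({"Item_Num": [5, 6, 7],
                    "Region": ["East", "West", "North"]})

# Default join="outer": union of columns, gaps filled with NaN
combined = pd.concat([df1, df2], axis=0, ignore_index=True)

# join="inner": keep only the columns present in every frame
shared_only = pd.concat([df1, df2], axis=0, join="inner", ignore_index=True)

# Replace the missing values with a placeholder label (arbitrary choice)
filled = combined.fillna("Unknown")

print(combined.shape)             # (7, 3)
print(list(shared_only.columns))  # ['Item_Num']
print(filled)
```

Whether to keep the NaN values, drop the unshared columns, or fill in a placeholder depends on how the combined data will be used downstream.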