Introduction¶
In this blog we will look at the following:
- Add days to a date string
- Subtract two dates
In [1]:
import pandas as pd
import numpy as np
import datetime
from datetime import timedelta
from datetime import date
Step 1:Create dummy dates¶
In [2]:
date_val = pd.Series(['22/09/2020','25-09-2020','20200927','2020-09-22T13:25'])
date_val
Out[2]:
0 22/09/2020 1 25-09-2020 2 20200927 3 2020-09-22T13:25 dtype: object
Step 2:Convert into date format¶
In [3]:
date_val_converted=pd.to_datetime(date_val)
date_val_converted
Out[3]:
0 2020-09-22 00:00:00 1 2020-09-25 00:00:00 2 2020-09-27 00:00:00 3 2020-09-22 13:25:00 dtype: datetime64[ns]
We can see that the data type of 'date_val_converted' is date time¶
Step 3:Lets add 15 days to each date¶
In [4]:
date_val_converted + timedelta(days=3)
Out[4]:
0 2020-09-25 00:00:00 1 2020-09-28 00:00:00 2 2020-09-30 00:00:00 3 2020-09-25 13:25:00 dtype: datetime64[ns]
Step 4: Subtract two dates¶
Lets subtract these dates from today's date¶
In [32]:
diff_days=[(pd.to_datetime(date.today())-i).days for i in date_val_converted]
diff_days
Out[32]:
[566, 563, 561, 565]
In [35]:
date_df=pd.DataFrame(date_val_converted)
date_df['Today'] = date.today()
date_df.columns=[['Dummy_Date','Today']]
date_df['Diff_Days']=diff_days
date_df
Out[35]:
Dummy_Date | Today | Diff_Days | |
---|---|---|---|
0 | 2020-09-22 00:00:00 | 2022-04-11 | 566 |
1 | 2020-09-25 00:00:00 | 2022-04-11 | 563 |
2 | 2020-09-27 00:00:00 | 2022-04-11 | 561 |
3 | 2020-09-22 13:25:00 | 2022-04-11 | 565 |
Step 5: Subtract two dates column in Pandas data frame¶
In [51]:
interim_df=pd.DataFrame(date_val_converted)
interim_df['Today']=date.today()
interim_df.columns=['Reference Date','Today']
interim_df
Out[51]:
Reference Date | Today | |
---|---|---|
0 | 2020-09-22 00:00:00 | 2022-04-23 |
1 | 2020-09-25 00:00:00 | 2022-04-23 |
2 | 2020-09-27 00:00:00 | 2022-04-23 |
3 | 2020-09-22 13:25:00 | 2022-04-23 |
In [52]:
Diff_Days= pd.to_datetime(interim_df['Today']) - pd.to_datetime(interim_df['Reference Date'])
interim_df['Diff_Days'] = Diff_Days
interim_df
Out[52]:
Reference Date | Today | Diff_Days | |
---|---|---|---|
0 | 2020-09-22 00:00:00 | 2022-04-23 | 578 days 00:00:00 |
1 | 2020-09-25 00:00:00 | 2022-04-23 | 575 days 00:00:00 |
2 | 2020-09-27 00:00:00 | 2022-04-23 | 573 days 00:00:00 |
3 | 2020-09-22 13:25:00 | 2022-04-23 | 577 days 10:35:00 |
Converting days(date time) into integer¶
In [53]:
interim_df['Diff_Days2']=interim_df['Diff_Days'].apply(lambda x: x.days)
interim_df
Out[53]:
Reference Date | Today | Diff_Days | Diff_Days2 | |
---|---|---|---|---|
0 | 2020-09-22 00:00:00 | 2022-04-23 | 578 days 00:00:00 | 578 |
1 | 2020-09-25 00:00:00 | 2022-04-23 | 575 days 00:00:00 | 575 |
2 | 2020-09-27 00:00:00 | 2022-04-23 | 573 days 00:00:00 | 573 |
3 | 2020-09-22 13:25:00 | 2022-04-23 | 577 days 10:35:00 | 577 |
In [54]:
interim_df2 = interim_df.drop(['Diff_Days'],axis=1)
interim_df2
Out[54]:
Reference Date | Today | Diff_Days2 | |
---|---|---|---|
0 | 2020-09-22 00:00:00 | 2022-04-23 | 578 |
1 | 2020-09-25 00:00:00 | 2022-04-23 | 575 |
2 | 2020-09-27 00:00:00 | 2022-04-23 | 573 |
3 | 2020-09-22 13:25:00 | 2022-04-23 | 577 |
In [55]:
new_nm = {'Diff_Days2': 'Diff_Days'}
interim_df2.rename(columns=new_nm,inplace=True)
interim_df2
Out[55]:
Reference Date | Today | Diff_Days | |
---|---|---|---|
0 | 2020-09-22 00:00:00 | 2022-04-23 | 578 |
1 | 2020-09-25 00:00:00 | 2022-04-23 | 575 |
2 | 2020-09-27 00:00:00 | 2022-04-23 | 573 |
3 | 2020-09-22 13:25:00 | 2022-04-23 | 577 |