There are certain situation where we are required to take row wise sum across multiple columns.For example¶
- Get the total weekly sum of shipment across days of the week
- Estimate mean of daily blood pressure readings over days of the month
In this blog we will look at how pandas can be used to generate such summaries
Step 1: Creating a data frame¶
In [1]:
import pandas as pd
import numpy as np
In [8]:
Product_Line=["Prod_"+str(i+1) for i in range(100)]
Product_Line[:5]
Out[8]:
['Prod_1', 'Prod_2', 'Prod_3', 'Prod_4', 'Prod_5']
In [10]:
Sales_Monday=np.random.normal(20,5,100)
Sales_Tuesday=np.random.normal(18,3,100)
Sales_Wednesday=np.random.normal(19,4,100)
Sales_Thursday=np.random.normal(17,5,100)
Sales_Friday=np.random.normal(16,5,100)
In [12]:
df=pd.DataFrame([Product_Line,Sales_Monday,Sales_Tuesday,
Sales_Wednesday,Sales_Thursday,Sales_Friday]).T
df.columns=['Product_Line','Sales_Monday','Sales_Tuesday',
'Sales_Wednesday','Sales_Thursday','Sales_Friday']
df.head()
Out[12]:
Product_Line | Sales_Monday | Sales_Tuesday | Sales_Wednesday | Sales_Thursday | Sales_Friday | |
---|---|---|---|---|---|---|
0 | Prod_1 | 19.921991 | 15.572811 | 15.021528 | 13.502968 | 20.652096 |
1 | Prod_2 | 14.70693 | 18.64976 | 13.869032 | 19.839118 | 11.917464 |
2 | Prod_3 | 24.771821 | 15.40479 | 13.600543 | 27.041315 | 13.421013 |
3 | Prod_4 | 26.408631 | 19.511651 | 21.67273 | 16.574764 | 14.913276 |
4 | Prod_5 | 16.261362 | 17.63592 | 8.822245 | 21.374031 | 18.045948 |
Step 2: Generating row sum¶
In [17]:
df['Total_weekly_sales']=df.drop(['Product_Line'],axis=1).sum(axis=1)
df.head()
Out[17]:
Product_Line | Sales_Monday | Sales_Tuesday | Sales_Wednesday | Sales_Thursday | Sales_Friday | Total_weekly_sales | |
---|---|---|---|---|---|---|---|
0 | Prod_1 | 19.921991 | 15.572811 | 15.021528 | 13.502968 | 20.652096 | 84.671393 |
1 | Prod_2 | 14.70693 | 18.64976 | 13.869032 | 19.839118 | 11.917464 | 78.982305 |
2 | Prod_3 | 24.771821 | 15.40479 | 13.600543 | 27.041315 | 13.421013 | 94.239482 |
3 | Prod_4 | 26.408631 | 19.511651 | 21.67273 | 16.574764 | 14.913276 | 99.081052 |
4 | Prod_5 | 16.261362 | 17.63592 | 8.822245 | 21.374031 | 18.045948 | 82.139506 |
Step 3: Generating column sum¶
In [18]:
df.drop(['Product_Line'],axis=1).sum(axis=0)
Out[18]:
Sales_Monday 2009.935125 Sales_Tuesday 1779.659919 Sales_Wednesday 1816.47381 Sales_Thursday 1706.994755 Sales_Friday 1560.890021 Total_weekly_sales 8873.95363 dtype: object