Tuesday, October 6, 2020

Blog 6 Python Blog:Pivoting using Pandas

PivotingusingPandas
In [2]:
# Pivoting is used in a lot of cases where different summaries needs to be created for different Features
# Here we will see how to do standard pivoting operations using Pandas
import pandas as pd
import numpy as np
In [18]:
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important;margin-left: -5%; }</style>"))
In [4]:
# We will use OrderedDict  function from collecctions library
# Normal dictionaries doesnt remember the order in which the key value pairs are inserted in a dictionary
# but OrderedDict  remebers the order
from collections import OrderedDict
table = OrderedDict((
    ("Item", ['Item0', 'Item1', 'Item2', 'Item3']),
    ('CType',['Gold', 'Bronze', 'Gold', 'Silver']),
    ('USD',  [5,7,2,7]),
    ('EU',   [2,4,6,1])
))
df1 = pd.DataFrame(table)
df1
Out[4]:
Item CType USD EU
0 Item0 Gold 5 2
1 Item1 Bronze 7 4
2 Item2 Gold 2 6
3 Item3 Silver 7 1
In [6]:
# Creating the pivot
p = df1.pivot(index='Item', columns='CType', values='USD')
p

# We see the NaN where there is no value in df1 for Ctype columns
Out[6]:
CType Bronze Gold Silver
Item
Item0 NaN 5.0 NaN
Item1 7.0 NaN NaN
Item2 NaN 2.0 NaN
Item3 NaN NaN 7.0
In [8]:
type(p)
# p is a Data Frame object and hence standard data frame operations applies
Out[8]:
pandas.core.frame.DataFrame
In [10]:
p.columns.values
# Getting the column names
Out[10]:
array(['Bronze', 'Gold', 'Silver'], dtype=object)
In [12]:
p.index.values
# Getting the iindex values
Out[12]:
array(['Item0', 'Item1', 'Item2', 'Item3'], dtype=object)
In [14]:
print (p[p.index=='Item0'].Gold.values)
# Getting the values for item0 corresponding to Gold column
[5.]
In [16]:
# Another subsetting operation
p[['Bronze','Gold']][0:1]
Out[16]:
CType Bronze Gold
Item
Item0 NaN 5.0
In [18]:
p[['Bronze','Gold']][0:2]
Out[18]:
CType Bronze Gold
Item
Item0 NaN 5.0
Item1 7.0 NaN
In [20]:
# Using iloc function for subsetting
p.iloc[0:2,0:2]
Out[20]:
CType Bronze Gold
Item
Item0 NaN 5.0
Item1 7.0 NaN
In [22]:
# Creates Hierarchical Columns; Values argument is not given
# Here for USD as well as EU values , summary is created
a = df1.pivot(index='Item',columns='CType')
a
Out[22]:
USD EU
CType Bronze Gold Silver Bronze Gold Silver
Item
Item0 NaN 5.0 NaN NaN 2.0 NaN
Item1 7.0 NaN NaN 4.0 NaN NaN
Item2 NaN 2.0 NaN NaN 6.0 NaN
Item3 NaN NaN 7.0 NaN NaN 1.0
In [24]:
# Accessing Hierarchical Dataframe
a['USD'] # Type is a Dataframe
Out[24]:
CType Bronze Gold Silver
Item
Item0 NaN 5.0 NaN
Item1 7.0 NaN NaN
Item2 NaN 2.0 NaN
Item3 NaN NaN 7.0
In [26]:
# Limitations with pivot method
table1 = OrderedDict((
    ("Item", ['Item0', 'Item0', 'Item0', 'Item1']),
    ('CType',['Gold', 'Bronze', 'Gold', 'Silver']),
    ('USD',  [5,7,2,7]),
    ('EU',   [2,4,6,1])
))
df2 = pd.DataFrame(table1)

df2
Out[26]:
Item CType USD EU
0 Item0 Gold 5 2
1 Item0 Bronze 7 4
2 Item0 Gold 2 6
3 Item1 Silver 7 1
In [27]:
# p1 = df2.pivot(index='Item', columns='CType', values='USD')
# Throws an error as item0 value repeats in 'item' column
In [29]:
# To handle this kind of situations we can use pivot_table function
p1 = df2.pivot_table(index='Item', columns='CType', values='USD', aggfunc=np.sum)
p1

# here we can specify an agggreagating function
Out[29]:
CType Bronze Gold Silver
Item
Item0 7.0 7.0 NaN
Item1 NaN NaN 7.0
In [31]:
p2 = df2.pivot_table(index='Item', columns='CType', aggfunc=np.sum)
p2
Out[31]:
EU USD
CType Bronze Gold Silver Bronze Gold Silver
Item
Item0 4.0 8.0 NaN 7.0 7.0 NaN
Item1 NaN NaN 1.0 NaN NaN 7.0
In [19]:
!jupyter nbconvert --to html PivotingusingPandas.ipynb
[NbConvertApp] Converting notebook PivotingusingPandas.ipynb to html
[NbConvertApp] Writing 299655 bytes to PivotingusingPandas.html

Web Scraping Tutorial 4- Getting the busy information data from Popular time page from Google

Popular Times Popular Times In this blog we will try to scrape the ...