Saturday, February 9, 2019

Pandas Complete Guide

This post collects all the pandas code snippets on this blog. It covers the following topics:

  1. Introduction to Series: https://mlmadeeasy.blogspot.com/2019/02/introduction-to-series.html
  2. Hierarchical Indexing: https://mlmadeeasy.blogspot.com/2019/02/hierarchical-indexing.html
  3. Combining and Merging Data sets: https://mlmadeeasy.blogspot.com/2019/02/combining-and-merging-datasets.html
  4. Pivoting using Pandas: https://mlmadeeasy.blogspot.com/2019/02/pivoting-using-pandas.html
  5. Frequency Profiling using Pandas: https://mlmadeeasy.blogspot.com/2019/02/frequency-profiling-using-pandas.html
  6. Data Manipulation in Python Part 1: https://mlmadeeasy.blogspot.com/2019/01/data-manipulation-with-python.html
  7. Data Manipulation in Python Part 2 (Retail Case Study): https://mlmadeeasy.blogspot.com/2019/01/data-manipulation-with-python-part-2.html

Frequency Profiling using Pandas

This post introduces frequency profiling: counting how often each value occurs in a column. It is commonplace among data engineers who do ETL, and it is also a common first step in exploratory data analysis.
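As a rough illustration of the idea (not taken from the attached notebook), frequency profiling a single column could look like the sketch below. The data frame and its column names are made up for the example.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "order_method": ["Web", "Web", "Phone", "Web", None, "Fax"],
    "country": ["US", "IN", "US", "US", "IN", "IN"],
})

# Count how often each value occurs, keeping missing values visible.
counts = df["order_method"].value_counts(dropna=False)

# Put the counts next to each value's share of all rows.
profile = counts.to_frame(name="count")
profile["share"] = profile["count"] / len(df)

print(profile)
```

Passing dropna=False keeps missing values in the profile, which is usually what you want when checking data quality during ETL.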

The blog consists of the following attachments:
  1. Jupyter Notebook https://drive.google.com/file/d/1mL3GoVQjGTTUeWMmwDKtytGNRF0evE6D/view?usp=sharing
  2. Html doc: https://drive.google.com/file/d/1ExuLhp1CgCjE0DCTR_G09LnD63I_Txp4/view?usp=sharing

Pivoting using Pandas

This post introduces pivoting with pandas. It walks through different scenarios for creating summaries with the pivot table functions.
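A minimal sketch of one such summary, assuming a small made-up sales table (the column names and numbers are hypothetical, not from the notebook):

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "year": [2012, 2012, 2013, 2013, 2013],
    "country": ["US", "IN", "US", "IN", "IN"],
    "quantity": [10, 5, 7, 3, 4],
})

# One row per country, one column per year, cells hold the summed quantity.
summary = pd.pivot_table(
    sales,
    index="country",
    columns="year",
    values="quantity",
    aggfunc="sum",
    fill_value=0,  # show 0 instead of NaN for empty combinations
)

print(summary)
```

Changing aggfunc (for example to "mean" or "count") gives a different summary over the same layout.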

The blog consists of the following attachments:
  1. Jupyter Notebook https://drive.google.com/file/d/1Vsui7dQH9pJN1unw6oz47hZH1WZifObQ/view?usp=sharing
  2. Html doc: https://drive.google.com/file/d/1DFcwajEK2qqYXcKXjd_srQsjkYvuBGON/view?usp=sharing

Combining and Merging Datasets

This post introduces merging data frames. It covers different scenarios for joining on keys from different data frames.
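One scenario of this kind, sketched under the assumption that the key column has a different name in each data frame (both tables below are invented for the example):

```python
import pandas as pd

# Hypothetical tables whose key columns have different names.
orders = pd.DataFrame({"cust_id": [1, 2, 4], "amount": [250, 100, 75]})
customers = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ben", "Chen"]})

# Inner join: keep only keys present in both frames.
inner = orders.merge(customers, left_on="cust_id", right_on="id", how="inner")

# Outer join: keep every key; the _merge column records where each row came from.
outer = orders.merge(
    customers, left_on="cust_id", right_on="id", how="outer", indicator=True
)

print(inner)
print(outer)
```

The indicator column is a handy way to spot keys that exist on only one side before deciding which join type to use.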

The blog consists of the following attachments:
  1. Jupyter Notebook https://drive.google.com/file/d/100RXan-W7mTz2k8hnx8XjcW4JL5JeoPw/view?usp=sharing
  2. Html doc: https://drive.google.com/file/d/1t0J8QBD7kNb-DZDevPs-y36imb0G4I6g/view?usp=sharing

Hierarchical Indexing

This post introduces hierarchical indexing. It starts with the challenges of a multi-level group by and shows how to tackle them using hierarchical indexing.
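To make the idea concrete, here is a small sketch (with made-up data, not the notebook's) of how a multi-level group by produces a hierarchical index and how that index can be queried:

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "country": ["US", "US", "IN", "IN"],
    "year": [2012, 2013, 2012, 2013],
    "quantity": [10, 7, 5, 7],
})

# Grouping by two columns yields a Series with a two-level (hierarchical) index.
by_level = sales.groupby(["country", "year"])["quantity"].sum()

# Select one cell with a tuple, or a whole outer level with a single label.
us_2013 = by_level.loc[("US", 2013)]
in_all = by_level.loc["IN"]

# Move the inner level into columns to get a wide table.
wide = by_level.unstack("year")

print(by_level)
print(wide)
```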

The blog consists of the following attachments:
  1. Jupyter Notebook https://drive.google.com/file/d/1ruXC1cGMbx_hDQisz8oe3drdAzYFrpo9/view?usp=sharing
  2. Html doc: https://drive.google.com/file/d/1Kf8qHLYY_ihos_z5WS8fpyTUwacXldnD/view?usp=sharing

Introduction to Series

This post introduces the Series and the functions associated with it. It covers how to store data, indexing, data manipulation, handling NaN, etc. Understanding Series is important because it forms the basis for understanding pandas data frames.
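The topics listed above can be sketched in a few lines. The values and labels below are arbitrary and only meant as an illustration:

```python
import numpy as np
import pandas as pd

# Store data with an explicit label index; one value is missing.
s = pd.Series([10.0, np.nan, 30.0], index=["a", "b", "c"])

first = s["a"]      # label-based indexing
last = s.iloc[-1]   # position-based indexing

doubled = s * 2     # element-wise arithmetic; NaN stays NaN

missing = s.isna().sum()  # number of missing values
filled = s.fillna(0.0)    # replace NaN with a default
dropped = s.dropna()      # or drop the missing entries

print(filled)
print(dropped)
```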

The blog consists of the following attachments:
  1. Jupyter Notebook https://drive.google.com/file/d/1lV07jr9RHCzd9e_d1ukj3hcficx2M2R8/view?usp=sharing
  2. Html doc: https://drive.google.com/file/d/16bPtQUuOJoXqoi8P5tJsB--GDnygintV/view?usp=sharing

Tuesday, January 15, 2019

Data manipulation with Python Part 2

In this post we look at another data manipulation scenario. The nature of the task and the resources are summarized in the following sections:

Problem Statement: ABC is a sports accessories company that sells sports gear across the globe. The data has fields such as Revenue, Quantity, Gross Margin, Order Method, Time, Country, etc., and spans 2012 to 2014 across four quarters (Q1 through Q4). Based on the global outlook and growth forecasts made by economists at ABC, the company has decided to sell a total of 211,555,475 units in 2015. However, this is an overall number, and Product Managers for individual countries don't know how it breaks down to individual products. Your job as a Business Analyst is to help the Product Managers get these numbers so that they can plan effectively.

Data Download Link

Learning Objectives:
  1. Using Pandas for data manipulation
    • Getting records and column names
    • Handle NA
    • Using Groupby to sum up numbers at different levels
  2. Using 'apply' function on certain columns
  3. Using lambda along with apply to get Percentage Share
  4. % formatting 
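The objectives above can be tied together in a short sketch. It assumes the 2015 target is split in proportion to each country and product's historical quantity, which is one plausible reading of the problem and not necessarily the approach in the notebook. The data below is invented; only the target figure comes from the problem statement.

```python
import pandas as pd

TARGET_2015 = 211_555_475  # overall unit target from the problem statement

# Hypothetical historical records; the real data set has more fields.
history = pd.DataFrame({
    "Country": ["US", "US", "IN", "IN", "IN"],
    "Product": ["Bat", "Ball", "Bat", "Ball", "Ball"],
    "Quantity": [300.0, 100.0, 200.0, None, 400.0],
})

# Handle NA: treat a missing quantity as zero units (an assumption).
history["Quantity"] = history["Quantity"].fillna(0)

# Sum quantities at the Country/Product level.
totals = history.groupby(["Country", "Product"], as_index=False)["Quantity"].sum()

# Percentage share of each row, using apply with a lambda.
grand_total = totals["Quantity"].sum()
totals["Share"] = totals["Quantity"].apply(lambda q: q / grand_total)

# Allocate the 2015 target in proportion to the historical share.
totals["Units2015"] = (totals["Share"] * TARGET_2015).round()

# % formatting for display.
totals["SharePct"] = totals["Share"].apply(lambda s: "%.1f%%" % (s * 100))

print(totals)
```

Because each allocation is rounded separately, the allocated units can differ from the target by a few units in total.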
Please download the Jupyter file from the following link

Web Scraping Tutorial 4- Getting the busy information data from Popular time page from Google

In this blog we will try to scrape the ...