Thursday, April 18, 2019

Introduction to NLP Part 2: Regular Expression in Python

A regular expression is a sequence of characters that defines a search pattern for text. Regular expressions are often used to extract information from both structured and unstructured text corpora. Almost all programming languages have a well-defined library of functions for this purpose. In this blog we look at some of the common functions used in Python, along with some scenario-based use cases. The broad objectives of this blog are to:

  1. Get familiar with the functions used for search
  2. Explore the 're' library
  3. Use the expressions in a list and a data frame to
    • Search text
    • Replace text
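The search and replace objectives above can be sketched with the standard 're' module on a plain list. This is a minimal illustration, not the code from the notebook; the sample lines and the patterns are assumptions invented for the example.

```python
import re

# Hypothetical sample data, invented for illustration.
lines = [
    "Order 1023 shipped on 2019-04-18",
    "No order number here",
    "Order 77 delayed until 2019-05-02",
]

# Compile the pattern once; it matches dates written as YYYY-MM-DD.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

# Search: keep only the lines that contain a date.
with_dates = [line for line in lines if date_pattern.search(line)]

# Extract: pull every date out of every line.
dates = [d for line in lines for d in date_pattern.findall(line)]

# Replace: mask the order numbers.
masked = [re.sub(r"Order \d+", "Order <id>", line) for line in lines]

print(with_dates)
print(dates)
print(masked)
```

For a data frame column, the same patterns can be passed to the pandas string methods Series.str.contains and Series.str.replace (with regex=True).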
Link to the Python notebook (ipynb) file:

Saturday, April 6, 2019

Introduction to NLP Part 1: Tokenization, Lemmatization and Stop Word Removal

In this post we look at how to handle text data in Python. Any text analysis activity has three main components:

  1. Tokenization
  2. Lemmatization/Stemming
  3. Stop Word Removal

We look at a small text example and see how to perform the three steps above using the nltk library. I performed all the operations after downloading all the nltk packages with the following line of code:

  • nltk.download()

I have not included the above line of code in the attached Python notebook and HTML version, but users are advised to run it after doing import nltk. nltk.download() will take some time (a few hours) to download all the relevant packages to your machine. After this you can run the entire Python script.
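To make the three steps concrete before any nltk data is downloaded, here is a library-free sketch. It is not the notebook's code: the regex tokenizer, the tiny stop word set, and the naive_stem helper are toy assumptions. In nltk the corresponding pieces are word_tokenize, WordNetLemmatizer or PorterStemmer, and stopwords.words("english"), which are far more complete.

```python
import re

text = "The cats were chasing the mice in the gardens."

# Tokenization: lowercase the text and split it into word tokens.
tokens = re.findall(r"[a-z]+", text.lower())

# Stop word removal: drop very common words.
# This small set is a toy assumption, not nltk's stop word list.
STOP_WORDS = {"the", "were", "in", "a", "an", "is", "of"}
content_tokens = [t for t in tokens if t not in STOP_WORDS]


def naive_stem(word):
    """Strip a couple of common suffixes (hypothetical toy stemmer)."""
    for suffix in ("ing", "s"):
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Stemming: reduce each remaining token to a crude root form.
stems = [naive_stem(t) for t in content_tokens]

print(tokens)
print(content_tokens)
print(stems)
```

The toy stemmer turns "chasing" into "chas" and leaves "mice" untouched, which shows why a real stemmer or a dictionary-based lemmatizer is worth using.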

Download Link:  https://drive.google.com/drive/folders/12LrZTI5qT-vzz6ce5dpXZ2ucdUsfa9S_?usp=sharing

Download the ipynb file and HTML version to understand the flow.


Friday, March 15, 2019

Handling JSON in Python

In this post we look at scenarios where we have to work with JavaScript Object Notation (JSON). The nature of the task and the resources are summarized in the following sections:

Problem Statement: We are often given a situation where we need to extract information from a JSON object before it can be used further. This is typical when building tools to process information, when extracting information from another source, or when exchanging information. The aim is to understand JSON and how it can be converted into structured data using Python.

Data Download Link
Learning Objectives:
  • Handling JSON
    • Understanding JSON object 
    • Using the json library
    • Converting it into a table (data frame) 
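The objectives above can be sketched with the standard json module. This is a minimal illustration rather than the notebook's code; the payload and its field names are invented for the example.

```python
import json

# Hypothetical JSON payload, invented for illustration.
raw = """
{
  "store": "Pune",
  "orders": [
    {"id": 1, "customer": {"name": "Asha", "city": "Pune"}, "amount": 250.0},
    {"id": 2, "customer": {"name": "Ravi", "city": "Mumbai"}, "amount": 120.5}
  ]
}
"""

# Parse the JSON text into nested Python dicts and lists.
data = json.loads(raw)

# Flatten each nested order into one flat dict, i.e. one table row.
rows = [
    {
        "store": data["store"],
        "order_id": order["id"],
        "customer_name": order["customer"]["name"],
        "customer_city": order["customer"]["city"],
        "amount": order["amount"],
    }
    for order in data["orders"]
]

for row in rows:
    print(row)
```

A list of flat dicts like rows can be passed straight to pandas.DataFrame to get a data frame, and pandas.json_normalize can do the flattening for nested records.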
Please download the jupyter file from the following link

Saturday, February 9, 2019

Pandas Complete Guide

This blog collects all the code snippets related to pandas. It covers the following topics:

  1. Introduction to Series: https://mlmadeeasy.blogspot.com/2019/02/introduction-to-series.html
  2. Hierarchical Indexing: https://mlmadeeasy.blogspot.com/2019/02/hierarchical-indexing.html
  3. Combining and Merging Data sets: https://mlmadeeasy.blogspot.com/2019/02/combining-and-merging-datasets.html
  4. Pivoting using Pandas: https://mlmadeeasy.blogspot.com/2019/02/pivoting-using-pandas.html
  5. Frequency Profiling using Pandas: https://mlmadeeasy.blogspot.com/2019/02/frequency-profiling-using-pandas.html
  6. Data Manipulation in Python Part 1: https://mlmadeeasy.blogspot.com/2019/01/data-manipulation-with-python.html
  7. Data Manipulation in Python Part 2 (Retail Case Study): https://mlmadeeasy.blogspot.com/2019/01/data-manipulation-with-python-part-2.html

Frequency Profiling using Pandas

This blog contains an introduction to frequency profiling. It is commonplace among data engineers who do ETL, and it is also a common first step in exploratory data analysis.
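As a quick illustration of frequency profiling, the sketch below counts how often each value occurs in a column and what share of rows it covers. The data frame and its city column are made up for the example and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({"city": ["Pune", "Mumbai", "Pune", "Delhi", "Pune", "Mumbai"]})

# Absolute frequency of each distinct value, most frequent first.
counts = df["city"].value_counts()

# Relative frequency (share of rows) of each distinct value.
shares = df["city"].value_counts(normalize=True)

# Combine both into one profile table, aligned on the city values.
profile = pd.DataFrame({"count": counts, "share": shares})
print(profile)
```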

The blog consists of the following attachments:
  1. Jupyter Notebook: https://drive.google.com/file/d/1mL3GoVQjGTTUeWMmwDKtytGNRF0evE6D/view?usp=sharing
  2. Html doc: https://drive.google.com/file/d/1ExuLhp1CgCjE0DCTR_G09LnD63I_Txp4/view?usp=sharing

Pivoting using Pandas

This blog contains an introduction to pivoting using pandas. It walks through different scenarios of creating summaries using pivot table functions.
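One such summary can be sketched with pandas.pivot_table. The sales data and column names below are assumptions made up for the example, not the notebook's data.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "North"],
    "product": ["A", "B", "A", "B", "A"],
    "amount": [100, 150, 200, 50, 25],
})

# One row per region, one column per product, cells hold the summed amount.
summary = pd.pivot_table(
    sales,
    index="region",
    columns="product",
    values="amount",
    aggfunc="sum",
    fill_value=0,  # show 0 instead of NaN for missing combinations
)
print(summary)
```

Changing aggfunc (for example to "mean" or "count") gives a different summary over the same layout.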

The blog consists of the following attachments:
  1. Jupyter Notebook: https://drive.google.com/file/d/1Vsui7dQH9pJN1unw6oz47hZH1WZifObQ/view?usp=sharing
  2. Html doc: https://drive.google.com/file/d/1DFcwajEK2qqYXcKXjd_srQsjkYvuBGON/view?usp=sharing

Combining and Merging Datasets

This blog contains an introduction to merging data frames. It walks through different scenarios of joining on keys from different data frames.
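A minimal sketch of one such scenario is a merge where the key column has a different name in each data frame. The customers and orders tables are invented for the example and are not the notebook's data.

```python
import pandas as pd

# Hypothetical tables, invented for illustration.
customers = pd.DataFrame({
    "cust_id": [1, 2, 3],
    "name": ["Asha", "Ravi", "Meena"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer": [1, 1, 4],  # same key as cust_id, under a different name
    "amount": [250, 120, 90],
})

# Inner join: keep only customers that have at least one matching order.
inner = pd.merge(customers, orders, left_on="cust_id", right_on="customer", how="inner")

# Left join: keep every customer; order columns are NaN where nothing matches.
left = pd.merge(customers, orders, left_on="cust_id", right_on="customer", how="left")

print(inner)
print(left)
```

The order for customer 4 appears in neither result because no row in customers carries that key; how="outer" would keep it.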

The blog consists of the following attachments:
  1. Jupyter Notebook: https://drive.google.com/file/d/100RXan-W7mTz2k8hnx8XjcW4JL5JeoPw/view?usp=sharing
  2. Html doc: https://drive.google.com/file/d/1t0J8QBD7kNb-DZDevPs-y36imb0G4I6g/view?usp=sharing

Web Scraping Tutorial 4- Getting the busy information data from Popular time page from Google

Popular Times: In this blog we will try to scrape the ...