Saturday, November 30, 2019

Book Review: Direct from Dell by Michael Dell


Direct from Dell: Strategies that Revolutionized an Industry by Michael Dell


The book highlights Dell's journey from the founder's perspective. It captures the different business challenges the company faced, from its inception through its heyday and beyond. The writer stresses the various business paradigms the company broke on its way to becoming the most sought-after PC brand in the world. In essence, the book covers the following:

1. Dell's Direct Model: In an age when most of its competitors were using indirect channels for market penetration, Dell used a direct route to market without partnering with any reseller, retailer or distributor

2. Customer-Centric Approach: Dell has always been very proactive in sensing the pulse of its customers. Regular feedback and door-to-door service have been key to understanding the demand proposition. It has tried to create products that gain mass acceptance. There is an instance of the highly ambitious 'Olympic' project that had to be shelved simply because there was no demand for it

3. Inventory: Right from the start, Dell focused on reducing inventory to the extent of having only 5-6 days of inventory on hand. This has reduced cost and freed up cash for expansion. Since the business model is direct, there is no inventory stocked up in the channel, so Dell can price its products better and pass some of the benefit on to customers. In the face of a technology change, it is better prepared and can go to market faster

4. Age of the Internet: When the internet first emerged, very few people understood the leverage it could provide. Dell not only lapped up the opportunity to use the internet as a value-add to its direct sales model, but also used it to improve relationships with suppliers and customers. Thus the value chain, right from suppliers to Dell to end customers, became highly integrated

All of the above is covered at great length in the book. Having worked in the PC industry myself, I can relate to the ideas espoused by Dell. Overall a very good read for anyone trying to gain some understanding of sales models, supplier relationships, inventory basics and, above all, how to engage with customers in a fruitful manner

View all my reviews

Book Review: RESET by Subramanian Swamy


RESET: Regaining India’s Economic Legacy by Subramanian Swamy
My rating: 4 of 5 stars

Introduction: Few books attempt to discuss Indian economics and link it with the decisions made during the pre- and post-independence era. Subramanian Swamy doesn’t shy away from recounting numerous key incidents in the course of history that had a long-lasting impact on the Indian psyche and on economic decision making. The following, I thought, are some of the key highlights of the book:

1. The comparison between pre- and post-independence India and China on aspects such as per capita income, acreage, crop yields, irrigation and literacy, using data to highlight similarities and differences, is remarkable. Few have ever tried to juxtapose the two Asian giants in the manner done by Swamy

2. Dismantling the so-called Nehruvian Economic Model (inspired by the Soviet model of rapid industrialisation, proposed by P.C. Mahalanobis and originally created by Feldman). Most of the problems India has today are a direct consequence of the five-year planning of the command economy. The state's belief in squeezing agriculture to produce capital goods spelled doom for the economy. The only good that came out of the planning was the surplus in agriculture (the so-called Green Revolution, the plan for which was proposed by the late Prime Minister Lal Bahadur Shastri)

3. The period 1980-1990 is considered the worst time for India, as it produced a precarious situation: a balance of payments crisis, depletion of most of the foreign reserves (only a few weeks were left before day zero), a tarnished image in the international market as an uncompetitive economy, and so on. Given that both Indira and Rajiv had majorities in Parliament, it speaks volumes about how erroneous the Soviet planning model was

4. The post-1991 period, when liberalisation freed up avenues of growth for India. Structural changes in policy included the abolition of the Licence Raj, the easing of state control and public-private partnership, to name a few. All this accelerated growth and put India back on track. The main aim was to strengthen agriculture (which absorbs most of the workforce) and remove poverty altogether. For this to happen, it was proposed that the Indian economy grow at around 10% for the decade (1990-2000)

5. The reforms, though revolutionary, made the political parties wary that they went against popular sentiment. From the end of the 90s till the early 2000s, every government from Rao's to Atalji's made conscious efforts to dilute the reforms. All the momentum gathered during Rao's term eventually fizzled out

6. Since independence, India has relied on agriculture to push growth and absorb its surplus workforce. Yet with each successive Plan, resource allocation to agriculture was reduced despite it being the top contributor to GDP. The resources were diverted to the public sector, which generated very little profit. The one thing that kept the Indian economy on its feet during the 80s and 90s was the services sector, second only to agriculture in its contribution to GDP

7. The Modi Years: Lessons Learnt and Future Vision: The current economic situation is challenging on two fronts: weak private consumption and a falling investment rate. This started in 2014 but deteriorated during the demonetisation and GST phases. The worst affected were the MSME and agriculture sectors. The lack of economic know-how among cabinet ministers means a lot of swadeshi brandishing without concrete action. The goals, and the means to accomplish the $5 trillion economy, were conspicuous by their absence from the recent Budget speech made by the Finance Minister. The turnaround can still be achieved if we give enough incentives to increase household savings (raising fixed deposit rates) and ease liquidity for MSMEs. It is also very important to sustain a GDP growth rate of 10% for the next decade if India is to eliminate poverty and unemployment and become a developed country

View all my reviews

Sunday, May 26, 2019

Introduction to NLP Blog 5: Creating Document Term Matrix (DTM)

In this post we will look at creating a Document Term Matrix (DTM) using the features generated from Term Frequency (TF) and/or Term Frequency-Inverse Document Frequency (TF-IDF). This results in a data frame that can be used for any subsequent analysis. This data frame is called the Document Term Matrix. It has the following properties:

  1. The first column represents the Document ID (Doc_ID)
  2. The remaining columns represent the features selected as a result of the profiling activity
  3. The value in each cell represents either
    1. Term Frequency (TF)
    2. Term Frequency-Inverse Document Frequency (TF-IDF)
In the blog, I have covered how to create functions that take the following as input:
  1. Name of the data frame
  2. Column containing the text data
  3. An ngram parameter in the form 1, 2, 3, ...
    1. A list can be passed as the ngram parameter. If I pass [1, 2], the function will create a unigram and bigram summary
The functions generate a Document ID/row-wise summary of tokens in the form of frequencies or TF-IDF values. The following functions have been created:
  1. Tokenize the text
  2. Create a frequency profile at the Document_ID level
  3. Create a TF-IDF profile at the Document_ID level
  4. Clean the text of any punctuation
I have also tested the functions to see how they fare when the number of records becomes large (~100k). They performed reasonably well.
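
To make the flow concrete, here is a minimal sketch of how such a function could look, built only with pandas and Python's standard library. The names create_dtm and tokenize, and their parameters, are illustrative assumptions and not the exact code from the attached notebook.

import math
import re
from collections import Counter

import pandas as pd

def tokenize(text, ngram=1):
    # Lower-case, strip punctuation and return n-gram tokens joined by spaces
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return [" ".join(words[i:i + ngram]) for i in range(len(words) - ngram + 1)]

def create_dtm(df, text_col, ngrams=(1,), use_tfidf=False):
    # Document Term Matrix: one row per document, one column per token,
    # cell values are raw term frequencies or TF-IDF scores
    token_lists = [sum((tokenize(t, n) for n in ngrams), []) for t in df[text_col]]
    counts = [Counter(tokens) for tokens in token_lists]
    vocab = sorted({tok for c in counts for tok in c})
    n_docs = len(counts)
    doc_freq = {tok: sum(1 for c in counts if tok in c) for tok in vocab}

    rows = []
    for doc_id, c in enumerate(counts, start=1):
        total = sum(c.values())
        row = {"Doc_ID": doc_id}
        for tok in vocab:
            if use_tfidf:
                tf = c[tok] / total if total else 0.0
                row[tok] = round(tf * math.log10(n_docs / doc_freq[tok]), 4)
            else:
                row[tok] = c[tok]  # raw term frequency
        rows.append(row)
    return pd.DataFrame(rows)

# Example usage on a tiny data frame
docs = pd.DataFrame({"text": ["Virat Kohli is captain of Indian team",
                              "Virat is a great batsman",
                              "Virat will break all batting records"]})
dtm = create_dtm(docs, "text", ngrams=[1, 2], use_tfidf=True)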



Download the ipynb file, html version and the csv file to understand the flow






Tuesday, May 14, 2019

Introduction to NLP Part 4: Term Frequency Inverse Document Frequency (TF-IDF)


In this post we will look at a metric other than Term Frequency for generating features. It is mathematically represented as:

TF-IDF = TF * IDF, where 

TF = Number of times a given token appears in a document/Total words in the document

IDF = log(Total Number of Documents/Total documents in which the token is present)

For instance, if my unstructured data set looks like:

  1. Virat Kohli is captain of Indian team
  2. Virat is a great batsman
  3. Virat will break all batting records

Let's say we want to calculate the TF-IDF value of 'Virat' for Sentence 1.
TF   =  Number of times a given token appears in a document/Total words in the document
       = 1/7
       = 0.1428

IDF = log(Total Number of Documents/Total documents in which the token is present)
       = log(3/3)
       = 0

So TF-IDF= 0.1428 * 0
                 = 0

As can be seen, if a token is present across numerous documents, its TF-IDF value will be close to zero. The TF-IDF score will be high for words used less frequently. Let's calculate the TF-IDF value of the token 'great' in Sentence 2.

TF   =  Number of times a given token appears in a document/Total words in the document
       = 1/5
       = 0.20

IDF = log(Total Number of Documents/Total documents in which the token is present)
       = log(3/1)   (log base 10)
       = 0.477

So TF-IDF= 0.20 * 0.477
                 = 0.095

Since 'great' is not present in Sentences 1 and 3, its TF-IDF score there will be zero. So if we make 'great' a feature, the resultant data frame will look like

Great
0
0.095
0

In the above example I have converted an unstructured data set into a structured data set. This can be used for any further analysis.

In the blog we will look at the following in detail:
  1. TF-IDF Basics
  2. Create a function that can calculate the TF-IDF values (a small sketch follows after the lists below)
The following libraries will be used:
  1. Pandas
  2. nltk
  3. string
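
As a quick check of the arithmetic above, the following is a small sketch in plain Python; the function name compute_tfidf is my own hypothetical choice, and the attached notebook may structure this differently.

import math

documents = ["Virat Kohli is captain of Indian team",
             "Virat is a great batsman",
             "Virat will break all batting records"]

def compute_tfidf(token, doc_index, docs):
    # TF = count of the token in the document / total words in the document
    words = docs[doc_index].lower().split()
    tf = words.count(token.lower()) / len(words)
    # IDF = log10(total documents / documents containing the token)
    docs_with_token = sum(1 for d in docs if token.lower() in d.lower().split())
    idf = math.log10(len(docs) / docs_with_token)
    return tf * idf

print(compute_tfidf("Virat", 0, documents))   # (1/7) * log10(3/3) = 0.0
print(compute_tfidf("great", 1, documents))   # (1/5) * log10(3/1) = 0.095 approx.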
Download Link: https://drive.google.com/drive/folders/1q-mvC336C2pp6mhcNedncdEiRKMG1K_f?usp=sharing


Download the ipynb file and html version to understand the flow

Links to my previous blogs on NLP:

Blog 1: https://mlmadeeasy.blogspot.com/2019/04/introduction-to-nlp-part-1-tokenization.html

Blog 2: https://mlmadeeasy.blogspot.com/2019/04/introduction-to-nlp-part-2-regular.html


Blog 3: https://mlmadeeasy.blogspot.com/2019/05/introduction-to-nlp-part-3-bag-of-words.htm



Sunday, May 5, 2019

Introduction to NLP Part 3: Bag of Words using Term Frequency (TF)


In this post we will look at the bag of words concept in Python. Bag of words is used to convert unstructured data into structured data by creating features (similar to columns in a structured data frame). Bag of words uses frequency as the metric for generating the data frame. For instance, if my unstructured data set looks like:

  1. Virat Kohli is captain of Indian team
  2. Virat is a great batsman
  3. Virat will break all batting records

Let's say we take 'Virat' as a feature. 'Virat' appears once in each of the three sentences, so the structured data frame will look like

Virat
1
1
1

In the above example I have converted an unstructured data set into a structured data set. This can be used for any further analysis.

In the blog we will look at the following in detail:
  1. Ngrams
    • Unigrams
    • Bigrams
    • Trigrams
  2. Create a data frame for ngrams (see the sketch after the libraries list below)
The following libraries will be used:
  1. Pandas
  2. nltk
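
The sketch below shows one possible way to build such a frequency data frame with the two libraries above; the helper name bow_dataframe is illustrative and the sample sentences repeat the example from this post, so the exact code in the attached notebook may differ.

from collections import Counter

import nltk
import pandas as pd
from nltk.tokenize import word_tokenize
from nltk.util import ngrams

nltk.download("punkt")   # tokenizer data, needed once

sentences = ["Virat Kohli is captain of Indian team",
             "Virat is a great batsman",
             "Virat will break all batting records"]

def bow_dataframe(texts, n=1):
    # One row per document, one column per n-gram, values are raw frequencies
    rows = []
    for text in texts:
        tokens = word_tokenize(text.lower())
        grams = [" ".join(g) for g in ngrams(tokens, n)]
        rows.append(Counter(grams))
    return pd.DataFrame(rows).fillna(0).astype(int)

unigrams = bow_dataframe(sentences, n=1)   # 'virat' column is 1, 1, 1 as in the table above
bigrams = bow_dataframe(sentences, n=2)    # columns such as 'virat kohli', 'great batsman', ...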

Download the ipynb file and html version to understand the flow

Thursday, April 18, 2019

Introduction to NLP Part 2: Regular Expression in Python

A regular expression is a sequence of characters used to search for a defined pattern in text. Regular expressions are often used to extract information from both structured and unstructured text corpora. Almost all programming languages have a well-defined library of functions for this purpose. In this blog we will look at some of the common functions used in Python, along with some scenario-based use cases (a small sketch follows the list below). The broad objectives of this blog are to:

  1. Get familiar with the functions used for searching
  2. Explore the 're' library
  3. Use regular expressions on a list and a data frame to
    • Search text
    • Replace text
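
Below is a minimal sketch of those objectives using the built-in re module and pandas; the sample strings, pattern and column names are made up purely for illustration.

import re

import pandas as pd

texts = ["Order ID: A123 shipped", "Order ID: B456 pending", "No order here"]

# Search: pull out an order ID pattern (a capital letter followed by digits) from a list
pattern = re.compile(r"[A-Z]\d+")
ids = []
for t in texts:
    match = pattern.search(t)
    ids.append(match.group() if match else None)
print(ids)   # ['A123', 'B456', None]

# Replace: mask the same pattern inside a data frame column
df = pd.DataFrame({"comment": texts})
df["masked"] = df["comment"].str.replace(r"[A-Z]\d+", "XXX", regex=True)
print(df["masked"].tolist())   # ['Order ID: XXX shipped', 'Order ID: XXX pending', 'No order here']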
Link to extract python(ipynb) file:

Saturday, April 6, 2019

Introduction to NLP Part 1: Tokenization, Lemmatization and Stop Word Removal

In this post we will look at how to handle text data in Python. Any text analysis activity has three main components:

  1. Tokenization
  2. Lemmatization/Stemming
  3. Stop Word Removal

We will look at a small text example and see how to perform the above three steps using the nltk library. I performed all the operations after downloading the nltk data using the following line of code:

  • nltk.download()

I have not included the above line in the attached Python notebook and html version, but it is advisable to run it after importing nltk. nltk.download() can take a while (possibly a few hours) to download all the relevant packages to your machine. After this you can run the entire Python script.
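
A minimal sketch of the three steps with nltk is given below; the example sentence is my own, and the targeted nltk.download() calls are an alternative to downloading everything, so the attached notebook may look different.

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Targeted downloads instead of the full nltk.download()
nltk.download("punkt")
nltk.download("wordnet")
nltk.download("stopwords")

text = "Virat Kohli scores runs in all the matches"

# 1. Tokenization
tokens = word_tokenize(text.lower())

# 2. Lemmatization (default noun part of speech)
lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(tok) for tok in tokens]

# 3. Stop word removal
stop_words = set(stopwords.words("english"))
cleaned = [tok for tok in lemmas if tok not in stop_words]

print(cleaned)   # roughly ['virat', 'kohli', 'score', 'run', 'match']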

Download Link:  https://drive.google.com/drive/folders/12LrZTI5qT-vzz6ce5dpXZ2ucdUsfa9S_?usp=sharing

Download the ipynb file and html version to understand the flow

