Wednesday, February 5, 2020
Book Review: Coalition Years
The book by Pranab Mukherjee is an ode to his political career. Through the course of the book, Pranab Mukherjee comes across as an honest man who takes on whatever role the party expects of him. A true Congressman, Pranab Da carries out his duties with painstaking sincerity in the face of coalition politics. The book can be seen in the light of the following points:
• The book starts with the tail end of Atal Ji's government and shows how Pranab Da reinvigorated the Congress party amid the possibility of an NDA-2. Having worked his way up from the grassroots to the party ranks, Pranab Da earned the respect of partymen and knew how to connect with people at a personal level. His proximity to 10 Janpath added weight to his stature within the party. One incident from the pre-poll campaign in 2004 deserves special mention: he visits a village in Left-dominated West Bengal and interacts with a boy. Pranab Da assures him that building a bund will be the first thing done once the Congress comes to power.
• His stint as Defence Minister saw an increase in defence allocation. During his tenure, he signed pacts with the US as well as Russia on major defence cooperation. For the most part, he focussed on modernisation of warfare and transfer of defence technologies with a 'Make in India' thrust in mind.
• The controversial nuclear deal materialised while he was the Minister of External Affairs. To bring the US onto the same page, he connected with Condoleezza Rice at a personal level and made sure that the sanctions put on India by the NSG were softened. This marked the beginning of a landmark era in India-US relations.
• Of all the hats he donned, the one he had the most experience with was the Finance Ministry. The introduction of the retrospective tax act, his expansionary policies during 2010 and 2011, and his aggressive stance on implementing GST and UID made him a man with an independent stance in an otherwise subservient Congress.
• One aspect that, in my view, set him a class apart was his ability to maintain cordial relationships across the political class. I would like to mention two instances that show his ability to connect with fellow politicians:
○ During the run for President, Bal Thackeray went against his party's stance and supported Pranab Da. As a gesture of gratitude, Pranab Da went to meet Thackeray at his residence, where the latter quipped, "It is but obvious that a Maratha lion supports Bengal Tiger."
○ The much-debated GST bill was eventually passed by both houses in 2017. Since Pranab Da shared a cordial relationship with the PM, Narendra Modi called to congratulate him and apprise him of the development, as he had been a staunch supporter of the bill during his tenure as Finance Minister.
In the end, I would like to say that Pranab Mukherjee is, in the truest sense, a son of the soil. He is definitely the PM India never had.
Saturday, February 1, 2020
Blog 12: Cumulative Sum in R
Cumulative Sum using R
Parag Verma
Introduction
In almost all business problems, the focus is on the key takeaways of an analysis. Most academic disciplines teach a concept known as the Pareto Principle, which states that 80% of the effect comes from 20% of the causes (also read as cases). From an analysis standpoint, it is always advisable to look at the low-hanging fruit first. In my view, these low-hanging fruits are identified with the Pareto Principle in mind.
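To make the Pareto idea concrete, here is a minimal base R sketch. The sales figures and product names are made up for illustration and are not taken from any dataset used in this blog.

```r
# Hypothetical sales figures for ten products (made-up numbers)
sales <- c(A = 500, B = 250, C = 100, D = 50, E = 40,
           F = 25, G = 15, H = 10, I = 6, J = 4)

# Sort from largest to smallest, then accumulate the running share of the total
sales_sorted <- sort(sales, decreasing = TRUE)
cum_share <- 100 * cumsum(sales_sorted) / sum(sales_sorted)

# Number of products needed to reach at least 80% of total sales
n_needed <- min(which(cum_share >= 80))
top_products <- names(sales_sorted)[seq_len(n_needed)]

print(round(cum_share, 1))
print(top_products)
```

In this toy example a handful of products account for most of the total, which is the pattern the Pareto Principle describes.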
The purpose of this blog is to introduce the concept of cumulative sum in R, which helps you deal with practical scenarios while working with datasets. Some of the problems we face while handling datasets are listed below:
- Create a cumulative sum of Sales/Gross Margin/Net Cash Flow/Cases around key attributes such as Country, Product Category etc.
- Identify groups/combinations that contribute to 80% of the impact (Sales, Gross Margin etc.)
- Identify Products that can help a Sales Rep achieve the Sales Target
- Quota Allocation exercise where quota is given based on the relative contribution of each Product or Region
In this blog, we will look at how to proceed for each of the cases mentioned above.
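The last two scenarios in the list (reaching a sales target and allocating quota) can be sketched with the same cumulative-sum idea. The product names, sales numbers, `sales_target` and `total_quota` below are all hypothetical, and this is only one possible way to set it up.

```r
library(dplyr)

# Hypothetical product-level sales for one sales rep (made-up numbers)
products <- data.frame(
  Product = c("P1", "P2", "P3", "P4", "P5"),
  Sales   = c(400, 300, 150, 100, 50)
)

sales_target <- 800   # assumed target for illustration
total_quota  <- 2000  # assumed quota pool to distribute

plan <- products %>%
  arrange(desc(Sales)) %>%
  mutate(
    Cumulative_Sales = cumsum(Sales),
    # TRUE for products needed to reach the target,
    # including the one that crosses it
    Needed_For_Target = dplyr::lag(Cumulative_Sales, default = 0) < sales_target,
    # Quota split in proportion to each product's share of total sales
    Quota = total_quota * Sales / sum(Sales)
  )

print(plan)
```

Comparing the previous row's running total with the target keeps the product that pushes the total past the target. A plain `Cumulative_Sales < sales_target` filter would drop it.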
Installing the libraries: dplyr and tidyr
# Install each package if it is missing, then load it
if(!require("dplyr")){
  install.packages("dplyr")
  library(dplyr)
}
if(!require("tidyr")){
  install.packages("tidyr")
  library(tidyr)
}
Importing the dataset
For this exercise we will look at the cybersecurity breaches reported to the US Department of Health and Human Services. There are 1151 records in the data set with 9 variables. Our aim is to calculate the percentage of cases reported in each State at an Entity level. Here we will look at the concept of cumulative sum in R.
# Ecdat library for importing the dataset (install if missing, then load)
if(!require("Ecfun")){
  install.packages("Ecfun",dependencies = TRUE)
  library(Ecfun)
}
if(!require("Ecdat")){
  install.packages("Ecdat",dependencies = TRUE)
  library(Ecdat)
}
data(HHSCyberSecurityBreaches)
df<-HHSCyberSecurityBreaches
head(df[,1:3])%>%
knitr::kable()
Name.of.Covered.Entity | State | Covered.Entity.Type |
---|---|---|
Brooke Army Medical Center | TX | Healthcare Provider |
Mid America Kidney Stone Association, LLC | MO | Healthcare Provider |
Alaska Department of Health and Social Services | AK | Healthcare Provider |
Health Services for Children with Special Needs, Inc. | DC | Health Plan |
L. Douglas Carlson, M.D. | CA | Healthcare Provider |
David I. Cohen, MD | CA | Healthcare Provider |
The names of the columns are:
- Name.of.Covered.Entity: A character vector identifying the organization involved in the breach.
- State: State abbreviation
- Covered.Entity.Type: A factor giving the organization type of the covered entity
- Individuals.Affected: An integer giving the number of humans whose records were compromised in the breach.
- Breach.Submission.Date
- Type.of.Breach: A factor giving one of 29 different combinations of 7 different breach types
- Location.of.Breached.Information: A factor giving one of 47 different combinations of 8 different location categories: “Desktop Computer”, “Electronic Medical Record”, “Email”, “Laptop”, “Network Server”, “Other”, “Other Portable Electronic Device”, “Paper/Films”
- Business.Associate.Present: Logical = (Covered.Entity.Type == “Business Associate”)
- Web.Description: A character vector giving a narrative description of the incident.
Percentage of Cases within Each State by Entity Type
df.interim<-df%>%
  group_by(Covered.Entity.Type,State)%>%
  summarise(Total_Cases=sum(Individuals.Affected))%>% # individuals affected per entity type and state
  arrange(Covered.Entity.Type,desc(Total_Cases))%>% # largest states first within each entity type
  ungroup()%>%
  group_by(Covered.Entity.Type)%>%
  mutate(Perc_Contribution=round(100*Total_Cases/sum(Total_Cases),1)) # share of the entity type's total
knitr::kable(df.interim%>%
  head())
Covered.Entity.Type | State | Total_Cases | Perc_Contribution |
---|---|---|---|
Business Associate | VA | 5093217 | 22.7 |
Business Associate | TN | 4547412 | 20.3 |
Business Associate | NJ | 2964033 | 13.2 |
Business Associate | TX | 2582619 | 11.5 |
Business Associate | NY | 2033804 | 9.1 |
Business Associate | PR | 1069166 | 4.8 |
Here we can see that the records relate only to Business Associate, since I took the head of the resulting data frame. Let's now look at the top 5 records within each group.
Top 5 records from each Covered Entity Type
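Before moving on, a quick sanity check on the percentage column can be useful: within each entity type the contributions should add up to roughly 100. The sketch below uses a small made-up data frame with the same column names as the breach data, so it runs without the Ecdat package.

```r
library(dplyr)

# Toy stand-in for the breach data (made-up numbers, same column names)
toy <- data.frame(
  Covered.Entity.Type  = c("Health Plan", "Health Plan", "Health Plan",
                           "Healthcare Provider", "Healthcare Provider"),
  State                = c("FL", "TN", "FL", "IL", "CA"),
  Individuals.Affected = c(600, 300, 100, 700, 300)
)

toy_interim <- toy %>%
  group_by(Covered.Entity.Type, State) %>%
  summarise(Total_Cases = sum(Individuals.Affected)) %>%
  ungroup() %>%
  group_by(Covered.Entity.Type) %>%
  mutate(Perc_Contribution = round(100 * Total_Cases / sum(Total_Cases), 1))

# Each entity type should add up to roughly 100
# (rounding to one decimal can shift the total slightly)
check <- toy_interim %>%
  summarise(Total_Perc = sum(Perc_Contribution))

print(check)
```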
Top5<-df.interim%>%
  mutate(Index=1:n())%>% # row index within each entity type (data is still grouped)
  filter(Index<6)%>% # keep the first 5 rows of each group
  select(-Index)
knitr::kable(Top5)
Covered.Entity.Type | State | Total_Cases | Perc_Contribution |
---|---|---|---|
Business Associate | VA | 5093217 | 22.7 |
Business Associate | TN | 4547412 | 20.3 |
Business Associate | NJ | 2964033 | 13.2 |
Business Associate | TX | 2582619 | 11.5 |
Business Associate | NY | 2033804 | 9.1 |
Health Plan | FL | 1240326 | 23.0 |
Health Plan | TN | 1110698 | 20.6 |
Health Plan | MT | 1062509 | 19.7 |
Health Plan | TX | 568640 | 10.6 |
Health Plan | NY | 390830 | 7.3 |
Healthcare Clearing House | FL | 10000 | 56.3 |
Healthcare Clearing House | SC | 4204 | 23.7 |
Healthcare Clearing House | PA | 2300 | 13.0 |
Healthcare Clearing House | WA | 1250 | 7.0 |
Healthcare Provider | IL | 4335571 | 32.5 |
Healthcare Provider | CA | 2098208 | 15.7 |
Healthcare Provider | FL | 1583608 | 11.9 |
Healthcare Provider | AL | 977995 | 7.3 |
Healthcare Provider | GA | 491932 | 3.7 |
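Note that Healthcare Clearing House shows only four rows in the table above because that group has just four States. The same top-N selection can also be written with `row_number()` instead of a helper index column. A minimal sketch on made-up data (the `Group` and `Value` columns are hypothetical):

```r
library(dplyr)

# Made-up data: group A has three rows, group B has only one
toy <- data.frame(
  Group = c("A", "A", "A", "B"),
  Value = c(10, 30, 20, 5)
)

top2 <- toy %>%
  arrange(Group, desc(Value)) %>%   # largest values first within each group
  group_by(Group) %>%
  filter(row_number() <= 2) %>%     # keep at most 2 rows per group
  ungroup()

print(top2)
```

A group with fewer rows than the cut-off simply returns all of its rows.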
Let's now look at the States contributing to 80% of the Cases within each Entity Type.
States Contributing to 80% of the Cases
I will break this into two steps.
- First we will calculate the Cumulative Percentage
- Second we will keep only the rows whose Cumulative Percentage is below 80%
# df.interim is still grouped by Covered.Entity.Type, so cumsum restarts for each entity type
Perc_80<-df.interim%>%
  mutate(CumulativePerc_Contribution=cumsum(Perc_Contribution))%>% # First Step
  filter(CumulativePerc_Contribution < 80) # Second Step
knitr::kable(Perc_80)
Covered.Entity.Type | State | Total_Cases | Perc_Contribution | CumulativePerc_Contribution |
---|---|---|---|---|
Business Associate | VA | 5093217 | 22.7 | 22.7 |
Business Associate | TN | 4547412 | 20.3 | 43.0 |
Business Associate | NJ | 2964033 | 13.2 | 56.2 |
Business Associate | TX | 2582619 | 11.5 | 67.7 |
Business Associate | NY | 2033804 | 9.1 | 76.8 |
Health Plan | FL | 1240326 | 23.0 | 23.0 |
Health Plan | TN | 1110698 | 20.6 | 43.6 |
Health Plan | MT | 1062509 | 19.7 | 63.3 |
Health Plan | TX | 568640 | 10.6 | 73.9 |
Healthcare Clearing House | FL | 10000 | 56.3 | 56.3 |
Healthcare Provider | IL | 4335571 | 32.5 | 32.5 |
Healthcare Provider | CA | 2098208 | 15.7 | 48.2 |
Healthcare Provider | FL | 1583608 | 11.9 | 60.1 |
Healthcare Provider | AL | 977995 | 7.3 | 67.4 |
Healthcare Provider | GA | 491932 | 3.7 | 71.1 |
Healthcare Provider | TN | 467261 | 3.5 | 74.6 |
Healthcare Provider | TX | 341041 | 2.6 | 77.2 |
Healthcare Provider | NY | 334068 | 2.5 | 79.7 |
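One thing to keep in mind with the strict `< 80` filter is that it drops the State that takes the running total to or past 80%. In the table above, Healthcare Clearing House keeps only FL (56.3%) because adding SC brings the total to 80.0. If you would rather include that crossing State, one option is to compare the previous row's cumulative percentage against 80. The sketch below uses made-up numbers and also accumulates from the raw totals, so rounding does not build up.

```r
library(dplyr)

# Made-up totals for a single entity type
toy <- data.frame(
  Covered.Entity.Type = rep("Health Plan", 5),
  State               = c("FL", "TN", "MT", "TX", "NY"),
  Total_Cases         = c(500, 250, 100, 90, 60)
)

perc_80_incl <- toy %>%
  group_by(Covered.Entity.Type) %>%
  arrange(desc(Total_Cases), .by_group = TRUE) %>%
  mutate(
    # Cumulative share computed from unrounded totals
    Cumulative_Perc = 100 * cumsum(Total_Cases) / sum(Total_Cases),
    # Running share before the current row is added
    Perc_Before = dplyr::lag(Cumulative_Perc, default = 0)
  ) %>%
  filter(Perc_Before < 80) %>%   # keeps the row that crosses 80%
  select(-Perc_Before) %>%
  ungroup()

print(perc_80_incl)
```

Here the strict filter would stop at TN (75%), while the lag-based version also keeps MT, which takes the total to 85%.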
Final Comments
We have seen how we can use the group_by and cumsum functions to identify what matters most at an overall level. This helps us understand which States drive the numbers at the Entity level.
Link to Previous R Blogs
List of Datasets for Practise
https://hofmann.public.iastate.edu/data_in_r_sortable.html
https://vincentarelbundock.github.io/Rdatasets/datasets.html