Machine Learning Made Easy

Wednesday, December 4, 2019

Blog 1: Basics of R: Vectors, Matrices, Lists and Data Frames

Different Data Structures in R

Vectors: Initialization, Length and Indexing

Vectors are the most basic entities in R. They are the building blocks for storing information. A vector can hold numbers, text, logical values and so on, although all the values in a single vector must be of the same type. Let's look at how to create vectors and at the different types.

# Let us define vectors p,q and r
p <- c(1,2,5.3,6,-2,4) # numeric vector
q <- c("one","two","three") # character vector
r <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) # logical vector
# c() combines the values listed inside its parentheses into one vector

# Let's check what is in vector 'p'
p

## [1]  1.0  2.0  5.3  6.0 -2.0  4.0

class(p) # class gives the nature of values stored in 'p' vector

## [1] "numeric"

# Let's check what is in vector 'q'
q

## [1] "one"   "two"   "three"

class(q)

## [1] "character"

# Let's check what is in vector 'r'
r

## [1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

class(r)

## [1] "logical"
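Because a vector stores a single type, combining values of different types makes R convert them all to one common type. The short sketch below illustrates this; the names `mixed` and `flags` are only illustrative.

```r
# A vector holds one type only, so c() coerces mixed inputs to a common type
mixed <- c(1, "two", TRUE)
print(mixed)          # "1" "two" "TRUE": everything became text
print(class(mixed))   # "character"

# Logical values mixed with numbers become 1 (TRUE) and 0 (FALSE)
flags <- c(1, TRUE, FALSE)
print(flags)          # 1 1 0
print(class(flags))   # "numeric"
```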

# Refer to elements of a vector using subscripts (indexing starts at 1).
# Access the third element of p
p[3]

## [1] 5.3

# Access the third and fourth elements
p[c(3,4)]

## [1] 5.3 6.0

# Let us calculate the length of the vector p
length(p)

## [1] 6

# Accessing the 7th element of p returns NA, because
# p has only 6 elements
p[7]

## [1] NA
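Besides single positions, a vector can be indexed with negative numbers (to drop elements) or with a logical condition. A minimal sketch, reusing the values of `p` from above:

```r
p <- c(1, 2, 5.3, 6, -2, 4)

p[-1]          # every element except the first
p[p > 2]       # logical indexing: elements greater than 2 (5.3, 6, 4)
which(p > 2)   # positions of those elements: 3 4 6

# Assigning past the end extends the vector; the gap at position 7 becomes NA
p[8] <- 10
length(p)      # 8
```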

Matrices: Initialization, Dimension and Indexing

All elements of a matrix must be of the same type, and all columns have the same length.

Syntax for creating a matrix
mymatrix <- matrix(vector, nrow=r, ncol=c, byrow=FALSE, dimnames=list(char_vector_rownames, char_vector_colnames))
byrow=TRUE indicates that the matrix should be filled by rows.
byrow=FALSE indicates that the matrix should be filled by columns (the default).
dimnames provides optional labels for the columns and rows.

# Generate a 5 x 4 numeric matrix
1:20 # Creates a vector sequence from 1 through 20 with a step of 1

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

y<-matrix(1:20, nrow=5,ncol=4) # filled column by column, since byrow=FALSE by default
y

##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20
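The dimensions of a matrix can be read with dim(), nrow() and ncol(). The sketch below also refills the same values with byrow = TRUE to show the difference in fill order; the name `y_rows` is only for illustration.

```r
y <- matrix(1:20, nrow = 5, ncol = 4)   # filled column by column (default)
dim(y)     # 5 4: number of rows, then columns
nrow(y)    # 5
ncol(y)    # 4

# The same values filled row by row instead
y_rows <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE)
y[1, ]       # 1 6 11 16
y_rows[1, ]  # 1 2 3 4
```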

# Access the element in row 1, column 2
y[1,2]

## [1] 6

y[,] # Outputs the entire matrix

##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20

# Another example: a 2 x 2 matrix filled by rows, with row and column names
cells <- c(1,26,24,68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2") 
mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
                   dimnames=list(rnames, cnames)) 
mymatrix

##    C1 C2
## R1  1 26
## R2 24 68

# mymatrix[2,3]
# This gives a "subscript out of bounds" error,
# because mymatrix has only 2 columns
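When a matrix has row and column names, elements can also be picked by name, and an out-of-range subscript can be caught rather than allowed to stop a script. A small sketch using base R's tryCatch(); the variable `msg` is just an illustrative name.

```r
cells <- c(1, 26, 24, 68)
mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
                   dimnames = list(c("R1", "R2"), c("C1", "C2")))

# With dimnames set, elements can also be selected by name
mymatrix["R2", "C1"]   # 24

# Catch the out-of-bounds error instead of letting it stop the script
msg <- tryCatch(mymatrix[2, 3], error = function(e) conditionMessage(e))
msg                    # "subscript out of bounds"
```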

# Identify rows, columns or elements using subscripts. 
y[,4] # 4th column of matrix

## [1] 16 17 18 19 20

y[3,] # 3rd row of matrix

## [1]  3  8 13 18

y[2:4,1:3] # rows 2,3,4 of columns 1,2,3

##      [,1] [,2] [,3]
## [1,]    2    7   12
## [2,]    3    8   13
## [3,]    4    9   14

class(y[3,]) # A single row comes back as an integer vector, not a matrix

## [1] "integer"
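If the result should stay a matrix, the drop = FALSE argument of `[` keeps the dimensions. A minimal sketch (the names `row3` and `row3_m` are illustrative):

```r
y <- matrix(1:20, nrow = 5, ncol = 4)

row3 <- y[3, ]                   # dimension is dropped: a plain integer vector
row3_m <- y[3, , drop = FALSE]   # drop = FALSE keeps a 1 x 4 matrix

is.matrix(row3)    # FALSE
is.matrix(row3_m)  # TRUE
dim(row3_m)        # 1 4
```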

Lists: Initialization, Length and Indexing

Lists are R objects that can contain elements of different types, such as numbers, strings, vectors and even other lists. A list can also contain a matrix or a function as an element. Lists are used heavily in data-oriented work in R; a data frame, for example, is a list of equal-length columns. A list is created with the list() function.
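To make the description of lists concrete, here is a small sketch with a list that stores a function alongside ordinary values, plus a check that a data frame is itself a list. The names `my_list` and `df` are hypothetical examples.

```r
# A list can mix element types, including a function
my_list <- list(id = 7, label = "seven", square = function(x) x^2)
my_list$square(my_list$id)   # 49

# A data frame is itself a list whose elements are equal-length columns
df <- data.frame(a = 1:3, b = c("x", "y", "z"))
is.list(df)   # TRUE
length(df)    # 2: one element per column
```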

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
                  list("green",12.3))

list_data

## [[1]]
## [1] "Jan" "Feb" "Mar"
## 
## [[2]]
##      [,1] [,2] [,3]
## [1,]    3    5   -2
## [2,]    9    1    8
## 
## [[3]]
## [[3]][[1]]
## [1] "green"
## 
## [[3]][[2]]
## [1] 12.3

# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
names(list_data)

## [1] "1st Quarter"  "A_Matrix"     "A Inner list"

# Accessing the element in the List
# Elements of a List can be accessed by the index of the element in the List

list_data[1] # Accessing the first element in the list

## $`1st Quarter`
## [1] "Jan" "Feb" "Mar"

list_data[[1]][1] # Access the first element within the first element

## [1] "Jan"

list_data[[1]][2]

## [1] "Feb"

list_data[[1]][3]

## [1] "Mar"

# Difference between [] and [[]] referencing
list_data[2] # Gives a list

## $A_Matrix
##      [,1] [,2] [,3]
## [1,]    3    5   -2
## [2,]    9    1    8

list_data[[2]] # Gives a matrix

##      [,1] [,2] [,3]
## [1,]    3    5   -2
## [2,]    9    1    8

list_data[3] # Gives a list

## $`A Inner list`
## $`A Inner list`[[1]]
## [1] "green"
## 
## $`A Inner list`[[2]]
## [1] 12.3

list_data[[3]][1] # Gives a list holding the first element of the inner list

## [[1]]
## [1] "green"

list_data[[3]][[1]] # Gives a vector

## [1] "green"

list_data[[3]][[2]] # Gives a vector

## [1] 12.3
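Once the elements have names, they can also be extracted by name with `$` or `[[ ]]`. A sketch that rebuilds `list_data` as above; note that names containing spaces need backticks after `$`.

```r
list_data <- list(c("Jan", "Feb", "Mar"),
                  matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

list_data$A_Matrix[2, 3]        # 8: $ extracts like [[ ]], using the name
list_data[["1st Quarter"]][2]   # "Feb": [[ ]] also accepts a name
list_data$`A Inner list`[[2]]   # 12.3: names with spaces need backticks after $
```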

# Manipulating the Elements in a List
# We can add, delete and update list elements as shown below.
# Here an element is added at the end of the list and then removed;
# any existing element can also be updated in place.

# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])

## [[1]]
## [1] "New element"

# Remove the last element.
list_data[4] <- NULL

# Print the 4th element: it no longer exists, so the result is NULL
print(list_data[4])

## $<NA>
## NULL

# Update the 3rd Element.
list_data[3] <- "updated element"
print(list_data[3])

## $`A Inner list`
## [1] "updated element"
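Adding and deleting elements is not restricted to the end of a list, though. A minimal sketch using base R's append() with its after argument, and NULL assignment to delete an element in the middle; the list `days` is just an example.

```r
days <- list("Mon", "Wed", "Thu")

# Insert "Tue" after position 1
days <- append(days, list("Tue"), after = 1)
unlist(days)   # "Mon" "Tue" "Wed" "Thu"

# Delete the element at position 3 ("Wed") by assigning NULL
days[3] <- NULL
unlist(days)   # "Mon" "Tue" "Thu"
```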

# Merging of the List
# Create two lists.
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")

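# Merge the two lists.
# c() merges them into a single flat list of 6 elements
merged.list <- c(list1,list2)
# list() instead nests them, giving a list of two lists; this line
# overwrites the result above, so the nested version is printed below
merged.list <- list(list1,list2)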

# Print the merged list.
print(merged.list)

## [[1]]
## [[1]][[1]]
## [1] 1
## 
## [[1]][[2]]
## [1] 2
## 
## [[1]][[3]]
## [1] 3
## 
## 
## [[2]]
## [[2]][[1]]
## [1] "Sun"
## 
## [[2]][[2]]
## [1] "Mon"
## 
## [[2]][[3]]
## [1] "Tue"
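The difference between merging with c() and nesting with list() shows up clearly in the lengths of the results. A short sketch (the names `flat` and `nested` are illustrative):

```r
list1 <- list(1, 2, 3)
list2 <- list("Sun", "Mon", "Tue")

flat   <- c(list1, list2)      # one list with 6 elements
nested <- list(list1, list2)   # a list holding 2 lists

length(flat)       # 6
length(nested)     # 2
flat[[4]]          # "Sun"
nested[[2]][[1]]   # "Sun"
```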

# Converting a list into a vector using the unlist() function
# Create a list.
list1 <- list(1:5)
print(list1) # Prints the list

## [[1]]
## [1] 1 2 3 4 5

list1[[1]] # Outputs the vector

## [1] 1 2 3 4 5

# Convert the list to a vector.
v1 <- unlist(list1)
print(v1)

## [1] 1 2 3 4 5
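unlist() also flattens lists with several pieces. When the pieces have different types, the result is coerced to one common type, and element names are kept. A sketch with illustrative names:

```r
# Numeric pieces are concatenated into one vector
nums <- unlist(list(1:2, 3:5))
nums   # 1 2 3 4 5

# Mixed types are coerced to a common type, and element names are kept
mixed <- unlist(list(a = 1, b = "two", c = TRUE))
mixed  # named character vector: a = "1", b = "two", c = "TRUE"
```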

Data Frame: Initialization, Dimensions and Indexing

A data frame is similar to a table with rows and columns. Each column holds one variable, and each row contains one value from each column.

The key features of a data frame are:
1. The column names should be non-empty.
2. The row names should be unique.
3. The data stored in a data frame can be of numeric, factor, character, logical or Date type.
4. Each column should contain the same number of data items.
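A quick way to see the equal-length rule in practice: data.frame() recycles a value only when it fits the row count a whole number of times (a length-1 value always does), and otherwise stops with an error. A sketch; `ok` and `bad` are illustrative names.

```r
# A length-1 value is recycled to fill every row
ok <- data.frame(id = 1:3, team = "A", stringsAsFactors = FALSE)
nrow(ok)   # 3

# Columns of length 3 and 2 cannot be combined: data.frame() raises an error
bad <- tryCatch(data.frame(id = 1:3, score = c(10, 20)),
                error = function(e) e)
inherits(bad, "error")   # TRUE
```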

# Creating a Data Frame

emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
  salary = c(623.3,515.2,611.0,729.0,843.25), 
  
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                         "2015-03-27")),
  stringsAsFactors = FALSE
)

# Print the data frame.
print(emp.data)

##   emp_id emp_name salary start_date
## 1      1     Rick 623.30 2012-01-01
## 2      2      Dan 515.20 2013-09-23
## 3      3 Michelle 611.00 2014-11-15
## 4      4     Ryan 729.00 2014-05-11
## 5      5     Gary 843.25 2015-03-27

class(emp.data)

## [1] "data.frame"

# Getting the number of rows and columns: the dimensions of a data frame
dim(emp.data)

## [1] 5 4

# Getting the names of the columns
colnames(emp.data)

## [1] "emp_id"     "emp_name"   "salary"     "start_date"

# Getting the first few records (the first 6 by default)
head(emp.data)

##   emp_id emp_name salary start_date
## 1      1     Rick 623.30 2012-01-01
## 2      2      Dan 515.20 2013-09-23
## 3      3 Michelle 611.00 2014-11-15
## 4      4     Ryan 729.00 2014-05-11
## 5      5     Gary 843.25 2015-03-27

head(emp.data,3)

##   emp_id emp_name salary start_date
## 1      1     Rick  623.3 2012-01-01
## 2      2      Dan  515.2 2013-09-23
## 3      3 Michelle  611.0 2014-11-15

# Getting the last few records (the last 6 by default)
tail(emp.data)

##   emp_id emp_name salary start_date
## 1      1     Rick 623.30 2012-01-01
## 2      2      Dan 515.20 2013-09-23
## 3      3 Michelle 611.00 2014-11-15
## 4      4     Ryan 729.00 2014-05-11
## 5      5     Gary 843.25 2015-03-27

tail(emp.data,3)

##   emp_id emp_name salary start_date
## 3      3 Michelle 611.00 2014-11-15
## 4      4     Ryan 729.00 2014-05-11
## 5      5     Gary 843.25 2015-03-27

# Get the Structure of the Data frame
# str shows the type and the first few values of each column
str(emp.data)

## 'data.frame':    5 obs. of  4 variables:
##  $ emp_id    : int  1 2 3 4 5
##  $ emp_name  : chr  "Rick" "Dan" "Michelle" "Ryan" ...
##  $ salary    : num  623 515 611 729 843
##  $ start_date: Date, format: "2012-01-01" "2013-09-23" ...

class(dim(emp.data))

## [1] "integer"

k<-dim(emp.data)
k[1] # Get the number of rows

## [1] 5

k[2] # Get the number of columns

## [1] 4

# A statistical summary of each column can be obtained with summary()
summary(emp.data)

##      emp_id    emp_name             salary        start_date        
##  Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01  
##  1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23  
##  Median :3   Mode  :character   Median :623.3   Median :2014-05-11  
##  Mean   :3                      Mean   :664.4   Mean   :2014-01-14  
##  3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15  
##  Max.   :5                      Max.   :843.2   Max.   :2015-03-27

# Extract specific column from the data frame

# Extracting the First column
result<-emp.data[,1]
result # data type is vector

## [1] 1 2 3 4 5
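A single column can also be pulled out by name with `$` or `[[ ]]`, and drop = FALSE keeps it as a one-column data frame instead of a vector. A minimal sketch on a small stand-in data frame (`emp` is a hypothetical name, not the `emp.data` above):

```r
emp <- data.frame(emp_id = 1:3, salary = c(623.3, 515.2, 611.0))

emp$salary               # the column as a numeric vector
emp[["salary"]]          # the same vector, selected by name
emp[, 2, drop = FALSE]   # drop = FALSE keeps a one-column data frame
mean(emp$salary)         # vectors feed directly into functions like mean()
```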

# Extracting the First two columns
result<-emp.data[,c(1,2)]
result

##   emp_id emp_name
## 1      1     Rick
## 2      2      Dan
## 3      3 Michelle
## 4      4     Ryan
## 5      5     Gary

result <- emp.data[,c("emp_id","emp_name")]
result

##   emp_id emp_name
## 1      1     Rick
## 2      2      Dan
## 3      3 Michelle
## 4      4     Ryan
## 5      5     Gary

# Getting the first row data
result<-emp.data[1,]
result

##   emp_id emp_name salary start_date
## 1      1     Rick  623.3 2012-01-01

class(result) # A single row is still a data frame, not a vector

## [1] "data.frame"

# Getting the first two rows data
result<-emp.data[c(1,2),]
result

##   emp_id emp_name salary start_date
## 1      1     Rick  623.3 2012-01-01
## 2      2      Dan  515.2 2013-09-23

# Getting the first row data for column 1
result<-emp.data[1,1]
result

## [1] 1

# Getting the first row data for columns 1 and 2
result<-emp.data[1,c(1,2)]
result

##   emp_id emp_name
## 1      1     Rick

# Getting the first and second rows for columns 1 and 2
result<-emp.data[c(1,2),c(1,2)]
result

##   emp_id emp_name
## 1      1     Rick
## 2      2      Dan
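Rows can also be selected by a condition rather than by position, using a logical index or base R's subset(). A sketch on a trimmed copy of the employee data (`emp` and `high` are illustrative names):

```r
emp <- data.frame(
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary   = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

high <- emp[emp$salary > 700, ]   # logical row index, all columns
high$emp_name                     # "Ryan" "Gary"

# subset() expresses the same filter and can also pick columns
subset(emp, salary > 700, select = emp_name)
```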

# Expanding the data frame by adding a column
emp.data$dept<-c("IT","Operations","IT","HR","Finance")
colnames(emp.data)

## [1] "emp_id"     "emp_name"   "salary"     "start_date" "dept"

head(emp.data)

##   emp_id emp_name salary start_date       dept
## 1      1     Rick 623.30 2012-01-01         IT
## 2      2      Dan 515.20 2013-09-23 Operations
## 3      3 Michelle 611.00 2014-11-15         IT
## 4      4     Ryan 729.00 2014-05-11         HR
## 5      5     Gary 843.25 2015-03-27    Finance

# Adding a row using rbind function
# Create the second data frame
emp.newdata <-  data.frame(
  emp_id = 6:8,
  emp_name = c("Rasmi","Pranab","Tusar"),
  salary = c(578.0,722.5,632.8), 
  start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
  dept = c("IT","Operations","Finance"),
  stringsAsFactors = FALSE
)

emp.newdata

##   emp_id emp_name salary start_date       dept
## 1      6    Rasmi  578.0 2013-05-21         IT
## 2      7   Pranab  722.5 2013-07-30 Operations
## 3      8    Tusar  632.8 2014-06-17    Finance

dim(emp.newdata)

## [1] 3 5

colnames(emp.newdata)

## [1] "emp_id"     "emp_name"   "salary"     "start_date" "dept"

# Bind the two data frames row-wise (rbind() dispatches to rbind.data.frame()).
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)

##   emp_id emp_name salary start_date       dept
## 1      1     Rick 623.30 2012-01-01         IT
## 2      2      Dan 515.20 2013-09-23 Operations
## 3      3 Michelle 611.00 2014-11-15         IT
## 4      4     Ryan 729.00 2014-05-11         HR
## 5      5     Gary 843.25 2015-03-27    Finance
## 6      6    Rasmi 578.00 2013-05-21         IT
## 7      7   Pranab 722.50 2013-07-30 Operations
## 8      8    Tusar 632.80 2014-06-17    Finance
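rbind() stacks data frames only when their column names match, which is why the new rows were built with exactly the same columns. A small sketch with hypothetical data frames `a`, `b` and `other`:

```r
a <- data.frame(id = 1:2, dept = c("IT", "HR"), stringsAsFactors = FALSE)
b <- data.frame(id = 3L, dept = "Finance", stringsAsFactors = FALSE)
ab <- rbind(a, b)
nrow(ab)   # 3

# A data frame whose column names differ cannot be stacked this way
other <- data.frame(id = 4L, team = "Ops", stringsAsFactors = FALSE)
res <- tryCatch(rbind(a, other), error = function(e) e)
inherits(res, "error")   # TRUE
```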

Saturday, November 30, 2019

Book Review: Direct from Dell by Michael Dell

Direct from Dell: Strategies that Revolutionized an Industry by Michael Dell

The book traces Dell's journey from the founder's perspective. It captures the business challenges the company faced from its inception to its heyday. The author stresses the various business paradigms that the company broke on its way to becoming the most sought-after PC brand in the world. In short, the book covers the following:

1. Dell's Direct Model: At a time when most of its competitors relied on indirect channels to penetrate the market, Dell went directly to market without partnering with any reseller, retailer or distributor.

2. Customer-Centric Approach: Dell has always been proactive in sensing the pulse of its customers. Regular feedback and door-to-door service have been key to understanding demand, and the company has tried to create products that gain mass acceptance. The book recounts how Dell had to shelve its highly ambitious 'Olympic' project simply because there was no demand for it.

3. Inventory: Right from the start, Dell focused on reducing inventory, to the point of holding only 5-6 days of inventory on hand. This reduced costs and freed up cash for expansion. Since the business model is direct, no inventory is stocked up in the channel, so Dell can price its products better and pass some of the benefit on to customers. When the technology changes, it is better prepared and can go to market faster.

4. Age of the Internet: When the internet arrived back in the 80s, very few people understood the leverage it could provide. Dell not only seized the opportunity to use the internet as a value add to its direct sales model, but also used it to improve relationships with suppliers and customers. As a result, the value chain from suppliers to Dell to end customers became highly integrated.

All of the above is covered at great length in the book. Having worked in the PC industry myself, I can relate to what Dell espouses. Overall, it is a very good read for anyone trying to understand sales models, supplier relationships, inventory basics and, above all, how to engage with customers fruitfully.


Book Review: RESET by Subramanian Swamy

RESET: Regaining India’s Economic Legacy by Subramanian Swamy
My rating: 4 of 5 stars

Introduction: Few books attempt to discuss Indian economics and link it to the decisions made during the pre- and post-independence eras. Subramanian Swamy doesn't shy away from recounting numerous key historical incidents that had a long-lasting impact on the Indian psyche and on economic decision making. The following, I thought, are some of the key highlights of the book:

1. The comparison of pre- and post-independence India with China on per capita income, acreage, crop yields, irrigation, literacy, etc., using data to highlight similarities and differences, is remarkable. Few have ever tried to juxtapose the two Asian giants the way Swamy does.

2. Dismantling the so-called Nehruvian economic model (inspired by the Soviet model of rapid industrialization, proposed by P.C. Mahalanobis and originally created by Feldman). Most of the problems India has today are a direct consequence of the Five-Year Plans of a command economy. The state's belief in squeezing agriculture to produce capital goods spelled doom for the economy. The only good that came out of planning was an agricultural surplus (the so-called Green Revolution, the plan for which was proposed by the late Prime Minister Lal Bahadur Shastri).

3. Considering 1980-1990 the worst period for India, as it created a precarious situation: a balance of payments crisis, foreign reserves depleted (only a few weeks were left before day 0), a tarnished international image as an uncompetitive economy, and so on. Given that both Indira and Rajiv Gandhi had majorities in Parliament, this speaks volumes about how flawed the Soviet planning model was.

4. The post-1991 period, in which liberalisation opened up avenues of growth for India. Structural policy changes such as dismantling the License Raj, easing state control and public-private partnerships, to name a few, accelerated growth and put India back on track. The main aim was to strengthen agriculture (which absorbs most of the workforce) and to eliminate poverty altogether. For this to happen, it was proposed that the Indian economy grow at around 10% a year for the decade 1990-2000.

5. The reforms, though revolutionary, made political parties wary because they ran against popular sentiment. From the end of the 90s until about the early 2000s, each government from Rao's to Atalji's made conscious efforts to dilute the reforms, and all the momentum gathered during Rao's term eventually fizzled out.

6. Since independence, India has relied on agriculture to drive growth and absorb its surplus workforce. Yet with each successive Plan, the resources allocated to agriculture were reduced despite its being the top contributor to GDP. The resources were diverted instead to the public sector, which generated very little profit. The only thing that kept the Indian economy on its feet during the 80s and 90s was the service sector, second only to agriculture in its contribution to GDP.

7. Modi Years: Lessons Learnt and Future Vision: The current economic situation is challenging on two fronts: weak private consumption and a plunging investment rate. This started in 2014 but worsened during the demonetization and GST phases, with the MSME and agriculture sectors hit hardest. A lack of economic know-how among cabinet ministers means a lot of swadeshi rhetoric without concrete action. The goals, and the means to achieve a 5 trillion dollar economy, were conspicuous by their absence from the Finance Minister's recent Budget speech. A turnaround can still be achieved by giving enough incentives to increase household savings (raising fixed deposit rates) and easing liquidity for MSMEs. It is also very important to sustain a GDP growth rate of 10% for the next decade if India is to eliminate poverty and unemployment and become a developed country.
