Wednesday, December 4, 2019

Blog 1 - Basics of R: Vectors, Matrices, Lists and Data Frames

Different Data Structures in R



Vectors: Initialization, Length and Indexing

Vectors are the most basic entities in R. They are the building blocks for storing information. A vector can hold numbers, text, logical values and so on, but all elements of a single vector must be of the same type. Let's look at how to create vectors and what the different types are.

# Let us define vectors p,q and r
p <- c(1,2,5.3,6,-2,4) # numeric vector
q <- c("one","two","three") # character vector
r <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) #logical vector
# c() combines the values listed inside the parentheses into a single vector

# Let's check what is in vector 'p'
p
## [1]  1.0  2.0  5.3  6.0 -2.0  4.0
class(p) # class gives the nature of values stored in 'p' vector
## [1] "numeric"
# Let's check what is in vector 'q'
q
## [1] "one"   "two"   "three"
class(q)
## [1] "character"
# Let's check what is in vector 'r'
r
## [1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE
class(r)
## [1] "logical"
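Since a vector stores values of a single type, combining different types with c() silently converts (coerces) everything to the most flexible type present. A small sketch; the names `mixed` and `num_lgl` are only illustrative:

```r
# c() coerces all values to one common type,
# roughly following logical < integer < numeric < character
mixed <- c(1, "two", TRUE)
class(mixed)    # "character": the number and the logical become strings
mixed           # "1" "two" "TRUE"

num_lgl <- c(1.5, TRUE, FALSE)
class(num_lgl)  # "numeric": TRUE becomes 1 and FALSE becomes 0
num_lgl         # 1.5 1.0 0.0
```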
# Refer to elements of a vector using subscripts. 
# Accessing the third element of p
p[3]
## [1] 5.3
# Accessing the third and fourth elements
p[c(3,4)]
## [1] 5.3 6.0
# Let us calculate the length of the vector p
length(p)
## [1] 6
# Accessing the 7th element of p returns NA, as there
# is no element at the 7th position in p
p[7]
## [1] NA
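Besides positive positions, R also accepts negative positions (to drop elements) and logical vectors (to keep only the elements where a condition is TRUE). A short sketch reusing the same `p`:

```r
p <- c(1, 2, 5.3, 6, -2, 4)

p[-1]          # drop the first element: 2.0 5.3 6.0 -2.0 4.0
p[-c(1, 2)]    # drop the first two elements
p[p > 0]       # keep only the positive values: 1.0 2.0 5.3 6.0 4.0
which(p > 0)   # positions of the positive values: 1 2 3 4 6
```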



Matrices: Initialization, Dimensions and Indexing

All data elements in a matrix must be of the SAME TYPE, and every column has the same length (a matrix is a rectangular grid of rows and columns).

Syntax for creating a matrix
mymatrix <- matrix(vector, nrow=r, ncol=c, byrow=FALSE, dimnames=list(char_vector_rownames, char_vector_colnames))
byrow=TRUE indicates that the matrix should be filled by rows.
byrow=FALSE indicates that the matrix should be filled by columns (the default)
dimnames provides optional labels for the columns and rows.
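To see what `byrow` changes, here is a small sketch that fills the same six values both ways (`v`, `by_col` and `by_row` are illustrative names):

```r
v <- 1:6
by_col <- matrix(v, nrow = 2, ncol = 3)                # default byrow = FALSE: fill column by column
by_row <- matrix(v, nrow = 2, ncol = 3, byrow = TRUE)  # fill row by row

by_col[1, ]   # 1 3 5
by_row[1, ]   # 1 2 3
dim(by_col)   # 2 3 (rows, columns)
```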

# Generate a 5 x 4 matrix from the integers 1 to 20
1:20 # Creates an integer sequence from 1 through 20 with a step of 1
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
y<-matrix(1:20, nrow=5,ncol=4)
y
##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20
# Accessing the element in row 1, column 2
y[1,2]
## [1] 6
y[,] # Outputs the entire matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20
# another example
cells <- c(1,26,24,68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2") 
mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
                   dimnames=list(rnames, cnames)) 
mymatrix
##    C1 C2
## R1  1 26
## R2 24 68
# mymatrix[2,3]
# This raises the error "subscript out of bounds",
# because a 2 x 2 matrix has no third column
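Rows and columns that carry `dimnames` can also be indexed by name, and an out-of-bounds error can be captured with `tryCatch()` instead of stopping a script. A sketch rebuilding the same `mymatrix`:

```r
cells <- c(1, 26, 24, 68)
mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE,
                   dimnames = list(c("R1", "R2"), c("C1", "C2")))

mymatrix["R2", "C1"]   # index by row and column name: 24
nrow(mymatrix)         # 2
ncol(mymatrix)         # 2

# Asking for column 3 of a 2 x 2 matrix fails; capture the error message instead
msg <- tryCatch(mymatrix[2, 3], error = function(e) conditionMessage(e))
msg                    # "subscript out of bounds"
```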

# Identify rows, columns or elements using subscripts. 
y[,4] # 4th column of matrix
## [1] 16 17 18 19 20
y[3,] # 3rd row of matrix 
## [1]  3  8 13 18
y[2:4,1:3] # rows 2,3,4 of columns 1,2,3 
##      [,1] [,2] [,3]
## [1,]    2    7   12
## [2,]    3    8   13
## [3,]    4    9   14
class(y[3,]) # A single row drops to a plain integer vector
## [1] "integer"
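When a single row or column should stay a matrix rather than collapse into a vector, pass `drop = FALSE`. A sketch on the same `y`:

```r
y <- matrix(1:20, nrow = 5, ncol = 4)

row3_vec <- y[3, ]                # default drop = TRUE: a plain integer vector
row3_mat <- y[3, , drop = FALSE]  # stays a 1 x 4 matrix

is.matrix(row3_vec)  # FALSE
is.matrix(row3_mat)  # TRUE
dim(row3_mat)        # 1 4
```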



Lists: Initialization, Length and Indexing

Lists are R objects that can contain elements of different data types, such as numbers, strings, vectors and even another list. A list can also contain a matrix or a function as its elements. Lists show up in almost every DATA HEAVY task in R; a data frame, for example, is itself a list of columns. A list is created using the list() function.

# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
                  list("green",12.3))

list_data
## [[1]]
## [1] "Jan" "Feb" "Mar"
## 
## [[2]]
##      [,1] [,2] [,3]
## [1,]    3    5   -2
## [2,]    9    1    8
## 
## [[3]]
## [[3]][[1]]
## [1] "green"
## 
## [[3]][[2]]
## [1] 12.3
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
names(list_data)
## [1] "1st Quarter"  "A_Matrix"     "A Inner list"
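Once the elements have names, they can be fetched by name instead of by position. `$` works for syntactic names such as `A_Matrix`; names that contain spaces or start with a digit need `[["..."]]` or backquotes. A sketch on the same `list_data`:

```r
list_data <- list(c("Jan", "Feb", "Mar"), matrix(c(3, 9, 5, 1, -2, 8), nrow = 2),
                  list("green", 12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")

list_data$A_Matrix[2, 3]        # syntactic name with $: 8
list_data[["1st Quarter"]][2]   # non-syntactic name with [[ ]]: "Feb"
list_data$`A Inner list`[[2]]   # backquotes also work with $: 12.3
```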
# Accessing the element in the List
# Elements of a List can be accessed by the index of the element in the List

list_data[1] # Accessing the first element in the list
## $`1st Quarter`
## [1] "Jan" "Feb" "Mar"
list_data[[1]][1] # Accessing the first element within the first element
## [1] "Jan"
list_data[[1]][2]
## [1] "Feb"
list_data[[1]][3]
## [1] "Mar"
# Difference between [] and [[]] referencing
list_data[2] # Gives a list
## $A_Matrix
##      [,1] [,2] [,3]
## [1,]    3    5   -2
## [2,]    9    1    8
list_data[[2]] # Gives a matrix
##      [,1] [,2] [,3]
## [1,]    3    5   -2
## [2,]    9    1    8
list_data[3] # Gives a list
## $`A Inner list`
## $`A Inner list`[[1]]
## [1] "green"
## 
## $`A Inner list`[[2]]
## [1] 12.3
list_data[[3]][1] # Gives a list holding the first element of the inner list
## [[1]]
## [1] "green"
list_data[[3]][[1]] # Gives a vector
## [1] "green"
list_data[[3]][[2]] # Gives a vector
## [1] 12.3
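The difference between `[]` and `[[]]` matters as soon as you compute with the result: `[]` always returns a (shorter) list, while `[[]]` returns the element itself. A sketch with a small illustrative list `x`:

```r
x <- list(a = 1:3, b = "text")

sub  <- x["a"]    # single brackets: a list of length 1, name kept
elem <- x[["a"]]  # double brackets: the integer vector 1:3 itself

is.list(sub)      # TRUE
is.list(elem)     # FALSE
sum(elem)         # 6; sum(sub) would raise an error because sub is a list
```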
# Manipulating the Elements in a List
# List elements can be added, deleted and updated.
# Below we add an element at the end of the list, delete it again by assigning NULL
# (which works at any position, not only the end), and update an existing element.

# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])
## [[1]]
## [1] "New element"
# Remove the last element.
list_data[4] <- NULL

# Print the 4th element: it no longer exists, so NULL is returned (with an NA name)
print(list_data[4])
## $<NA>
## NULL
# Update the 3rd Element.
list_data[3] <- "updated element"
print(list_data[3])
## $`A Inner list`
## [1] "updated element"
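Elements are not limited to the end of a list: assigning `NULL` with `[[ ]]` deletes an element at any position, and base R's `append()` inserts values after a chosen position. A sketch with an illustrative `days` list:

```r
days <- list("Sun", "Mon", "Tue")

days[[2]] <- NULL                              # delete the middle element
unlist(days)                                   # "Sun" "Tue"

days <- append(days, list("Mon"), after = 1)   # insert it back at position 2
unlist(days)                                   # "Sun" "Mon" "Tue"
```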
# Merging of the List
# Create two lists.
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")

# c() merges the two lists into one flat list of 6 elements
merged.flat <- c(list1,list2)
# list() instead nests them: a list holding the two lists
merged.list <- list(list1,list2)

# Print the nested list.
print(merged.list)
## [[1]]
## [[1]][[1]]
## [1] 1
## 
## [[1]][[2]]
## [1] 2
## 
## [[1]][[3]]
## [1] 3
## 
## 
## [[2]]
## [[2]][[1]]
## [1] "Sun"
## 
## [[2]][[2]]
## [1] "Mon"
## 
## [[2]][[3]]
## [1] "Tue"
# Converting a list into a vector using the unlist() function
# Create a list.
list1 <- list(1:5)
print(list1) # Prints the list
## [[1]]
## [1] 1 2 3 4 5
list1[[1]] # Outputs the vector
## [1] 1 2 3 4 5
# Convert the list to a vector.
v1 <- unlist(list1)
print(v1)
## [1] 1 2 3 4 5
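`unlist()` follows the same single-type rule as vectors, and it keeps element names by default. A sketch with an illustrative named list `info`:

```r
info <- list(id = 7, name = "Rick", active = TRUE)

v <- unlist(info)
v          # named character vector: id = "7", name = "Rick", active = "TRUE"
class(v)   # "character": the number and the logical were coerced

# use.names = FALSE returns a bare vector without names
unlist(list(a = 1, b = 2:3), use.names = FALSE)   # 1 2 3
```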



Data Frame: Initialization, Dimensions and Indexing

A data frame is similar to a table with rows and columns. Each row contains one value from each column.

The Key Features of a data frame are:
1. The column names should be non-empty.
2. The row names should be unique.
3. The data stored in a data frame can be of numeric, factor, character, logical or Date type, with each column holding a single type.
4. Each column should contain same number of data items.
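Under the hood a data frame is a list of equal-length column vectors, which is why the last rule is enforced when the data frame is created. A small sketch (`df` is an illustrative name):

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

is.list(df)   # TRUE: each column is one element of the list
length(df)    # 2 (the number of columns)
nrow(df)      # 3

# Columns of incompatible lengths are rejected
# (a shorter vector is only recycled if its length divides the number of rows)
res <- tryCatch(data.frame(x = 1:3, y = c("a", "b")),
                error = function(e) "error")
res           # "error"
```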

# Creating a Data Frame

emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
  salary = c(623.3,515.2,611.0,729.0,843.25), 
  
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                         "2015-03-27")),
  stringsAsFactors = FALSE
)

# Print the data frame.         
print(emp.data) 
##   emp_id emp_name salary start_date
## 1      1     Rick 623.30 2012-01-01
## 2      2      Dan 515.20 2013-09-23
## 3      3 Michelle 611.00 2014-11-15
## 4      4     Ryan 729.00 2014-05-11
## 5      5     Gary 843.25 2015-03-27
class(emp.data)
## [1] "data.frame"
# Getting the number of rows and columns: dimensions of a data frame
dim(emp.data)
## [1] 5 4
# Getting the names of the columns
colnames(emp.data)
## [1] "emp_id"     "emp_name"   "salary"     "start_date"
# Getting the first few records (head() shows 6 rows by default, so all 5 rows appear)
head(emp.data)
##   emp_id emp_name salary start_date
## 1      1     Rick 623.30 2012-01-01
## 2      2      Dan 515.20 2013-09-23
## 3      3 Michelle 611.00 2014-11-15
## 4      4     Ryan 729.00 2014-05-11
## 5      5     Gary 843.25 2015-03-27
head(emp.data,3)
##   emp_id emp_name salary start_date
## 1      1     Rick  623.3 2012-01-01
## 2      2      Dan  515.2 2013-09-23
## 3      3 Michelle  611.0 2014-11-15
# Getting the last few records (tail() also defaults to 6 rows)
tail(emp.data)
##   emp_id emp_name salary start_date
## 1      1     Rick 623.30 2012-01-01
## 2      2      Dan 515.20 2013-09-23
## 3      3 Michelle 611.00 2014-11-15
## 4      4     Ryan 729.00 2014-05-11
## 5      5     Gary 843.25 2015-03-27
tail(emp.data,3)
##   emp_id emp_name salary start_date
## 3      3 Michelle 611.00 2014-11-15
## 4      4     Ryan 729.00 2014-05-11
## 5      5     Gary 843.25 2015-03-27
# Get the Structure of the Data frame
# str is used to get the data types and first few values of the columns used
str(emp.data)
## 'data.frame':    5 obs. of  4 variables:
##  $ emp_id    : int  1 2 3 4 5
##  $ emp_name  : chr  "Rick" "Dan" "Michelle" "Ryan" ...
##  $ salary    : num  623 515 611 729 843
##  $ start_date: Date, format: "2012-01-01" "2013-09-23" ...
class(dim(emp.data))
## [1] "integer"
k<-dim(emp.data)
k[1] # Get the number of rows
## [1] 5
k[2] # Get the number of columns
## [1] 4
# Statistical Summary can be obtained using summary function
summary(emp.data)
##      emp_id    emp_name             salary        start_date        
##  Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01  
##  1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23  
##  Median :3   Mode  :character   Median :623.3   Median :2014-05-11  
##  Mean   :3                      Mean   :664.4   Mean   :2014-01-14  
##  3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15  
##  Max.   :5                      Max.   :843.2   Max.   :2015-03-27
# Extract specific columns from the data frame

# Extracting the First column
result<-emp.data[,1]
result # data type is vector
## [1] 1 2 3 4 5
# Extracting the First two columns
result<-emp.data[,c(1,2)]
result
##   emp_id emp_name
## 1      1     Rick
## 2      2      Dan
## 3      3 Michelle
## 4      4     Ryan
## 5      5     Gary
result <- emp.data[,c("emp_id","emp_name")]
result
##   emp_id emp_name
## 1      1     Rick
## 2      2      Dan
## 3      3 Michelle
## 4      4     Ryan
## 5      5     Gary
# Getting the first row data
result<-emp.data[1,]
result
##   emp_id emp_name salary start_date
## 1      1     Rick  623.3 2012-01-01
class(result) # A single row is still a data frame (a single column, above, is a vector)
## [1] "data.frame"
# Getting the first two rows data
result<-emp.data[c(1,2),]
result
##   emp_id emp_name salary start_date
## 1      1     Rick  623.3 2012-01-01
## 2      2      Dan  515.2 2013-09-23
# Getting the first row data for column 1
result<-emp.data[1,1]
result
## [1] 1
# Getting the first row data for column 1 and column2
result<-emp.data[1,c(1,2)]
result
##   emp_id emp_name
## 1      1     Rick
# Getting the first and second row data for column 1 and column2
result<-emp.data[c(1,2),c(1,2)]
result
##   emp_id emp_name
## 1      1     Rick
## 2      2      Dan
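Rows can also be selected with a logical condition, and `subset()` does the same with a more readable syntax. A sketch on a trimmed copy of `emp.data` (the 620 threshold is arbitrary):

```r
emp.data <- data.frame(
  emp_id   = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary   = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)

high <- emp.data[emp.data$salary > 620, ]   # rows whose salary exceeds 620
high$emp_name                               # "Rick" "Ryan" "Gary"

# subset() evaluates the condition inside the data frame; select picks columns
s <- subset(emp.data, salary > 620, select = c(emp_id, emp_name))

# drop = FALSE keeps a single column as a data frame instead of a vector
is.data.frame(emp.data[, 1, drop = FALSE])  # TRUE
```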
# Expanding the data frame by adding a column
emp.data$dept<-c("IT","Operations","IT","HR","Finance")
colnames(emp.data)
## [1] "emp_id"     "emp_name"   "salary"     "start_date" "dept"
head(emp.data)
##   emp_id emp_name salary start_date       dept
## 1      1     Rick 623.30 2012-01-01         IT
## 2      2      Dan 515.20 2013-09-23 Operations
## 3      3 Michelle 611.00 2014-11-15         IT
## 4      4     Ryan 729.00 2014-05-11         HR
## 5      5     Gary 843.25 2015-03-27    Finance
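New columns can also be computed from existing ones, but any vector you assign must fit the number of rows. A sketch; the `high_paid` column and its 700 threshold are hypothetical:

```r
emp.data <- data.frame(
  emp_id = 1:5,
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25)
)

# Derived logical column: TRUE where the salary is above 700
emp.data$high_paid <- emp.data$salary > 700
emp.data$high_paid    # FALSE FALSE FALSE TRUE TRUE

# A 2-element vector cannot fill 5 rows, so the assignment fails
res <- tryCatch({
  emp.data$bad <- c("x", "y")
  "ok"
}, error = function(e) "error")
res                   # "error"
ncol(emp.data)        # still 3: the failed column was not added
```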
# Adding a row using rbind function
# Create the second data frame
emp.newdata <-  data.frame(
  emp_id = c (6:8), 
  emp_name = c("Rasmi","Pranab","Tusar"),
  salary = c(578.0,722.5,632.8), 
  start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
  dept = c("IT","Operations","Finance"),
  stringsAsFactors = FALSE
)

emp.newdata
##   emp_id emp_name salary start_date       dept
## 1      6    Rasmi  578.0 2013-05-21         IT
## 2      7   Pranab  722.5 2013-07-30 Operations
## 3      8    Tusar  632.8 2014-06-17    Finance
dim(emp.newdata)
## [1] 3 5
colnames(emp.newdata)
## [1] "emp_id"     "emp_name"   "salary"     "start_date" "dept"
# Bind the two data frames (rbind() dispatches to the data frame method).
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)
##   emp_id emp_name salary start_date       dept
## 1      1     Rick 623.30 2012-01-01         IT
## 2      2      Dan 515.20 2013-09-23 Operations
## 3      3 Michelle 611.00 2014-11-15         IT
## 4      4     Ryan 729.00 2014-05-11         HR
## 5      5     Gary 843.25 2015-03-27    Finance
## 6      6    Rasmi 578.00 2013-05-21         IT
## 7      7   Pranab 722.50 2013-07-30 Operations
## 8      8    Tusar 632.80 2014-06-17    Finance
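`rbind()` on data frames matches columns by name, not by position, so both data frames need the same column names. A sketch with two small illustrative data frames `a` and `b`:

```r
a <- data.frame(id = 1:2, name = c("Rick", "Dan"), stringsAsFactors = FALSE)
b <- data.frame(name = "Ryan", id = 3L, stringsAsFactors = FALSE)  # columns in a different order

both <- rbind(a, b)   # columns are matched by name
both$id               # 1 2 3
both$name             # "Rick" "Dan" "Ryan"

# A column name that does not match makes rbind() fail
bad <- data.frame(id = 4L, full_name = "Gary", stringsAsFactors = FALSE)
res <- tryCatch(rbind(a, bad), error = function(e) "error")
res                   # "error"
```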



Saturday, November 30, 2019

Book Review: Direct from Dell by Michael Dell


Direct from Dell: Strategies that Revolutionized an Industry by Michael Dell


The book highlights the journey of Dell from the founder's perspective. It captures the business challenges the company faced from its inception to its heyday. The writer stresses the various business paradigms the company was able to break on its way to becoming the most sought-after PC brand in the world. In short, the book covers the following:

1. Dell's Direct Model: In an age when most of its competitors relied on indirect channels for market penetration, Dell used a direct route to market without partnering with any reseller, retailer or distributor.

2. Customer-Centric Approach: Dell has always been very proactive in sensing the pulse of its customers. Regular feedback and door-to-door service have been key to understanding demand. It has tried to create products that gain mass acceptance. The book describes the highly ambitious 'Olympic' project that Dell had to shelve simply because there was no demand for it.

3. Inventory: Right from the start, Dell has focused on reducing inventory, to the point of keeping only 5-6 days of inventory on hand. This has reduced costs and freed up cash for expansion. Since the business model is direct, Dell has no inventory stocked up in the channel, so it can price its products better and pass some of the benefit on to customers. When technology changes, it is better prepared and can go to market faster.

4. Age of the Internet: When the internet arrived, very few people understood the leverage it could provide. Dell not only seized the opportunity to use the internet as a value-add to its direct sales model, but also used it to improve relationships with suppliers and customers. As a result, the value chain from suppliers to Dell to end customers became highly integrated.

The above points are covered at great length in the book. Having worked in the PC industry myself, I can relate to the ideas espoused by Dell. Overall, a very good read for people trying to understand sales models, supplier relationships, inventory basics and, above all, how to engage with customers in a fruitful manner.


Web Scraping Tutorial 4 - Getting the busy information data from the Popular Times page on Google

In this blog we will try to scrape the ...