Different Data Structures in R
Parag Verma
4th Dec, 2019
Vectors: Initialization, Length and Indexing
Vectors are the most basic entities in R. They are the building blocks for storing information, and they can hold values of various types: numbers, text, logical values and so on. Let's look at how to create vectors and what the different types are.
# Let us define vectors p,q and r
p <- c(1,2,5.3,6,-2,4) # numeric vector
q <- c("one","two","three") # character vector
r <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) #logical vector
# The 'c' before the parentheses combines the values listed inside them into a vector
# Let's check what is there in vector 'p'
p
## [1] 1.0 2.0 5.3 6.0 -2.0 4.0
class(p) # class gives the nature of values stored in 'p' vector
## [1] "numeric"
# Let's check what is there in vector 'q'
q
## [1] "one" "two" "three"
class(q)
## [1] "character"
# Let's check what is there in vector 'r'
r
## [1] TRUE TRUE TRUE FALSE TRUE FALSE
class(r)
## [1] "logical"
# Refer to elements of a vector using subscripts.
# Accessing the Third element within p
p[3]
## [1] 5.3
# Accessing the third and fourth elements
p[c(3,4)]
## [1] 5.3 6.0
# Let us calculate the length of the vector p
length(p)
## [1] 6
# On accessing the 7th element within p we get an NA as there
# is no element at the 7th position in p
p[7]
## [1] NA
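Subscripts do not have to be positive positions. As a minimal sketch (the vector `scores` is a hypothetical example, not part of the tutorial's data), negative subscripts drop elements and logical conditions filter them. The last lines also show what happens when types are mixed inside c().

```r
# Hypothetical vector used only for this illustration
scores <- c(10, 25, 7, 42, 18)

# A negative subscript drops that position instead of selecting it
without_first <- scores[-1]           # 25 7 42 18

# A logical condition keeps only the elements where it is TRUE
large_scores <- scores[scores > 15]   # 25 42 18

# A vector holds one type only, so mixed values are coerced
# to the most general type (here: character)
mixed <- c(1, "two", TRUE)
class(mixed)                          # "character"
```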
Matrices: Initialization, Dimension and Indexing
All data elements in a matrix must be of the same type, and all columns must have the same length.
Syntax for creating a matrix
mymatrix <- matrix(vector, nrow=r, ncol=c, byrow=FALSE, dimnames=list(char_vector_rownames, char_vector_colnames))
byrow=TRUE indicates that the matrix should be filled by rows.
byrow=FALSE indicates that the matrix should be filled by columns (the default)
dimnames provides optional labels for the rows and columns, in that order.
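The effect of byrow is easiest to see by filling the same values both ways. This is a small sketch; the names `values`, `by_col` and `by_row` are chosen here only for illustration.

```r
values <- 1:6

# Default (byrow = FALSE): values run down each column in turn
by_col <- matrix(values, nrow = 2, ncol = 3)

# byrow = TRUE: values run across each row in turn
by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)

by_col[1, ]   # first row when filled by columns: 1 3 5
by_row[1, ]   # first row when filled by rows:    1 2 3
dim(by_col)   # 2 3 (rows, then columns)
```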
# generates 5 x 4 numeric matrix
1:20 # Creates a vector sequence from 1 through 20 with a step of 1
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
y<-matrix(1:20, nrow=5,ncol=4)
y
## [,1] [,2] [,3] [,4]
## [1,] 1 6 11 16
## [2,] 2 7 12 17
## [3,] 3 8 13 18
## [4,] 4 9 14 19
## [5,] 5 10 15 20
# Accessing element in a matrix
y[1,2]
## [1] 6
y[,] # Outputs the entire matrix
## [,1] [,2] [,3] [,4]
## [1,] 1 6 11 16
## [2,] 2 7 12 17
## [3,] 3 8 13 18
## [4,] 4 9 14 19
## [5,] 5 10 15 20
# another example
cells <- c(1,26,24,68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2")
mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
dimnames=list(rnames, cnames))
mymatrix
## C1 C2
## R1 1 26
## R2 24 68
# mymatrix[2,3]
# This gives a "subscript out of bounds" error,
# because the matrix has only 2 columns and no element exists at this position
# Identify rows, columns or elements using subscripts.
y[,4] # 4th column of matrix
## [1] 16 17 18 19 20
y[3,] # 3rd row of matrix
## [1] 3 8 13 18
y[2:4,1:3] # rows 2,3,4 of columns 1,2,3
## [,1] [,2] [,3]
## [1,] 2 7 12
## [2,] 3 8 13
## [3,] 4 9 14
class(y[3,]) # It results in a vector
## [1] "integer"
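Selecting a single row or column drops the matrix dimensions, which is why the result above is a plain vector. If the matrix shape is needed, base R's `drop = FALSE` argument keeps it. A minimal sketch, using a hypothetical matrix `m` built the same way as `y`:

```r
m <- matrix(1:20, nrow = 5, ncol = 4)

row_as_vector <- m[3, ]                 # dimensions are dropped
row_as_matrix <- m[3, , drop = FALSE]   # stays a 1 x 4 matrix

is.matrix(row_as_vector)   # FALSE
is.matrix(row_as_matrix)   # TRUE
dim(row_as_matrix)         # 1 4
```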
Lists: Initialization, Length and Indexing
Lists are R objects that can hold elements of different data types, such as numbers, strings, vectors and even other lists. A list can also contain a matrix or a function as an element. It is important to note that most data-heavy steps in R involve the use of lists. A list is created using the list() function.
# Create a list containing a vector, a matrix and a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
list("green",12.3))
list_data
## [[1]]
## [1] "Jan" "Feb" "Mar"
##
## [[2]]
## [,1] [,2] [,3]
## [1,] 3 5 -2
## [2,] 9 1 8
##
## [[3]]
## [[3]][[1]]
## [1] "green"
##
## [[3]][[2]]
## [1] 12.3
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
names(list_data)
## [1] "1st Quarter" "A_Matrix" "A Inner list"
# Accessing the element in the List
# Elements of a List can be accessed by the index of the element in the List
list_data[1] # Accessing the first element in the list
## $`1st Quarter`
## [1] "Jan" "Feb" "Mar"
list_data[[1]][1] # Accessing the first element within the first element
## [1] "Jan"
list_data[[1]][2]
## [1] "Feb"
list_data[[1]][3]
## [1] "Mar"
# Difference between [] and [[]] referencing
list_data[2] # Gives a list
## $A_Matrix
## [,1] [,2] [,3]
## [1,] 3 5 -2
## [2,] 9 1 8
list_data[[2]] # Gives a matrix
## [,1] [,2] [,3]
## [1,] 3 5 -2
## [2,] 9 1 8
list_data[3] # Gives a list
## $`A Inner list`
## $`A Inner list`[[1]]
## [1] "green"
##
## $`A Inner list`[[2]]
## [1] 12.3
list_data[[3]][1] # Gives a one-element list holding the first item of the inner list
## [[1]]
## [1] "green"
list_data[[3]][[1]] # Gives a vector
## [1] "green"
list_data[[3]][[2]] # Gives a vector
## [1] 12.3
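The same distinction applies when elements are accessed by name instead of position. A brief sketch, assuming a small hypothetical named list `info`:

```r
# Hypothetical named list used only for this illustration
info <- list(months = c("Jan", "Feb"), count = 2)

sub_list     <- info["months"]     # single brackets: still a list (of length 1)
element      <- info[["months"]]   # double brackets: the character vector itself
same_element <- info$months        # $ is shorthand for [[ ]] with a name

class(sub_list)                    # "list"
class(element)                     # "character"
identical(element, same_element)   # TRUE
```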
# Manipulating the Elements in a List
# We can add, delete and update list elements as shown below.
# The examples add and delete at the end of the list,
# but elements at any position can be added, deleted or updated.
# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])
## [[1]]
## [1] "New element"
# Remove the last element.
list_data[4] <- NULL
# Print the 4th Element.
print(list_data[4])
## $<NA>
## NULL
# Update the 3rd Element.
list_data[3] <- "updated element"
print(list_data[3])
## $`A Inner list`
## [1] "updated element"
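Elements can also be inserted or removed away from the end of a list. The following is a minimal sketch using base R's append() with its `after` argument; the list `colours` is a hypothetical example.

```r
colours <- list("red", "green", "blue")

# append() with 'after' inserts the new element after the given position
colours <- append(colours, list("yellow"), after = 1)
# colours now holds: "red", "yellow", "green", "blue"

# Assigning NULL removes an element from any position
colours[[3]] <- NULL
# colours now holds: "red", "yellow", "blue"

length(colours)   # 3
```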
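# Merging of Lists
# Create two lists.
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")
# c() merges the two lists into one flat list of six elements.
merged.list <- c(list1,list2)
# list() nests them instead: the result has two elements, each itself a list.
# This second assignment overwrites the first, so the nested version is printed.
merged.list <- list(list1,list2)
# Print the merged list.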
print(merged.list)
## [[1]]
## [[1]][[1]]
## [1] 1
##
## [[1]][[2]]
## [1] 2
##
## [[1]][[3]]
## [1] 3
##
##
## [[2]]
## [[2]][[1]]
## [1] "Sun"
##
## [[2]][[2]]
## [1] "Mon"
##
## [[2]][[3]]
## [1] "Tue"
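The difference between the two ways of combining lists shows up clearly in their lengths. A short sketch with hypothetical names `first`, `second`, `flat` and `nested`:

```r
first  <- list(1, 2, 3)
second <- list("Sun", "Mon", "Tue")

flat   <- c(first, second)      # one list with six elements
nested <- list(first, second)   # a list holding the two original lists

length(flat)      # 6
length(nested)    # 2
flat[[4]]         # "Sun"
nested[[2]][[1]]  # "Sun"
```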
# Converting a List into a vector using the unlist function
# Create a list.
list1 <- list(1:5)
print(list1) # Prints the list
## [[1]]
## [1] 1 2 3 4 5
list1[[1]] # Outputs the vector
## [1] 1 2 3 4 5
# Convert the lists to vectors.
v1 <- unlist(list1)
print(v1)
## [1] 1 2 3 4 5
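unlist() also flattens lists nested inside other lists, and it builds element names from the nesting path. A small sketch on a hypothetical nested list:

```r
# Hypothetical nested, named list used only for this illustration
nested <- list(a = 1, b = list(c = 2, d = 3))

flat <- unlist(nested)

length(flat)   # 3
names(flat)    # "a" "b.c" "b.d"

# use.names = FALSE returns the values without names
plain <- unlist(nested, use.names = FALSE)
```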
Data Frame: Initialization, Dimensions and Indexing
A data frame is similar to a table with rows and columns. Each row contains one set of values, one from each column.
The Key Features of a data frame are:
1. The column names should be non-empty.
2. The row names should be unique.
3. The data stored in a data frame can be of numeric, factor or character type, among others (such as the Date column in the example below).
4. Each column should contain the same number of data items.
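These features follow from how a data frame is stored: it is a list of equal-length columns. The sketch below uses a hypothetical data frame `staff` to show this. It also shows data.frame() rejecting columns whose lengths cannot be matched.

```r
# Hypothetical data frame used only for this illustration
staff <- data.frame(
  id   = 1:3,
  name = c("Ann", "Bob", "Cid"),
  stringsAsFactors = FALSE
)

is.list(staff)   # TRUE: each column is one element of a list
length(staff)    # 2, the number of columns
nrow(staff)      # 3, the number of rows

# Three ids and two names cannot be lined up, so data.frame() raises an error;
# tryCatch() turns that error into the string "error" for inspection
outcome <- tryCatch(
  data.frame(id = 1:3, name = c("Ann", "Bob")),
  error = function(e) "error"
)
outcome          # "error"
```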
# Creating a Data Frame
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
  salary = c(623.3,515.2,611.0,729.0,843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                         "2015-03-27")),
  stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)
## emp_id emp_name salary start_date
## 1 1 Rick 623.30 2012-01-01
## 2 2 Dan 515.20 2013-09-23
## 3 3 Michelle 611.00 2014-11-15
## 4 4 Ryan 729.00 2014-05-11
## 5 5 Gary 843.25 2015-03-27
class(emp.data)
## [1] "data.frame"
# Getting the number of rows and columns:Dimensions of a Data Frame
dim(emp.data)
## [1] 5 4
# Getting the names of the columns
colnames(emp.data)
## [1] "emp_id" "emp_name" "salary" "start_date"
# Getting the first few records (head() returns up to 6 rows by default)
head(emp.data)
## emp_id emp_name salary start_date
## 1 1 Rick 623.30 2012-01-01
## 2 2 Dan 515.20 2013-09-23
## 3 3 Michelle 611.00 2014-11-15
## 4 4 Ryan 729.00 2014-05-11
## 5 5 Gary 843.25 2015-03-27
head(emp.data,3)
## emp_id emp_name salary start_date
## 1 1 Rick 623.3 2012-01-01
## 2 2 Dan 515.2 2013-09-23
## 3 3 Michelle 611.0 2014-11-15
# Getting the last few records (tail() returns up to 6 rows by default, so all 5 rows appear)
tail(emp.data)
## emp_id emp_name salary start_date
## 1 1 Rick 623.30 2012-01-01
## 2 2 Dan 515.20 2013-09-23
## 3 3 Michelle 611.00 2014-11-15
## 4 4 Ryan 729.00 2014-05-11
## 5 5 Gary 843.25 2015-03-27
tail(emp.data,3)
## emp_id emp_name salary start_date
## 3 3 Michelle 611.00 2014-11-15
## 4 4 Ryan 729.00 2014-05-11
## 5 5 Gary 843.25 2015-03-27
# Get the Structure of the Data frame
# str is used to get the data types and first few values of the columns used
str(emp.data)
## 'data.frame': 5 obs. of 4 variables:
## $ emp_id : int 1 2 3 4 5
## $ emp_name : chr "Rick" "Dan" "Michelle" "Ryan" ...
## $ salary : num 623 515 611 729 843
## $ start_date: Date, format: "2012-01-01" "2013-09-23" ...
class(dim(emp.data))
## [1] "integer"
k<-dim(emp.data)
k[1] # Get the number of rows
## [1] 5
k[2] # Get the number of columns
## [1] 4
# Statistical Summary can be obtained using summary function
summary(emp.data)
## emp_id emp_name salary start_date
## Min. :1 Length:5 Min. :515.2 Min. :2012-01-01
## 1st Qu.:2 Class :character 1st Qu.:611.0 1st Qu.:2013-09-23
## Median :3 Mode :character Median :623.3 Median :2014-05-11
## Mean :3 Mean :664.4 Mean :2014-01-14
## 3rd Qu.:4 3rd Qu.:729.0 3rd Qu.:2014-11-15
## Max. :5 Max. :843.2 Max. :2015-03-27
# Extract specific column from the data frame
# Extracting the First column
result<-emp.data[,1]
result # data type is vector
## [1] 1 2 3 4 5
# Extracting the First two columns
result<-emp.data[,c(1,2)]
result
## emp_id emp_name
## 1 1 Rick
## 2 2 Dan
## 3 3 Michelle
## 4 4 Ryan
## 5 5 Gary
result <- emp.data[,c("emp_id","emp_name")]
result
## emp_id emp_name
## 1 1 Rick
## 2 2 Dan
## 3 3 Michelle
## 4 4 Ryan
## 5 5 Gary
# Getting the first row data
result<-emp.data[1,]
result
## emp_id emp_name salary start_date
## 1 1 Rick 623.3 2012-01-01
class(result) # Results in a data frame, not a vector, since a row can mix column types
## [1] "data.frame"
# Getting the first two rows data
result<-emp.data[c(1,2),]
result
## emp_id emp_name salary start_date
## 1 1 Rick 623.3 2012-01-01
## 2 2 Dan 515.2 2013-09-23
# Getting the first row data for column 1
result<-emp.data[1,1]
result
## [1] 1
# Getting the first row data for column 1 and column 2
result<-emp.data[1,c(1,2)]
result
## emp_id emp_name
## 1 1 Rick
# Getting the first and second row data for column 1 and column 2
result<-emp.data[c(1,2),c(1,2)]
result
## emp_id emp_name
## 1 1 Rick
## 2 2 Dan
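Whether a subset comes back as a vector or as a data frame depends on what is selected. The sketch below summarises the cases on a hypothetical data frame `staff`. It also shows `drop = FALSE`, which keeps a single column as a data frame.

```r
# Hypothetical data frame used only for this illustration
staff <- data.frame(
  id   = 1:3,
  name = c("Ann", "Bob", "Cid"),
  stringsAsFactors = FALSE
)

row_class    <- class(staff[1, ])                  # "data.frame"
column_class <- class(staff[, 1])                  # "integer": one column drops to a vector
kept_class   <- class(staff[, 1, drop = FALSE])    # "data.frame"

# A column can also be pulled out by name with $ or [[ ]]
same_column <- identical(staff$name, staff[["name"]])   # TRUE
```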
# Expanding the Data Frame by adding the columns
emp.data$dept<-c("IT","Operations","IT","HR","Finance")
colnames(emp.data)
## [1] "emp_id" "emp_name" "salary" "start_date" "dept"
head(emp.data)
## emp_id emp_name salary start_date dept
## 1 1 Rick 623.30 2012-01-01 IT
## 2 2 Dan 515.20 2013-09-23 Operations
## 3 3 Michelle 611.00 2014-11-15 IT
## 4 4 Ryan 729.00 2014-05-11 HR
## 5 5 Gary 843.25 2015-03-27 Finance
# Adding rows using the rbind function
# Create the second data frame
emp.newdata <- data.frame(
emp_id = c (6:8),
emp_name = c("Rasmi","Pranab","Tusar"),
salary = c(578.0,722.5,632.8),
start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
dept = c("IT","Operations","Fianance"),
stringsAsFactors = FALSE
)
emp.newdata
## emp_id emp_name salary start_date dept
## 1 6 Rasmi 578.0 2013-05-21 IT
## 2 7 Pranab 722.5 2013-07-30 Operations
## 3 8 Tusar 632.8 2014-06-17 Fianance
dim(emp.newdata)
## [1] 3 5
colnames(emp.newdata)
## [1] "emp_id" "emp_name" "salary" "start_date" "dept"
# Bind the two data frames.
emp.finaldata <- rbind.data.frame(emp.data,emp.newdata)
print(emp.finaldata)
## emp_id emp_name salary start_date dept
## 1 1 Rick 623.30 2012-01-01 IT
## 2 2 Dan 515.20 2013-09-23 Operations
## 3 3 Michelle 611.00 2014-11-15 IT
## 4 4 Ryan 729.00 2014-05-11 HR
## 5 5 Gary 843.25 2015-03-27 Finance
## 6 6 Rasmi 578.00 2013-05-21 IT
## 7 7 Pranab 722.50 2013-07-30 Operations
## 8 8 Tusar 632.80 2014-06-17 Fianance
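Binding rows works here because both data frames have the same column names. A minimal sketch of that requirement follows; the data frames `old_rows`, `new_rows` and `bad_rows` are hypothetical.

```r
old_rows <- data.frame(id = 1:2, dept = c("IT", "HR"), stringsAsFactors = FALSE)

# Same column names in a different order: rbind() matches them by name
new_rows <- data.frame(dept = "Finance", id = 3L, stringsAsFactors = FALSE)
combined <- rbind(old_rows, new_rows)

nrow(combined)    # 3
combined$dept     # "IT" "HR" "Finance"

# Different column names: rbind() raises an error,
# which tryCatch() turns into the string "error"
bad_rows <- data.frame(id = 4L, team = "Ops", stringsAsFactors = FALSE)
outcome  <- tryCatch(rbind(old_rows, bad_rows), error = function(e) "error")
outcome           # "error"
```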