Indexing in R
Parag Verma
27th Dec, 2019
Once we store data in an object such as a vector, list or data frame, it becomes important to understand how to traverse the object and extract values. Most of the time the extraction depends on whether a particular condition is true or false. In this respect we will focus on a function called ‘which’, which is widely used to retrieve values from an object based on a given condition. It is very important to note that INDEXING in R STARTS FROM 1.
Accessing values within a vector
Let's create a vector and try accessing its values. We can retrieve values in a vector by supplying an index inside the single square bracket “[]” operator.
s = c("aa", "bb", "cc", "dd", "ee")
s[3]
[1] "cc"
The square bracket operation returns a vector, here a character vector of length one. The type can be confirmed using class(s[3]).
class(s[3])
[1] "character"
Extracting more than one value from a vector
s[c(1,3)]
[1] "aa" "cc"
We can see that the values at index positions 1 and 3 have been extracted.
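The same idea extends to ranges, reordering and repetition. The short sketch below uses the vector s from above; the names middle and shuffled are only illustrative.

```r
s <- c("aa", "bb", "cc", "dd", "ee")

# A range of consecutive positions
middle <- s[2:4]
print(middle)      # "bb" "cc" "dd"

# Positions can be given in any order and can be repeated
shuffled <- s[c(3, 1, 1)]
print(shuffled)    # "cc" "aa" "aa"
```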
Using negative indices within square brackets
People who have worked in Python know that negative indexing is used there as much as positive indexing, counting positions from the end. In R it has a different meaning: a negative index excludes the element at that position from the result.
s[-3]
[1] "aa" "bb" "dd" "ee"
We can see that ‘cc’ was present in the third position. The result of s[-3] no longer contains ‘cc’; s itself is unchanged unless the result is assigned back.
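A small sketch of two points that are easy to miss with negative indices: the original vector is not modified, and positive and negative indices cannot be mixed in one call. The variable names without_third and mixed are illustrative, and the error is caught with tryCatch so the script keeps running.

```r
s <- c("aa", "bb", "cc", "dd", "ee")

# Negative indexing returns a new vector; s still has all 5 elements
without_third <- s[-3]
print(without_third)   # "aa" "bb" "dd" "ee"
print(length(s))       # 5

# Mixing positive and negative indices raises an error in R;
# tryCatch turns that error into a plain message here
mixed <- tryCatch(s[c(-1, 2)], error = function(e) "error: cannot mix signs")
print(mixed)
```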
Out of Range Index
What happens if we try to access an element at an index that is not present? Let's find out.
s[10]
[1] NA
We get an NA rather than an error.
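Because an out-of-range index returns NA silently, it can be worth checking the index against length(s) first. The sketch below also contrasts single brackets with double brackets, which raise an error instead; the names out_single, out_double and in_range are illustrative.

```r
s <- c("aa", "bb", "cc", "dd", "ee")

# Single brackets return NA for positions beyond the end of the vector
out_single <- s[c(1, 10)]
print(out_single)   # "aa" NA

# Double brackets raise an error for the same position;
# tryCatch converts the error into a message
out_double <- tryCatch(s[[10]], error = function(e) "subscript out of bounds")
print(out_double)

# A simple guard: compare the index with the length of the vector
i <- 10
in_range <- i >= 1 && i <= length(s)
print(in_range)     # FALSE
```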
Let's remove the first 3 elements from s
s[-1:-3]
[1] "dd" "ee"
Removing the last element from s
s[-length(s)]
[1] "aa" "bb" "cc" "dd"
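The same results can also be obtained with head and tail, which accept a negative count. The sketch below shows these alternatives next to the bracket forms, plus one pitfall: -1:3 is not the same as -(1:3). Variable names are illustrative.

```r
s <- c("aa", "bb", "cc", "dd", "ee")

# tail with a negative n drops that many elements from the front
drop_first3 <- tail(s, -3)
print(drop_first3)   # "dd" "ee"

# head with a negative n drops that many elements from the end
drop_last <- head(s, -1)
print(drop_last)     # "aa" "bb" "cc" "dd"

# s[-1:-3] and s[-(1:3)] are equivalent
same <- identical(s[-1:-3], s[-(1:3)])
print(same)          # TRUE

# Pitfall: -1:3 expands to -1, 0, 1, 2, 3, which mixes signs and fails
pitfall <- tryCatch(s[-1:3], error = function(e) "error")
print(pitfall)
```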
Indexing with a data frame
In a data frame, we can select rows, columns or both. So essentially we will be looking at ways to extract a subset of rows and/or a subset of columns. Let's declare a data frame.
dep.data <- data.frame(
X.Dept_name = c("Production","Finance","HR","Quality Control","Marketting","Sales"),
X.Head_count=c(100,20,5,10,40,70),
X.Avg_salary = c(623.3,515.2,611.0,729.0,843.25,790.50) ,
X.Incentive_given=c("Yes","No","No","Yes","Yes","Yes"),
stringsAsFactors = FALSE
)
dep.data
      X.Dept_name X.Head_count X.Avg_salary X.Incentive_given
1      Production          100       623.30               Yes
2         Finance           20       515.20                No
3              HR            5       611.00                No
4 Quality Control           10       729.00               Yes
5      Marketting           40       843.25               Yes
6           Sales           70       790.50               Yes
Get the element at row 1, column 3
Here we supply a row number as the row index and a column number as the column index.
dep.data[1,3]
[1] 623.3
This can also be done by supplying a row number as the row index and a column NAME as the column index.
dep.data[1,"X.Avg_salary"]
[1] 623.3
Get rows 1 and 2, and only column 2
Here we supply a range of row numbers as the row index and a column number as the column index.
dep.data[1:2,2]
[1] 100 20
dep.data[c(1:2),2]
[1] 100 20
This can also be done by supplying row numbers as the row index and a column NAME as the column index.
dep.data[1:2,"X.Head_count"]
[1] 100 20
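Note that selecting a single column this way returns a plain vector, not a data frame. If a one-column data frame is needed, the drop = FALSE argument of the bracket operator keeps the data frame structure. The sketch below uses a smaller, hypothetical data frame called dep.small so that it runs on its own.

```r
# dep.small is a cut-down, illustrative version of dep.data
dep.small <- data.frame(
  X.Dept_name = c("Production", "Finance", "HR"),
  X.Head_count = c(100, 20, 5),
  stringsAsFactors = FALSE
)

# A single column comes back as a vector by default
as_vector <- dep.small[1:2, "X.Head_count"]
print(class(as_vector))   # "numeric"

# drop = FALSE keeps the result as a data frame with one column
as_frame <- dep.small[1:2, "X.Head_count", drop = FALSE]
print(class(as_frame))    # "data.frame"
print(dim(as_frame))      # 2 1
```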
Get rows 1 and 2, and columns 1 and 2
Here we supply an integer vector as the row index and a character vector of column names as the column index.
dep.data[1:2,c("X.Dept_name","X.Head_count")]
  X.Dept_name X.Head_count
1  Production          100
2     Finance           20
Indexing with Boolean Vector
Boolean (logical) vectors are also widely used to extract values from an object in R.
v <- c(1,4,4,3,2,2,3)
v > 2
[1] FALSE TRUE TRUE TRUE FALSE FALSE TRUE
This gives a logical vector that is TRUE where v > 2 and FALSE otherwise.
Supplying Boolean Vector at index position
Apart from integer values, we can also supply a Boolean vector as the index. The elements at the TRUE positions are returned.
v[c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)]
[1] 1 4
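In practice the Boolean vector is rarely typed by hand; it usually comes from a condition such as v > 2. The sketch below combines the two steps and applies the same pattern to the rows of a data frame. dep.small is a hypothetical cut-down data frame used only to keep the example self-contained.

```r
v <- c(1, 4, 4, 3, 2, 2, 3)

# Use the condition directly as the index
big <- v[v > 2]
print(big)            # 4 4 3 3

# A shorter logical vector is recycled: TRUE, FALSE, TRUE, FALSE, ...
every_other <- v[c(TRUE, FALSE)]
print(every_other)    # 1 4 2 3

# The same pattern filters rows of a data frame
dep.small <- data.frame(
  X.Dept_name = c("Production", "Finance", "Sales"),
  X.Head_count = c(100, 20, 70),
  stringsAsFactors = FALSE
)
large_depts <- dep.small[dep.small$X.Head_count > 30, ]
print(large_depts$X.Dept_name)   # "Production" "Sales"
```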
Using ‘$’ sign in a data frame for extracting single column
dep.data$X.Dept_name
[1] "Production" "Finance" "HR" "Quality Control"
[5] "Marketting" "Sales"
Using ‘[[]]’ sign in a data frame for extracting single column
This is the technique that I prefer while doing data manipulation. Both ‘$’ and ‘[[]]’ yield a vector, but ‘[[]]’ also accepts a column name stored in a variable, which makes it very convenient when working with dplyr functions.
dep.data[["X.Dept_name"]]
[1] "Production" "Finance" "HR" "Quality Control"
[5] "Marketting" "Sales"
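One practical difference between the two operators is how they treat a column name held in a variable. The sketch below, using a hypothetical data frame dep.small and a variable named col, shows that ‘[[]]’ looks up the value of the variable while ‘$’ looks for a column literally called col.

```r
dep.small <- data.frame(
  X.Dept_name = c("Production", "Finance", "HR"),
  X.Head_count = c(100, 20, 5),
  stringsAsFactors = FALSE
)

# Column name stored in a variable
col <- "X.Dept_name"

# [[ ]] uses the value of col
by_brackets <- dep.small[[col]]
print(by_brackets)   # "Production" "Finance" "HR"

# $ looks for a column named "col", which does not exist here
by_dollar <- dep.small$col
print(by_dollar)     # NULL
```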
‘which’ function in R
The which function returns the positions of the elements in a vector that fulfil a particular condition. It can be simply read as ‘give me the index positions of the elements that fulfil a certain condition’.
x <- c(1,5,8,4,6)
# Positions of elements having a value greater than 3
which(x>3)
[1] 2 3 4 5
# Values of the elements where this condition is true
x[which(x>3)]
[1] 5 8 4 6
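For a vector without missing values, x[which(x > 3)] and x[x > 3] give the same result. They differ when NA is present: which ignores positions where the condition is NA, whereas a logical index keeps them as NA. The sketch below uses a small illustrative vector y to show this, along with which.max, a related base R function.

```r
# y is an illustrative vector containing a missing value
y <- c(1, NA, 8, 4)

# Logical indexing keeps an NA where the condition is NA
with_logical <- y[y > 3]
print(with_logical)   # NA 8 4

# which() returns only positions where the condition is TRUE
with_which <- y[which(y > 3)]
print(with_which)     # 8 4

# which.max() gives the position of the largest value, ignoring NA
pos_max <- which.max(y)
print(pos_max)        # 3
```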
Practical Use Case
Let's say there is a data frame with 13 columns. We want to create another data frame that has all the columns from the original except two. In such a case, it would be very laborious to write out the names of all 11 remaining columns. Instead, we can use the ‘which’ function. Let us look at the datasets that ship with the dplyr package.
# Load dplyr; install it first if it is not available
if(!require("dplyr")){
  install.packages("dplyr")
  library(dplyr)
}
data(package = "dplyr")
df<-starwars
colnames(df)
[1] "name" "height" "mass" "hair_color" "skin_color"
[6] "eye_color" "birth_year" "gender" "homeworld" "species"
[11] "films" "vehicles" "starships"
Let's say we want to remove the species and homeworld columns and store the rest of the data in another data frame, df.interim.
pos<-which(!colnames(df) %in% c("species","homeworld"))
colnames(df)[pos]
[1] "name" "height" "mass" "hair_color" "skin_color"
[6] "eye_color" "birth_year" "gender" "films" "vehicles"
[11] "starships"
We will now create a data frame using ‘pos’
df.interim<-df[,colnames(df)[pos]]
colnames(df.interim)
[1] "name" "height" "mass" "hair_color" "skin_color"
[6] "eye_color" "birth_year" "gender" "films" "vehicles"
[11] "starships"
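There are a few other base R ways to arrive at the same set of columns, sketched below on a small hypothetical data frame df.demo so the example runs without dplyr. It also shows a pitfall of the alternative form df[, -which(...)]: when nothing matches, which returns an empty vector and every column is dropped.

```r
# df.demo is a small illustrative stand-in for a wider data frame
df.demo <- data.frame(
  name = c("a", "b"),
  height = c(1, 2),
  species = c("x", "y"),
  homeworld = c("p", "q"),
  stringsAsFactors = FALSE
)
drop_cols <- c("species", "homeworld")

# Option 1: index the columns with the logical vector directly
keep_logical <- df.demo[, !colnames(df.demo) %in% drop_cols]
print(colnames(keep_logical))   # "name" "height"

# Option 2: compute the names to keep with setdiff()
keep_setdiff <- df.demo[, setdiff(colnames(df.demo), drop_cols)]
print(colnames(keep_setdiff))   # "name" "height"

# Pitfall: negating an empty which() result selects zero columns
no_match <- which(colnames(df.demo) %in% c("not_a_column"))
emptied <- df.demo[, -no_match]
print(ncol(emptied))            # 0
```

Selecting the positions to keep, as done with ‘pos’ above, avoids this pitfall because an empty match list simply keeps every column.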
Final Comments
In this blog we have seen how indexing works for various objects in R. The which function is mostly used with data frames, where a large number of intermediate data frames are created to calculate the final results.
Link to Previous R Blogs
Blog 1 - Vectors, Matrices, Lists and Data Frame in R https://mlmadeeasy.blogspot.com/2019/12/2datatypesr.html
Blog 2 - Operators in R https://mlmadeeasy.blogspot.com/2019/12/blog-2-operators-in-r.html
Blog 3 - Loops in R https://mlmadeeasy.blogspot.com/2019/12/blog-3-loops-in-r.html
List of Datasets for Practise https://hofmann.public.iastate.edu/data_in_r_sortable.html