Split text using sapply
Parag Verma
02 April, 2022
Introduction
There are situations where we need to extract a specific portion of a text field or column. We can do this by combining dplyr's mutate, sapply, and stringr's str_split. In this blog we use a line of therapy (LOT) example, where a single cell holds a compressed record of therapy progression and we want to extract the second line of therapy.
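To make the idea concrete before the full walkthrough, here is a minimal sketch of the splitting step on a single string. It uses only base R (strsplit and trimws) rather than the stringr call used later in the post, and the variable names lot and parts are illustrative assumptions.

```r
# Illustrative input: three lines of therapy separated by "|"
lot <- "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3"

# strsplit() returns a list with one element per input string, so [[1]]
# picks the pieces for our single string. fixed = TRUE treats "|" as a
# literal character rather than regex alternation.
parts <- strsplit(lot, "|", fixed = TRUE)[[1]]

# Remove the spaces left around each piece
parts <- trimws(parts)

# The second element is the second line of therapy
parts[2]
```

The same pattern (split, pick an element, trim) is what the sapply call in Step 2 applies to every row.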
# Packages used in this post
package.name <- c("dplyr", "stringr")

# Install any package that is missing, then attach it
for (i in package.name) {
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
  }
  library(i, character.only = TRUE)
}
Step 1: Creating dummy data frame
df <- data.frame(Patient = 1:2,
                 Line_of_Therapy = c("Drug 1 | Drug 2",
                                     "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3"))
df
  Patient                            Line_of_Therapy
1       1                            Drug 1 | Drug 2
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3
Step 2: Extracting the second Line of therapy
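df2 <- df %>%
  mutate(LOT2 = sapply(Line_of_Therapy, function(x) {
    # "[|]" matches a literal pipe; [[1]] takes the pieces for this string, [2] the second piece
    y <- str_split(x, "[|]")[[1]][2]
    # Strip the spaces left around the piece
    z <- trimws(y)
    return(z)
  }, USE.NAMES = FALSE))  # drop the names sapply would otherwise attach to the result
df2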
  Patient                            Line_of_Therapy            LOT2
1       1                            Drug 1 | Drug 2          Drug 2
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3 Drug 4 + Drug 5
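The same extraction can be generalized to any line of therapy. The sketch below is one possible way to do that in base R, not the method used in this post: extract_lot is a hypothetical helper name, and it assumes that lines are always separated by "|". Indexing past the end of the split pieces gives NA, so patients with fewer lines of therapy get a missing value instead of an error.

```r
# Hypothetical helper (not part of the original post): return the n-th
# pipe-separated line of therapy, or NA when there are fewer than n lines.
extract_lot <- function(x, n) {
  # One character vector of pieces per input string
  pieces <- strsplit(as.character(x), "|", fixed = TRUE)
  # p[n] is NA when n exceeds the number of pieces; trimws() keeps NA as NA
  vapply(pieces, function(p) trimws(p[n]), character(1))
}

# Same dummy data as in Step 1
df <- data.frame(
  Patient = 1:2,
  Line_of_Therapy = c("Drug 1 | Drug 2",
                      "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3")
)

df$LOT2 <- extract_lot(df$Line_of_Therapy, 2)
df$LOT3 <- extract_lot(df$Line_of_Therapy, 3)  # patient 1 has no third line
df
```

Here vapply is used instead of sapply because it checks that every result is a single character value, which makes the return type predictable.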
Parting Comments
In this blog we looked at a simple example of how we can use dplyr, sapply, and str_split to extract the second line of therapy from a compressed text column.