Showing posts with label text extraction. Show all posts

Friday, April 1, 2022

Extract part of the string in R using sapply and dplyr

Split text using sapply


Introduction

Sometimes we need to extract one portion of a text field or column. We can do this with a combination of dplyr, sapply and stringr's str_split function. For this post we will use a LOT (line of therapy) example: each cell holds a patient's therapy progression compressed into one string, with lines separated by "|", and we want to pull out the second line of therapy.
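To see the idea on a single value before bringing in dplyr, here is a minimal base R sketch. The string `x` is made up for illustration, and `strsplit` is used in place of stringr's `str_split` only so the snippet has no dependencies. Because `|` means "or" in a regular expression, it has to be matched literally, either as the character class `[|]` or with `fixed = TRUE`.

```r
# Illustrative value, not taken from real data
x <- "Drug 1 | Drug 2"

# Two equivalent ways to split on a literal pipe
parts_class <- strsplit(x, "[|]")[[1]]
parts_fixed <- strsplit(x, "|", fixed = TRUE)[[1]]

# The pieces still carry the spaces around the separator, so trim them
lines <- trimws(parts_fixed)
lines[1]  # first line of therapy
lines[2]  # second line of therapy
```

Indexing the trimmed vector with `[1]`, `[2]` and so on is all that "extracting a line of therapy" amounts to here.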


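# Packages used in this post
package.name <- c("dplyr", "stringr")

# Install each package if it is missing, then attach it
for (i in package.name) {
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
  }
  library(i, character.only = TRUE)
}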


Step 1: Creating dummy data frame

df<-data.frame(Patient=1:2,
               Line_of_Therapy=c("Drug 1 | Drug 2","Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3"))
df
  Patient                            Line_of_Therapy
1       1                            Drug 1 | Drug 2
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3


Step 2: Extracting the second Line of therapy

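df2 <- df %>%
  mutate(LOT2 = sapply(Line_of_Therapy, function(x) {
    # Split on the literal "|" and keep the second piece
    # (NA if the patient has only one line of therapy)
    y <- str_split(x, "[|]")[[1]][2]
    # Remove the spaces left around the separator
    trimws(y)
  }, USE.NAMES = FALSE))  # keep LOT2 a plain, unnamed character vector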
  
df2
  Patient                            Line_of_Therapy            LOT2
1       1                            Drug 1 | Drug 2          Drug 2
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3 Drug 4 + Drug 5
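The same pattern extends to any line of therapy. Below is a minimal sketch, assuming a hypothetical helper named `extract_lot` (not part of the original post or of any package) and using base R's `strsplit` and `vapply` instead of `str_split` and `sapply`. It also shows what happens when a patient has fewer lines than the one requested: indexing past the end of the split pieces gives `NA`.

```r
# Hypothetical helper: return the n-th line of therapy for each string in x
extract_lot <- function(x, n) {
  # One character vector of pieces per input string
  parts <- strsplit(x, "[|]")
  # Pick the n-th piece (NA if it does not exist) and trim the spaces
  vapply(parts, function(p) trimws(p[n]), character(1))
}

# Same values as the Line_of_Therapy column in the dummy data frame
lot <- c("Drug 1 | Drug 2",
         "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3")

first_line <- extract_lot(lot, 1)
third_line <- extract_lot(lot, 3)  # patient 1 has no third line
```

Inside a pipeline this could be used as `mutate(LOT3 = extract_lot(Line_of_Therapy, 3))`. `vapply` is chosen here because it guarantees a character result of the same length as the input, which is what `mutate` expects.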


Parting Comments

In this post we looked at a very simple example of how we can use dplyr, sapply and str_split to extract the second line of therapy from a delimited text column.
