
Friday, April 1, 2022

Extract part of the string in R using sapply and dplyr

Split text using sapply


Introduction

There are situations where we need to extract a specific portion of a text field or column. We can do this by combining dplyr's mutate, sapply, and stringr's str_split. For this blog, we will use an LOT (line of therapy) example, where a single cell holds the compressed therapy progression with the lines separated by "|", and we want to extract the second line of therapy.
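Before building the data frame, it may help to see the split-then-index idea on a single string. The following is only a minimal sketch: it uses base R's strsplit() rather than stringr's str_split() so that it runs without installing anything, and the variable names lot, parts and second_line are made up for illustration. It pulls out the second line of therapy, as Step 2 does later.

```r
# One compressed therapy string; "|" separates successive lines of therapy
lot <- "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3"

# strsplit() returns a list with one character vector per input string;
# fixed = TRUE treats "|" as a literal character, not a regex alternation
parts <- strsplit(lot, "|", fixed = TRUE)[[1]]

# Remove the spaces left around each piece
parts <- trimws(parts)

# Keep the second piece, i.e. the second line of therapy
second_line <- parts[2]
print(second_line)
```

Here parts holds three pieces, one per line of therapy, and second_line is "Drug 4 + Drug 5".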


# Install any missing package, then attach it
package.name <- c("dplyr", "stringr")

for (i in package.name) {
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
  }
  library(i, character.only = TRUE)
}


Step 1: Creating a dummy data frame

df<-data.frame(Patient=1:2,
               Line_of_Therapy=c("Drug 1 | Drug 2","Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3"))
df
  Patient                            Line_of_Therapy
1       1                            Drug 1 | Drug 2
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3


Step 2: Extracting the second Line of therapy

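df2 <- df %>%
  mutate(LOT2 = sapply(Line_of_Therapy, function(x) {
    # Split on the literal pipe and keep the second piece (NA if there is none)
    y <- str_split(x, "[|]")[[1]][2]
    # Strip the surrounding whitespace
    trimws(y)
  }, USE.NAMES = FALSE))  # keep the new column free of element names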
  
df2
  Patient                            Line_of_Therapy            LOT2
1       1                            Drug 1 | Drug 2          Drug 2
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3 Drug 4 + Drug 5
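The same idea can be generalized to any line of therapy. The sketch below is one possible way to do it, not the method used in this post: extract_lot is a hypothetical helper name, and it relies on base R's strsplit() and vapply() instead of str_split() and sapply(), so it needs no packages. It returns NA when a patient has fewer lines than requested.

```r
# Hypothetical helper (not from the original post): return the n-th
# "|"-separated line of therapy for each element of x, or NA if absent
extract_lot <- function(x, n) {
  vapply(strsplit(as.character(x), "|", fixed = TRUE), function(parts) {
    if (length(parts) >= n) trimws(parts[n]) else NA_character_
  }, character(1))
}

# Same dummy data as in Step 1
df <- data.frame(
  Patient = 1:2,
  Line_of_Therapy = c("Drug 1 | Drug 2",
                      "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3")
)

# First and third lines of therapy as new columns
df$LOT1 <- extract_lot(df$Line_of_Therapy, 1)
df$LOT3 <- extract_lot(df$Line_of_Therapy, 3)
df[, c("Patient", "LOT1", "LOT3")]
```

Patient 1 has only two lines of therapy, so LOT3 is NA for that row, while patient 2 gets "Drug 3". vapply() is used here because it checks that every result is a single string, which sapply() does not guarantee.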


Parting Comments

In this blog, we looked at a very simple example of how we can use dplyr, sapply and str_split to extract the second line of therapy from a compressed text column.

