Saturday, April 2, 2022

Determine Mono and Combo therapies using stringr

Match string patterns


Introduction

There are situations where we need to flag whether a certain word or piece of text is present in a string. For instance, in a drug regimen administered to a patient, we may need to check whether a drug is given on its own (standalone) or in combination with other drugs. In this blog we will see how this can be done.
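As a minimal sketch of the basic idea, `str_detect()` from stringr returns TRUE or FALSE for each string depending on whether a pattern is found. The `regimens` vector below is made up for illustration and is not part of the data used later; `fixed()` is used so that `+` is treated as a literal character rather than a regular-expression operator.

```r
library(stringr)

# Hypothetical regimen strings, for illustration only
regimens <- c("Drug 1", "Drug 1 + Drug 3", "Drug 2")

# fixed() treats the pattern as literal text, not as a regular expression
has_drug1 <- str_detect(regimens, fixed("Drug 1"))
has_plus  <- str_detect(regimens, fixed("+"))

has_drug1  # TRUE TRUE FALSE
has_plus   # FALSE TRUE FALSE
```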


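# Install (if missing) and load the required packages
package.name <- c("dplyr", "stringr")

for (i in package.name) {
  # require() returns FALSE when the package is not installed
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
    library(i, character.only = TRUE)
  }
}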


Step 1: Creating a dummy data frame

df<-data.frame(Patient=1:3,
               Line_of_Therapy=c("Drug 1 | Drug 2","Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3","Drug 1"))

df
  Patient                            Line_of_Therapy
1       1                            Drug 1 | Drug 2
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3
3       3                                     Drug 1

We can see that Drug 1 is given in combination (with Drug 3) for Patient 2, while it is given as a standalone drug for Patients 1 and 3.
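A quick sketch of how one of these strings breaks apart, assuming `|` separates regimens and `+` joins drugs given together (this reading is inferred from the code in Step 2, and the names `regimen_parts` and `drugs` are only for illustration):

```r
library(stringr)

x <- "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3"

# Split on the literal "|" and trim the surrounding spaces
regimen_parts <- str_trim(str_split(x, fixed("|"))[[1]])
regimen_parts
# "Drug 1 + Drug 3" "Drug 4 + Drug 5" "Drug 3"

# Split the first part into its component drugs
drugs <- str_trim(str_split(regimen_parts[1], fixed("+"))[[1]])
drugs
# "Drug 1" "Drug 3"
```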


Step 2: Pattern Matching

df2<-df%>%
  mutate(Mono_Indicator=sapply(Line_of_Therapy, function(x){
    
    # x<-"Drug 5 | Drug 1 + Drug 2"
    # x<-"Drug 1 | Drug 2"
    
    # Splitting the individual drugs within particular line
    y<-str_split(x,"[|]")[[1]]
    z<-trimws(y)
    
    # Checking if Drug 1 is present as a Mono or not and returning the position
    z1<-which(z=="Drug 1")[1]
    
    # If Drug 1 never appears on its own, then indicate "Not Mono"
    if(is.na(z1)){
      
      z2<-"Not Mono"
      
    }else{
      
      z2<-"Mono"
      
    }
    
    return(z2)
    
    
  }),
        Combo_Indicator=sapply(Line_of_Therapy, function(x){
    
    # Splitting the individual drugs within particular line
    y<-str_split(x,"[|]")[[1]]
    z<-trimws(y)
    
    # Checking if Drug 1 appears in a part that also contains a literal "+" (a combination)
    z1<-which(str_detect(z,fixed("Drug 1")) & str_detect(z,fixed("+")))[1]
    
    # If Drug 1 is not present as Combo, then indicate "Not Combo"
    if(is.na(z1)){
      
      z2<-"Not Combo"
      
    }else{
      
      z2<-"Combo"
      
    }
    
    return(z2)
    
    
  })
  )
  
df2
  Patient                            Line_of_Therapy Mono_Indicator
1       1                            Drug 1 | Drug 2           Mono
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3       Not Mono
3       3                                     Drug 1           Mono
  Combo_Indicator
1       Not Combo
2           Combo
3       Not Combo
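One thing to watch with `str_detect(z, "Drug 1")` is that it would also match a name such as "Drug 10". A possible variation, sketched here under the same assumptions about `|` and `+`, splits each part into its component drugs and compares them exactly. The helper `flag_drug()` and its arguments are hypothetical names, not part of dplyr or stringr.

```r
library(stringr)

# Hypothetical helper: report how `drug` appears in one regimen string
flag_drug <- function(line, drug) {
  # Parts separated by "|"
  parts <- str_trim(str_split(line, fixed("|"))[[1]])
  # Component drugs of each part, separated by "+"
  components <- lapply(str_split(parts, fixed("+")), str_trim)
  # Exact comparison, so "Drug 1" does not match "Drug 10"
  has_drug <- vapply(components, function(d) drug %in% d, logical(1))
  n_drugs  <- lengths(components)
  c(Mono  = any(has_drug & n_drugs == 1),
    Combo = any(has_drug & n_drugs > 1))
}

res_combo <- flag_drug("Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3", "Drug 1")
res_mono  <- flag_drug("Drug 1 | Drug 2", "Drug 1")
res_other <- flag_drug("Drug 10 + Drug 3", "Drug 1")
```

Here `res_combo` flags only Combo, `res_mono` flags only Mono, and `res_other` flags neither, since "Drug 10" is not treated as "Drug 1". Inside `mutate()` the helper could be applied per row with `sapply()`, in the same way as the functions above.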


Parting Comments

In this blog we looked at a very simple example of how we can use the dplyr and stringr libraries to check for the presence of a string in a column of a data frame.
