Sunday, April 3, 2022

Determine position of Mono and Combo therapies using stringr

Position of Matched string patterns


Introduction

In a drug regimen administered to a patient, we often need to determine the position at which a drug is used, either as a mono or as a combo therapy. This helps us identify switches to and from that drug. In this blog we will see how this can be done using a very simple illustration.


package.name<-c("dplyr","stringr")

# Install any missing package, then attach it
for(i in package.name){

  if(!require(i,character.only = TRUE)){

    install.packages(i)
  }
  library(i,character.only = TRUE)

}


Step 1: Creating dummy data frame

Here a + indicates that a drug is added to an existing treatment within the same line of therapy. Individual lines of therapy are separated by |. For instance, if a patient's line of therapy is Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3, then we can infer the following:

  • Drug 3 is used as an add-on along with Drug 1 in the first line of treatment
  • Drug 5 is used as an add-on along with Drug 4 in the second line of treatment
  • Drug 3 is used as a mono therapy (standalone) in the third line of treatment

df<-data.frame(Patient=1:3,
               Line_of_Therapy=c("Drug 1 | Drug 2","Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1","Drug 5 | Drug 1 + Drug 2"))

df
  Patient                            Line_of_Therapy
1       1                            Drug 1 | Drug 2
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1
3       3                   Drug 5 | Drug 1 + Drug 2
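The notation described above can be checked on a single string before working with the whole data frame. The snippet below is a minimal sketch, assuming only that | separates lines of therapy and + separates drugs within a line; the variable names (regimen, therapy_lines, drugs, is_mono) are illustrative and not part of the original code.

```r
library(stringr)

# Example regimen taken from the description above
regimen <- "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3"

# Split on the literal | to get one element per line of therapy
therapy_lines <- str_trim(str_split(regimen, fixed("|"))[[1]])

# Split each line on the literal + to get the drugs given together
drugs <- lapply(therapy_lines, function(l) str_trim(str_split(l, fixed("+"))[[1]]))

# A line made up of exactly one drug is a mono therapy
is_mono <- lengths(drugs) == 1

therapy_lines
drugs
is_mono
```

Using fixed() treats | and + as plain characters, so no regular expression escaping is needed.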


Step 2: Pattern Matching and identifying position of Drug 1 in line of therapy


Step 2A: Creating Mono and Combo Flags

Let's identify whether a patient's regimen contains Drug 1 as a mono and/or a combo therapy.

df2<-df%>%
  mutate(Mono_Indicator=sapply(Line_of_Therapy, function(x){
    
    # x<-"Drug 5 | Drug 1 + Drug 2"
    # x<-"Drug 1 | Drug 2"
    
    # Splitting the individual drugs within particular line
    y<-str_split(x,"[|]")[[1]]
    z<-trimws(y)
    
    # Checking if Drug 1 is present as a Mono or not and returning the position
    z1<-which(z=="Drug 1")[1]
    
    # If Drug 1 is not present as a Mono, then indicate "Not Mono"
    if(is.na(z1)){
      
      z2<-"Not Mono"
      
    }else{
      
      z2<-"Mono"
      
    }
    
    
    # z1<-ifelse(z,"Combo","Mono")
    return(z2)
    
    
  }),
        Combo_Indicator=sapply(Line_of_Therapy, function(x){
    
    
    # Splitting the individual drugs within particular line
    y<-str_split(x,"[|]")[[1]]
    z<-trimws(y)
    
    # Checking if Drug 1 is present in a line that also contains a + (i.e. as a Combo)
    z1<-which(str_detect(z,"Drug 1") & str_detect(z,"[+]"))[1]
    
    # If Drug 1 is not present as a Combo, then indicate "Not Combo"
    if(is.na(z1)){
      
      z2<-"Not Combo"
      
    }else{
      
      z2<-"Combo"
      
    }
    
    
    # z1<-ifelse(z,"Combo","Mono")
    return(z2)
    
    
  })
  )
  
df2
  Patient                            Line_of_Therapy Mono_Indicator
1       1                            Drug 1 | Drug 2           Mono
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1           Mono
3       3                   Drug 5 | Drug 1 + Drug 2       Not Mono
  Combo_Indicator
1       Not Combo
2           Combo
3           Combo
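One caveat with the combo check: str_detect(z, "Drug 1") is a substring match, so it would also flag a drug whose name merely starts with "Drug 1". The dummy data has no such drug, so the results above are unaffected. The sketch below uses a hypothetical "Drug 10" (not in the original data) to show the difference, and compares it with an exact match on the individual drugs of each line. It is one possible way to tighten the check, not the approach used in this post.

```r
library(stringr)

# Hypothetical lines of therapy: the first contains "Drug 10" but not "Drug 1"
z <- c("Drug 10 + Drug 2", "Drug 1 + Drug 3")

# Substring match flags both lines
substring_hit <- str_detect(z, "Drug 1")

# Exact match: split each line into its drugs and compare whole names
parts <- lapply(str_split(z, fixed("+")), str_trim)
has_drug1 <- sapply(parts, function(p) "Drug 1" %in% p)
is_combo <- lengths(parts) > 1
exact_hit <- has_drug1 & is_combo

substring_hit
exact_hit
```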


Step 2B: Identifying positions for Mono and/or Combo

Let's now identify the position (the line of therapy, counted from the left) at which Drug 1 is first used as a mono and/or a combo therapy.

interim.df<-df2%>%
  mutate(Mono_Position=sapply(Line_of_Therapy, function(x){
    
    # x<-"Drug 5 | Drug 1 + Drug 2"
    # x<-"Drug 1 | Drug 2"
    
    # Splitting the individual drugs within particular line
    y<-str_split(x,"[|]")[[1]]
    z<-trimws(y)
    
    # Checking if Drug 1 is present as a Mono or not and returning the position
    z1<-which(z=="Drug 1")[1]
    
    # If Drug 1 is not present as a Mono, then indicate "Not Mono"
    if(is.na(z1)){
      
      z2<-"Not Mono"
      
    }else{
      
      z2<-z1
      
    }
    
    return(z2)
    
    
  }),
        Combo_Position=sapply(Line_of_Therapy, function(x){
    
    
    # Splitting the individual drugs within particular line
    y<-str_split(x,"[|]")[[1]]
    z<-trimws(y)
    
    # Checking if Drug 1 is present in a line that also contains a + (i.e. as a Combo) and returning the position
    z1<-which(str_detect(z,"Drug 1") & str_detect(z,"[+]"))[1]
    
    # If Drug 1 is not present as a Combo, then indicate "Not Combo"
    if(is.na(z1)){
      
      z2<-"Not Combo"
      
    }else{
      
      z2<-z1
      
    }
    
    return(z2)
    
    
  })
  )
  
interim.df
  Patient                            Line_of_Therapy Mono_Indicator
1       1                            Drug 1 | Drug 2           Mono
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1           Mono
3       3                   Drug 5 | Drug 1 + Drug 2       Not Mono
  Combo_Indicator Mono_Position Combo_Position
1       Not Combo             1      Not Combo
2           Combo             3              1
3           Combo      Not Mono              2
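Because the position columns above mix numbers with text such as "Not Mono", they are stored as character. If numeric positions are preferred, the same logic can be wrapped in a small helper that returns NA when the drug is not found. The function below is a sketch under that assumption; drug_position and its arguments are hypothetical names, not part of the original code.

```r
library(stringr)

# Hypothetical helper: position of the first line of therapy in which `drug`
# appears alone ("mono") or together with other drugs ("combo"); NA if none
drug_position <- function(regimen, drug, type = c("mono", "combo")) {
  type <- match.arg(type)
  therapy_lines <- str_split(regimen, fixed("|"))[[1]]
  parts <- lapply(str_split(therapy_lines, fixed("+")), str_trim)
  present <- vapply(parts, function(p) drug %in% p, logical(1))
  alone <- lengths(parts) == 1
  hit <- if (type == "mono") present & alone else present & !alone
  which(hit)[1]
}

# Same regimens as in the dummy data frame
regimens <- c("Drug 1 | Drug 2",
              "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1",
              "Drug 5 | Drug 1 + Drug 2")

mono_pos <- vapply(regimens, drug_position, integer(1),
                   drug = "Drug 1", type = "mono", USE.NAMES = FALSE)
combo_pos <- vapply(regimens, drug_position, integer(1),
                    drug = "Drug 1", type = "combo", USE.NAMES = FALSE)

mono_pos
combo_pos
```

The positions agree with the table above, with NA taking the place of "Not Mono" and "Not Combo".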


Step 3: Creating the final data frame

final.df<-interim.df%>%
  select(Patient,Line_of_Therapy,Mono_Indicator,Mono_Position,Combo_Indicator,Combo_Position)


final.df
  Patient                            Line_of_Therapy Mono_Indicator
1       1                            Drug 1 | Drug 2           Mono
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1           Mono
3       3                   Drug 5 | Drug 1 + Drug 2       Not Mono
  Mono_Position Combo_Indicator Combo_Position
1             1       Not Combo      Not Combo
2             3           Combo              1
3      Not Mono           Combo              2
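As mentioned in the introduction, positions are useful for finding switches to and from a drug. The snippet below is a rough sketch of that idea for patient 2, assuming a "switch" simply means the line of therapy immediately before or after the Drug 1 mono line. The names switched_from and switched_to are illustrative only.

```r
library(stringr)

# Regimen of patient 2 from the dummy data frame
regimen <- "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1"
therapy_lines <- str_trim(str_split(regimen, fixed("|"))[[1]])

# Position of Drug 1 as a mono therapy (NA if it never occurs)
mono_position <- which(therapy_lines == "Drug 1")[1]

# Line the patient was on just before Drug 1 mono (NA if it is the first line)
switched_from <- if (!is.na(mono_position) && mono_position > 1) {
  therapy_lines[mono_position - 1]
} else {
  NA_character_
}

# Line the patient moved to just after Drug 1 mono (NA if it is the last line)
switched_to <- if (!is.na(mono_position) && mono_position < length(therapy_lines)) {
  therapy_lines[mono_position + 1]
} else {
  NA_character_
}

switched_from
switched_to
```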

Parting Comments

In this blog we looked at a very simple example of how we can use the dplyr and stringr libraries to check for the presence of a string in a data frame column and to find the position at which it occurs.
