
Sunday, April 3, 2022

Determine position of Mono and Combo therapies using stringr

Position of Matched string patterns


Introduction

When analysing the drug regimen administered to a patient, we often need to determine the position of a drug (used as a mono or combo therapy) within the patient's lines of therapy. This helps us identify switches to and from that drug. In this blog we will see how this can be done using a very simple illustration.


# Install (if missing) and load the packages used in this post
package.name <- c("dplyr", "stringr")

for (i in package.name) {
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
  }
  library(i, character.only = TRUE)
}


Step 1: Creating a dummy data frame

Here, a + indicates that a drug is added to an existing treatment, and individual lines of therapy are separated by |. For instance, if a patient's line of therapy is Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3, then we can infer the following:

  • Drug 3 is used as an add-on along with Drug 1 in the first line of treatment
  • Drug 5 is used as an add-on along with Drug 4 in the second line of treatment
  • Drug 3 is used as a mono therapy (standalone) in the third line of treatment
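
To make the notation concrete, here is a minimal sketch (base R only, not part of the original workflow) that splits the example regimen above into its lines of therapy and labels each line as Mono or Combo. It assumes, as described above, that lines are separated by | and drugs within a line by +. The object names `parsed` and `drugs_per_line` are purely illustrative.

```r
# Sketch (base R only): parse one regimen string into its lines of therapy.
# Assumes "|" separates lines of therapy and "+" separates drugs within a line.
regimen <- "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3"

# One element per line of therapy, with surrounding spaces removed
lines_of_therapy <- trimws(strsplit(regimen, "|", fixed = TRUE)[[1]])

# Split each line into its individual drugs
drugs_per_line <- lapply(strsplit(lines_of_therapy, "+", fixed = TRUE), trimws)

# A line with more than one drug is a combo, otherwise it is a mono therapy
parsed <- data.frame(
  Line    = seq_along(lines_of_therapy),
  Therapy = lines_of_therapy,
  N_Drugs = lengths(drugs_per_line),
  Type    = ifelse(lengths(drugs_per_line) > 1, "Combo", "Mono"),
  stringsAsFactors = FALSE
)

parsed
```
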
df<-data.frame(Patient=1:3,
               Line_of_Therapy=c("Drug 1 | Drug 2","Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1","Drug 5 | Drug 1 + Drug 2"))

df
  Patient                            Line_of_Therapy
1       1                            Drug 1 | Drug 2
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1
3       3                   Drug 5 | Drug 1 + Drug 2


Step 2: Pattern matching and identifying the position of Drug 1 in the line of therapy


Step 2A: Creating Mono and Combo Flags

Let's identify whether each patient's regimen has Drug 1 present as a mono and/or combo therapy.

df2 <- df %>%
  mutate(
    Mono_Indicator = sapply(Line_of_Therapy, function(x) {

      # Split the regimen into its individual lines of therapy
      z <- trimws(str_split(x, fixed("|"))[[1]])

      # Position of the first line in which Drug 1 is given on its own (NA if none)
      z1 <- which(z == "Drug 1")[1]

      # If Drug 1 is never given on its own, flag the regimen as "Not Mono"
      if (is.na(z1)) "Not Mono" else "Mono"
    }),
    Combo_Indicator = sapply(Line_of_Therapy, function(x) {

      # Split the regimen into its individual lines of therapy
      z <- trimws(str_split(x, fixed("|"))[[1]])

      # Position of the first line in which Drug 1 is combined with another drug;
      # "\\b" stops "Drug 1" from also matching "Drug 10", "Drug 11", ...
      z1 <- which(str_detect(z, "\\bDrug 1\\b") & str_detect(z, fixed("+")))[1]

      # If Drug 1 is never part of a combination, flag the regimen as "Not Combo"
      if (is.na(z1)) "Not Combo" else "Combo"
    })
  )

df2
  Patient                            Line_of_Therapy Mono_Indicator
1       1                            Drug 1 | Drug 2           Mono
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1           Mono
3       3                   Drug 5 | Drug 1 + Drug 2       Not Mono
  Combo_Indicator
1       Not Combo
2           Combo
3           Combo
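
As an aside, the two flags can also be computed without sapply() by describing a whole line of therapy with a regular expression. The sketch below is an alternative, not the method used in this post. It uses base R's grepl() with perl = TRUE so that the \b word boundary is available, and the names `mono_pattern` and `combo_pattern` are hypothetical. ICU, the regex engine behind stringr, also supports \b, so str_detect() should accept the same patterns.

```r
# Sketch: vectorised Mono/Combo flags with one regular expression per flag.
# Assumes "|" separates lines of therapy and "+" separates drugs within a line.
regimens <- c("Drug 1 | Drug 2",
              "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1",
              "Drug 5 | Drug 1 + Drug 2")

# A whole line (bounded by the string ends or "|") that is exactly "Drug 1"
mono_pattern <- "(^|\\|)\\s*Drug 1\\s*(\\||$)"

# A single line (no "|" crossed) containing both "Drug 1" and "+", in either order
combo_pattern <- "(^|\\|)[^|]*(\\bDrug 1\\b[^|]*\\+|\\+[^|]*\\bDrug 1\\b)"

mono_flag  <- ifelse(grepl(mono_pattern,  regimens, perl = TRUE), "Mono",  "Not Mono")
combo_flag <- ifelse(grepl(combo_pattern, regimens, perl = TRUE), "Combo", "Not Combo")

data.frame(regimens, mono_flag, combo_flag)
```

The `[^|]*` pieces keep each match inside a single line of therapy, so a + in one line is never paired with Drug 1 from another line.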


Step 2B: Identifying positions for Mono and/or Combo

Let's now identify the position (the line of therapy) at which Drug 1 is first used as a mono and/or combo therapy.

interim.df <- df2 %>%
  mutate(
    Mono_Position = sapply(Line_of_Therapy, function(x) {

      # Split the regimen into its individual lines of therapy
      z <- trimws(str_split(x, fixed("|"))[[1]])

      # Position of the first line in which Drug 1 is given on its own (NA if none)
      z1 <- which(z == "Drug 1")[1]

      # Return the position, or "Not Mono" if Drug 1 is never given on its own
      # (sapply() then stores the whole column as character)
      if (is.na(z1)) "Not Mono" else z1
    }),
    Combo_Position = sapply(Line_of_Therapy, function(x) {

      # Split the regimen into its individual lines of therapy
      z <- trimws(str_split(x, fixed("|"))[[1]])

      # Position of the first line in which Drug 1 is combined with another drug;
      # "\\b" stops "Drug 1" from also matching "Drug 10", "Drug 11", ...
      z1 <- which(str_detect(z, "\\bDrug 1\\b") & str_detect(z, fixed("+")))[1]

      # Return the position, or "Not Combo" if Drug 1 is never part of a combination
      if (is.na(z1)) "Not Combo" else z1
    })
  )

interim.df
  Patient                            Line_of_Therapy Mono_Indicator
1       1                            Drug 1 | Drug 2           Mono
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1           Mono
3       3                   Drug 5 | Drug 1 + Drug 2       Not Mono
  Combo_Indicator Mono_Position Combo_Position
1       Not Combo             1      Not Combo
2           Combo             3              1
3           Combo      Not Mono              2
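
The Mono and Combo position code above is largely duplicated, and Drug 1 is hard-coded. One way to generalise it is a small helper that takes the drug name as an argument. `first_position()` below is a hypothetical helper written for illustration in base R. It splits each line on + and compares drug names exactly, so Drug 10 is not mistaken for Drug 1. Unlike the code above, it returns NA instead of "Not Mono"/"Not Combo", which keeps the result numeric.

```r
# Hypothetical helper (illustration only): first line of therapy in which
# `drug` is used as a mono therapy or as part of a combo; NA if never.
first_position <- function(regimen, drug, type = c("mono", "combo")) {
  type <- match.arg(type)

  # One element per line of therapy, surrounding spaces removed
  lines <- trimws(strsplit(regimen, "|", fixed = TRUE)[[1]])

  # Individual drugs within each line (exact names, so "Drug 10" != "Drug 1")
  drugs <- lapply(strsplit(lines, "+", fixed = TRUE), trimws)

  hit <- if (type == "mono") {
    # Mono: the line holds exactly one drug and it is `drug`
    vapply(drugs, function(d) length(d) == 1 && d == drug, logical(1))
  } else {
    # Combo: the line holds several drugs and `drug` is one of them
    vapply(drugs, function(d) length(d) > 1 && drug %in% d, logical(1))
  }

  # Position of the first matching line (NA_integer_ if there is none)
  which(hit)[1]
}

regimens <- c("Drug 1 | Drug 2",
              "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1",
              "Drug 5 | Drug 1 + Drug 2")

mono_pos  <- sapply(regimens, first_position, drug = "Drug 1", type = "mono",
                    USE.NAMES = FALSE)
combo_pos <- sapply(regimens, first_position, drug = "Drug 1", type = "combo",
                    USE.NAMES = FALSE)

mono_pos   # 1 3 NA
combo_pos  # NA 1 2
```

Inside the dplyr pipeline it could replace the anonymous functions, e.g. mutate(Mono_Position = sapply(Line_of_Therapy, first_position, drug = "Drug 1", type = "mono")).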


Step 3: Creating the final data frame

final.df<-interim.df%>%
  select(Patient,Line_of_Therapy,Mono_Indicator,Mono_Position,Combo_Indicator,Combo_Position)


final.df
  Patient                            Line_of_Therapy Mono_Indicator
1       1                            Drug 1 | Drug 2           Mono
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1           Mono
3       3                   Drug 5 | Drug 1 + Drug 2       Not Mono
  Mono_Position Combo_Indicator Combo_Position
1             1       Not Combo      Not Combo
2             3           Combo              1
3      Not Mono           Combo              2
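
As mentioned in the introduction, knowing the position of a drug helps identify switches. Here is a minimal sketch, assuming that a "switch" simply means the line of therapy immediately after the first line containing Drug 1 (as either mono or combo). `next_line_after()` is a hypothetical helper written in base R, not part of dplyr or stringr.

```r
# Hypothetical helper (illustration only): the line of therapy that follows
# the first line containing `drug` (mono or combo); NA if there is none.
next_line_after <- function(regimen, drug) {
  # One element per line of therapy, surrounding spaces removed
  lines <- trimws(strsplit(regimen, "|", fixed = TRUE)[[1]])

  # Individual drugs within each line
  drugs <- lapply(strsplit(lines, "+", fixed = TRUE), trimws)

  # First line in which `drug` appears at all
  pos <- which(vapply(drugs, function(d) drug %in% d, logical(1)))[1]

  # No switch if the drug is never used or only appears in the last line
  if (is.na(pos) || pos == length(lines)) NA_character_ else lines[pos + 1]
}

regimens <- c("Drug 1 | Drug 2",
              "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1",
              "Drug 5 | Drug 1 + Drug 2")

switched_to <- vapply(regimens, next_line_after, character(1),
                      drug = "Drug 1", USE.NAMES = FALSE)
switched_to  # "Drug 2" "Drug 4 + Drug 5" NA
```
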

Parting Comments

In this blog we looked at a very simple example of how we can use the dplyr and stringr libraries to check for the presence, and the position, of a string within a column of a data frame.
