Position of Matched string patterns
Parag Verma
04th April, 2022
Introduction
In a drug regimen administered to a patient, we often need to determine the position of a drug, whether it is given as a mono or a combo therapy. This helps us identify switches to and from that drug. In this blog we will see how this can be done using a very simple illustration.
# Install any missing packages, then load them
package.name <- c("dplyr", "stringr")
for (i in package.name) {
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
  }
  library(i, character.only = TRUE)
}
Step 1: Creating dummy data frame
Here a + indicates that a drug is added to an existing treatment. Individual lines of therapy are separated by |. For instance, if a patient's line of therapy is Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3, then we can infer the following:
- Drug 3 is used as an add-on along with Drug 1 in the first line of treatment
- Drug 5 is used as an add-on along with Drug 4 in the second line of treatment
- Drug 3 is used as a mono therapy (standalone) in the third line of treatment
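The notation above can be unpacked programmatically. The following is a minimal sketch in base R that splits one regimen string into its lines of therapy and then into the drugs within each line. The helper name `parse_regimen` is hypothetical and not part of the original post.

```r
# Hypothetical helper: split a regimen string into lines of therapy,
# then split each line into its component drugs.
parse_regimen <- function(regimen) {
  # "|" separates lines of therapy; fixed = TRUE treats it literally
  lines <- trimws(strsplit(regimen, "|", fixed = TRUE)[[1]])
  # "+" separates drugs given together within one line
  lapply(lines, function(line) trimws(strsplit(line, "+", fixed = TRUE)[[1]]))
}

parsed <- parse_regimen("Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 3")

# A line with one drug is a mono therapy; more than one is a combo
n_drugs <- lengths(parsed)
n_drugs
```

Here `n_drugs` holds the number of drugs in each line, so the third line, with a single drug, is the mono therapy.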
df <- data.frame(
  Patient = 1:3,
  Line_of_Therapy = c("Drug 1 | Drug 2",
                      "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1",
                      "Drug 5 | Drug 1 + Drug 2")
)
df
  Patient                            Line_of_Therapy
1       1                            Drug 1 | Drug 2
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1
3       3                   Drug 5 | Drug 1 + Drug 2
Step 2: Pattern Matching and identifying position of Drug 1 in line of therapy
Step 2A: Creating Mono and Combo Flags
Let's identify whether each patient's regimen contains Drug 1 as a mono therapy and/or as part of a combo.
df2 <- df %>%
  mutate(
    Mono_Indicator = sapply(Line_of_Therapy, function(x) {
      # e.g. x <- "Drug 5 | Drug 1 + Drug 2"
      # Split the regimen into its individual lines of therapy
      y <- str_split(x, "[|]")[[1]]
      z <- trimws(y)
      # Position of the first line where Drug 1 is given on its own (NA if none)
      z1 <- which(z == "Drug 1")[1]
      # Flag the regimen according to whether such a line exists
      if (is.na(z1)) {
        z2 <- "Not Mono"
      } else {
        z2 <- "Mono"
      }
      return(z2)
    }),
    Combo_Indicator = sapply(Line_of_Therapy, function(x) {
      # Split the regimen into its individual lines of therapy
      y <- str_split(x, "[|]")[[1]]
      z <- trimws(y)
      # Position of the first line that contains both Drug 1 and a "+" (NA if none)
      z1 <- which(str_detect(z, "Drug 1") & str_detect(z, fixed("+")))[1]
      # Flag the regimen according to whether such a line exists
      if (is.na(z1)) {
        z2 <- "Not Combo"
      } else {
        z2 <- "Combo"
      }
      return(z2)
    })
  )
df2
  Patient                            Line_of_Therapy Mono_Indicator
1       1                            Drug 1 | Drug 2           Mono
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1           Mono
3       3                   Drug 5 | Drug 1 + Drug 2       Not Mono
  Combo_Indicator
1       Not Combo
2           Combo
3           Combo
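Substring matching on a drug name can over-match: a pattern such as "Drug 1" also matches a name like "Drug 10". A minimal sketch of an alternative, assuming drug names never contain + or |, compares whole drug names after splitting. The function name `drug_flags` and the "Drug 10" input are illustrative assumptions and do not appear in the data above.

```r
# Hypothetical helper: flag whether `drug` appears as mono and/or combo
drug_flags <- function(regimen, drug) {
  # Split into lines of therapy, then into the drugs within each line
  lines <- trimws(strsplit(regimen, "|", fixed = TRUE)[[1]])
  parts <- lapply(lines, function(l) trimws(strsplit(l, "+", fixed = TRUE)[[1]]))
  # Exact-name comparison, so "Drug 1" does not match "Drug 10"
  has_drug <- vapply(parts, function(p) drug %in% p, logical(1))
  n_drugs <- lengths(parts)
  c(mono = any(has_drug & n_drugs == 1), combo = any(has_drug & n_drugs > 1))
}

flags_p3 <- drug_flags("Drug 5 | Drug 1 + Drug 2", "Drug 1")  # patient 3's regimen
flags_10 <- drug_flags("Drug 10 | Drug 2", "Drug 1")          # made-up regimen
```

The function returns a named logical vector, which can be mapped to the "Mono"/"Not Mono" labels if that presentation is preferred.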
Step 2B: Identifying positions for Mono and/or Combo
Let's now identify the first line of therapy in which Drug 1 is used as a mono therapy and/or as part of a combo.
interim.df <- df2 %>%
  mutate(
    Mono_Position = sapply(Line_of_Therapy, function(x) {
      # e.g. x <- "Drug 5 | Drug 1 + Drug 2"
      # Split the regimen into its individual lines of therapy
      y <- str_split(x, "[|]")[[1]]
      z <- trimws(y)
      # Position of the first line where Drug 1 is given on its own (NA if none)
      z1 <- which(z == "Drug 1")[1]
      # Return the position, or the label "Not Mono" when there is none.
      # Mixing labels and positions makes sapply() return a character vector.
      if (is.na(z1)) {
        z2 <- "Not Mono"
      } else {
        z2 <- z1
      }
      return(z2)
    }),
    Combo_Position = sapply(Line_of_Therapy, function(x) {
      # Split the regimen into its individual lines of therapy
      y <- str_split(x, "[|]")[[1]]
      z <- trimws(y)
      # Position of the first line that contains both Drug 1 and a "+" (NA if none)
      z1 <- which(str_detect(z, "Drug 1") & str_detect(z, fixed("+")))[1]
      # Return the position, or the label "Not Combo" when there is none
      if (is.na(z1)) {
        z2 <- "Not Combo"
      } else {
        z2 <- z1
      }
      return(z2)
    })
  )
interim.df
  Patient                            Line_of_Therapy Mono_Indicator
1       1                            Drug 1 | Drug 2           Mono
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1           Mono
3       3                   Drug 5 | Drug 1 + Drug 2       Not Mono
  Combo_Indicator Mono_Position Combo_Position
1       Not Combo             1      Not Combo
2           Combo             3              1
3           Combo      Not Mono              2
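Because the position columns mix numbers with labels such as "Not Mono", they are stored as character, which makes later sorting or arithmetic awkward. A minimal sketch in base R, assuming a missing position is better represented as NA, keeps the positions as integers. The helper name `first_position` and its `type` argument are hypothetical.

```r
# Hypothetical helper: first line of therapy in which `drug` appears,
# as mono (the line equals the drug) or combo (the line has the drug and a "+")
first_position <- function(regimen, drug, type = c("mono", "combo")) {
  type <- match.arg(type)
  lines <- trimws(strsplit(regimen, "|", fixed = TRUE)[[1]])
  hit <- if (type == "mono") {
    lines == drug
  } else {
    grepl(drug, lines, fixed = TRUE) & grepl("+", lines, fixed = TRUE)
  }
  # which() returns integer(0) when nothing matches; [1] turns that into NA
  which(hit)[1]
}

regimens <- c("Drug 1 | Drug 2",
              "Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1",
              "Drug 5 | Drug 1 + Drug 2")

mono_pos <- vapply(regimens, first_position, integer(1),
                   drug = "Drug 1", type = "mono", USE.NAMES = FALSE)
combo_pos <- vapply(regimens, first_position, integer(1),
                    drug = "Drug 1", type = "combo", USE.NAMES = FALSE)
```

Using `vapply()` with `integer(1)` also guards against the silent type change that `sapply()` allows.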
Step 3: Creating the final data frame
final.df <- interim.df %>%
  select(Patient, Line_of_Therapy, Mono_Indicator, Mono_Position,
         Combo_Indicator, Combo_Position)
final.df
  Patient                            Line_of_Therapy Mono_Indicator
1       1                            Drug 1 | Drug 2           Mono
2       2 Drug 1 + Drug 3 | Drug 4 + Drug 5 | Drug 1           Mono
3       3                   Drug 5 | Drug 1 + Drug 2       Not Mono
  Mono_Position Combo_Indicator Combo_Position
1             1       Not Combo      Not Combo
2             3           Combo              1
3      Not Mono           Combo              2
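Positions become useful once they are related to the neighbouring lines. As a minimal sketch of the switch analysis mentioned in the introduction, assuming a switch simply means moving from one line of therapy to the next, the following base R code reports what a patient received immediately before and after the first line containing a drug. The helper name `switch_context` is hypothetical.

```r
# Hypothetical helper: lines of therapy just before and after the first
# line that contains `drug` (NA when there is no such neighbouring line)
switch_context <- function(regimen, drug) {
  lines <- trimws(strsplit(regimen, "|", fixed = TRUE)[[1]])
  # Substring match: this would also match longer names such as "Drug 10"
  pos <- which(grepl(drug, lines, fixed = TRUE))[1]
  if (is.na(pos)) {
    return(c(switched_from = NA_character_, switched_to = NA_character_))
  }
  c(switched_from = if (pos > 1) lines[pos - 1] else NA_character_,
    switched_to   = if (pos < length(lines)) lines[pos + 1] else NA_character_)
}

ctx_p3 <- switch_context("Drug 5 | Drug 1 + Drug 2", "Drug 1")  # patient 3
ctx_p1 <- switch_context("Drug 1 | Drug 2", "Drug 1")           # patient 1
```

For patient 3, Drug 1 first appears in the second line, so the patient switched to it from Drug 5; for patient 1, Drug 1 is the first line and the patient then switched to Drug 2.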
Parting Comments
In this blog we looked at a very simple example of how we can use the dplyr and stringr libraries to check for the presence of a string in a data frame column and to find the position at which it occurs.