Monday, September 21, 2020

Blog 37: Parsing Pseudo Code using R

Parsing Pseudo Code


Introduction

In this blog post, we will look at how to parse simple text in R. The tidytext library is very rich in the sense that it can break text into tidy formats, with one token per row. We can then extract various entities from the text based on specific dictionaries. We will write a simple piece of pseudo code and try to execute it in R using text parsing and entity extraction.


Installing libraries

Let's install the tidytext library, along with the other packages loaded below, if they are not already available.

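# Packages used in this post
package.name<-c("tidytext","textstem","dplyr","tidyr","stringr")

for(i in package.name){
  # require() returns FALSE when the package is not installed
  if(!require(i,character.only = TRUE)){
    install.packages(i)
  }
  # Load the package (needed after a fresh install)
  library(i,character.only = TRUE)
}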


Step 1: Creating pseudo code to calculate the mean

The dataset used in this blog is 'mtcars'. We will try to calculate the mean of the hp column. We will first store the pseudo code in a character vector and then put it in a data frame, with one row per line of pseudo code.

# mtcars dataset
df<-mtcars

# Pseudo code to be executed
pscode<-"Mean of 'hp'"

# One row per line of pseudo code
text_df<-data.frame(line=seq_along(pscode),text=pscode,stringsAsFactors = FALSE)
text_df
  line         text
1    1 Mean of 'hp'


Step 2: Initialising dictionaries

Here we will set up two dictionaries that will be used to identify the key entities in the pseudo code: the column names of mtcars, and a table mapping words to the R functions they stand for.

# Identifying columns
column.identifiers<-colnames(mtcars)

# Identifying mathematical functions: each word maps to an R function name
action.identifiers<-data.frame(word=c("mean","average","sum","summation","total"),
                               WithinR=c("mean","mean","sum","sum","sum"),
                               stringsAsFactors = FALSE)

action.identifiers
       word WithinR
1      mean    mean
2   average    mean
3       sum     sum
4 summation     sum
5     total     sum
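Because the action dictionary is just a lookup table, supporting more functions is a matter of adding rows. The sketch below is only an illustration: the extra words (median, maximum, max, minimum, min) are our own additions, not part of the original dictionary, and it uses base R's rbind() and match() rather than a dplyr join.

```r
# Base dictionary from Step 2
action.identifiers <- data.frame(
  word    = c("mean", "average", "sum", "summation", "total"),
  WithinR = c("mean", "mean", "sum", "sum", "sum"),
  stringsAsFactors = FALSE
)

# Hypothetical extra entries (not part of the original post)
extra.actions <- data.frame(
  word    = c("median", "maximum", "max", "minimum", "min"),
  WithinR = c("median", "max", "max", "min", "min"),
  stringsAsFactors = FALSE
)
action.identifiers <- rbind(action.identifiers, extra.actions)

# Sanity check: every mapped name must resolve to an existing R function
stopifnot(all(vapply(unique(action.identifiers$WithinR),
                     exists, logical(1), mode = "function")))

# Look up the R function for a few words; words not in the dictionary give NA
words <- c("average", "maximum", "of")
lookup <- action.identifiers$WithinR[match(words, action.identifiers$word)]
lookup
# returns c("mean", "max", NA)
```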


Step 3: Breaking pseudo code into tokens

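We will now break the pseudo code into individual words (tokens) using unnest_tokens() and arrange them in a single column. By default, unnest_tokens() converts the words to lower case and strips punctuation, which is why 'Mean' becomes mean and the quotes around 'hp' disappear.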

token.df<-text_df %>%
  unnest_tokens(word, text)

row.names(token.df)<-NULL
head(token.df)
  line word
1    1 mean
2    1   of
3    1   hp
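unnest_tokens() does the work in this step. To make the transformation concrete, here is a rough base-R approximation of what its default word tokenizer does to our input. This is our own simplification, not how tidytext is implemented: the real tokenizer handles Unicode word boundaries, apostrophes and other edge cases that this regex ignores.

```r
pscode <- "Mean of 'hp'"

# Lower-case the text, turn anything that is not a letter, digit or space
# into a space, then split on runs of spaces
cleaned <- gsub("[^a-z0-9 ]", " ", tolower(pscode))
tokens <- strsplit(cleaned, " +")[[1]]
tokens <- tokens[tokens != ""]   # drop empty strings left by leading spaces

# Same two columns as token.df: one token per row
token.df <- data.frame(line = 1, word = tokens, stringsAsFactors = FALSE)
token.df$word
# returns c("mean", "of", "hp")
```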


Step 4: Extracting the action entity

Using the snippet below, we match the individual tokens of the pseudo code against the dictionary of maths functions and keep the R function name for any token that matches.

# Extract action to be performed
extract.action<-token.df%>%
  left_join(action.identifiers,by="word")%>%
  filter(!is.na(WithinR))%>%
  select(WithinR)%>%
  pull(WithinR)

extract.action
[1] "mean"


Step 5: Extracting the column identifier

Next, we identify the column on which the mathematical function will be applied, by keeping the tokens that match a column name of mtcars.

extract.column<-token.df%>%
  filter(word %in% column.identifiers)%>%
  select(word)%>%
  pull(word)

extract.column
[1] "hp"
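Both extraction steps assume exactly one match. If the pseudo code contains no recognised action word, extract.action is character(0), and paste0() in the next step treats that empty vector as an empty string, so the expression becomes (df[['hp']]) and silently returns the whole column. A small guard can catch this before evaluation. The helper require_one() below is hypothetical, written only for this illustration.

```r
# Hypothetical helper: return the single distinct match, or stop with a clear error
require_one <- function(x, what) {
  x <- unique(x)
  if (length(x) != 1) {
    stop(sprintf("Expected exactly one %s, found %d", what, length(x)), call. = FALSE)
  }
  x
}

# Duplicates such as "mean average" collapse to a single action
extract.action <- require_one(c("mean", "mean"), "action")
extract.column <- require_one("hp", "column")

# No recognised action word: an error instead of a silently wrong result
msg <- tryCatch(require_one(character(0), "action"),
                error = function(e) conditionMessage(e))
msg
# returns "Expected exactly one action, found 0"
```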


Step 6: Evaluating the pseudo code

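We will combine extract.action and extract.column into the string "mean(df[['hp']])" using the paste0 function, convert that string into an R expression with the parse function, and then evaluate the expression using the eval function.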

output.value<-eval(parse(text=paste0(extract.action,"(df[['",extract.column,"']])")))
output.value
[1] 146.6875
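eval(parse()) works, but it will execute whatever string it is given. Since the dictionary already limits the functions to known names, one way to avoid building code as text is to look the function up with base R's match.fun() and call it directly. This is a sketch under that assumption; the allowed vector is our own addition, mirroring the WithinR values from Step 2.

```r
df <- mtcars
extract.action <- "mean"
extract.column <- "hp"

# Only accept function names from the dictionary and real column names
allowed <- c("mean", "sum")
stopifnot(extract.action %in% allowed, extract.column %in% colnames(df))

# Look up the function object and call it on the column: no string evaluation
fn <- match.fun(extract.action)
output.value <- fn(df[[extract.column]])
output.value
# returns 146.6875, the same value as the eval(parse()) version
```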


Final Comments

In this blog we saw a simple example of how pseudo code can be parsed using the tidytext and dplyr libraries and evaluated with the help of custom dictionaries. More complex pseudo code can be handled with the same logic by extending the dictionaries.
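As a rough illustration of extending the same logic, the steps can be wrapped in a function and applied to several lines of pseudo code. This is a sketch, not the post's code: run_pseudocode() is a hypothetical name, and to stay dependency-free it uses a simplified base-R tokenizer (lower-casing, stripping punctuation, splitting on spaces) instead of unnest_tokens(), and match.fun() instead of eval(parse()). Lines that do not resolve to exactly one action and one column return NA.

```r
# Dictionaries from Step 2
column.identifiers <- colnames(mtcars)
action.identifiers <- data.frame(
  word    = c("mean", "average", "sum", "summation", "total"),
  WithinR = c("mean", "mean", "sum", "sum", "sum"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: tokenise one line, look up action and column, apply
run_pseudocode <- function(line, data = mtcars) {
  tokens <- strsplit(gsub("[^a-z0-9 ]", " ", tolower(line)), " +")[[1]]
  tokens <- tokens[tokens != ""]
  action <- unique(action.identifiers$WithinR[action.identifiers$word %in% tokens])
  column <- unique(tokens[tokens %in% column.identifiers])
  if (length(action) != 1 || length(column) != 1) return(NA_real_)
  match.fun(action)(data[[column]])
}

# "median" is not in the dictionary, so the last line returns NA
pscode <- c("Mean of 'hp'", "Total of 'wt'", "Average of 'mpg'", "Median of 'hp'")
results <- vapply(pscode, run_pseudocode, numeric(1), USE.NAMES = FALSE)
results
# returns approximately c(146.6875, 102.952, 20.090625, NA)
```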

