Machine Learning Made Easy: Blog 37: Parsing Pseudo Code using R

Parsing Pseudo Code

Introduction

In this blog, we will look at how to parse simple text in R. The tidytext library is very useful here because it breaks text into a tidy format with one token per row. We can then extract various entities from the text by matching the tokens against custom dictionaries. We will write simple pseudo code in plain English and try to execute it in R using text parsing and entity extraction.

Installing libraries

Let's install (if needed) and load the tidytext library along with a few other packages.

package.name <- c("tidytext", "textstem", "dplyr", "tidyr", "stringr")

# Install any package that is missing, then attach it
for (i in package.name) {
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
  }
  library(i, character.only = TRUE)
}

Step 1: Creating Pseudo Code to Calculate the Mean

The dataset used in this blog is 'mtcars'. We will try to calculate the mean of the hp column. We will first store the pseudo code in a character vector and then place it in a data frame.

# mtcars dataset
df <- mtcars

pscode <- "Mean of 'hp'"

text_df <- data.frame(line = seq_along(pscode), text = pscode, stringsAsFactors = FALSE)
text_df

  line         text
1    1 Mean of 'hp'
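The same data frame layout works for several pseudo code statements at once, one per row, with the line column recording which statement each token will later come from. The statements below are hypothetical examples; the rest of this blog continues with the single statement defined above.

```r
# Several hypothetical pseudo code statements, one per row;
# 'line' identifies the statement each row holds
pscode.many <- c("Mean of 'hp'", "Sum of 'mpg'", "Average of 'wt'")

text_df.many <- data.frame(line = seq_along(pscode.many),
                           text = pscode.many,
                           stringsAsFactors = FALSE)
text_df.many
```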

Step 2: Initialising Dictionaries

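Here we will set up the dictionaries used to identify key entities in the pseudo code: the column names of the dataset and the mathematical functions (actions) that can be requested, each mapped to the R function that implements it.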

# Identifying columns
column.identifiers<-colnames(mtcars)

# Identifying mathematical functions
action.identifiers<-data.frame(word=c("mean","average","sum","summation","total"),
                               WithinR=c("mean","mean","sum","sum","sum"),
                               stringsAsFactors = F)

action.identifiers

       word WithinR
1      mean    mean
2   average    mean
3       sum     sum
4 summation     sum
5     total     sum
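Because the action dictionary is an ordinary data frame, supporting a new action is a matter of adding rows. The synonyms below (median, maximum, minimum and their short forms) are hypothetical additions, each mapped to an existing base R function; they are not part of the original dictionary.

```r
action.identifiers <- data.frame(word = c("mean", "average", "sum", "summation", "total"),
                                 WithinR = c("mean", "mean", "sum", "sum", "sum"),
                                 stringsAsFactors = FALSE)

# Hypothetical extra synonyms mapped to base R functions
extra.actions <- data.frame(word = c("median", "maximum", "max", "minimum", "min"),
                            WithinR = c("median", "max", "max", "min", "min"),
                            stringsAsFactors = FALSE)

# Append the new rows; each word should still appear only once
action.identifiers <- rbind(action.identifiers, extra.actions)
nrow(action.identifiers)
# [1] 10
```

Keeping every word unique matters: a word listed twice would make the join in Step 4 return more than one function.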

Step 3: Breaking the Pseudo Code into Chunks

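We will now break the pseudo code into individual words (tokens) with unnest_tokens() and arrange them in a single column. By default, unnest_tokens() also converts the words to lowercase and strips punctuation, which is why the quotes around 'hp' disappear.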

token.df<-text_df %>%
  unnest_tokens(word, text)

row.names(token.df)<-NULL
head(token.df)

  line word
1    1 mean
2    1   of
3    1   hp

Step 4: Extracting the Action Entity

Using the snippet of code below, we will match the individual tokens of the pseudo code against the action dictionary to find the maths function to apply.

# Extract action to be performed
extract.action <- token.df %>%
  left_join(action.identifiers, by = "word") %>%
  filter(!is.na(WithinR)) %>%
  pull(WithinR)

extract.action

[1] "mean"
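The dictionary is what lets different wordings resolve to the same R function. In the sketch below, the tokens of a hypothetical statement "Total of 'mpg'" are typed in directly, and an inner_join() replaces the left_join() plus filter(!is.na()) pair, since both keep only the tokens found in the dictionary.

```r
library(dplyr)

action.identifiers <- data.frame(word = c("mean", "average", "sum", "summation", "total"),
                                 WithinR = c("mean", "mean", "sum", "sum", "sum"),
                                 stringsAsFactors = FALSE)

# Tokens as unnest_tokens() would produce them for "Total of 'mpg'"
token.df.total <- data.frame(line = 1, word = c("total", "of", "mpg"),
                             stringsAsFactors = FALSE)

# inner_join() keeps only tokens that appear in the dictionary
extract.action.total <- token.df.total %>%
  inner_join(action.identifiers, by = "word") %>%
  pull(WithinR)

extract.action.total
# [1] "sum"
```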

Step 5: Extracting the Column Identifier

Next, we identify the column on which the mathematical function will be applied by keeping only the tokens that match a column name.

extract.column <- token.df %>%
  filter(word %in% column.identifiers) %>%
  pull(word)

extract.column

[1] "hp"
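If a statement mentions more than one column, this filter returns all of them in the order they appear, and the single-column expression built in the next step would no longer be valid. A minimal guard, assuming we simply want to report the problem, is sketched below with a hypothetical two-column statement.

```r
library(dplyr)

column.identifiers <- colnames(mtcars)

# Tokens for a hypothetical statement that mentions two columns
token.df.two <- data.frame(line = 1, word = c("mean", "of", "hp", "and", "mpg"),
                           stringsAsFactors = FALSE)

extract.column.two <- token.df.two %>%
  filter(word %in% column.identifiers) %>%
  pull(word)
# extract.column.two is c("hp", "mpg")

# Guard: the expression in Step 6 expects exactly one column
if (length(extract.column.two) != 1) {
  message("Expected one column, found: ", paste(extract.column.two, collapse = ", "))
}
```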

Step 6: Evaluating the Pseudo Code

We will combine extract.action and extract.column into the string "mean(df[['hp']])" using the paste0 function, convert that string into an R expression with parse, and then evaluate the expression using the eval function.

output.value<-eval(parse(text=paste0(extract.action,"(df[['",extract.column,"']])")))
output.value

[1] 146.6875
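Building a code string and calling eval(parse()) works, but it will run whatever text ends up in the string. An alternative worth considering is to look the function up by name with base R's match.fun() and call it directly, after checking the name against an allowed list. The allowed list below is an assumption that mirrors the functions in the action dictionary.

```r
df <- mtcars
extract.action <- "mean"
extract.column <- "hp"

# Only allow functions that appear in the action dictionary (assumed list)
allowed.actions <- c("mean", "sum")
stopifnot(extract.action %in% allowed.actions)

# match.fun() returns the function itself, so no code string is parsed
output.value <- match.fun(extract.action)(df[[extract.column]])
output.value
# [1] 146.6875
```

do.call(extract.action, list(df[[extract.column]])) would give the same result; either way the column name is used only as an index into df, never as code.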

Final Comments

In this blog we saw a simple example of how pseudo code can be parsed using the tidytext and dplyr libraries and evaluated with the help of custom dictionaries. The same logic can be extended to more complex pseudo code by adding entries to the dictionaries and further extraction rules.
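As one possible extension, the six steps can be wrapped into a single function that evaluates several statements at once. The function name evaluate_pseudo_code() is hypothetical, and returning NA when a statement does not resolve to exactly one action and one column is a design choice of this sketch, not part of the original approach.

```r
library(dplyr)
library(tidytext)

# Action dictionary, as in Step 2
action.identifiers <- data.frame(word = c("mean", "average", "sum", "summation", "total"),
                                 WithinR = c("mean", "mean", "sum", "sum", "sum"),
                                 stringsAsFactors = FALSE)

# Hypothetical helper: evaluates each statement in 'pscode' against 'data';
# returns NA for a statement without exactly one action and one column
evaluate_pseudo_code <- function(pscode, data, actions) {
  text_df <- data.frame(line = seq_along(pscode), text = pscode,
                        stringsAsFactors = FALSE)
  token.df <- text_df %>% unnest_tokens(word, text)

  vapply(seq_along(pscode), function(i) {
    words  <- token.df$word[token.df$line == i]
    action <- unique(actions$WithinR[actions$word %in% words])
    column <- intersect(words, colnames(data))
    if (length(action) != 1 || length(column) != 1) return(NA_real_)
    as.numeric(match.fun(action)(data[[column]]))
  }, numeric(1))
}

result <- evaluate_pseudo_code(c("Mean of 'hp'", "Sum of 'mpg'", "Average of 'wt'"),
                               mtcars, action.identifiers)
result
```

A statement such as "Median of 'hp'" returns NA here because median is not in the dictionary; adding a row for it would be enough to support it.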

Link to Previous R Blogs

https://www.aimlmadeeasy.com/2020/06/r-complete-guide.html

List of Datasets for Practice

https://hofmann.public.iastate.edu/data_in_r_sortable.html

https://vincentarelbundock.github.io/Rdatasets/datasets.html

Monday, September 21, 2020
