Saturday, October 26, 2024

Web Scraping Tutorial 4 - Getting the busy information from the Popular Times section of Google Maps

Popular Times

In this post we will scrape the busy-level text (for example, "36% busy at 10 am.") from the Popular Times section of a place's page in Google Maps.

Step 0: Importing the libraries

library(RSelenium)  # rsDriver() and the remote driver client
library(stringr)    # str_c() for building the search URL


Step 1: Start a headless Firefox browser

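# Start a Selenium server and a headless Firefox session
driver <- rsDriver(
    browser = "firefox",
    chromever = NULL,   # no Chrome driver needed, since we use Firefox
    verbose = FALSE,
    extraCapabilities = list("moz:firefoxOptions" = list(args = list("--headless")))
)
web_driver <- driver[["client"]]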

# This search URL lists the Cedele restaurants
nm <- "cedele restaurant"
ad_url <- str_c("https://www.google.co.id/maps/search/", URLencode(nm))

web_driver$navigate(ad_url)


The page looks like the image below.

Extracting the graph for Wednesday

The graph runs from 6 AM to 11 PM. Some hours have no values (here, roughly 6 AM to 9:30 AM and after 9 PM), but we still extract whatever is present in each element.

# XPath for the 6 AM bar
xml_6AM<-"/html/body/div[2]/div[3]/div[8]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[14]/div[3]/div[4]/div/div[1]"

# XPath for the 7 AM bar
xml_7AM<-"/html/body/div[2]/div[3]/div[8]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[14]/div[3]/div[4]/div/div[2]"

# XPath for the 8 AM bar
xml_8AM<-"/html/body/div[2]/div[3]/div[8]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[14]/div[3]/div[4]/div/div[3]"

# The hourly XPaths differ only in the final index,
# so dropping that index gives a common XPath that matches every bar
nm_common<-"/html/body/div[2]/div[3]/div[8]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[14]/div[3]/div[4]/div/div"
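
Because the hourly XPaths differ only in the trailing index, the per-hour paths can be generated instead of typed out one by one. The following is a minimal sketch, assuming the graph has 18 bars (6 AM to 11 PM). The shortened `base_xpath` and the name `hour_xpaths` are hypothetical stand-ins; in practice the base would be the full common XPath.

```r
# Hypothetical shortened base path, used here only to keep the example readable
base_xpath <- "/html/body/div[2]/div/div"

# 6 AM to 11 PM is 18 hourly bars (an assumption based on the graph described above)
n_hours <- 18

# Append the positional index [1], [2], ... to address each bar individually
hour_xpaths <- paste0(base_xpath, "[", seq_len(n_hours), "]")

# Show the first three generated paths
print(head(hour_xpaths, 3))
```

Absolute XPaths like these tend to break whenever the page layout changes, so it is worth re-checking them in the browser's inspector before each run.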


Extracting the individual components

timing_xml <- web_driver$findElements(using = "xpath", value = nm_common)
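
Google Maps builds its pages with JavaScript, so the bars may not exist yet when `findElements` is called right after navigation, in which case it returns an empty list. One possible workaround is to poll until something is found. The sketch below is not part of RSelenium: `wait_for_elements`, `timeout`, and `interval` are hypothetical names, and the demo uses a stub finder in place of a real browser call.

```r
# Call find_fn repeatedly until it returns a non-empty list or the timeout expires.
# find_fn is any zero-argument function, e.g.
#   function() web_driver$findElements(using = "xpath", value = nm_common)
wait_for_elements <- function(find_fn, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat an error from the finder the same as "nothing found yet"
    found <- tryCatch(find_fn(), error = function(e) list())
    if (length(found) > 0) return(found)
    if (Sys.time() >= deadline) return(list())
    Sys.sleep(interval)
  }
}

# Demo with a stub that only "finds" an element on the third attempt
attempts <- 0
stub_finder <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) list() else list("bar")
}
demo_result <- wait_for_elements(stub_finder, timeout = 5, interval = 0.01)
```

With a live session, the result of `wait_for_elements` would take the place of the direct `findElements` call, and an empty list would signal that the XPath needs checking.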


# Getting the busy text from each bar's aria-label attribute

ls_wednesday <- list()
for (i in seq_along(timing_xml)) {

  # Getting the busy details; record NA if the attribute cannot be read
  busy_text <- tryCatch(
    timing_xml[[i]]$getElementAttribute("aria-label")[[1]],
    error = function(e) NA_character_
  )
  print(busy_text)
  ls_wednesday[[i]] <- busy_text
}
## [1] "0% busy at 6 am."
## [1] "0% busy at 7 am."
## [1] "0% busy at 8 am."
## [1] "0% busy at 9 am."
## [1] "36% busy at 10 am."
## [1] "55% busy at 11 am."
## [1] "70% busy at 12 pm."
## [1] "55% busy at 1 pm."
## [1] "33% busy at 2 pm."
## [1] "12% busy at 3 pm."
## [1] "9% busy at 4 pm."
## [1] "16% busy at 5 pm."
## [1] "35% busy at 6 pm."
## [1] "57% busy at 7 pm."
## [1] "54% busy at 8 pm."
## [1] "0% busy at 9 pm."
## [1] "0% busy at 10 pm."
## [1] "0% busy at 11 pm."
wednesday_timing <- as.character(ls_wednesday)


wednesday_df <- data.frame(Day = "Wednesday",
                           Busy_Details = wednesday_timing)

wednesday_df
##          Day       Busy_Details
## 1  Wednesday   0% busy at 6 am.
## 2  Wednesday   0% busy at 7 am.
## 3  Wednesday   0% busy at 8 am.
## 4  Wednesday   0% busy at 9 am.
## 5  Wednesday 36% busy at 10 am.
## 6  Wednesday 55% busy at 11 am.
## 7  Wednesday 70% busy at 12 pm.
## 8  Wednesday  55% busy at 1 pm.
## 9  Wednesday  33% busy at 2 pm.
## 10 Wednesday  12% busy at 3 pm.
## 11 Wednesday   9% busy at 4 pm.
## 12 Wednesday  16% busy at 5 pm.
## 13 Wednesday  35% busy at 6 pm.
## 14 Wednesday  57% busy at 7 pm.
## 15 Wednesday  54% busy at 8 pm.
## 16 Wednesday   0% busy at 9 pm.
## 17 Wednesday  0% busy at 10 pm.
## 18 Wednesday  0% busy at 11 pm.
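
The `Busy_Details` strings are easy to read but awkward to plot or aggregate. One possible follow-up, sketched below on three strings copied from the output above, splits each string into a numeric percentage and an hour label. The regular expressions and the names `parsed`, `Hour`, and `Busy_Pct` are assumptions for illustration and would need adjusting if Google changes the wording of the labels.

```r
# Three sample strings taken from the scraped output above
busy_text <- c("0% busy at 6 am.", "36% busy at 10 am.", "70% busy at 12 pm.")

# Leading digits before the "%" sign become the busy percentage
busy_pct <- as.integer(sub("^([0-9]+)% busy at .*$", "\\1", busy_text))

# The "<hour> am/pm" part, without the trailing full stop, becomes the hour label
hour_label <- sub("^[0-9]+% busy at ([0-9]+ [ap]m)\\.$", "\\1", busy_text)

parsed <- data.frame(Hour = hour_label,
                     Busy_Pct = busy_pct,
                     stringsAsFactors = FALSE)
print(parsed)
```

The same two `sub()` calls could be applied to the full `Busy_Details` column to add numeric columns to the Wednesday data frame.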
