Web Scraping Tutorial 3: Scrolling Down and Expanding Reviews by Pressing More
2024-05-23
Introduction
In this tutorial, we will look at how we can use RSelenium to:
- Scroll down the review page
- Expand reviews by pressing More
We will also extract the review text along with the time stamp and the rating given.
Step 0: Installing Libraries
# Install any missing packages, then load them
package.name <- c("tidyverse", "RSelenium")
for (i in package.name) {
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
  }
  library(i, character.only = TRUE)
}
Loading required package: tidyverse
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.0 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Loading required package: RSelenium
Step 1: Start a headless Firefox browser
The syntax for starting a headless Firefox browser is shown below.
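driver <- rsDriver(
  browser = "firefox",
  chromever = NULL,
  verbose = FALSE,
  # geckodriver reads Firefox options from the "moz:firefoxOptions" capability
  extraCapabilities = list("moz:firefoxOptions" = list(args = list("--headless")))
)
# The client object is the remote driver used to control the browser
web_driver <- driver[["client"]]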
Once this is executed, a Firefox session starts in the background.
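Because rsDriver starts both a Selenium server and a browser client, it is worth shutting both down when the scraping session is over. The helper below is a minimal sketch of that cleanup; the name stop_session is illustrative and not part of the original tutorial.

```r
# Hypothetical helper: close the browser and stop the Selenium server.
# `driver` is assumed to be the list returned by rsDriver().
stop_session <- function(driver) {
  driver[["client"]]$close()  # close the browser session
  driver[["server"]]$stop()   # stop the Selenium server process
  invisible(TRUE)
}

# Usage at the end of the session (needs a live driver):
# stop_session(driver)
```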
Step 3: Getting the URL for each store
We can see that there are 4 stores here. We will first get the information for one of the stores to understand the process in detail. Once we are familiar with the process, we can replicate it for the other stores.
For this, we will follow a two-step process:
- Get the URL for each store
- Access each URL and then get the name and address
Get the XPath of each store link by inspecting the page. The XPaths for the first three stores look like this:
- /html/body/div[2]/div[3]/div[8]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[1]/div[1]/div[3]/div/a
- /html/body/div[2]/div[3]/div[8]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[1]/div[1]/div[5]/div/a
- /html/body/div[2]/div[3]/div[8]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[1]/div[1]/div[7]/div/a
The three paths differ only in the penultimate div element: it is div[3] for the first store, div[5] for the second, and div[7] for the third.
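To make the pattern concrete, here is a small sketch that rebuilds the three paths from a shared prefix and the changing index. The names xpath_prefix, store_index, store_xpaths, and generic_xpath are illustrative and do not appear in the original code.

```r
# Shared part of the path, copied from the three paths listed above
xpath_prefix <- "/html/body/div[2]/div[3]/div[8]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[1]/div[1]"

# Only the index of the penultimate div changes between stores
store_index <- c(3L, 5L, 7L)
store_xpaths <- sprintf("%s/div[%d]/div/a", xpath_prefix, store_index)

# Without an index, the div step matches any div at that position
generic_xpath <- paste0(xpath_prefix, "/div/div/a")

print(store_xpaths)
print(generic_xpath)
```

Note that the index-free path can also match divs other than 3, 5, and 7 if they happen to contain a link at the same depth, so it is worth checking the number of elements returned.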
Now we will use this information to extract all the links. For each of these XPaths, we need to get the href (URL).
The penultimate div, which is the only difference between the store XPaths, is written as a plain div (instead of div[3], div[5], or div[7]) so that it matches every store. For each matched element, we then extract the href using getElementAttribute("href").
# Find every store link; the index-free div matches div[3], div[5], div[7], ...
link_Store <- web_driver$findElements(using = "xpath", value = "/html/body/div[2]/div[3]/div[8]/div[9]/div/div/div[1]/div[2]/div/div[1]/div/div/div[1]/div[1]/div/div/a")
# Collect the href of each link in a list
l1 <- list()
for (i in seq_along(link_Store)) {
  l1[[i]] <- link_Store[[i]]$getElementAttribute("href")[[1]]
}
l1
[[1]]
[1] "https://www.google.co.id/maps/place/Anmol+Patanjali+Store/data=!4m7!3m6!1s0x3be7c9e659d3c0e7:0xcd3cb45a143317ea!8m2!3d19.1202659!4d72.8897626!16s%2Fg%2F11p5jw_gg8!19sChIJ58DTWebJ5zsR6hczFFq0PM0?authuser=0&hl=en&rclk=1"
[[2]]
[1] "https://www.google.co.id/maps/place/Patanjali+Chikitsalay+Powai/data=!4m7!3m6!1s0x3be7c7f214bc8cdf:0x445cf1e34b310805!8m2!3d19.1259382!4d72.9193655!16s%2Fg%2F1pwfbqzfr!19sChIJ34y8FPLH5zsRBQgxS-PxXEQ?authuser=0&hl=en&rclk=1"
[[3]]
[1] "https://www.google.co.id/maps/place/Patanjali+Powai/data=!4m7!3m6!1s0x3be7c7e3010cb5d5:0xa06d2c38a41eb003!8m2!3d19.1186185!4d72.9039256!16s%2Fg%2F11f_j2zbxn!19sChIJ1bUMAePH5zsRA7AepDgsbaA?authuser=0&hl=en&rclk=1"
[[4]]
[1] "https://www.google.co.id/maps/place/Patanjali+Chikitsalay/data=!4m7!3m6!1s0x3be7c7a9b2ec66a7:0xfcf4c5119bd05d3b!8m2!3d19.118694!4d72.903835!16s%2Fg%2F11ssjv5bc1!19sChIJp2bssqnH5zsRO13QmxHF9Pw?authuser=0&hl=en&rclk=1"
We can see that the URLs for the four stores are now stored in the list l1.
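Since l1 is a list, it can be convenient to flatten it into a character vector before looping over the stores. The sketch below also pulls a readable store name out of each URL. It assumes the name always sits between /maps/place/ and the next slash, which holds for the URLs above but is not something Google Maps guarantees. The names store_urls and store_names are illustrative.

```r
# Two of the URLs from the output above, stored the same way as in l1
l1 <- list(
  "https://www.google.co.id/maps/place/Anmol+Patanjali+Store/data=!4m7!3m6!1s0x3be7c9e659d3c0e7:0xcd3cb45a143317ea!8m2!3d19.1202659!4d72.8897626!16s%2Fg%2F11p5jw_gg8!19sChIJ58DTWebJ5zsR6hczFFq0PM0?authuser=0&hl=en&rclk=1",
  "https://www.google.co.id/maps/place/Patanjali+Powai/data=!4m7!3m6!1s0x3be7c7e3010cb5d5:0xa06d2c38a41eb003!8m2!3d19.1186185!4d72.9039256!16s%2Fg%2F11f_j2zbxn!19sChIJ1bUMAePH5zsRA7AepDgsbaA?authuser=0&hl=en&rclk=1"
)

# Flatten the list into a plain character vector
store_urls <- unlist(l1)

# Keep only the path segment after /maps/place/ (assumed to be the store name)
store_names <- sub(".*/maps/place/([^/]+)/.*", "\\1", store_urls)

# The URL encodes spaces as "+"
store_names <- gsub("+", " ", store_names, fixed = TRUE)

print(store_names)
```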
Consolidating everything
# Add the values in k[["Stars"]] as a new Star_Rating column
review.df2 <- review.df2 %>%
  mutate(Star_Rating = k[["Stars"]])
head(review.df2)
Text
1 \n\n\n\n\na year ago\nWent there at this place yesterday at 9.20pm\nAs mentioned open till 10pm (11/3/23)\n\nThe two lady illiterate staff was soo rude in behavior, (very cheap)\nThey were making faces while we entered inside! ( owner should look out in his video camera)\nAnd after we collected the required product\n(Aloevera juice, amla candy, sapt guggal, etc)\nthey started with ,that no patanjali samriddhi card will be swiped,\nNo bank card will be swiped! (So no discount)\nTotal fraud we are getting after travelling long distance!\nSaid to pay on Google pay!\nAccount is closed for today!\nCome tomorrow for swiping samriddhi card!\nVery disappointed!\n\nNo care for customer!\n\n2\n\nShare
2 \n\n\n\n\n2 years ago\nI wish with no star we could write the review. First time in Patanjali store a pathetic experience. Very rude behaviour. No cooperation at all. Only one suggestion who's reading the review before you pay check the product is in good condition, as they have a very bad attitude on exchanging their own bad products.\nReally very very bad experience.\n\n4\n\nShare
3 \n\n\n\n\n3 years ago\nVery nice store.. you can get here all patanjali products specially all Ayurvedic medicines, patanjali also sells grocery items like all spices, pulses, whole Fran's, grains, oils. This store also sells milk and milk items. Nice staff and owner. Nice service and good maintenance.. Patanjali products are fine.\n\n1\n\nShare
4 \n\n\n\n\n2 years ago\nWent there to buy two things and both were not available\nCorona tablets and Atta\n\nLike\n\nShare
5 \n\n\n\n\n2 years ago\nGood place but there is no good customer service\n\nLike\n\nShare
6 \n\n\n\n\n5 years ago\nGood price and also have an doctor for consulting.\n\n1\n\nShare
Review
1 Went there at this place yesterday at 9.20pmAs mentioned open till 10pm (11/3/23)The two lady illiterate staff was soo rude in behavior, (very cheap)They were making faces while we entered inside! ( owner should look out in his video camera)And after we collected the required product(Aloevera juice, amla candy, sapt guggal, etc)they started with ,that no patanjali samriddhi card will be swiped,No bank card will be swiped! (So no discount)Total fraud we are getting after travelling long distance!Said to pay on Google pay!Account is closed for today!Come tomorrow for swiping samriddhi card!Very disappointed!No care for customer!2
2 I wish with no star we could write the review. First time in Patanjali store a pathetic experience. Very rude behaviour. No cooperation at all. Only one suggestion who's reading the review before you pay check the product is in good condition, as they have a very bad attitude on exchanging their own bad products.Really very very bad experience.4
3 Very nice store.. you can get here all patanjali products specially all Ayurvedic medicines, patanjali also sells grocery items like all spices, pulses, whole Fran's, grains, oils. This store also sells milk and milk items. Nice staff and owner. Nice service and good maintenance.. Patanjali products are fine.1
4 Went there to buy two things and both were not availableCorona tablets and Atta
5 Good place but there is no good customer service
6 Good price and also have an doctor for consulting.1
Time_Stamp Star_Rating
1 a year 1 star
2 2 years 1 star
3 3 years 5 stars
4 2 years 1 star
5 2 years 3 stars
6 5 years 4 stars
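In the output above, some Review values still end with the like count (the trailing 2, 4, and 1), and line breaks inside a review have been dropped. As one possible refinement, the sketch below parses the raw Text value with base R regular expressions. It assumes every entry follows the layout seen in the output: leading newlines, a line ending in "ago", the review body, then either a like count or the word Like, then Share. The names raw_text and parse_review are hypothetical and not part of the original code.

```r
# Two raw entries copied from the Text column above
raw_text <- c(
  "\n\n\n\n\n2 years ago\nGood place but there is no good customer service\n\nLike\n\nShare",
  "\n\n\n\n\n5 years ago\nGood price and also have an doctor for consulting.\n\n1\n\nShare"
)

# Hypothetical helper: split a raw entry into time stamp and review body.
# Entries that do not match the assumed layout are returned unchanged.
parse_review <- function(text) {
  # (?s) lets "." match newlines so one pattern can span the whole entry
  pattern <- "(?s)^\\s*(.*?) ago\\n(.*?)\\n\\n(?:Like|\\d+)\\n\\nShare\\s*$"
  data.frame(
    Time_Stamp = sub(pattern, "\\1", text, perl = TRUE),
    Review = sub(pattern, "\\2", text, perl = TRUE),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_review(raw_text)
print(parsed)
```

Because the like count is matched as part of the trailer rather than the body, it no longer ends up glued to the review text.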