Part 1-Basic
2023-11-13
Introduction
Geo-spatial analysis is process of analyzing information by studying features such as location, attributes and their relationship that helps us uncover certain patterns happening at geographical level. For example, a company might want to look at how different retail stores are spread across a state and is there a possible white space opportunity available.Below are some of the other use cases where geo-spatial analysis can be used.
- Map certain metrics such as footprint, store productivity for a given location to draw a comparison among various stores
- White space opportunities
- Understand store coverage using catchment analysis(using constant time or constant area as catchment metrics)
- Competitor Intelligence: Understand the coverage between yours and competitor’s by answering key questions such as where are they located, what is their coverage, has their been a shift in certain location preferences (CBD Vs Heartland), etc
In this blog we will look at how to perform geo-spatial analysis through a step by step approach. We will discuss several use cases and how we answer certain key business questions.
Step 1:Libraries required in R
We would require libraries such as sf, leaflet, osrm, etc to play around with geo-spatial objects.These libraries also tend to use OpenStreetMap.It is a free, open geographic database updated and maintained by a community of volunteers via open collaboration. Contributors collect data from surveys, trace from aerial imagery and also import from other freely licensed geodata sources.
Lets install the libraries
## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.1.0
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.4 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## Loading required package: sf
## Warning: package 'sf' was built under R version 4.2.3
## Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
## Loading required package: osrm
## Warning: package 'osrm' was built under R version 4.2.3
## Data: (c) OpenStreetMap contributors, ODbL 1.0 - http://www.openstreetmap.org/copyright
## Routing: OSRM - http://project-osrm.org/
## Loading required package: leaflet
## Warning: package 'leaflet' was built under R version 4.2.3
## Loading required package: RColorBrewer
Step 2: Defining the business problem
Before doing any analysis, it is very important to identify and define the business problem.This ensures that the analysis is precise and targeted.
Lets say that McDonald want to identify some location in Indonesia where they can open their outlets.To approach this, we would be using population data and will try to narrow down on places that high population density.This would be our starting point. At the end of the analysis, we should be able to provide names of few cities which can be considered for such expansion.
Indonesia has a total of 38 provinces and then each province has several cities and towns. For our analysis, we would look at analyzing the following:
- Province level data
- Zooming on city level info for ach province
For instance, Bali is a province in Indonesia and it has a total of 8 cities as shown below:
Step 3: Get the shape file for Indonesia Province Level Map
Many BI experts that work in tableau know that to plot a geo-heat map, we need to have a shape file in place. The shape file is then read into tableau which has all the geographical details to plot the map. However, in R, I have experienced that it is very difficult to find a shape file online.In order to circumvent this situation, we leverage geo-json or json files. These files ar easily available on github repo and can be ready into R and converted into shape files.
Lets say we have to plot the population of various provinces in Indonesia.For this, we would require province level geo-json or json objects which can be read into R.
If you just search this on google, you might get the following options
Just open this repo
access the geo-json file
Download the file
Just follow these steps and download the file.The name of the file is “indonesia.geojson” as shown below
I have added the file to my repository. You can download from this link.
Step 4: Read the geo-json file into R
We will now read the file into R
nm_json<-"indonesia.geojson"
indo_prov <- read_sf(nm_json )
indo_prov
## Simple feature collection with 34 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 95.01083 ymin: -11.00972 xmax: 141.0118 ymax: 5.906388
## Geodetic CRS: WGS 84
## # A tibble: 34 × 6
## cartodb_id country id_1 slug state geometry
## <int> <chr> <int> <chr> <chr> <MULTIPOLYGON [°]>
## 1 16 Indonesia 1 indonesia-aceh Aceh (((97.97681 4.627501, 98…
## 2 2 Indonesia 31 indonesia-sumater… Suma… (((99.17167 -1.502501, 9…
## 3 8 Indonesia 34 indonesia-yogyaka… Yogy… (((110.702 -8.185054, 11…
## 4 20 Indonesia 33 indonesia-sumater… Suma… (((98.71384 3.769471, 99…
## 5 7 Indonesia 3 indonesia-bangkab… Bang… (((105.3475 -1.84469, 10…
## 6 5 Indonesia 7 indonesia-papuaba… Papu… (((134.2333 -1.741944, 1…
## 7 4 Indonesia 12 indonesia-jawatim… Jawa… (((113.5921 -7.714859, 1…
## 8 10 Indonesia 13 indonesia-kaliman… Kali… (((108.9246 0.558608, 10…
## 9 3 Indonesia 14 indonesia-kaliman… Kali… (((114.5128 -3.54225, 11…
## 10 6 Indonesia 16 indonesia-kaliman… Kali… (((117.0173 -1.164211, 1…
## # … with 24 more rows
There are couple of features in the data read from geo-json file. Most peculiar feature is the geometry column which contains contains coordinates of each province.If we look at the class of the data read, it comes as sf object
class(indo_prov)
## [1] "sf" "tbl_df" "tbl" "data.frame"
Step 5: Plotting the shape file
leaflet(indo_prov) %>%
addTiles() %>%
addPolygons(fillColor = "blue",
color = "#000000",
weight = .5,
fillOpacity = 0.5
,
popup = ~glue::glue("<b>{state}</b><br>{state}")
)
For the sake of this blog,we will create a column with random values and then add this column to the shape file.We will also add another column “Type” to indicate Low, Medium and High values.
Step 5: Adding a dummy column to the shape file
indo_prov2<-indo_prov%>%
mutate(Random_Value = rnorm(nrow(indo_prov),5,2))
indo_prov2$Type <- ifelse(indo_prov2$Random_Value < quantile(indo_prov2$Random_Value,0.30),"Low",
ifelse(indo_prov2$Random_Value < quantile(indo_prov2$Random_Value,0.80),"Medium","High"))
head(indo_prov2)
## Simple feature collection with 6 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 95.01083 ymin: -8.203611 xmax: 135.2577 ymax: 5.906388
## Geodetic CRS: WGS 84
## # A tibble: 6 × 8
## cartodb_id country id_1 slug state geometry Rando…¹ Type
## <int> <chr> <int> <chr> <chr> <MULTIPOLYGON [°]> <dbl> <chr>
## 1 16 Indonesia 1 indo… Aceh (((97.97681 4.627501, 98… 1.76 Low
## 2 2 Indonesia 31 indo… Suma… (((99.17167 -1.502501, 9… 4.15 Low
## 3 8 Indonesia 34 indo… Yogy… (((110.702 -8.185054, 11… 7.28 High
## 4 20 Indonesia 33 indo… Suma… (((98.71384 3.769471, 99… 4.64 Medi…
## 5 7 Indonesia 3 indo… Bang… (((105.3475 -1.84469, 10… 6.23 Medi…
## 6 5 Indonesia 7 indo… Papu… (((134.2333 -1.741944, 1… 6.92 High
## # … with abbreviated variable name ¹Random_Value
Step 6: Plotting Random Value in the Map
Plotting the map based on Type column with categorical values
We will create three separate sections of addpolygon depending upon Low, Medium and High value in the Type column
leaflet(indo_prov2) %>%
addTiles() %>%
addPolygons(
# fillColor = "blue",
color = "#000000",
weight = .5,
fillOpacity = 0.5
,
popup = ~glue::glue("<b>{state}</b><br>{state}")
)%>%
addPolygons(data=indo_prov2%>%
filter(Type=="Low"),
fillColor = "green",
fill=T,
weight = .5,
popup = ~glue::glue("<b>{state}</b><br>{state}"))%>%
addPolygons(data=indo_prov2%>%
filter(Type=="Medium"),
fillColor = "blue",
fill=T,
weight = .5,
popup = ~glue::glue("<b>{state}</b><br>{state}"))%>%
addPolygons(data=indo_prov2%>%
filter(Type=="High"),
fillColor = "red",
fill=T,
weight = .5,
popup = ~glue::glue("<b>{state}</b><br>{state}"))
We can see that there are some region with red shades such as Sumatera Utara, Sulawesi Tengara, etc. We will now zoom in on these region using the city level Map
Step 6: Get the shape file for Indonesia City Level Map
Similar to the step 3 that we did for getting the province level Map, we will now download the city level Map for Indonesia.
Go to the github repository as shown below
Open and download the Indonesia cities json file as shown below
The file will be downloaded
City levl file is very big in size and hence I am unable to upload to my repository. Instead, I can point you to the repo I downloaded from. Please download it from here
Step 7: Read the geo-json file into R
We will now read the file into R
nm2_json<-"indonesia-cities.json"
indo_city <- read_sf(nm2_json )
indo_city
## Simple feature collection with 514 features and 8 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 95.01079 ymin: -11.00762 xmax: 141.0194 ymax: 6.07693
## Geodetic CRS: WGS 84
## # A tibble: 514 × 9
## Kind Code Name Year Source Parent Province Country
## <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr>
## 1 City 1101 SIMEULUE 2015 BPS 11 ACEH INDONESIA
## 2 City 1102 ACEH SINGKIL 2015 BPS 11 ACEH INDONESIA
## 3 City 1103 ACEH SELATAN 2015 BPS 11 ACEH INDONESIA
## 4 City 1104 ACEH TENGGARA 2015 BPS 11 ACEH INDONESIA
## 5 City 1105 ACEH TIMUR 2015 BPS 11 ACEH INDONESIA
## 6 City 1106 ACEH TENGAH 2015 BPS 11 ACEH INDONESIA
## 7 City 1107 ACEH BARAT 2015 BPS 11 ACEH INDONESIA
## 8 City 1108 ACEH BESAR 2015 BPS 11 ACEH INDONESIA
## 9 City 1109 PIDIE 2015 BPS 11 ACEH INDONESIA
## 10 City 1110 BIREUEN 2015 BPS 11 ACEH INDONESIA
## # … with 504 more rows, and 1 more variable: geometry <MULTIPOLYGON [°]>
Step 8: Plotting the shape file
leaflet(indo_city) %>%
addTiles() %>%
addPolygons(fillColor = "blue",
color = "#000000",
weight = .5,
fillOpacity = 0.5
,
popup = ~glue::glue("<b>{Name}</b><br>{Name}")
)
Step 9: Plotting the cities of Bali
indo_city2<-indo_city%>%
mutate(Random_Value = rnorm(nrow(indo_city),5,2))
indo_city2$Type <- ifelse(indo_city2$Random_Value < quantile(indo_city2$Random_Value,0.30),"Low",
ifelse(indo_city2$Random_Value < quantile(indo_city2$Random_Value,0.80),"Medium","High"))
head(indo_city2)
## Simple feature collection with 6 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 95.38908 ymin: 2.004084 xmax: 98.19707 ymax: 5.249793
## Geodetic CRS: WGS 84
## # A tibble: 6 × 11
## Kind Code Name Year Source Parent Province Country
## <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr>
## 1 City 1101 SIMEULUE 2015 BPS 11 ACEH INDONESIA
## 2 City 1102 ACEH SINGKIL 2015 BPS 11 ACEH INDONESIA
## 3 City 1103 ACEH SELATAN 2015 BPS 11 ACEH INDONESIA
## 4 City 1104 ACEH TENGGARA 2015 BPS 11 ACEH INDONESIA
## 5 City 1105 ACEH TIMUR 2015 BPS 11 ACEH INDONESIA
## 6 City 1106 ACEH TENGAH 2015 BPS 11 ACEH INDONESIA
## # … with 3 more variables: geometry <MULTIPOLYGON [°]>, Random_Value <dbl>,
## # Type <chr>
color_palet<-colorNumeric(palette = "YlOrRd",domain = indo_city2%>%
filter(Province=="BALI")%>%
select(Random_Value)%>%
pull(Random_Value))
Plotting the Map
indo_city_bali<-indo_city2%>%
filter(Province=="BALI")
leaflet(indo_city_bali) %>%
addTiles() %>%
addPolygons(
# fillColor = "blue",
color = "#000000",
weight = .5,
fillOpacity = 0.5
,
)%>%
addPolygons(data=indo_city_bali,
fillColor = ~color_palet(Random_Value),
fill=T,
weight = .5,
popup = ~glue::glue("<b>{Name}</b><br>{Name}"),
highlightOptions = highlightOptions(color = "black",
weight = 2,
bringToFront = TRUE))
Concluding thoughts
The purpose of the blog was to provide a step by step guide on how to access publicly available geographical maps at various levels of granularity(province vs city) and then use them to analyse a metric such as population density. In the next few blogs we will look at how to further leverage maps along with certain demographic information to analyse various use cases.