Francesco Bailo (francesco.bailo@sydney.edu.au)
library(sf)
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
Package’s documentation:https://cran.r-project.org/web/packages/sf/index.html
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.2.0 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
regions.sf <-
sf::read_sf("data/Reg01012020_g_WGS84.shp")
st_crs(regions.sf)
## Coordinate Reference System:
## User input: WGS 84 / UTM zone 32N
## wkt:
## PROJCRS["WGS 84 / UTM zone 32N",
## BASEGEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]],
## CONVERSION["UTM zone 32N",
## METHOD["Transverse Mercator",
## ID["EPSG",9807]],
## PARAMETER["Latitude of natural origin",0,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8801]],
## PARAMETER["Longitude of natural origin",9,
## ANGLEUNIT["Degree",0.0174532925199433],
## ID["EPSG",8802]],
## PARAMETER["Scale factor at natural origin",0.9996,
## SCALEUNIT["unity",1],
## ID["EPSG",8805]],
## PARAMETER["False easting",500000,
## LENGTHUNIT["metre",1],
## ID["EPSG",8806]],
## PARAMETER["False northing",0,
## LENGTHUNIT["metre",1],
## ID["EPSG",8807]]],
## CS[Cartesian,2],
## AXIS["(E)",east,
## ORDER[1],
## LENGTHUNIT["metre",1]],
## AXIS["(N)",north,
## ORDER[2],
## LENGTHUNIT["metre",1]],
## ID["EPSG",32632]]
The coordinate reference system is important as it defines the unit
(e.g. meters or degrees) for the grid size. In this case the CRS is
WGS 84 / UTM zone 32N
or ID["EPSG",32632]]
which is in meters (see
https://epsg.io/32632).
regions.sf %>%
ggplot2::ggplot() +
ggplot2::geom_sf()
You want a mask to your grid, so if you gave multiple features (here 20 regions) you might want to combine them (resolving the internal boundaries). With sf and dplyr this can achieved with:
regions.sf$unit <-
1
italy.sf <-
regions.sf %>%
dplyr::group_by(unit) %>%
dplyr::summarize(AREA = sum(SHAPE_AREA))
italy.sf %>%
ggplot2::ggplot() +
ggplot2::geom_sf()
This is a two step process.
With st_make_grid
we create the grid as simple feature object.
cellsize
is in the unit of the mask feature (here italy.sf
is in
meters). square = FALSE
creates an hexagonal grid instead of a
square grid.
With st_sf
we append to the simple feature object a data frame -
for future data analysis.
italy_hex.sf <-
sf::st_make_grid(italy.sf,
cellsize = 15000,
what = "polygons",
square = FALSE)
italy_hex.sf <-
sf::st_sf(hex_id = 1:length(lengths(italy_hex.sf)),
italy_hex.sf)
italy_hex.sf %>%
ggplot2::ggplot() +
ggplot2::geom_sf()
Finally we crop the grid
italy_hex_cropped.sf <-
sf::st_intersection(italy_hex.sf,
italy.sf %>% st_make_valid())
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
italy_hex_cropped.sf %>%
ggplot2::ggplot() +
ggplot2::geom_sf()
Let’s generate a buch of random points within the feature of
regions.sf
. Note: we use st_buffer
so that some points is going to
lay outside of the area we want to interst.
random_pnt.sf <-
sf::st_sample(sf::st_buffer(regions.sf, 50000),
size = rep(100, nrow(regions.sf)))
Our points are in the same CRS of all the other simple features (by design). If this is not the case, you must convert now to a common CRS with
random_pnt.sf <-
sf::st_transform(random_pnt.sf, 32632)
and add a data frame to it
random_pnt.sf <-
sf::st_sf(pnt_id = 1:length(lengths(random_pnt.sf)),
random_pnt.sf)
random_pnt.sf %>%
ggplot2::ggplot() +
ggplot2::geom_sf() +
ggplot2::geom_sf(data = italy_hex_cropped.sf, fill = NA, alpha = .2)
point_intersection.list <-
sf::st_intersects(random_pnt.sf,
italy_hex_cropped.sf)
point_intersection.list
## Sparse geometry binary predicate list of length 2000, where the
## predicate was `intersects'
## first 10 elements:
## 1: 118
## 2: 59
## 3: (empty)
## 4: 76
## 5: 192
## 6: (empty)
## 7: (empty)
## 8: 282
## 9: 97
## 10: 415
Let’s make sure we don’t lose any point in the conversion from list to
vector in case they are out of the intersecting area (here the area of
italy_hex_cropped.sf
). If they are, st_intersects
intersect will
return a vector of length 0 instead of the index of the intersecting
feature. Vectors of length 0 in are lost like tears in the rain when we
unlist()
.
point_intersection.list[sapply(point_intersection.list, FUN = function(x) length(x) == 0)] <- NA
point_intersection.list
## Sparse geometry binary predicate list of length 2000, where the
## predicate was `intersects'
## first 10 elements:
## 1: 118
## 2: 59
## 3: NA
## 4: 76
## 5: 192
## 6: NA
## 7: NA
## 8: 282
## 9: 97
## 10: 415
Then we can add the resulting italy_hex_cropped.sf$hex_id
to
random_pnt.sf
.
random_pnt.sf$hex_id <-
italy_hex_cropped.sf$hex_id[unlist(point_intersection.list)]
head(random_pnt.sf)
## Simple feature collection with 6 features and 2 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 303389.3 ymin: 4930125 xmax: 451187.8 ymax: 5163356
## Projected CRS: WGS 84 / UTM zone 32N
## pnt_id random_pnt.sf hex_id
## 1 1 POINT (429070.5 4931973) 839
## 2 2 POINT (373757 5013787) 542
## 3 3 POINT (303389.3 4980196) NA
## 4 4 POINT (388447.9 4930125) 639
## 5 5 POINT (451187.8 4989163) 1041
## 6 6 POINT (418223.4 5163356) NA
random_pnt.sf %>%
ggplot2::ggplot() +
ggplot2::geom_sf(aes(colour = hex_id)) +
scale_colour_viridis_c()
As we can see, hex_id
is assigned from North West to South East.