-
Notifications
You must be signed in to change notification settings - Fork 14
/
Copy path2_spDataTypes.Rmd
471 lines (317 loc) · 19.8 KB
/
2_spDataTypes.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
---
title: "Introduction to Spatial Data Types in R"
author: "claudia a engel"
date: "Last updated: `r format(Sys.time(), '%B %d, %Y')`"
output:
html_document:
toc: true
toc_depth: 4
theme: spacelab
mathjax: default
fig_width: 6
fig_height: 6
---
<!--html_preserve-->
<a href="https://github.com/cengel/rpubs-rspatial"><img style="position: absolute; top: 0; right: 0; border: 0;" src="https://camo.githubusercontent.com/a6677b08c955af8400f44c6298f40e7d19cc5b2d/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6769746875622f726962626f6e732f666f726b6d655f72696768745f677261795f3664366436642e706e67" alt="Fork me on GitHub" data-canonical-src="https://s3.amazonaws.com/github/ribbons/forkme_right_gray_6d6d6d.png"></a>
<!--/html_preserve-->
```{r knitr_init, echo=FALSE, cache=FALSE}
library(knitr)
#library(rmdformats)
## libraries needed for R code examples
library(sp)
library(raster)
## Global options
options(max.print="75")
opts_chunk$set(echo=TRUE,
cache=TRUE,
prompt=FALSE,
tidy=TRUE,
comment=NA,
message=FALSE,
warning=FALSE)
opts_knit$set(width=75)
```
***
Make sure you have the `sp` and `raster` libraries installed. [Note that the latest `raster`(2.5-2) depends on `sp` (≥ 1.2-0), so make sure that your versions align.]
__Mac users__
Since there is now a Mac binary package for `rgdal` [available on CRAN](https://cran.r-project.org/web/packages/rgdal/index.html) a installing `sp` with the default settings will also install this `rgdal` as one of its dependencies, which includes a built-in basic GDAL that lacks all the extra GDAL formats can handle. There is an alternative `rgdal` distributed by kyngchaos that you can use instead. Formats **not** included in the CRAN distribution but included in the kyngchaos distribution are currently: DODS, Geomedia, Interlis 1, Interlis 2, LIBKML, MSSQLSpatial, NAS, ODBC, OGDI, PGeo, PostgreSQL, OSI, Walk, XLS. (You mostly might care about having the PostgreSQL driver at some point.)
If you __DON'T__ care, simply say:
install.packages(c("sp", "raster"))
If you __DO__ care, follow the instructions below.
#. Install the raster library:
install.packages("raster"))
#. Don't install `sp` with dependencies:
install.packages("sp", dependencies = F)
#. Download the latest GDAL complete from [this site](http://www.kyngchaos.com/files/software/frameworks)
#. Doubleclick and install the downloaded `.dmg` file as you are used to on a Mac.
#. Make sure you have R Version 3.2 or later installed -- if not update it.
#. Download a different rgdal from [this site](http://www.kyngchaos.com/files/software/frameworks).
#. Doubleclick to open the `.dmg` file
#. Move `rgdal_*.tgz` to your Desktop folder
#. Install the local package with:
install.packages("~/Desktop/rgdal_*.tgz", repos = NULL, type = .Platform$pkgType)
__Windows users__
install.packages(c("sp", "raster"))
__Mac and Windows__
Test if all went well:
library (rgdal)
***
## 1. Spatial objects in R
### Conceptualizing a spatial Object
In vector GIS we deal with, points, lines, and polygons:
```{r echo=FALSE}
px <- c(5, 7, 8, 9, 8, 7, 6)
py <- c(7, 3, 4, 8, 9, 15, 14)
plot(px, py, type="n", axes=F, xlab = '', ylab = '')
polygon(px, py)
points(c(6, 9, 8, 8.5), c(9, 14, 8, 9), pch=20)
lines(c(5, 6, 7, 8), c(5, 6,10, 11))
lines(c(8, 9), c(14, 12))
```
***
#### Exercise 1
Discuss with your neighbor: What do we need to specify in order to define spatial vector data?
* lat/lon coordinates
* projection
* attribute data
* if polygon, is it a hole or not
* ... ?
***
In R the `sp` package provides classes and methods for spatial data types[^1].
[^1]: From R Bivand (2011) [Introduction to representing spatial objects in R](http://geostat-course.org/system/files/monday_slides.pdf)
Development of the `sp` began in the early 2000s in an attempt to standardize how spatial data would be treated in R and to allow for better interoparbility between different analysis packages that use spatial data. The package provides classes and methods to create _points_, _lines_, _polygons_, and _grids_ and to operate on them. It is one of the most important packages that you will need to use if dealing with spatial data in R. Many of the spatial analysis packages now use the spatial data types that are implemented in `sp` i.e. they "depend" on the `sp` package.
In `sp` spatial objects are conceptualized in the following way[^2]:
[^2]: Note that this is not the only way spatial objects are conceptualized in R. Other spatial packages may use their own class definitions for spatial data (for example `spatstat`). `sp` provides conversion functions for many of those formats.
The foundational structure for *any* spatial object in `sp` is the `Spatial` class. It has two slots (new-style class objects in R have pre-defined components called slots):
* a __bounding box__
* a __CRS class object__ to define the Coordinate Reference System
## 2. Creating a spatial object
In order to create a spatial object manually the basic steps are:
> I. Create a bunch of points, lines, or polygons (details below)
> II. Convert those to a `Spatial*` object (`*` stands for Points, Lines, or Polygons). This steps adds the bounding box (automatically) and the slot for the Coordinate Reference System or CRS (which needs to be filled manually).
> III. (_Optional_:) Add a data frame with attribute data, which will turn your `Spatial*` object into a `Spatial*DataFrame` object.
### I. Create geometric objects (topology)
__Points__ (which may have 2 or 3 dimensions) as the most basic spatial data object. They are generated out of either a single coordinate or a set of coordinates[^3], like a two-column matrix or a dataframe with a column for latitude and one for longitude.
[^3]: Coordinates should be of type double and will be promoted if not.
__Lines__ are generated out of `Line` objects. A `Line` object is a spaghetti collection of 2D coordinates and is generated out of a two-column matrix or a dataframe with a column for latitude and one for longitude. A `Lines` object is a __list__ of `Line` objects, for example all the contours at a single elevation.
__Polygons__ are generated out `Polygon` objects. A `Polygon` object is a spaghetti collection of 2D coordinates with equal first and last coordinates and is generated out of a two-column matrix or a dataframe with a column for latitude and one for longitude. A `Polygons` object is a __list__ of `Polygon` objects, for example islands belonging to the same country.
### II. Create spatial objects
Both `Line/Lines and Polygon/Polygons` objects are part of the R's basic `graphics` package. So we need to turn those into spatial, i.e. "geographically aware" objects. `SpatialPoints` are directly made out of the coordinates. `SpatialLines` and `SpatialPolygons` objects are made using lists of `Lines` or `Polygons` objects respectively.
### III. Add attributes
The points in a `SpatialPoints` object may be associated with a row of attributes to create a `SpatialPointsDataFrame` object. The coordinates and attributes may, but do not have to be keyed to each other using ID values.
`SpatialLinesDataFrame` and `SpatialPolygonsDataFrame` objects are defined using `SpatialLines` and `SpatialPolygons` objects and standard data frames, and the ID fields are here required to match the data frame row names.
![Simplified diagram of how spatial objects are created in `sp`](images/createSpatialObj.png)
### Spatial methods
A number of spatial methods are available for the classes in `sp`. Among the ones I often use are:
method/class | and what it does
------------ | ----------------
`bbox(x)` | returns the bounding box coordinates
`proj4string(x)` | sets or retrieves projection attributes using the CRS object.
`CRS()` | creates an object of class of coordinate reference system arguments
`spplot(x)` | plots a separate map of all the attributes unless specified otherwise
`coordinates(x)` | returns a matrix with the spatial coordinates. For spatial polygons it returns the centroids.
`over(x, y)` | used for example to retrieve the polygon or grid indexes on a set of points - we'll come back to that one later
`spsample(x)` | sampling of spatial points within the spatial extent of objects
### Creating a spatial object from scratch
***
#### Exercise 2
In this example we will manually generate a point vector object and plot it.
```{r}
xy <- matrix(runif(20), ncol=2) # a matrix with some arbitrary points as coordinates..
xy.sp <- SpatialPoints(xy) # ..converted into a spatial object
plot(xy.sp, pch = 2)
```
Test out some commands:
```{r}
coordinates(xy.sp)
bbox(xy.sp)
str(xy.sp)
summary(xy.sp)
```
Add attributes:
```{r}
df <- data.frame(attr1 = LETTERS[1:10], attr2 = c(10:1))
xy.spdf <- SpatialPointsDataFrame(xy.sp, df)
summary(xy.spdf)
```
Some subsetting:
```{r}
xy.spdf[1:2, ] # row 1 and 2 only
xy.spdf[["attr2"]] # column with "attr2" only
```
***
### Creating a spatial object from a lat/lon table
A `SpatialPointsDataFrame` object can be created directly from `data.frames` by specifying which columns contain the coordinates. This is interesting, for example if you have a spreadsheet that contains latitude, longitude and some values. You an read this into a data frame with `read.table`, and then create the object from the data frame in one step by using the `coordinates()` command. That automatically turns the dataframe object into a `SpatialPointsDataFrame`:
***
####Exercise 3
1. If you haven't already, create a new directory `R_Workshop` on your Desktop.
2. Download and unzip [`RSpatialDataTypes.zip`](https://www.dropbox.com/s/g5p8b1xi2k5lydw/RSpatialDataTypes.zip?dl=1) in this directory.
3. Set your working directory to `R_Workshop`
4. Use `read.csv()` to read `PhiladelphiaZIPHousing.csv` into a dataframe in R and name it `ph.df`.
5. Use `head()` to examine the first few lines of the dataframe.
6. Use `class()` to examine which object class the table belongs to.
7. Convert the table into a spatial object with:
`coordinates(ph.df) <- c("lon", "lat")`
8. Use `class()`again to examine which object class the table belongs to now:
What to you observe?
9. Plot, using the attributes from the table:
`bubble(ph.df, "price")` or
`spplot(ph.df, "use")`
### A brief, but important word about projection.
Note that the Spatial object you just created __does not__ have a projection defined. It is ok to plot, but be aware that for any meaningful spatial operation you will need to define a projection.
10. This is how it's done:
```{r eval=FALSE}
is.projected(ph.df) # see if a projection is defined
proj4string(ph.df) <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs") # this is WGS84
is.projected(ph.df) # voila
```
***
The following example[^4] shows how a set of polygons are built from scratch. Note that the coordinates of the _Sr4_ polygon move in the opposite direction (anti-clockwise) than the other three (clockwise); _Sr4_ is meant to represent a hole in the _Sr3_ polygon. The default value for the hole colour is "transparent". Note that the Polygons objects have to be given character ID values, and that these values must be unique for `Polygons` objects combined in a `SpatialPolygons` object.
[^4]:from [Edzer Pebesma Roger S. Bivand Feb 2005: S Classes and Methods for Spatial Data: the sp Package](http://cran.r-project.org/web/packages/sp/vignettes/intro_sp.pdf)
```{r}
# create polyon objects from coordinates
Sr1 <- Polygon(cbind(c(2,4,4,1,2),c(2,3,5,4,2)))
Sr2 <- Polygon(cbind(c(5,4,2,5),c(2,3,2,2)))
Sr3 <- Polygon(cbind(c(4,4,5,10,4),c(5,3,2,5,5)))
Sr4 <- Polygon(cbind(c(5,6,6,5,5),c(4,4,3,3,4)), hole = TRUE)
# create lists of polygon objects from polygon objects and unique ID
Srs1 <- Polygons(list(Sr1), "s1")
Srs2 <- Polygons(list(Sr2), "s2")
Srs3 <- Polygons(list(Sr3, Sr4), "s3/4")
# create spatial polygons object from lists
SpP <- SpatialPolygons(list(Srs1,Srs2,Srs3), 1:3)
plot(SpP)
```
In order to add attributes to the polygons the `row.names` of the attributes data frame are matched with the ID slots of the SpatialPolygons object, and the rows of the data frame will be re-ordered if necessary.
Add attributes:
```{r}
attr <- data.frame(numbers=1:3, characters=LETTERS[3:1], row.names=c("s3/4", "s1", "s2"))
SpDf <- SpatialPolygonsDataFrame(SpP, attr)
spplot(SpDf)
```
Let's look at its structure:
```{r}
str(SpDf)
```
Whoa.
Note that we can access the information in the slots using the `@`. For example
```{r}
SpDf@bbox
```
However, it is **strongly encouraged** to use the provided functions and methods instead, like
```{r}
bbox(SpDf)
```
To look at the attribute table:
```{r}
as.data.frame(SpDf) # instad of: SpDf@data
```
## 3. Getting spatial data in and out of R
The good news is that typically we do not have to create `Spatial` family objects as tediously as we did above. It is much more common that we work with already existing spatial data.
### How to work with `rgdal`
In order to read those into R and turn them into `Spatial*` family objects we rely on the `rgdal` package. It provides us direct access to the powerful [GDAL library](http://gdal.org) from within R.
We can read in and write out spatial data using:
`readOGR()` and `writeOGR()` (for vector)
`readGDAL()` and `writeGDAL()` (for raster/grids)
The parameters provided for each function vary depending on the exact spatial file type you are reading. We will take an ESRI shapefile as an example. A shapefile - as you know - [consists of various files](https://en.wikipedia.org/wiki/Shapefile), and R expects all those files to be in one directory.
When reading in a shapefile, `readOGR()` expects at least the following two arguments:
datasource name (dsn) # the path to the folder that contains the files
# Note that this is a path to the folder, not a filename!
layer name (layer) # the shapefile name without extension
# Note that this is not a path but just the name of the file!
For example, if I have a shapefile called `Philadelphia.shp` and all its associated files (like _.dbf, .prj, .shx_) in a directory called `PH` on my desktop, and I have my working directory set to my desktop folder, my command to read this shapefile would look like this:
```
readOGR(dsn = "PH", layer = "Philadelphia")
```
or in short:
```
readOGR("PH", "Philadelphia")
```
***
####Exercise 4
1. Load the `rgdal` package.
2. Make sure your working directory is set to the `R_Workshop` folder and it contains the materials you downloaded and unzipped earlier.
3. Read `PhillyTotalPopHHinc.shp` into an object called `philly`. Make sure you understand the directory structure.
4. Examine the file, for example with `summary()`, `class()`, `str("philly", max.level = 2)`,
5. Take a look at the column names of the attribute data with `names()`
6. Take a look at the attribute data with `head()`
7. Note that subsetting works here: `philly[philly$totalPop > 5000,]`
8. Plot some attribute data: `spplot(philly, "medHHinc")`
***
GDAL supports over 200 [raster formats](http://www.gdal.org/formats_list.html) and [vector formats](http://www.gdal.org/ogr_formats.html). Use
`ogrDrivers()` and `gdalDrivers()`
(without arguments) to find out which formats your `rgdal` install can handle.
### Raster data
Rasters are much more compact that vectors. Because of their regular structure the coordinates do not need to be recorded for each pixel or cell in the rectangular extent. A raster is defined by:
- a CRS
- coordinates of its origin
- a distance or cell size in each direction
- a dimension or numbers of cells in each direction
- an array of values
If necessary, the coordinates for any cell can be computed.
In `sp` the `GridTopology` class is the key element of raster representations[^5]. It contains
* the center coordinate pair of the south-west raster cell,
* the two cell sizes in the metric of the coordinates, giving the step to successive centres, and
* the numbers of cells for each dimension.
[^5]: There is also a `SpatialPixels` object which stores grid topology and coordinates of the actual points.
A simple grid can be built like this:
```{r tidy=F}
# specify the grid topology with the following parameters:
# - the smallest coordinates for each dimension, here: 0,0
# - cell size in each dimension, here: 1,1
# - number of cells in each dimension, here: 5,5
gtopo <- GridTopology(c(0,0), c(1,1), c(5,5)) # create the grid
datafr <- data.frame(runif(25)) # make up some data
SpGdf <- SpatialGridDataFrame(gtopo, datafr) # create the grid data frame
summary(SpGdf)
```
And it can be plotted like this:
```{r}
spplot(SpGdf, sp.layout = c('sp.points', SpatialPoints(coordinates(SpGdf))))
```
Alternatively you can use the `raster` package, which works slightly differently.
The `raster` package is a major extension of spatial data classes to access large rasters and in particular to process very large files. It includes object classes for `RasterLayer`, `RasterStacks`, and `RasterBricks`, functions for converting among these classes, and operators for computations on the raster data. Conversion from `sp` type objects into `raster` type objects is easy.
If we wanted to do the same as above, namely creating the same raster object from scratch we would do the following:
```{r tidy=F}
# specify the RasterLayer with the following parameters:
# - minimum x coordinate (left border)
# - minimum y coordinate (bottom border)
# - maximum x coordinate (right border)
# - maximum y coordinate (top border)
# - resolution (cell size) in each dimension
r <- raster(xmn=-0.5, ymn=-0.5, xmx=4.5, ymx=4.5, resolution=c(1,1))
r
```
So here we have created an object of type `RasterLayer`, as compared to above, where we created an object of type `GridTopology`.
Compare this to the output from above and __note something important here__: Different from the grid object we generated from scratch, this raster object has a CRS defined! If the crs argument is missing when creating the Raster object, the x coordinates are within -360 and 360 and the y coordinates are within -90 and 90, the WGS84 projection is used by default!
Good to know.
To add some values to the cells we could the following. Be aware that different from the `GridTopology` object above, which is converted to a `SpatialGridDataFrame` when adding values, this here object remains a `RasterLayer`.
```{r tidy=F}
class(r)
r <- setValues(r, runif(25))
class(r)
plot(r); points(coordinates(r), pch=3)
```
(See the [`rasterVis` package](https://cran.r-project.org/web/packages/rasterVis/index.html) for more advanced plotting of `Raster*` objects.)
***
#### Exercise 5
We can use `readGDAL` from the `sp` package to read in a raster file. It will require one parameter as a minimum, which is the filename of the raster. So let's do this.
1. Make sure your working directory is set to the `R_Workshop` folder and it contains the materials you downloaded and unzipped earlier.
2. Read in with: `dem <- readGDAL("DEM_10m/bushkill_pa.dem")`[^6]
3. Examine with `summary()`
Alternatively we can use the `raster` package to read in the same file:
4. Load the `raster` library
5. Read in with: `dem.r <- raster("DEM_10m/bushkill_pa.dem")`
6. Examine with `dem.r` and compare to the above
7. Extract contour lines and plot them: `contour(dem.r)`
8. Note that this works too: `contour(dem)`
***
[^6]: From (https://www.e-education.psu.edu/geog482fall2/c7_p8.html)
RasterLayer objects can also be created from a simple matrix.
```{r}
class(volcano)
volcano.r <- raster(volcano)
class(volcano.r)
```
This package becomes interesting when you need to work for example with Landsat images that have multiple bands and want to perform map algebra.
There are currently about 140 [R packages on CRAN for reading, visualising, and analysing (geographical) spatial data](http://cran.r-project.org/web/views/Spatial.html). I recommend to visit that website if you are exploring spatial analysis with R.