Accessing WoSIS latest from R

This document shows how to access WoSIS Web Feature Service (WFS) data from R.

For an overview of WoSIS, see https://www.isric.org/explore/wosis. This links to https://www.isric.org/explore/wosis/accessing-wosis-derived-datasets which explains the difference between snapshot and dynamic datasets, and how to access them. The Procedures Manual https://www.isric.org/sites/default/files/isric_report_2018_01.pdf explains how to access the database in various ways; here we expand their brief description of using R for this purpose.

1 Packages

If you do not have these on your system, install with install.packages(..., dependencies=TRUE) or via the R Studio package manager.


library(sp) # spatial data types
library(rgdal) # GDAL access from R


## rgdal: version: 1.4-4, (SVN revision 833)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/gdal
## GDAL binary built with GEOS: FALSE
## Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
## Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/proj
## Linking to sp version: 1.3-1


library(gdalUtils) # wrappers for GDAL utility programs that could be called from the command line
library(aqp)       # Algorithms for Quantitative Pedology


## This is aqp 1.17
##
## Attaching package: 'aqp'
## The following object is masked from 'package:base':
##
## union

GDAL is used for spatial data import/export, coordinate systems etc. Check for a valid GDAL installation:


gdal_setInstallation()
valid_install <- !is.null(getOption("gdalUtils_gdalPath"))
if (valid_install)
   print("Valid GDAL found") else
   {print("No valid GDAL"); quit()}


## [1] "Valid GDAL found"

2 Accessing WoSIS

2.1 Specify WoSIS WFS location and list available layers

Here we get the latest (dynamic) version.


wfs <- "WFS:http://data.isric.org/geoserver/wosis_latest/wfs/"
layers <- ogrinfo(wfs, ro=TRUE, so=TRUE)   # `ro` = read-only; `so` = summary only
length(layers)


## [1] 47


head(layers)


## [1] "INFO: Open of `WFS:http://data.isric.org/geoserver/wosis_latest/wfs/'"
## [2] " using driver `WFS' successful."
## [3] "1: wosis_latest:wosis_latest_bdfi33 (Point)"
## [4] "2: wosis_latest:wosis_latest_bdfiad (Point)"
## [5] "3: wosis_latest:wosis_latest_bdfifm (Point)"
## [6] "4: wosis_latest:wosis_latest_bdfiod (Point)"

There are 47 layers. Their names can be listed here with print(layers); these are also given in the references above.

2.2 Display layer properties

We need to know the layer’s format, extent etc. before we read it. Select the first property in the information list above. Again call ogrinfo but this time also specifying a layer within the WFS.


ogrinfo(wfs, ro=TRUE, so=TRUE, layer="wosis_latest_bdfi33")


## [1] "INFO: Open of `WFS:http://data.isric.org/geoserver/wosis_latest/wfs/'"
## [2] " using driver `WFS' successful."
## [3] ""
## [4] "Layer name: wosis_latest:wosis_latest_bdfi33"
## [5] "Metadata:"
## [6] " ABSTRACT=Bulk density of the fine earth fraction < 2 mm, equilibrated at 33 kPa"
## [7] " KEYWORD_1=features"
## [8] " KEYWORD_2=wosis_latest_bdfi33"
## [9] " TITLE=Bulk density fine earth - 33 kPa (kg/dm³)"
## [10] "Geometry: Point"
## [11] "Feature Count: 76575"
## [12] "Extent: (-171.927505, -77.848663) - (161.600617, 76.228333)"
## [13] "Layer SRS WKT:"
## [14] "GEOGCS[\"WGS 84\","
## [15] " DATUM[\"WGS_1984\","
## [16] " SPHEROID[\"WGS 84\",6378137,298.257223563,"
## [17] " AUTHORITY[\"EPSG\",\"7030\"]],"
## [18] " AUTHORITY[\"EPSG\",\"6326\"]],"
## [19] " PRIMEM[\"Greenwich\",0,"
## [20] " AUTHORITY[\"EPSG\",\"8901\"]],"
## [21] " UNIT[\"degree\",0.0174532925199433,"
## [22] " AUTHORITY[\"EPSG\",\"9122\"]],"
## [23] " AUTHORITY[\"EPSG\",\"4326\"]]"
## [24] "Geometry Column = geom"
## [25] "gml_id: String (0.0) NOT NULL"
## [26] "profile_id: Integer (0.0)"
## [27] "profile_layer_id: Integer (0.0)"
## [28] "country_name: String (0.0)"
## [29] "upper_depth: Integer(Int16) (0.0)"
## [30] "lower_depth: Integer(Int16) (0.0)"
## [31] "layer_name: String (0.0)"
## [32] "bdfi33_value: String (0.0)"
## [33] "bdfi33_value_avg: Real(Float32) (0.0)"
## [34] "bdfi33_method: String (0.0)"
## [35] "bdfi33_date: String (0.0)"
## [36] "bdfi33_dataset_id: String (0.0)"
## [37] "bdfi33_profile_code: String (0.0)"
## [38] "bdfi33_license: String (0.0)"

We see this is “Bulk density of the fine earth fraction < 2 mm, equilibrated at 33 kPa”. Refer to the procedures manual for how each property (here, bulk density) was determined. We see the feature count (about 76k), its geometry (points, as is everything in WoSIS), its Coordinate Reference System (CRS, listed here as GEOCGS under SRS = “Spatial Reference System”, WKT = “Well-Known Type”), which is EPSG:4326.

This function allows SQL queries to limit the extent of the search, by using the (optional) where argument. To use this we need to know the field names, which we saw in the previous output.

For example, all the subsoil bulk densities from India:


ogrinfo(wfs, ro=TRUE, so=TRUE, layer="wosis_latest_bdfi33",
        where="country_name='India' AND upper_depth > 100")


## [1] "INFO: Open of `WFS:http://data.isric.org/geoserver/wosis_latest/wfs/'"
## [2] " using driver `WFS' successful."
## [3] ""
## [4] "Layer name: wosis_latest:wosis_latest_bdfi33"
## [5] "Metadata:"
## [6] " ABSTRACT=Bulk density of the fine earth fraction < 2 mm, equilibrated at 33 kPa"
## [7] " KEYWORD_1=features"
## [8] " KEYWORD_2=wosis_latest_bdfi33"
## [9] " TITLE=Bulk density fine earth - 33 kPa (kg/dm³)"
## [10] "Geometry: Point"
## [11] "Feature Count: 32"
## [12] "Extent: (75.000000, 9.000000) - (79.983333, 28.583334)"
## [13] "Layer SRS WKT:"
## [14] "GEOGCS[\"WGS 84\","
## [15] " DATUM[\"WGS_1984\","
## [16] " SPHEROID[\"WGS 84\",6378137,298.257223563,"
## [17] " AUTHORITY[\"EPSG\",\"7030\"]],"
## [18] " AUTHORITY[\"EPSG\",\"6326\"]],"
## [19] " PRIMEM[\"Greenwich\",0,"
## [20] " AUTHORITY[\"EPSG\",\"8901\"]],"
## [21] " UNIT[\"degree\",0.0174532925199433,"
## [22] " AUTHORITY[\"EPSG\",\"9122\"]],"
## [23] " AUTHORITY[\"EPSG\",\"4326\"]]"
## [24] "Geometry Column = geom"
## [25] "gml_id: String (0.0) NOT NULL"
## [26] "profile_id: Integer (0.0)"
## [27] "profile_layer_id: Integer (0.0)"
## [28] "country_name: String (0.0)"
## [29] "upper_depth: Integer(Int16) (0.0)"
## [30] "lower_depth: Integer(Int16) (0.0)"
## [31] "layer_name: String (0.0)"
## [32] "bdfi33_value: String (0.0)"
## [33] "bdfi33_value_avg: Real(Float32) (0.0)"
## [34] "bdfi33_method: String (0.0)"
## [35] "bdfi33_date: String (0.0)"
## [36] "bdfi33_dataset_id: String (0.0)"
## [37] "bdfi33_profile_code: String (0.0)"
## [38] "bdfi33_license: String (0.0)"

Here there are only 32 records.

The ogrinfo function also allows an (optional) limitation to a bounding box with the spat argument, as a four-element vector (xmin, ymin, xmax, ymax).

For example, all the profile (site) information from a 2∘×2∘ tile in central Europe:


ogrinfo(wfs, ro=TRUE, so=TRUE, layer="wosis_latest_profiles",
        spat=c(6, 48, 8, 50))[11]


## [1] "Feature Count: 26"

There are 26 profiles from the area.

3 WoSIS layers as ESRI shapefiles

One format provided by WoSIS is the ESRI shapefile.

 

3.1 Import to client system

There seems to be no way to directly import from the WFS to an R workspace object, so there must first be an intermediate step: download the WFS layer to a local directory, and then import as usual for GIS layers.

The ogr2ogr function reads from one format on the server and writes to another in our client. The default output file format is an ESRI Shapefile; another format can be specified with the (optional) f argument.

The possible formats are listed at https://docs.geoserver.org/latest/en/user/services/wfs/outputformats.html. Here we will use the default shapefile and also CSV in the next section. Another popular format is JSON “JavaScript Object Notation”.

The where and spat arguments can also be used here. The ogr2ogr function also allows an (optional) transformation to any EPSG-defined CRS with the t_srs argument.

See ?ogr2ogr for many more options.

We download the first property, i.e., bulk density, and save it a subdirectory of the current path, and with the default format. The directory must be first created if necessary.


wosis.dir.name <- "./wosis"
if (!file.exists(wosis.dir.name)) dir.create(wosis.dir.name)
layer.name <- "wosis_latest_bdfi33"
print(wfs)


## [1] "WFS:https://data.isric.org/geoserver/wosis_latest/wfs/"


ogr2ogr(src=wfs,
        dst=wosis.dir.name,
        layer=layer.name,
        f="ESRI Shapefile",  # this is default, but specify it anyway
        overwrite=TRUE)


## character(0)

Because the default destination format is "ESRI Shapefile" many of the field names have to be shortened.

 

3.2 Reading the shapefile into R

Now read the downloaded shapefile into an R sp object. For shapefiles, the directory and layer names must be specified separately as two arguments, dsn (“data set name”) and layer.


(shapefile.name <- paste0('wosis_latest:',layer.name))


## [1] "wosis_latest:wosis_latest_bdfi33"


# here strings are just that, not to be interpreted as R factors
bd33 <- readOGR(dsn=wosis.dir.name, layer=shapefile.name,
                stringsAsFactors = FALSE)


## OGR data source with driver: ESRI Shapefile
## Source: "/Users/rossiter/data/ISRIC/ISRIC_WoSIS/wosis", layer: "wosis_latest:wosis_latest_bdfi33"
## with 76575 features
## It has 14 fields


class(bd33)


## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"


proj4string(bd33)


## [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"


names(bd33@data)


## [1] "gml_id" "profile_id" "profile_la" "country_na" "upper_dept"
## [6] "lower_dept" "layer_name" "bdfi33_val" "bdfi33_v_1" "bdfi33_met"
## [11] "bdfi33_dat" "bdfi33_d_1" "bdfi33_pro" "bdfi33_lic"


head(bd33@data)


## gml_id profile_id profile_la country_na upper_dept
## 1 wosis_latest_bdfi33.59122 65778 59122 Tunisia 0
## 2 wosis_latest_bdfi33.59123 65778 59123 Tunisia 17
## 3 wosis_latest_bdfi33.59124 65778 59124 Tunisia 37
## 4 wosis_latest_bdfi33.59125 65778 59125 Tunisia 67
## 5 wosis_latest_bdfi33.59126 65778 59126 Tunisia 84
## 6 wosis_latest_bdfi33.59159 65787 59159 Tunisia 0
## lower_dept layer_name bdfi33_val bdfi33_v_1
## 1 17 Ap {1:1.48} 1.48
## 2 37 Bw1 {1:1.50} 1.50
## 3 67 Bw2 {1:1.39} 1.39
## 4 84 Bz1 {1:1.27} 1.27
## 5 117 Bz2 {1:1.29} 1.29
## 6 7 Ap1 {1:1.36} 1.36
##
bdfi33_met
## 1 {"1:corrections = not specified, calculation = not specified, measurement condition = equilibrated at 33 kPa (~1/3 bar), sample type = natural clod"}
## 2 {"1:corrections = not specified, calculation = not specified, measurement condition = equilibrated at 33 kPa (~1/3 bar), sample type = natural clod"}
## 3 {"1:corrections = not specified, calculation = not specified, measurement condition = equilibrated at 33 kPa (~1/3 bar), sample type = natural clod"}
## 4 {"1:corrections = not specified, calculation = not specified, measurement condition = equilibrated at 33 kPa (~1/3 bar), sample type = natural clod"}
## 5 {"1:corrections = not specified, calculation = not specified, measurement condition = equilibrated at 33 kPa (~1/3 bar), sample type = natural clod"}
## 6 {"1:corrections = not specified, calculation = not specified, measurement condition = equilibrated at 33 kPa (~1/3 bar), sample type = natural clod"}
## bdfi33_dat     bdfi33_d_1     bdfi33_pro     bdfi33_lic
## 1 {1:1986-04-01} TN-SOTER TN-Tozeur CC-BY-NC
## 2 {1:1986-04-01} TN-SOTER TN-Tozeur CC-BY-NC
## 3 {1:1986-04-01} TN-SOTER TN-Tozeur CC-BY-NC
## 4 {1:1986-04-01} TN-SOTER TN-Tozeur CC-BY-NC
## 5 {1:1986-04-01} TN-SOTER TN-Tozeur CC-BY-NC
## 6 {1:1986-04-01} TN-SOTER TN-Satfoura CC-BY-NC

The shapefile has been imported as a SpatialPointsDataFrame with the correct CRS, as we saw from the ogrinfo.

Examine the format of the attribute, this is in field *_val:


head(bd33@data$bdfi33_val)


## [1] "{1:1.48}" "{1:1.50}" "{1:1.39}" "{1:1.27}" "{1:1.29}" "{1:1.36}"

The format is {seq:val[,seq:val]} where the seq is an integer on [1...] indicating which measurement number – note that there can be more than one measurement per property, e.g., repeated lab. measurements, and val is the numeric value.

But the average value has its own field, so if we only want the average, it is prepared for us. We see an example here, from six rows chosen to show several profiles with their layers:


bd33@data[75:80, c("profile_id","upper_dept","lower_dept","bdfi33_val","bdfi33_v_1")]


## profile_id upper_dept lower_dept bdfi33_val bdfi33_v_1
## 75 50219 97 147 {1:2.06} 2.06
## 76 50219 147 187 {1:2.06} 2.06
## 77 50219 187 216 {1:2.12} 2.12
## 78 47145 0 15 {1:1.12,2:1.08,3:1.41} 1.20
## 79 47145 15 40 {1:1.04,2:1.36,3:0.99} 1.13
## 80 47145 40 70 {1:0.97,2:0.95,3:1.25} 1.06

 

3.3 Working with the WoSIS layer as a SoilProfileCollection

Convert this to a SoilProfileCollection, a data type defined in the aqp “Algorithms for Quantitive Pedology” package. This data type has separate structures for the site (profile) and horizons.

Initialize with the horizons from the data frame created from the WoSIS layer. The aqp::depth function initializes the SoilProfileCollection object. The formula has the field name of the profile on the left, and the the field names of the horizon boundaries on the right. These fields are in the WoSIS layer.


ds.aqp <- bd33@data
depths(ds.aqp) <- profile_id ~ upper_dept + lower_dept
is(ds.aqp)


## [1] "SoilProfileCollection"


slotNames(ds.aqp)


## [1] "idcol" "hzidcol" "depthcols" "metadata" "horizons"
## [6] "site" "sp" "diagnostic"

 


str(ds.aqp@site)


## 'data.frame': 14602 obs. of 1 variable:

 

## $ profile_id: chr "47145" "47146" "47147" "47148" ...


str(ds.aqp@horizons)


## 'data.frame': 76575 obs. of 15 variables:
## $ gml_id : chr "wosis_latest_bdfi33.597709" "wosis_latest_bdfi33.597710" "wosis_latest_bdfi33.597711" "wosis_latest_bdfi33.597712" ...
## $ profile_id: int 47145 47145 47145 47145 47145 47145 47146 47146 47146 47146 ...
## $ profile_la: int 597709 597710 597711 597712 597713 597714 597715 597716 597717 597718 ...
## $ country_na: chr "Burundi" "Burundi" "Burundi" "Burundi" ...
## $ upper_dept: int 0 15 40 70 110 170 0 12 30 55 ...
## $ lower_dept: int 15 40 70 110 170 210 12 30 55 90 ...
## $ layer_name: chr "A" "Bho" "Bo1" "Bo2" ...
## $ bdfi33_val: chr "{1:1.12,2:1.08,3:1.41}" "{1:1.04,2:1.36,3:0.99}" "{1:0.97,2:0.95,3:1.25}" "{1:1.14,2:1.12,3:1.43}" ...
## $ bdfi33_v_1: num 1.2 1.13 1.06 1.23 1.23 1.4 1.18 1.18 1.03 1.12 ...
## $ bdfi33_met: chr "{\"1:sample type = not specified, corrections = not specified, calculation = not specified, measurement conditi"| __truncated__ "{\"1:sample type = not specified, corrections = not specified, calculation = not specified, measurement conditi"| __truncated__ "{\"1:sample type = not specified, corrections = not specified, calculation = not specified, measurement conditi"| __truncated__ "{\"1:sample type = not specified, corrections = not specified, calculation = not specified, measurement conditi"| __truncated__ ...
## $ bdfi33_dat: chr "{1:1983-10-03,2:1983-10-03,3:1983-10-03}" "{1:1983-10-03,2:1983-10-03,3:1983-10-03}" "{1:1983-10-03,2:1983-10-03,3:1983-10-03}" "{1:1983-10-03,2:1983-10-03,3:1983-10-03}" ...
## $ bdfi33_d_1: chr "US-NCSS" "US-NCSS" "US-NCSS" "US-NCSS" ...
## $ bdfi33_pro: chr "84P0286" "84P0286" "84P0286" "84P0286" ...
## $ bdfi33_lic: chr "CC-BY" "CC-BY" "CC-BY" "CC-BY" ...
## $ hzID : int 1 2 3 4 5 6 7 8 9 10 ...


head(ds.aqp@site)


## profile_id
## 1 47145
## 2 47146
## 3 47147
## 4 47148
## 5 47149
## 6 47150


head(ds.aqp@horizons[c(2,5,6,7,9)],12)


## profile_id upper_dept lower_dept layer_name bdfi33_v_1
## 78 47145 0 15 A 1.20
## 79 47145 15 40 Bho 1.13
## 80 47145 40 70 Bo1 1.06
## 81 47145 70 110 Bo2 1.23
## 82 47145 110 170 Bo3 1.23
## 83 47145 170 210 Bt 1.40
## 84 47146 0 12 A 1.18
## 85 47146 12 30 Bho 1.18
## 86 47146 30 55 Bo1 1.03
## 87 47146 55 90 Bo2 1.12
## 88 47146 90 150 Bt 1.27
## 89 47146 150 200 R / Bt 1.56

Note how the horizons have been grouped into sites, in the @site slot, and the per-horizon (by depth) values are in the @horizons slot. Here we have 76575 horizons in 14602 profiles.

Now this SoilProfileCollection can be used for many aqp functions. For example, here is the depth distribution of average bulk density of the components for the first 24 listed profiles, labelled by genetic horizon:


plot(ds.aqp[1:24,], name="layer_name", color='bdfi33_v_1')

A few layers in this set of profiles are missing bulk denisity.

4 WoSIS layers as CSV files

Another format provided by WoSIS is the CSV ‘comma-separated values’ file.

 

4.1 Read a layer into the client system as a CSV file

We can instead read point information into a text file. Here the profile information for the 2∘×2∘ tile in central Europe:

(Note: overwrite=TRUE does not seem to work for .csv files, so any previous must be manually deleted).


wosis.dir.name <- "./wosis"
layer.name <- "wosis_latest_profiles"
target.name <- paste0(wosis.dir.name,"/",layer.name,".csv")
if (file.exists(target.name)) { file.remove(target.name) }


## [1] TRUE


ogr2ogr(src=wfs, 
        spat=c(6, 48, 8, 50),
        dst=target.name,
        layer=layer.name,
        f="CSV",
        overwrite=TRUE)


## character(0)

And the bulk density:


layer.name <- "wosis_latest_bdfi33"
target.name <- paste0(wosis.dir.name,"/",layer.name,".csv")
if (file.exists(target.name)) { file.remove(target.name) }


## [1] TRUE


ogr2ogr(src=wfs,
        dst=paste0(wosis.dir.name,"/",layer.name,".csv"),
        layer=layer.name,
        f="CSV",
        overwrite=TRUE)


## character(0)

 

 

4.2 Read from a local text file into R objects

The read.csv function reads from a CSV file into an R data.frame.

First, the bulk density points:


layer.name <- "wosis_latest_bdfi33"
bd33.pts <- read.csv(file=paste0(wosis.dir.name,"/",layer.name,".csv"),
                stringsAsFactors = FALSE)
dim(bd33.pts)


## [1] 76575 14


names(bd33.pts)


## [1] "gml_id" "profile_id" "profile_layer_id"
## [4] "country_name" "upper_depth" "lower_depth"
## [7] "layer_name" "bdfi33_value" "bdfi33_value_avg"
## [10] "bdfi33_method" "bdfi33_date" "bdfi33_dataset_id"
## [13] "bdfi33_profile_code" "bdfi33_license"

Notice that here we have the complete field names, not truncated as they were in the shapefiles.

These can now be processed as above (the shapefiles).

Second, the profiles (sites):


layer.name <- "wosis_latest_profiles"
profiles <- read.csv(paste0(wosis.dir.name, "/",layer.name,".csv"),
                stringsAsFactors = FALSE)
names(profiles)


##  [1] "gml_id"                         "profile_id"                    
##  [3] "dataset_id"                     "country_id"                    
##  [5] "country_name"                   "geom_accuracy"                 
##  [7] "latitude"                       "longitude"                     
##  [9] "dsds"                           "cfao_version"                  
## [11] "cfao_major_group_code"          "cfao_major_group"              
## [13] "cfao_soil_unit_code"            "cfao_soil_unit"                
## [15] "cwrb_version"                   "cwrb_reference_soil_group_code"
## [17] "cwrb_reference_soil_group"      "cwrb_prefix_qualifier"         
## [19] "cwrb_suffix_qualifier"          "cstx_version"                  
## [21] "cstx_order_name"                "cstx_suborder"                 
## [23] "cstx_great_group"               "cstx_subgroup"


coordinates(profiles) <- c("longitude", "latitude")
proj4string(profiles) <- CRS("+init=epsg:4326")

Review some site information, e.g., the WRB Reference Soil Groups:


table(profiles@data$cwrb_reference_soil_group)


## 
## Albeluvisols     Andosols    Cambisols    Fluvisols     Gleysols 
##            1            1            7            1            2 
##    Histosols     Luvisols    Phaeozems      Podzols     Regosols 
##            1            3            1            6            1 
##    Umbrisols    Vertisols 
##            1            1

And a map:


spplot(profiles, zcol="cwrb_reference_soil_group", key.space="right")

Contact: