Reading data
Info
This guide assumes that you understand what a database is and how you can use SQL to retrieve data from a database. Not the case? Check the SQL Basics first before continuing here.
Worlds¶
Each world in Nexus is stored in a separate database schema.
Vector features¶
Each feature type has its own database table in Nexus. Thus, all individual features of that type are stored in this table. This allows a convenient way to retrieve the features from a particular type.
As an example, let's get a list of all weather stations:
select * from sandbox.weather_station limit 100
Of course all common SQL expressions are available, see the SQL Basics for an overview.
Categorical data¶
Features may contain properties with categorical values. In this case Nexus maintains a separate helper table
that contains a list of all available categories for that property. Helper tables always follow the naming format {feature_type}_{categorical_column}.
As an example, feature type water_body has a property called category, which can only contain a limited set of values.
The helper table waterbody_category lists those possible values, which are sea, waterway, lake and ditch.
Timeseries¶
Nexus stores timeseries metadata in the timeseries table and the timeseries events in the event table.
A typical workflow is therefore to always retrieve timeseries in two separate steps; first to query the timeseries table and obtain the IDs of any timeseries of interest, and then in a second query to obtain the actual events.
Let's try to play around with some meteorological data¶
Normally, weather stations record a variety of meteorological parameters as so-called timeseries. Let's see if our weather stations also have measurement timeseries:
select * from sandbox.timeseries
where feature_type = 'weather_station'
Okay, that's a lot of results. Maybe we need to filter a bit on the type of parameter that was measured. Let's first see which parameters are available:
select * from sandbox.timeseries_parameter
It turns out that we're interested in precipitation, which is available in Nexus as parameter 'prec_mm_d'. So to get all precipitation timeseries from all weather stations we do:
select * from sandbox.timeseries
where feature_type = 'weather_station' and parameter = 'prec_mm_d'
Note however that we only get the timeseries metadata. If we want the actual measurements, or 'events', we should use the unique ID of the timeseries that we found to get its events. So suppose the first timeseries ID is 5, then to get all its measurements we do:
select * from sandbox.event where timeseries_id = 5
Likewise, we could also request a whole range of events at once:
select * from sandbox.event where timeseries_id in (5, 6, 7)
To put it all together, a workflow that we could have is to first get all weather stations, then for each weather station checking which timeseries are available, and finally getting those measurements. In that way, our results stay manageable, and we can process small batches of data each time.
Rasters¶
In Nexus, raster data take a special place. Their metadata is stored in Nexus Database, but the actual raster files are stored in a separate Datastore. This Datastore is optimized for heavy processing and visualization. Rasters are always stored as COGs (Cloud Optimized Geotiffs). Retrieving rasters, just like timeseries, always goes in two steps:
- Selecting the rasters you want by using the
rastertable - Fetching the actual rasters from the Datastore
Selecting rasters¶
First we always get the metadata through the raster table:
select * from sandbox.raster limit 100
In this query we can filter what we need, for example by selecting on parameter, timestamp (or date range) and
geographical extent (the polygon, or footprint). Then, from the selection of rasters that we actually need, we get
the location field of each raster and use it to retrieve the actual GeoTIFF file from the Nexus Datastore.
The value in the location-column of the raster table contains the raster filename in the Datastore.
Fetching rasters¶
Actual rasters are stored as COGs (Cloud Optimized Geotiffs). You can obtain them using common raster tools, such as:
Warning
Some versions of Rasterio scan the directory content when retrieving a single COG, which is both unnecessary and a huge performance hit. If retrieving a raster takes more than a few seconds, consider setting the following environment variable in Rasterio to prevent this behaviour:
with rasterio.Env(session, GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR') as env:
...
Files¶
Like rasters, files are not stored in the database itself, but on Nexus Datastore. To obtain files,
simply query the file table, filter the files you want, and retrieve the actual files from the Datastore.