smap_io¶
SMAP (Soil Moisture Active Passive) data readers.
Works great in combination with pytesmo.
Citation¶
If you use the software in a publication then please cite it using the Zenodo DOI. Be aware that this badge links to the latest package version.
Please select your specific version at https://doi.org/10.5281/zenodo.596391 to get the DOI of that version. You should normally always use the DOI for the specific version of your record in citations. This is to ensure that other researchers can access the exact research artefact you used for reproducibility.
You can find additional information regarding DOI versioning at http://help.zenodo.org/#versioning
Installation¶
Setup of a complete environment with conda can be performed using the following commands:
$ conda create -q -n smap_io -c conda-forge numpy h5py pyproj netcdf4 pyresample pandas
$ source activate smap_io
$ pip install smap_io
You can also install all needed (conda and pip) dependencies at once using the following commands after cloning this repository. This is recommended for developers of the package.
$ git clone https://github.com/TUW-GEO/smap_io.git --recursive
$ cd smap_io
$ conda create -n smap_io python=3.6 # or any supported python version
$ source activate smap_io
$ conda env update -f environment.yml
$ python setup.py develop
Supported Products¶
- SPL3SMP: SMAP L3 Radiometer Global Daily 36 km EASE-Grid Soil Moisture
Additional products will be added when the need arises. Feel free to open an issue to request a new data product or, even better, submit a pull request.
Contribute¶
We are happy if you want to contribute. Please raise an issue explaining what is missing or if you find a bug. We will also gladly accept pull requests against our master branch for new features or bug fixes.
Guidelines¶
If you want to contribute please follow these steps:
- Fork the smap_io repository to your account.
- Make a new feature branch from the smap_io master branch.
- Add your feature.
- Please include tests for your contributions in one of the test directories. We use py.test, so a simple function called test_my_feature is enough.
- Submit a pull request to our master branch.
Downloading products¶
SMAP products can be downloaded via HTTPS. You have to register an account with NASA’s Earthdata portal. Instructions can be found here.
After that you can use the command line program smap_download with your username and password to download data between 2 dates.
The following command would download all available h5 files of the latest SMAP SPL3SMP data into the folder ~/workspace/smap_data. For more options and other available parameters run smap_download --help.
mkdir ~/workspace/smap_data
smap_download ~/workspace/smap_data --username *name* --password *password*
Reading images¶
SPL3SMP¶
After downloading the data you will have a path with subpaths of the format YYYY.MM.DD. Let’s call this path root_path. To read ‘soil_moisture’ data for the descending overpass of a certain date use the following code:
from smap_io import SPL3SMP_Ds
from datetime import datetime
import os
root_path = os.path.join(os.path.dirname(__file__),
                         'test_data', 'SPL3SMP')
ds = SPL3SMP_Ds(root_path, overpass=None, var_overpass_str=False)
image = ds.read(datetime(2015, 4, 1))
assert list(image.data.keys()) == ['soil_moisture']
assert image.data['soil_moisture'].shape == (406, 964)
The returned image is of the type pygeobase.Image, which is only a small wrapper around a dictionary of numpy arrays.
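Because the downloaded data is organised in YYYY.MM.DD subfolders, it can be useful to check which dates are present under root_path before reading. The following is a minimal sketch using only the standard library. The helper name available_dates is hypothetical and not part of smap_io, and the demo runs on a temporary directory rather than real SMAP data.

```python
import os
import tempfile
from datetime import datetime


def available_dates(root_path):
    """Return the sorted dates for which a YYYY.MM.DD subfolder exists."""
    dates = []
    for name in os.listdir(root_path):
        if not os.path.isdir(os.path.join(root_path, name)):
            continue
        try:
            dates.append(datetime.strptime(name, '%Y.%m.%d'))
        except ValueError:
            # skip folders that do not follow the YYYY.MM.DD pattern
            continue
    return sorted(dates)


# demo on a throwaway directory tree (stand-in for a real root_path)
demo_root = tempfile.mkdtemp()
for sub in ('2015.04.02', '2015.04.01', 'not_a_date'):
    os.makedirs(os.path.join(demo_root, sub))
demo_dates = available_dates(demo_root)
print(demo_dates)
```

Each returned datetime could then be passed to the read method of the dataset object shown above.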
If you only have a single image you can also read the data directly:
from smap_io import SPL3SMP_Img
import os
fname = os.path.join(os.path.dirname(__file__),
                     'test_data', 'SPL3SMP', '2015.04.01',
                     'SMAP_L3_SM_P_20150401_R13080_001.h5')
ds = SPL3SMP_Img(fname, overpass=None, var_overpass_str=False)
image = ds.read()
assert list(image.data.keys()) == ['soil_moisture']
assert image.data['soil_moisture'].shape == (406, 964)
Conversion to time series format¶
For many applications it is favorable to convert the image-based format into a format that is optimized for fast time series retrieval, which is what we often need for e.g. validation studies. This can be done by stacking the images into a netCDF file and choosing the correct chunk sizes, or by a number of other methods. We have chosen to do it in the following way:
- Store the grid points in a 1D array. This also allows reduction of the data volume by e.g. only saving the points over land.
- Store the time series in netCDF4 files following the Climate and Forecast (CF) convention's orthogonal multidimensional array representation.
- Store the time series in 5x5 degree cells. This means there will be 2566 cell files and a file called grid.nc which contains the information about which grid point is stored in which file. This allows us to read a whole 5x5 degree area into memory and iterate over the time series quickly.
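To illustrate how grid points can be grouped into 5x5 degree cells, here is a minimal sketch. The numbering scheme (columns counted west to east, rows south to north) and the function name cell_index are assumptions for illustration only. The authoritative mapping between grid points and cell files is the one stored in grid.nc.

```python
import math


def cell_index(lon, lat, cellsize=5.0):
    """Hypothetical cell numbering: columns west to east, rows south to north."""
    n_cols = int(360 / cellsize)
    n_rows = int(180 / cellsize)
    # clamp so that lon=180 / lat=90 fall into the last column / row
    col = min(max(math.floor((lon + 180.0) / cellsize), 0), n_cols - 1)
    row = min(max(math.floor((lat + 90.0) / cellsize), 0), n_rows - 1)
    return col * n_rows + row


# group a few example points by the cell they fall into
points = [(45.0, 15.0), (46.2, 16.9), (-120.3, 38.5)]
by_cell = {}
for lon, lat in points:
    by_cell.setdefault(cell_index(lon, lat), []).append((lon, lat))
print(by_cell)
```

Points that share a cell number would end up in the same cell file, so reading one file gives access to all of their time series at once.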
SPL3SMP¶
This conversion can be performed using the smap_repurpose command line program. An example would be:
smap_repurpose /SPL3SMP_data /timeseries/data 2015-04-01 2015-04-02 soil_moisture soil_moisture_error --overpass AM
This would take SMAP SPL3SMP data stored in /SPL3SMP_data from April 1st 2015 to April 2nd 2015 and store the parameters soil_moisture and soil_moisture_error for the AM overpass as time series in the folder /timeseries/data. When the PM overpass is selected, time series variables will be renamed with the suffix _pm.
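The renaming rule for the PM overpass can be sketched as follows. This is only an illustration of the naming described above, not the code smap_repurpose actually runs, and the function name ts_variable_names is hypothetical.

```python
def ts_variable_names(parameters, overpass):
    """Map image parameter names to the names used in the time series files."""
    # only the PM overpass gets a suffix; AM names are kept unchanged
    suffix = '_pm' if overpass == 'PM' else ''
    return {name: name + suffix for name in parameters}


params = ['soil_moisture', 'soil_moisture_error']
am_names = ts_variable_names(params, 'AM')
pm_names = ts_variable_names(params, 'PM')
print(pm_names)
```

When reading PM time series later on, the suffixed names are the ones to request.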
Conversion to time series is performed by the repurpose package in the background. For custom settings or other options see the repurpose documentation and the code in smap_io.reshuffle.
Note: If a RuntimeError: NetCDF: Bad chunk sizes. appears during reshuffling, consider downgrading the netcdf4 library via:
conda install -c conda-forge netcdf4=1.2.2
Reading converted time series data¶
To read the time series data that the smap_repurpose command produces, the class SMAPTs can be used. Optional arguments for the parent class (OrthoMultiTs, as defined in pynetcf.time_series) can be passed as well:
from smap_io.interface import SMAPTs
# folder that smap_repurpose wrote the time series to
ts_path = '/timeseries/data'
ds = SMAPTs(ts_path, parameters=['soil_moisture', 'soil_moisture_error'],
            ioclass_kws={'read_bulk': True})
# read_ts takes either lon, lat coordinates or a grid point index
# and returns a pandas.DataFrame
ts = ds.read_ts(45, 15)  # (lon, lat)
Bulk reading speeds up reading multiple points from a cell file by storing the file in memory for subsequent calls.
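The idea behind bulk reading can be illustrated with a toy example. This is a sketch of the caching concept only, not the implementation used by SMAPTs or pynetcf. The class CachedCellReader, the function fake_load, and the values it returns are all made up for the demonstration.

```python
class CachedCellReader:
    """Toy reader that keeps the most recently loaded cell in memory."""

    def __init__(self, load_cell):
        self._load_cell = load_cell  # callable: cell number -> dict of series
        self._cell = None
        self._data = None
        self.loads = 0  # number of (simulated) file reads

    def read(self, cell, gpi):
        if cell != self._cell:
            # cache miss: load the whole cell once and remember it
            self._data = self._load_cell(cell)
            self._cell = cell
            self.loads += 1
        return self._data[gpi]


def fake_load(cell):
    # stand-in for opening a cell file; returns made-up values per grid point
    return {gpi: [cell, gpi] for gpi in range(3)}


reader = CachedCellReader(fake_load)
series = [reader.read(1641, gpi) for gpi in range(3)]  # one load for three reads
reader.read(421, 0)  # different cell, triggers a second load
print(reader.loads)
```

Reading points in cell order therefore keeps the number of file loads low, while jumping back and forth between cells would reload a file on every switch.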