smap_io¶

https://readthedocs.org/projects/smap-io/badge/?version=latest

SMAP (Soil Moisture Active Passive) data readers.

Works great in combination with pytesmo.

Citation¶

If you use the software in a publication then please cite it using the Zenodo DOI. Be aware that this badge links to the latest package version.

Please select your specific version at https://doi.org/10.5281/zenodo.596391 to get the DOI of that version. You should normally always use the DOI for the specific version of your record in citations. This is to ensure that other researchers can access the exact research artefact you used for reproducibility.

You can find additional information regarding DOI versioning at http://help.zenodo.org/#versioning

Installation¶

Setup of a complete environment with conda can be performed using the following commands:

$ conda create -q -n smap_io -c conda-forge numpy h5py pyproj netcdf4 pyresample pandas
$ source activate smap_io
$ pip install smap_io

You can also install all needed (conda and pip) dependencies at once using the following commands after cloning this repository. This is recommended for developers of the package.

$ git clone https://github.com/TUW-GEO/smap_io.git --recursive
$ cd smap_io
$ conda create -n smap_io python=3.6 # or any supported python version
$ source activate smap_io
$ conda update -f environment.yml
$ python setup.py develop

Supported Products¶

SPL3SMP: SMAP L3 Radiometer Global Daily 36 km EASE-Grid Soil Moisture

Additional products will we added when need arises, feel free to open an issue to add a new data product or even better a pull request.

Contribute¶

We are happy if you want to contribute. Please raise an issue explaining what is missing or if you find a bug. We will also gladly accept pull requests against our master branch for new features or bug fixes.

Guidelines¶

If you want to contribute please follow these steps:

Fork the smap_io repository to your account
make a new feature branch from the smap_io master branch
Add your feature
please include tests for your contributions in one of the test directories We use py.test so a simple function called test_my_feature is enough
submit a pull request to our master branch

Downloading products¶

SMAP products can be downloaded via HTTPS. You have to register an account with NASA’s Earthdata portal. Instructions can be found here.

After that you can use the command line program smap_download and your username and password to download data between 2 dates.

The following command would download all available h5 files of the latest SMAP SPL3SMP data into the folder ~/workspace/smap_data. For more options on other available parameters run smap_download --help.

mkdir ~/workspace/smap_data
smap_download ~/workspace/smap_data  --username *name* --password *password*

Reading images¶

SPL3SMP ¶

After downloading the data you will have a path with subpaths of the format YYYY.MM.DD. Let’s call this path root_path. To read ‘soil_moisture’ data for the descending overpass of a certain date use the following code:

from smap_io import SPL3SMP_Ds
from datetime import datetime
import os
root_path = os.path.join(os.path.dirname(__file__),
                         'test_data', 'SPL3SMP')
ds = SPL3SMP_Ds(root_path, overpass=None, var_overpass_str=False)
image = ds.read(datetime(2015, 4, 1))
assert list(image.data.keys()) == ['soil_moisture']
assert image.data['soil_moisture'].shape == (406, 964)

The returned image is of the type pygeobase.Image. Which is only a small wrapper around a dictionary of numpy arrays.

If you only have a single image you can also read the data directly

from smap_io import SPL3SMP_Img
import os
fname = os.path.join(os.path.dirname(__file__),
                     'test_data', 'SPL3SMP', '2015.04.01',
                     'SMAP_L3_SM_P_20150401_R13080_001.h5')
ds = SPL3SMP_Img(fname, overpass=None, var_overpass_str=False)
image = ds.read()
assert list(image.data.keys()) == ['soil_moisture']
assert image.data['soil_moisture'].shape == (406, 964)

Conversion to time series format¶

For a lot of applications it is favorable to convert the image based format into a format which is optimized for fast time series retrieval. This is what we often need for e.g. validation studies. This can be done by stacking the images into a netCDF file and choosing the correct chunk sizes or a lot of other methods. We have chosen to do it in the following way:

Store the grid points in a 1D array. This also allows reduction of the data volume by e.g. only saving the points over land.
Store the time series in netCDF4 in the Climate and Forecast convention Orthogonal multidimensional array representation
Store the time series in 5x5 degree cells. This means there will be 2566 cell files and a file called grid.nc which contains the information about which grid point is stored in which file. This allows us to read a whole 5x5 degree area into memory and iterate over the time series quickly.

SPL3SMP ¶

This conversion can be performed using the smap_repurpose command line program. An example would be:

smap_repurpose /SPL3SMP_data /timeseries/data 2015-04-01 2015-04-02 soil_moisture soil_moisture_error --overpass AM

Which would take SMAP SPL3SMP data stored in /SPL3SMP_data from April 1st 2015 to April 2nd 2015 and store the parameters soil_moisture and soil_moisture_error for the AM overpass as time series in the folder /timeseries/data. When the PM overpass is selected, time series variables will be renamed with the suffix _pm.

Conversion to time series is performed by the repurpose package in the background. For custom settings or other options see the repurpose documentation and the code in smap_io.reshuffle.

Note: If a RuntimeError: NetCDF: Bad chunk sizes. appears during reshuffling, consider downgrading the netcdf4 library via:

conda install -c conda-forge netcdf4=1.2.2

Reading converted time series data¶

For reading the data the smap_repurpose command produces the class SMAPTs can be used. Optional arguments that are passed to the parent class (OrthoMultiTs, as defined in pynetcf.time_series) can be passed as well:

from smap_io.interface import SMAPTs
ds = SMAPTs(ts_path, parameters=['soil_moisture','soil_moisture_error'],
            ioclass_kws={'read_bulk': True})
# read_ts takes either lon, lat coordinates or a grid point indices.
# and returns a pandas.DataFrame
ts = ds.read_ts(45, 15) # (lon, lat)

Bulk reading speeds up reading multiple points from a cell file by storing the file in memory for subsequent calls.

smap_io¶

Citation¶

Installation¶

Supported Products¶

Contribute¶

Guidelines¶

Downloading products¶

Reading images¶

SPL3SMP ¶

Conversion to time series format¶

SPL3SMP ¶

Reading converted time series data¶

Contents¶

Indices and tables¶

smap_io¶

Citation¶

Installation¶

Supported Products¶

Contribute¶

Guidelines¶

Downloading products¶

Reading images¶

SPL3SMP¶

Conversion to time series format¶

SPL3SMP¶

Reading converted time series data¶

Contents¶

Indices and tables¶

SPL3SMP ¶

SPL3SMP ¶