ulmo Readers

ulmo readers / APIs.

note on dates and times

Dates and times can be provided in a few different ways, depending on what is convenient: either as string representations or as instances of the date and datetime classes from Python’s datetime standard library module. For strings, the ISO 8601 format (‘YYYY-mm-dd HH:MM:SS’ or an abbreviated version such as ‘YYYY-mm-dd’) is accepted, as well as dates in ‘mm/dd/YYYY’ format.
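As an illustration of the accepted forms, the sketch below coerces each of them into a datetime. The helper name normalize_date and its exact list of format strings are assumptions for this example; ulmo's own parsing may accept more variants.

```python
from datetime import date, datetime

# String layouts named in the note; ulmo itself may accept additional variants.
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d", "%m/%d/%Y")


def normalize_date(value):
    """Hypothetical helper: coerce a str, date or datetime into a datetime."""
    # datetime is a subclass of date, so test for it first.
    if isinstance(value, datetime):
        return value
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day)
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date string: {value!r}")
```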

Readers for Global to USA-national data

Climate Prediction Center (CPC) Weekly Drought

Climate Prediction Center Weekly Drought Index dataset

ulmo.cpc.drought.get_data(state=None, climate_division=None, start=None, end=None, as_dataframe=False)

Retrieves data.

Parameters:
  • state (None or str) – If specified, results will be limited to the state corresponding to the given 2-character state code.
  • climate_division (None or int) – If specified, results will be limited to the climate division.
  • start (None or date (see note on dates and times)) – Results will be limited to those after the given date. Default is the start of the current calendar year.
  • end (None or date (see note on dates and times)) – If specified, results will be limited to data before this date.
  • as_dataframe (bool) – If False (default), a dict with a nested set of dicts will be returned with data indexed by state, then climate division. If True then a pandas.DataFrame object will be returned. The pandas dataframe is used internally, so setting this to True is a little bit faster as it skips a serialization step.
Returns:

data – A dict or pandas.DataFrame representing the data. See the as_dataframe parameter for more.

Return type:

dict or pandas.DataFrame
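When as_dataframe is False, the nested dicts can be walked state by state and then division by division. The sketch below flattens that layout into (state, climate_division, values) rows; flatten_drought is a hypothetical helper, and it treats each division's values as opaque because their inner structure is not described here.

```python
def flatten_drought(data):
    """Hypothetical helper: flatten {state: {division: values}} into rows.

    Assumes the as_dataframe=False layout described above; per-division
    values are passed through untouched.
    """
    rows = []
    for state in sorted(data):
        for division in sorted(data[state]):
            rows.append((state, division, data[state][division]))
    return rows


# Typical call (requires ulmo and network access):
#   import ulmo
#   data = ulmo.cpc.drought.get_data(state="TX", start="2020-01-01")
#   rows = flatten_drought(data)
```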

CUAHSI Hydrologic Information System (HIS)

CUAHSI HIS Central

CUAHSI HIS Central catalog web services

ulmo.cuahsi.his_central.get_services(bbox=None, user_cache=False)

Retrieves a list of services.

Parameters:
  • bbox (None or 4-tuple) – Optional bounding box covering the area in which to look for services. This should be a tuple of (min_longitude, min_latitude, max_longitude, max_latitude), with values in decimal degrees. If not provided, the full set of services will be queried from HIS Central.
  • user_cache (bool) – If False (default), use the system temp location to store cached WSDL and other files. Use the default user ulmo directory if True.
Returns:

services_dicts – A list of dicts that each contain information on an individual service.

Return type:

list
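Because the bbox order is easy to get wrong, a small constructor can enforce it. make_bbox is a hypothetical helper; it assumes a box that does not cross the antimeridian, so it rejects min_longitude >= max_longitude.

```python
def make_bbox(min_lon, min_lat, max_lon, max_lat):
    """Hypothetical helper: build the 4-tuple bbox for get_services.

    Values are decimal degrees in (min_longitude, min_latitude,
    max_longitude, max_latitude) order.
    """
    if not (-180 <= min_lon < max_lon <= 180):
        raise ValueError("longitudes must satisfy -180 <= min_lon < max_lon <= 180")
    if not (-90 <= min_lat < max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min_lat < max_lat <= 90")
    return (float(min_lon), float(min_lat), float(max_lon), float(max_lat))


# Typical call (requires ulmo and network access):
#   import ulmo
#   services = ulmo.cuahsi.his_central.get_services(
#       bbox=make_bbox(-97.9, 30.1, -97.5, 30.5))
```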

CUAHSI WaterOneFlow (WOF)

CUAHSI WaterOneFlow (WOF) web data access services. These services provide access to a wide variety of data sources that use the standardized WOF service protocol. Most such services are registered with the CUAHSI HIS Central catalog and can be identified via queries using the ulmo.cuahsi.his_central.get_services catalog web service. Each WOF service may have some unique characteristics, such as specific regional and temporal domains, set of variables, or additional constraints. The notes below provide additional usage details for some data sources.

  • NRCS SNOTEL: USDA Natural Resources Conservation Service (NRCS) Snow Telemetry network of remote, high-elevation mountain sites in the western U.S., used to monitor snowpack, precipitation, temperature and other climatic conditions. Timestamps in the request and data response are in PST (UTC-8).
ulmo.cuahsi.wof.get_sites(wsdl_url, suds_cache=('default', ), timeout=None, user_cache=False)

Retrieves information on the sites that are available from a WaterOneFlow service using a GetSites request. For more detailed information including which variables and time periods are available for a given site, use get_site_info().

Parameters:
  • wsdl_url (str) – URL of a service’s web service definition language (WSDL) description. All WaterOneFlow services publish a WSDL description and this url is the entry point to the service.
  • suds_cache (None or tuple) – SOAP local cache duration for WSDL description and client object. Pass a cache duration tuple like (‘days’, 3) to set a custom duration. Duration may be in months, weeks, days, hours, or seconds. If unspecified, the default duration (1 day) will be used. Use None to turn off caching.
  • timeout (int or float) – suds SOAP URL open timeout (seconds). If unspecified, the suds default (90 seconds) will be used.
  • user_cache (bool) – If False (default), use the system temp location to store cached WSDL and other files. Use the default user ulmo directory if True.
Returns:

sites_dict – a python dict with site codes mapped to site information

Return type:

dict

ulmo.cuahsi.wof.get_site_info(wsdl_url, site_code, suds_cache=('default', ), timeout=None, user_cache=False)

Retrieves detailed site information from a WaterOneFlow service using a GetSiteInfo request.

Parameters:
  • wsdl_url (str) – URL of a service’s web service definition language (WSDL) description. All WaterOneFlow services publish a WSDL description and this url is the entry point to the service.
  • site_code (str) – Site code of the site you’d like to get more information for. Site codes MUST contain the network and be of the form <network>:<site_code>, as is required by WaterOneFlow.
  • suds_cache (None or tuple) – SOAP local cache duration for WSDL description and client object. Pass a cache duration tuple like (‘days’, 3) to set a custom duration. Duration may be in months, weeks, days, hours, or seconds. If unspecified, the default duration (1 day) will be used. Use None to turn off caching.
  • timeout (int or float) – suds SOAP URL open timeout (seconds). If unspecified, the suds default (90 seconds) will be used.
  • user_cache (bool) – If False (default), use the system temp location to store cached WSDL and other files. Use the default user ulmo directory if True.
Returns:

site_info – a python dict containing site information

Return type:

dict
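Site and variable codes both take a prefixed form. A minimal sketch of building them follows; wof_code is a hypothetical helper, and the network, site, and variable names in the usage comment are placeholders rather than real identifiers.

```python
def wof_code(prefix, code):
    """Hypothetical helper: build '<network>:<site_code>' or
    '<vocabulary>:<variable_code>' strings as WaterOneFlow requires."""
    if not prefix or not code:
        raise ValueError("both prefix and code are required")
    if ":" in prefix:
        raise ValueError("prefix must not contain ':'")
    return f"{prefix}:{code}"


# Typical calls (require ulmo and network access; wsdl_url is a placeholder):
#   import ulmo
#   info = ulmo.cuahsi.wof.get_site_info(wsdl_url, wof_code("NETWORK", "SITE"))
#   values = ulmo.cuahsi.wof.get_values(wsdl_url, wof_code("NETWORK", "SITE"),
#                                       wof_code("VOCABULARY", "VARIABLE"))
```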

ulmo.cuahsi.wof.get_values(wsdl_url, site_code, variable_code, start=None, end=None, suds_cache=('default', ), timeout=None, user_cache=False)

Retrieves site values from a WaterOneFlow service using a GetValues request.

Parameters:
  • wsdl_url (str) – URL of a service’s web service definition language (WSDL) description. All WaterOneFlow services publish a WSDL description and this url is the entry point to the service.
  • site_code (str) – Site code of the site you’d like to get values for. Site codes MUST contain the network and be of the form <network>:<site_code>, as is required by WaterOneFlow.
  • variable_code (str) – Variable code of the variable you’d like to get values for. Variable codes MUST contain the vocabulary and be of the form <vocabulary>:<variable_code>, as is required by WaterOneFlow.
  • start (None or datetime (see note on dates and times)) – Start of the query datetime range. If omitted, data from the start of the time series to the end timestamp will be returned (but see the caveat in the note below).
  • end (None or datetime (see note on dates and times)) – End of the query datetime range. If omitted, data from the start timestamp to the end of the time series will be returned (but see the caveat in the note below).
  • suds_cache (None or tuple) – SOAP local cache duration for WSDL description and client object. Pass a cache duration tuple like (‘days’, 3) to set a custom duration. Duration may be in months, weeks, days, hours, or seconds. If unspecified, the default duration (1 day) will be used. Use None to turn off caching.
  • timeout (int or float) – suds SOAP URL open timeout (seconds). If unspecified, the suds default (90 seconds) will be used.
  • user_cache (bool) – If False (default), use the system temp location to store cached WSDL and other files. Use the default user ulmo directory if True.
Returns:

site_values – a python dict containing values

Return type:

dict

Notes

If both start and end are omitted, the entire available time series will typically be returned. However, some service providers return an error if either start or end is omitted; this is especially true for services hosted or redirected by CUAHSI via the CUAHSI HydroPortal, whose WSDL URLs use the domain https://hydroportal.cuahsi.org. For HydroPortal, a start datetime of ‘1753-01-01’ has been known to return valid results while capturing the oldest start times, though the response may be broken up into chunks (‘paged’).
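For providers that reject open-ended requests, one approach is to always send explicit bounds. The sketch below fills a missing start with the ‘1753-01-01’ value mentioned above and a missing end with today's date; hydroportal_range is a hypothetical helper, and using today's date as the end bound is an assumption rather than documented behavior.

```python
from datetime import date

HYDROPORTAL_EARLIEST = "1753-01-01"  # start value reported in the note above


def hydroportal_range(start=None, end=None):
    """Hypothetical helper: return explicit (start, end) strings for services
    that reject GetValues requests with a missing bound."""
    if start is None:
        start = HYDROPORTAL_EARLIEST
    if end is None:
        end = date.today().isoformat()  # assumption: today as the upper bound
    return start, end


# Typical call (requires ulmo and network access; arguments are placeholders):
#   import ulmo
#   start, end = hydroportal_range()
#   values = ulmo.cuahsi.wof.get_values(wsdl_url, site_code, variable_code,
#                                       start=start, end=end)
```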

ulmo.cuahsi.wof.get_variable_info(wsdl_url, variable_code=None, suds_cache=('default', ), timeout=None, user_cache=False)

Retrieves variable information from a WaterOneFlow service using a GetVariableInfo request.

Parameters:
  • wsdl_url (str) – URL of a service’s web service definition language (WSDL) description. All WaterOneFlow services publish a WSDL description and this url is the entry point to the service.
  • variable_code (None or str) – If None (default), then information on all variables will be returned; otherwise, this should be set to the variable code of the variable you’d like to get more information on. Variable codes MUST contain the vocabulary and be of the form <vocabulary>:<variable_code>, as is required by WaterOneFlow.
  • suds_cache (None or tuple) – SOAP local cache duration for WSDL description and client object. Pass a cache duration tuple like (‘days’, 3) to set a custom duration. Duration may be in months, weeks, days, hours, or seconds. If unspecified, the default duration (1 day) will be used. Use None to turn off caching.
  • timeout (int or float) – suds SOAP URL open timeout (seconds). If unspecified, the suds default (90 seconds) will be used.
  • user_cache (bool) – If False (default), use the system temp location to store cached WSDL and other files. Use the default user ulmo directory if True.
Returns:

variable_info – a python dict containing variable information. If variable_code is None (default), this will be a nested set of dicts keyed by <vocabulary>:<variable_code>.

Return type:

dict

NASA ORNL Daymet weather data services

NASA EARTHDATA ORNL DAAC Daymet web services

ulmo.nasa.daymet.get_variables()

Retrieves a list of available variables.

Parameters: None
Returns:

A dict with variable abbreviations as keys and descriptions as values.

ulmo.nasa.daymet.get_daymet_singlepixel(latitude, longitude, variables=['tmax', 'tmin', 'prcp'], years=None, as_dataframe=True)

Fetches a time series of climate variables from the Daymet single-pixel extraction service.

Parameters:
  • latitude (float) – The latitude (WGS84), value between 14.5 and 52.0.

  • longitude (float) – The longitude (WGS84), value between -131.0 and -53.0.

  • variables (list of str) – Daymet parameters to fetch. default = [‘tmax’, ‘tmin’, ‘prcp’]. Available options:

    • ‘tmax’: maximum temperature
    • ‘tmin’: minimum temperature
    • ‘srad’: shortwave radiation
    • ‘vp’: vapor pressure
    • ‘swe’: snow-water equivalent
    • ‘prcp’: precipitation
    • ‘dayl’: daylength
  • years (list of int) – List of years to return. Daymet version 2 data are available from 1980 to the latest full calendar year. If None (default), all years will be returned.

  • as_dataframe (True (default) or False) – If True, return a pandas dataframe; if False, return an open file with the contents in csv format.

Returns:

single_pixel_timeseries

Return type:

pandas dataframe or csv filename
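The coordinate and year limits above can be checked before a request is sent. check_daymet_request is a hypothetical helper; it assumes the latest full calendar year is the previous calendar year, although data for that year may be published with some delay.

```python
from datetime import date


def check_daymet_request(latitude, longitude, years=None):
    """Hypothetical pre-flight check mirroring the documented limits of
    get_daymet_singlepixel; the service itself remains the final authority."""
    if not 14.5 <= latitude <= 52.0:
        raise ValueError("latitude must be between 14.5 and 52.0")
    if not -131.0 <= longitude <= -53.0:
        raise ValueError("longitude must be between -131.0 and -53.0")
    if years is not None:
        # Assumption: the latest full calendar year is last year.
        last_full_year = date.today().year - 1
        bad = [y for y in years if not 1980 <= y <= last_full_year]
        if bad:
            raise ValueError(f"years outside 1980-{last_full_year}: {bad}")
    return True


# Typical call (requires ulmo and network access):
#   import ulmo
#   if check_daymet_request(35.0, -106.0, [2000, 2001]):
#       df = ulmo.nasa.daymet.get_daymet_singlepixel(35.0, -106.0, years=[2000, 2001])
```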

National Climatic Data Center (NCDC)

NCDC Climate Index Reference Sequential (CIRS)

National Climatic Data Center Climate Index Reference Sequential (CIRS) drought dataset

ulmo.ncdc.cirs.get_data(elements=None, by_state=False, location_names='abbr', as_dataframe=False, use_file=None)

Retrieves data.

Parameters:
  • elements (None, str or list) – The element(s) for which to get data. If None (default), then all elements are used. An individual element is a string, but a list or tuple of them can be used to specify a set of elements. Elements are:

    • ‘cddc’: Cooling Degree Days
    • ‘hddc’: Heating Degree Days
    • ‘pcpn’: Precipitation
    • ‘pdsi’: Palmer Drought Severity Index
    • ‘phdi’: Palmer Hydrological Drought Index
    • ‘pmdi’: Modified Palmer Drought Severity Index
    • ‘sp01’: 1-month Standardized Precipitation Index
    • ‘sp02’: 2-month Standardized Precipitation Index
    • ‘sp03’: 3-month Standardized Precipitation Index
    • ‘sp06’: 6-month Standardized Precipitation Index
    • ‘sp09’: 9-month Standardized Precipitation Index
    • ‘sp12’: 12-month Standardized Precipitation Index
    • ‘sp24’: 24-month Standardized Precipitation Index
    • ‘tmpc’: Temperature
    • ‘zndx’: Palmer Z-Index
  • by_state (bool) – If False (default), divisional data will be retrieved. If True, then regional data will be retrieved.

  • location_names (str or None) – This parameter defines what (if any) type of names will be added to the values. If set to ‘abbr’ (default), then abbreviated location names will be used. If ‘full’, then full location names will be used. If set to None, then no location name will be added and the only identifier will be the location_codes (this is the most memory-conservative option).

  • as_dataframe (bool) – If False (default), a list of values dicts is returned. If True, a dict with element codes mapped to equivalent pandas.DataFrame objects will be returned. The pandas dataframe is used internally, so setting this to True is faster as it skips a somewhat expensive serialization step.

  • use_file (None, file-like object or str) – If None (default), then data will be automatically retrieved from the web. If a file-like object or a file path string, then the file will be used to read data from. This is intended to be used for reading in previously-downloaded versions of the dataset.

Returns:

data – A list of value dicts or a pandas.DataFrame containing data. See the as_dataframe parameter for more.

Return type:

list or pandas.DataFrame
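The elements argument accepts None, a single string, or a list or tuple of strings. A sketch of normalizing it against the codes listed above follows; normalize_elements is a hypothetical helper, and ulmo performs its own handling internally.

```python
# Element codes listed in the CIRS parameter description above.
CIRS_ELEMENTS = {
    "cddc", "hddc", "pcpn", "pdsi", "phdi", "pmdi",
    "sp01", "sp02", "sp03", "sp06", "sp09", "sp12", "sp24",
    "tmpc", "zndx",
}


def normalize_elements(elements=None):
    """Hypothetical helper: turn None, a str, or a list/tuple of str into a
    sorted list of documented CIRS element codes."""
    if elements is None:
        return sorted(CIRS_ELEMENTS)
    if isinstance(elements, str):
        elements = [elements]
    unknown = set(elements) - CIRS_ELEMENTS
    if unknown:
        raise ValueError(f"unknown CIRS elements: {sorted(unknown)}")
    return sorted(set(elements))


# Typical call (requires ulmo and network access):
#   import ulmo
#   data = ulmo.ncdc.cirs.get_data(elements=normalize_elements(["pdsi", "sp03"]),
#                                  as_dataframe=True)
```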

NCDC Global Historical Climate Network (GHCN) Daily

National Climatic Data Center Global Historical Climate Network - Daily dataset

ulmo.ncdc.ghcn_daily.get_data(station_id, elements=None, update=True, as_dataframe=False)

Retrieves data for a given station.

Parameters:
  • station_id (str) – Station ID to retrieve data for.
  • elements (None, str, or list of str) – If specified, limits the query to given element code(s).
  • update (bool) – If True (default), new data files will be downloaded if they are newer than any previously cached files. If False, then previously downloaded files will be used and new files will only be downloaded if there is not a previously downloaded file for a given station.
  • as_dataframe (bool) – If False (default), a dict with element codes mapped to value dicts is returned. If True, a dict with element codes mapped to equivalent pandas.DataFrame objects will be returned. The pandas dataframe is used internally, so setting this to True is a little bit faster as it skips a serialization step.
Returns:

site_dict – A dict with element codes as keys, mapped to collections of values. See the as_dataframe parameter for more.

Return type:

dict

ulmo.ncdc.ghcn_daily.get_stations(country=None, state=None, elements=None, start_year=None, end_year=None, update=True, as_dataframe=False)

Retrieves station information, optionally limited to specific parameters.

Parameters:
  • country (str) – The country code to use to limit station results. If set to None (default), then stations from all countries are returned.
  • state (str) – The state code to use to limit station results. If set to None (default), then stations from all states are returned.
  • elements (None, str, or list of str) – If specified, station results will be limited to the given element codes and only stations that have data for any of these elements will be returned.
  • start_year (int) – If specified, station results will be limited to contain only stations that have data after this year. Can be combined with the end_year argument to get stations with data within a range of years.
  • end_year (int) – If specified, station results will be limited to contain only stations that have data before this year. Can be combined with the start_year argument to get stations with data within a range of years.
  • update (bool) – If True (default), new data files will be downloaded if they are newer than any previously cached files. If False, then previously downloaded files will be used and new files will only be downloaded if there is not a previously downloaded file for a given station.
  • as_dataframe (bool) – If False (default), a dict with station IDs keyed to station dicts is returned. If True, a single pandas.DataFrame object will be returned. The pandas dataframe is used internally, so setting this to True is a little bit faster as it skips a serialization step.
Returns:

stations_dict – A dict or pandas.DataFrame representing station information for stations matching the arguments. See the as_dataframe parameter for more.

Return type:

dict or pandas.DataFrame

NCDC Global Summary of the Day (GSoD)

National Climatic Data Center Global Summary of the Day dataset

ulmo.ncdc.gsod.get_data(station_codes, start=None, end=None, parameters=None)

Retrieves data for a set of stations.

Parameters:
  • station_codes (str or list) – Single station code or iterable of station codes to retrieve data for.
  • start (None or date (see note on dates and times)) – If specified, data are limited to values after this date.
  • end (None or date (see note on dates and times)) – If specified, data are limited to values before this date.
  • parameters (None, str or list) – If specified, data are limited to this set of parameter codes.
Returns:

data_dict – Dict with station codes keyed to lists of value dicts.

Return type:

dict

ulmo.ncdc.gsod.get_stations(country=None, state=None, start=None, end=None, update=True)

Retrieves information on the set of available stations.

Parameters:
  • country ({None, str, or iterable}) – If specified, results will be limited to stations with matching country codes.
  • state ({None, str, or iterable}) – If specified, results will be limited to stations with matching state codes.
  • start (None or date (see note on dates and times)) – If specified, results will be limited to stations which have data after this start date.
  • end (None or date (see note on dates and times)) – If specified, results will be limited to stations which have data before this end date.
  • update (bool) – If True (default), check for a newer copy of the stations file and download it if it is newer than the previously downloaded copy. If False, then a new stations file will only be downloaded if a previously downloaded file cannot be found.
Returns:

stations_dict – A dict with USAF-WBAN codes keyed to station information dicts.

Return type:

dict

NOAA GOES Data Collection System (DCS) services

NOAA GOES Data Collection System: access to the data stream transmitted via GOES satellites.

ulmo.noaa.goes.get_data(dcp_address, hours, use_cache=False, cache_path=None, as_dataframe=True)

Fetches GOES Satellite DCP messages from NOAA Data Collection System (DCS) field test.

Parameters:
  • dcp_address (str, iterable of strings) – DCP address or list of DCP addresses to be fetched; lists will be joined by a ‘,’.
  • use_cache (bool) – If True, use an hdf file to cache data and retrieve new data on subsequent requests. Default is False.
  • cache_path ({None, str}) – If None, use the default ulmo location for cached files; otherwise use the specified path. Files are named using dcp_address.
  • as_dataframe (bool) – If True (default) return data in a pandas dataframe otherwise return a dict.
Returns:

message_data – Either a pandas dataframe or a dict indexed by dcp message times

Return type:

{pandas.DataFrame, dict}

ulmo.noaa.goes.decode(dataframe, parser, **kwargs)

Decodes GOES message data in a pandas dataframe returned by ulmo.noaa.goes.get_data().

Parameters:
  • dataframe (pandas.DataFrame) – pandas.DataFrame returned by ulmo.noaa.goes.get_data()
  • parser ({function, str}) – A function that acts on the dcp_message of each row of the dataframe and returns a new dataframe containing several rows of decoded data. This returned dataframe may have different (but derived) timestamps than the original row. If a string is passed, then a matching parser function is looked up from ulmo.noaa.goes.parsers.
Returns:

decoded_data – pandas dataframe, the format and parameters in the returned dataframe depend wholly on the parser used

Return type:

pandas.DataFrame

USGS National Water Information System (NWIS)

USGS National Water Information System web services

ulmo.usgs.nwis.get_sites(service=None, input_file=None, sites=None, state_code=None, huc=None, bounding_box=None, county_code=None, parameter_code=None, site_type=None, **kwargs)

Fetches site information from USGS services. See the USGS Site Service documentation for a detailed description of options. For convenience, major options have been included with pythonic names. At least one major filter must be specified. Options that are not listed below may be provided as extra kwargs (e.g. keyword=’argument’) and will be passed along with the web services request. These extra keywords must match the USGS names exactly. The USGS Site Service website describes available keyword names and argument formats.

Note

Only the options listed below have been tested and you may have mixed results retrieving data with extra options specified. Currently ulmo requests and parses data in the WaterML 1.x format. Some options are not available in this format.

Parameters:
  • service ({None, ‘instantaneous’, ‘iv’, ‘daily’, ‘dv’}) – The service to use, either “instantaneous”, “daily”, or None (default). If set to None, then both services are used. The abbreviations “iv” and “dv” can be used for “instantaneous” and “daily”, respectively.
  • input_file (None, file path or file object) – If None (default), then the NWIS web services will be queried, but if a file is passed then this file will be used instead of requesting data from the NWIS web services.
  • sites (str, iterable of strings or None) – A major filter. The site(s) to use; lists will be joined by a ‘,’. At least one major filter must be specified.
  • state_code (str or None) – A major filter. Two-letter state code used in stateCd parameter. At least one major filter must be specified.
  • county_code (str, iterable of strings or None) – A major filter. The 5 digit FIPS county code(s) used in the countyCd parameter; lists will be joined by a ‘,’. At least one major filter must be specified.
  • huc (str, iterable of strings or None) – A major filter. The hydrologic unit code(s) to use; lists will be joined by a ‘,’. At least one major filter must be specified.
  • bounding_box (str, iterable of strings or None) – A major filter. The bounding box used in the bBox parameter. The format is westernmost longitude, southernmost latitude, easternmost longitude, northernmost latitude; lists will be joined by a ‘,’. At least one major filter must be specified.
  • parameter_code (str, iterable of strings or None) – Optional filter. Parameter code(s) that will be passed as the parameterCd parameter; lists will be joined by a ‘,’. This parameter corresponds to the “Sites serving parameter codes” input on the USGS website.
  • site_type (str, iterable of strings or None) – Optional filter. The type(s) of site used in siteType parameter; lists will be joined by a ‘,’.
Returns:

return_sites – a python dict with site codes mapped to site information

Return type:

dict
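The “at least one major filter” rule and the comma-joining of list arguments can be sketched as follows. prepare_site_filters is a hypothetical helper that uses the pythonic keyword names documented above; it does not replicate any validation ulmo may perform itself.

```python
MAJOR_FILTERS = ("sites", "state_code", "county_code", "huc", "bounding_box")


def _join(value):
    """Join an iterable with ',' as described above; pass strings through."""
    if value is None or isinstance(value, str):
        return value
    return ",".join(str(v) for v in value)


def prepare_site_filters(**filters):
    """Hypothetical helper: require at least one major filter and join
    list-valued filters with ','. Unset (None) filters are dropped."""
    if not any(filters.get(name) is not None for name in MAJOR_FILTERS):
        raise ValueError(f"at least one of {MAJOR_FILTERS} is required")
    return {name: _join(value) for name, value in filters.items() if value is not None}


# Typical call (requires ulmo and network access):
#   import ulmo
#   sites = ulmo.usgs.nwis.get_sites(
#       **prepare_site_filters(state_code="TX", parameter_code="00060"))
```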

ulmo.usgs.nwis.get_site_data(site_code, service=None, parameter_code=None, statistic_code=None, start=None, end=None, period=None, modified_since=None, input_file=None, methods=None, **kwargs)

Fetches site data.

Parameters:
  • site_code (str) – The site code of the site you want to query data for.
  • service ({None, ‘instantaneous’, ‘iv’, ‘daily’, ‘dv’}) – The service to use, either “instantaneous”, “daily”, or None (default). If set to None, then both services are used. The abbreviations “iv” and “dv” can be used for “instantaneous” and “daily”, respectively.
  • parameter_code (str) – Parameter code(s) that will be passed as the parameterCd parameter.
  • statistic_code (str) – Statistic code(s) that will be passed as the statCd parameter.
  • start (None or datetime (see note on dates and times)) – Start of a date range for a query. This parameter is mutually exclusive with period (you cannot use both). It should not be older than 1910-1-1 for ‘iv’ and 1851-1-1 for ‘dv’ services.
  • end (None or datetime (see note on dates and times)) – End of a date range for a query. This parameter is mutually exclusive with period (you cannot use both).
  • period ({None, str, datetime.timedelta}) – Period of time to use for requesting data. This will be passed along as the period parameter. This can either be ‘all’ to signal that you’d like the entire period of record (down to 1910-1-1 for ‘iv’, 1851-1-1 for ‘dv’), or string in ISO 8601 period format (e.g. ‘P1Y2M21D’ for a period of one year, two months and 21 days) or it can be a datetime.timedelta object representing the period of time. This parameter is mutually exclusive with start/end dates.
  • modified_since (None or datetime.timedelta) – Passed along as the modifiedSince parameter.
  • input_file (None, file path or file object) – If None (default), then the NWIS web services will be queried, but if a file is passed then this file will be used instead of requesting data from the NWIS web services.
  • methods (None, str or Python dict) – If None (default), it’s assumed that there is a single method for each parameter; an error is raised if more than one method ID is encountered. If str, this is the method ID for the requested parameter(s), and “all” can be used if method IDs are not known beforehand. If dict, provide a mapping of parameter_code to method ID. A parameter’s method ID is specific to the site.
Returns:

data_dict – a python dict with parameter codes mapped to value dicts

Return type:

dict
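Since period may be either an ISO 8601 string or a datetime.timedelta, and cannot be combined with start/end, the sketch below illustrates both points. The helper names are hypothetical, and timedelta_to_iso_period handles only whole days, which is a simplifying assumption.

```python
from datetime import timedelta


def timedelta_to_iso_period(delta):
    """Hypothetical helper: express a whole-day timedelta as an ISO 8601
    period string, e.g. timedelta(days=30) -> 'P30D'."""
    if delta.days < 1 or delta.seconds or delta.microseconds:
        raise ValueError("only positive whole-day timedeltas are handled here")
    return f"P{delta.days}D"


def check_range_args(start=None, end=None, period=None):
    """Hypothetical helper: reject period combined with start and/or end."""
    if period is not None and (start is not None or end is not None):
        raise ValueError("period is mutually exclusive with start/end")


# Typical call (requires ulmo and network access; the site code is a placeholder):
#   import ulmo
#   check_range_args(period="P30D")
#   data = ulmo.usgs.nwis.get_site_data("SITE_CODE", service="dv", period="P30D")
```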

ulmo.usgs.nwis.hdf5.get_site(site_code, path=None, complevel=None, complib=None)

Fetches previously-cached site information from an hdf5 file.

Parameters:
  • site_code (str) – The site code of the site you want to get information for.
  • path (None or file path) – Path to the hdf5 file to be queried, if None then the default path will be used. If a file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.
  • complevel (None or int {0-9}) – Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0, then no compression will be used regardless of what complib is.
  • complib (None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}) – Open hdf5 file with this type of compression. If None (default), then the best compression library available on your system will be selected. If the complevel argument is set to 0, then no compression will be used.
Returns:

site_dict – a python dict containing site information

Return type:

dict

ulmo.usgs.nwis.hdf5.get_site_data(site_code, agency_code=None, parameter_code=None, path=None, complevel=None, complib=None, start=None)

Fetches previously-cached site data from an hdf5 file.

Parameters:
  • site_code (str) – The site code of the site you want to get data for.
  • agency_code (None or str) – The agency code to get data for. This will need to be set if a site code is in use by multiple agencies (this is rare).
  • parameter_code (None, str, or list) – List of parameters to read. If None (default), read all parameters; otherwise only read the specified parameters. Parameters should be specified with a statistic code, e.g. daily streamflow is ‘00060:00003’.
  • path (None or file path) – Path to the hdf5 file to be queried, if None then the default path will be used. If a file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.
  • complevel (None or int {0-9}) – Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0, then no compression will be used regardless of what complib is.
  • complib (None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}) – Open hdf5 file with this type of compression. If None (default), then the best compression library available on your system will be selected. If the complevel argument is set to 0, then no compression will be used.
  • start (None or string formatted date like 2014-01-01) – Filter the dataset to return only data later than the start date.
Returns:

data_dict – a python dict with parameter codes mapped to value dicts

Return type:

dict
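Cached data use combined parameter and statistic codes such as ‘00060:00003’ for daily streamflow (00060 is the USGS discharge parameter and 00003 the mean statistic). A minimal sketch of building such codes follows; param_stat_code is a hypothetical helper, and its 5-digit check reflects the usual USGS code format rather than anything ulmo enforces.

```python
def param_stat_code(parameter_code, statistic_code):
    """Hypothetical helper: build the '<parameter>:<statistic>' form used by
    hdf5.get_site_data, e.g. '00060:00003' for daily streamflow."""
    for name, code in (("parameter_code", parameter_code),
                       ("statistic_code", statistic_code)):
        if not (len(code) == 5 and code.isdigit()):
            raise ValueError(f"{name} should be a 5-digit USGS code, got {code!r}")
    return f"{parameter_code}:{statistic_code}"


# Typical call (requires ulmo and a populated cache; the site code is a placeholder):
#   import ulmo
#   data = ulmo.usgs.nwis.hdf5.get_site_data(
#       "SITE_CODE", parameter_code=[param_stat_code("00060", "00003")])
```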

ulmo.usgs.nwis.hdf5.get_sites(path=None, complevel=None, complib=None)

Fetches previously-cached site information from an hdf5 file.

Parameters:
  • path (None or file path) – Path to the hdf5 file to be queried, if None then the default path will be used. If a file path is a directory, then multiple hdf5 files will be kept so that file sizes remain small for faster repacking.
  • complevel (None or int {0-9}) – Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0, then no compression will be used regardless of what complib is.
  • complib (None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}) – Open hdf5 file with this type of compression. If None (default), then the best compression library available on your system will be selected. If the complevel argument is set to 0, then no compression will be used.
Returns:

sites_dict – a python dict with site codes mapped to site information

Return type:

dict

ulmo.usgs.nwis.hdf5.remove_values(site_code, datetime_dicts, path=None, complevel=None, complib=None, autorepack=True)

Remove values from hdf5 file.

Parameters:
  • site_code (str) – The site code of the site to remove records from.
  • datetime_dicts (dict) – A python dict mapping each variable (key) to a list of datetimes whose values will be set to NaN.
  • path (file path) – Path to the hdf5 file.
Returns:

None

Return type:

None

ulmo.usgs.nwis.hdf5.repack(path, complevel=None, complib=None)

Repack the hdf5 file at path. This is the same as running the PyTables ptrepack command on the file.

Parameters:
  • path (file path) – Path to the hdf5 file.
  • complevel (None or int {0-9}) – Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0, then no compression will be used regardless of what complib is.
  • complib (None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}) – Open hdf5 file with this type of compression. If None (default), then the best compression library available on your system will be selected. If the complevel argument is set to 0, then no compression will be used.
Returns:

None

Return type:

None

ulmo.usgs.nwis.hdf5.update_site_data(site_code, start=None, end=None, period=None, path=None, methods=None, input_file=None, complevel=None, complib=None, autorepack=True)

Update cached site data.

Parameters:
  • site_code (str) – The site code of the site you want to query data for.
  • start (None or datetime (see note on dates and times)) – Start of a date range for a query. This parameter is mutually exclusive with period (you cannot use both).
  • end (None or datetime (see note on dates and times)) – End of a date range for a query. This parameter is mutually exclusive with period (you cannot use both).
  • period ({None, str, datetime.timedelta}) – Period of time to use for requesting data. This will be passed along as the period parameter. This can be ‘all’ to signal that you’d like the entire period of record, a string in ISO 8601 duration format (e.g. ‘P1Y2M21D’ for a period of one year, two months and 21 days), or a datetime.timedelta object representing the period of time. This parameter is mutually exclusive with start/end dates.
  • path (None or file path) – Path to the hdf5 file to be queried. If None, the default path will be used. If the path is a directory, multiple hdf5 files will be kept so that file sizes remain small for faster repacking.
  • methods (None, str or Python dict) – If None (default), it’s assumed that there is a single method for each parameter, and an error is raised if more than one method id is encountered. If str, this is the method id for the requested parameter(s); use “all” if the method ids are not known beforehand. If dict, provide a mapping of parameter_code to method id. Method ids are site-specific.
  • input_file (None, file path or file object) – If None (default), then the NWIS web services will be queried, but if a file is passed then this file will be used instead of requesting data from the NWIS web services.
  • autorepack (bool) – Whether or not to automatically repack the h5 file(s) after updating. There is a trade-off between performance and disk space here: large files take longer to repack, but they also tend to grow faster. The default of True conserves disk space, because unchecked file growth can quickly get out of hand. If you set this to False, you can manually repack files with repack().
Returns:

None

Return type:

None
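The start/end/period rules can be checked before any request is made. The sketch below is a hypothetical pre-flight check and is not part of ulmo. It enforces the mutual exclusivity described above and accepts ‘all’. It accepts only date-part ISO 8601 durations such as ‘P1Y2M21D’. It also converts a datetime.timedelta to a whole-day duration, which is an assumption: sub-day precision is dropped.

```python
import re
from datetime import timedelta

# Date-part ISO 8601 durations only, e.g. 'P1Y2M21D', 'P7D', 'P2W'
_PERIOD_RE = re.compile(r"^P(?=\d)(\d+Y)?(\d+M)?(\d+W)?(\d+D)?$")


def check_period_args(start=None, end=None, period=None):
    """Validate start/end versus period; return the period to send, or None."""
    if period is not None and (start is not None or end is not None):
        raise ValueError("period is mutually exclusive with start/end")
    if period is None or period == "all":
        return period
    if isinstance(period, timedelta):
        # whole days only in this sketch; hours/minutes are dropped
        return "P%dD" % period.days
    if isinstance(period, str) and _PERIOD_RE.match(period):
        return period
    raise ValueError("unrecognized period: %r" % (period,))


print(check_period_args(period=timedelta(days=7)))  # P7D
```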

ulmo.usgs.nwis.hdf5.update_site_list(sites=None, state_code=None, huc=None, bounding_box=None, county_code=None, parameter_code=None, site_type=None, service=None, input_file=None, complevel=None, complib=None, autorepack=True, path=None, **kwargs)

Update cached site information.

See ulmo.usgs.nwis.core.get_sites() for a description of the regular parameters; only the extra parameters used for caching are listed below.

Parameters:
  • path (None or file path) – Path to the hdf5 file to be queried. If None, the default path will be used. If the path is a directory, multiple hdf5 files will be kept so that file sizes remain small for faster repacking.
  • input_file (None, file path or file object) – If None (default), then the NWIS web services will be queried, but if a file is passed then this file will be used instead of requesting data from the NWIS web services.
  • complevel (None or int {0-9}) – Open hdf5 file with this level of compression. If None (default), then a maximum compression level will be used if a compression library can be found. If set to 0, no compression will be used regardless of what complib is.
  • complib (None or str {‘zlib’, ‘bzip2’, ‘lzo’, ‘blosc’}) – Open hdf5 file with this type of compression. If None (default), then the best compression library available on your system will be selected. If the complevel argument is set to 0, no compression will be used.
  • autorepack (bool) – Whether or not to automatically repack the h5 file after updating. There is a trade-off between performance and disk space here: large files take longer to repack, but they also tend to grow faster. The default of True conserves disk space, because unchecked file growth can quickly get out of hand. If you set this to False, you can manually repack files with repack().
Returns:

None

Return type:

None

USGS National Elevation Dataset (NED) raster services

National Elevation Dataset (NED) services (Raster)

ulmo.usgs.ned.get_available_layers()

Returns a list of available data layers.

ulmo.usgs.ned.get_raster(layer, bbox, path=None, update_cache=False, check_modified=False, mosaic=False)

Downloads National Elevation Dataset raster tiles that cover the given bounding box for the specified data layer.

Parameters:
  • layer (str) – Dataset layer name (see get_available_layers() for a list).
  • bbox ((sequence of float|str)) – Bounding box, in geographic coordinates, of the area to download tiles for, in the format (min longitude, min latitude, max longitude, max latitude).
  • path (None or path) – If None, the default path will be used.
  • update_cache (True or False (default)) – If False and the output file already exists, the existing file is used.
  • check_modified (True or False (default)) – If a tile already exists in path, check whether a newer file exists online and download it if available.
  • mosaic (True or False (default)) – If True, mosaic the downloaded tiles and clip them to the extent of the bbox provided. Requires the rasterio package and GDAL.
Returns:

raster_tiles – Metadata as a FeatureCollection. The local path of the downloaded data is stored in feature[‘properties’][‘file’].

Return type:

geojson FeatureCollection
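The documented return value stores each tile's local file under feature[‘properties’][‘file’]. The sketch below collects those paths from a FeatureCollection. The downloaded_files helper is hypothetical. The sample dict is a made-up stand-in with placeholder paths, not real get_raster() output. It assumes the result behaves like a plain dict, which holds for the geojson package's FeatureCollection type. To use it on real data, pass the return value of ulmo.usgs.ned.get_raster() instead of sample.

```python
def downloaded_files(raster_tiles):
    """Collect the local file path of each tile in a get_raster() result.

    raster_tiles is assumed to be a dict-like geojson FeatureCollection
    whose features carry the path in feature['properties']['file'].
    """
    return [
        feature["properties"]["file"]
        for feature in raster_tiles.get("features", [])
        if "file" in feature.get("properties", {})
    ]


# Illustrative stand-in for a get_raster() result (placeholder paths)
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"file": "/tmp/ned/tile_a.tif"}},
        {"type": "Feature", "geometry": None,
         "properties": {"file": "/tmp/ned/tile_b.tif"}},
    ],
}
print(downloaded_files(sample))
```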

ulmo.usgs.ned.get_raster_availability(layer, bbox=None)

Retrieves metadata for raster tiles that cover the given bounding box for the specified data layer.

Parameters:
  • layer (str) – Dataset layer name (see get_available_layers() for a list).
  • bbox ((sequence of float|str)) – Bounding box, in geographic coordinates, of the area of interest, in the format (min longitude, min latitude, max longitude, max latitude).
Returns:

metadata – Metadata, including download URLs, as a FeatureCollection.

Return type:

geojson FeatureCollection
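Both get_raster() and get_raster_availability() take bbox as four floats or strings in (min longitude, min latitude, max longitude, max latitude) order, and swapping latitude and longitude is an easy mistake. The normalize_bbox helper below is a hypothetical pre-check, and ulmo may do its own validation. The strict min < max check assumes the box does not cross the antimeridian.

```python
def normalize_bbox(bbox):
    """Convert a (min_lon, min_lat, max_lon, max_lat) sequence of floats
    or numeric strings into a tuple of floats, with basic range checks."""
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly four values")
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox)
    if not -180 <= min_lon < max_lon <= 180:
        raise ValueError("longitudes must satisfy -180 <= min < max <= 180")
    if not -90 <= min_lat < max_lat <= 90:
        raise ValueError("latitudes must satisfy -90 <= min < max <= 90")
    return (min_lon, min_lat, max_lon, max_lat)


print(normalize_bbox(["-97.8", "30.2", "-97.6", "30.4"]))
# (-97.8, 30.2, -97.6, 30.4)
```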

Readers for USA regional (sub-national) data

California Department of Water Resources Historical Data

ulmo.cdec.historical.get_stations()

Fetches information on all CDEC sites.

Returns:

df – a pandas DataFrame (indexed on site id) with station information.

Return type:

pandas DataFrame

ulmo.cdec.historical.get_sensors(sensor_id=None)

Gets a list of sensor ids as a DataFrame indexed on sensor number. The list can be limited by passing a list of sensor numbers.

Usage example:

from ulmo import cdec
# to get all available sensor info
sensors = cdec.historical.get_sensors()
# or to get just one sensor
sensor = cdec.historical.get_sensors([1])
Parameters:
  • sensor_id (iterable of integers or None) – If specified, only these sensor numbers will be returned.
Returns:

df – a pandas DataFrame of sensor information, indexed on sensor number.

Return type:

pandas DataFrame

ulmo.cdec.historical.get_station_sensors(station_ids=None, sensor_ids=None, resolutions=None)

Gets available sensors for the given stations, sensor ids and time resolutions. If no station ids are provided, all available stations will be used (this is not recommended, as it will probably take a very long time).

The list can be limited by a list of sensor numbers, or time resolutions if you already know what you want. If none of the provided sensors or resolutions are available, an empty DataFrame will be returned for that station.

Usage example:

from ulmo import cdec
# to get all available sensors for station 'NEW'
available_sensors = cdec.historical.get_station_sensors(['NEW'])
Parameters:
  • station_ids (iterable of strings or None)
  • sensor_ids (iterable of integers or None) – Use the get_sensors() function to see a list of available sensor numbers.
  • resolutions (iterable of strings or None) – Possible values are ‘event’, ‘hourly’, ‘daily’, and ‘monthly’, but not all of these time resolutions are available at every station.
Returns:

dict – a python dict with site codes as keys with values containing pandas DataFrames of available sensor numbers and metadata.

Return type:

a python dict

ulmo.cdec.historical.get_data(station_ids=None, sensor_ids=None, resolutions=None, start=None, end=None)

Downloads data for a set of CDEC station and sensor ids. If either is not provided, all available data will be downloaded. Be careful when choosing hourly resolution: the data sets are big, and CDEC’s servers are slow as molasses in winter.

Usage example:

from ulmo import cdec
dat = cdec.historical.get_data(['PRA'], resolutions=['daily'])
Parameters:
  • station_ids (iterable of strings or None)
  • sensor_ids (iterable of integers or None) – Use the get_sensors() function to see a list of available sensor numbers.
  • resolutions (iterable of strings or None) – Possible values are ‘event’, ‘hourly’, ‘daily’, and ‘monthly’, but not all of these time resolutions are available at every station.
Returns:

dict – a python dict with site codes as keys. Values will be nested dicts containing all of the sensor/resolution combinations.

Return type:

a python dict
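Hourly requests are large and CDEC’s servers are slow, so one option is to split a long start/end range into smaller windows and request them one at a time. The date_chunks helper below is a hypothetical sketch of that splitting. Whether smaller requests are actually faster or more reliable against CDEC is an assumption. Merging the per-window dicts returned by get_data() is left to the caller.

```python
from datetime import date, timedelta


def date_chunks(start, end, days=365):
    """Split the inclusive range [start, end] into consecutive windows
    of at most `days` days, returned as (chunk_start, chunk_end) pairs."""
    if days < 1:
        raise ValueError("days must be at least 1")
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=days - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


for chunk_start, chunk_end in date_chunks(date(2010, 1, 1), date(2012, 6, 30)):
    print(chunk_start, chunk_end)
```

Each (chunk_start, chunk_end) pair could then be passed as start= and end= to cdec.historical.get_data().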

Lower Colorado River Authority (LCRA)

LCRA Hydromet Data

Access to hydrologic and climate data in the Colorado River Basin (Texas) provided by the Hydromet web site and web service from the Lower Colorado River Authority.

ulmo.lcra.hydromet.get_sites_by_type(site_type)

Gets a list of hydromet site codes and descriptions for the given site type.

Parameters:
  • site_type (str) – In all but lake sites, this is the parameter code collected at the site. For lake sites, it is ‘lake’. See site_types and PARAMETERS.
Returns:

sites_dict – A python dict with four-character site codes mapped to site information.

Return type:

dict

ulmo.lcra.hydromet.get_site_data(site_code, parameter_code, as_dataframe=True, start_date=None, end_date=None, dam_site_location='head')

Fetches a site’s data for the given parameter.

Parameters:
  • site_code (str) – The LCRA site code (four chars long) of the site you want to query data for.
  • parameter_code (str) – LCRA parameter code. See PARAMETERS.
  • start_date (None or datetime) – Start of a date range for a query.
  • end_date (None or datetime) – End of a date range for a query.
  • as_dataframe (True (default) or False) – This determines what format values are returned as. If True (default), the values will be a pandas.DataFrame object with the value timestamps as the index. If False, the values will be returned as a Python dict.
  • dam_site_location (‘head’ (default) or ‘tail’) – The site location relative to the dam.
Returns:

  • df (pandas.DataFrame) – if as_dataframe is True, or
  • values_dict (dict) – if as_dataframe is False.

ulmo.lcra.hydromet.get_all_sites()

Returns a list of all LCRA hydromet sites as a geojson FeatureCollection.

ulmo.lcra.hydromet.get_current_data(service, as_geojson=False)

Fetches the current (near real-time) river stage and flow values from the LCRA web service.

Parameters:
  • service (str) – The web service providing data. See current_data_services. The currently available services are GetUpperBasin and GetLowerBasin.
  • as_geojson (True or False (default)) – If True, the data is returned as a geojson FeatureCollection; if False, it is returned as a list of dicts.
Returns:

  • current_values_dicts (list of dicts) – if as_geojson is False, or
  • current_values_geojson (geojson FeatureCollection) – if as_geojson is True.

LCRA Water Quality Data

Access to water quality data in the Colorado River Basin (Texas) provided by the Water Quality web site and web service from the Lower Colorado River Authority.

ulmo.lcra.waterquality.get_sites(source_agency=None)

Fetches a list of sites with location and available metadata.

Parameters:
  • source_agency (None or str) – The code LCRA uses for the agency that collects the data. Some sites do not list a source, so this filter may not return every site from a given source. See source_map.
Returns:

sites_geojson

Return type:

geojson FeatureCollection

ulmo.lcra.waterquality.get_historical_data(site_code, start=None, end=None, as_dataframe=False)

Fetches historical data for a site over a given date range.

Parameters:
  • site_code (str) – The site code to fetch data for. A list of sites can be retrieved with get_sites()
  • start (None or date (see note on dates and times)) – Start of the date range for the query.
  • end (None or date (see note on dates and times)) – End of the date range for the query. If both start and end are None (default), all data will be returned.
  • as_dataframe (bool) – This determines what format values are returned as. If False (default), the values dict will be a dict with timestamps as keys mapped to a dict of gauge variables and values. If True then the values dict will be a pandas.DataFrame object containing the equivalent information.
Returns:

data_dict – A dict containing site information and values.

Return type:

dict
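When as_dataframe is False, the values are keyed by timestamp, and each timestamp maps to a dict of variables and values. To follow one variable through time, that layout can be pivoted. The by_variable helper below is a hypothetical sketch that assumes exactly that shape. The variable names in the sample are made up, and where the values dict sits inside data_dict is not specified here.

```python
from collections import defaultdict
from datetime import datetime


def by_variable(values):
    """Pivot {timestamp: {variable: value}} into
    {variable: [(timestamp, value), ...]}, ordered by timestamp."""
    series = defaultdict(list)
    for timestamp in sorted(values):
        for variable, value in values[timestamp].items():
            series[variable].append((timestamp, value))
    return dict(series)


# Made-up sample in the documented timestamp -> {variable: value} shape
sample = {
    datetime(2014, 5, 2, 9, 0): {"temperature": 24.1, "ph": 8.0},
    datetime(2014, 5, 1, 9, 0): {"temperature": 23.5},
}
for variable, points in by_variable(sample).items():
    print(variable, points)
```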

ulmo.lcra.waterquality.get_recent_data(site_code, as_dataframe=False)

Fetches near real-time instantaneous water quality data for the LCRA bay sites.

Parameters:
  • site_code (str) – The bay site to fetch data for. See real_time_sites.
  • as_dataframe (bool) – This determines what format values are returned as. If False (default), the values will be a list of value dicts. If True, the values are returned as a pandas.DataFrame.
Returns:

A list of value dicts or a pandas.DataFrame.

Return type:

list or pandas.DataFrame

Texas Weather Connection Daily Keetch-Byram Drought Index (KBDI)

ulmo.twc.kbdi.core

This module provides direct access to the Texas Weather Connection Daily Keetch-Byram Drought Index (KBDI) dataset.

ulmo.twc.kbdi.get_data(county=None, start=None, end=None, as_dataframe=False, data_dir=None)

Retrieves data.

Parameters:
  • county (None or str) – If specified, results will be limited to the county corresponding to the given 5-character Texas county FIPS code (i.e. of the form 48???).
  • end (None or date (see note on dates and times)) – Results will be limited to data on or before this date. Default is the current date.
  • start (None or date (see note on dates and times)) – Results will be limited to data on or after this date. Default is the start of the calendar year for the end date.
  • as_dataframe (bool) – If False (default), a dict with a nested set of dicts will be returned with data indexed by 5-character Texas county FIPS code. If True then a pandas.DataFrame object will be returned. The pandas dataframe is used internally, so setting this to True is a little bit faster as it skips a serialization step.
  • data_dir (None or directory path) – Directory for holding downloaded data files. If no path is provided (default), then a user-specific directory for holding application data will be used (the directory will depend on the platform/operating system).
Returns:

data – A dict or pandas.DataFrame representing the data. See the as_dataframe parameter for more.

Return type:

dict or pandas.DataFrame
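The default date range and the county code format described above can be restated as two small helpers. Both kbdi_date_range and is_texas_county_fips are hypothetical and not part of ulmo. For simplicity, kbdi_date_range handles only datetime.date inputs and not the string forms from the note on dates and times.

```python
import re
from datetime import date


def kbdi_date_range(start=None, end=None, today=None):
    """Apply the documented defaults: end defaults to the current date,
    start to January 1 of the end date's year."""
    if end is None:
        end = today or date.today()
    if start is None:
        start = date(end.year, 1, 1)
    return start, end


def is_texas_county_fips(code):
    """True for a 5-character county FIPS code of the form 48???."""
    return bool(re.fullmatch(r"48\d{3}", code))


print(kbdi_date_range(today=date(2013, 8, 15)))
print(is_texas_county_fips("48453"), is_texas_county_fips("06037"))  # True False
```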

US Army Corps of Engineers (USACE) - Tulsa District Water Control

Access to data provided by the United States Army Corps of Engineers - Tulsa District Water Control web site.

ulmo.usace.swtwc.get_stations()

Fetches a list of station codes and descriptions.

Returns:

stations_dict – a python dict with station codes mapped to station information

Return type:

dict

ulmo.usace.swtwc.get_station_data(station_code, date=None, as_dataframe=False)

Fetches data for a station at a given date.

Parameters:
  • station_code (str) – The station code to fetch data for. A list of stations can be retrieved with get_stations()
  • date (None or date (see note on dates and times)) – The date of the data to be queried. If date is None (default), then data for the current day is retrieved.
  • as_dataframe (bool) – This determines what format values are returned as. If False (default), the values dict will be a dict with timestamps as keys mapped to a dict of gauge variables and values. If True then the values dict will be a pandas.DataFrame object containing the equivalent information.
Returns:

data_dict – A dict containing station information and values.

Return type:

dict
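Several parameters in these readers accept dates in any of the forms described in the note on dates and times: date or datetime objects, ISO 8601 strings (possibly abbreviated), or ‘mm/dd/YYYY’ strings. The to_date helper below is a hypothetical sketch of that coercion. It maps None to the current day, as get_station_data() does. ulmo's own parsing may accept more layouts than the illustrative list used here.

```python
from datetime import date, datetime

# Illustrative subset of accepted layouts: ISO 8601 and abbreviations,
# plus mm/dd/YYYY, per the note on dates and times.
_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y-%m-%d", "%m/%d/%Y")


def to_date(value=None, today=None):
    """Coerce the accepted date inputs to a datetime.date.

    None maps to the current day, mirroring get_station_data's default."""
    if value is None:
        return today or date.today()
    if isinstance(value, datetime):  # check first: datetime subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # try the next layout
    raise ValueError("unrecognized date: %r" % (value,))


print(to_date("05/01/2013"))  # 2013-05-01
```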