Fort Collins Science Center


High Throughput Computing: A Solution for Scientific Analysis

Bioclimatic Variables and Continuous-Surface Drought Indices

Background

Many species are affected by both climatic and non-climatic factors. Climatic changes can impose physiological constraints on species and therefore can affect species distributions to varying degrees. The relationship between climate and most species varies because of local adaptation and other factors that limit distribution, such as dispersal constraints related to habitat availability. Nevertheless, examining climate over time is useful when quantifying the effects of climatic changes on species distributions for past, current, and forecasted scenarios. Investigators at the USGS Fort Collins Science Center (FORT) ask many research questions associated with species distributions, and derived bioclimatic variables will allow scientists to incorporate climatic conditions, as they relate to biological responses, into their modeling efforts.

Bioclimatic variables (Nix 1986) are derived from climate data (e.g., minimum/maximum temperature and total precipitation) in order to represent information that is more closely associated with a species’ physiological constraints. For example, frost-free degree days, growing degree days, wettest month, and seasonal anomalies will generally capture biological measures better than weather-specific data, such as temperature and precipitation.
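As a rough illustration of how such variables are derived from monthly climate data, the following Python sketch computes a handful of summaries for a single grid cell. The function name, the dictionary keys, and the choice of summaries are assumptions made for illustration; they are not the definitions used for the FORT products.

```python
from statistics import mean


def bioclim_subset(tmin, tmax, prec):
    """Derive a few bioclimatic summaries from 12 monthly values each.

    tmin, tmax: monthly minimum/maximum temperature (degrees C)
    prec:       monthly total precipitation (mm)
    """
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("expected 12 monthly values per variable")

    # Monthly mean temperature approximated as the midpoint of min and max.
    tavg = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]
    # Index (0 = January) of the month with the most precipitation.
    wettest = max(range(12), key=lambda m: prec[m])

    return {
        "annual_mean_temp": mean(tavg),
        "mean_diurnal_range": mean(hi - lo for lo, hi in zip(tmin, tmax)),
        "max_temp_warmest_month": max(tmax),
        "min_temp_coldest_month": min(tmin),
        "annual_precip": sum(prec),
        "wettest_month": wettest + 1,  # 1 = January
        "precip_wettest_month": prec[wettest],
    }
```

In a gridded workflow the same calculation would be applied to every cell of the monthly rasters for a given year, which is why the number of derived files grows quickly with spatial and temporal extent.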

The Palmer Drought Severity Index (PDSI) models the difference between the amount of precipitation required to maintain a normal water balance and the actual amount of precipitation occurring at a particular time (Palmer 1965). However, PDSI has been criticized as unreliable in heterogeneous regimes (e.g., the western U.S.) because it does not account for the sensitivity of local precipitation or for local climate normals across seasons. The Self-Calibrating Palmer Drought Severity Index (SC-PDSI) developed by Wells et al. (2004) calibrates Palmer’s empirical constants (the climatic characteristic and the duration factors) so that the index accounts for local and temporal variations. In addition to these distinctions, PDSI is available only as discrete data, whereas FORT investigators are interested in developing continuous surfaces (Figure 1).
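The core of the index can be sketched in a few lines of Python. Palmer's formulation weights the monthly precipitation departure by a climatic characteristic K to obtain a moisture anomaly Z, then accumulates it with the recurrence X[i] = 0.897 * X[i-1] + Z[i] / 3. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the water-balance step that produces the required precipitation is taken as given, and the bookkeeping Palmer uses to start and end wet and dry spells is omitted.

```python
def moisture_anomaly(precip, required_precip, k=1.0):
    """Z = K * (P - P_hat): weighted departure of observed precipitation
    from the amount needed to maintain a normal water balance."""
    return [k * (p_obs - p_need) for p_obs, p_need in zip(precip, required_precip)]


def palmer_severity(z_index, p=0.897, q=1.0 / 3.0, x0=0.0):
    """Accumulate monthly moisture anomalies into a severity series.

    Applies X[i] = p * X[i-1] + q * Z[i]. The defaults are Palmer's fixed
    duration factors; a self-calibrating variant would instead pass values
    of p and q fitted to the local climate record.
    """
    severity = []
    x_prev = x0
    for z in z_index:
        x_prev = p * x_prev + q * z
        severity.append(x_prev)
    return severity
```

Exposing `p`, `q`, and `k` as parameters is what makes the contrast with SC-PDSI visible: the structure of the calculation stays the same, and only the constants change from place to place.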

Figure 1. An example of a continuous surface (left) versus a discrete surface (right).

The PDSI is derived by aggregating soil moisture and climatic conditions into 344 divisions across the contiguous United States, a coarse aggregation that reflects the computing-resource limitations that existed when this measure was developed. Kangas and Brown (2007) addressed the question of scale and local variability of the PDSI by deriving a continuous surface (4-km spatial resolution) of the index using PRISM (Parameter-elevation Regressions on Independent Slopes Model) data and State Soil Geographic (STATSGO) data (see http://soildatamart.nrcs.usda.gov/) as model inputs.

FORT has already developed bioclimatic variables and may also develop PDSI and SC-PDSI time-series datasets (Table 1) across the conterminous U.S. as continuous surfaces. In addition to using higher-resolution climate data, we will use the Soil Survey Geographic (SSURGO) database to derive the available water capacity (rather than STATSGO, which has a larger minimum mapping unit). These data will aid researchers in analyzing organisms’ home ranges, which in turn will help determine the relevance of climate predictors to those species and provide information for predicting historic, current, and forecasted species distributions. Though many scientists use bioclimatic variables, our work will provide higher resolution by using downscaled climate data (PRISM OSU and Climate Source) and will therefore capture more of the variance that results from finer-grained topographic and vegetation influences. Because continuous bioclimatic variables and continuous PDSI/SC-PDSI data are not otherwise available, deriving these data will be an important contribution for researchers.

Table 1. Data inputs and products.

| Data Input (PRISM) | Spatial Resolution | U.S. Data Products | # of Inputs (Converted / Derived) | State of Project |
|---|---|---|---|---|
| Time-series (1895-2008) | 4 km | 19 bioclimate variables | 4407 / 2260 | Completed |
| Time-series (1980-2009) | 2 km | 19 bioclimate variables | 1131 / 580 | Completed |
| Normals (1971-2000) | 800 m | 19 bioclimate variables | 39 / 20 | Completed |
| Normals (1971-2000) | 400 m | 19 bioclimate variables | 39 / 20 | Completed |
| Time-series (1980-2009) | 4 km | Continuous PDSI/SC-PDSI | 0 / 2712 | Pending |
| Time-series (1895-2008) | 2 km | Continuous PDSI/SC-PDSI | 0 / 696 | Pending |
| Normals (1971-2000) | 800 m | Continuous PDSI/SC-PDSI | 0 / 26 | Pending |
| Normals (1971-2000) | 400 m | Continuous PDSI/SC-PDSI | 0 / 26 | Pending |
| Total Number of Files | -- | -- | 5,616 / 6,340 | -- |
| Total File Size (Read / Write) | -- | -- | 39 GB / 26 GB | -- |
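The file totals in Table 1 can be checked directly from the per-row counts. The short Python sketch below transcribes the converted and derived counts from the table (the variable names are chosen here for illustration) and sums each column; the sums agree with the stated totals of 5,616 and 6,340.

```python
# (inputs converted, products derived) for each data row of Table 1,
# in the order the rows appear.
rows = [
    (4407, 2260),  # bioclimate, time-series, 4 km
    (1131, 580),   # bioclimate, time-series, 2 km
    (39, 20),      # bioclimate, normals, 800 m
    (39, 20),      # bioclimate, normals, 400 m
    (0, 2712),     # PDSI/SC-PDSI, time-series, 4 km
    (0, 696),      # PDSI/SC-PDSI, time-series, 2 km
    (0, 26),       # PDSI/SC-PDSI, normals, 800 m
    (0, 26),       # PDSI/SC-PDSI, normals, 400 m
]

total_converted = sum(converted for converted, _ in rows)
total_derived = sum(derived for _, derived in rows)

print(f"{total_converted:,} / {total_derived:,}")
```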

 

Problem

Rendering the products described above requires a large volume of data and many analytical steps. For example, each set of 20 bioclimatic variables (20 products derived per year) also requires the development of approximately 200 interim datasets, for a total of 576,000 datasets to obtain the final products. All datasets require conversion to GIS data formats, conversion of units for precipitation and temperature using relevant scale factors, a defined map projection, and the importation of metadata. We will also work with spatial datasets stored in a Relational Database Management System (RDBMS) to derive available water capacity, and will manipulate those data to derive SC-PDSI. Therefore, this project requires not only deriving multiple products but also a significant amount of data management, multiple analytical procedures, and numerous processing dependencies.
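The unit-conversion step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that raw grid cells are stored as integers multiplied by a scale factor of 100 and that -9999 marks missing cells; the actual scale factors and missing-value codes depend on the source dataset and are not stated here.

```python
NODATA = -9999  # assumed missing-value sentinel, not taken from the source data


def rescale(grid, scale_factor=100.0, nodata=NODATA):
    """Convert scaled integer cells to physical units.

    grid: list of rows, each a list of integer cell values.
    Missing cells are returned as None so they are not mistaken for data.
    """
    return [
        [None if cell == nodata else cell / scale_factor for cell in row]
        for row in grid
    ]


# Example: a 2 x 2 grid of scaled temperatures (hundredths of a degree C).
raw = [[1250, NODATA], [-350, 0]]
print(rescale(raw))
```

Handling the missing-value code before dividing matters: rescaling the sentinel along with the data would turn it into a plausible-looking value and contaminate every product derived downstream.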

Solution

Because of the volume of data, its spatial extent (conterminous United States), and its spatial and temporal resolution, FORT will use High Throughput Computing (HTC) via the HTCondor¹ system to manage the processing of jobs. The first component of this project, derivation of bioclimatic variables, was successfully completed using HTC (see HTC Computing Times). We developed an application that creates all the files necessary to run Directed Acyclic Graphs for the bioclimatic analysis, as well as a stand-alone application that creates the bioclimatic variables on each HTCondor machine. For the second component of this project, derivation of PDSI and SC-PDSI, we will first convert all data into a usable GIS data format and then spatially divide the data into smaller extents for processing. We will write programs to be used in HTC for much of the processing of climate and soils data. While deriving PDSI and SC-PDSI, we will rely on an RDBMS (ESRI ArcSDE and PostgreSQL)¹, multiple flat-file formats, GIS software, C++ software, and an HTC system. After the data are processed for each spatial extent, we will develop continuous surfaces for the conterminous United States.
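To give a sense of what generating Directed Acyclic Graph files involves, the Python sketch below writes a DAGMan input file with one conversion job followed by one derivation job per year. It is not the FORT application: the job names, the `year` macro, and the submit file names `convert.sub` and `bioclim.sub` are hypothetical. Only the `JOB`, `VARS`, and `PARENT ... CHILD` keywords are standard DAGMan syntax.

```python
def build_dag(years, convert_sub="convert.sub", derive_sub="bioclim.sub"):
    """Return DAGMan input text with one convert -> derive chain per year.

    The submit description files are assumed to exist and to read the
    `year` macro supplied through VARS.
    """
    lines = []
    for year in years:
        convert, derive = f"convert_{year}", f"bioclim_{year}"
        lines.append(f"JOB {convert} {convert_sub}")
        lines.append(f'VARS {convert} year="{year}"')
        lines.append(f"JOB {derive} {derive_sub}")
        lines.append(f'VARS {derive} year="{year}"')
        # The derivation may only start after its inputs are converted.
        lines.append(f"PARENT {convert} CHILD {derive}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with open("bioclim.dag", "w") as handle:
        handle.write(build_dag(range(1895, 1898)))
```

The resulting file would be handed to the scheduler with `condor_submit_dag bioclim.dag`. Because each year forms an independent chain, the scheduler is free to run different years on different machines at the same time while still enforcing the conversion-before-derivation dependency within each year.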


¹The use of any trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
