Greater Sage-grouse populations have been declining because of the loss, degradation, and fragmentation of sagebrush landscapes in the western United States. Scientists at the USGS Fort Collins Science Center (FORT) are evaluating the effects of oil and gas development on approximately 2,300 Greater Sage-grouse lek sites that have been monitored for the last 50 years. Specifically, we are evaluating both the density and the dispersion of oil and gas wells around the monitored sites over the same period. The density of wells near lek sites has been documented as an important factor in lek persistence (Aldridge and Boyce 2007, Doherty et al. 2008, Walker 2007). However, the spatial pattern, or dispersion, of oil and gas wells has not been studied, and we believe a dispersion index will better approximate disturbance and fragmentation and will therefore better explain changes in habitat use. Dispersion is calculated by measuring the distance from each well to its nearest neighboring well and then averaging these nearest-neighbor distances. The dispersion index is the ratio of this observed average distance to the average distance expected if the same number of wells were randomly distributed over the same area. Comparing the two shows whether the wells fit a random or a clustered distribution: a ratio near 1 suggests a random pattern, and a ratio well below 1 suggests clustering.
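The ratio described above appears to correspond to the classic Clark-Evans nearest-neighbor index. The sketch below assumes that form: the expected mean nearest-neighbor distance for a random (Poisson) pattern of density n/A is 1 / (2·sqrt(n/A)). It is an illustration, not FORT's actual code; the function names are hypothetical, and the study-window area (for example, the area of a circular buffer of radius r around a lek, πr²) is an assumed input.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # projected (x, y) coordinates, e.g. metres


def mean_nearest_neighbor_distance(points: Sequence[Point]) -> float:
    """Average distance from each well to its nearest other well (brute force, O(n^2))."""
    if len(points) < 2:
        raise ValueError("need at least two wells")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        nearest = math.inf
        for j, (xj, yj) in enumerate(points):
            if i != j:  # compare every pair, skipping a well against itself
                nearest = min(nearest, math.hypot(xi - xj, yi - yj))
        total += nearest
    return total / len(points)


def dispersion_index(points: Sequence[Point], area: float) -> float:
    """Observed mean nearest-neighbor distance / mean expected under randomness.

    Assumes the Clark-Evans expectation 1 / (2 * sqrt(n / area)); no edge
    correction is applied, so wells near the window boundary bias the result.
    """
    density = len(points) / area
    expected = 1.0 / (2.0 * math.sqrt(density))
    return mean_nearest_neighbor_distance(points) / expected
```

For four wells at the corners of a unit square inside a window of area 4, every nearest neighbor is 1 unit away and the expected distance is 0.5, so the index is 2.0 (more evenly spaced than random); shrinking the same four wells into a tiny cluster in the same window drives the index far below 1.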
FORT researchers are evaluating the density and dispersion of oil and gas well sites within six defined spatial scales (patch sizes) surrounding each of the 2,300 lek sites over a period of 50 years. As a result, 690,000 spatial data queries (6 scales × 2,300 lek sites × 50 years) and 1,380,000 calculations (2 × 690,000, one each for density and dispersion) must be made. Making all of these calculations requires considerable resources and time. We originally started these analyses on 4 desktop computers, and after more than 2 months of processing, the results were nowhere near complete. Because operating system interruptions required to maintain security policies occurred throughout these jobs, we had to manually resume each analysis where it had left off. Even without the interruptions, if each of the 1,380,000 combined queries and calculations required approximately 1 minute, processing on a single machine would take approximately 958 days (about 137 weeks).
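The size of the workload can be checked with a short enumeration of the (scale, lek, year) work units. This is a minimal sketch: the 0-based IDs, the `done` set used to skip finished units when resuming after an interruption, and the function names are all hypothetical, and the flat 1 minute per calculation is the rough figure from the text.

```python
from itertools import product

N_SCALES, N_LEKS, N_YEARS = 6, 2300, 50
CALCS_PER_QUERY = 2  # one density and one dispersion calculation per query


def iter_tasks(n_scales, n_leks, n_years, done=frozenset()):
    """Yield (scale, lek, year) work units, skipping any recorded as already done."""
    for task in product(range(n_scales), range(n_leks), range(n_years)):
        if task not in done:
            yield task


def serial_days(n_units, minutes_per_unit=1.0):
    """Wall-clock days to process n_units one after another on a single machine."""
    return n_units * minutes_per_unit / (60 * 24)


n_queries = sum(1 for _ in iter_tasks(N_SCALES, N_LEKS, N_YEARS))
n_calcs = CALCS_PER_QUERY * n_queries
days = serial_days(n_calcs)
print(n_queries, n_calcs)            # 690000 1380000
print(round(days), round(days / 7))  # 958 137
```

Keeping a record of completed units (for example, in the results database) is one way to make a restart pick up automatically where an interrupted run stopped, rather than resuming by hand.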
FORT is now using High Throughput Computing (HTC), in the form of HTCondor¹, for the analyses described above. A generalized workflow for this analysis is provided in Figure 1. We are also using a Relational Database Management System (RDBMS) and ESRI ArcSDE¹ for data management. This required (1) installing the ESRI desktop application on all HTCondor machines, (2) installing ArcSDE on a server, and (3) ensuring that the RDBMS (PostgreSQL¹ in this case) could handle all of the concurrent connections.
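To give a sense of how HTCondor fans work out across machines, the sketch below writes a submit description that queues one job per lek using HTCondor's `queue <variable> from <file>` form. It is hypothetical throughout: one job per lek is an assumed granularity, and `analyze_lek.bat`, `leks.txt`, and `leks.sub` are placeholder names, not files from this project.

```python
from pathlib import Path

# Hypothetical layout: one HTCondor job per lek site; the executable is a placeholder.
SUBMIT_TEMPLATE = """\
universe     = vanilla
executable   = {executable}
arguments    = $(lek_id)
output       = logs/lek_$(lek_id).out
error        = logs/lek_$(lek_id).err
log          = logs/leks.log
request_cpus = 1
queue lek_id from {task_file}
"""


def render_task_list(lek_ids):
    """One lek ID per line; HTCondor binds each line to $(lek_id) for one job."""
    return "".join(f"{lek_id}\n" for lek_id in lek_ids)


def render_submit(executable, task_file):
    """Fill in the submit description template."""
    return SUBMIT_TEMPLATE.format(executable=executable, task_file=task_file)


def write_submit_files(lek_ids, out_dir, executable="analyze_lek.bat"):
    """Write leks.txt and leks.sub into out_dir; run `condor_submit leks.sub` from there."""
    out = Path(out_dir)
    (out / "logs").mkdir(parents=True, exist_ok=True)  # HTCondor expects log dirs to exist
    (out / "leks.txt").write_text(render_task_list(lek_ids))
    submit_path = out / "leks.sub"
    submit_path.write_text(render_submit(executable, "leks.txt"))
    return submit_path
```

Because each lek becomes an independent job, an interrupted machine costs only the jobs it was running, which HTCondor can reschedule elsewhere.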
For each job, we must make an ArcSDE direct connection and an RDBMS connection, so 50 machines running jobs at the same time require 100 concurrent connections. ArcSDE and the RDBMS support concurrent reads and writes, which would not have been possible had the data been stored as files in a different format. Once the necessary database was set up, these analyses became feasible and could be completed in a matter of days rather than months: using HTC, we completed the analysis for this project in 135.5 hours (see HTC Computing Times).
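A quick connection budget shows why the database's connection limit matters at this scale. The sketch assumes each ArcSDE direct connection opens one PostgreSQL session and that each machine runs one job at a time; the defaults mirror PostgreSQL's stock `max_connections` (typically 100) and `superuser_reserved_connections` (3), which should be checked against the actual server configuration. The function names are hypothetical.

```python
def required_connections(n_machines, conns_per_job=2, jobs_per_machine=1):
    """Concurrent database connections if every machine runs its jobs at once."""
    return n_machines * jobs_per_machine * conns_per_job


def fits_connection_limit(n_required, max_connections=100,
                          superuser_reserved=3, other_clients=0):
    """True if ordinary (non-superuser) roles have enough connection slots.

    other_clients covers connections held by anything else (e.g. desktop users).
    """
    available = max_connections - superuser_reserved - other_clients
    return n_required <= available


needed = required_connections(50)             # 100
print(needed, fits_connection_limit(needed))  # 100 False
```

Under these assumptions, 100 connections from 50 jobs already exceed what a default-configured server leaves for ordinary roles, so `max_connections` in `postgresql.conf` would need to be raised (a change that takes effect only after a server restart).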
¹ The use of any trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.