High Throughput Computing: A Solution for Scientific Analysis

Introduction

Public land management agencies continually face resource management problems that are exacerbated by climate warming, land-use change, and other human activities. As the U.S. Geological Survey (USGS) Fort Collins Science Center (FORT) works with managers in U.S. Department of the Interior (DOI) agencies and other federal, state, and private entities, researchers are finding that the science needed to address these complex ecological questions across time and space produces substantial amounts of data.

The additional data, and the volume of computation needed to analyze them, require computing resources well beyond a single workstation or even multiple workstations. To meet this need for greater computational capacity, FORT investigated how to resolve the many computational shortfalls previously encountered when analyzing data for such projects. Our objectives included finding a solution that would:

  • harness existing central processing units (CPUs) when they are idle to run multiple jobs concurrently, which reduces overall processing time without requiring additional hardware;
  • offer an effective, centralized job-management system;
  • handle job failures due to hardware, software, or network interruptions (obviating the need to manually resubmit the job after each stoppage);
  • be affordable; and most importantly,
  • allow us to complete very large, complex analyses that otherwise would not even be possible.

In short, we envisioned a job-management system that would take advantage of unused FORT CPUs within a local area network (LAN) to effectively distribute and run highly complex analytical processes. What we found was a solution that uses High Throughput Computing (HTC) and High Performance Computing (HPC) systems to do exactly that (Figure 1).

Figure 1. A collection of desktop workstations and servers illustrating a centrally managed system where processes can be distributed and managed throughout a Local Area Network (LAN).

Solution

We selected an open-source HTC product known as HTCondor1, which was developed at the University of Wisconsin in 1988. HTCondor is a job manager that allows users to submit jobs to a “pool” of computers by matchmaking job requirements with available workstations. An HTC system allows us to use hundreds of processing cores (each workstation contributing one to many) that are otherwise idle, such as at night, on weekends, or when the user is away, without impacting users.
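To make the matchmaking idea concrete, the hedged sketch below uses standard HTCondor submit-description keywords (executable, arguments, request_memory, queue ... from) to hand a batch of independent runs to the scheduler. The analysis script run_analysis.py, the input chunk files, and the resource requests are hypothetical placeholders for illustration, not an actual FORT workflow.

```python
"""Minimal sketch: submit a batch of independent jobs to an HTCondor pool.

Assumptions (not from the article): a hypothetical analysis script
run_analysis.py that takes one input file per job, and a pool where the
standard condor_submit command is available on the PATH.
"""
import subprocess
from pathlib import Path

# Hypothetical input chunks, one per job.
INPUTS = [f"tile_{i:03d}.csv" for i in range(100)]

SUBMIT_DESCRIPTION = """\
universe              = vanilla
executable            = /usr/bin/python3
arguments             = run_analysis.py $(input_file)
should_transfer_files = YES
transfer_input_files  = run_analysis.py, $(input_file)
request_cpus          = 1
request_memory        = 1GB
output                = logs/$(Cluster).$(Process).out
error                 = logs/$(Cluster).$(Process).err
log                   = logs/batch.log
queue input_file from inputs.txt
"""

Path("logs").mkdir(exist_ok=True)
Path("inputs.txt").write_text("\n".join(INPUTS) + "\n")
Path("batch.sub").write_text(SUBMIT_DESCRIPTION)

# Hand the whole batch to the scheduler; HTCondor's matchmaker then pairs
# each queued job with an idle workstation in the pool.
subprocess.run(["condor_submit", "batch.sub"], check=True)
```

Because each queued run is declared independently, a run that is interrupted when a workstation's owner returns is simply requeued and matched to another idle machine, which is how the pool absorbs hardware and network interruptions without manual resubmission.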

In comparison, high performance computing uses fewer cores with more memory. HPC is required when large datasets must be read into memory and analyzed as a whole, which cannot be handled on standard 32-bit workstations. Although 64-bit workstations can hold more information in memory, they can also reduce processing speeds when that memory goes unused. Distributed computing (HTC) is therefore useful for distributing thousands of smaller-memory jobs across a network of 32-bit and/or 64-bit workstations, whereas 64-bit machines are required when large amounts of memory are needed to process the information. When analyses require both significant amounts of memory per core and many cores (jobs), HPC and HTC can be used in tandem, an approach known as High Throughput Performance Computing (HTPC). Although FORT is not currently using HTPC, we do have the hardware for running high throughput computing on high performance servers.
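As a rough, hypothetical illustration of why HTC suits small-memory workloads, the sketch below streams one large table into many fixed-size chunks, each sized to fit comfortably in an ordinary workstation's memory; each chunk then becomes one independent job (for example, one line in the inputs.txt file used in the submit sketch above). The file names and chunk size are assumptions for illustration only.

```python
"""Sketch of the decomposition behind HTC: instead of loading one large
dataset into the memory of a single HPC node, split it into many small
pieces that each fit in an ordinary workstation's memory, and let each
piece become an independent job. Names and sizes are illustrative.
"""
import csv
from pathlib import Path

ROWS_PER_JOB = 50_000  # sized so each job stays well under a workstation's RAM


def split_into_jobs(big_table: str, out_dir: str = "chunks") -> list[Path]:
    """Stream a large CSV and write fixed-size chunks, one per HTC job."""
    Path(out_dir).mkdir(exist_ok=True)
    chunks: list[Path] = []
    with open(big_table, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        rows, part = [], 0
        for row in reader:
            rows.append(row)
            if len(rows) == ROWS_PER_JOB:
                chunks.append(_write_chunk(out_dir, part, header, rows))
                rows, part = [], part + 1
        if rows:  # remainder chunk
            chunks.append(_write_chunk(out_dir, part, header, rows))
    return chunks


def _write_chunk(out_dir: str, part: int, header, rows) -> Path:
    path = Path(out_dir) / f"part_{part:04d}.csv"
    with open(path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    # Each chunk file would be listed in inputs.txt for the submit
    # description shown earlier; results are combined after all jobs finish.
    for chunk in split_into_jobs("observations.csv"):
        print(chunk)
```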

FORT is among the first USGS science centers to use such technologies. Additionally, from a research standpoint, FORT is testing the use of HTC in areas where it has not been widely applied. For example, we are using HTC for many different types of spatial applications that employ Geographic Information Systems (GIS), remote sensing, and geostatistical simulations. These important scientific applications support the incorporation of broad-scale climate effects and land-use changes into analyses of biological processes, so that scientists and natural resource managers can better understand, predict, and respond to changing conditions on public lands.

FORT Research and HTC

Although FORT’s use of HTC has only begun and is therefore simple relative to most grid computing systems (e.g., FORT is not managing a large number of machines, spanning firewalls, or running processes on multiple, separately managed collections of HTC systems), the scalability and functionality of HTC systems have considerably expanded the range of analyses we can pursue. Even though our HTC system functions locally, the ability to investigate questions requiring greater computing power (as demonstrated below) is allowing us to ask different and important research questions that we could not have investigated otherwise. Recently, FORT has used HTC on a wide array of natural resource projects that lend themselves to such analyses, including the following:

(The example applications below illustrate how HTC augments research results.)

  1. Maximum Entropy (MaxEnt)
  2. 2D/3D Hydrodynamic Numerical Modeling
  3. Species Trend Analysis and the Effects of their Surrounding Environment
  4. Bioclimatic Variables and Continuous Surface Drought Indices

(The following two examples have not been fully implemented in HTC. However, they are included to demonstrate how HTC can be used in different applications with which FORT is often involved.)

  1. Landscape-Scale Ecological Modeling
  2. Geostatistical Simulations and Statistical Analysis

The Future of HTC at FORT

As management issues and the associated research questions grow more complicated and multi-dimensional, HTC is proving invaluable in allowing FORT to undertake projects requiring extensive analyses. Our immediate and primary goal is to help FORT scientists understand and explore the use of HTC for both current and future research projects. As demand increases, FORT will assess resource and user needs to determine whether investing in dedicated clusters is necessary to increase the number and duration of jobs. Some jobs run for several days, which is difficult to achieve without dedicated clusters because jobs are removed when users return to their machines. (The longest a job at FORT can run is approximately 2.5 days.) HTC will not only help meet DOI and USGS goals for enabling better research but also support computer-reduction initiatives such as sharing computing resources, power management, and green computing. Thus far, HTC has allowed FORT to complete many projects that were initially believed impossible due to the amount of processing time required (see HTC Computing Times). We hope to continue these efforts and further expand the types of research questions we can address.

1The use of any trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.