Fort Collins Science Center

SNTEMP (In)Frequently Asked Questions:
Calibration and Validation Issues

Q4. I’m working on the River D… again. It’s below a major dam, with fairly uniform flows year-round. The study reach is 100 miles long. The data collectors, whom I’m not affiliated with, are getting mainstem temps every 5 miles, all the way down. Obviously we can’t have 20 validation nodes. Is there some kind of rule-of-thumb for how many of these thermographs might be useful to us?

A4. Actually, there is no reason that you cannot have 20 validation nodes in SNTEMP. You may not WANT 20 because you may not want to deal with the mass of data, but that is another issue.

I’m not sure that there is THE answer to your question. If fish, or whatever is important, are not distributed randomly, then thermographs in locations most biologically important would be my first priority. If fish are random, then just making sure that you are predicting well below each major tributary would be my next cut, I think.

Ongoing research seems to be indicating that water temperatures may be more variable longitudinally than we think, and than our models predict. I wouldn’t be surprised if you find that the model seems to perform well for some validation nodes and not for others. And for those which don’t seem to perform well, my guess is that the thermographs will read cooler in hot weather than the model predicts, and maybe cooler than the thermographs on either side of the "aberrant" one. It will be interesting to see what you find.

The other "researchy" thing to mention relates to travel time. Remember to think about the water’s travel time through the 100 miles. During high flow the model might work OK. During low flow, a multi-day simulation may be the only safe way to go, unless you can live with the assumption that neither the flow nor the meteorology really changes much from day to day. The model won’t work well otherwise.
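
A quick way to judge whether travel time matters is to estimate it from a mean velocity. The sketch below assumes a constant mean velocity over the whole reach; the velocities are hypothetical and are not taken from the River D. study.

```python
FEET_PER_MILE = 5280.0
SECONDS_PER_DAY = 86400.0


def travel_time_days(reach_length_miles: float, mean_velocity_ft_per_s: float) -> float:
    """Days for a parcel of water to traverse the reach at a constant mean velocity."""
    if mean_velocity_ft_per_s <= 0:
        raise ValueError("mean velocity must be positive")
    seconds = reach_length_miles * FEET_PER_MILE / mean_velocity_ft_per_s
    return seconds / SECONDS_PER_DAY


# Hypothetical mean velocities for a 100-mile reach.
for label, velocity in [("high flow", 4.0), ("low flow", 1.0)]:
    print(f"{label}: {travel_time_days(100.0, velocity):.1f} days")
```

With these assumed velocities the water takes about 1.5 days at high flow and about 6.1 days at low flow, so at low flow the water reaching the lower end has seen almost a week of weather.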

But back to your question, one is too few. The opportunity for error or being right for the wrong reason is too great. Thermographs further downstream provide more information, in a sense, in that those furthest upstream merely reflect release temperatures. But if things are as steady as you imply, one every ten miles would be good enough for me. You probably will find that some thermographs have problems, gaps in the data, whatever. [Added 12/2001]


Q17. At some point, we may need a "second opinion" on our results. Can you or someone from your group conduct an independent review of our model results? If not, whom would you recommend?

A17. I’d be happy to provide an informal review, but if you really want a peer review, I’d just as soon not play that role if I can avoid it – don’t really want to spend time in court. I can recommend some knowledgeable folks if you wish. [Added 12/2001]


Q24. Assuming we successfully use the current data to calibrate/validate the model, would the resulting model be applicable to other years? Also, what will happen with different dam-release alternatives?

A24. As with any model, one’s confidence in the results grows after demonstrating that the model performs well over the range of conditions you may expect to find, or even to predict. But one strength of a model is to predict things you cannot easily measure.

Because SNTEMP is a physically based model that has been used in a variety of circumstances, I can safely say that it predicts well, with errors generally less than 0.5 C on average and less than 1.5 C most of the time, given representative input data. If it predicts generally within these bounds, the relative temperature changes that one can expect from a variety of management actions in other year types may be used as good guidance. However, models are always wrong to some degree, so setting the model at risk is a wise idea, i.e., keep your eye on it!


Q25. Is a calibrated model legitimate to use for analyzing what happened in other years, or what will happen with different dam-release alternatives?

A25. As with any model, one’s confidence in the results grows after demonstrating that the model performs well over the range of conditions you may expect to find, or even to predict. But one strength of a model is to predict things you cannot easily measure. Because SNTEMP is a physically based model that has been used in a variety of circumstances, I can safely say that it predicts well, with errors generally less than 0.5 C on average and less than 1.5 C most of the time, given representative input data. If it predicts generally within these bounds, the relative temperature changes that one can expect from a variety of management actions in other year types may be used as good guidance. However, models are always wrong to some degree, so setting the model at risk is a wise idea. That is, keep your eye on it. [Added 12/2001]


Q26. Would using smaller data intervals to increase the information in the calibration phase (i.e., weeks vs. months) improve the applicability of the model to other years?

A26. The longer the averaging period, the better the model performs. This is an artifact of averaging out the highs and lows that are so often inaccurately simulated. But the "applicability" depends on your objectives. If you need to make weekly decisions, depending on say weekly meteorology, a monthly model obviously won’t cut it. If your question is, does having a greater sample of calibration points covering the same range of conditions improve the model, I’d say yes. It may not actually improve the goodness of the calibration, but it will provide more info on the relative goodness of the model. That is, the goodness of fit statistics will be more revealing. [Added 12/2001]
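
The averaging artifact can be seen with a few lines of arithmetic. This is a minimal sketch with hypothetical daily errors (simulated minus observed). It only averages the daily errors into weekly blocks, which is not the same as running the model on a weekly time step.

```python
def mean_abs(values):
    """Mean absolute value of a list of errors."""
    return sum(abs(v) for v in values) / len(values)


def block_means(values, block_size):
    """Average consecutive, non-overlapping blocks; a trailing partial block is dropped."""
    n_blocks = len(values) // block_size
    return [
        sum(values[i * block_size:(i + 1) * block_size]) / block_size
        for i in range(n_blocks)
    ]


# Hypothetical daily errors (simulated minus observed, in C) for two weeks.
daily_errors = [1.0, -1.0, 0.5, -0.5, 1.5, -1.5, 0.7,
                -0.8, 0.9, -0.4, 1.2, -1.1, 0.3, 0.6]

weekly_errors = block_means(daily_errors, 7)
daily_mae = mean_abs(daily_errors)
weekly_mae = mean_abs(weekly_errors)
print(f"daily MAE = {daily_mae:.2f} C, weekly MAE = {weekly_mae:.2f} C")
```

The daily highs and lows cancel inside each week, so the weekly mean absolute error can never be larger than the daily one, and is usually much smaller.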


Q27. Would using smaller data intervals in the calibration phase improve the applicability of the model to scenarios involving alternate dam release strategies for the current year?

A27. Again, depending on your objectives, I’d say no. However, what your question really sounds like is that you don’t feel fully confident that you have a sufficient data set. I recommend that you follow your intuition on this. If you need to make recommendations this year, be conservative by about the magnitude of the probable error shown in the calibration statistics (Table 7). Recognize that any given prediction in time and space can be off by the maximum error, and the more really different conditions encountered as you go forward, the more that maximum will increase. But at the same time, the probable error will close asymptotically to some value I’m guessing will be about 1.5 C. [Added 12/2001]


Q28. Can we use the data from previous years to test our calibrated model?

A28. Sure. Unless some conditions were drastically different from the conditions for which the model was calibrated, this is a good true validation test. [Added 12/2001]


Q29. Does the lack of comparable weather data preclude any meaningful testing of our model with known water temperatures from previous years?

A29. See above answer. But if you mean that you had to use offsite met data rather than on-site data, you may be able to correlate the remote data to the local data and essentially calibrate or adjust the met data before putting it into SNTEMP. [Added 12/2001]
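
One simple way to do that adjustment is a least-squares line fitted over a period when both stations have data. The sketch below assumes a linear relationship is adequate; the station values and the function names are hypothetical, and nothing here is part of SNTEMP itself.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical overlapping mean daily air temperatures (C).
remote = [10.0, 14.0, 18.0, 22.0, 26.0]   # off-site station
local = [8.5, 12.1, 15.9, 19.4, 23.1]     # on-site station

intercept, slope = fit_line(remote, local)


def adjust(remote_value):
    """Estimate the on-site value from an off-site reading."""
    return intercept + slope * remote_value


print(f"local ~ {intercept:.2f} + {slope:.3f} * remote")
```

The adjusted series, not the raw remote series, would then go into the meteorology file. It is worth checking the fit separately by season if the relationship looks different in summer and winter.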


Q30. How much would we lose by using monthlies vs. weeklies for the Calibration phase when only one year’s worth of data is available?

A30. I’m not really sure what you are asking. Lose in predictive ability? You lose the ability to predict on the finer time scale. If your monthly probable error were 1.5 C for example, your weekly error bound will be higher, but I’m not sure by how much. [Added 12/2001]


Q46. Does model error (summary statistics: -0.7 ± 1.2 C, 50% CI) really have any meaning for selecting a flow to meet an objective? My feeling is that it really does not, for two reasons. First, the error is relative between the flows evaluated, so that error really becomes moot. Second, the model is predicting system behavior pretty well, as evidenced by the observed and predicted comparisons. What are your thoughts? How might the opposition deal with this topic?

A46. These are good questions that are being debated all the time. I know my answer to this question differs from some other "experts" in the field. I believe that a competent lawyer/expert can be expected to challenge almost anything. There are several points that I’d want to make:

  1. 0.7 degrees is a tad more bias than I’d like to see. I usually recommend being less than 0.5 degrees, which is arguably within the accuracy of the measured data. In the future, it might be good to reduce the bias if that can be done without negatively impacting the r-squared values.
  2. The negative bias means that, overall, the model is tending to simulate 0.7 degrees cooler than the "measured" data support. If you were identifying a flow to meet or stay below a certain specific temperature target at a specific location, I suppose that one might argue to actually factor that into the equation. That is, let’s say you wanted to keep temperatures at or below 16 degrees at the mouth of the river during April and May and that 450 CFS did that 90% of the time (or whatever). Since the model simulates cooler than reality, it might actually take 475 CFS to accomplish the same thing. This actually works in favor of those who want to use less water, so they probably wouldn’t bring it up. Those wanting more water in the river might, however, pick on this.
  3. The statistics of more importance might be not the overall statistics but the statistics more closely representing the flow(s), time(s), and place(s) of interest. That is, what are the calibration statistics for the mouth during April and May (or whatever) when the actual flows were near the range being simulated? They are more to the point.
  4. I agree with you completely on the confidence interval. However, others may argue that a flow that provides temperatures within 1.2 degrees C is all that you can support.
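
Point 3 amounts to recomputing the bias for just the place and months of interest. The following is a minimal sketch with hypothetical records; the tuple layout and the values are assumptions for illustration, not output from any SNTEMP table.

```python
from statistics import mean

# Hypothetical records: (location, month, simulated C, observed C).
records = [
    ("mouth", 4, 14.8, 15.6),
    ("mouth", 5, 16.1, 16.5),
    ("mouth", 8, 21.0, 21.9),
    ("upper", 4, 11.9, 12.7),
    ("upper", 5, 13.2, 13.8),
]


def mean_error(rows):
    """Mean of simulated minus observed; negative means the model runs cool."""
    return mean(sim - obs for _, _, sim, obs in rows)


# Keep only the mouth of the river during April and May.
subset = [r for r in records if r[0] == "mouth" and r[1] in (4, 5)]

overall_bias = mean_error(records)
subset_bias = mean_error(subset)
print(f"overall bias = {overall_bias:.2f} C, mouth Apr-May bias = {subset_bias:.2f} C")
```

If the subset bias differs noticeably from the overall bias, the subset value is the one that bears on the flow recommendation.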

Q47. Is the way the SNTEMP model is being used to attain a target temperature appropriate? I believe that it is a very appropriate use of the model. How else could one arrive at a best guess without the model? Any thought or opinions?

A47. I couldn’t agree more on this one. How else? Strict empiricism rarely works because there is always more data than you can ever fully use and never just what you want. [Added 12/2001]


Q49. In the current draft of our report, the accuracy of the model is characterized as:

"-0.70 ± 1.2 C (50% confidence interval)." The model bias of -0.70 has not been accounted for in the model output, or in the magnitude of flow necessary to meet the smolt temp objectives. Should this bias be accounted for (either by adjusting model output or by adjusting the criteria)? By not addressing the model bias, is this going to be a weak point in justifying the flow recommendations necessary to meet the smolt temp criteria?

A49. Yes, a competent lawyer/expert may pick up on this. The model as calibrated tends to undersimulate water temperatures, though it may be more important to see how the model performs at the mouth of the river during the spring. One could take the bias into consideration in evaluating the flow recommendations. I have never been too focused on this because most times the bias is less than 0.5 degrees C, which is probably about as good as the "measured" data anyway. It would be good to have a well thought out answer to this kind of question. Factors to include would be which side to be "conservative" on, how taking the bias into consideration would or would not affect the "answer", whether the biological criteria are themselves conservative (e.g., Coutant's 2 degrees C wiggle room), and the fact that this model, like all the others used, provides a recommendation that then is to be tested in the field. [Added 12/2001]
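
The two options raised in the question (adjust the model output, or adjust the criterion) are arithmetically equivalent. A minimal sketch follows, assuming the bias is defined as simulated minus observed and is constant across conditions; the 16 C target is only an illustrative value.

```python
MODEL_BIAS_C = -0.70  # mean of simulated minus observed, as quoted in the draft report


def bias_corrected(simulated_c, bias_c=MODEL_BIAS_C):
    """Option 1: remove the mean bias from a simulated temperature."""
    return simulated_c - bias_c


def adjusted_criterion(target_c, bias_c=MODEL_BIAS_C):
    """Option 2: shift the target so it can be compared with uncorrected output."""
    return target_c + bias_c


target = 16.0  # illustrative temperature objective, in C
print(f"uncorrected output must stay at or below {adjusted_criterion(target):.1f} C")
print(f"a simulated 15.3 C corresponds to about {bias_corrected(15.3):.1f} C")
```

Either way, the assumption of a constant bias deserves a check against the statistics for the specific place and season of interest.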


Q50. Is the 50% confidence interval a common CI used in temperature modeling? This is the first time I have seen a 50% CI and it appears to give an overly optimistic portrayal of the accuracy of the model when in reality you would only expect (statistically) your estimate to be contained within your CI half of the time.

A50. You are correct that the 50% value is unusual. When SNTEMP was developed, the authors were employing a lot of grab sample data, some of which was not likely to be very representative of "reality". The 50% value probably seemed to be a useful statistic. It does mean that 50% of the time, the model does worse than the "published" value. The maximum error is printed however, and may be reported along with the others. Being a mean daily model, there are some time periods that will not simulate well. Take for example periods of night/morning fog versus afternoon/evening clouds. Both may have the same influence on the total solar radiation, but alter the mean and the max water temperatures in different ways. I don’t tend to be too concerned about it because it’s only a model and no other is likely to do much better without a substantial increase in the cost of data collection and simulation. However, I’m neither a lawyer nor a judge.

I disagree that the 50% value is overly optimistic (though I know what you mean). It is simply a statistic. You could easily calculate the 95% value if you wish. [Added 12/2001]
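
If the individual errors are not at hand, the 50% value can be rescaled under the assumption that the errors are normally distributed. That assumption is mine and should be checked; with the errors available, taking the 95th percentile of the absolute errors directly is the safer route.

```python
from statistics import NormalDist


def scale_interval(half_width, from_level=0.50, to_level=0.95):
    """Rescale a symmetric interval half-width between confidence levels,
    assuming normally distributed errors."""
    z_from = NormalDist().inv_cdf(0.5 + from_level / 2.0)
    z_to = NormalDist().inv_cdf(0.5 + to_level / 2.0)
    return half_width * z_to / z_from


# 1.2 C is the 50% value quoted in the questions above.
half_width_95 = scale_interval(1.2)
print(f"approximate 95% half-width: {half_width_95:.1f} C")
```

Under that assumption a 50% value of 1.2 C corresponds to a 95% value of roughly 3.5 C.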


Q51. I could see using a 50% CI if you used the temperature at the lower bound of the CI as the management target so you would have a 75% chance of achieving the appropriate temperature (or cooler water), but I’m somewhat concerned using the mean values of the target with such a small CI. What’s your opinion on this?

A51. I don’t fully follow you, though I see where you are headed. One could certainly do most anything in this regard. You could take both the bias and the CI into consideration, realizing that no matter what you do you will be wrong some of the time. I think the important thing is to document what was done and why. Modeling the central tendency of the model matched your focus of using median hydrology and meteorology. If you want to focus on the extremes, it might be better to simulate the 15 or 20 percent exceedance hot-dry conditions. I think this gets back to your objectives. [Added 12/2001]


Q62. In Table 9, I see that the Determination and correlation coefficients are in the range .9 and up for about 18 weeks, then they take a nosedive. Could you enlighten me on this? I’m not sure what it is we are correlating and determining.

A62. Good spotting. I hadn’t noticed that. However, I think that what you are seeing is just a "statistical anomaly", but I’m not sure. The error terms remain quite low during this "off" period, and I suspect that with just 6 elements in the mix across all validation nodes it just so happened that the correlation was poor. It might be worthwhile to create a scatter plot across all the nodes for one of those "poor" weeks and see if you see anything that explains the poor correlation. [Added 12/2001]
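
A small example shows how six points can have small errors and still correlate poorly when the observed temperatures span a narrow range. The values are hypothetical and are not taken from Table 9.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical week: six validation nodes, all within a few tenths of 15 C.
observed = [15.0, 15.1, 14.9, 15.05, 14.95, 15.0]
simulated = [15.2, 14.9, 15.1, 14.8, 15.1, 15.0]

max_abs_error = max(abs(s - o) for s, o in zip(simulated, observed))
r = pearson_r(simulated, observed)
print(f"max |error| = {max_abs_error:.2f} C, r = {r:.2f}")
```

Every error here is a quarter of a degree or less, yet the correlation is negative, because there is almost no real variation for the model to track.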


Q72. I have supplied all temperatures and flows for a point source node, but the model still calculates a regression, the results of which are inaccurate. Can this be turned off, or is it relevant to the results?

A72. Under certain circumstances, the model will calculate these regressions (shown in Table 7) for all "headwater" nodes regardless of whether it needs to fill and/or smooth the boundary conditions. But if there are no missing values, the model will not use the results. [Added 12/2001]


Q74. I’m unclear on how the calibration constants in the job control file work. Do they have the direct effect of altering the data by some percentage value and if so, how is it expressed in the job control file (i.e., as a proportion or percentage)?

A74. The formula for all the job control calibration constants is Y^ = A0 + A1(Y) where Y^ is the "new" value, A0 is the "constant" and A1 is the "coefficient" in terms of what is called for in the job control file. See pages III-67 and III-80 in IP#16. [Added 12/2001]
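
In other words, A1 acts as a multiplier (a proportion, not a percentage) and A0 as an additive offset. A minimal sketch of the formula, with illustrative values only; the function name and the identity defaults are my own choices for the example and are not taken from the job control file documentation.

```python
def apply_calibration(y, a0=0.0, a1=1.0):
    """Y^ = A0 + A1 * Y, where A0 is the constant and A1 is the coefficient."""
    return a0 + a1 * y


value = 20.0
print(apply_calibration(value))            # A0 = 0, A1 = 1 leaves the value unchanged
print(apply_calibration(value, a1=1.10))   # A1 = 1.10 raises the value by 10 percent
print(apply_calibration(value, a0=0.5))    # A0 = 0.5 shifts the value up by 0.5 units
```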


Q75. What type of regression would seem the most logical downstream of a small, flow-through impoundment?

A75. I assume you mean from measured data to fill in missing values. I’d just use the zero lateral heat transport - - option 1, but I always recommend trying them all and picking the best. [Added 12/2001]


Q76. To "validate" the model from this one day, I plug in humidity, air temperature and flow from other days in the summer (with similar Langleys, day-length, etc.) and see if errors are similar to the output from the first chosen day. So far my errors are all similar and generally less than 1 degree F. Is this a reasonable and important exercise? Would you suggest a better (more rigorous/acceptable to peer review) way?

A76. Your flavor of "validation" is fine IF you only (or mostly) want the model to work for "similar" days. If that is your objective, fine. But if you really want the model to be "valid" across a range of conditions (e.g., flow, air temp, etc.) then you must test your calibrated model against conditions that are outside of, or at least push the limits of, the conditions for which the model was calibrated - - an independent data set.

SSTEMP really wasn’t set up for validation as in SNTEMP which calculates all the goodness of fit statistics and so on that likely would be more acceptable to peer review. Nonetheless, you can do an OK job if you compile the error terms in a manner that lets your audience know how far to trust the model for conditions that you wish to make management predictions for, a range that may be outside the domain of your measured data. [Added 12/2001]


Q77. I would also like to conduct a sensitivity analysis similar to page 11 of Information Report 13, but I can’t imagine how to do it in a manageable way. It seems a daunting task to vary everything simultaneously with SSTEMP. What is the easiest way to do this, and do you guys have a separate module to do this?

A77. The Stream Segment Temperature Model, SSTEMP, has been reprogrammed for Windows 9x and NT operating environments. This new version combines all of the previous DOS SSTEMP utilities (Temp, Solar, and Shade) and adds several new features including Windows Help, automated sensitivity analysis, and graphic displays. The program requires up to 6.5 MB after installation and works best with a 1024 x 768, small font display.

If you are interested, a compressed (3.3 MB) version of the installation file, SSUNZIP.exe, may be downloaded from the Internet at:

http://www.fort.usgs.gov/products/software/temp/temp.asp

Running this file will unzip its contents into C:\Windows\Temp. Then you should run Setup.exe from the Temp directory. [Added 12/2001]


Q83. My first question is how much fudging and manipulation can a consultant do to receive a favorable answer that a client needs? Second, what indices are the most used to be able to control a favorable answer if the consultant is somewhat underhanded in dealing with this formula?

A83. As with any model, if one is really "underhanded" you can get the model to do just about anything you want. This can range anywhere from fudging the measured data, to very "liberal" (or conservative) assumptions, to doctoring the model’s output. There really isn’t much that can be done about this level of manipulation except essentially duplicate the whole analysis from A to Z. Obviously, monitoring the post-project condition is in order. [Added 12/2001]


Q84. I have a consultant that tells me through his analyses that the temperature will be raised extremely high in the upstream tributaries [through the ponding project], but it will not affect the downstream trout area due to intervening groundwater influences. I am truly skeptical about his conclusion. If the upstream area that will be impounded by sediment ponds will have a high maximum temperature, then I would think that during a low flow period or worst case scenario like we are having now, the downstream area’s maximum temperature would increase even more since the highest water temperatures are at the lower end of the stream already. This is also without the sedimentation ponds in place.

A84. Unfortunately, the answer is usually "it depends". If inflows are small relative to either groundwater or meteorological heat fluxes, the consultant could be right. It’s best to ask for a copy of the model and all supporting data and assumptions. Double check everything, and see for yourself whether the conclusions make sense and whether there may be some counterexamples - - like the worst-case scenario you mentioned. [Added 12/2001]


Q106a. Are there any times you would use a model that was calibrated but not validated?

A106a. Yes. For example, the SNTEMP model has been used so widely, that it appears to be "valid" for a wide range of applications (settings, circumstances). Unless there was something that seemed very unusual about the situation, I'd have no problem using it without a formal model validation. The exception would be if I felt that the model was "over-calibrated." I firmly believe that if one must adjust many input values to fit their data, it is not likely to work under altered circumstances. In general, the less calibration you need to do, the more "valid" I believe the model would be. [Added 12/2001]


Q106b. Would your answer be the same whether we were talking of a stream temperature model or HEC-5/5Q?

A106b. I don't know HEC-5Q as well as SNTEMP. The little we have used it here, it does not seem to be as robust as SNTEMP for water temperature, and has not proven very satisfactory for DO. I don't think I can really answer this question. [Added 6/2002]


Q107. I have another question regarding validation/calibration. I am trying to validate water temperature data for a couple of summer months in 1997 and want to use the root mean square error as a statistical index. When I input the variables into the SSTEMP model, I change the ground temperature to the mean monthly air temperature for that month. I also change inflow temperature, segment inflows and outflows, and air temperature; everything else I leave the same. Even though I make Segment Inflow and Outflow equal, and accretion temperatures zero, ground temperatures still affect my results. My first question is: does changing the ground temperature from month to month make the root mean square error comparable from month to month? The bigger question I'm getting at is: what variables can you change without making the root mean square error invalid? I ask this question because when I was using the daily air temperature as the ground temperature, my results were very close to the observed values; however, when I was using the mean monthly air temperature as the ground temperature, my results were not as accurate. I suspect if I used the mean annual air temperature as the ground temperature, my results would be even more inaccurate.

A107. I'm not quite sure I'm with you, but let me see if I can head in the right direction.

The root mean squared (RMS) error may be calculated for any time series comparing simulated with observed values. You could create a set using mean daily air temps as ground temps, mean monthlies, or annual. If as you describe your situation, the dailies provide more accurate results, this then will be reflected in a smaller RMS error. You could use that fact then as justification for calibrating the model with daily ground temp values (from air temps) and no one should complain. You can change any variables you want without "invalidating" an error metric as long as you calculate that metric the same way in all cases.

In addition to the RMS (or simply mean absolute error) I would also recommend simply calculating the mean error since it will clearly show you whether the model is over or underestimating the measured values.
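
The three metrics can be computed the same way for each alternative and then compared. This is a minimal sketch; the temperature series are hypothetical stand-ins for runs using daily versus monthly ground temperatures.

```python
import math


def error_metrics(simulated, observed):
    """Mean error (bias), mean absolute error, and RMS error of simulated minus observed."""
    errors = [s - o for s, o in zip(simulated, observed)]
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,
        "mean_abs_error": sum(abs(e) for e in errors) / n,
        "rms_error": math.sqrt(sum(e * e for e in errors) / n),
    }


# Hypothetical observed and simulated water temperatures (C).
observed = [18.0, 19.0, 20.0, 21.0]
sim_daily_ground = [18.2, 18.9, 20.1, 20.8]     # ground temp from daily air temp
sim_monthly_ground = [18.9, 19.8, 21.0, 21.7]   # ground temp from monthly mean air temp

daily = error_metrics(sim_daily_ground, observed)
monthly = error_metrics(sim_monthly_ground, observed)
for name, result in [("daily", daily), ("monthly", monthly)]:
    print(name, {key: round(val, 2) for key, val in result.items()})
```

In this made-up case the monthly alternative has both a larger RMS error and a clearly positive mean error, which is the kind of pattern the mean error is meant to expose.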

How about sending me your SSTEMP data set (choose Save As). I'm guessing you must be dealing with a very small stream with a fairly large width. Am I right? [Added 6/2002]


Q108. I am working on calibration - I under predict in all locations. My reaches tend to be long (several miles) - will shorter reaches heat up faster? Just to be sure - I am looking at the last table in the KVRTRNS file at model predictions at my nodes. Correct?

A108. The length of the reaches should make no significant difference as far as I am aware. [Added 6/2002]


Q109. The predicted temps were off one day when compared to the actual temps. Any ideas on why this would happen?

A109. One of three things: (a) the model is wrong, (b) the input data are wrong, (c) both. The model can be wrong for many reasons, but they all fall under the heading of averaging. As an example, a 24-hour period that is cloudy all day and clear at night is vastly different from one that is cloudy all night and clear all day, yet they both look the same to SNTEMP. [Added 6/2002]


Q176. I've been having a bit of a problem calibrating SNTEMP in a small subbasin (~17 sq miles) of the M. watershed in California, and I'm hoping that you might be able to give me some insight.  The problem I am having is that the model's error ranges +/- 2°C, and tracks with air temperature very well, i.e. when air temperature swings low the model under-predicts, and when air temperature swings high the model over-predicts.  It seems the model is not accounting for air temperature correctly.  Under what conditions would SNTEMP overestimate the effects of air temperature?

Some notes and observations:

Any insight that you can provide will be greatly appreciated.

A176. What you have described I believe may be indicative (primarily) of there being more water in the "river" than you have measured.  How likely might it be that there is actually a good bit of subsurface flow in the river?  Is there anywhere along the river where bedrock comes to the surface where you could compare measured streamflow with what you are using for gaged flow? 

If what I believe is true is indeed the case, the situation is problematic.  Adding streamflow (if you could get a good estimate of the difference) is what should be done, but then other problems occur.  Likely the thermal gradient should also be different, but I can offer little guidance on what that parameter’s value should be.  I would recommend trying widely different values to see if that helps at all.  Ground temperature and accretion temperature all will be equally important.  And the travel time, if measured by a dye study, should be considerably longer than one would otherwise expect.

I know little about the Bowen Ratio, but I don't think I'd mess with that unless you had a good reason to do so.

Added later – Another thought is that you might be underestimating relative humidity.  Check Information Paper 13 for guidance.


Q177. My model appears to be predicting temperatures that are highly variable over time, that is, the model is predicting "spikes" in the temperature that are way above or below the observed data on certain days (and are the same days for each year).  This appears to be happening on a regular (weekly) basis for all the years modeled.  So essentially what is happening is that about every six days the model sees a sharp increase/or decrease in the temperature that is not shown by the observed data.  My question then is this: What inputs/parameters could be causing this regular occurrence of temperature "spikes"?  To what input files should I be looking at so that this could be fixed?

Later - Just wanted to let you know that I think I figured out the problem that I was having with the spikes in the predicted temperatures.  I was on your Q & A web-site for SNTEMP and I came across a discussion you had with someone about solar radiation.  This person was trying to decide if he should enter his one year of solar radiation data (which he thought may have been bad) or just let the model figure it out.  That got me thinking...although I have no reason to believe that the solar radiation data that I have is bad data (it was taken at a fire station very near the river, hourly for all of 2002), the Q & A discussion and the fact that I was getting bad solar calibration factors in the KVRMETR output file helped me decide to run the model without the solar radiation data and let the model figure it out on its own...It appears to have worked. [Just goes to show that you shouldn’t take my advice :-)]

A177. Spikes.  My supposition is that you have a formatting problem.  Some value(s) somewhere in your input files are either missing a decimal point or are shifted to the left or right out of their column-defined field.  Since the problem appears periodic, I suppose it is fair to say that it is in one of the files with time series data, which means the hydrology data file or the meteorology data file.  At least that's where I'd start to look for it.  If you don't spot the problem by looking at the input files, then make sure you carefully scrutinize the output files, especially tables 6 and 8.  Look at the days that have the problem compared with the others.  I'm guessing that it won't be that tough to track down.
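
Once a time series column has been read into a list, a crude screen can point at the suspicious days. The sketch below is generic and assumes nothing about the SNTEMP file layouts; the function name, the factor of 5, and the sample values are all hypothetical.

```python
from statistics import median


def flag_suspect_values(values, factor=5.0):
    """Return indices of values that differ from the series median by more than
    `factor` times in either direction, as a lost or shifted decimal point would."""
    typical = median(abs(v) for v in values)
    if typical == 0:
        return []
    suspects = []
    for index, value in enumerate(values):
        ratio = abs(value) / typical
        if ratio > factor or ratio < 1.0 / factor:
            suspects.append(index)
    return suspects


# Hypothetical daily values; two entries look like decimal-point slips.
series = [12.4, 12.9, 13.1, 131.0, 12.7, 1.28, 13.0]
print(flag_suspect_values(series))
```

A screen like this also flags legitimate zeros and real extremes, so every flagged day still needs to be compared against the original input file by eye.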


Q178. In brushing up on the IF 312 class notes, I came across the notes about the KVRMETR output file, in which it states (p138) that the solar radiation calibration factors should be between 1.1 and 0.9.  In looking at mine, this was not even close to the case.  What does that mean? How should I go about fixing it?

A178. My supposition here is that you have used percentage values instead of decimal fractions for one or more of the input values.  Remember that SSTEMP uses percent values for relative humidity, possible sun, ground reflectivity, and (if you used it) total shade, whereas SNTEMP wants decimal fractions instead.  Is this a possibility?
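
A small guard when preparing inputs can catch this mix-up. This is a hypothetical helper, not part of either program; it assumes any value above 1 and up to 100 was meant as a percent.

```python
def to_fraction(value, name="value"):
    """Return a decimal fraction, converting from percent when the value
    lies above 1 and at most 100. Values from 0 to 1 are returned unchanged."""
    if 0.0 <= value <= 1.0:
        return value
    if 1.0 < value <= 100.0:
        return value / 100.0
    raise ValueError(f"{name}={value} is neither a fraction nor a percent")


print(to_fraction(65.0, "relative humidity"))   # percent style input
print(to_fraction(0.65, "relative humidity"))   # already a decimal fraction
```

The check cannot tell a true 1 percent from a fraction of 1.0, so values at or below 1 still need a sanity check against what the variable should plausibly be.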


Q179.  Along the same lines of calibration factors, in topic 18 (p 124, Job Control File) under user-supplied parameters there are some default values listed, e.g., evaporation factor (EFA and EFB), Bowen Ratio, etc... through to p. 125.  I guess I am a little confused here, do I need to enter in those values, or by leaving them as zeros does that assume that those values will be used?

A179.  Leaving most things blank (or zero) will use the defaults.  Look at Table 10 when in doubt.


Q180.

  1. On page 93 of Information Paper 13 you list some guidelines for acceptable errors when running SNTEMP.  One of them says that no more than 10% of the simulated temperatures should be more than 1°C from the observed temperatures.  In doing this calculation, it seems that the H nodes should NOT be included in this analysis because I am telling the model what the temperatures are at these locations, and the model is not predicting anything at these nodes.  Is this a correct assumption? Or should the H node observed vs. simulated be included in the error analysis?
  2. In the VSTATS file it gives you statistical summaries for the V nodes in the system.  In my system I have temperature information for four of the six years modeled at the V nodes (I selected these sites because I had USGS measured flows at these locations), and the model had to fill the missing temperature information for two years.  Do you think this is causing the VSTATS to be better/worse than they should be because of the missing temperature values at these V nodes?  Should I add another V node at a location where I have temperature information for all years modeled, and estimate the flows (which I have already done for some locations (Q nodes))? Will this help give me a more accurate VSTATS output?
  3. I am having a hard time understanding what exactly the determination coefficient is, and what relationship it may have to the correlation coefficient.  Although I have seen the R2 value used in most of the literature that I have read, this D value has not been mentioned once, is it not as important as the R2 value?

A180.

  1. You are absolutely right.  I'm sure I was thinking of V nodes if I didn't say that.  Occasionally, but not much anymore, you also need to be concerned with filling the record at an H node.  In such cases you need to carefully examine the regression that gets put together for that node, but this process is really different from the issue you raised.
  2. You have asked a very perceptive question here.  Yes, the model will very likely report misleadingly good statistics at a V node where some filling (or smoothing) has taken place.  (Just so you'll know, the VSTATS program knows nothing about any filling or smoothing that may have taken place earlier in the model.  It assumes there is good data everywhere.  More precisely, I suppose, Theurer considered the process of filling missing values as separate from the heat transport model.)  Your suggestion is appropriate, but what I might have done is to export the temperatures and calculate my own goodness-of-fit statistics outside of the program, using only the known values.  This would likely be less work and less error prone.
  3. D = sqrt(R2) = R.  I do not know why Theurer used D instead of R2, but suspect that it (and using the 50% confidence interval), too, puts a better face on the results.  On his behalf, however, he was used to dealing with poorer quality data than we can generally get these days.  We too often forget that 1984 was a long time ago.  Remember that these statistics are just that -- statistics -- and they may or may not be appropriate for your objectives.

I am not a statistician, but I believe R2 is technically the coefficient of determination -- so-called because the value supposedly tells you how much of the total variation in measured values can be “explained” by the variation in simulated values.  This can be misleading, however, because some of the assumptions in making this calculation are violated in daily water temperatures, specifically that today's temperature is independent of yesterday’s temperature.  R (and D) are the correlation coefficients.  Be careful.
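
As a rough illustration of calculating your own goodness-of-fit statistics outside of the program, here is a minimal Python sketch for a single V node.  It is not VSTATS code.  The function name, the was_measured flag (which you would have to track yourself), and the sign convention of error = simulated minus observed are all assumptions made for the example.

```python
import math

def fit_stats(observed, simulated, was_measured, threshold=1.0):
    """Goodness-of-fit at one V node using measured days only.

    observed, simulated: daily mean water temperatures (deg C).
    was_measured: True where the observed value is a real measurement,
    False where it was filled or smoothed (hypothetical flag kept by the user).
    """
    pairs = [(o, s) for o, s, m in zip(observed, simulated, was_measured) if m]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two measured values")

    # Assumed sign convention: error = simulated - observed.
    errors = [s - o for o, s in pairs]
    mean_error = sum(errors) / n
    max_abs_error = max(abs(e) for e in errors)
    # Fraction of days with an error larger than the threshold (1 deg C guideline).
    frac_over = sum(1 for e in errors if abs(e) > threshold) / n

    # Ordinary Pearson correlation coefficient r between observed and simulated.
    obs_mean = sum(o for o, _ in pairs) / n
    sim_mean = sum(s for _, s in pairs) / n
    sxy = sum((o - obs_mean) * (s - sim_mean) for o, s in pairs)
    sxx = sum((o - obs_mean) ** 2 for o, _ in pairs)
    syy = sum((s - sim_mean) ** 2 for _, s in pairs)
    r = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else float("nan")

    return {"n": n, "mean_error": mean_error, "max_abs_error": max_abs_error,
            "frac_over": frac_over, "r": r}
```

A frac_over above 0.10 would mean the 10% guideline is not met for that node.  Filled days are dropped before anything is computed, so they cannot flatter the statistics.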


Q181.

  1. I have another question about "acceptable" error as outlined on p. 93 of Information Paper 13. Do those standards apply to ALL nodes within the system (minus the H nodes), or only the V nodes? I know we have discussed this before but I wanted to make sure I was clear.  I have done it for both and am getting 10.9% over 1°C difference for all nodes, and 9.41% for the V nodes only.  So I am close either way.  But I am getting a maximum error of 4.52°C when using all nodes and have quite a few over 1.5°C, whereas I am only getting two over 1.5°C and a maximum of 1.92°C for the V nodes only, so you can see why I want to be sure of which nodes to do this analysis on.
  2. If it does apply to all nodes, then in order to do the error analysis for all time periods, I had to "cut out" nodes that did not have observed temperature data, so that some nodes are included for the whole time period, while others are not.  Is this acceptable?
  3. Lastly, is it acceptable to make changes to the meteorology file one day at a time?  That is, if I have one day that has substantial error, can I increase an RH value for that one day in order to reduce the error, while leaving the rest of the RH values for the time period constant, or do I need to adjust the whole time period?

A181.

  1. First, let me stress again that the guidelines in Information Paper 13 were just that -- guidelines.  Each error analysis must lean heavily on one's objectives, and frankly these are usually 'overruled' by practicality.  I would be surprised if you didn't have some days, and some locations, that simply do not seem to 'fit' as well as others, whether due to model error, user error, or original measurement error.

    Having said that, I would almost always leave H nodes out of my error analysis in that these represent so-called 'boundary conditions' that are model inputs, not outputs.  [An exception to the rule would be if you used the model to estimate missing values at one or more H nodes and you wanted to examine the error associated with that estimation.  I consider this a special case and I will not address it further here.]

    I believe you should report maximum errors only for V nodes.  But I also believe that you should satisfy yourself about why the model is giving you higher maximum errors at the H nodes, if I understand you correctly.  The regression techniques used at H nodes assume free-flowing conditions with no discontinuities above the assumed location, and they also work better with a larger observed data set to build the model from.  If there are discontinuities or a small data set, you could get poor predictions.  There may be other reasons too, but these are the ones on my mind at the moment.

    As you have described your results, they sound very good.
  2. Even if you did cut out some nodes/time periods, you would have done so for a reason.  Simply explain that rationale in your report and proceed.
  3. The answer to this one is usually a big NO -- at least not without a compelling rationale.  Tweaking the model on a daily basis with parameters like RH is asking for trouble.  But if you critically examine your data and results, and decide for example that your biggest errors occur on days with the highest (or lowest) RH, then that might make you wonder about the representativeness of your data, and that might in turn lead to some method to the madness of making input adjustments.  But this is not typical.

If you have not done so, I advise you to run the EXERR program.  This program systematically looks for correlation between model error (the residuals) and a variety of input variables.  Although I cannot recall ever seeing this program report significant correlations, it's still worthwhile just to get a feel for what it does.  Be sure to look at the scatter plots too, because there might be some curvilinear relations that this simple linear program does not pick up.
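
The same idea can be tried by hand on exported results.  The Python sketch below is only a rough analogue, not EXERR's actual code or algorithm.  The function names, the input-variable names, and the residual sign convention (simulated minus observed) are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Plain linear (Pearson) correlation; returns NaN if either series is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def residual_correlations(observed, simulated, inputs):
    """Correlate model residuals with each daily input series.

    inputs: dict mapping a variable name (hypothetical, e.g. "relative_humidity")
    to a list of daily values aligned with the temperatures.
    """
    # Assumed sign convention: residual = simulated - observed.
    residuals = [s - o for o, s in zip(observed, simulated)]
    return {name: pearson_r(values, residuals) for name, values in inputs.items()}
```

A large correlation for one variable only suggests where to look.  Like any linear measure it will miss curvilinear relations, so plot the residuals against each input as well.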


Q182. What I'm interested in now is to be able to get my own regression coefficients (a0, a1, a2, a3) to use in the calculation for the daily maximum temperature. I have read the Theurer paper pages II-30 to II-32 and can see how the coefficients are used to get the maximum daily air temperature but I am not clear how the coefficients in Table II-3 of the paper were obtained. Is there another reference I could look at that might help?

A182. Good for you.  I don't think most people do this.

The empirical coefficients a0-a3 are obtained by solving the equation II(58), rearranged to be:
            Y = (Tax - Tbar-a) = a0 + a1 Hsg + a2 Rh + a3 (S/So)

As shown above, the Y-value is actually the difference between the maximum daily and mean daily air temperature (Tax minus Tbar-a).  This equation is fairly easy to solve in most statistical packages or using Excel, as an example.  Just make a table of the Y-value and the three input (X) values in order, Hsg, Rh, and S/So.  Remember that Hsg is ground level solar radiation, not extra-terrestrial as stated in Information Paper 16.  Use as many data points (days) as you have that are relevant to your predictions.  In other words, if you are most interested in really hot days, use those to develop your coefficients.

Then use Excel's Regression function (Tools | Data Analysis | Regression) and block the input and output range.  Do not check the 'Constant is zero' option.  The resulting intercept coefficient becomes a0 and the other X-variable coefficients become a1-a3. 
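
For those who prefer a script to Excel, the same ordinary least-squares fit can be sketched in plain Python.  This is a minimal illustration, not code from SNTEMP.  The function and argument names are made up for the example, and the units of Hsg and Rh are assumed to match whatever you will later feed to SNTEMP.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_max_temp_coefficients(t_max, t_mean, hsg, rh, sunshine_ratio):
    """Fit Y = (Tax - Tbar-a) = a0 + a1*Hsg + a2*Rh + a3*(S/So).

    Each argument is a list with one value per day.  Returns [a0, a1, a2, a3].
    """
    y = [tx - tm for tx, tm in zip(t_max, t_mean)]
    # The leading 1.0 is the intercept column (the constant is NOT forced to zero).
    rows = [[1.0, h, r, s] for h, r, s in zip(hsg, rh, sunshine_ratio)]
    if len(rows) < 4:
        raise ValueError("need at least four days of data")
    k = 4
    # Normal equations: (X'X) a = X'y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

With only a handful of days, or with predictors that move together, the fit will be unstable, which is another reason to use as many relevant days as you have.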

You may wish to experiment with the coefficients to see if they indeed improve SNTEMP's maximum temperature estimates.  You may be able to calibrate the model more closely, but it can be tedious.


Q183. I have successfully built a model of the X. River and I am now writing the report.  While doing this I have been trying to calculate the same test statistics that VSTAT does so I can understand them (and explain if needed) and I cannot figure out exactly how (D) the coefficient of determination (effect if bias removed) is calculated.  Where can I find a reference for this calculation?

A183. Very good question.  As far as I know, Theurer never really defined this term in Information Paper 16.  My assumption has always been that this statistic would be calculated as follows:

  1. Calculate the means of both the set of observed values and the set of simulated values.  The difference is the bias.
  2. Subtract (or add) the difference between the means of the two sets to all the simulated values.  This 'removes' the bias.
  3. Now calculate the coefficient of determination between the observed values and the 'corrected' simulated values.  This is generally called r (as opposed to R2) in the statistics books.  There are many formulae for this.  Here is one:

r = +/- sqrt[ (explained variation about the mean)/(true total variation about the mean) ]

r = +/- sqrt[ sum((Ysim - Yobs_mean)^2) / sum((Yobs - Yobs_mean)^2) ]

where Ysim = each simulated value of Y
            Yobs_mean = mean of the observed values of Y
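
The three steps above can be sketched in Python as follows.  This follows the assumed procedure only and is not taken from the SNTEMP FORTRAN code.  The function name and the bias sign convention (simulated mean minus observed mean) are assumptions for the example.

```python
import math

def bias_removed_r(observed, simulated):
    """Steps 1-3 above: remove the mean bias, then apply the r formula.

    Returns (r, bias), taking the positive square root.
    """
    n = len(observed)
    if n != len(simulated) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    obs_mean = sum(observed) / n
    sim_mean = sum(simulated) / n

    # Step 1: the bias is the difference between the two means.
    bias = sim_mean - obs_mean
    # Step 2: shift every simulated value so the two means agree.
    corrected = [s - bias for s in simulated]
    # Step 3: explained variation over total variation, both about the observed mean.
    explained = sum((c - obs_mean) ** 2 for c in corrected)
    total = sum((o - obs_mean) ** 2 for o in observed)
    if total == 0:
        raise ValueError("observed values are constant")
    return math.sqrt(explained / total), bias
```

Note that, computed this way, the result is the ratio of the spread of the simulated values to the spread of the observed values, so it can exceed 1 when the model varies more than the measurements.  It equals the usual correlation coefficient only when the simulated values are least-squares fitted values of the observations.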
           
Give this a try and see if you can duplicate SNTEMP's results.  If not, let me know and we can take a gander at the FORTRAN code, though this might be a mess.  If it does work, I'd like to know that as well.

Later Response from Questioner - The variation that you suggested did not work either. I can duplicate the other results with "typical" statistics methods but the real puzzler is the D term.  I would appreciate it if you could look at the code or send the relevant section to me.  I am going to send off the same question to Theurer as well.

FYI...  The model is working better than anything that our local consultant has seen, including a “rival” model.  I have simulations set up for May-01 to September-30 for the years 2000, 2001, 2002, 2003, with various scenarios, i.e., "natural conditions" with and without treatment plant inputs, etc.


Q184. I am a graduate student working with G. P. and L. M. on stream temperature dynamics in the X. River.  Our work has shown that the spatial and temporal lags associated with complex floodplain geomorphology are expressed as lower instream temperatures.  In reviewing the relevant literature to find other examples of studies that explicitly address stream temperature predictions, I find few R2 values that I can reference.  Can you point me to articles that compare instream temperatures to a set of predictors that result in an R2 value?  Thanks in advance.

A184. This is a good question.  You will have to do some digging, but I think I can point you towards good places to start that digging.

I can think of three main areas, literature on SNTEMP, other applications I have been involved with or know about, and personal contacts.

  1. There is a fair amount of literature dealing with SNTEMP applications.  SNTEMP has been especially good for what you are seeking because this model "automatically" produces goodness-of-fit metrics for mean daily water temperature (but not maximum).  Keep in mind that many if not most of the applications may be for time steps and/or spatial scales that are vastly different from what you are seeking.  Both temporal and spatial resolution can and will affect the goodness-of-fit, and the exact specification of what you are comparing can make a huge difference (Bartholow, 2002, Modeling uncertainty ..., see below).

    A good place to start this search is at http://www.fort.usgs.gov/products/software/SNTEMP/SNTEMP_refs.asp  Not all of these references will have goodness-of-fit metrics, but certainly many will.
  2. See if any of these are useful:

Q185. I'm getting going on calibration of my model, and so far it's looking good. At most locations the model is consistently below the observed temperatures, which is easy enough to calibrate. But as I get the mean error close to zero I'm just getting modeled temperatures that are equally greater than and lower than the observed temperatures. I'm trying to reach the goals you set in Paper 13 (i.e., no more than 10% of individual temperatures should have an error greater than 1 degree), and in most cases I am not reaching them. The model is exhibiting too much of a response to weather conditions. I'm sure this is because I have a series of impoundments in the study area that are not represented in the model. For diurnal fluctuations, validation points below the impoundments show huge fluctuations but I was able to reduce these fluctuations by increasing the Manning's n of the segments, artificially increasing the depth. But this has no effect on mean temperatures.  Is there a similar way for me to increase the "effective" depths for the mean temperature calculations? Should I try to throw in some time of travel estimates that I could then calibrate?  I don't really want to play around with the segment widths, as I measured these in the field.

I did realize beforehand that this may be a limitation of the model, but I thought you might know of a trick to get around it.

I'm also trying to get a CE-QUAL-W2 model of the same area running and am having an incredibly tough time of it (it really makes me appreciate the ease of use of SNTEMP). I know you used this in a Shasta Dam study, did you find it quite difficult to start running? If so was there one thing that really helped?

A185.  Thanks for the note.  The calibration goals listed in Information Paper 13 are good ones, but not always attainable.  Often one is faced with trade-offs.  In particular, my experience (which is not vast as a model user, just a teacher) is that it is quite difficult to simultaneously reduce the bias and improve the correlation and reduce the maximum errors.

I'm not quite sure I fully understand what you mean when you say that "For diurnal fluctuations, validation points below the impoundments show huge fluctuations but I was able to reduce these fluctuations by increasing the Manning's n of the segments, artificially increasing the depth."  I will assume that you mean that the uncalibrated model predicted more variation in maximum water temperatures than you measured.  Increasing n is certainly the right way to get at this issue. 

But then I also assume that the same was true for the mean daily predictions, i.e., the model produced more variability than you measured.  The essence is that there is some element controlling heat flux that puts a damper on the daily variability.  This is often considered to be due to substantial ground water accretions.  This could also be because there really is more water in the stream, but a large portion is "underground" in the hyporheic zone.  How good are your flow measurements?  One calibration possibility may be to experiment with the so-called thermal gradient, but I'm not sure if this will work.  It may take large changes in this parameter that may not be realistic, though I must admit that I don't have a "reasonable range" to advise you on. 

It is indeed interesting to compare the W2 model with SNTEMP on ease of use.  We found the W2 model very difficult to get up and going, but also found it to be highly accurate when we supplied high quality bathymetry, but this was a costly process!


Q186. You were right: when I said diurnal fluctuations, I actually meant maximum temperature predictions. I am calibrating the maximum temperature model by doubling the difference between mean daily temperatures and maximum temperatures and comparing that to my observed diurnal fluctuations, so that was why I referred to diurnal fluctuations.

I'm all done with the calibration and validation now and am happy with the results; I just wanted to make sure I understand what the statistics in the KVRSTAT file are. Really it is just the Determination Coefficient (Detr. Coef.) and the Correlation Coefficient (Corr. Coef.) that I need to be clear on. In the user’s manual it states that these are for the regression model, which makes sense to me - but what regression model? The one used to fill in missing stream temperature and flow values at validation points, or is there another?

A186. As a note, you should realize that doubling the difference between mean and maximum will not result in really good estimates of diurnal fluctuation.  I do not understand the process well, but water temperature does not respond symmetrically, regardless of what is in Information Papers 13 and 16.

What did you adjust during calibration and how much did it seem to improve your goodness-of-fit?

The values listed in KVRSTAT are indeed somewhat confusing.  I'm still not sure I completely understand all the nuances involved.  But basically you are right in that the coefficient of determination is r2 and correlation coefficient is r.  Theurer did some funny business like artificially removing the mean bias before computing some of these statistics, so you might get somewhat different results in Excel, for example, though they should be close and representative.  I'm not sure about the reference to the "regression model".  As far as I know, the statistics for the regression model used for filling and/or smoothing appear only in Table 7.  What are in Table 9 are only for the validation nodes.  HOWEVER, if the regression model was used to fill any missing data, by the time SNTEMP computes the validation statistics, it "believes" these were true measurements and computes the statistics with the full set of "observations" regardless of whether they were really measured or not.  We are working on an update to SNTEMP that allows one to compute the statistics with or without missing values.  This same update will also compute the goodness-of-fit statistics with specified maximum temperatures.


Q187. How is the “mean error” calculated in Table 9 (KVRSTAT)? Is this the mean square error of the regression for each time period (in the sample data set this would mean three observations for each time period, two years and a normal data set)? It doesn’t look like it is the MSE, as some values in the sample data set are negative. Is there a location in either the Theurer manual or in OFR 99-112 that explains these statistics?

A187. See pages II-78 and II-79 in Information Paper 16, as well as the headings on the VSTATS output table.

[Updated 5/2007]


U.S. Department of the Interior | U.S. Geological Survey
URL: http://www.fort.usgs.gov/products/Publications/4037/faq_calibration.asp
Page Contact Information: AskFORT@usgs.gov
Page Last Modified: 12:26:48 PM