Sunday, February 18, 2007

Can't we all just propagate our errors?

This weekend I continued to work on a lab software upgrade for one of the labs I work in. This is all part of an effort on my part to streamline sample analysis, automate as much as possible, and make the data reduction smoother and more versatile. For those of you who care, I am doing everything with LabVIEW, my programming platform of choice. My first step in the process is to go through the existing sets of code, figure out what does what, and decide if I want to copy the process or change things around.

This got me thinking, again, about error propagation. Most of the significant errors in the existing code are dealt with appropriately, but there are some very easy ones that are just ignored. Truth is, most calculations are fairly easy to propagate errors through, and I am a firm believer that uncertainty must be dealt with honestly, even if in the end it is insignificant. The most popular geochronologic data reduction tool (I am guessing) is Ken Ludwig's Isoplot. This does an excellent job with uncertainties, assuming of course the users report them all. But, Isoplot doesn't interface with the machines, and it can only work with the data it is given.

For this reason I think that the book An Introduction to Error Analysis by John R. Taylor (a physicist from the University of Colorado) should be required for anyone who works in a lab or with the data from a lab. All of the basic methods to propagate uncertainties are covered and explained very well. This doesn't discuss some of the more complicated methods common in geochronology (e.g. the MSWD), but for a starter and reference text it is well worth it. It also isn't directed towards the earth sciences, but again, the basics are the basics. For example, to calculate the size and correct for the depletion of a spike or standard shot of gas, you need to know the reservoir tank and pipette volumes, and the pressure of the gas in the tank (partial pressure if your gas isn't pure). Associating uncertainties with each of these measurements is fairly straightforward, so any number that comes out of these values, say size of spike, or a correction for ionization efficiency, should include those propagated uncertainties. Numbers without uncertainties really don't exist in geology, but they show up in data reduction all the time.
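To make the spike example concrete, here is a rough sketch in Python rather than LabVIEW. It rests on assumptions of mine, not on any particular lab's code: the gas is ideal, the pipette is pumped out before each fill, so each shot lowers the tank pressure by the factor V_tank / (V_tank + V_pipette), and the input uncertainties are independent. The function names and all of the numbers are made up for illustration. The propagation itself is the standard first-order recipe: estimate each partial derivative, multiply by that input's uncertainty, and add the terms in quadrature.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def shot_moles(p0, v_tank, v_pip, temp, n):
    """Moles of gas in the n-th pipette shot (n = 1, 2, ...).

    Assumed model: ideal gas, pipette evacuated before each fill, so every
    fill lowers the tank pressure by the factor v_tank / (v_tank + v_pip).
    p0 is the initial (partial) pressure of the gas of interest in the tank.
    """
    depletion = (v_tank / (v_tank + v_pip)) ** n
    return p0 * depletion * v_pip / (R * temp)


def propagate(func, values, sigmas, rel_step=1e-6):
    """Return (f, sigma_f) by first-order propagation.

    Partial derivatives are estimated by central differences, and the terms
    (df/dx_i * sigma_i) are summed in quadrature, which assumes the input
    uncertainties are independent.
    """
    variance = 0.0
    for i, (x, sigma) in enumerate(zip(values, sigmas)):
        h = abs(x) * rel_step if x != 0 else rel_step
        hi = list(values)
        lo = list(values)
        hi[i] = x + h
        lo[i] = x - h
        dfdx = (func(*hi) - func(*lo)) / (2.0 * h)
        variance += (dfdx * sigma) ** 2
    return func(*values), math.sqrt(variance)


# Illustrative numbers only: pressure in Pa, volumes in m^3, temperature in K.
values = (100.0, 2.0e-3, 1.0e-7, 295.0)
sigmas = (0.5, 1.0e-5, 5.0e-10, 0.5)
moles, sigma_moles = propagate(
    lambda p0, vt, vp, t: shot_moles(p0, vt, vp, t, n=50), values, sigmas
)
print(f"shot 50: {moles:.4e} +/- {sigma_moles:.1e} mol")
```

The point is less the particular model than the habit: the spike size comes out with an uncertainty attached, and that uncertainty can be carried into whatever is calculated next.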

Dealing with errors in a realistic and representative way is the first step. Getting people to report them, even when they are crappy, is the second.

6 comments:

Amali said...

Plus, the Taylor book has an awesome cover picture. Enough to frighten anyone into checking their error bars. :)

Lab Lemming said...

What about the zeroth step: Teaching people what their errors are? And the negative first step: that they do in fact exist?

Even then, statistics can be murky. Ken's stuff calculates error very differently to the original SHRIMP methods. For well behaved samples, they give similar values. For interesting samples, they sometimes do and sometimes don't.

You want bodgy error propagation, you should learn to use an ICP...

Thermochronic said...

Good points all around (book cover through the negative step). I never worked with the original SHRIMP methods. I've used the SQUID package to reduce the SHRIMP data. I use Isoplot for presenting Ar/Ar data, and in general like the format. Based on your post here I'll defer to you when it comes to Big SHRIMP'in.

When my advisor recommended the book I figured it was because the cover reminded him of me somehow.

So the ICP folks are bad; I wonder who the worst are with uncertainties. I have a candidate but I should write that up into a formal post first.

thm said...

Ah, LabVIEW, putting the spaghetti in spaghetti code.

I learned about error propagation as an undergrad, how you can take partial derivatives and square and then sum everything. But then I learned about the Monte Carlo method, which with a spreadsheet is almost trivial. See John Denker's post for more details, and a very good summary of error analysis.

The basic idea is: all of the measurements you plug into a formula are really distributions, with the width of the distribution as the uncertainty. So use a spreadsheet to generate a few hundred possible cases, with values drawn from each of those distributions, then send each individual case through your formula, and then look at the distribution of the results.
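The same recipe works outside a spreadsheet. Here is a minimal Python sketch, assuming every input is an independent Gaussian whose width is its one-sigma uncertainty. The function name `monte_carlo` and the example numbers are made up for illustration.

```python
import random
import statistics


def monte_carlo(func, means, sigmas, n_trials=500, seed=None):
    """Return (mean, standard deviation) of func over randomly drawn inputs.

    Assumption: each input is an independent Gaussian with the given mean
    and one-sigma width. Swap in other distributions as needed.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        # One "possible case": draw every input from its own distribution.
        draw = [rng.gauss(mu, sigma) for mu, sigma in zip(means, sigmas)]
        results.append(func(*draw))
    return statistics.mean(results), statistics.stdev(results)


# Made-up example: a ratio of two measured quantities.
mean_ratio, sigma_ratio = monte_carlo(
    lambda a, b: a / b, means=(10.0, 4.0), sigmas=(0.2, 0.1),
    n_trials=500, seed=1,
)
print(f"ratio = {mean_ratio:.3f} +/- {sigma_ratio:.3f}")
```

The spread of the results stands in for the propagated uncertainty, and no derivatives are needed. A few hundred trials gives only a rough width, so raise `n_trials` if the second digit of the uncertainty matters.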

Of course one thing that complicates life is that many treatments of error analysis focus on statistics, which is limited to dealing with statistical errors. In a physics lab, at least, the systematic errors can be more important than the statistical errors. Taking lots of measurements with a voltmeter, for example, won't help anything if the voltmeter isn't calibrated properly.
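A toy simulation shows the voltmeter problem. It assumes each reading is the true value plus a fixed calibration offset plus Gaussian noise. The function name and all of the numbers are invented for illustration.

```python
import random
import statistics


def averaged_reading(true_value, offset, noise_sigma, n_readings, seed=None):
    """Return (mean, standard error) of simulated readings from a miscalibrated meter.

    Assumed model: reading = true value + fixed offset + Gaussian noise.
    """
    rng = random.Random(seed)
    readings = [
        true_value + offset + rng.gauss(0.0, noise_sigma)
        for _ in range(n_readings)
    ]
    mean = statistics.mean(readings)
    std_err = statistics.stdev(readings) / n_readings ** 0.5
    return mean, std_err


# More readings shrink the standard error, but the mean stays offset.
for n in (10, 1000, 100000):
    mean, std_err = averaged_reading(
        5.000, offset=0.050, noise_sigma=0.020, n_readings=n, seed=0
    )
    print(f"n={n:6d}: mean = {mean:.4f} +/- {std_err:.4f} V (true value 5.0000 V)")
```

The statistical error bar keeps shrinking as readings pile up, while the mean settles on the true value plus the offset, which is exactly the error that no amount of averaging reveals.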

Thermochronic said...

What is a little shocking to me now is that in most earth science programs (at least the ones I've been involved with), statistics isn't required. Everything I use I picked up because of my PhD adviser (he will tell you ahead of time that he will ask about your stats in your quals and defense). The degree of misunderstanding can be huge; I still meet people who confuse precision with accuracy. Garbage in, garbage out.

THM, thanks for the link. I am going to make a sidebar with reference links at some point, and this will definitely be there.

Thermochronic said...

Also, amali, here is a better copy of the image on the front of the book. It happened in 1895 at the Gare Montparnasse train station (Paris), when an express train busted through a barrier, flew 100 feet across the concourse, through a 2-foot-thick wall, and out of the station.