Statistics Help

Cunjo Carl · June 18, 2019

So an experiment at work went pear shaped today when we discovered the calibration data was completely out of whack. There's a couple months of experiment data on the line, and I think I can save it by calibrating our data against itself. It would involve multivariate linear regression though, and I have a question about that. My data is a bunch of y values that vary with x₁ and x₂. If we put it together, and draw a best fit line for y (called y_hat) it will take the nice happy linear form of:

y_hat = B₀ + B₁*x₁ + B₂*x₂

And we can calculate the standard error in our fit line y_hat by:

SE_yhat = sqrt( sum( (y_i - y_hat)² ) )

But what's the standard error in the slope B₁? If this were single variate, it would be:

SE_B1 = (SE_yhat/SE_x1)/sqrt(n-2)

Does the other dependent variable x₂ change this? Physically I know the degrees of freedom term (n-2) should become (n-3), but are there any other changes? Thanks in advance!

Edited June 18, 2019 by Cunjo Carl

mikegarrison · June 18, 2019

Dunno.

But I suppose you could treat it as two separate problems. Assume there is zero error in x2 and just treat is as a single variable problem. The do the same for the other variable. Then treat it as any other case where you are combining the effects of two sources of error. It seems like that should work, as long as any error in x1 and x2 are independent of each other.

But my real answer is "hire a statistician". I did that once, and it was really useful. (It happens I work for a very large Fortune 500 company, and they have statisticians in the company available to help with problems like this.)

Cunjo Carl · June 19, 2019

53 minutes ago, mikegarrison said:

Dunno.

But I suppose you could treat it as two separate problems. Assume there is zero error in x2 and just treat is as a single variable problem. The do the same for the other variable. Then treat it as any other case where you are combining the effects of two sources of error. It seems like that should work, as long as any error in x1 and x2 are independent of each other.

But my real answer is "hire a statistician". I did that once, and it was really useful. (It happens I work for a very large Fortune 500 company, and they have statisticians in the company available to help with problems like this.)

Yeah, if the calibration files weren't hosed I could use them to treat this as two separate problems rather than one combined one. Unfortunately, that was the thing that was lost! I feel like the formula will wind up being more complicated because physically I feel like not all the error in Y should lie with either B1 or B2 individually, and maybe their errors should be added in quadrature to get the standard error in Y or something like that.

Ah, man, I wish I had that kind of budget! I work for one of those small-miracles-on-shoestrings sorts of labs. If no one on the forums happens to know, I'll just have to grit my teeth and dive head first into a stats textbook! :wacko:

mikegarrison · June 19, 2019

13 minutes ago, Cunjo Carl said:

Ah, man, I wish I had that kind of budget! I work for one of those small-miracles-on-shoestrings sorts of labs. If no one on the forums happens to know, I'll just have to grit my teeth and dive head first into a stats textbook!

How much would it cost to redo all the experiments? Or to get the wrong answer? I'm just sayin'....

Shpaget · June 19, 2019

What's the nature of the out of wackness of the calibration? Is there a sensor that is out of cal? If so, can you document the offset and adjust the data according to it?

Cunjo Carl · June 21, 2019

On 6/18/2019 at 11:58 PM, Shpaget said:

What's the nature of the out of wackness of the calibration? Is there a sensor that is out of cal? If so, can you document the offset and adjust the data according to it?

Thanks for asking! I wound up doing exactly as you suggested and now the question is if we need to incorporate anything else.

We're measuring the thermal conductivities of a few different structural materials (like metals and thermal ceramics). We're using the unusual technique of just flowing heat through the materials and measuring the temperature difference created across them! It's less accurate and convenient than other techniques, but has several advantages in the end and provides a nice confirmation point. To make thermal contact to the materials we use liquid metals or thermal paste (like in computers). I did an initial calibration run to measure the thermal resistance of our thermal paste, but I only did two runs! The values came out very close, but 2 points make for terrible statistics. I asked the students to do a battery of calibration runs on the paste so we could make statistics, but they must have forgotten so we had no idea what variability there is in our thermal paste's thermal contact resistance. Unfortunately we can't do the paste calibration runs now because we moved the tool and it might be sensitive to its surroundings.

We now have quite a few experiments with one of the test structural materials using different numbers of material layers and thicknesses (along with different numbers of paste layers). So the question is if we can do multivariable regression to separate the effects of material thickness and paste layers. I went ahead and did the multivariable regression by hand, and it looks great! The residuals make a beautiful Gaussian curve and the p values all show signifigance, so I'm confident that it's a good application of the technique. There is an interesting wrinkle that runs with more layers of thermal paste tend to also have more total structural material thickness.

I've calculated standard error in the structural material's conductivity as though the paste resistance were a known offset at the value we calculated from the regression, just as you suggested! It looks good on paper and it's as far as I've gotten. The question is then if I can get away with doing just this? Most experimentalists probably would, but I have a few weeks to play with so I think I'll plink at it a bit.

Shpaget · June 21, 2019

16 minutes ago, Cunjo Carl said:

The question is then if I can get away with doing just this?

Adam Savage says that the difference between screwing around and science is taking notes. So, document the unconventional method and you're good to go! In my book anyway.

wizzlebippi · June 21, 2019

Can you calibrate the instrument now to determine the error, and apply the correction to the data analytically?

Starstruck69 · June 23, 2019

If it is consistantly out of calibration (thinking linear) then yes you may be able to salvage the work. It depends on instrumentation really. A statistician will help with the maths but how the instrument operates is something they won't know about and is crucial. You can't 'polish a turd 'so you may be better biting the bullet and starting again.

If you find a consistant pattern of error then your in luck. In my experience it doesn't work like that and your data will always have a ? next to it.

Suck it up and start again..

Edit:

Just a thought, if you contact the manufacturer of the instrument and have a chat with their technical engineers they may throw you a life line..

Be honest with them and you will get the answer you need. Their is no shame in admitting your mistake and holding your hands up. As hard as this to do sometimes its the best option. You don't want to be that guy who fudges data...

I have been on both sides of this and honesty is the best policy here.

Edited June 23, 2019 by Starstruck69

Cunjo Carl · June 30, 2019

On 6/23/2019 at 3:57 AM, Starstruck69 said:

If it is consistantly out of calibration (thinking linear) then yes you may be able to salvage the work. It depends on instrumentation really. A statistician will help with the maths but how the instrument operates is something they won't know about and is crucial. You can't 'polish a turd 'so you may be better biting the bullet and starting again.

If you find a consistant pattern of error then your in luck. In my experience it doesn't work like that and your data will always have a ? next to it.

Suck it up and start again..

Edit:

Just a thought, if you contact the manufacturer of the instrument and have a chat with their technical engineers they may throw you a life line..

Be honest with them and you will get the answer you need. Their is no shame in admitting your mistake and holding your hands up. As hard as this to do sometimes its the best option. You don't want to be that guy who fudges data...

I have been on both sides of this and honesty is the best policy here.

Thanks for the ideas! The instrument was a one-off made by my predecessor, with retrofits and modifications by myself for the current project. So, as the technical engineer on staff the buck falls right back to me! ^_^ It always has, somehow. Also, I agree about being honest. I've always been a straight shooter about this sort of thing, though it did make me very unpopular with the higherups in my last job. <_<

Anyways, I spent a while in the books, found a good set of techniques for double-checking the analysis, and helped the student write their paper in time for the deadline! They don't all turn out this well, but I'll happily take it when they do.

A little more in detail, the calibration we were missing was two pieces, a constant and a linear offset, so I was able to pull it from the rest of the data using multiple linear regression. I knew there was a subtle weakness in the way I was applying the linear regression though, and it took me a while to find the name of it: "multicollinearity". Once I had this in hand, I was able to find a way to do the analysis without falling victim to it, and double check with statistical rigor that it was being done correctly. The analysis section of the paper is now twice as long as we were first intending, but the results are fortunately very nearly as high quality as if we had calibration data in hand. Again, thanks for the help and advice everyone- it was really good to have some fallback plans in the back pocket.

Starstruck69 · July 2, 2019

I like a happy ending. Good job Carl i'm glad it all worked out for you:)

Statistics Help

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation