#1
Question from my statistics midterm I got wrong
This is the only question on my midterm I got wrong, but I've reviewed it closely and don't understand why. Hopefully someone smarter than me can explain it.
18) A study gathers data on the outside temperature during the winter, in degrees Fahrenheit, and the amount of natural gas a household consumes, in cubic feet per day. Call the temperature x and gas consumption y. The house is heated with gas, so x helps explain y. The least-squares regression line for predicting y from x is y = 1344 - 19x. The correlation between temperature x and gas usage y is r = -0.7. Which of the following would not change r?
A) Measuring temperature in degrees Celsius instead of degrees Fahrenheit.
B) Removing two outliers from the data used to calculate r.
C) Measuring gas usage in hundreds of cubic feet, so that all values of y are divided by 100.
D) Both A and C.
Any help would be appreciated.
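As a worked aside, here is what options A and C do to the fitted line itself, assuming the standard conversion x_F = (9/5) x_C + 32. The slope and intercept change with the units, but since the least-squares slope is b = r s_y / s_x, the rescaling of s_x or s_y is absorbed entirely by b and r is left alone:

```latex
\hat{y} = 1344 - 19\,x_F, \qquad x_F = \tfrac{9}{5}\,x_C + 32

\text{(A)}\quad \hat{y} = 1344 - 19\left(\tfrac{9}{5}\,x_C + 32\right) = 736 - 34.2\,x_C

\text{(C)}\quad \frac{\hat{y}}{100} = 13.44 - 0.19\,x_F

b = r\,\frac{s_y}{s_x}:\quad
\text{(A)}\; s_x \to \tfrac{5}{9}\,s_x \;\Rightarrow\; b \to \tfrac{9}{5}(-19) = -34.2,
\qquad
\text{(C)}\; s_y \to \tfrac{s_y}{100} \;\Rightarrow\; b \to -0.19,
\qquad r = -0.7 \text{ in both cases}
```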
#2
Re: Question from my statistics midterm I got wrong
I haven't studied this recently, so I could be way off, but wouldn't it have to be D? Intuitively, a measure of how closely two sets of data are correlated shouldn't depend on the units you use to measure them.
The answer can't be B. If you remove outliers, you're changing the data, which could very well change the correlation. The answer might be C, if the fact that Fahrenheit and Celsius aren't absolute temperature scales makes any difference.
To make a poker analogy, suppose you gather data on your tournament buy-ins and how long you last before you bust out, and the correlation is, say, -0.1. Measuring your buy-ins in British pounds shouldn't affect the correlation, and neither should measuring how long you lasted in picoseconds. However, throwing out that one time your aces got cracked on the first hand by a donkey holding AT would affect the correlation.
It might help if you posted an equation from your textbook showing how to calculate r from two sets of data. You could then try, for example, multiplying all the numbers in one data set by a constant and checking whether this affects the correlation. If the equation is based on random variables, you would need to think about how using a different unit would affect the expected value and other relevant properties of the random variable, and therefore the calculation of r.
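Following that suggestion, here is a minimal Python sketch of the usual sample (Pearson) correlation formula applied to the poker analogy. The helper `pearson_r` is defined here, not taken from a library, and the buy-ins, durations, and the 0.8 pounds-per-dollar rate are made-up illustrative numbers, not real data; the last entry plays the role of the one-off early bust.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation: cross-deviation sum over the root of the squared-deviation sums."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: buy-ins in dollars and minutes survived before busting.
buyins_usd = [10, 20, 30, 40, 50]
minutes = [100, 90, 80, 70, 1]  # last entry: aces cracked on the first hand

r_all = pearson_r(buyins_usd, minutes)

# Change units only: hypothetical exchange rate to pounds, and minutes -> seconds.
buyins_gbp = [0.8 * b for b in buyins_usd]
seconds = [60 * m for m in minutes]
r_units = pearson_r(buyins_gbp, seconds)

# Throw out the outlier instead: this changes the data, not just the units.
r_trimmed = pearson_r(buyins_usd[:-1], minutes[:-1])

print(f"all data:     r = {r_all:.4f}")
print(f"new units:    r = {r_units:.4f}")
print(f"outlier gone: r = {r_trimmed:.4f}")
```

With these made-up numbers the unit change leaves r untouched, while dropping the single early bust moves r to exactly -1, because the remaining four points happen to lie on a line.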
#3
Re: Question from my statistics midterm I got wrong
Shouldn't it be D? Both A and C merely change the units, so they may change your regression equation (because of the unit scaling), but they shouldn't change the underlying relationship between x and y. Obviously, throwing out two outliers is very likely to change r, since you are changing your data, and the newly calculated r is unlikely to be the same.
What's the trick? Maybe the conversion from Fahrenheit to Celsius, because it involves the -32 shift as well as the 5/9 scaling, is a transformation that would change r, making the right answer C. But I would have thought that what you are measuring is the correlation between two specific things: the temperature, which is a real-world quantity regardless of units, and the amount of heating, which again is a real-world amount. So I would have thought that this kind of transformation doesn't change r. You can always make up a data set, do the Celsius/Fahrenheit conversion, and calculate r yourself to see.
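Taking up that suggestion, a minimal Python sketch: make up a small temperature/gas data set (the numbers below are invented for illustration, not the study's data), convert the temperatures with C = (F - 32) * 5/9, and compare r before and after. `pearson_r` and `fahrenheit_to_celsius` are small helpers defined here for the example.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fahrenheit_to_celsius(f):
    # Subtract 32, then scale by 5/9: a shift plus a positive rescaling.
    return (f - 32) * 5 / 9

# Made-up winter readings: outside temperature (deg F) and gas use (cubic feet/day).
temps_f = [10, 18, 25, 31, 40, 45]
gas_cuft = [1200, 980, 900, 700, 610, 520]

temps_c = [fahrenheit_to_celsius(t) for t in temps_f]

r_f = pearson_r(temps_f, gas_cuft)
r_c = pearson_r(temps_c, gas_cuft)
print(f"r with Fahrenheit: {r_f:.6f}")
print(f"r with Celsius:    {r_c:.6f}")
```

The two printed values agree up to floating-point noise, even though the conversion includes the -32 shift.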
#4
Re: Question from my statistics midterm I got wrong
It is D. Correlation coefficients are invariant under linear transformations of the data.
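The one-line reason, written out with the usual sample formula for r. This is a sketch: strictly speaking the invariance holds for affine changes of units x -> ax + b and y -> cy + d with a and c positive, and a negative scale factor flips the sign of r.

```latex
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_i (x_i - \bar{x})^2 \,\sum_i (y_i - \bar{y})^2}}

x_i' = a x_i + b,\quad y_i' = c y_i + d
\;\Rightarrow\;
x_i' - \bar{x}' = a (x_i - \bar{x}),\quad y_i' - \bar{y}' = c (y_i - \bar{y})

r' = \frac{ac \sum_i (x_i - \bar{x})(y_i - \bar{y})}
          {|ac| \sqrt{\sum_i (x_i - \bar{x})^2 \,\sum_i (y_i - \bar{y})^2}}
   = \operatorname{sign}(ac)\, r

\text{(A)}\; a = \tfrac{5}{9},\ b = -\tfrac{160}{9},\ c = 1,\ d = 0;
\qquad
\text{(C)}\; a = 1,\ b = 0,\ c = \tfrac{1}{100},\ d = 0
\;\Rightarrow\; r' = r = -0.7
```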
#5
Re: Question from my statistics midterm I got wrong
[ QUOTE ]
It is D. Correlation coefficients are invariant under linear transformations of the data.
[/ QUOTE ]
I guess the trick is: does changing from Fahrenheit to Celsius represent a linear transformation? The function that does the conversion isn't linear. I.e., if f(x) turns Fahrenheit into Celsius, so that f(32) = 0 and f(212) = 100, then f(x + y) isn't f(x) + f(y), since f(32 + 212) = f(244) = 117 7/9 while f(212) + f(32) = 100 + 0 = 100. In addition, af(x) isn't f(ax), since f(10 * 32) = f(320) = 160 while 10 * f(32) = 10 * 0 = 0.
But I don't think that is what you meant, as I've verified with numbers that r stays the same for various data sets when you convert between Celsius and Fahrenheit, so D is correct.
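The distinction here is between maps that are linear in the algebra sense (f(x + y) = f(x) + f(y) and f(ax) = a f(x)) and affine maps (a rescaling plus a shift). A short Python sketch reproducing the checks above, and showing that what matters for r is only that the conversion is affine with a positive scale factor; negating x flips the sign of r. The helpers `f_to_c` and `pearson_r` are defined here, and the temperature/gas values are invented for illustration.

```python
import math

def f_to_c(x):
    # Fahrenheit -> Celsius: affine (scale by 5/9, shift by -160/9), not linear.
    return (x - 32) * 5 / 9

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# The additivity and homogeneity checks from the post both fail.
additive = math.isclose(f_to_c(32 + 212), f_to_c(32) + f_to_c(212))
homogeneous = math.isclose(f_to_c(10 * 32), 10 * f_to_c(32))
print(f"f(244) = {f_to_c(244):.4f}, f(32) + f(212) = {f_to_c(32) + f_to_c(212):.4f}")
print(f"f(320) = {f_to_c(320):.4f}, 10 * f(32) = {10 * f_to_c(32):.4f}")

# Hypothetical data: r survives the affine map but flips sign under x -> -x.
temps_f = [12, 20, 27, 33, 41]
gas = [1150, 1010, 870, 760, 580]
r_f = pearson_r(temps_f, gas)
r_c = pearson_r([f_to_c(t) for t in temps_f], gas)
r_neg = pearson_r([-t for t in temps_f], gas)
print(f"r(F) = {r_f:.6f}, r(C) = {r_c:.6f}, r(-F) = {r_neg:.6f}")
```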
#6
Re: Question from my statistics midterm I got wrong
Thanks for the responses, guys. On the actual test I put D. According to my professor, the answer is A. I emailed him for an explanation, and here's what I got:
"C is incorrect because measuring gas usage in hundreds of cubic feet implies that the y values will now be averages, averaging removes some of the variability leading to inflated correlations."
WTF? How does changing the units "imply" that the y values will be averages? I probably won't fight this, because the prof seems like the type of guy to be hard-headed over something like this, and it's not worth that much anyway. I just wanted to confirm my answer was correct for my own peace of mind.
#7
Re: Question from my statistics midterm I got wrong
[ QUOTE ]
C) Measuring gas usage in hundreds of cubic feet, so that all values of y are divided by 100.
[/ QUOTE ]
[ QUOTE ]
"C is incorrect because measuring gas usage in hundreds of cubic feet implies that the y values will now be averages."
[/ QUOTE ]
Email him again and ask him how "all values of y are divided by 100" implies that "the y values will now be averages". I'll bet he thinks he asked something else. Are you sure there wasn't more to the problem?
PairTheBoard
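To make that distinction concrete: the effect the professor describes is real, but it belongs to a different operation, namely replacing individual y values by group averages. A minimal Python sketch with a tiny invented data set (two days at each of three temperatures) contrasts that with option C's plain division by 100. `pearson_r` and `group_means` are helpers defined here for illustration.

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def group_means(xs, ys):
    """Replace each group of y values sharing an x value by their average."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    keys = sorted(groups)
    return keys, [sum(groups[k]) / len(groups[k]) for k in keys]

# Invented data: two days at each temperature (deg F), gas use in cubic feet/day.
temps = [20, 20, 30, 30, 40, 40]
gas = [1000, 800, 900, 700, 800, 600]

r_raw = pearson_r(temps, gas)

# Option C: every y divided by 100 (hundreds of cubic feet). No averaging happens.
r_hundreds = pearson_r(temps, [g / 100 for g in gas])

# What the professor's explanation actually describes: correlating group averages.
avg_temps, avg_gas = group_means(temps, gas)
r_avg = pearson_r(avg_temps, avg_gas)

print(f"individual days:          r = {r_raw:.4f}")
print(f"gas in hundreds of cu ft: r = {r_hundreds:.4f}")
print(f"daily averages:           r = {r_avg:.4f}")
```

Here dividing by 100 leaves r at about -0.63, while averaging the two days at each temperature pushes it to -1. That is the "inflated correlation" effect, just not something option C does.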
#8
Re: Question from my statistics midterm I got wrong
Perhaps your prof means that 427 cu. ft. gets recorded as 4 hundred cu. ft., i.e., the values are rounded to whole hundreds. This is of course not what he said.
Enjoy your peace of mind.
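If that rounding reading is what was intended (427 -> 4), it is no longer a pure change of units, and r can move. A minimal Python sketch with invented numbers, comparing exact division by 100 with rounding to the nearest hundred; `pearson_r` is a helper defined here.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data: outside temperature (deg F) and gas use (cubic feet/day).
temps_f = [10, 20, 30, 40]
gas_cuft = [1160, 960, 790, 590]

exact_hundreds = [g / 100 for g in gas_cuft]           # pure unit change
rounded_hundreds = [round(g / 100) for g in gas_cuft]  # whole hundreds only

r_raw = pearson_r(temps_f, gas_cuft)
r_exact = pearson_r(temps_f, exact_hundreds)
r_rounded = pearson_r(temps_f, rounded_hundreds)
print(f"cubic feet:        r = {r_raw:.6f}")
print(f"exact hundreds:    r = {r_exact:.6f}")
print(f"rounded hundreds:  r = {r_rounded:.6f}")
```

With these numbers, exact division leaves r unchanged (about -0.999), while rounding happens to land the four values exactly on a line, so r becomes -1. Rounding can just as easily pull r toward zero; the point is only that it is not a pure unit change.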
#9
Re: Question from my statistics midterm I got wrong
[ QUOTE ]
"C is incorrect because measuring gas usage in hundreds of cubic feet implies that the y values will now be averages, averaging removes some of the variability leading to inflated correlations."
[/ QUOTE ]
That's nonsense. If you're going to ace the class anyway, you are wise to choose to forget about it. But yes, you were right. The linear transformation in "C" was just a special case of the affine transformation in "A", especially if the exam explicitly said "y values divided by 100".
Is this a professor, or a grad student teaching this course?
#10
Re: Question from my statistics midterm I got wrong
[ QUOTE ]
[ QUOTE ]
"C is incorrect because measuring gas usage in hundreds of cubic feet implies that the y values will now be averages, averaging removes some of the variability leading to inflated correlations."
[/ QUOTE ]
That's nonsense. If you're going to ace the class anyway, you are wise to choose to forget about it. But yes, you were right. The linear transformation in "C" was just a special case of the affine transformation in "A", especially if the exam explicitly said "y values divided by 100". Is this a professor, or a grad student teaching this course?
[/ QUOTE ]
This is pretty sneaky. The funny thing is, if you're rounding temperatures to the nearest degree, then A is also wrong, and for exactly the same reason.
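A quick Python check of that point, under invented assumptions: take four temperatures whose gas readings sit exactly on the question's line y = 1344 - 19x (so r = -1 in Fahrenheit), convert to Celsius, and compare exact conversion with rounding to the nearest degree. The readings and the helpers `pearson_r` and `f_to_c` are hypothetical, defined here for the example.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def f_to_c(f):
    # Exact Fahrenheit -> Celsius conversion.
    return (f - 32) * 5 / 9

temps_f = [20, 25, 31, 38]
# Hypothetical gas readings placed exactly on the fitted line y = 1344 - 19x.
gas = [1344 - 19 * t for t in temps_f]

temps_c_exact = [f_to_c(t) for t in temps_f]
temps_c_rounded = [round(c) for c in temps_c_exact]  # nearest whole degree

r_f = pearson_r(temps_f, gas)
r_c_exact = pearson_r(temps_c_exact, gas)
r_c_rounded = pearson_r(temps_c_rounded, gas)
print(f"Fahrenheit:        r = {r_f:.6f}")
print(f"Celsius, exact:    r = {r_c_exact:.6f}")
print(f"Celsius, rounded:  r = {r_c_rounded:.6f}")
```

Exact conversion keeps r at -1, but the rounded Celsius values (-7, -4, -1, 3) are no longer an exact affine function of the Fahrenheit readings, so |r| drops slightly below 1.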