Two Plus Two Newer Archives  

  #1  
01-17-2007, 02:22 PM
JMAnon
Senior Member
Join Date: Feb 2005
Posts: 737
some statistics questions

I have a few questions about regression equations as used to demonstrate correlations among multiple variables. I am not a statistician or an economist, and I have no formal education in statistics. I have picked up quite a bit from my casual reading, but I am dealing with a fairly complex report and would like some help with it.

By way of background, the author of the report used a two-stage least squares multi-variable regression (using STATA, I believe) purportedly to demonstrate a correlation between a school district's spending and the educational outcomes of the district's students. The author uses race (% black), poverty (% free and reduced lunch eligible), district enrollment size, a constructed "performance index," and various "efficiency controls" as independent regressors.

As far as I understand, the "two-stage" part of the analysis means he didn't use actual values of the independent variables as regressors, but instead used fitted values from a preliminary regression. The purpose of this is to avoid the problems that endogenous variables would cause. He didn't report the results of the preliminary regression that he used to construct those regressors.
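
To make the two-stage idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data are simulated, there is one endogenous regressor x, one instrument z, and a hidden confounder u, and the true effect of x on y is set to 2. It is not the report's model, which is not available in this thread.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of a one-regressor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def two_stage(z, x, y):
    """Stage 1: regress x on z. Stage 2: regress y on the fitted x."""
    a0, a1 = ols(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    return ols(x_hat, y)

def simulate(n=5000, seed=0):
    """Simulated data (hypothetical): u drives both x and y, z drives only x."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        u = rng.gauss(0, 1)                        # unobserved confounder
        xi = zi + u + rng.gauss(0, 1)              # x is endogenous via u
        yi = 2.0 * xi + 3.0 * u + rng.gauss(0, 1)  # true slope on x is 2
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y

z, x, y = simulate()
naive_slope = ols(x, y)[1]          # biased upward by the confounder
tsls_slope = two_stage(z, x, y)[1]  # uses only the variation in x that comes from z
print("naive OLS slope:", round(naive_slope, 3))
print("two-stage slope:", round(tsls_slope, 3))
```

In this setup the naive regression overstates the effect because u moves x and y together, while the two-stage estimate lands near the true value. The price is precision: the weaker the first-stage relationship between z and x, the noisier the second-stage estimate becomes, which is why the unreported preliminary regression matters.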

First, this entire approach seems problematic. If the preliminary regressions have low R2s, using fitted values from those equations in a new regression would seem to compound the uncertainty of the ultimate conclusions. Is this method widely accepted as valid?


My next question relates to R2s. Is there a certain R2 value below which statisticians or economists would consider a result to be meaningless? In other words, if he reports an R2 of, say, 11%, is that small enough to say that his results leave too much unexplained to be meaningful? If I plot the results using a scatterplot and then draw a line using fitted values from his regression, it looks like an arrow through a cloud. Intuitively, this can't be a meaningful way of predicting outcomes based on spending.

Next, is there any significance to using logs rather than the actual values when deriving the R2? If so, what is it?

Third, assuming I can get his preliminary regressions, how would one determine whether he was successful in eliminating problems stemming from endogeneity of the variables he purports to construct? My main worry here is that the "performance index" is being used to raise artificially the R2 and statistical significance of his results. Intuitively, it smacks of having the same variable on both sides of the equation.

Finally, is there a "bible" of regression analysis that is easy to understand for a layman that any of you would recommend?


Thanks in advance.
  #2  
01-17-2007, 07:18 PM
Siegmund
Senior Member
Join Date: Feb 2005
Posts: 1,850
Re: some statistics questions

[ QUOTE ]
By way of background, the author of the report used a two-stage least squares multi-variable regression

[/ QUOTE ]

Without seeing the actual report, I can't say, but the more complicated your methods are, the more likely you are to be doing voodoo rather than math.


[ QUOTE ]
Is there a certain R2 value below which statisticians or economists would consider a result to be meaningless?

[/ QUOTE ]

Yes, but it depends on sample size. Roughly speaking, r^2*n needs to be large. More formally, if no correlation exists, r*sqrt((n-2)/(1-r^2)) is approximately t-distributed with n-2 degrees of freedom.
Whether r being significantly different from 0 is all you require to consider the correlation "meaningful" depends on your application. r^2 tells you the fraction of the variation in the data explained by the regression: your "arrow through a cloud" may mean that you have, say, 10% trend and 90% scatter.
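
A small sketch of that statistic, assuming a simple one-regressor correlation. The function name corr_t_stat and the sample sizes are made up for illustration, and the 11% figure is borrowed from the question; the report's actual n is not given in this thread.

```python
import math

def corr_t_stat(r, n):
    """t statistic for testing whether a sample correlation r differs from 0."""
    if n <= 2 or not -1.0 < r < 1.0:
        raise ValueError("need n > 2 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))

r = math.sqrt(0.11)  # correlation implied by an r^2 of 11%
for n in (10, 30, 300):  # hypothetical sample sizes
    print(n, round(corr_t_stat(r, n), 2))
```

With r^2 held fixed, the statistic grows roughly like sqrt(n), so the same 11% can be indistinguishable from noise in a small sample and clearly nonzero in a large one. Being clearly nonzero still says nothing about how tight the cloud is around the line.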

[ QUOTE ]
Next, is there any significance to using logs rather than the actual values when deriving the R2?

[/ QUOTE ]

Regressions of Y on X fit straight lines, and treat every observation as equally important. Regressions of log Y on X fit exponential curves, and regressions of log Y on log X fit power-law curves (Y = a*X^b); both treat large values of Y as having larger uncertainties. These methods and others like them are collectively "quasi-linear regressions."
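
To see what the log transforms do, here is a minimal sketch on made-up, noise-free data. The multiplier 3, exponent 2, and growth rate 0.5 are arbitrary choices for illustration, and fit_line is a hypothetical helper, not a library function.

```python
import math

def fit_line(x, y):
    """Return (intercept, slope) of a least-squares straight line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# log Y on log X: a straight line here means Y = a * X^b
power_ys = [3.0 * v ** 2 for v in xs]
b0, b1 = fit_line([math.log(v) for v in xs], [math.log(v) for v in power_ys])
print("multiplier:", round(math.exp(b0), 6), "exponent:", round(b1, 6))

# log Y on X: a straight line here means Y = a * exp(c * X)
exp_ys = [3.0 * math.exp(0.5 * v) for v in xs]
c0, c1 = fit_line(xs, [math.log(v) for v in exp_ys])
print("multiplier:", round(math.exp(c0), 6), "rate:", round(c1, 6))
```

Note that an R2 computed on the log scale measures fit to log Y, not to Y itself, so it is not directly comparable to an R2 from a regression on the raw values.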
  #3  
01-18-2007, 01:38 PM
JMAnon
Senior Member
Join Date: Feb 2005
Posts: 737
Re: some statistics questions

Thanks so much for the help.
[ QUOTE ]


[ QUOTE ]
Is there a certain R2 value below which statisticians or economists would consider a result to be meaningless?

[/ QUOTE ]

Yes, but it depends on sample size. Roughly speaking, r^2*n needs to be large. More formally, if no correlation exists, r*sqrt((n-2)/(1-r^2)) is approximately t-distributed.
It depends on your application whether r significantly different from 0 is all you will require to consider the correlation "meaningful." r^2 tells you the percentage of variation in the data explained by the regression: your "arrow through a cloud" may mean that you have, say, 10% trend and 90% scatter.


[/ QUOTE ]

Good point. In this case, he is using the fitted values to claim that particular school districts are "underfunded" relative to what they need to achieve specified outcome standards. The R2 of the equation, which treats expenditures per student as the dependent variable, is 11%. His correlation is statistically significant given the sample size, but it just doesn't seem to explain very much. I take his R2 to mean that, granting him the correctness of everything else, his regressors together explain 11% of the variation in per-student expenditures among districts, leaving 89% unexplained. Extrapolating from that to say that a particular district "needs" (as a factual matter) to spend at least the fitted value of his equation to achieve the specified outcome seems like an unwarranted conclusion.

If there is a good book on these techniques that you could recommend, I would really appreciate it.

Thanks again,
JM