#1
Request for help from a stats guru about multicollinearity
If I have a multiple linear regression model where one independent variable is an exact function of two other independent variables (for example, their difference), will this create the problem of multicollinearity in my model?
As a hypothetical example, suppose I am trying to predict the income of two-person partnerships and have a theory that it can be predicted in part by 3 variables: (1) the age of the older partner; (2) the age of the younger partner; and (3) the difference in age between the two partners. If I include all 3 of these variables in my model, will it suffer from multicollinearity or any other problems? Any help is greatly appreciated.
#2
Re: Request for help from a stats guru about multicollinearity
Ok, using your hypothetical example, let X and Y be the ages of the younger and older partner respectively. We need to estimate parameters a, b and c where
Income = aX + bY + c(X-Y)
I don't think this is a multicollinearity problem, but you'll just have a completely redundant variable in your multiple regression model. If you rearrange the terms above, you simply get
Income = (a+c)X + (b-c)Y = AX + BY
So a model with 3 variables (parameters a, b, c) and a model with 2 variables (parameters A, B) will give exactly the same predictions, because the third variable is redundant. Note that the 3-variable fit cannot pin down a, b and c individually; only the combinations A = a+c and B = b-c are determined. So in other words, it's a redundant variable problem instead of a multicollinearity problem. I'm not a stats guru BTW.
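If you want to check the redundancy numerically, here is a rough sketch in Python with NumPy. The data are entirely made up (the ages, the coefficients 2.0 and 3.0, and the noise level are assumptions for illustration only), and the variable names are hypothetical. It checks the rank of the design matrix with and without the age-difference column, then compares the least-squares fits. Many textbooks call an exact linear dependence like this "perfect multicollinearity"; whatever the label, the practical consequence is the same.

```python
import numpy as np

# Made-up data for illustration; nothing here comes from a real study.
rng = np.random.default_rng(0)
n = 50
x = rng.integers(20, 40, size=n).astype(float)       # age of younger partner
y = x + rng.integers(1, 15, size=n).astype(float)    # age of older partner
income = 2.0 * x + 3.0 * y + rng.normal(0, 5, size=n)  # invented relationship

X3 = np.column_stack([x, y, x - y])  # both ages plus their difference
X2 = np.column_stack([x, y])         # both ages only

# The third column is an exact linear combination of the first two,
# so it should add nothing to the rank.
rank3 = np.linalg.matrix_rank(X3)
rank2 = np.linalg.matrix_rank(X2)

# lstsq copes with a rank-deficient matrix by returning the
# minimum-norm solution, which is one of infinitely many (a, b, c).
coef3, *_ = np.linalg.lstsq(X3, income, rcond=None)
coef2, *_ = np.linalg.lstsq(X2, income, rcond=None)

fitted3 = X3 @ coef3
fitted2 = X2 @ coef2

print("rank with 3 columns:", rank3, "| rank with 2 columns:", rank2)
print("same fitted values:", np.allclose(fitted3, fitted2))
print("A = a+c, B = b-c:",
      np.allclose([coef3[0] + coef3[2], coef3[1] - coef3[2]], coef2))
```

The particular a, b, c that come back depend on how the solver breaks the tie, so I wouldn't read anything into them individually. Some regression packages will instead drop one of the three columns or report a singular matrix.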