Old 06-17-2007, 10:36 PM
Sherman
Senior Member
 
Join Date: Jun 2005
Location: Ph. D. School
Posts: 3,999
Re: Towards a Skill Ratio : is it a joke ?

Jetto,

I'm not 100% sure what your question is, but I will do my best to answer.

First, without the data that I gathered, I am not sure how you could compute the same correlations I computed. When you say you "randomly split the series in two," I have no idea what you are splitting, and I am not sure that is the best approach. Of course, I would be more than happy to send you the data if you would like to try it yourself. Anyone who wants it can simply PM me.

In any case, let us take a simple example, say the number of hits in a single baseball season. For this metric, I correlated each player's total number of hits in Season 1 with his total in Season 2. This single correlation represents the degree of association between hits in Season 1 and Season 2. I did the same thing for all other possible pairs of the six seasons ([1,3]; [1,4]; ... [5,6]). This yielded a total of 15 correlations. I then simply averaged these 15 numbers to represent the average association from one season to the next. These average r's are the ones presented in Tables 1 and 2 of this month's article. As I mentioned in the article, however, these correlations do not control for the number of at bats (or attempts) that a player makes.
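For anyone who wants to try this on their own numbers, here is a rough sketch of the averaging step in Python. It is not the code I ran (I used a standard statistics package). The function names are made up for illustration, the hit totals are invented, and it assumes the same players appear in the same order in every season.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def average_pairwise_r(seasons):
    """Correlate every pair of seasons and average the resulting r's.

    seasons: one list per season, each holding one value per player,
    with players in the same order in every list (an assumption here).
    Returns (average r, number of pairs).
    """
    rs = [pearson_r(seasons[i], seasons[j])
          for i, j in combinations(range(len(seasons)), 2)]
    return sum(rs) / len(rs), len(rs)


# Invented hit totals for five players over six seasons (not real data).
hits_by_season = [
    [120, 150, 98, 175, 133],
    [110, 160, 105, 168, 140],
    [125, 145, 90, 180, 128],
    [118, 155, 101, 170, 135],
    [122, 149, 95, 177, 131],
    [115, 158, 99, 172, 138],
]
avg_r, n_pairs = average_pairwise_r(hits_by_season)
print(n_pairs)  # 15 pairs for six seasons
```

Note that the sketch breaks down if any season has identical values for every player, since the correlation is undefined in that case.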

To control for the number of attempts, things get a bit more complicated. First, I predicted each player's hits based on his number of at bats for each season using linear regression. I then saved the residuals. These residuals can be thought of as the number of hits a player gets above or below what his at bats alone would predict, that is, his hits with at bats controlled for. I did this for each of the six seasons. Then, repeating the earlier procedure, I computed the correlations for all possible pairs of the six sets of residuals (one set per season). Again, this yielded 15 correlations. I once again took the average of these correlations.

These average r's are the ones you see in Tables 3 and 4 of this month's article.
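The residual step can be sketched the same way. This is only an illustration, assuming a simple one-predictor least-squares regression of hits on at bats within a single season. The function name and the three-player example are invented. Each season's residuals would then be correlated pairwise and averaged exactly as described for the raw totals.

```python
def regression_residuals(at_bats, hits):
    """Residuals from a least-squares regression of hits on at bats.

    Each residual is a player's actual hits minus the hits predicted
    from his at bats alone, for one season.
    """
    n = len(hits)
    mean_ab = sum(at_bats) / n
    mean_h = sum(hits) / n
    slope = (sum((a - mean_ab) * (h - mean_h) for a, h in zip(at_bats, hits))
             / sum((a - mean_ab) ** 2 for a in at_bats))
    intercept = mean_h - slope * mean_ab
    return [h - (intercept + slope * a) for a, h in zip(at_bats, hits)]


# Invented example: three players in one season (not real data).
season_residuals = regression_residuals([400, 500, 600], [100, 150, 170])
# Predicted hits are 105, 140 and 175, so the residuals are -5, 10 and -5.
```

By construction the residuals in each season sum to zero and are uncorrelated with at bats, which is what "controlling for" at bats means here.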



If your question is about the calculation of the individual r's, I simply used standard statistical packages to do so. The general formula is sum(Zx*Zy)/N, where Zx are the Z-scored values for variable 1, Zy are the Z-scored values for variable 2, and N is the total sample size. Most statistical packages, however, use N - 1 in the denominator to provide population estimates. As long as the Z-scores are computed with the same denominator (N or N - 1) as the sum, the resulting r is the same.
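Here is a small sketch of that formula, with a `ddof` argument (my own naming, borrowed from common usage) to switch between N and N - 1. It is meant only to show the arithmetic, not to reproduce what any particular package does internally, and the numbers are invented.

```python
from math import sqrt


def z_scores(values, ddof=0):
    """Z-score a list; ddof=0 divides by N, ddof=1 divides by N - 1."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - ddof))
    return [(v - mean) / sd for v in values]


def r_from_z(x, y, ddof=0):
    """Correlation as sum(Zx*Zy) divided by N (ddof=0) or N - 1 (ddof=1)."""
    zx = z_scores(x, ddof)
    zy = z_scores(y, ddof)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - ddof)


# Invented values for two variables.
x = [1, 2, 3, 4]
y = [2, 4, 6, 9]
r_n = r_from_z(x, y, ddof=0)          # N in both places
r_n_minus_1 = r_from_z(x, y, ddof=1)  # N - 1 in both places
```

The two versions agree because the extra factor in the denominator cancels against the same factor inside the standard deviations.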

I hope this makes it more clear how the procedure worked.

Ryne Sherman