Re: part of the population is excluded from sampling, now what?
Well, I would think about it like this: you have a population z. That population is composed of subpopulation N and subpopulation x. You have no information on how the members of z are sorted into N and x.
There are at least two different questions here. If you take a sample n from N, will the expected distribution be the same as if you took a sample from z? Also, what conclusions can you draw about z from n, and are they the same as the conclusions you can draw about N?
I think the answer to the first question is yes, as long as whatever sorts members into N and x is unrelated to what you are measuring, but I'm not so sure about the second question. I think the concrete example is on firmer ground because it states that x and N were selected by the same process. That kind of information is probably necessary for any rigorous evaluation.
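A quick simulation may make the first question concrete. This is only a sketch under made-up conditions: z is a hypothetical normally distributed population, and the two sorting rules (a coin flip, and excluding every value above 115) are illustrative stand-ins for "however members of z end up in x". When the sorting ignores the value, a sample from N behaves like a sample from z; when the sorting depends on the value, it does not.

```python
import random
import statistics


def split_population(z, in_x):
    """Split z into (N, x), calling the predicate in_x exactly once per member."""
    N, x = [], []
    for v in z:
        (x if in_x(v) else N).append(v)
    return N, x


def sample_mean(pop, n, rng):
    """Mean of a simple random sample of size n drawn without replacement."""
    return statistics.mean(rng.sample(pop, n))


rng = random.Random(0)

# Hypothetical population z: 100,000 values, mean 100, standard deviation 15.
z = [rng.gauss(100, 15) for _ in range(100_000)]
mean_z = statistics.mean(z)

# Case 1: about 30% of z goes to x by coin flip, independent of the value.
N_random, x_random = split_population(z, lambda v: rng.random() < 0.3)

# Case 2: membership in x depends on the value itself (large values excluded).
N_biased, x_biased = split_population(z, lambda v: v > 115)

mean_random = sample_mean(N_random, 5_000, rng)  # should land close to mean_z
mean_biased = sample_mean(N_biased, 5_000, rng)  # systematically below mean_z

print(f"population mean:          {mean_z:.1f}")
print(f"sample from N (random):   {mean_random:.1f}")
print(f"sample from N (by value): {mean_biased:.1f}")
```

In the second case nothing about the sampling step changed; the bias comes entirely from how N was formed, which is why knowing the sorting process matters.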
In a way, it depends on what you mean by "there is no particular reason...". In the Babe Ruth situation, you seem to mean that all else is equal, in which case the boxes are essentially arbitrary: the groupings by year carry no meaning and serve only to categorize. As a result, you can legitimately sample from some of the boxes without touching the others. In fact, you could take just one box (say, the 1998 box), and that would be a random sample on which you could base statistical statements.
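The single-box argument can be sketched the same way. Everything here is hypothetical: the 50,000 items, the roughly 20% share that has some yes/no property, and the range of years. The key assumption is that items were shuffled before being dealt into boxes, so the year label carries no information; under that assumption the 1998 box alone supports an ordinary estimate with a normal-approximation 95% interval.

```python
import math
import random

rng = random.Random(1)

# Hypothetical population: 50,000 items, about 20% of which have some yes/no property.
items = [rng.random() < 0.2 for _ in range(50_000)]
true_p = sum(items) / len(items)

# Assign items to year-labelled boxes arbitrarily: shuffle, then deal them out in turn.
years = list(range(1990, 2010))  # 20 hypothetical boxes
rng.shuffle(items)
boxes = {year: items[i::len(years)] for i, year in enumerate(years)}

# Use only the 1998 box as the sample.
box = boxes[1998]
p_hat = sum(box) / len(box)
se = math.sqrt(p_hat * (1 - p_hat) / len(box))  # standard error of a proportion
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)      # normal-approximation 95% interval

print(f"true share: {true_p:.3f}")
print(f"1998 box:   {p_hat:.3f}  (95% CI {ci[0]:.3f} to {ci[1]:.3f})")
```

If the boxes had instead been filled by something related to the property, the same arithmetic would still run, but the interval would no longer say much about the whole population.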
But none of this is well-defined. Your disclaimer against clever sticklers would need to be much more thorough for anything to be certain. You'd need to say more about what you're looking at and how it's determined. I think the best mathematical approach might be to describe a very specific case and generalize from there.