10-31-2007, 08:25 PM
mrmr
Senior Member
 
Join Date: Jan 2006
Posts: 129
part of the population is excluded from sampling, now what?

You take a reasonably sized random sample of n elements from a population of size N. Then you find an additional x*N elements of the population. (I don't think it will matter too much, but let's say x is somewhere between 0 and 0.4.)

So the population now has (1 + x)*N elements, but your sample came entirely from a sub-population of N elements. There is no particular reason to think the newly found portion of the population is distributed any differently from the original N elements.

What do you do? Is it reasonable to "pretend" the sample came from the entire population (the one that has (1 + x)*N elements) and proceed as normal? If not, how do you calculate a confidence interval for, and the level of precision of, your estimates (of whatever parameter(s) interest you) for the entire population?

So a concrete example of what I am asking would go like this. You have 2000 baseballs that were hit through your window by the neighbor kids over the years. You look at 100 of them (randomly selected) and record the number of bits of glass embedded in each one. You head to your office to estimate the average number of bits of glass per baseball for the entire 2000-baseball collection, and of course you want to know how accurate and precise that estimate is, so you will want to calculate that too. And then you remember that you have 650 baseballs that you forgot to count because the neighbor kids ... eh, enough with this cheesy story. You remember you have 650 more. So you've actually got 2650, but you only sampled from 2000 of them. You want an estimate for all 2650, and you want to know how precise and accurate it is, but you don't want to do any more legwork or re-examine any baseballs.
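To make the two options concrete, here is a rough sketch of the arithmetic, not a settled answer. It assumes simple random sampling without replacement, a normal-approximation 95% interval, and the usual finite population correction. The glass counts are made up, and `mean_ci` is just a name I picked for illustration. Plugging in 2650 is exactly the "pretend" step, so it is only defensible if the extra 650 balls really are no different from the first 2000.

```python
import math
import random
import statistics

def mean_ci(sample, pop_size, z=1.96):
    """Mean and approximate 95% CI for a simple random sample drawn
    without replacement from a population of pop_size elements."""
    n = len(sample)
    mean = statistics.mean(sample)
    var = statistics.variance(sample)   # sample variance, n - 1 denominator
    fpc = 1 - n / pop_size              # finite population correction
    se = math.sqrt(var / n * fpc)
    return mean, mean - z * se, mean + z * se

# Made-up data: bits of glass in each of the 100 sampled baseballs.
random.seed(0)
counts = [random.randint(0, 6) for _ in range(100)]

# Interval for the 2000 baseballs that were actually sampled from.
mean_2000, lo_2000, hi_2000 = mean_ci(counts, 2000)

# Interval if we "pretend" the sample came from all 2650 baseballs.
mean_2650, lo_2650, hi_2650 = mean_ci(counts, 2650)

print(f"N=2000: {mean_2000:.3f} ({lo_2000:.3f}, {hi_2000:.3f})")
print(f"N=2650: {mean_2650:.3f} ({lo_2650:.3f}, {hi_2650:.3f})")
```

With n = 100 the correction factor only moves from 0.95 to about 0.962, so the point estimate is unchanged and the interval gets slightly wider. The arithmetic barely cares about the extra 650; the real question is whether the "no different" assumption holds, and the formula cannot check that.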