Simple Probability Question - Page 4

f97tosc · #31 05-15-2007, 06:11 PM

[ QUOTE ]

If estimating P as anything but (X + 1)/102 implies some knowledge of the data, then so does having any non-trivial probability distribution for P.

[/ QUOTE ]

Perhaps it could be noted that you get that same answer, (X + 1)/102, if you use the uniform prior and then take P = E(u), to use Jason1990's notation.
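As a quick check of that claim, here is a small Python sketch. It assumes n = 100 trials (the 102 in (X + 1)/102 is n + 2) and X ones observed, so under the uniform prior the posterior for u is proportional to u^X (1-u)^(n-X). Its mean is computed exactly from the integer Beta integral, the integral of u^a (1-u)^b over [0, 1] = a! b!/(a+b+1)!. The function names are made up for illustration.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Exact integral of u^a (1-u)^b over [0, 1] for integers a, b >= 0."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def posterior_mean_uniform_prior(x: int, n: int) -> Fraction:
    """E(u | x ones in n trials) when the prior on u is uniform on [0, 1].

    The posterior density is proportional to u^x (1-u)^(n-x), so its mean is
    the integral of u^(x+1) (1-u)^(n-x) divided by that of u^x (1-u)^(n-x).
    """
    return beta_integral(x + 1, n - x) / beta_integral(x, n - x)


if __name__ == "__main__":
    n = 100  # assumed number of trials, matching 102 = n + 2
    for x in (0, 7, 50, 100):
        mean = posterior_mean_uniform_prior(x, n)
        assert mean == Fraction(x + 1, n + 2)  # Laplace's rule of succession
        print(x, mean)
```

The ratio of factorials collapses to (x + 1)/(n + 2) for every x, including the extreme cases x = 0 and x = n.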

Perhaps it could also be noted that exactly this problem has been treated in some detail in the seminal paper "Prior Probabilities" by Ed Jaynes (1968). Interestingly, he argued that 1/(u(1-u)) is the better prior to use (unless we know from the start that there are positive numbers of ones and zeros). I highly recommend reading this paper, which provides a great deal of both intuitive and mathematical motivation for, and insight into, this problem.
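To see what changes with that prior, multiply 1/(u(1-u)) by the likelihood u^X (1-u)^(n-X) for X ones in n trials: the posterior is proportional to u^(X-1) (1-u)^(n-X-1), i.e. a Beta(X, n - X) distribution with mean X/n. That posterior can be normalized only when 0 < X < n, which is worth keeping in mind when all observations are ones or all are zeros. The Python sketch below compares the two posterior means; the function names are hypothetical.

```python
from fractions import Fraction


def posterior_mean_haldane_prior(x: int, n: int) -> Fraction:
    """E(u | x ones in n trials) under the improper prior 1/(u(1-u)).

    The posterior is Beta(x, n - x); Beta(a, b) has mean a / (a + b), which
    here is x / n. The posterior is proper only when 0 < x < n.
    """
    if not 0 < x < n:
        raise ValueError("posterior is improper unless 0 < x < n")
    return Fraction(x, n)


def posterior_mean_uniform_prior(x: int, n: int) -> Fraction:
    """E(u | x ones in n trials) under the uniform prior: Beta(x + 1, n - x + 1)."""
    return Fraction(x + 1, n + 2)


if __name__ == "__main__":
    n = 100  # assumed number of trials
    for x in (1, 7, 50, 99):
        print(x, posterior_mean_uniform_prior(x, n), posterior_mean_haldane_prior(x, n))
```

With 1/(u(1-u)) the estimate is just the observed proportion, while the uniform prior pulls it slightly toward 1/2.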

HP · #32 05-15-2007, 06:34 PM

[ QUOTE ]
Perhaps it could also be noted that exactly this problem has been treated in some detail in the seminal paper "Prior Probabilities" by Ed Jaynes (1968). Interestingly, he argued that 1/(u(1-u)) is the better prior to use (unless we know from the start that there are positive numbers of ones and zeros). I highly recommend reading this paper, which provides a great deal of both intuitive and mathematical motivation for, and insight into, this problem.

[/ QUOTE ]

OMG tytytytytytyty

ty!

f97tosc · #33 05-15-2007, 06:42 PM

[ QUOTE ]
Therefore, in fact, it would be a bad property if you got the same Beta distribution again. It would mean that the method does not produce better and better estimates as the sample size grows. What actually happens, as I described above, is that if the sample size grows and the proportion remains the same, then the method produces distributions which converge to the point mass at that proportion.

[/ QUOTE ]

I don't know if this is what the original poster was referring to, but one point here is that even though the parameters change after each observation (which is a good property), the functional Beta form is retained. This is actually not a bad property either, because it means that we can make observations and, after each one, just update two parameters; we don't need to do any messy integrations to derive a new f after each observation, since it is Beta every time. The technical term for this very useful property is that the Beta distribution is a conjugate prior (for binomial data).
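A minimal Python sketch of that bookkeeping, assuming 0/1 observations and a Beta(a, b) prior (a = b = 1 is the uniform prior): each observation increments one of the two parameters, and processing the data one at a time gives the same parameters as a single batch update. It also prints the posterior standard deviation, which shrinks as the sample grows with the proportion held fixed, the convergence toward a point mass described in the quoted post. The class name `BetaPosterior` is hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class BetaPosterior:
    """Beta(a, b) belief about the probability u of observing a 1."""
    a: float = 1.0  # Beta(1, 1) is the uniform prior
    b: float = 1.0

    def update(self, observation: int) -> None:
        """Conjugate update: a 1 increments a, a 0 increments b."""
        if observation == 1:
            self.a += 1
        elif observation == 0:
            self.b += 1
        else:
            raise ValueError("observations must be 0 or 1")

    def mean(self) -> float:
        return self.a / (self.a + self.b)

    def std(self) -> float:
        # Variance of Beta(a, b) is ab / ((a + b)^2 (a + b + 1)).
        s = self.a + self.b
        return sqrt(self.a * self.b / (s * s * (s + 1)))


if __name__ == "__main__":
    # Same proportion of ones (30%) at increasing sample sizes.
    for n in (10, 100, 1000, 10000):
        ones = 3 * n // 10
        data = [1] * ones + [0] * (n - ones)
        post = BetaPosterior()
        for obs in data:
            post.update(obs)  # one observation at a time, no integration needed
        # A single batch update gives identical parameters: a = 1 + ones, b = 1 + zeros.
        assert (post.a, post.b) == (1 + ones, 1 + n - ones)
        print(n, round(post.mean(), 4), round(post.std(), 4))
```

The mean approaches 0.3 while the standard deviation falls roughly like 1/sqrt(n), so the posterior piles up around the observed proportion.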