- Assignments need to be put together carefully so that they are easy to grade. They should start with a narrative that says what the assignment is about, then how you went about solving the problem, and then the results. Use tables as appropriate to summarize. Put the R code in an appendix. As Jeff stated, statistics is not just about mathematics and programming; it is also about interpreting and conveying your results to others clearly.
- We ask you not to work alone. We prefer you to work in small groups of 2 or 3. Not only does this reduce the grading burden, but, more importantly, it gives you practice in working with colleagues, which is an essential aspect of professional statistical practice.
- Please type your assignments. This makes them easy to read.
- Be sure to turn in a hard copy of your assignment.
- Also, email your R code to us; if the code isn't working right, we can then copy and paste it into R, which may make it easier to figure out what is wrong.
Please note that the likelihood function can be multiplied by any constant k>0 without changing the results from Bayes' theorem, because the numerator and the denominator will both be multiplied by k, which will cancel.
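This cancellation is easy to check numerically. Here is a small sketch in Python (the course's software is R; the two states of nature and the prior and likelihood values below are made up purely for illustration):

```python
# Numeric check that scaling the likelihood by a constant k > 0 leaves the
# posterior from Bayes' theorem unchanged. Priors and likelihoods are made up.

prior = [0.3, 0.7]          # P(A1), P(A2)
likelihood = [0.8, 0.1]     # P(D|A1), P(D|A2)

def posterior(prior, lik):
    """Bayes' theorem: posterior is proportional to prior times likelihood."""
    unnorm = [p * l for p, l in zip(prior, lik)]
    total = sum(unnorm)                 # the denominator of Bayes' theorem
    return [u / total for u in unnorm]

k = 17.0                                # any positive constant
scaled = [k * l for l in likelihood]    # a different member of the equivalence class

print(posterior(prior, likelihood))
print(posterior(prior, scaled))         # the same: k appears in numerator and denominator
```

Both calls print the same posterior, because k multiplies every term of the numerator and of the denominator and so cancels.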
The likelihood is actually an equivalence class of functions, whose members differ from one another only by a constant multiple. It is important to understand that even though the likelihood P(D|Ai) is written like the sampling distribution of D given Ai, it is not the same thing. The sampling distribution tells us how different possible data D depend on Ai; its important characteristic is how it varies as D is varied for fixed Ai. The likelihood, by contrast, is always evaluated at the actual data D that were observed, and its important characteristic is how it varies as Ai is varied for fixed D. Because the likelihood is only defined up to a constant multiple, it does not integrate or sum to 1 over the states of nature Ai, and it is not a probability on Ai.
We discussed the hemoccult test and the consequences of making wrong decisions. A false positive can result in a colonoscopy, and colonoscopies can have adverse consequences for the patient, even death. On the other hand, a false negative means that a developing cancer may be missed. As Dr. Osler pointed out, the people who put out the test can adjust the reagents used in the test to produce more true positives, but only at the cost of increasing the number of false positives, or vice versa. Some of this is driven by economics and insurance companies, since the hemoccult test, which has many faults, is also very cheap (3 cents), whereas a colonoscopy is expensive (several thousands of dollars) as well as more risky. So there are tradeoffs. A complete analysis really involves decision theory, and is outside of the scope of this course, but we will mention some aspects of decision theory from time to time.
Jeff skipped several examples on the charts, which I may make some comments on later...stay tuned.
He went on to the capture-recapture problem of estimating the fish population in a lake. We catch 60 fish, tag them, and release them. After the tagged fish have had time to mix with the untagged fish, we catch 100 fish and note that 10 of them are tagged. It is natural to estimate the number of fish in the lake as 600. Jeff noted that there is an extensive frequentist literature on this problem.
The Bayesian solution is quite simple. After some discussion, it was decided that the likelihood function is a hypergeometric distribution. In this case,
P(D|N) = C(n,x) C(N-n, k-x) / C(N, k)
where C(a,b) denotes the binomial coefficient ("a choose b"), N = number of fish in the lake, n = number caught = 100, k = number tagged = 60, and x = number of those caught that are tagged = 10.
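To see how this likelihood behaves as a function of the unknown N, here is a sketch in Python (the course uses R, where dhyper() plays the same role); the search range 150 to 2000 is an arbitrary choice for illustration, and exact fractions are used to avoid rounding:

```python
# Evaluate the hypergeometric likelihood P(D|N) over candidate population sizes N.
from fractions import Fraction
from math import comb

n, k, x = 100, 60, 10   # second catch, number tagged, tagged among the second catch

def likelihood(N):
    """P(D|N) = C(n,x) C(N-n, k-x) / C(N, k); zero if N is too small to be possible."""
    if N < k + (n - x):   # at least 60 tagged plus 90 untagged fish were seen
        return Fraction(0)
    return Fraction(comb(n, x) * comb(N - n, k - x), comb(N, k))

best = max(range(150, 2001), key=likelihood)
# The maximum sits at the intuitive estimate n*k/x = 600; in fact L(599) = L(600)
# exactly (a tie, because n*k/x is an integer), so `best` is 599 or 600.
print(best, likelihood(599) == likelihood(600))
```

So the likelihood peaks at the natural estimate of 600 (tied with 599), in agreement with the intuitive calculation above.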
This likelihood can also be obtained without the hypergeometric formula, by considering the probability of sampling each tagged or untagged fish one at a time, noting that the number of fish remaining and the number of fish of each type decrease by 1 each time a fish is drawn (sampling without replacement).
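The one-at-a-time argument can be checked numerically. Below is a Python sketch using the numbers from the example above: the probability of one particular order of draws, multiplied by the C(n,x) possible orders, recovers the hypergeometric likelihood exactly (exact fractions avoid any rounding).

```python
from fractions import Fraction
from math import comb

N, k, n, x = 600, 60, 100, 10   # one candidate population size; tagged; caught; tagged among caught

def one_order(N):
    """Probability of drawing the x tagged fish first, then the n-x untagged ones,
    one at a time without replacement."""
    p = Fraction(1)
    tagged, untagged, remaining = k, N - k, N
    for _ in range(x):                     # draw a tagged fish
        p *= Fraction(tagged, remaining)
        tagged -= 1
        remaining -= 1
    for _ in range(n - x):                 # draw an untagged fish
        p *= Fraction(untagged, remaining)
        untagged -= 1
        remaining -= 1
    return p

def hypergeom(N):
    """The hypergeometric form P(D|N) = C(n,x) C(N-n, k-x) / C(N, k)."""
    return Fraction(comb(n, x) * comb(N - n, k - x), comb(N, k))

# Every order with x tagged and n-x untagged fish has the same probability,
# and there are C(n, x) such orders, so the two forms agree exactly:
assert comb(n, x) * one_order(N) == hypergeom(N)
```

Note that the probability of any single order is itself a perfectly good likelihood: it differs from the hypergeometric form only by the constant C(n,x), which, as discussed above, does not matter.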
We finished by starting the discussion of the sleep example from Chapter 2 of the book. The beta prior was chosen by the investigator based on a notion of the mean and the 90th percentile of the proportion of sleepers. We also noted that with a beta prior in this binomial sampling situation, you will get a beta posterior. Also, even though the likelihood, prior, and posterior all "look alike" in containing factors θ^u(1-θ)^v, what is important in the likelihood is (u,v), the data, whereas what is important in both the prior and the posterior is the unknown state of nature θ.
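The beta-prior-gives-beta-posterior fact follows from simple algebra on the exponents, and can be checked numerically. Here is a minimal sketch in Python; the prior parameters and data below are hypothetical, not necessarily the book's sleep numbers:

```python
# Check that a Beta(a, b) prior times a binomial likelihood theta^u (1-theta)^v
# has exactly the shape of a Beta(a+u, b+v) density. All numbers are hypothetical.
from math import isclose

a, b = 3.0, 7.0    # hypothetical beta prior parameters
u, v = 11, 16      # hypothetical data: u successes, v failures

def prior_times_likelihood(theta):
    prior_kernel = theta ** (a - 1) * (1 - theta) ** (b - 1)
    likelihood = theta ** u * (1 - theta) ** v
    return prior_kernel * likelihood

def beta_posterior_kernel(theta):
    # the kernel of a Beta(a + u, b + v) density
    return theta ** (a + u - 1) * (1 - theta) ** (b + v - 1)

for theta in [0.1, 0.25, 0.5, 0.75, 0.9]:
    assert isclose(prior_times_likelihood(theta), beta_posterior_kernel(theta), rel_tol=1e-9)
```

Normalizing either expression gives the Beta(a+u, b+v) density, so, for example, the posterior mean is (a+u)/(a+b+u+v): the data simply add to the prior's exponents.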