We continued working through the study guide for Friday's quiz.
I had left you with the "three cards" problem. I brought in my trick coins, and we determined that there are three states of nature (SON): HH, HT and TT. We did the calculation in a spreadsheet format, taking a prior of 1/3 on each SON, and recognized that the likelihood of observing H is 1 if it is the HH coin, only 1/2 if it is the HT coin, and 0 if it is the TT coin. This yields a spreadsheet similar to the one for the standard Monty Hall problem: if we see an H, the posterior probability is 2/3 that the other side is also H.
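The spreadsheet calculation can be sketched in a few lines of Python. This is only an illustration of the prior-times-likelihood bookkeeping described above; the function and variable names are my own, not anything we used in class.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Multiply prior by likelihood for each state of nature, then normalize."""
    joint = {s: prior[s] * likelihood[s] for s in prior}
    total = sum(joint.values())
    return {s: joint[s] / total for s in joint}

# Prior of 1/3 on each state of nature (coin type).
prior = {"HH": Fraction(1, 3), "HT": Fraction(1, 3), "TT": Fraction(1, 3)}

# Likelihood of observing H on the visible side, given each coin.
likelihood_H = {"HH": Fraction(1), "HT": Fraction(1, 2), "TT": Fraction(0)}

post = posterior(prior, likelihood_H)
print(post["HH"])  # 2/3
```

Using exact fractions keeps the result as 2/3 rather than a rounded decimal.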
We then finished the cancer problem. The probability that a member of the general population who tests positive does not have the gene is 495/593, or about 5/6, so the probability that this individual does have the gene is about 1/6. Part (3) of the question asks first for the probability that someone with a positive test gets the disease. That is
1/6*0.2 + 5/6*0.0002, or about 0.03333 + 0.00017 = 0.0335. Of those who get the disease, most have the gene: only 0.00017/0.0335, or about 0.005, do not. That's only 1/2 of 1 percent.
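The arithmetic for part (3) can be checked with a short Python sketch. It uses the rounded values 1/6 and 5/6 from above and reads 0.2 and 0.0002 as the chances of getting the disease with and without the gene; the variable names are mine.

```python
# Posterior after a positive test (rounded values from the discussion).
p_gene = 1 / 6
p_no_gene = 5 / 6

# Probability of getting the disease with and without the gene.
p_disease_given_gene = 0.2
p_disease_given_no_gene = 0.0002

# Joint probabilities of (gene status, disease) for a positive tester.
joint_gene = p_gene * p_disease_given_gene
joint_no_gene = p_no_gene * p_disease_given_no_gene

# Total probability that a positive tester gets the disease.
p_disease = joint_gene + joint_no_gene

# Among those who get the disease, the fraction without the gene.
frac_no_gene = joint_no_gene / p_disease

print(round(p_disease, 4))     # 0.0335
print(round(frac_no_gene, 3))  # 0.005
```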
We discussed the galaxy problem. As many of you pointed out, it is exactly like the Shakespeare and Marlowe problem. The problem sets the prior at P(E)=0.8, P(S)=0.2. For each hypothesis, the likelihood is obtained by raising the probability of each type of object to a power equal to the number of objects of that type that the machine found, and then multiplying these values together across the three types of object. We found (using a spreadsheet calculation) that the posterior probabilities of E and S were about equal after this evidence.
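The likelihood recipe can be sketched in Python as follows. Only the prior comes from the problem; the per-type probabilities and the counts below are hypothetical placeholders (the real values are in the study guide), so the printed posterior will not match the roughly equal result we found in class.

```python
import math

# Prior from the problem statement.
prior = {"E": 0.8, "S": 0.2}

# HYPOTHETICAL probabilities of the three object types under each hypothesis.
type_probs = {"E": (0.5, 0.3, 0.2), "S": (0.2, 0.3, 0.5)}

# HYPOTHETICAL number of objects of each type that the machine found.
counts = (1, 2, 3)

def likelihood(probs, counts):
    """Raise each type's probability to its count and multiply the results."""
    return math.prod(p ** k for p, k in zip(probs, counts))

joint = {h: prior[h] * likelihood(type_probs[h], counts) for h in prior}
total = sum(joint.values())
posterior = {h: joint[h] / total for h in joint}
print(posterior)
```

Substituting the actual probabilities and counts from the problem should reproduce the spreadsheet result.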
On the plagiarism problem, a question was asked about the meaning of the code. We discussed how mathematical tables are constructed: you calculate each number to more significant digits than you plan to publish, and round up or down according to the digit that follows the last one you plan to publish. If that following digit is a '5', you have a choice of rounding up or down. By flipping a coin you can round up or down in a random way that is unlikely to be duplicated by someone else independently putting together a table. In this way you embed a secret code, known only to you, into the table through the rounding pattern. We calculated that if the prior probability is 1/2 each for plagiarism and for accidental agreement (no cheating), then the posterior probability that no cheating was involved is about 10^-30 if the code is duplicated exactly.
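Here is a small Python sketch of that posterior calculation. The number of coin-flip roundings is an assumption on my part: I use 100, because 2^-100 is close to 10^-30, but the figure in the actual problem may differ. I also assume a plagiarist copies the rounding pattern exactly.

```python
# ASSUMPTION: number of entries whose rounding was decided by a coin flip.
n_flips = 100

# Equal priors for plagiarism and for independent (honest) agreement.
prior_cheat = 0.5
prior_honest = 0.5

# A copier reproduces the rounding pattern with certainty (assumed);
# an honest compiler matches each coin flip with probability 1/2.
like_cheat = 1.0
like_honest = 0.5 ** n_flips

joint_cheat = prior_cheat * like_cheat
joint_honest = prior_honest * like_honest

post_honest = joint_honest / (joint_cheat + joint_honest)
print(f"{post_honest:.1e}")
```

With equal priors the posterior for "no cheating" is essentially just the chance of matching every coin flip by accident.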
We discussed the reason for choosing equal priors: in civil cases, the law says that the side with the preponderance of evidence wins the case, which means anything more than 50%.
I also mentioned that this technique is used to prevent plagiarism in other cases, e.g., map making, by putting small but innocuous mistakes in a map. Also, mistakes in the genome from generation to generation can be used as a "clock" to tell how far back in time two present-day organisms had a common ancestor, as well as the degree of relationship between a number of organisms, for example, how closely related human beings from various parts of the world are when traced back in time.
Finally, we got most of the way through the first urn problem. We decided that there are 10 states of nature, corresponding to there being 1, 2, ..., 10 balls in the urn. The probability that the first ball is unmarked is of course 1, independent of the SON. The second ball was also unmarked, but here the probability of that given the SON is 0 if the urn contains only one ball (because then the only ball in the urn is marked), 1/2 in the case of 2 balls in the urn, 2/3 in the case of 3 balls in the urn, and so forth. The third ball was marked; at that point there are two marked balls in the urn, so we're picking one that we'd already popped back in. For SONs 2, 3, ..., 10 the probability of picking a marked ball is then 2/2, 2/3, 2/4, ..., 2/10. That is where we left it. We'll return to this problem on Wednesday.
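For anyone who wants to check the likelihoods so far, here is a Python sketch. The likelihood factors follow the description above; the uniform prior of 1/10 on each SON is my assumption for illustration, since we have not yet settled the prior or finished the problem.

```python
from fractions import Fraction

sizes = range(1, 11)  # states of nature: 1, 2, ..., 10 balls in the urn

# ASSUMPTION: uniform prior over the ten states of nature.
prior = {n: Fraction(1, 10) for n in sizes}

def likelihood(n):
    """Probability of drawing unmarked, unmarked, marked with n balls in the urn."""
    first_unmarked = Fraction(1)            # nothing is marked yet
    second_unmarked = Fraction(n - 1, n)    # one marked ball among n
    # Two marked balls among n; capped at n so the n = 1 case stays a
    # valid probability (it is ruled out by the second draw anyway).
    third_marked = Fraction(min(2, n), n)
    return first_unmarked * second_unmarked * third_marked

joint = {n: prior[n] * likelihood(n) for n in sizes}
total = sum(joint.values())
post = {n: joint[n] / total for n in sizes}

for n in sizes:
    print(n, likelihood(n), float(post[n]))
```

Under these assumptions the likelihood is largest for 2 balls (1/2) and falls off for larger urns.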
I mentioned that this sort of thing is used, e.g., by biologists who catch fish, tag them, and release them. After the population has had a chance to mix, they catch a second sample and see what proportion of the fish caught the second time are tagged. This can be used to estimate the size of the population of fish in a lake.
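The simplest version of this estimate assumes that the tagged fraction in the second catch matches the tagged fraction in the whole lake. This is often called the Lincoln-Petersen estimator, a name I did not use in class. A Python sketch, with entirely hypothetical numbers:

```python
def lincoln_petersen(tagged_first, caught_second, tagged_in_second):
    """Estimate population size from a tag-and-recapture experiment.

    Assumes tagged_in_second / caught_second is about tagged_first / N.
    """
    if tagged_in_second == 0:
        raise ValueError("no tagged fish recaptured; estimate is undefined")
    return tagged_first * caught_second / tagged_in_second

# HYPOTHETICAL example: tag 100 fish, later catch 50, of which 5 are tagged.
estimate = lincoln_petersen(100, 50, 5)
print(estimate)  # 1000.0
```

A Bayesian treatment, like the urn problem, would give a whole posterior distribution over the population size rather than a single number.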
Monday, October 6, 2008