Wednesday, October 29, 2008

Class, 10/29

We started by discussing the homework. I emphasized that there will never be '+' signs separating the probabilities of the individual events in the likelihood. You will always get the likelihood by multiplying the probabilities of the individual events together (whatever they are). It was evident from several of the homeworks that the likelihood had been calculated incorrectly. In one case, enough Excel code was included for me to see that a '+' sign had been used instead of a '*' sign. I do not know what happened in the other cases. In any case, any group whose total score was less than 36/40 may resubmit on Friday for partial additional credit.

The other problems were minor.

I did note that there is another, and maybe better way (other than in the problem statement) to get the answer to the first problem. That is simply to put prior probabilities on the states of nature, with half on the "null hypothesis" that the die is unbiased, and distributing the remainder among the alternatives. Then do the usual thing: prior*likelihood = joint, sum joint to get marginal likelihood under all hypotheses, divide that into the joint to get the posterior. And then, just look at the posterior probability p of the "null hypothesis". The odds on the null hypothesis are then p/(1-p). At least one group actually used this method, which made me proud!
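The prior-on-hypotheses method just described can be sketched in a few lines of Python. The data and the set of alternative biases here are entirely hypothetical (the actual homework numbers aren't in these notes); the point is the shape of the calculation: prior * likelihood = joint, sum for the marginal, divide for the posterior, then form the odds on the null.

```python
# Sketch of the prior-on-hypotheses approach for the die problem.
# The counts (16 "sixes" in 60 rolls) and the alternative biases are
# hypothetical, chosen only to illustrate the mechanics.
from math import comb

n, k = 60, 16                       # hypothetical: 16 sixes in 60 rolls

# States of nature: P(six) under each hypothesis.
p_six = [1/6, 0.25, 0.30, 0.35]     # first entry is the null (fair die)
prior = [0.5] + [0.5/3] * 3         # half on the null, rest spread evenly

# Likelihood of the data under each hypothesis (binomial on sixes).
like = [comb(n, k) * p**k * (1 - p)**(n - k) for p in p_six]

joint = [pr * l for pr, l in zip(prior, like)]   # prior * likelihood
marginal = sum(joint)                            # marginal likelihood
posterior = [j / marginal for j in joint]        # divide back in

p_null = posterior[0]
odds_null = p_null / (1 - p_null)   # posterior odds on the null hypothesis
print(p_null, odds_null)
```

With real homework data you would replace the counts and the list of alternatives; the rest of the calculation is unchanged.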

We then started on the practice problems.

Problem #1 is similar to the copyright problem we discussed in class. We expect a student to get answers right, because they are supposed to know the material. No one would suspect students, even if they got the answers right, because that's what's supposed to happen. But the mistakes (just like in the copyright problem) are the key. Mistakes should be made at random, so if one student copies another, s/he will copy the mistakes perfectly; if not, each wrong answer will coincide with the other student's only with probability 1/5 (since there are five possible answers). The two states of nature are Cheat and No Cheat. We discussed the prior and decided that on Cheat it might be 1/10; it could be larger or smaller, and arguments were given for each. The likelihood is 1 for Cheat and 0.2^7 for No Cheat, since each coincidence has probability 0.2 and there are seven coincidences. We found that with our prior, the posterior probability of cheating is nearly 1, and the professor ought to take appropriate action.
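The whole cheating calculation fits in a few lines; the prior of 1/10 and the likelihoods (1 for Cheat, 0.2^7 for No Cheat) come straight from the discussion above.

```python
# Problem #1: posterior probability of cheating, given seven shared mistakes.
prior_cheat = 0.10        # the prior we settled on in class
like_cheat = 1.0          # a copier reproduces all seven mistakes for sure
like_no_cheat = 0.2**7    # seven independent 1-in-5 coincidences

joint_cheat = prior_cheat * like_cheat
joint_no = (1 - prior_cheat) * like_no_cheat
posterior_cheat = joint_cheat / (joint_cheat + joint_no)
print(posterior_cheat)    # very close to 1
```

You can rerun this with a larger or smaller prior on Cheat; because 0.2^7 is so tiny (about 0.0000128), the posterior stays near 1 for any reasonable prior.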

In Problem #2, the aim was not to actually calculate anything, but to explain how a calculation could be arranged. So the SON are the possible number of taxis, from 1 to (we decided) not more than 50,000. We discussed ways to set a prior. One would be to pick a probability, say 0.9 or 0.99, and raise it to the power of the number of taxis in the SON. A second was to use a straight-line ramp from 1 (highest) to 50,000 (lowest). A third was to use something like 1/N, where N is the number of taxis in the SON. Whichever method we use, we just write down the numbers in an Excel spreadsheet, add them up, divide each by the sum, and enter the results as the normalized prior.
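The three candidate priors are easy to write down and normalize. This sketch uses 0.99 for the first option (0.9 was the other choice we mentioned); any of the three, once normalized, can feed the rest of the calculation.

```python
# The three priors we discussed for the number of taxis, N = 1..50,000.
N_MAX = 50_000
ns = range(1, N_MAX + 1)

geometric = [0.99**n for n in ns]       # option 1: p**N, with p = 0.99
ramp = [N_MAX - n + 1 for n in ns]      # option 2: straight line, highest at N = 1
one_over_n = [1 / n for n in ns]        # option 3: 1/N

def normalize(weights):
    """Divide each weight by the total so the prior sums to 1."""
    total = sum(weights)
    return [w / total for w in weights]

priors = {name: normalize(w) for name, w in
          [("geometric", geometric), ("ramp", ramp), ("1/N", one_over_n)]}
```

This is exactly the "write down the numbers, add them up, divide by the sum" step, just in code instead of a spreadsheet column.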

The likelihood is 0 if the number of taxis is less than 150 (you can't see taxi number 37, for example, if there are only 36 taxis, and here our highest observed number is 150), and is (1/N)^7 for each SON where N is greater than 149. This is "sampling with replacement": each taxi seen has probability 1/N of showing the number observed, and the likelihood is the product of these over the seven observed taxis.

Then the usual: prior*likelihood= joint, sum the joint, etc....

Then, you can decide what probability you want for the number of taxis. If you want the probability to be greater than 0.99, just sum down the posterior until the running total reaches 0.99. The SON value on that last line is then a number you can be 99% sure is at least as large as the true number of taxis.

For the third problem, you have to start by keeping the two cases (standard drug and new drug) separate. For each of these, you want to compute the posterior probability that the particular drug cures the disease (r or s). This is just a standard calculation, like the one for the homework on Monday. The trick is what to do with this information to decide what the probability is that the new drug is better than the old one. We'll talk about this next time.
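For the first stage of Problem #3, the per-drug "standard calculation" can be sketched as a grid posterior over the cure probability. Everything here is hypothetical: the counts (14 cures out of 20 patients), the flat prior, and the 99-point grid are placeholders for whatever the actual problem specifies.

```python
# One drug's posterior over its cure probability r, on a grid.
# Counts are hypothetical; the structure is the standard calculation.
from math import comb

cures, patients = 14, 20                        # hypothetical data

grid = [i / 100 for i in range(1, 100)]         # possible cure rates r
prior = [1 / len(grid)] * len(grid)             # flat prior over the grid
like = [comb(patients, cures) * r**cures * (1 - r)**(patients - cures)
        for r in grid]                          # binomial likelihood

joint = [p * l for p, l in zip(prior, like)]
total = sum(joint)
posterior = [j / total for j in joint]
# The same table, computed separately for the new drug, is the input to
# the comparison we'll discuss next time.
```

Doing this once for r (standard drug) and once for s (new drug) gives the two posterior tables; what to do with them to get P(s > r) is the part we deferred.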
