Tuesday, November 11, 2008

Class, 11/10

We went over the exam.

Problem 1 is similar to the second half of the drug-testing problem on the study sheet. In the first half of that problem you would have to compute the posterior probability of the cure rate for each drug; here I just told you the posterior probabilities of Tom's and Joe's predict rates. The second half is to arrange Tom's and Joe's possible true predict rates and their posterior probabilities along the side and top of a square table, multiply pairwise to get the probability that (for example) Tom's true rate is 0.3 and Joe's is 0.4, and then add up all of the boxes above the "staircase" that separates equal predict rates from those where Joe is better than Tom. For the probability that the two are equal, add up the probabilities along the diagonal. This is not an inference problem, so you do not need to list states of nature (SON), priors, likelihoods, joints, and posterior probabilities. Some people didn't add up the correct boxes, and I couldn't work out the principle that they used.
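The square-table calculation can be sketched in a few lines. The rates and posterior probabilities below are made up for illustration; only the method (pairwise products, then summing above, on, and below the diagonal) comes from the problem:

```python
import numpy as np

# Hypothetical grids of possible true predict rates and posterior
# probabilities for Tom and Joe (numbers invented for illustration).
rates = np.array([0.2, 0.3, 0.4, 0.5])
tom_post = np.array([0.1, 0.4, 0.3, 0.2])    # posterior over Tom's rate
joe_post = np.array([0.05, 0.25, 0.4, 0.3])  # posterior over Joe's rate

# The square table: entry [i, j] = P(Tom's rate = rates[i] and
# Joe's rate = rates[j]), multiplying the posteriors pairwise.
joint = np.outer(tom_post, joe_post)

# Boxes above the "staircase": Joe's rate strictly bigger than Tom's.
p_joe_better = joint[np.subtract.outer(rates, rates) < 0].sum()
# Diagonal boxes: equal predict rates.
p_equal = np.trace(joint)
p_tom_better = 1 - p_joe_better - p_equal
print(p_joe_better, p_equal, p_tom_better)
```

The three numbers sum to 1, which is a useful check that the right boxes were added.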

The second problem was also not an inference problem, so no priors and no likelihoods. It is instead a prediction problem. Here's how you can tell the difference. In an inference problem, there are unknown states of nature that cannot be directly observed; you are trying to compute the probability of each of these unknown states of nature. Here, you are doing something different. You are trying to predict data that will be known for sure in the future. (That is, if the shuttle is destroyed, everyone will know it; if it is not destroyed after ten flights, everyone will know that too. That is data, not a state of nature.) One way to answer the question is to say that if a disaster takes place, it will take place in the first flight or the second flight or the third flight...or the tenth flight, so the probability of one of these happening is the sum of the individual probabilities. So the answer would be 10/80. This isn't quite right, although I gave 18 points credit for this answer. It is approximately correct, though. The correct answer is to compute the probability of (no disaster in flight 1 and no disaster in flight 2 and...and no disaster in flight 10) to get the probability that no disaster will happen, and then subtract that from 1 to get the probability of disaster. The result is 1 − (79/80)^10 = 0.118, which is pretty close to the approximate answer of 0.125. We can tell that the approximate answer is not really right (although it's a good approximation when both the individual probability and the number of flights are small), because if there had been 100 flights, it would give a probability of disaster of 100/80, which is greater than 1 and impossible, since probabilities have to be between 0 and 1.
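Both calculations are easy to check directly; the per-flight disaster probability of 1/80 is from the problem:

```python
# Probability of at least one disaster in n flights, each flight having
# an independent disaster probability of 1/80.
p = 1 / 80
n = 10

approx = n * p             # sum of the individual probabilities
exact = 1 - (1 - p) ** n   # 1 minus P(no disaster on any flight)
print(approx, exact)       # 0.125 vs. about 0.118

# The approximation fails for large n: at n = 100 it gives 100/80 = 1.25,
# which cannot be a probability.
print(100 * p)
```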

I drew a fairly elaborate decision tree for the second part of this problem. I did it in terms of losses, and assigned 0 loss if the Hubble was fixed and the ISS was serviced. Since the ISS can be serviced in any case by Russian rockets, there's no loss from that regardless of which branch we choose. With the (approximate) probability of catastrophe of 1/8 if all ten flights are made, and assuming that a catastrophe has a 50% chance of affecting the Hubble mission, I got a loss of (1/8)(C + H/2) for that branch, where H is the loss if the Hubble isn't serviced, and C is the loss if a catastrophe takes place. For the "Hubble only" branch, the loss is (1/80)(C + H). That is always less than the loss on the ten-missions branch, so we cut that one off. For the "Don't fly" branch, the loss is H, since the Hubble isn't serviced. The losses on the "Don't fly" and "Service Hubble only" branches are equal when H = (1/80)(C + H), that is, when C = 79H. This is as far as we can go to help the NASA Administrator. Whether to fly or not depends upon which branch has the larger loss: if C is greater than 79H in the Administrator's mind, then the mission should not be flown.
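The backward propagation on this tree amounts to comparing three expected losses. A minimal sketch, with C measured in units of H (the actual values are up to the Administrator):

```python
def branch_losses(C, H):
    """Expected losses at the root of each branch of the decision tree."""
    return {
        "fly ten missions": (1 / 8) * (C + H / 2),
        "service Hubble only": (1 / 80) * (C + H),
        "don't fly": H,
    }

# At C = 79H the "don't fly" and "service Hubble only" branches tie;
# below that, flying the Hubble mission is better, above it, staying home is.
H = 1.0
for C in (10.0, 79.0, 500.0):
    L = branch_losses(C, H)
    best = min(L, key=L.get)
    print(f"C = {C:5.0f}H: best branch = {best}")
```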

I don't envy Michael Griffin.

Problem 3 is about drugs, but it differs from the drug problem on the study sheet in that I told you that the cure rate of one of the drugs is exactly 0.2 (based on a large amount of data). The discussion of the experimental drug is like the one on the study sheet: you need to set up states of nature for the cure rate (0.05, 0.15, 0.25,..., 0.95, for example), put a prior on each (flat, for example), compute the likelihood for each (r^15(1−r)^35 for each cure rate r), compute the joints, compute the marginal likelihood, and divide the joints by the marginal likelihood to get the posterior. Then to get the probability that the new drug is better than the old one, just add up the posteriors for cure rates bigger than 0.2.
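The whole recipe fits in a few lines. The grid, the flat prior, and the likelihood r^15(1−r)^35 (15 cures out of 50 patients) are from the problem:

```python
import numpy as np

# States of nature: possible cure rates for the new drug.
rates = np.arange(0.05, 1.0, 0.10)           # 0.05, 0.15, ..., 0.95
prior = np.full(rates.size, 1 / rates.size)  # flat prior
likelihood = rates**15 * (1 - rates)**35     # r^15 (1-r)^35
joint = prior * likelihood
posterior = joint / joint.sum()              # divide by marginal likelihood

# Probability the new drug beats the old drug's known cure rate of 0.2:
p_better = posterior[rates > 0.2].sum()
print(p_better)
```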

Problem 4 is like the "diagnose" stage of a spam filter or a medical diagnosis system. The first half would be gathering information, for example on emails, and determining the probability of obtaining a given word given that the email is spam/not spam. The simplest way would be just by frequency in each of the two categories. The second, or "diagnose," half looks for the words and forms a likelihood by raising the probability for each word to a power equal to the number of times the word appears, then multiplying them all together. The biggest mistakes here were not raising to the power, and adding the terms in the likelihood instead of multiplying them.
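A sketch of the "diagnose" half, with made-up word probabilities standing in for what the "gather" stage would have estimated by frequency:

```python
# Hypothetical per-word probabilities from the "gather" stage:
# P(word | spam) and P(word | not spam), estimated by frequency.
p_word_spam = {"viagra": 0.05, "meeting": 0.01, "free": 0.10}
p_word_ham = {"viagra": 0.001, "meeting": 0.05, "free": 0.02}

def likelihood(counts, p_word):
    """Multiply P(word | class) raised to the word's count, over all words."""
    result = 1.0
    for word, count in counts.items():
        result *= p_word[word] ** count  # raise to the power -- never add!
    return result

counts = {"viagra": 2, "free": 3}  # word counts in one message
L_spam = likelihood(counts, p_word_spam)
L_ham = likelihood(counts, p_word_ham)
print(L_spam, L_ham)
```

Multiplying these likelihoods by the priors and normalizing would then give the posterior probability that the message is spam.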

Never, ever, add the terms in the likelihood! It's the probability of (data A and data B and data C), so the probabilities must be multiplied.

The final problem constructs a probability tree (not a decision tree). At the base of the tree, as usual, are the unknown (to the investigator) states of nature: HH, HT, TT. Some people wrote those correctly but didn't realize that HT is twice as likely as either HH or TT. Some others put the data at the base of the tree, which is never to be done. So, if a person answers "yes," it may be because he tossed HH, or it may be because he tossed HT and is telling the truth. Since HH gets tossed 25% of the time, and 40% of the subjects answered "yes," a total of 15% = 40% − 25% of the responses are due to HT being tossed. Since HT happens half the time, it must be that a total of 2 × 15% = 30% of the subjects would have answered "yes" if they were all telling the truth.
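The arithmetic is short enough to lay out directly. This sketch assumes the design implied by the numbers: HH forces a "yes," TT contributes no "yes" answers, and HT (probability 1/2) gets a truthful answer:

```python
# Probabilities of the states of nature from tossing two coins.
p_hh, p_ht, p_tt = 0.25, 0.50, 0.25

observed_yes = 0.40                   # fraction of subjects answering "yes"
yes_from_truth = observed_yes - p_hh  # 15% of responses: truthful HT tossers
true_rate = yes_from_truth / p_ht     # scale up: HT only happens half the time
print(true_rate)                      # essentially 0.30
```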

To summarize the difficulties that people had:

1) States of nature, which are not observable, always go at the root of a probability tree. Data never goes there. The sub-branches of a probability tree will always be conditional probabilities of data given the state of nature.

2) Distinguish between prediction of data that will be observed in the future, and inference of states of nature (which are unobservable).

3) A decision box belongs at the root of a decision tree, and the branches coming out of it are the actions being contemplated (e.g., don't fly, fly just Hubble, fly ten missions). Then there will be probability branches attached to each branch (sometimes only one, as in "don't fly"). Losses go at the tips of the probability branches and are then propagated backwards down the tree. Choose the branch with the smallest loss. You can use utilities instead of losses, in which case you choose the branch with the greatest utility.

I finished my discussion by pointing out that the tables that we have been calculating for rates r=0.05, 0.15, 0.25,..., 0.95 are approximations to continuous functions. Then the summation of them is like a Riemann sum, and when you make the divisions finer and finer you are actually getting better and better approximations to an integral. Since integration is in general hard, statisticians often resort to various approximation techniques to get the answers they need.
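The Riemann-sum picture can be seen numerically. Refining the grid for the Problem 3 example (flat prior, 15 cures in 50 patients) makes the tabulated probability settle down to the corresponding integral of the posterior density:

```python
import numpy as np

# As the grid of cure rates gets finer, the tabulated P(r > 0.2) is a
# Riemann sum converging to an integral of the continuous posterior.
results = {}
for n in (10, 100, 1000, 10000):
    width = 1.0 / n
    rates = np.arange(width / 2, 1.0, width)  # midpoints of n equal cells
    weights = rates**15 * (1 - rates)**35     # unnormalized posterior
    posterior = weights / weights.sum()
    results[n] = posterior[rates > 0.2].sum()
    print(n, results[n])
```

The coarse 10-point table and the 10,000-point table agree to a couple of decimal places, which is why the coarse grid was good enough for exam purposes.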

We'll go back to crime on Wednesday. We need to decide on dates for the presentations, so I hope everyone will be in class on Wednesday.

There will be a guest talk by a research physician at UVM on Friday, November 21. We'll also discuss investing and Bayesian jokes/songs in the next week.
