Friday, October 3, 2008

Class, 10/3

We passed over the Fermi problem bullet after I re-explained the geometric mean method.

We reminded everyone of the basic equation of conditional probability that underlies everything we are doing: P(A,B)=P(A|B)P(B)=P(B|A)P(A). We talked about the three equivalent facts about whether a distribution is independent or not: it is independent if and only if P(A|B)=P(A) for every A and B; equivalently, it is independent if P(A,B)=P(A)P(B) for every A and B; and if P(A|B)=P(A) for every A and B, then necessarily P(B|A)=P(B) for every A and B.

We then showed how to construct the unique table of joint probabilities consistent with independence when we are given the marginal probabilities: just multiply the marginal in a row by the marginal in a column and put the product in the corresponding cell.
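In code this construction is just an outer product of the two marginal lists. Here is a short sketch in Python (the marginal numbers are made up for illustration, not from class):

```python
# Joint probabilities under independence: each cell is the product of its
# row marginal and its column marginal. These marginals are illustrative.
row_marginals = [0.3, 0.7]        # e.g., P(A = a) for two values of A
col_marginals = [0.2, 0.5, 0.3]   # e.g., P(B = b) for three values of B

joint = [[ra * cb for cb in col_marginals] for ra in row_marginals]
for row in joint:
    print(row)
# The cells sum to 1 because each set of marginals sums to 1.
```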

We then took a table of independent probabilities and changed four cells forming a square: we added an arbitrary number to the two cells on one diagonal and subtracted the same number from the two cells on the other diagonal. Because each affected row and column gains and loses the same amount, the marginals are unchanged, but the resulting table is no longer independent.
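Here is a sketch of that trick in Python (the starting marginals and the shift d are made-up numbers): shifting a square of four cells by the same amount leaves every row and column sum alone, but the cells stop being products of the marginals.

```python
# Start from an independent 2x2 table built from marginals (0.3, 0.7) and (0.4, 0.6),
# then add d on one diagonal and subtract d on the other.
d = 0.05
joint = [[0.3 * 0.4, 0.3 * 0.6],
         [0.7 * 0.4, 0.7 * 0.6]]
joint[0][0] += d; joint[1][1] += d   # one diagonal up by d
joint[0][1] -= d; joint[1][0] -= d   # the other diagonal down by d

row0 = joint[0][0] + joint[0][1]     # still 0.3: marginals unchanged
col0 = joint[0][0] + joint[1][0]     # still 0.4
print(joint[0][0], row0 * col0)      # cell no longer equals the product
```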

We discussed the Monty Hall problem and variants. We found that if there are four doors, and Regular Monty opens two of them, each showing a goat, then the probability of getting the prize goes from 1/4 to 3/4 if we switch. We found that it goes from 1/4 to 3/8 if Monty opens one door and we switch to one of the others. We did this by a spreadsheet calculation. We then thought of a simpler way: Since your probability of initially picking the right door is 1/4, the probability that one of the other doors has the prize is 3/4. That doesn't change when Monty opens one of them and shows you a goat. So, since there are two doors left, the probability that you'll get the right one if you switch is 1/2 times 3/4, or 3/8.
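For anyone who wants to check the four-door numbers without a spreadsheet, here is a small Monte Carlo sketch (not something we did in class):

```python
import random

def switch_wins(doors=4, opened=1):
    """One round: Monty opens `opened` goat doors; the player then switches
    to a uniformly random unopened door other than the original pick."""
    prize = random.randrange(doors)
    pick = random.randrange(doors)
    goats = [d for d in range(doors) if d != pick and d != prize]
    shown = random.sample(goats, opened)   # Monty never opens the pick or the prize
    options = [d for d in range(doors) if d != pick and d not in shown]
    return random.choice(options) == prize

random.seed(1)
n = 200_000
p_two_opened = sum(switch_wins(opened=2) for _ in range(n)) / n  # near 3/4
p_one_opened = sum(switch_wins(opened=1) for _ in range(n)) / n  # near 3/8
print(p_two_opened, p_one_opened)
```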

I left you with another related problem to think about: There are three cards which have been made by pasting together two cards so that the backs are visible. One has two red backs, one has a red back and a blue back, and one has two blue backs. The cards are put in a hat and shaken, and you pick one out, looking at only one back. It is red. What is the probability that the other side is red?

We'll discuss that next time.

We went on to the cancer problem. We took a population of 10000 individuals. The problem statement says that 1% of the population has the gene, so that's 100 who have the gene and 9900 who don't. Of those who have the gene, 98 will be detected by the test and 2 missed (false negatives). Of those who don't have the gene, the test will falsely identify 5% as having the gene, or 495 in all (false positives), and will correctly say that the remaining 9405 do not have the gene. Looking at just the positives, there are 98 true positives and 495 false positives, so that the probability that a person has the gene if they test positive is only 98/593, or about 1/6; the remaining 495/593, or about 5/6, do not have the gene.
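The whole calculation fits in a few lines; this sketch just redoes the class arithmetic with the numbers above:

```python
# The class numbers: population 10,000; 1% carry the gene; the test catches
# 98% of carriers and falsely flags 5% of non-carriers.
population = 10_000
carriers = round(0.01 * population)       # 100 have the gene
non_carriers = population - carriers      # 9,900 do not
true_pos = round(0.98 * carriers)         # 98 detected (2 false negatives)
false_pos = round(0.05 * non_carriers)    # 495 false positives
p_gene_given_positive = true_pos / (true_pos + false_pos)
print(p_gene_given_positive)              # 98/593, about 1/6
```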

We ran out of time here and will continue on Monday, finishing this problem and then going on in the study guide.

In answer to a question, I stated that if there is an item that we don't get to in our review, then similar items will not appear on the test. I also stated that I expected there to be five questions on the test, and that there should be enough time for everyone to do all of them. I pointed out that the usual test-taking strategy is to go for the easy ones first and save the bulk of the time for the harder ones. I also said that it is very important to at least try to answer every question, since I cannot give credit if an item goes completely unanswered. We agreed that people who come early (not earlier than 10 AM, please) could start early, and that you may be able to stay an extra 5 minutes (but not more, because of the class that meets next in this room) to finish.

Be sure to bring your calculators. I do not have a loaner calculator!

Class, 10/1

We discussed the fourth problem set, which was done pretty well by you all. I pointed out several errors that were made:

One group forgot, on the second problem, that three different widgets were sampled independently, so that the likelihood had three factors in it, not one.

One group didn't recognize that the third problem had only two states of nature, that is, whether it is Urn #1 or Urn #2. Somehow this group ended up with five states of nature, that is, 1R, 1W, 2R, 2W and 2B, where the number is the urn number and the letter the color. The point is that the states of nature are always determined by the thing you don't know and want to learn. Here, what we don't know is which urn we've picked, so that tells us what the states of nature are.

One group got the states of nature right, but in the third ball selection forgot that it is the number of balls in the urn when the ball is picked that gives the denominator. True, this step is made without replacement, but since the first two steps both involved returning the ball (that is, with replacement), there are still ten balls in the urn when the third ball is picked.

On the last problem, one group correctly computed the contribution to the likelihood for each word, but then added them instead of multiplying to get the likelihood. Since the likelihood is the probability that we got 3 of the first word AND 5 of the second AND 3 of the third, you have to multiply. When you compute the probability of one thing AND another thing, you always multiply. Addition is for when you want the probability of one thing OR another thing. For example, when you add the joint probabilities in a spreadsheet calculation, you are computing the probability of (data,SON1) OR the probability of (data,SON2) OR ..., to get the probability of the data, regardless of which SON is true.

We then finished the "Trewel" problem and calculated that the best thing for Alan to do is to shoot in the air, letting Bob and Charlie duke it out, and then, with one of them dead for sure, to come in on his second try and try to kill the survivor. We also recalculated the probability of Alan eventually killing Bob when Alan goes first. See Class, 9/29 for a calculation.

Finally, we discussed the first item on the study sheet, Fermi problems. We discussed the length of the Nile river...and I reminded everyone about the geometric mean trick. If you can put a reasonable lower bound on a quantity and a reasonable upper bound, so that you are pretty sure that the true value is between those bounds, then a decent guess at the correct value is to multiply the lower bound by the upper bound and take the square root of that number. For the Nile, a lower bound might be 100 miles and an upper bound 10,000 miles, which would give an estimate of 1000 miles. Wikipedia says 4100 miles, so this is not a great estimate. For the Mississippi, we imagined the U.S. as a box 3000 miles wide and 2000 miles high, so a length of 2000 miles. The actual length is 2340 miles, so that worked better.
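The geometric mean trick is one line of arithmetic; here it is as a tiny Python function, using the Nile bounds from class:

```python
import math

def fermi_estimate(lower, upper):
    """Geometric mean of a lower and an upper bound on a quantity."""
    return math.sqrt(lower * upper)

print(fermi_estimate(100, 10_000))   # Nile estimate from class: 1000.0 miles
```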

I pointed out that the important thing with regard to Fermi problems is how you got the answer, not the actual value of the answer.

Monday, September 29, 2008

Class, 9/29

Today we discussed further the polling example and actually calculated a simple result. I'll try to post a copy of the calculation later. We found that with 6 favoring candidate A out of a sample of 10, the posterior probability that candidate A wins (has over 50% of the vote) is about 71%. The posterior distribution is closely bell-shaped and peaked at r=0.6=6/10, the observed fraction. That's a general rule: with a flat prior, the posterior peaks at the observed fraction n/N.

We also asked what would happen if we used a more realistic prior that put more weight near 1/2. In response to a question, I pointed out that you can't put it near 0.6 because that would be "cheating," using the same data twice. You have to do it without looking at the data. We found that a prior that rises to a maximum at 0.5 and then falls again will do two things: It will narrow the posterior distribution somewhat, and will also shift the peak closer to 0.5. If there is a whole lot of data, then the effect of the prior will be negligible, but in our example it can be significant.

The following webpage has election predictions with a chart (lower right hand) that shows a similar posterior distribution, based on calculating the electoral vote in a simulation (this is a modern computational technique, even more powerful than the spreadsheet method we discussed). The left-hand chart has other information that summarizes the posterior probability in several ways: Where the maximum of the posterior probability is, what the win probability is, and so forth.

We discussed a situation where two people enter into a consecutive duel: Alan and Bob will take shots at each other in turn. Alan's probability of hitting Bob and putting him out of commission on one shot is 1/3; Bob's probability of putting Alan out of commission is 2/3. We asked, if they keep taking turns until one hits the other, what's the probability of Alan eventually hitting Bob if he goes first? If he goes second?

Although this could be calculated (as one student suggested) by multiplying and adding probabilities until the numbers were very small, I suggested another way.

If Alan goes first, he'll win outright on his first shot 1/3 of the time. He'll also eventually win with probability (2/3)*P(Alan wins eventually | Bob goes first). So
P(Alan wins eventually | Alan goes first)=1/3+(2/3)*P(Alan wins eventually | Bob goes first).
But P(Alan wins eventually | Bob goes first) can be calculated in terms of P(Alan wins eventually | Alan goes first), because the only way that Alan can win eventually if Bob goes first is if Bob misses on his first try (P=1/3). Then Alan will have a second chance, and the probability that he'll eventually win if Bob goes first is therefore
P(Alan wins eventually | Bob goes first)=(1/3)*P(Alan wins eventually | Alan goes first)
Therefore, P(Alan wins eventually | Alan goes first)=1/3+(1/3)*(2/3)*P(Alan wins eventually | Alan goes first). This can be solved for P(Alan wins eventually | Alan goes first); it turns out to be 3/7, and P(Alan wins eventually | Bob goes first)=(1/3)*(3/7)=1/7.
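If you want to check the algebra, a short simulation (not part of the class) gives the same answers:

```python
import random

P_ALAN, P_BOB = 1/3, 2/3   # single-shot hit probabilities from the problem

def alan_wins(alan_first):
    """Simulate alternating shots until one duelist hits the other."""
    alans_turn = alan_first
    while True:
        if alans_turn and random.random() < P_ALAN:
            return True          # Alan hits Bob
        if not alans_turn and random.random() < P_BOB:
            return False         # Bob hits Alan
        alans_turn = not alans_turn

random.seed(2)
n = 300_000
going_first = sum(alan_wins(True) for _ in range(n)) / n    # near 3/7
going_second = sum(alan_wins(False) for _ in range(n)) / n  # near 1/7
print(going_first, going_second)
```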

Finally, I brought in Charlie, who is a crack shot and never misses. We decided that if Charlie goes first, he'll knock off Bob, since Alan is the poorer shot, and if Alan then misses, Charlie will knock him off the second time around. Also, if Bob goes first, followed by Charlie, Bob will try to knock off Charlie first, since if he knocked off Alan he'd be a goner. But if Alan goes first, the first guess that he'd go after Charlie seemed to be wrong, as one student pointed out. Actually, Alan has a better chance of survival if he shoots his gun into the air! More on this later.

Study Sheet for First Quiz

The first test will be on October 10 (Friday). There is a study sheet here. We will start discussing this on Wednesday or Friday, so get together with your group and prepare yourselves for our discussion.

Sunday, September 28, 2008

Class, 9/26

We talked about Problem #4. I mentioned first that it was modelled after a book by Mosteller and Wallace (Mosteller is a famous statistician), in which they tried to determine the authorship of several of the disputed articles in the famous Federalist Papers, published to try to convince Americans to adopt the Constitution.

The idea is that each time a word is used, that may reflect on the author, since different authors tend to use words with different frequencies. So, one author might use "while" and another might use "whilst." ("Whilst" was still in common use in this part of the world in 1789.) So, we can form the likelihood in this problem by multiplying the probability of a word, given the author, for each time a word appears in the text. There are two authors, so we will get a product of many numbers, one number for each word in the sample text.

This requires computing quantities like (0.002)^5. Unfortunately, this results in some very small numbers. I recommended using powers-of-ten notation, so that you would have, for example, (0.002)^5 = (2x10^-3)^5 = 32x10^-15. You'll get a small integer times some very small number written in scientific notation. The good news is that the power of ten will cancel out of the final answer.
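To see why the common scale cancels, here is a toy version in Python; the per-word probabilities and counts below are made up for illustration, not the ones from the problem set:

```python
# Hypothetical per-word probabilities for two authors, and observed counts.
probs_author1 = {'while': 0.002, 'whilst': 0.0001}
probs_author2 = {'while': 0.0005, 'whilst': 0.003}
counts = {'while': 5, 'whilst': 2}

def likelihood(probs):
    """Product over words of P(word | author) raised to the observed count."""
    L = 1.0
    for word, k in counts.items():
        L *= probs[word] ** k
    return L

L1, L2 = likelihood(probs_author1), likelihood(probs_author2)
# Both likelihoods are around 1e-22, but with equal priors the tiny common
# scale divides out in the normalization:
posterior_author1 = L1 / (L1 + L2)
print(posterior_author1)
```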

At the end of class, I discussed polls a bit. We determined that the states of nature are the various proportions r of voters who favor candidate A over candidate B. There are infinitely many such numbers. We also discussed how the error in the result will theoretically go down as the size of the sample goes up, so for example the error (plus or minus) in the number of voters in the sample favoring either candidate is roughly (N*r*(1-r))^(1/2). So, if N is 1000 and r=1/2, the expected error in the number of voters is about 15, and the error in r is about 15/1000, or 0.015; double that gives what those of you who took statistics before would call the 95% error bar, that is, we expect to have an error larger than +/-0.03 in only 5% of cases.

In real life, sampling difficulties will make the real error bigger than this, so it's more normal for pollsters to quote a somewhat larger number, for example, +/-0.05.

We talked about a basic Bayesian way to do this in practice, namely, in a spreadsheet. We could list a sequence of equally spaced center-points for an interval of r, for example, 0.05, 0.15, 0.25,...0.95, representing intervals of length 0.1. We assign each value of r a prior. One suggestion was 1/10 for each, but one student pointed out that a more realistic prior would be larger for values of r around 1/2 and smaller or near-zero for values of r that deviate significantly from 1/2. Then we can compute the likelihood, which we determined was given by r^n(1-r)^(N-n), for each value of r, where N is the total number of voters in the sample and n is the number of voters favoring candidate A. (We ignored non-responses.) Then in a few mouse strokes we can calculate the joint probability column, compute its sum, and divide it into each joint probability to get the posterior probability.
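Here is a sketch of that spreadsheet in Python, using the 6-of-10 sample from the 9/29 class and a flat prior. On this coarse 10-point grid the win probability for candidate A comes out in the low seventies of percent, consistent with the roughly 71% figure we computed in class:

```python
# Interval midpoints 0.05, 0.15, ..., 0.95, each representing a band of width 0.1.
N, n = 10, 6                                # sample size and votes for A
rs = [0.05 + 0.1 * i for i in range(10)]
prior = [1 / len(rs)] * len(rs)             # flat prior over the bands
joint = [p * r**n * (1 - r)**(N - n) for p, r in zip(prior, rs)]
total = sum(joint)                          # probability of the data
posterior = [j / total for j in joint]      # divide the sum into each joint entry
p_A_wins = sum(q for r, q in zip(rs, posterior) if r > 0.5)
print(round(p_A_wins, 2))
```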

Finally, we set the date of the first test for Friday, October 10.

Wednesday, September 24, 2008

Class, 9/24

Today we first went through the problem set.

On problem 1, I stressed that the data is that the "expert" said that it is a Super Growth Stock, not that it is a SGS (that is something we don't know, so it can't be data). Something that you know to be true is data; something that you want to know but don't is a state of nature. Because of this confusion, at least one group got their tree wrong by ignoring the branch where SGS was false. This means that they missed counting in their calculation the probability that the "expert" said that the stock was a SGS, when it was not (false positive). If the false positive rate is very large, then you cannot ignore this part of the problem!

The only other problem was that the statement says that your friend is no better than a monkey throwing darts at the stock pages in picking stocks, so the prior probability that he actually picked a SGS is only 1/1000. Some people entertained the notion that it was 1/2, but that contradicts the statement of the problem.

Problem 2 and Problem 1 are basically the same; the only difference is that the false positive rate and the false negative rate are different in this problem (they were the same in problem 1).

In problem 3, the main difficulty was that it has two parts. First, after picking out four chocolate chip cookies in a row, the posterior probability that you have the all-chocolate-chip box is now 42/43, not 1/2, as some assumed. This changes the probability that the next cookie chosen from the box is a CC cookie quite significantly: it is very close to 1 after you get 4 CC cookies out and no raisin oatmeal cookies.

In problem 5, one group tried to solve it by displaying a particular example of independence in a table and showing that the three relationships were satisfied. The problem is that this shows it only for that particular table, not in general. To use this method, you'd have to do it for every single one of the infinitely many tables that could be conjured up, which is obviously impossible. What I was looking for was something like this (for one of the things requested):

If P(A|B)=P(A), show that P(A,B)=P(A)P(B).

Because P(A,B)=P(A|B)P(B) no matter what, by substitution from the assumed condition that P(A|B)=P(A) we find that P(A,B)=P(A)P(B).

The More Independence problem was no problem!

We finished the class by discussing a slightly more complicated example of a decision tree, where you are personally deciding whether to replace the airbag in your car after, for example, it has deployed in an accident. We showed how the expected value or loss (loss, in this case) propagates backwards through the decision tree, allowing us to cut off the more costly branch at the square (decision) box and choose the best outcome.

Comments on Journals

Some brief comments on this week's journals.

1) It is important to understand that the numbers in the "likelihood" column (the probability of what was actually observed, given each of the states of nature) do not have to add to 1. That is because the states of nature are on the right-hand side of the bar; the probabilities in this column are not the probabilities of the states of nature, but the probabilities of the data, given the states of nature.

2) One student wrote about HIV testing, and about the effects of false positives (the book gives an example). He pointed out that a false positive that is due to mislabeling of the sample affects two people, not only the one who gets the false positive report, but also the one whose sample was actually positive, but who probably got a negative report because the wrong sample was assigned to him/her (switched at the lab).

3) Another student wrote from personal experience about a friend whose mother got a false positive mammogram and had to undergo significant psychological and physical pain before cancer was ruled out. On the other hand, failure to follow up on a positive test could be catastrophic, given that in the general population almost 10% of women who get a positive test actually have the disease. What should a doctor do? Certainly, tell his patient that there is a better than 90% probability that she does not have cancer, but on the other hand, that it needs to be followed up.

4) I talked in class about alternative King/Brother scenarios. If you know that the King's sibling is older, it has to be a sister. If you know that the King is the oldest, then there's a 50-50 chance that the younger sibling is a brother.

5) I also discussed what another student brought up, that we professors have to grade you students on a linear scale, which ignores everyone's particular strengths and weaknesses. But, I pointed out, we can and will write letters of recommendation that deal with all of the issues that we know about you, so as to give a potential employer/school a more three-dimensional understanding of you as an individual.

6) One student wrote that we must consider the consequences of actions that we take as well as the probabilities. This was wonderful to read, as that is what we talked about a little on Wednesday and is a major part of the course. Another student talked about the fact that wagers look different when they are for small change than they do when they are about major bucks. We'll talk about that too.