Monday, September 29, 2008

Class, 9/29

Today we further discussed the polling example and actually calculated a simple result. I'll try to post a copy of the calculation later. We found that with 6 favoring candidate A out of a sample of 10, the posterior probability that candidate A wins (has over 50% of the vote) is about 71%. The posterior distribution is closely bell-shaped and peaked at r=0.6=6/10. That's a general rule: with a uniform prior, the posterior peaks at the sample proportion n/N.

We also asked what would happen if we used a more realistic prior that put more weight near 1/2. In response to a question, I pointed out that you can't put it near 0.6 because that would be "cheating," using the same data twice. You have to do it without looking at the data. We found that a prior that rises to a maximum at 0.5 and then falls again will do two things: It will narrow the posterior distribution somewhat, and will also shift the peak closer to 0.5. If there is a whole lot of data, then the effect of the prior will be negligible, but in our example it can be significant.

The following webpage has election predictions with a chart (lower right hand) that shows a similar posterior distribution, based on calculating the electoral vote in a simulation (this is a modern computational technique, even more powerful than the spreadsheet method we discussed). The left-hand chart has other information that summarizes the posterior probability in several ways: Where the maximum of the posterior probability is, what the win probability is, and so forth.

We discussed a situation where two people enter into a consecutive duel: Alan and Bob will take shots at each other in turn. Alan's probability of hitting Bob and putting him out of commission on one shot is 1/3; Bob's probability of putting Alan out of commission is 2/3. We asked, if they keep taking turns until one hits the other, what's the probability of Alan eventually hitting Bob if he goes first? If he goes second?

Although this could be calculated (as one student suggested) by multiplying and adding probabilities until the numbers were very small, I suggested another way.

If Alan goes first, he'll win outright on his first shot 1/3 of the time. He'll also eventually win with probability (2/3)*P(Alan wins eventually | Bob goes first). So
P(Alan wins eventually | Alan goes first)=1/3+(2/3)*P(Alan wins eventually | Bob goes first).
But P(Alan wins eventually | Bob goes first) can be calculated in terms of P(Alan wins eventually | Alan goes first), because the only way that Alan can win eventually if Bob goes first is if Bob misses on his first try (P=1/3). Then Alan will have a second chance, and the probability that he'll eventually win if Bob goes first is therefore
P(Alan wins eventually | Bob goes first)=(1/3)*P(Alan wins eventually | Alan goes first)
Therefore, P(Alan wins eventually | Alan goes first)=1/3+(1/3)*(2/3)*P(Alan wins eventually | Alan goes first). This can be solved for P(Alan wins eventually | Alan goes first). This turns out to be 3/7, and P(Alan wins eventually | Bob goes first)=(1/3)*(3/7)=1/7.
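The recursion above can be checked with exact arithmetic. Here is a minimal sketch using Python's fractions module; the solution step just substitutes the second equation into the first, exactly as in the text.

```python
from fractions import Fraction

p_alan = Fraction(1, 3)   # Alan hits on any single shot with probability 1/3
p_bob = Fraction(2, 3)    # Bob hits on any single shot with probability 2/3

# x = P(Alan wins eventually | Alan goes first)
# y = P(Alan wins eventually | Bob goes first)
#   x = p_alan + (1 - p_alan) * y     (Alan hits now, or misses and Bob shoots)
#   y = (1 - p_bob) * x               (Bob must miss for Alan to get a turn)
# Substituting y into x and solving for x:
x = p_alan / (1 - (1 - p_alan) * (1 - p_bob))
y = (1 - p_bob) * x

print(x, y)  # 3/7 1/7
```

Exact fractions avoid any worry about rounding, and the same two-line recursion works for any pair of hit probabilities.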

Finally, I brought in Charlie, who is a crack shot and never misses. We decided that if Charlie goes first, he'll knock off Bob, since Alan is the poorer shot, and if Alan then misses, Charlie will knock him off the second time around. Also, if Bob went first, to be followed by Charlie, Bob would try to knock off Charlie first, since if he knocked off Alan instead, he'd be a goner. But if Alan goes first, the initial guess that he should go after Charlie turned out to be wrong, as one student pointed out. Actually, Alan has a better chance of survival if he shoots his gun into the air! More on this later.

Study Sheet for First Quiz

The first test will be on October 10 (Friday). There is a study sheet here. We will start discussing this on Wednesday or Friday, so get together with your group and prepare yourselves for our discussion.

Sunday, September 28, 2008

Class, 9/26

We talked about Problem #4. I mentioned first that it was modelled after a book by Mosteller and Wallace (Mosteller is a famous statistician), in which they tried to determine the authorship of several of the disputed articles in the famous Federalist Papers, published to try to convince Americans to adopt the Constitution.

The idea is that each time a word is used, that may reflect on the author, since different authors tend to use words with different frequencies. So, one author might use "while" and another might use "whilst." ("Whilst" was still in common use in this part of the world in 1789.) So, we can form the likelihood in this problem by multiplying the probability of a word, given the author, for each time a word appears in the text. There are two authors, so we will get a product of many numbers, one number for each word in the sample text.

This requires computing quantities like (0.002)^5. Unfortunately, this results in some very small numbers. I recommended using powers-of-ten notation, so that you would have, for example, (0.002)^5 = (2x10^-3)^5 = 32x10^-15. You'll get a small integer times some very small number written in scientific notation. The good news is that the power of ten will cancel out of the final answer.
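A standard computational trick for these tiny products (a sketch, not the method assigned in class) is to add logarithms instead of multiplying probabilities; the overall constant cancels when you normalize, just as the powers of ten cancel in the hand calculation. The per-word probabilities below are made-up stand-ins for illustration, not the numbers from the problem.

```python
import math

# Hypothetical per-word probabilities for two candidate authors
# (illustrative values only; the real ones come from the problem statement).
probs_author1 = [0.002, 0.01, 0.0005, 0.03]
probs_author2 = [0.004, 0.008, 0.001, 0.02]

# Sum of logs instead of a product of very small numbers:
loglik1 = sum(math.log(p) for p in probs_author1)
loglik2 = sum(math.log(p) for p in probs_author2)

# With equal priors, the posterior probability of author 1 is
# L1 / (L1 + L2), computed stably from the log-likelihood difference:
posterior1 = 1.0 / (1.0 + math.exp(loglik2 - loglik1))
print(posterior1)
```

With many hundreds of words, the direct product would underflow ordinary floating point, but the log sums stay perfectly manageable.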

At the end of class, I discussed polls a bit. We determined that the states of nature are the various proportions r of voters who favor candidate A over candidate B. There are infinitely many such numbers. We also discussed how the error in the result will theoretically go down as the size of the sample goes up: for example, the error (plus or minus) in the number of voters in the sample favoring either candidate is roughly (N*r*(1-r))^(1/2). So, if N is 1000 and r=1/2, the expected error in the number of voters is about 15, and the error in r is about 15/1000 or 0.015; double that gives what those of you who took statistics before would call the 95% error bar, that is, we expect to have an error larger than +/-0.03 in only 5% of cases.

In real life, sampling difficulties will make the real error bigger than this, so it's more usual for pollsters to quote a somewhat larger number, for example, +/-0.05.

We talked about a basic Bayesian way to do this in practice, namely, in a spreadsheet. We could list a sequence of equally spaced center-points for an interval of r, for example, 0.05, 0.15, 0.25,...0.95, representing intervals of length 0.1. We assign each value of r a prior. One suggestion was 1/10 for each, but one student pointed out that a more realistic prior would be larger for values of r around 1/2 and smaller or near-zero for values of r that deviate significantly from 1/2. Then we can compute the likelihood, which we determined was given by rn(1-r)N-n, for each value of r, where N is the total number of voters in the sample and n is the number of voters favoring candidate A. (We ignored non-responses). Then in a few mouse strokes we can calculate the joint probability column, compute its sum, and divide it into each joint probability to get the posterior probability.
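The spreadsheet recipe above translates directly into a few lines of code. This sketch uses the ten interval midpoints and the uniform prior suggested in class; swapping in a more realistic prior just means changing the prior list.

```python
# Midpoints of ten equal intervals for r:
rs = [0.05 + 0.1 * i for i in range(10)]
prior = [1.0 / len(rs)] * len(rs)          # uniform prior, 1/10 each

N, n = 10, 6                               # 6 of 10 sampled voters favor A
likelihood = [r**n * (1 - r)**(N - n) for r in rs]

joint = [p * L for p, L in zip(prior, likelihood)]
total = sum(joint)
posterior = [j / total for j in joint]     # divide the sum into each entry

# Posterior probability that candidate A has a majority (r > 1/2):
p_win = sum(post for r, post in zip(rs, posterior) if r > 0.5)
print(p_win)
```

On this coarse grid the win probability comes out near 73%, in the same ballpark as the figure quoted in the 9/29 discussion; the exact value depends on the grid spacing and the prior you choose.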

Finally, we set the date of the first test for Friday, October 10.

Wednesday, September 24, 2008

Class, 9/24

Today we first went through the problem set.

On problem 1, I stressed that the data is that the "expert" said that it is a Super Growth Stock, not that it is a SGS (that is something we don't know, so it can't be data). Something that you know to be true is data; something that you want to know but don't is a state of nature. Because of this confusion, at least one group got their tree wrong by ignoring the branch where SGS was false. This means that their calculation omitted the probability that the "expert" said the stock was a SGS when it was not (a false positive). If the false positive rate is large, you cannot ignore this part of the problem!

The only other problem was that the statement says that your friend is no better than a monkey throwing darts at the stock pages in picking stocks, so the prior probability that he actually picked a SGS is only 1/1000. Some people entertained the notion that it was 1/2, but that contradicts the statement of the problem.
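To make the tree concrete, here is a hedged sketch of the calculation with the 1/1000 prior from the problem. The hit rate and false-positive rate below are made-up stand-ins (the real ones are in the problem statement); the structural point is that both branches of the tree feed the denominator.

```python
prior_sgs = 1 / 1000       # monkey-with-darts prior from the problem
p_say_given_sgs = 0.9      # hypothetical: expert says "SGS" when it is one
p_say_given_not = 0.1      # hypothetical false-positive rate

# Both branches of the tree contribute to P(expert says "SGS"):
p_say = (prior_sgs * p_say_given_sgs
         + (1 - prior_sgs) * p_say_given_not)

posterior_sgs = prior_sgs * p_say_given_sgs / p_say
print(posterior_sgs)   # still under 1%: the 1/1000 prior dominates
```

Dropping the false branch would amount to setting p_say_given_not to zero, which wildly overstates the posterior.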

Problem 2 and Problem 1 are basically the same, the only difference is that the false positive rate and the false negative rate are different in this problem (they were the same in problem 1).

In problem 3, the main difficulty was that it has two parts. First, after picking out four chocolate chip cookies in a row, the posterior probability that you have the all chocolate chip cookie box is now 42/43, not 1/2, as some assumed. This changes the probability that the next cookie chosen from the box is a CC cookie quite significantly: It is very close to 1 after you get 4 CC cookies out and no raisin oatmeal cookies.

In problem 5, one group tried to solve it by displaying a particular example of independence in a table and showing that the three relationships were satisfied. The problem is that this only shows it for that particular table, but not in general. To use this method, you'd have to do it for every single one of the infinite number of tables that could be conjured up. This is obviously impossible. What I was looking for was something like this (for one of the things requested):

If P(A|B)=P(A), show that P(A,B)=P(A)P(B).

Because P(A,B)=P(A|B)P(B) no matter what, by substitution from the assumed condition that P(A|B)=P(A) we find that P(A,B)=P(A)P(B).

The More Independence problem was no problem!

We finished the class by discussing a slightly more complicated example of a decision tree, where you are personally deciding whether to install an airbag into your car after, for example, it had deployed in an accident. We showed how the expected value or loss (loss in this case) would propagate backwards in the decision tree, allowing us to cut off a more costly branch at the square (decision) box to choose the best outcome.
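A minimal sketch of the backward propagation, with made-up probabilities and costs (only the mechanics follow the class example): at a chance node you take the expected loss, and at the decision square you keep the branch with the smaller expected loss and prune the other.

```python
# Hypothetical numbers for illustration only:
cost_airbag = 700.0             # cost of replacing the deployed air bag
p_crash = 0.01                  # assumed chance of another serious crash
loss_injury_no_bag = 100_000.0  # assumed extra loss if injured without one

# Chance nodes: expected loss on each branch of the decision.
expected_loss_install = cost_airbag           # assume the bag removes the loss
expected_loss_skip = p_crash * loss_injury_no_bag

# Decision (square) node: keep the cheaper branch, cut off the other.
best = min(("install", expected_loss_install),
           ("skip", expected_loss_skip),
           key=lambda branch: branch[1])
print(best)   # with these numbers, installing is the cheaper branch
```

Changing any of the assumed numbers can flip the decision, which is exactly why the tree is worth drawing.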

Comments on Journals

Some brief comments on this week's journals.

1) It is important to understand that the "likelihood" column holds the probability of what was actually observed, given each of the states of nature, so the numbers in this column do not have to add to 1. That is because the states of nature sit on the right-hand side of the bar: the entries are not probabilities of the states of nature, but probabilities of the data, given each state of nature.

2) One student wrote about HIV testing, and about the effects of false positives (the book gives an example). He pointed out that a false positive that is due to mislabeling of the sample affects two people, not only the one who gets the false positive report, but also the one whose sample was actually positive, but who probably got a negative report because the wrong sample was assigned to him/her (switched at the lab).

3) Another student wrote from personal experience about a friend whose mother got a false positive mammogram and had to undergo significant psychological and physical pain before cancer was ruled out. On the other hand, failure to follow up on a positive test could be catastrophic, given that in the general population almost 10% of women who get a positive test actually have the disease. What should a doctor do? Certainly, tell his patient that there is a better than 90% probability that she does not have cancer, but on the other hand, that it needs to be followed up.
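The "almost 10%" figure is what Bayes' rule gives with plausible screening numbers. This sketch uses commonly quoted illustrative values; the prevalence, sensitivity, and false-positive rate are assumptions for the sake of the calculation, not the book's exact numbers.

```python
prevalence = 0.01        # assumed: 1% of screened women have the disease
sensitivity = 0.80       # assumed: the test catches 80% of true cases
false_pos_rate = 0.096   # assumed: 9.6% of healthy women test positive

p_positive = (prevalence * sensitivity
              + (1 - prevalence) * false_pos_rate)
p_disease_given_positive = prevalence * sensitivity / p_positive
print(p_disease_given_positive)   # under 10%: most positives are false alarms
```

The small prior prevalence is what drives the result: even a fairly accurate test yields mostly false alarms when the disease is rare.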

4) I talked in class about alternative King/Brother scenarios. If you know that the King's sibling is older, it has to be a sister (an older brother would have inherited the throne instead). If you know that the King is the oldest, then there's a 50-50 chance that the younger sibling is a brother.

5) I also discussed what another student brought up, that we professors have to grade you students on a linear scale, which ignores everyone's particular strengths and weaknesses. But, I pointed out, we can and will write letters of recommendation that deal with all of the issues that we know about you, so as to give a potential employer/school a more three-dimensional understanding of you as an individual.

6) One student wrote that we must consider the consequences of actions that we take as well as the probabilities. This was wonderful to read, as that is what we talked about a little on Wednesday and is a major part of the course. Another student talked about the fact that wagers look different when they are for small change than they do when they are about major bucks. We'll talk about that too.

Monday, September 22, 2008

Class, 9/22

In class today we talked about the value of a human life. One student proposed that we could multiply the amount that a person would earn per year by the number of years worked. We decided that a kind of "average" amount would be appropriate, and chose $50,000/year and 40 years to get a value of $2 million. There was some discussion about the costs of an individual to society, but I pointed out that people do pay taxes, so maybe that's not something that enters into the equation. I also stated that although the exact number varies from government agency to agency, one such number I had recently read had the transportation department valuing a human life at around $5 million. In our philosophy of Fermi problems, $2 million and $5 million are not that far apart, so our estimate wasn't bad.

We then asked whether air bags would be something that the government should require, given the cost of an air bag and the number of people that would be saved per year if air bags were required. This required estimating several numbers. One is the number of highway fatalities per year. Several numbers were raised, of the order of several hundred thousand per year. I knew that the actual number is smaller, about 50,000 per year. OK, that's only a factor of 10 (significant, but not horrible). Armed with this number and the population of the U.S., which everyone thought was about 300 million, we figured that the incidence of highway deaths is about 1 per 10,000 per year. We also estimated that maybe 1/3 (and the person who made this estimate thought it was high, which it is) of the deaths could be saved if air bags were installed in every car. The actual number of saved lives is estimated at about 10,000 per year.

So now we set up a very simple decision tree. On one branch, if air bags are installed, that saves 10,000 lives per year, each worth an estimated 5 million dollars, for a total savings of 50 billion dollars per year. On the other hand, one student said that to replace the air bags in a car costs about $700, but that's more than it costs to install them initially, which may be about $300. We also estimated that maybe 20 million cars go on the road each year. That adds up to an additional $6 billion per year. So, by investing $6 billion in air bags each year, we'll save lives estimated in value at $50 billion. This cost-benefit analysis says that the government should mandate air bags, since the cost of mandating them is only about one-tenth of the value of the lives saved.
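The arithmetic of the class estimate, as a sketch:

```python
lives_saved = 10_000          # estimated lives saved per year
value_per_life = 5_000_000    # dollars, the transportation-department figure
cars_per_year = 20_000_000    # new cars per year (class estimate)
cost_per_bag = 300            # initial installation cost (class estimate)

benefit = lives_saved * value_per_life   # $50 billion per year
cost = cars_per_year * cost_per_bag      # $6 billion per year
print(benefit, cost, cost / benefit)     # cost is roughly a tenth of benefit
```

As with any Fermi estimate, each input could be off by a factor of a few; what matters is that the benefit exceeds the cost by nearly an order of magnitude.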

Saturday, September 20, 2008

Class, 9/19

Today you all took a "quiz" (two forms) which asked various questions for you to express your opinion on. The point of the "quiz" was to point out various ways in which our decisions may be affected by the language used to pose the problems, and by other factors that seem on their face to be irrelevant. When the risks are posed in a negative way, the reaction we have and the decision we make may well be different from the same problem when the risks are posed in a positive way (lives saved versus lives lost, for example). We talked about the trolley problem and saw that, even though the decisions seem to be the same when you just look at the outcomes (one person versus five people dead), the reluctance of most people to push the fat man off the bridge, contrasted with their greater willingness to simply throw a switch, may have evolutionary roots. I pointed out that when this question is posed to a person having their brain scanned in an MRI machine, different parts of the brain are active when the two different questions are posed.