Bayes Rules

Monday, September 29, 2008

Study Sheet for First Quiz

The first test will be on October 10 (Friday). There is a study sheet here. We will start discussing this on Wednesday or Friday, so get together with your group and prepare yourselves for our discussion.

Sunday, September 28, 2008

Class, 9/26

We talked about Problem #4. I mentioned first that it was modelled after a book by Mosteller and Wallace (Mosteller is a famous statistician), in which they tried to determine the authorship of several of the disputed articles in the famous Federalist Papers, published to try to convince Americans to adopt the Constitution.

The idea is that each time a word is used, that may reflect on the author, since different authors tend to use words with different frequencies. So, one author might use "while" and another might use "whilst." ("Whilst" was still in common use in this part of the world in 1789.) So, we can form the likelihood in this problem by multiplying the probability of a word, given the author, for each time a word appears in the text. There are two authors, so we will get two such products, one for each author, and each product has one factor for each word in the sample text.

This requires computing quantities like (0.002)⁵. Unfortunately, this results in some very small numbers. I recommended using powers of ten notation, so that you would have, for example, (0.002)⁵=(2x10^-3)⁵=32x10^-15. You'll get a small integer times some very small number written in scientific notation. The good news is that the power of ten will cancel out of the final answer.
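As a rough illustration of the arithmetic, here is a minimal Python sketch. The word rates (0.002 for author A, 0.0005 for author B), the count of 5, and the equal priors are made-up numbers for illustration, not the values from Problem #4. It does the calculation once with exact fractions, where the powers of ten cancel when we normalize, and once with logarithms, which is the usual way to avoid very small numbers.

```python
from fractions import Fraction
import math

# Hypothetical per-word rates for one marker word (NOT the problem-set values).
rate = {"A": Fraction(2, 1000), "B": Fraction(5, 10000)}
count = 5  # hypothetical number of times the word appears in the sample text
prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}

# Likelihood of the data given each author: one factor per occurrence.
likelihood = {a: rate[a] ** count for a in rate}  # A gives 32 x 10^-15

# Joint = prior x likelihood; posterior = joint / sum of joints.
joint = {a: prior[a] * likelihood[a] for a in rate}
total = sum(joint.values())
posterior = {a: joint[a] / total for a in rate}

# Same calculation with logs: add logs instead of multiplying tiny numbers,
# then subtract the largest log before exponentiating.
log_joint = {a: math.log(float(prior[a])) + count * math.log(float(rate[a]))
             for a in rate}
largest = max(log_joint.values())
weights = {a: math.exp(v - largest) for a, v in log_joint.items()}
weight_sum = sum(weights.values())
posterior_from_logs = {a: w / weight_sum for a, w in weights.items()}

print(posterior["A"], posterior_from_logs["A"])
```

With these made-up rates, each occurrence of the word favors author A by a factor of 4, so five occurrences favor A by 4⁵ = 1024 to 1.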

At the end of class, I discussed polls a bit. We determined that the states of nature are the various proportions r of voters who favor candidate A over candidate B. There are infinitely many such numbers. We also discussed how the error in the result will theoretically go down as the size of the sample goes up. For example, the error (plus or minus) in the number of voters in the sample favoring either candidate is roughly (N*r*(1-r))^(1/2). So, if N is 1000 and r=1/2, the expected error in the number of voters is about 16, and the error in r is about 16/1000 or 0.016; double that gives what those of you who took statistics before would call the 95% error bar, that is, we expect to have an error larger than +/-0.03 in only 5% of cases.

In real life, sampling difficulties will make the real error bigger than this, so it's more normal for pollsters to quote a somewhat larger number, for example, +/-0.05.
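The theoretical error estimate can be checked with a few lines of Python. This is only a sketch of the formula (N*r*(1-r))^(1/2) from class; the function name poll_error is made up for illustration.

```python
import math

def poll_error(N, r):
    """Return (error in the count, error in r, doubled error in r)."""
    count_error = math.sqrt(N * r * (1 - r))  # error in number of voters
    r_error = count_error / N                 # same error as a proportion
    return count_error, r_error, 2 * r_error  # doubled: the "95% error bar"

count_error, r_error, error_bar = poll_error(1000, 0.5)
print(round(count_error, 1), round(r_error, 4), round(error_bar, 3))

# Quadrupling the sample size only halves the error in r.
_, r_error_4000, _ = poll_error(4000, 0.5)
print(round(r_error_4000 / r_error, 3))
```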

We talked about a basic Bayesian way to do this in practice, namely, in a spreadsheet. We could list a sequence of equally spaced center-points for intervals of r, for example, 0.05, 0.15, 0.25, ..., 0.95, representing intervals of length 0.1. We assign each value of r a prior. One suggestion was 1/10 for each, but one student pointed out that a more realistic prior would be larger for values of r around 1/2 and smaller or near-zero for values of r that deviate significantly from 1/2. Then we can compute the likelihood, which we determined was given by r^n*(1-r)^(N-n), for each value of r, where N is the total number of voters in the sample and n is the number of voters favoring candidate A. (We ignored non-responses.) Then in a few mouse strokes we can calculate the joint probability column (prior times likelihood), compute its sum, and divide that sum into each joint probability to get the posterior probability.
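The spreadsheet columns translate directly into a short Python sketch. The sample (N=20 voters, n=13 favoring candidate A) is a made-up example, and the flat prior of 1/10 is just the first suggestion from class.

```python
# Center-points 0.05, 0.15, ..., 0.95 of ten intervals of length 0.1.
centers = [(2 * i + 1) / 20 for i in range(10)]

# Flat prior: 1/10 for each interval (one of the suggestions in class).
prior = [1 / 10] * len(centers)

# Hypothetical poll: N voters sampled, n favor candidate A.
N, n = 20, 13

# Likelihood column: r^n * (1-r)^(N-n) for each value of r.
likelihood = [r**n * (1 - r) ** (N - n) for r in centers]

# Joint column, its sum, and the posterior column.
joint = [p * l for p, l in zip(prior, likelihood)]
total = sum(joint)
posterior = [j / total for j in joint]

for r, post in zip(centers, posterior):
    print(f"r={r:.2f}  posterior={post:.4f}")

most_probable_r = centers[posterior.index(max(posterior))]
```

Note that the likelihood column does not add to 1, but the posterior column does. For a large sample such as N=1000 the likelihoods become extremely small, so it would be safer to work with logarithms.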

Finally, we set the date of the first test for Friday, October 10.

Wednesday, September 24, 2008

Class, 9/24

Today we first went through the problem set.

On problem 1, I stressed that the data is that the "expert" said that it is a Super Growth Stock, not that it is a SGS (that is something we don't know, so it can't be data). Something that you know to be true is data; something that you want to know but don't is a state of nature. Because of this confusion, at least one group got their tree wrong by ignoring the branch where SGS was false. This means that they missed counting in their calculation the probability that the "expert" said that the stock was a SGS, when it was not (false positive). If the false positive rate is very large, then you cannot ignore this part of the problem!

The only other problem was that the statement says that your friend is no better than a monkey throwing darts at the stock pages in picking stocks, so the prior probability that he actually picked a SGS is only 1/1000. Some people entertained the notion that it was 1/2, but that contradicts the statement of the problem.
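To see why the false-positive branch matters, here is a small Python sketch of the tree for this kind of problem. The prior of 1/1000 comes from the problem statement; the expert's rates (90% chance of saying "SGS" when it is one, 10% chance of saying "SGS" when it is not) are made-up numbers for illustration, not the ones from the problem set.

```python
from fractions import Fraction

# Prior from the problem statement: a random pick is an SGS 1 time in 1000.
prior_sgs = Fraction(1, 1000)
prior_not = 1 - prior_sgs

# Hypothetical expert rates (NOT the problem-set values).
p_says_sgs_given_sgs = Fraction(9, 10)  # true positive rate
p_says_sgs_given_not = Fraction(1, 10)  # false positive rate

# Joint probabilities for the two branches where the expert says "SGS".
joint_sgs = prior_sgs * p_says_sgs_given_sgs
joint_not = prior_not * p_says_sgs_given_not

# Posterior: probability it really is an SGS, given what the expert said.
posterior_sgs = joint_sgs / (joint_sgs + joint_not)
print(posterior_sgs, float(posterior_sgs))
```

With these assumed rates, the false-positive branch carries 999 units of probability for every 9 in the true-positive branch, so leaving it out would change the answer completely.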

Problem 2 and Problem 1 are basically the same; the only difference is that the false positive rate and the false negative rate are different in this problem (they were the same in problem 1).

In problem 3, the main difficulty was that it has two parts. First, after picking out four chocolate chip cookies in a row, the posterior probability that you have the all chocolate chip cookie box is now 42/43, not 1/2, as some assumed. This changes the probability that the next cookie chosen from the box is a CC cookie quite significantly: It is very close to 1 after you get 4 CC cookies out and no raisin oatmeal cookies.
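The two-part structure can be sketched in Python. The box contents here are an assumption, since the problem statement is not reproduced in this post: one box is all chocolate chip, the other is assumed to hold 5 chocolate chip and 5 raisin oatmeal cookies, the boxes are equally likely at the start, and cookies are drawn without replacement. These assumptions happen to reproduce the 42/43 quoted above.

```python
from fractions import Fraction

# ASSUMED setup (not stated in this post): equal priors, and a mixed box
# with 5 chocolate chip (CC) cookies out of 10, drawn without replacement.
prior_all_cc = Fraction(1, 2)
prior_mixed = Fraction(1, 2)
cc_in_mixed, total_in_mixed = 5, 10
draws = 4

# Likelihood of 4 CC cookies in a row from each box.
like_all_cc = Fraction(1)
like_mixed = Fraction(1)
for k in range(draws):
    like_mixed *= Fraction(cc_in_mixed - k, total_in_mixed - k)

# Part 1: posterior probability of each box after the 4 draws.
joint_all_cc = prior_all_cc * like_all_cc
joint_mixed = prior_mixed * like_mixed
post_all_cc = joint_all_cc / (joint_all_cc + joint_mixed)
post_mixed = 1 - post_all_cc

# Part 2: probability the next cookie is CC, averaging over the two boxes
# with the POSTERIOR (not the prior) as weights.
next_cc_if_mixed = Fraction(cc_in_mixed - draws, total_in_mixed - draws)
p_next_cc = post_all_cc * 1 + post_mixed * next_cc_if_mixed
print(post_all_cc, p_next_cc, float(p_next_cc))
```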

In problem 5, one group tried to solve it by displaying a particular example of independence in a table and showing that the three relationships were satisfied. The problem is that this only shows it for that particular table, but not in general. To use this method, you'd have to do it for every single one of the infinite number of tables that could be conjured up. This is obviously impossible. What I was looking for was something like this (for one of the things requested):

If P(A|B)=P(A), show that P(A,B)=P(A)P(B).

Because P(A,B)=P(A|B)P(B) no matter what, by substitution from the assumed condition that P(A|B)=P(A) we find that P(A,B)=P(A)P(B).

The More Independence problem was no problem!

We finished the class by discussing a slightly more complicated example of a decision tree, where you are personally deciding whether to install an airbag into your car after, for example, it had deployed in an accident. We showed how the expected value or loss (loss in this case) would propagate backwards in the decision tree, allowing us to cut off a more costly branch at the square (decision) box to choose the best outcome.

Comments on Journals

Some brief comments on this week's journals.

1) It is important to understand that the numbers in the "likelihood" column (the probability of what was actually observed, given each of the states of nature) do not have to add to 1. That is because the states of nature are on the right hand side of the bar; the probabilities in this column are not the probabilities of the states of nature, but the probabilities of the data, given the states of nature.

2) One student wrote about HIV testing, and about the effects of false positives (the book gives an example). He pointed out that a false positive that is due to mislabeling of the sample affects two people, not only the one who gets the false positive report, but also the one whose sample was actually positive, but who probably got a negative report because the wrong sample was assigned to him/her (switched at the lab).

3) Another student wrote from personal experience about a friend whose mother got a false positive mammogram and had to undergo significant psychological and physical pain before cancer was ruled out. On the other hand, failure to follow up on a positive test could be catastrophic, given that in the general population almost 10% of women who get a positive test actually have the disease. What should a doctor do? Certainly, tell his patient that there is a better than 90% probability that she does not have cancer, but on the other hand, that it needs to be followed up.

4) I talked in class about alternative King/Brother scenarios. If you know that the King's sibling is older, it has to be a sister. If you know that the King is the oldest, then there's a 50-50 chance that the younger sibling is a brother.

5) I also discussed what another student brought up, that we professors have to grade you students on a linear scale, which ignores everyone's particular strengths and weaknesses. But, I pointed out, we can and will write letters of recommendation that deal with all of the issues that we know about you, so as to give a potential employer/school a more three-dimensional understanding of you as an individual.

6) One student wrote that we must consider the consequences of actions that we take as well as the probabilities. This was wonderful to read, as that is what we talked about a little on Wednesday and is a major part of the course. Another student talked about the fact that wagers look different when they are for small change than they do when they are about major bucks. We'll talk about that too.

Monday, September 22, 2008

Class, 9/22

In class today we talked about the value of a human life. One student proposed that we could multiply the amount that a person would earn per year by the number of years worked. We decided that a kind of "average" amount would be appropriate, chose $50,000/year and 40 years to get a value of $2 million. There was some discussion about the costs of an individual to society, but I pointed out that people do pay taxes so maybe that's not something that enters into the equation. I also stated that although the exact number varies from government agency to agency, one such number I had recently read had the transportation department valuing a human life at around $5 million. In our philosophy of Fermi Problems, $2 million and $5 million are not that far removed, so our estimate wasn't bad.

We then asked whether air bags would be something that the government should require, given the cost of an air bag and the number of people that would be saved per year if air bags were required. This required estimating several numbers. One is the number of highway fatalities per year. Several numbers were raised, of the order of several hundred thousand per year. I knew that the actual number is smaller, about 50,000 per year. OK, that's only a factor of 10 (significant, but not horrible). Armed with this number and the population of the U.S., which everyone thought was about 300 million, we figured that the incidence of highway deaths is about 1 per 10,000 per year (more precisely, 50,000 out of 300 million is about 1 in 6,000). We also estimated that maybe 1/3 (and the person who made this estimate thought it was high, which it is) of the deaths could be saved if air bags were installed in every car. The actual number of saved lives is estimated at about 10,000 per year.

So now we set up a very simple decision tree. On one branch, if air bags are installed, that saves 10,000 lives per year, each worth an estimated 5 million dollars, for a total savings of 50 billion dollars per year. On the other hand, one student said that to replace the air bags in a car costs about $700, but that's more than it costs to install them initially, which may be about $300. We also estimated that maybe 20 million cars go on the road each year. That adds up to an additional cost of $6 billion per year. So, by investing $6 billion in air bags each year, we'll save lives estimated in value at $50 billion. This cost-benefit analysis says that the government should mandate air bags, as the cost of mandating them is only about one-eighth of the value of the lives saved.
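The numbers in this simple decision tree can be laid out in a few lines of Python. All of the inputs are the class estimates quoted above; the variable names, and the choice to treat the value of the lives not saved as the loss on the "no mandate" branch, are just one way to set it up.

```python
# Class estimates from the discussion above.
lives_saved_per_year = 10_000
value_per_life = 5_000_000        # dollars
new_cars_per_year = 20_000_000
install_cost_per_car = 300        # dollars

# Yearly benefit and cost of a mandate.
benefit = lives_saved_per_year * value_per_life   # $50 billion
cost = new_cars_per_year * install_cost_per_car   # $6 billion

# Loss on each branch of the decision tree: paying for air bags,
# or losing the lives that air bags would have saved.
loss = {"mandate air bags": cost, "no mandate": benefit}
best_choice = min(loss, key=loss.get)

net_savings = benefit - cost
cost_to_benefit = cost / benefit

print(best_choice, net_savings, cost_to_benefit)
```

In the spirit of Fermi problems, the conclusion does not depend on the details: even the $2 million value of a life that we estimated ourselves would give a benefit of $20 billion, still well above the cost.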

Saturday, September 20, 2008

Class 9/19

Today you all took a "quiz" (two forms) which asked various questions for you to express your opinion on. The point of the "quiz" was to point out various ways in which our decisions may be affected by the language used to pose the problems, and other factors that seem on their face to be irrelevant. When the risks are posed in a negative way, the reaction we have and the decision we make may well be different from the same problem when the risks are posed in a positive way (lives saved versus lives lost, for example). We talked about the trolley problem and saw that the decisions seem to be the same when you just look at the outcomes (one person versus five people dead). Even so, most people are reluctant to push the fat man off the bridge but are much more willing to simply throw a switch, and this difference may have evolutionary reasons. I pointed out that when this question is posed to a person having their brain scanned in an MRI machine, different parts of the brain are active when the two different questions are posed.

Wednesday, September 17, 2008

Problem set comments

We talked about the problem set today. In particular we discussed the King and Brother problem and the ordinary dice problem.

In the King and Brother problem, the easiest way is to list all of the (assumed equally probable) ways that a couple can have two children:

BB p=1/4
BG p=1/4
GB p=1/4
GG p=1/4

The last of these didn't happen, so only the first three remain. Their probabilities are equal, so by inspection we see that in one of the three cases, the king has a brother, and in two of them, he has a sister. So, the probability that the king has a brother is 1/3. In symbols,

P(brother|king)=1/3

Again, looking at the list above, we can see that considering all kinds of families, the probability that there will be a king if the royal family has two children is 3/4, that is, there will be a king in all cases except GG, in which case the oldest girl will become Queen. We also see that the probability that there is one brother in addition to the king is P(brother,king)=1/4, since that only happens in the BB case. So, using the formula for conditional probability, we can calculate

P(brother|king)=P(brother,king)/P(king)=(1/4)/(3/4)=1/3, same as above.

We also got this using a tree, but I haven't figured out how to draw one and put it in the blog yet.

We found that, in the case where there are three siblings,

P(two brothers|king)=1/7
P(one brother|king)=3/7.

We got this both by listing all cases and using the formula.
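Listing all cases is easy to automate. The sketch below assumes, as we did in class, that every birth order of boys and girls is equally probable and that there is a king whenever the family has at least one boy. The function name brother_distribution is made up for illustration.

```python
from fractions import Fraction
from itertools import product

def brother_distribution(n_children):
    """P(number of brothers | king) for a family of n_children."""
    # All equally probable birth orders, e.g. ('B', 'G') for two children.
    families = list(product("BG", repeat=n_children))

    # There is a king in every family except the all-girl one.
    with_king = [f for f in families if "B" in f]

    # One boy is the king; any other boys are his brothers.
    counts = {}
    for family in with_king:
        brothers = family.count("B") - 1
        counts[brothers] = counts.get(brothers, 0) + 1

    return {k: Fraction(v, len(with_king)) for k, v in counts.items()}

two = brother_distribution(2)
three = brother_distribution(3)
print(two[1])             # P(brother | king), two children
print(three[2], three[1]) # P(two brothers | king), P(one brother | king)
```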

We also did the ordinary dice problem by listing all of the possibilities in a square array, and putting the total in each entry:

    |  1   2   3   4   5   6
----+------------------------
  1 |  2   3   4   5   6   7
  2 |  3   4   5   6   7   8
  3 |  4   5   6   7   8*  9
  4 |  5   6   7   8   9  10
  5 |  6   7   8*  9  10  11
  6 |  7   8   9  10  11  12

(Sorry, this isn't coming out formatted the way I expected, I apologize).

We noted that there are 36 entries in the table, and five of them total 8 (marked in red), so

P(total 8)=5/36

We discussed why (4,4) should not be counted twice. By coloring the dice red and green, and tossing first the red one and then the green one, we get only one (4,4) amongst the examples that total to 8.

We see that there are 11 total cases that have a five showing (row 5 and column 5). Of these, two total 8 (starred). So,

P(total 8|5 shows)=2/11

Also, we just saw that there are 5 cases that total 8 (red), and of these two have a '5' showing (starred). So,

P(5 shows|total 8)=2/5

We also got the last two results by using the formula for conditional probability.
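The same counting can be done in Python by listing all 36 (red, green) outcomes. This is just a sketch; the helper names prob, total_is_8 and five_shows are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally probable (red die, green die) outcomes.
rolls = list(product(range(1, 7), repeat=2))

def total_is_8(roll):
    return sum(roll) == 8

def five_shows(roll):
    return 5 in roll

def always(roll):
    return True

def prob(event, given=always):
    """P(event | given), by counting cases in the table."""
    cases = [r for r in rolls if given(r)]
    hits = [r for r in cases if event(r)]
    return Fraction(len(hits), len(cases))

p_8 = prob(total_is_8)
p_8_given_5 = prob(total_is_8, given=five_shows)
p_5_given_8 = prob(five_shows, given=total_is_8)
print(p_8, p_8_given_5, p_5_given_8)

# Check against the formula for conditional probability:
# P(total 8 | 5 shows) = P(total 8, 5 shows) / P(5 shows).
p_both = prob(lambda r: total_is_8(r) and five_shows(r))
formula = p_both / prob(five_shows)
```

Because each (red, green) pair appears exactly once in the list, (4,4) is automatically counted only once.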