Bayes Rules: September 2008

Monday, September 29, 2008

Class, 9/29

Today we discussed the polling example further and actually calculated a simple result. I'll try to post a copy of the calculation later. We found that with 6 favoring candidate A out of a sample of 10, the posterior probability that candidate A wins (has over 50% of the vote) is about 71%. The posterior distribution is roughly bell-shaped and peaked at r=0.6=6/10. That's a general rule.

We also asked what would happen if we used a more realistic prior that put more weight near 1/2. In response to a question, I pointed out that you can't put it near 0.6 because that would be "cheating," using the same data twice. You have to do it without looking at the data. We found that a prior that rises to a maximum at 0.5 and then falls again will do two things: It will narrow the posterior distribution somewhat, and will also shift the peak closer to 0.5. If there is a whole lot of data, then the effect of the prior will be negligible, but in our example it can be significant.
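The effect of the prior can be checked numerically. The following is only a sketch, not the calculation we did in class: it uses a fine grid of r values, and the peaked prior r(1-r) is a hypothetical choice that merely has the right shape (it rises to a maximum at 0.5 and then falls). The exact win probabilities depend on the grid and the prior, so they need not match the class figure.

```python
# Grid approximation to the posterior for r, the fraction of voters favoring A.
N, n = 10, 6          # sample size and number favoring A (class example)
K = 1000              # number of grid cells (an assumption; any fine grid works)
grid = [(i + 0.5) / K for i in range(K)]   # midpoints of the cells


def posterior(prior):
    """Multiply the prior by the likelihood r^n (1-r)^(N-n) and normalize."""
    joint = [p * r**n * (1 - r) ** (N - n) for p, r in zip(prior, grid)]
    total = sum(joint)
    return [j / total for j in joint]


def summarize(post):
    """Return (peak location, standard deviation, P(r > 0.5))."""
    peak = grid[max(range(K), key=lambda i: post[i])]
    mean = sum(p * r for p, r in zip(post, grid))
    sd = sum(p * (r - mean) ** 2 for p, r in zip(post, grid)) ** 0.5
    p_win = sum(p for p, r in zip(post, grid) if r > 0.5)
    return peak, sd, p_win


flat_prior = [1.0] * K
peaked_prior = [r * (1 - r) for r in grid]   # hypothetical prior, maximum at 0.5

flat = summarize(posterior(flat_prior))
peaked = summarize(posterior(peaked_prior))
print("flat prior:   peak %.3f  sd %.3f  P(A wins) %.3f" % flat)
print("peaked prior: peak %.3f  sd %.3f  P(A wins) %.3f" % peaked)
```

With the peaked prior the peak moves from 0.6 toward 0.5 and the standard deviation shrinks, which are the two effects described above.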

The following webpage has election predictions with a chart (lower right hand) that shows a similar posterior distribution, based on calculating the electoral vote in a simulation (this is a modern computational technique, even more powerful than the spreadsheet method we discussed). The left-hand chart has other information that summarizes the posterior probability in several ways: Where the maximum of the posterior probability is, what the win probability is, and so forth.

We discussed a situation where two people enter into a sequential duel: Alan and Bob will take shots at each other in turn. Alan's probability of hitting Bob and putting him out of commission on one shot is 1/3; Bob's probability of putting Alan out of commission is 2/3. We asked, if they keep taking turns until one hits the other, what's the probability of Alan eventually hitting Bob if he goes first? If he goes second?

Although this could be calculated (as one student suggested) by multiplying and adding probabilities until the numbers were very small, I suggested another way.

If Alan goes first, he'll win outright on his first shot 1/3 of the time. The other 2/3 of the time he misses, and the duel continues exactly as if Bob were going first, so he'll also eventually win with probability (2/3)*P(Alan wins eventually | Bob goes first). So

P(Alan wins eventually | Alan goes first)=1/3+(2/3)*P(Alan wins eventually | Bob goes first).

But P(Alan wins eventually | Bob goes first) can be calculated in terms of P(Alan wins eventually | Alan goes first), because the only way that Alan can win eventually if Bob goes first is if Bob misses on his first try (P=1/3). Then Alan will have a second chance, and the probability that he'll eventually win if Bob goes first is therefore

P(Alan wins eventually | Bob goes first)=(1/3)*P(Alan wins eventually | Alan goes first)

Therefore, P(Alan wins eventually | Alan goes first)=1/3+(1/3)*(2/3)*P(Alan wins eventually | Alan goes first). This can be solved for P(Alan wins eventually | Alan goes first). This turns out to be 3/7, and P(Alan wins eventually | Bob goes first)=(1/3)*(3/7)=1/7.
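Here is a small sketch that solves the two equations with exact fractions and also checks the student's suggestion of adding up terms until they become tiny. The variable names are mine, chosen for illustration.

```python
from fractions import Fraction

p_alan = Fraction(1, 3)   # Alan's chance of hitting on a single shot
p_bob = Fraction(2, 3)    # Bob's chance of hitting on a single shot

# a = P(Alan wins eventually | Alan goes first)
# b = P(Alan wins eventually | Bob goes first)
#   a = p_alan + (1 - p_alan) * b
#   b = (1 - p_bob) * a
# Substituting the second equation into the first and solving for a:
both_miss = (1 - p_alan) * (1 - p_bob)   # a full round in which nobody is hit
a = p_alan / (1 - both_miss)
b = (1 - p_bob) * a
print(a, b)   # 3/7 1/7

# The "multiply and add" approach: Alan wins on his k-th shot after
# k-1 rounds in which both missed. Fifty terms is more than enough.
partial_sum = sum(p_alan * both_miss**k for k in range(50))
print(float(partial_sum))
```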

Finally, I brought in Charlie, who is a crack shot and never misses. We decided that if Charlie goes first, he'll knock off Bob since Alan is a poorer shot, and if Alan misses, Charlie will knock him off the second time around. Also, if Bob goes first, to be followed by Charlie, Bob will try to knock off Charlie first, since if he knocked off Alan, he's a goner. But if Alan goes first, the first guess that he'd go after Charlie seemed to be wrong, as one student pointed out. Actually, Alan has a better chance of survival if he shoots his gun into the air! More on this later.

Study Sheet for First Quiz

The first test will be on October 10 (Friday). There is a study sheet here. We will start discussing this on Wednesday or Friday, so get together with your group and prepare yourselves for our discussion.

Sunday, September 28, 2008

Class, 9/26

We talked about Problem #4. I mentioned first that it was modelled after a book by Mosteller and Wallace (Mosteller is a famous statistician), in which they tried to determine the authorship of several of the disputed articles in the famous Federalist Papers, published to try to convince Americans to adopt the Constitution.

The idea is that each time a word is used, that may reflect on the author, since different authors tend to use words with different frequencies. So, one author might use "while" and another might use "whilst." ("Whilst" was still in common use in this part of the world in 1789.) So, we can form the likelihood in this problem by multiplying the probability of a word, given the author, for each time a word appears in the text. There are two authors, so we will get a product of many numbers, one number for each word in the sample text.

This requires computing quantities like (0.002)⁵. Unfortunately, this results in some very small numbers. I recommended using powers of ten notation, so that you would have, for example, (0.002)⁵=(2x10^-3)⁵=32x10^-15. You'll get a small integer times some very small number written in scientific notation. The good news is that the power of ten will cancel out of the final answer.
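On a computer, a common alternative to carrying powers of ten by hand is to add logarithms instead of multiplying probabilities. The sketch below shows that idea. It is a different technique from the one I recommended for the hand calculation, and every rate and count in it is a made-up number for illustration, not data from Problem #4 or from Mosteller and Wallace.

```python
import math

# Hypothetical per-word usage rates for two authors (illustration only).
rate = {
    "author_1": {"while": 0.002, "upon": 0.003},
    "author_2": {"while": 0.0005, "upon": 0.0002},
}
counts = {"while": 5, "upon": 2}              # hypothetical word counts
prior = {"author_1": 0.5, "author_2": 0.5}    # equal priors, an assumption

# The small number from the text: (0.002)^5 = 32 x 10^-15.
print(0.002**5)

# Log of the joint probability: log prior plus count * log rate for each word.
log_joint = {
    author: math.log(prior[author])
    + sum(k * math.log(rate[author][word]) for word, k in counts.items())
    for author in rate
}

# Subtract the largest log before exponentiating. This plays the role of
# the common power of ten that cancels out of the final answer.
largest = max(log_joint.values())
weight = {author: math.exp(v - largest) for author, v in log_joint.items()}
total = sum(weight.values())
posterior = {author: w / total for author, w in weight.items()}
print(posterior)
```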

At the end of class, I discussed polls a bit. We determined that the states of nature are the various proportions r of voters who favor candidate A over candidate B. There are infinitely many such numbers. We also discussed how the error in the result will theoretically go down as the size of the sample goes up, so for example the error (plus or minus) in the number of voters in the sample favoring either candidate is roughly (N*r*(1-r))^(1/2). So, if N is 1000 and r=1/2, the expected error in the number of voters is about 15, and the error in r is about 15/1000 or 0.015; double that gives what those of you who took statistics before would call the 95% error bar, that is, we expect to have an error larger than +/-0.03 in only 5% of cases.
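The arithmetic for the sampling error can be written out as a short sketch. The variable names are mine. The square root comes out a little under 16, which we rounded to 15 in class.

```python
import math

N = 1000    # sample size
r = 0.5     # assumed true fraction favoring candidate A

count_error = math.sqrt(N * r * (1 - r))   # error in the number of voters
r_error = count_error / N                  # error in the fraction r
error_bar_95 = 2 * r_error                 # roughly the 95% error bar

print(round(count_error, 1), round(r_error, 3), round(error_bar_95, 2))
```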

In real life, sampling difficulties will make the real error bigger than this, so it's more normal for pollsters to quote a somewhat larger number, for example, +/-0.05.

We talked about a basic Bayesian way to do this in practice, namely, in a spreadsheet. We could list a sequence of equally spaced center-points for an interval of r, for example, 0.05, 0.15, 0.25,...0.95, representing intervals of length 0.1. We assign each value of r a prior. One suggestion was 1/10 for each, but one student pointed out that a more realistic prior would be larger for values of r around 1/2 and smaller or near-zero for values of r that deviate significantly from 1/2. Then we can compute the likelihood, which we determined was given by r^n*(1-r)^(N-n), for each value of r, where N is the total number of voters in the sample and n is the number of voters favoring candidate A. (We ignored non-responses). Then in a few mouse strokes we can calculate the joint probability column, compute its sum, and divide it into each joint probability to get the posterior probability.
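The spreadsheet columns translate almost line for line into a short program. This is a sketch using the ten center-points and the flat prior of 1/10; the sample numbers N=10 and n=6 are borrowed from the polling example purely for illustration.

```python
N, n = 10, 6                                    # illustrative sample: 6 of 10 favor A
centers = [(2 * i + 1) / 20 for i in range(10)]  # 0.05, 0.15, ..., 0.95
prior = [1 / 10] * 10                            # flat prior, one entry per interval

likelihood = [r**n * (1 - r) ** (N - n) for r in centers]
joint = [p * like for p, like in zip(prior, likelihood)]
total = sum(joint)                               # the sum under the joint column
posterior = [j / total for j in joint]

print("   r    prior  likelihood  joint      posterior")
for row in zip(centers, prior, likelihood, joint, posterior):
    print("%5.2f  %5.2f  %.3e  %.3e  %.4f" % row)

# Posterior probability that candidate A has more than half the vote.
p_a_wins = sum(p for r, p in zip(centers, posterior) if r > 0.5)
print("P(r > 0.5) = %.3f" % p_a_wins)
```

Swapping in a prior that is larger near 1/2 only requires changing the `prior` list; the remaining columns are computed the same way.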

Finally, we set the date of the first test for Friday, October 10.

Wednesday, September 24, 2008

Class, 9/24

Today we first went through the problem set.

On problem 1, I stressed that the data is that the "expert" said that it is a Super Growth Stock, not that it is a SGS (that is something we don't know, so it can't be data). Something that you know to be true is data; something that you want to know but don't is a state of nature. Because of this confusion, at least one group got their tree wrong by ignoring the branch where SGS was false. This means that they missed counting in their calculation the probability that the "expert" said that the stock was a SGS, when it was not (false positive). If the false positive rate is very large, then you cannot ignore this part of the problem!

The only other problem was that the statement says that your friend is no better than a monkey throwing darts at the stock pages in picking stocks, so the prior probability that he actually picked a SGS is only 1/1000. Some people entertained the notion that it was 1/2, but that contradicts the statement of the problem.

Problem 2 and Problem 1 are basically the same. The only difference is that the false positive rate and the false negative rate are different in this problem (they were the same in problem 1).

In problem 3, the main difficulty was that it has two parts. First, after picking out four chocolate chip cookies in a row, the posterior probability that you have the all chocolate chip cookie box is now 42/43, not 1/2, as some assumed. This changes the probability that the next cookie chosen from the box is a CC cookie quite significantly: It is very close to 1 after you get 4 CC cookies out and no raisin oatmeal cookies.

In problem 5, one group tried to solve it by displaying a particular example of independence in a table and showing that the three relationships were satisfied. The problem is that this only shows it for that particular table, but not in general. To use this method, you'd have to do it for every single one of the infinite number of tables that could be conjured up. This is obviously impossible. What I was looking for was something like this (for one of the things requested):

If P(A|B)=P(A), show that P(A,B)=P(A)P(B).

Because P(A,B)=P(A|B)P(B) no matter what, by substitution from the assumed condition that P(A|B)=P(A) we find that P(A,B)=P(A)P(B).

The More Independence problem was no problem!

We finished the class by discussing a slightly more complicated example of a decision tree, where you are personally deciding whether to install an airbag into your car after, for example, it had deployed in an accident. We showed how the expected value or loss (loss in this case) would propagate backwards in the decision tree, allowing us to cut off a more costly branch at the square (decision) box to choose the best outcome.

Comments on Journals

Some brief comments on this week's journals.

1) It is important to understand that the numbers in the "likelihood" column (the probability of what was actually observed, given each of the states of nature) do not have to add to 1. That is because the states of nature are on the right hand side of the bar; the probabilities in this column are not the probabilities of the states of nature, but the probabilities of the data, given the states of nature.

2) One student wrote about HIV testing, and about the effects of false positives (the book gives an example). He pointed out that a false positive that is due to mislabeling of the sample affects two people, not only the one who gets the false positive report, but also the one whose sample was actually positive, but who probably got a negative report because the wrong sample was assigned to him/her (switched at the lab).

3) Another student wrote from personal experience about a friend whose mother got a false positive mammogram and had to undergo significant psychological and physical pain before cancer was ruled out. On the other hand, failure to follow up on a positive test could be catastrophic, given that in the general population almost 10% of women who get a positive test actually have the disease. What should a doctor do? Certainly, tell his patient that there is a better than 90% probability that she does not have cancer, but on the other hand, that it needs to be followed up.

4) I talked in class about alternative King/Brother scenarios. If you know that the King's sibling is older, it has to be a sister. If you know that the King is the oldest, then there's a 50-50 chance that the younger sibling is a brother.

5) I also discussed what another student brought up, that we professors have to grade you students on a linear scale, which ignores everyone's particular strengths and weaknesses. But, I pointed out, we can and will write letters of recommendation that deal with all of the issues that we know about you, so as to give a potential employer/school a more three-dimensional understanding of you as an individual.

6) One student wrote that we must consider the consequences of actions that we take as well as the probabilities. This was wonderful to read, as that is what we talked about a little on Wednesday and is a major part of the course. Another student talked about the fact that wagers look different when they are for small change than they do when they are about major bucks. We'll talk about that too.

Monday, September 22, 2008

Class, 9/22

In class today we talked about the value of a human life. One student proposed that we could multiply the amount that a person would earn per year by the number of years worked. We decided that a kind of "average" amount would be appropriate, chose $50,000/year and 40 years to get a value of $2 million. There was some discussion about the costs of an individual to society, but I pointed out that people do pay taxes so maybe that's not something that enters into the equation. I also stated that although the exact number varies from government agency to agency, one such number I had recently read had the transportation department valuing a human life at around $5 million. In our philosophy of Fermi Problems, $2 million and $5 million are not that far removed, so our estimate wasn't bad.

We then asked whether air bags would be something that the government should require, given the cost of an air bag and the number of people that would be saved per year if air bags were required. This required estimating several numbers. One is the number of highway fatalities per year. Several numbers were raised, of the order of several hundred thousand per year. I knew that the actual number is smaller, about 50,000 per year. OK, that's only a factor of 10 (significant, but not horrible). Armed with this number and the population of the U.S., which everyone thought was about 300 million, we figured that the incidence of highway deaths is about 1 per 10,000 per year. We also estimated that maybe 1/3 (and the person who made this estimate thought it was high, which it is) of the deaths could be saved if air bags were installed in every car. The actual number of saved lives is estimated at about 10,000 per year.

So now we set up a very simple decision tree. On one branch, if air bags are installed, that saves 10,000 lives per year, each worth an estimated 5 million dollars, for a total savings of 50 billion dollars per year. On the other hand, one student said that to replace the air bags in a car costs about $700, but that's more than it costs to install them initially, which may be about $300. We also estimated that maybe 20 million cars go on the road each year. That adds up to an additional cost of $6 billion per year. So, by investing $6 billion in air bags each year, we'll save lives estimated in value at $50 billion. This cost-benefit analysis says that the government should mandate air bags, as the cost of mandating them is only about one-eighth of the estimated savings to society.
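The Fermi arithmetic behind this decision is short enough to write down as a sketch. All of the inputs are the rough class estimates above, not authoritative figures.

```python
lives_saved_per_year = 10_000        # estimated lives saved by air bags
value_per_life = 5_000_000           # dollars, the transportation department figure
new_cars_per_year = 20_000_000       # class estimate
cost_per_car = 300                   # dollars to install air bags initially

benefit = lives_saved_per_year * value_per_life   # dollars per year
cost = new_cars_per_year * cost_per_car           # dollars per year

print("benefit: $%d billion" % (benefit // 10**9))
print("cost:    $%d billion" % (cost // 10**9))
print("cost / benefit = %.2f" % (cost / benefit))
print("mandate air bags" if benefit > cost else "do not mandate air bags")
```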

Saturday, September 20, 2008

Class 9/19

Today you all took a "quiz" (two forms) which asked various questions for you to express your opinion on. The point of the "quiz" was to point out various ways in which our decisions may be affected by the language used to pose the problems, and other factors that seem on their face to be irrelevant. When the risks are posed in a negative way, the reaction we have and the decision we make may well be different from the same problem when the risks are posed in a positive way (lives saved versus lives lost, for example). We talked about the trolley problem and saw that, even though the decisions seem to be the same when you just look at the outcomes (one person versus five people dead), most people are reluctant to push the fat man off the bridge but much more willing to simply throw a switch. This contrast may have evolutionary reasons. I pointed out that when this question is posed to a person having their brain scanned in an MRI machine, different parts of the brain are active when the two different questions are posed.

Wednesday, September 17, 2008

Problem set comments

We talked about the problem set today. In particular we discussed the King and Brother problem and the ordinary dice problem.

In the King and Brother problem, the easiest way is to list all of the (assumed equally probable) ways that a couple can have two children:

BB p=1/4
BG p=1/4
GB p=1/4
GG p=1/4

The last of these didn't happen, so only the first three remain. Their probabilities are equal, so by inspection we see that in one of the three cases, the king has a brother, and in two of them, he has a sister. So, the probability that the king has a brother is 1/3. In symbols,

P(brother|king)=1/3

Again, looking at the list above, we can see that considering all kinds of families, the probability that there will be a king if the royal family has two children is 3/4, that is, there will be a king in all cases except GG, in which case the oldest girl will become Queen. We also see that the probability that there is one brother in addition to the king is P(brother,king)=1/4, since that only happens in the BB case. So, using the formula for conditional probability, we can calculate

P(brother|king)=P(brother,king)/P(king)=(1/4)/(3/4)=1/3, same as above.

We also got this using a tree, but I haven't figured out how to draw one and put it in the blog yet.

We found that in the case where there are three siblings, that

P(two brothers|king)=1/7
P(one brother|king)=3/7.

We got this both by listing all cases and using the formula.
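Listing all cases is easy to automate. The sketch below enumerates every equally probable family, keeps the ones with at least one boy (so that there is a king), and counts the king's brothers. The function name is mine.

```python
from fractions import Fraction
from itertools import product


def brother_distribution(num_children):
    """P(number of brothers | there is a king) for a family of the given size."""
    # Every sequence of B/G is equally probable; a king needs at least one boy.
    families = [f for f in product("BG", repeat=num_children) if "B" in f]
    weight = Fraction(1, len(families))
    dist = {}
    for family in families:
        brothers = family.count("B") - 1   # one of the boys is the king
        dist[brothers] = dist.get(brothers, 0) + weight
    return dist


print(brother_distribution(2))   # two children: P(brother | king) = 1/3
print(brother_distribution(3))   # three children: 1/7 for two brothers, 3/7 for one
```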

We also did the ordinary dice problem by listing all of the possibilities in a square array, and putting the total in each entry:

   |  1   2   3   4   5   6
---+------------------------
 1 |  2   3   4   5   6   7
 2 |  3   4   5   6   7   8
 3 |  4   5   6   7   8*  9
 4 |  5   6   7   8   9  10
 5 |  6   7   8*  9  10  11
 6 |  7   8   9  10  11  12

(Sorry, this isn't coming out formatted the way I expected, I apologize).

We noted that there are 36 entries in the table, and five of them total 8 (marked in red), so

P(total 8)=5/36

We discussed why (4,4) should not be repeated twice. By coloring the dice red and green, and tossing first the red one and then the green one, we get only one (4,4) amongst the examples that total to 8.

We see that there are 11 total cases that have a five showing (row 5 and column 5). Of these, two total 8 (starred). So,

P(total 8|5 shows)=2/11

Also, we just saw that there are 5 cases that total 8 (red), and of these two have a '5' showing (starred). So,

P(5 shows|total 8)=2/5

We also got the last two results by using the formula for conditional probability.
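The same counting can be done by enumerating the 36 ordered rolls (red die first, then green). This is a sketch; the helper names are mine.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered (red, green) outcomes; (4,4) appears exactly once.
rolls = list(product(range(1, 7), repeat=2))


def total_is_8(roll):
    return sum(roll) == 8


def five_shows(roll):
    return 5 in roll


def prob(event, given=None):
    """P(event | given), found by counting equally likely outcomes."""
    pool = rolls if given is None else [r for r in rolls if given(r)]
    return Fraction(sum(1 for r in pool if event(r)), len(pool))


print(prob(total_is_8))               # 5/36
print(prob(total_is_8, five_shows))   # 2/11
print(prob(five_shows, total_is_8))   # 2/5
```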

Monday, September 15, 2008

Class 9/15

Today we did several things. I told you about the problem in this week's "Cartalk" program on NPR. The problem involves two people who want to hire themselves out. Their employer insists that there be a 90% chance that they will show up for work (they have to drive), but one of them has a car that is only 70% reliable and the other's car is only 80% reliable. We were able to show that between them they would be able to get to work with better than 90% reliability, using two different arguments.

(I still haven't figured out how to get pictures into here, so bear with me).

The first solution went this way: They first try to start the car that is 80% reliable. If it starts, they go in it. Otherwise, they try to start the car that is 70% reliable. That will happen on 20% of the days, so the probability that they will use the second car is 0.2*0.7=0.14. Add that to the 0.8 if they use the first car, and the result is 0.94, or a 94% probability that they will get to work.

Put in terms of conditional probability, we have

P(car 1 starts)=0.8,
P(car 1 doesn't start)=0.2,
P(car 2 starts|car 1 doesn't start)=0.7,
P(car 2 starts, car 1 doesn't start)=0.7*0.2=0.14 (by conditional probability formula)
P(they get to work)=0.8+0.14=0.94

The other idea was even simpler. The probability that neither car will start is 0.2*0.3=0.06. Subtract that from 1 and you get 0.94, same as before.

Here we use the fact that the probability that car 2 will not start is independent of whether car 1 starts or not, so

P(car 2 doesn't start)=P(car 2 doesn't start|car 1 doesn't start)
=0.3,
P(car 1 and car 2 both don't start)
=P(car 1 doesn't start)*P(car 2 doesn't start)
=0.2*0.3
=0.06

Both solutions are correct.
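As a quick check that the two arguments agree, here is a sketch using exact fractions. It assumes, as we did in class, that the two cars start or fail independently.

```python
from fractions import Fraction

p_car1 = Fraction(8, 10)   # car 1 starts
p_car2 = Fraction(7, 10)   # car 2 starts (independent of car 1, by assumption)

# First argument: car 1 starts, or car 1 fails and then car 2 starts.
first_way = p_car1 + (1 - p_car1) * p_car2

# Second argument: one minus the probability that neither car starts.
second_way = 1 - (1 - p_car1) * (1 - p_car2)

print(first_way, second_way, float(first_way))   # 47/50 47/50 0.94
```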

We used the opportunity to explain independence and dependence.

A and B are independent if P(A|B)=P(A). You will work on two problems involving independence for next Monday, including showing that several definitions of independence are entirely equivalent. We also wrote down a table of joint probabilities of A and B, identified the marginal probabilities that are obtained by summing across the rows and down the columns of the table. There's another problem for Monday that uses tables like this.

We spent the rest of the period discussing various flavors of the Monty Hall problem (mentioned below). We found just by our discussion that it always pays to switch in the Angelic Monty problem, it never pays in the Monty from Hell problem, and it doesn't matter in the Ignorant Monty problem. Finally, we analyzed the Mixture Monty problem by recognizing that there are six states of nature, 3 doors x 2 possible Montys. We set up a table, and used the fact that our evidence was that Monty opened door 2 and we saw a goat, to figure out the likelihood. There was some confusion, for example we found that

P(Open 2 & see goat|Door 3)= 1 for Angelic Monty and 0 for Monty from Hell

(because if you've chosen the wrong door, Angelic Monty will in this case show the only door that he can that has a goat, door #2, whereas Monty from Hell will open the door with the prize and will not open door #2).

We found that it is better to switch if we are facing Mixture Monty.

The easiest way to see this intuitively is that Angelic Monty will offer you a chance to switch twice as often as Monty from Hell will. That's because you'll choose the wrong door 2/3 of the time, so 2/3 of the time Angelic Monty will offer you a chance to switch, while Monty from Hell will offer you a switch in only that 1/3 of the cases where you have picked the right door. If you're offered a chance to switch, it's twice as likely that you're facing Angelic Monty as Monty from Hell.
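The six-state table for Mixture Monty can be sketched as follows. It assumes that you chose door 1, that the coin is fair, and that Monty from Hell picks between doors 2 and 3 at random when you have chosen the prize door. That last point is my assumption for this sketch; the other likelihoods follow from the rules of the two Montys.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Likelihood of the evidence "Monty opens door 2, shows a goat, and offers
# a switch" for each state of nature (Monty type, prize door). You chose door 1.
likelihood = {
    ("Angelic", 1): 0,      # he opens your door and congratulates you
    ("Angelic", 2): 0,      # door 2 has the prize, so he opens door 3
    ("Angelic", 3): 1,      # door 2 is the only goat door he can show
    ("Hell", 1): half,      # he opens door 2 or door 3 (assumed at random)
    ("Hell", 2): 0,         # he opens your door: "too bad, you lose"
    ("Hell", 3): 0,         # he opens your door: "too bad, you lose"
}

prior = Fraction(1, 6)      # 3 doors x 2 Montys, all equally probable
joint = {state: prior * like for state, like in likelihood.items()}
total = sum(joint.values())
posterior = {state: j / total for state, j in joint.items()}

# Switching to door 3 wins exactly when the prize is behind door 3.
p_switch_wins = sum(p for (monty, door), p in posterior.items() if door == 3)
print(p_switch_wins)   # 2/3
```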

Sunday, September 14, 2008

Class 9/12

Today in class we mostly pursued the Monty Hall problem, in various disguises. In the process we learned about Bayes' theorem:

P(A,B)=P(A|B)P(B)=P(B|A)P(A),

and as long as P(B) is not zero, we can divide to get the usual form of Bayes' theorem:

P(A|B)=P(B|A)P(A)/P(B).

We identified P(A) as the prior probability of A (before we observed B), P(B|A) as the likelihood, P(B|A)P(A) as the joint probability of A and B, P(B) as the probability of observing the evidence B that we actually observed (that Monty opens the second door), and P(A|B) as the posterior probability of A being true, given that we've observed evidence B.

We also learned that the denominator, P(B), is gotten by summing P(A,B)=P(B|A)P(A) over all possible values of A (not B!).

We solved Monty Hall first with a probability tree, then with a "natural frequencies" approach like that in Calculated Risks, and finally with a spreadsheet-like calculation in which we had columns. We put the possible states of nature (here the different places the prize might be) in the first column, the prior probabilities (1/3) of each state of nature in the second, the likelihood (probability of the data that Monty opens door 2 given that you choose door 1 and the prize is behind the door for that row) in column 3, the product of columns 2 and 3 in column 4 (the joint probability), the sum of column 4 under that column, and then the posterior probability in column 5, which is the entry in each row column 4 divided by the sum under that column.
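Until I work out how to put a table in the blog, here is the spreadsheet-like calculation as a short sketch. It assumes the standard rules: you chose door 1, Monty never opens your door or the prize door, and he picks at random when he has a choice.

```python
from fractions import Fraction

states = [1, 2, 3]                              # column 1: where the prize is
prior = {door: Fraction(1, 3) for door in states}   # column 2

# Column 3: P(Monty opens door 2 | you chose door 1, prize behind this door).
likelihood = {
    1: Fraction(1, 2),   # he may open door 2 or door 3
    2: Fraction(0),      # he never reveals the prize
    3: Fraction(1),      # door 2 is the only door he can open
}

joint = {door: prior[door] * likelihood[door] for door in states}   # column 4
total = sum(joint.values())                                         # sum under column 4
posterior = {door: joint[door] / total for door in states}          # column 5

for door in states:
    print(door, prior[door], likelihood[door], joint[door], posterior[door])
print("sum of joint column:", total)
```

The posterior is 1/3 for door 1 and 2/3 for door 3, so it pays to switch.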

I haven't figured out yet how to produce such a table in the blog, when I do I'll add it.

Journals

I was really pleased with the journals. Two were short of three full pages, please watch this. But the journals this week were really thoughtful and had some good ideas. Keep it up!

Thursday, September 11, 2008

HCOL 195 Class Members, Post Your Comments/Questions

OK, folks, the blog is started.

You can click on the comments section of any blog entry and log in under your Google account (if you do not have one, you can sign up for one here), and submit comments and/or questions about any item.

I will check the blog page on a regular basis and respond to questions and comments; others are welcome and encouraged to do the same.

Please note that entries prior to September 1 are for a graduate class I taught a year ago, and have no relevance to our class. You are welcome to read them and comment on them, but they are quite advanced!

Bill

Monty Hall Problem

The basic thing to keep in mind is that initially, your chance of choosing the door with the prize is 1 in 3. This does not change when Monty shows you (as he must, under the rules) that one of the remaining doors has a goat. You knew that already, you just didn't know which door. So, if your door doesn't have the prize, then the one he didn't reveal must have the prize. Since there is a 1 in 3 chance that your door has the prize, there must be a 2 in 3 chance that the one he didn't open has the prize, and it pays to switch.

One way to think about it is to suppose that there are 100 doors, one of which has the prize. The chance that you initially choose the door with the prize is only 1 in 100. So, there is a 99 in 100 chance that one of those other doors has it. Monty knows which one it is, so he can open 98 doors and never reveal the prize. He will do this in 99 out of 100 games played, and in those 99 out of 100 games, the door he doesn't open will have the prize. In the 1 out of 100 games played where you chose the prize door, he'll open 98 doors at random, and the remaining door will have a goat. So in 99 out of 100 games played, it pays to switch.
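If the argument still feels slippery, a simulation can help. The sketch below plays many games under the standard rules (Monty knows where the prize is and opens every door except yours and one other, never revealing the prize). The function name, the number of games, and the random seed are arbitrary choices of mine.

```python
import random


def switch_win_rate(num_doors, games=20_000, seed=0):
    """Fraction of games won by always switching to the one door left closed."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        prize = rng.randrange(num_doors)
        pick = rng.randrange(num_doors)
        if pick == prize:
            # Monty leaves one of the goat doors closed, chosen at random.
            others = [d for d in range(num_doors) if d != pick]
            left_closed = rng.choice(others)
        else:
            # Monty must leave the prize door closed.
            left_closed = prize
        wins += left_closed == prize
    return wins / games


print(switch_win_rate(3))     # close to 2/3
print(switch_win_rate(100))   # close to 99/100
```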

I left the class with several variations of the problem to think about for class on Friday:

Ignorant Monty: Monty doesn't know where the prize is, and sometimes randomly opens the door with the prize. Is it an advantage to switch, to stay, or doesn't it matter?
Angelic Monty: Monty knows where the prize is. If you choose the door with the prize, he opens it and congratulates you. If you choose the wrong door, he opens a door without the prize and offers you the chance to switch. Is it an advantage to switch, to stay, or doesn't it matter?
Monty From Hell: Monty knows where the prize is. If you choose the door with the prize, he opens a door without the prize and offers you the chance to switch; but if you choose a door with a goat, he opens it and says, "too bad, you lose." Is it an advantage to switch, to stay, or doesn't it matter?
I didn't mention this, but we'll think about it as well. This is "Mixture Monty." Before coming on stage, Monty flips a coin. If it comes up heads, he behaves as Angelic Monty on stage. If it comes up tails, he behaves as Monty From Hell. Is it an advantage to switch, to stay, or doesn't it matter?

Comments on problem sets

These are the main rules I suggested for handling the problem sets.

Unless otherwise stated, all problem sets are to be done together in your study groups. I expect one paper to be turned in for each study group. If there are disagreements within the group, spell them out in that paper.
Be sure to do every part of every question.
Be sure to explain carefully how you got each answer. The reasoning process is crucial.
Do reality checks. Do the results seem reasonable? If they don't seem reasonable, say so, even if you don't know why!
Carry units and cancel them to make sure that the units of the answer are correct. In particular, be careful to distinguish length, area, and volume.
Use the metric system and powers-of-ten (scientific) notation; it's much more error resistant.

September 11, 2008 10:13 AM
