Friday, October 31, 2008
Bayes and the election
Andrew Gelman has posted an interesting article on the use of Bayesian methods to predict the election outcome here. It discusses the website fivethirtyeight.com, maintained by Nate Silver.
Wednesday, October 29, 2008
Class, 10/29
We started by discussing the homework. I emphasized that there will never be '+' signs separating the probabilities of the individual events in the likelihood. You will always get the likelihood by multiplying the probabilities of the individual events together (whatever they are). It was evident from several of the homeworks that the likelihood had been calculated incorrectly. In one case, enough Excel code had been given to me to know that a '+' sign had been used instead of a '*' sign. I do not know what happened in the other cases. In any case, any group whose total score was less than 36/40 may resubmit on Friday for partial additional credit.
The other problems were minor.
I did note that there is another, and maybe better way (other than in the problem statement) to get the answer to the first problem. That is simply to put prior probabilities on the states of nature, with half on the "null hypothesis" that the die is unbiased, and distributing the remainder among the alternatives. Then do the usual thing: prior*likelihood = joint, sum joint to get marginal likelihood under all hypotheses, divide that into the joint to get the posterior. And then, just look at the posterior probability p of the "null hypothesis". The odds on the null hypothesis are then p/(1-p). At least one group actually used this method, which made me proud!
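If you'd like to see that whole recipe in one place, here is a small Python sketch. The data (30 rolls, 10 of them sixes) and the alternative biases are invented for illustration — they are not the numbers from the actual problem:

```python
from math import comb

# Hypothetical data and states of nature (not the problem's actual numbers):
# 30 rolls of a die, 10 of them sixes; some "biased" alternatives for P(six).
sixes, rolls = 10, 30
states = {"fair": 1/6, "bias 0.25": 0.25, "bias 0.35": 0.35, "bias 0.45": 0.45}

# Prior: half on the null hypothesis (fair), remainder split among alternatives.
prior = {"fair": 0.5}
for s in states:
    if s != "fair":
        prior[s] = 0.5 / (len(states) - 1)

# The usual thing: prior * likelihood = joint; sum the joint to get the
# marginal likelihood; divide the joint by the marginal to get the posterior.
like = {s: comb(rolls, sixes) * p**sixes * (1 - p)**(rolls - sixes)
        for s, p in states.items()}
joint = {s: prior[s] * like[s] for s in states}
marginal = sum(joint.values())
posterior = {s: joint[s] / marginal for s in states}

p_null = posterior["fair"]
print(f"P(fair | data) = {p_null:.3f}, odds on fair = {p_null / (1 - p_null):.2f}")
```

With these invented data the posterior probability of the null hypothesis comes out well below 1/2, so the odds p/(1-p) are against the fair die.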
We then started on the practice problems.
Problem #1 is similar to the copyright problem we discussed in class. We expect a student to get answers right, because they are supposed to know the material. No one would suspect students, even if they got the answers right, because that's what's supposed to happen. But the mistakes (just like in the copyright problem) are the key. Mistakes should be made at random. So if one student copies another, s/he will copy the mistakes perfectly, but if not, each mistake will match only with probability 1/5 (since there are five possible answers). The two states of nature are Cheat and No Cheat. We discussed the prior and decided that on Cheat it might be 1/10. It could be larger or smaller, and arguments were given for each. The likelihood is 1 for Cheat and 0.2^7 ≈ 0.0000128 for No Cheat, since each coincidence has probability 0.2 and there are seven coincidences. We found that with our prior, the posterior probability of cheating is nearly 1, and the professor ought to take appropriate action.
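Here is that same calculation in a few lines of Python, using the class's numbers (prior 1/10 on Cheat, seven matching mistakes):

```python
# Likelihoods as discussed in class: a copier matches all seven mistakes for
# certain; an honest student matches each independently with probability 1/5.
prior_cheat = 0.10          # the class's choice; arguments were given both ways
like_cheat = 1.0
like_no_cheat = 0.2 ** 7    # about 0.0000128

joint_cheat = prior_cheat * like_cheat
joint_no = (1 - prior_cheat) * like_no_cheat
post_cheat = joint_cheat / (joint_cheat + joint_no)
print(f"P(Cheat | 7 matching mistakes) = {post_cheat:.6f}")
```

The posterior is so close to 1 that the conclusion is insensitive to the exact prior: even a prior of 1/1000 on Cheat would leave the posterior above 0.98.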
In Problem #2, the aim was not to actually calculate anything, but to explain how a calculation could be arranged. So the states of nature (SON) are the possible numbers of taxis, from 1 to (we decided) not more than 50,000. We discussed ways to set a prior. One would be to pick a probability, say 0.9 or 0.99, and raise it to the power of the number of taxis in the SON. A second was to use a straight-line ramp from 1 (highest) to 50,000 (lowest). A third was to use something like 1/N, where N is the number of taxis in the SON. Whichever method we use, we just write down the numbers in an Excel spreadsheet, add them up, divide each by the sum, and enter the results as the normalized prior.
The likelihood is 0 if the number of taxis N is less than 150, the highest number observed (you can't see taxi number 37, for example, if there are only 36 taxis), and is (1/N)^7 for each SON where N is greater than 149. This is "sampling with replacement," so each taxi seen has probability 1/N of its number being observed, and the likelihood is the product of these over the seven observed taxis.
Then the usual: prior*likelihood= joint, sum the joint, etc....
Then you can decide what probability you want to attach to your answer for the number of taxis. If you want the probability to be at least 0.99, just sum down the posterior until the running total reaches 0.99. The number of taxis on that last line is your answer: with probability 0.99, there are no more taxis than that.
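For anyone who'd rather see the spreadsheet as code, here is a Python version of the whole arrangement. It assumes seven observed taxis with a highest number of 150, and uses the 1/N prior; the other two priors we discussed would just change the raw_prior line:

```python
# A sketch of the taxi spreadsheet, assuming seven taxis were observed
# and the highest number seen was 150 (so any N < 150 is impossible).
N_MAX = 50_000
states = range(1, N_MAX + 1)

# One of the priors discussed in class: proportional to 1/N, then normalized.
raw_prior = [1 / n for n in states]
total = sum(raw_prior)
prior = [x / total for x in raw_prior]

# Likelihood: 0 if N < 150, else (1/N)^7 (sampling with replacement, 7 taxis).
like = [0.0 if n < 150 else (1 / n) ** 7 for n in states]

# The usual: prior * likelihood = joint, sum the joint, divide.
joint = [p * l for p, l in zip(prior, like)]
marginal = sum(joint)
posterior = [j / marginal for j in joint]

# Sum down the posterior until the running total reaches 0.99.
running = 0.0
for n, post in zip(states, posterior):
    running += post
    if running >= 0.99:
        print(f"With probability 0.99 there are at most {n} taxis")
        break
```

Because the posterior falls off like 1/N^8, the 0.99 bound lands only a couple of hundred taxis above the highest number seen, even though the prior allowed up to 50,000.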
For the third problem, you have to start by keeping the two cases (standard drug and new drug) separate. For each of these, you want to compute the posterior probability that the particular drug cures the disease (r or s). This is just a standard calculation, like the one for the homework on Monday. The trick is what to do with this information to decide what the probability is that the new drug is better than the old one. We'll talk about this next time.
Tuesday, October 28, 2008
Another interesting article
The New York Times has an interesting article today on decision-making and how bad people are at it.
Monday, October 27, 2008
Class, 10/27
Today we discussed criminal trials from the juror's point of view. We decided, after some discussion, that the worst thing would be to convict someone who was actually innocent. We know from the Innocence Project that an unacceptably high proportion of people in prison are probably innocent. We set up a decision tree with branches Convict Innocent and Acquit Innocent (the worst and best outcomes) in a probability fork, with u being the probability of CI and (1-u) the probability of AI, and the "for certain" branch of the tree being Acquit Guilty. After some discussion we decided on something like u=0.01, which would mean that 99% of people sent to prison would actually be guilty (assuming we can evaluate that probability as a juror). With a loss of 0 for AI and 1000 for CI, we found that the loss for AG would have to be 10 to make us indifferent between the two branches.
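The arithmetic behind that indifference point is short enough to write out:

```python
# The tree from class: a probability fork with Convict-Innocent (loss 1000)
# at probability u and Acquit-Innocent (loss 0) at probability 1-u,
# compared against Acquit-Guilty for certain.
u = 0.01                    # the class's choice: 99% of convictions correct
loss_CI, loss_AI = 1000, 0

expected_loss_fork = u * loss_CI + (1 - u) * loss_AI
print(f"Expected loss of the probability fork: {expected_loss_fork}")

# Indifference means the certain Acquit-Guilty branch carries the same loss.
loss_AG = expected_loss_fork   # = 10, as we found in class
```

Changing u or the CI loss moves the AG loss proportionally, which is exactly why the seriousness of the punishment (discussed below) matters.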
We also discussed the case of Convict Guilty, and although some thought that AI would personally be better than CG (both correct decisions), this didn't seem to hold up when we replaced AG with CG in the decision tree we drew. CG for certain seemed better than CI with probability even as small as 0.001.
We also discussed whether the seriousness of the case and the harshness of the punishment should not also change our losses. Surely, some thought, the penalty for a traffic ticket is not as onerous a penalty as 20 years in prison for a serious crime, if the person accused were actually innocent, and the death penalty is even more unacceptable if the accused were actually innocent (even though Vermont doesn't have the death penalty, a recent Vermont jury did give the death penalty in a federal case, so it's not entirely moot even for Vermonters). So, the loss for CI ought to be larger if the penalty is more serious, some said. One student would never give the death penalty...for that student, the loss is effectively infinite.
The next several classes will be devoted to discussing the practice problems for the second test. We will pick up the juror discussion again after the test.
Sunday, October 26, 2008
Class, 10/24
We decided on November 7 as the date for the next test. A study guide can be found here, which we will be discussing next week and the following week. You'll get a paper copy on Monday.
Today we talked about juror's decisions. We didn't do any math, but we discussed the various options available to a juror, including the consequences of making the wrong decision (convict someone who is innocent, acquit someone who is guilty). We'll pick the discussion up again on Monday and try to quantify the results of our conversation on Friday.
Friday, October 24, 2008
An interesting short article
The New York Times published this interesting article on statistics, baseball and health care today.
Wednesday, October 22, 2008
Class, 10/22
OK, so today I revisited the insurance problem, but from the point of view of expected loss rather than expected utility. It's not any different, but I drew a picture on the blackboard showing that there is a range of insurance premiums between the minimum premium for which the insurance company would sell the policy (p=m/h, see the previous posting for definitions) and the maximum amount the homeowner would pay for it (p=loss(m)/loss(h)). The homeowner hopes that between these two limits, competition between insurance companies will give him or her a good deal on insurance.
We then discussed testing various hypotheses based on data observed.
First, we discussed testing a coin which may be fair or unfair.
We decided that a reasonable prior would be P(fair)=0.5 and P(unfair)=0.5.
But then, what does P(unfair) mean? If the coin is fair, it is supposed to be Heads with probability 0.5. But if it is unfair, what is the probability? We decided to split the probability up equally amongst the possibilities, which we chose to be 0.05, 0.15, 0.25, ..., 0.85, 0.95, with each possibility having prior probability 0.05 (after some discussion that reminded us that the total probability has to be 1, and we had already spent 0.5 on the "null hypothesis" that the coin is fair).
So we set up a "spreadsheet" calculation. We discussed how to actually do it if we were doing it with Excel.
We didn't actually do the calculation, but I will tell you that the result is: with 60 heads and 40 tails, the probability of obtaining a result that extreme or more extreme (60 or more heads, or 40 or fewer heads) is about 0.05, but the probability that the coin is fair, given that we have observed the data (60 heads and 40 tails), is about 0.5.
This is very interesting. The standard statistical test of statistical significance, how extreme the result is, is very different from what the Bayesian result is.
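If you want to check those two numbers yourself, here is a short Python calculation of both: the classical tail probability and the Bayesian posterior from our spreadsheet setup (prior 0.5 on fair, 0.05 on each of the ten alternatives):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, heads = 100, 60

# Classical side: probability of a result as extreme or more extreme
# than 60/40 (60 or more heads, or 40 or fewer heads).
p_value = (sum(binom_pmf(k, n, 0.5) for k in range(60, 101))
           + sum(binom_pmf(k, n, 0.5) for k in range(0, 41)))

# Bayesian side: prior 0.5 on fair, 0.05 on each alternative bias.
alternatives = [0.05 + 0.1 * i for i in range(10)]   # 0.05, 0.15, ..., 0.95
joint_fair = 0.5 * binom_pmf(heads, n, 0.5)
joint_alt = sum(0.05 * binom_pmf(heads, n, p) for p in alternatives)
post_fair = joint_fair / (joint_fair + joint_alt)

print(f"p-value = {p_value:.3f}, P(fair | data) = {post_fair:.2f}")
```

The two numbers really do come out near 0.05 and 0.5 respectively, even though both are computed from the very same data.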
We also discussed the problem of estimating the probability that an unknown proportion (here, the bias of the coin, or in the problem set, the cure rate of the new drug) is greater than some fixed value (say 0.2). The spreadsheet is the same except that there is no special picking out of 0.5; we crossed out that line and used prior probability 0.1 for each of the alternatives 0.05, 0.15, ..., 0.95. To determine the probability that the new drug is better than the old one, we just add the posterior probabilities for the states of nature that are greater than 0.2. (Again, we didn't do the actual calculation.)
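Here's what that crossed-out-line version looks like as code, with made-up data (4 cures in 10 patients — the problem set's actual numbers aren't repeated here):

```python
from math import comb

# Hypothetical data, for illustration only: the new drug cured 4 of 10 patients.
n, cures = 10, 4
states = [0.05 + 0.1 * i for i in range(10)]   # cure rates 0.05, 0.15, ..., 0.95
prior = 0.1                                     # flat: 0.1 on each state

# prior * likelihood = joint; sum; divide — the usual spreadsheet columns.
joint = {p: prior * comb(n, cures) * p**cures * (1 - p) ** (n - cures)
         for p in states}
marginal = sum(joint.values())
posterior = {p: j / marginal for p, j in joint.items()}

# P(cure rate > 0.2): add the posterior over the states greater than 0.2.
p_better = sum(post for p, post in posterior.items() if p > 0.2)
print(f"P(cure rate > 0.2 | data) = {p_better:.3f}")
```

With 4 cures in 10 the posterior puts almost all its weight above 0.2, as you'd expect.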
We finished with a challenge: how to decide, if you are on a jury, whether to convict or acquit a defendant in a criminal case. More precisely, what does "beyond a reasonable doubt" mean?
Tuesday, October 21, 2008
Class, 10/20
We talked about utilities. First we looked at the shapes of the curves that you all derived over the weekend. We learned that a straight line is neutral, a utility curve that curves up is risk-seeking and one that curves down is risk-averse.
We then discussed insurance on a house. We found that if h is the value of the house, m is the premium we pay for the insurance, and p is the probability of disaster (e.g., a fire burns the house down), then the insurance company will demand that p be less than m/h. On the other hand, the owner of the house (if her utility is neutral) will demand that p be greater than m/h, and no transaction can take place. But insurance is bought and sold, so there has to be an explanation for this. The explanation is that although insurance companies have a nearly neutral utility curve except for truly huge amounts, people do not: most people have risk-averse utility curves. They will demand that p be greater than u(-m)/u(-h), where u(-) means the value of the curve at the point in question (the quantities are negative because with both the premium and the potential catastrophe, the person ends up with fewer assets). But if you have a utility curve that curves down, the ratio u(-m)/u(-h) will be less than m/h, so the person will be willing to buy the insurance after all. Therefore, the insurance company can set a value of m that the consumer will be willing to pay and which will also give a profit to the company, thus keeping the stockholders happy.
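To make this concrete, here is a small numerical sketch with invented numbers: a $300,000 house, $400,000 in total assets, disaster probability 0.001, and u(w)=sqrt(w) standing in for a risk-averse (curving-down) utility on final wealth:

```python
from math import sqrt

# Invented numbers, for illustration only.
W, h, p = 400_000, 300_000, 0.001   # total assets, house value, disaster prob.
u = sqrt                             # a risk-averse utility on final wealth

# Break-even premium for a (risk-neutral) insurer: p * h.
fair_premium = p * h                 # $300

# Largest premium m the homeowner will pay: where the utility of insuring,
# u(W - m), still beats the expected utility of going uninsured.
uninsured = (1 - p) * u(W) + p * u(W - h)
m = 0.0
while u(W - m) > uninsured:
    m += 1.0
print(f"insurer's minimum: ${fair_premium:.0f}, homeowner's maximum: ${m:.0f}")
```

The homeowner's maximum comes out around $100 above the insurer's minimum, and that gap is exactly the room in which the transaction (and the insurer's profit) lives.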
I remarked that this is actually how all commerce works. There are two parties, a seller and a buyer. They are willing to make a transaction because their utility curves are different, and so it is a "win-win" situation where everyone, both the buyer and the seller, feel themselves better off (in terms of utilities) than they did before the transaction took place.
One student had remarked in class and in journals that this approach (decision theory) might not be adequate when considering lotteries, where there is a huge payoff of very low probability. Should someone wager to win the lottery, even if taxes and annuitization made it a positive payoff on expected return basis? My answer is, "Not Really." The reason is that we don't (or shouldn't) make decisions based on expected return. We should make decisions based on expected utility or expected loss. I posed the question, would you rather have $280 million with probability 1/2, or $10 million for sure. The overwhelming choice of the class was, take the $10 million. This means, that to most of the people in the class, having $280 million isn't that much better than having $10 million. This means that in the lottery problem, you probably won't want to use $280 million as the leaves on the ends of the decision tree. You probably will make as good a decision if you just put $10 million there. And if you did this, your decision would be just as rational, and would tell you that the lottery is not really a good place to invest your money (unless your only reward is the thrill of entering the lottery!) Final comment is that the student who raised this issue initially agreed that when utilities or losses were used as the payoff, then it would not be a problem.
I finally drew on the board a generally useful way to estimate utilities for any events whatsoever. I used the Monty Hall example of a car, a goat, and a trip to Hawaii. Presumably the car is the best and the goat is the worst, with the Hawaii trip in between. Draw a decision tree, put the car and the goat on the probability branches and the Hawaii trip on the "get for certain" branch. Then, pick a probability for getting the car that makes you neutral between the two branches of the decision tree. There should be a point where you are neutral, for if the probability of getting the car is 1, you'd take the car for sure, but if the probability of getting the car is 0, you'd pick the Hawaii trip for sure. One student volunteered p=0.8. That means that her utility for the Hawaii trip is 0.8, since at that point, both branches of the decision tree have exactly the same value.
I remarked finally that if you use this method for evaluating utilities, then the utilities so calculated are actually probabilities!
Saturday, October 18, 2008
Class, 10/17
Today we spent most of the class discussing the lottery problem I left you with last time.
What we need to compute is the probability that no one wins the lottery, the probability that exactly one person wins the lottery, exactly two people, and so forth.
After some discussion we decided that the probability that no one wins is the probability that the first person loses AND the second person loses AND ... AND the last person loses. Since these are independent events, we multiply the probabilities of each of these events (AND always means multiply the probabilities). If I write w=1/80,000,000 for the probability that a given person wins the lottery, then the probability that that person loses is (1-w). The probability that everyone loses is (1-w)^N, where N=200,000,000 is the number of tickets sold. Although that looks terrible to compute, a hand calculator correctly computed this number to be 0.082.
To get the probability that exactly one person wins, we decided that it is equal to the probability that (the first person wins AND all the others lose) OR (the second person wins AND all the others lose) OR ... OR (the last person wins AND all the others lose). The AND means multiplication, and the OR means adding probabilities. The probability that one specified person wins AND all the others lose is w*(1-w)^(N-1), which is hardly different from w*(1-w)^N since the extra factor of (1-w) is very, very close to 1. But there are N tickets, so the OR means we add this number to itself N times, and the probability that exactly one person wins, with one of the N tickets, is (Nw)*0.082, or 0.205.
For two people we follow the same principle: we compute the probability that (the first person wins AND exactly one of the other people wins AND all the others lose) OR (the second person wins AND exactly one of the other people wins AND all the others lose) OR ... OR (the last person wins AND exactly one of the other people wins AND all the others lose). We just computed the probability that one of the other people wins AND all the others lose: it's 0.205. And the probability that a particular person wins is still w. There are N identical numbers OR'ed together, so we multiply by N again, getting (Nw)*0.205. However, there is a little complication, because if you look at the first two terms above, both contain a piece that comes from the first person winning AND the second person winning, and a similar thing can be said about any pair of terms. So each pair of people appears twice in the sum and is counted twice as much as it should be. The probability we want therefore has to be divided by 2, and the answer we need is: the probability that exactly two people win is (Nw)*0.205/2=0.257.
In a similar way, the probability that exactly three people win can be computed as (Nw)*0.257/3=0.214; similarly to the case of two people, we note that each triple of tickets gets counted three times, so we have to divide by 3. And so forth for the case of exactly four, five, six and so on. Once you get to seven people, there's less than a 1% chance that that many people will win.
Now we can compute the expected value of a ticket. Your probability of winning is w. If no one else wins (probability 0.082), then you would win $280M. If exactly one other person wins (probability 0.205), then you would win $140M. And so forth. Adding it all up, the expected value of a ticket is $1.29. Compare the rougher estimate from Wednesday: dividing the $280M jackpot by the expected 2.5 winners (2.5 = 200M tickets times the 1/80M chance per ticket) gives a $112M share, which multiplied by w is about $1.40 per ticket.
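The whole computation fits in a few lines of Python. Note the recursion for "exactly k winners" is just the pattern from the last two paragraphs: multiply by Nw, then divide by k because each k-set of winners is counted k times over:

```python
w = 1 / 80_000_000          # probability one ticket wins
N = 200_000_000             # tickets sold
jackpot = 280_000_000
Nw = N * w                  # expected number of winners = 2.5

# P(no one wins) = (1-w)^N; then the class's recursion:
# P(k winners) = (Nw) * P(k-1 winners) / k.
probs = [(1 - w) ** N]
for k in range(1, 15):
    probs.append(Nw * probs[k - 1] / k)

# 0.082, 0.205, 0.257, 0.214, ... as computed in class.
print([round(p, 3) for p in probs[:4]])

# Expected value of your ticket: you win with probability w and share the
# jackpot with the other winners (probs doubles as the distribution of
# *other* winners here, since N is so large that one ticket changes nothing).
value = w * sum(p * jackpot / (k + 1) for k, p in enumerate(probs))
print(f"expected value of a ticket: ${value:.2f}")
```

Truncating the sum at 14 winners loses essentially nothing, since (as noted below for seven winners) the probabilities become negligible well before that.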
But still, taxes and the fact that you can't take home the entire jackpot if you want it all at once means that it is not worth it (from an expected value point of view). The only reason to buy a ticket is for the fun of it.
I gave out some worksheets for estimating your utility function for money. You should work them out this weekend. We'll discuss them on Monday.
What we need to compute is the probability that no one wins the lottery, the probability that exactly one person wins the lottery, exactly two people, and so forth.
After some discussion we decided that the probability that no one wins is the probability that the first person loses AND that the second person loses AND....AND that the last person loses. Since these are independent events, we need to multiply the probabilities of each of these events (AND always means multiply the probabilities). If I write w=1/80,000,000, the probability that a given person wins the lottery, then the probability that that person loses is (1-w). The probability that everyon loses is (1-w)N, where N=200,000,000 is the number of tickets sold. Although that looks terrible to compute, actually a hand calculator correctly computed this number to be 0.082.
To get the probability that exactly one person wins, we decided that it is equal to the probability that (the first person wins AND all the others lose) OR (the second person wins AND all the others lose) OR...OR the last person wins ane all the others lose). The AND means multiplication, and the OR means adding probabilities. To get the probability that one specified person wins AND all the others lose, this is equal to w*(1-w)N-1 which is hardly different from w*(1-w)N since the extra factor of (1-w) is very, very clost to 1. But there are N tickets, so the OR means we add this number to itself N times and the probability that exactly one person wins, one of the N tickets, is (Nw)*0.082 or 0.205.
For two people we follow the same principle: We compute probability that (the first person wins AND the probability that exactly one of the other people wins AND that all the other people lose) OR (the second person wins AND exactly one of the other people wins AND that all the others lose) OR...OR (the last person wins AND exactly one of the other people wins AND that all the others lose). We just computed the probability that one of the other people wins AND that all the others lose, it's 0.208. And the probability that a particular person wins is still w. And, there are N identical numbers that are OR'ed together, so we have to multiply by N again, getting (Nw)*0.208. However, there is a little complication, because if you look at the first two numbers above, in both of them there is a piece that comes from the first person winning AND the second person winning. And a similar thing can be said about any pair of items above. So what has happened is that each pair of people appears twice in the sum and is therefore counted twice as much as it should be. So the probability we want has to be divided by 2, and the answer we need is: The probability that exactly two people win is (Nw)*0.208/2=0.257.
In a similar way, the probability that exactly three people win is (Nw)*0.257/3 = 0.214; as in the case of two people, each triple of winners gets counted three times, so we divide by 3. And so forth for the cases of exactly four, five, six winners, and so on. Once you get to seven people, there's less than a 1% chance that that many will win.
Now we can compute the expected value of a ticket. Your probability of winning is w. If no one else wins (probability 0.082) you would win the full $280M; if exactly one other person wins (probability 0.205) you would win $140M; and so forth. Adding it all up, the expected value of a ticket is $1.29. This is a bit lower than our earlier back-of-the-envelope figure of about $1.40, which came from dividing the $280M jackpot by the expected 2.5 winners (2.5 = 200M tickets sold divided by the 80M-to-1 odds) and multiplying by the 1-in-80M chance of winning.
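Here is a short Python check (again, just my tool of choice) of the whole chain: the recursion P(k) = P(k-1)*(Nw)/k for the number of winners, and the expected-value sum in which the jackpot is split among k+1 winners:

```python
w = 1 / 80_000_000        # chance that one ticket wins
N = 200_000_000           # tickets sold
jackpot = 280_000_000
lam = N * w               # expected number of winners, 2.5

# P(exactly k winners): start from P(0) = (1-w)^N and apply the
# recursion P(k) = P(k-1) * (N*w) / k, dividing by k for the overcounting.
p = [(1 - w) ** N]
for k in range(1, 20):
    p.append(p[-1] * lam / k)

# Expected value of your ticket: you win with probability w, and if k
# OTHER people also win, the jackpot is split k+1 ways.
ev = w * sum(p[k] * jackpot / (k + 1) for k in range(20))
print([round(x, 3) for x in p[:4]])  # [0.082, 0.205, 0.257, 0.214]
print(round(ev, 2))                  # about $1.29
```

(These p's are, in fact, the Poisson probabilities with mean 2.5.)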
But still, taxes and the fact that you can't take home the entire jackpot if you want it all at once means that it is not worth it (from an expected value point of view). The only reason to buy a ticket is for the fun of it.
I gave out some worksheets for estimating your utility function for money. You should work them out this weekend. We'll discuss them on Monday.
Thursday, October 16, 2008
Decision tree diagram
Here is the photo I took on Wednesday. The quality isn't great, but you should be able to copy it to your clipboard and look at it in more detail. In fact, I just clicked on it (double click on a Mac, I don't know what you do with a PC) and Firefox presented it in a separate window, and the numbers were easily readable.

In addition to the decision tree we discussed the PowerBall lottery. p=1/80,000,000 to win, 200,000,000 tickets sold, $280,000,000 jackpot. The question is, does it pay (in an expected return sense) to enter? We immediately noticed that there might be more than one winner, and we estimated roughly 2.5 winners on average. This makes a ticket worth $1.41, so at first sight it appears to be a good idea to enter. But there are several problems, which we uncovered on further discussion. One is taxes: You would be taxed at the highest bracket, which is in the 35-39% range (depending on the tax law), as well as Vermont income tax. Also, you don't get the money all at once, but in installments over 20 years. To get the money at once, you have to take a discount of about 50% (since the way the lottery works, the state buys you an annuity that pays out over 20 years, and you would only get the amount that they would pay the insurance company to buy the annuity). Thus, it seems that it isn't worthwhile after all.
I left you with the problem of trying to get a more precise estimate of the expected return, considering the probability that 1, 2, 3,... more winners will win the lottery other than you.

Tuesday, October 14, 2008
Class, 10/13
We went through the test, and I won't repeat what we talked about except to note several things that I want to emphasize.
1) On the Fermi problems, several tips. Don't try to be too fancy, as in trying to estimate low-, middle- and high-income populations and housing prices and rolling them together to get an average. Better is to estimate something close to the median cost of a house and just multiply by the number of houses. Very few people have expensive houses, and trying to factor that information in isn't going to increase your accuracy. Also, don't forget to divide the population of the U.S. (300 million) by the average family size (around 4).
2) On the coin problem, the easiest way is to recognize that the method chosen (pick coin at random and flip) has an equal probability of seeing any particular side. Cross off the tails (5 instances) and there are 7 ways to get a head. Of these, 4 will have a head on the other side. If you use the "spreadsheet" method, recognize that there are three states of nature, HH, HT and TT, with prior probability of 2/6, 3/6 and 1/6, respectively. The same idea works if you use a tree...recall that the branches at the base of any probability tree will always be the states of nature and their prior probabilities. This is why it is important to start any analysis by identifying the distinct states of nature, and then their prior probabilities, no matter what method you use. Then the likelihoods are branches off the base branches, identified as the data observed and the probability of observing that data (H in this case) given that the corresponding state of nature is true.
3) Here the important thing to recognize is that the taxis continue to drive around, so it is sampling with replacement, and the three factors in the likelihood will not change from observation to observation since the number of taxis available does not change.
4) This one is probably best solved with the natural frequencies method. Take 1500 students in the group. 150 will have taken the drug and 1350 will not have taken it. Of those that took it, 147 will be caught by the test and 3 will escape detection. Of those that did not take it, 40.5 will be falsely caught and 1309.5 will correctly be identified as not taking the drug. This gives the answer to the last part of the question (40.5), and by computing the ratio 147/(147+40.5)=0.78 we get the probability that a student has taken the drug, given that he tests positive.
5) The table is dependent, since the entries in at least one cell do not equal the product of the marginal probabilities in the corresponding row and column. To get an independent table, just multiply those marginals and enter the product into the corresponding row and column.
6) The easiest way to do this one is to focus on the gains and losses, rather than the absolute amount that you get back at the end. Thus, the gain is $500 for the bond, and for the mutual fund it is 0.7*$9700*0.15-0.3*$9700*0.1=$727.50. However, you have to pay a commission out of this (-$300 tollgate), so you'll actually have an expected profit of $427.50. Since this is less than $500, you'll prefer the bond.
If you focus on how much you get back, you have to be careful, because on the mutual fund branch you will already have subtracted the commission so you don't want a tollgate or you'd be paying the commission twice.
Generally speaking, it's a lot easier to do these problems by focusing on the gain or loss rather than the amount you get back after a year.
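A quick script for problem 4's natural-frequency count; the rates in it (10% of students take the drug, 98% of users test positive, 3% of non-users falsely test positive) are the ones implied by the counts above:

```python
# Natural frequencies for the drug-testing problem, out of 1500 students.
students = 1500
users = 0.10 * students            # 150 took the drug
non_users = students - users       # 1350 did not
true_pos = 0.98 * users            # 147 correctly caught
false_pos = 0.03 * non_users       # 40.5 falsely caught
p_user_given_positive = true_pos / (true_pos + false_pos)
print(false_pos)                         # 40.5
print(round(p_user_given_positive, 2))   # 0.78
```

This reproduces the 40.5 falsely caught students and the 0.78 posterior probability from the solution.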
Wednesday, October 8, 2008
Class, 10/8
We finished discussing the study guide.
First we finished the "take balls out, mark them, and put them back" scenario for estimating the number of balls in the urn. The basic idea is to keep a mental picture of the number of marked balls that must be in each candidate urn at each sampling event. The probability of picking a particular ball (marked or unmarked) is the fraction of balls of that type in that urn. Calculate that for each urn (State of Nature) and multiply the likelihood by that fraction. Then go on to the next sampling event, remembering that the ball was marked, so the probabilities change at the next sampling event. Then, you know the routine: multiply prior times likelihood to get the joint; sum the joint; divide each joint probability by that sum to get the posterior; sum the posterior to verify that the sum is 1.
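As a sketch of that routine (with made-up data: three draws, the third of which turns up an already-marked ball), here it is in Python rather than the spreadsheet we used in class:

```python
# States of nature: the urn holds n balls, n = 4..10, equally likely a priori.
# We draw a ball, note whether it is marked ('M') or unmarked ('U'),
# mark it, and put it back before the next draw.
draws = ['U', 'U', 'M']
states = list(range(4, 11))
prior = {n: 1 / len(states) for n in states}

joint = {}
for n in states:
    marked = 0
    likelihood = 1.0
    for d in draws:
        if d == 'U':
            likelihood *= (n - marked) / n  # fraction of unmarked balls
            marked += 1                     # this ball is marked and returned
        else:
            likelihood *= marked / n        # fraction of marked balls
    joint[n] = prior[n] * likelihood        # prior * likelihood = joint

marginal = sum(joint.values())              # sum the joint
posterior = {n: joint[n] / marginal for n in states}
print(round(sum(posterior.values()), 6))    # 1.0 -- the final check
```

Seeing a marked ball after only two draws pulls the posterior toward the smaller urns, as it should.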
The same idea is operative for the balls marked by numbers.
As we discussed in class, there are really two possibilities for this. The problem in the study sheet is for sampling without replacement. This means that the number of balls changes after every sampling event. This would be appropriate if, for example, we were interested in figuring how many German tanks had been produced from the serial numbers of captured tanks (which are out of commission after capture).
We could have put the balls back in the urn after sampling them; that leads to a slightly different problem, where it may be possible to sample the same object more than once. For example, you might be an airplane-spotter: you want to estimate the number of airplanes owned by an airline, and you might know that the numbers on their tailfins are sequential. Since you might see the same airplane more than once, this would be sampling with replacement.
The difference between the two scenarios is this: with sampling with replacement, the number of items available to be sampled (airplanes, for example) doesn't change, so the denominator remains constant at the number of items originally in the urn. With sampling without replacement, the denominator decreases by 1 each time an item is sampled. Otherwise, the two cases are the same.
But these problems have basically the same structure as the "catch-and-release" problem with the unmarked balls that we mark. Identify the states of nature (the unknown number of items in the original set-up), put a prior on them, calculate the likelihood, by multiplying the probability of sampling each item in turn together (considering the particular method of sampling/marking/number displayed), compute the joint, calculate the sum, divide to get the posterior, check that the posterior sums to 1.
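The serial-number (tank) version follows the same recipe. Here is a sketch with invented data: captured serials 7, 3 and 9, and a uniform prior on N from 9 to 15:

```python
# Sampling WITHOUT replacement: each captured tank leaves the pool, so
# the denominator shrinks by one with each observation.  Any N smaller
# than the largest observed serial is impossible.
serials = [7, 3, 9]
states = list(range(max(serials), 16))   # N = 9, 10, ..., 15
prior = {n: 1 / len(states) for n in states}

joint = {}
for n in states:
    likelihood = 1.0
    pool = n
    for _ in serials:
        likelihood *= 1 / pool   # each remaining tank equally likely
        pool -= 1                # captured tanks are out of commission
    joint[n] = prior[n] * likelihood

marginal = sum(joint.values())
posterior = {n: joint[n] / marginal for n in states}
print(max(posterior, key=posterior.get))  # 9: the smallest consistent N
```

The smallest N consistent with the data always gets the most posterior weight here, because the likelihood 1/(N(N-1)(N-2)) falls as N grows.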
The next problem, ants and beetles, is exactly like the polling problem we discussed earlier. Instead of voters who say they will vote for candidate A or B, we have insects that we identify as ants or beetles. In both cases, the states of nature are all of the true proportions of each kind that exist in the population being sampled (voters, insects). Both of these assume that the number of items in the population is very large.
Finally, on the Decision Problem: The way we attacked this problem was to look at a simpler problem than the one in the study sheet: Should you just produce parts, or should you sacrifice one part, pay $150, and produce the remaining parts knowing that the machine would be in a "good" state and would produce a much higher proportion of "good" parts.
We set up a decision tree: First, a square box, representing the decision we had to make...they are: fix machine first and produce 23 parts; or just produce 24 parts, regardless.
On the first scenario, we set up a "toll gate" for -$150 on that branch. Since, to the right of that gate, we knew the machine to be in a "good" state, the expected return on that branch is 0.95*23*$2000, since bad parts aren't worth anything.
On the second scenario, there is no initial cost of $150 so no "toll gate." But we now have two branches, one with probability 0.9 in which the machine is "good", and one with probability 0.1 in which the machine is "bad". If the machine is "good," 0.95 of the parts will be useful. If it is "bad," only 0.7 of the parts will be useful. By tracing the probability tree backwards, the expected number of parts that will be useful is (0.9*0.95+0.1*0.7). This is multiplied by 24*$2000 to get the expected profit.
We found that the best scenario under these rules was to abandon caution and just produce parts. The extra part (24 instead of 23) produces more expected profit than the more reliable machine minus the cost of $150 to make sure the machine is OK.
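Worked out, the two branches of the tree compare like this (same numbers as above):

```python
# Branch 1: pay the $150 "toll gate," sacrifice one part, and produce
# 23 parts on a machine known to be good (95% of parts usable).
# Branch 2: just produce 24 parts; the machine is good with probability
# 0.9 (95% usable) and bad with probability 0.1 (70% usable).
price = 2000  # dollars per good part
fix_first = 0.95 * 23 * price - 150
just_produce = (0.9 * 0.95 + 0.1 * 0.7) * 24 * price
print(round(fix_first))     # 43550
print(round(just_produce))  # 44400
```

So producing 24 parts straight away wins by $850 in expectation.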
One student mentioned that there might be other circumstances, like the need to make a minimum profit. This is quite true, but it wasn't part of the assumptions of the problem. An example might be that if you don't make the minimum profit, someone might come over and break your legs. That's a different problem, but it can be analyzed by the tools we are developing. You just have to build that into the decision tree you build.
Don't forget: Bring your calculators, come early if you can, stay a little late if you can (but not later than 10:05 for the next class) and be sure to attempt to answer as many questions as you can. Budget your time. Make sure you convince me that you know how to answer a question even if you don't have time to do the complete calculation.
Monday, October 6, 2008
Interesting things in this week's journals
One person remarked on the volatility of the stock market, particularly as we are experiencing now. Generally speaking, the stock market is a risky proposition in the sense that over short periods of time it can be quite volatile; over the past several months it is down close to 30%. You should not put money into the stock market that you're going to need soon, say within the next five years; the market is where you put money you won't need for a decade or more. The second thing is diversification, which is best achieved by investing in mutual funds that represent a broad cross-section of the market. The third thing is to have a mix of stocks and fixed-income investments like bonds and money market funds, whose volatility is much less, even though their long-term potential for return is lower than with stocks. (Historically, stocks have returned on the order of 10% per year over long periods, although they can be down sharply in any given year. Bonds typically return a few percent over inflation.)
Another person also talked about the stock market, and mentioned recent volatility. I did mention that the recent volatility, although bad, is by no means a percentage record. A 700-point drop in one day is about 7%; there have been much larger percentage drops in history, although 700 points may be a record in points. In the end, only percentage changes reflect what is really happening.
Several people reported that they'd like more clarification of various points. There are many places where you can get this kind of response. Your journal is one; class is another, and you know by now that I'm happy to get your questions. Or you can talk to me out of class. Or you can ask questions by posting them as comments to this blog. Or you can send me email. I welcome all of these.
One person had read Lewis' book and asked about the Prisoner's Dilemma problem. This is of course a problem in game theory, which isn't really part of this course. But it is an interesting problem nonetheless, as it raises the question of whether there is a way for the prisoners to avoid falling into the jailors' trap, thus ending up with sentences that are more favorable to both. There are approaches that can do this, by embedding this particular game into a larger one. You might find information about this on WikiPedia or on the web.
One person did an analysis of all of the possible shooting orders for the Trewel problem, not just ABC but also all other permutations such as CBA, CAB, BAC, etc. In all of these situations, the result is that the best shooter has the best probability of surviving, and the worst shooter has the second best probability of surviving. It is Bob that has the lowest probability of making it out alive. Very interesting!
One person asked about Fermi problems, "If you are possibly starting with a completely wrong number, what's the point?" The point is that you often are in a situation where a decision must be made on imperfect knowledge, so you have to make such estimates. So, it is a good skill to perfect, and practice makes perfect. The more practice you have, the more skilled and confident you become, the better you'll do.
Another person asked about polls taken over a period of time: does each poll have its own bell-shaped curve? The answer is yes, each poll has some uncertainty and therefore its own bell-shaped curve. But people's opinions change over time, so we can't just average polls taken over several months to figure out what is going to happen next. The average of several polls is more likely to reflect what opinion was halfway through the polling period. More sophisticated approaches (something statisticians call "regression", which is the subject of a Burack lecture next Monday afternoon) would be required.
One person made a mathematical mistake, which I want to point out. If we have several probabilities expressed as percentages, e.g., 3% and 2%, you cannot multiply them to get the probability of a joint event as 6%. That's because, expressed as probabilities, these are 0.03 and 0.02, respectively, so the probability of the joint event is 0.0006, or 0.06%.
One person mentioned an interest in political science and polls, and I mentioned that Prof. Andrew Gelman at Columbia University has a blog to which he posts daily. He is a Bayesian statistician and a political scientist, author of an interesting book, "Red State, Blue State." Some of what he posts is advanced, but much is quite accessible to nonstatisticians. His blog can be found here. I read it every day.
A very interesting problem was posed by one person, who mentioned hanging out with a friend and finding, within a short distance of each other, two four-leaf clovers. If the probability of finding one four-leaf clover is 10⁻⁴ = 1/10,000, does this mean that the probability of finding two near each other is 10⁻⁸? Actually, it probably isn't, for several reasons. The first is basically a fallacy: that figure may be correct for any two people sitting together at random places around the earth, but if you find one four-leaf clover, all of a sudden your attention is drawn to a low-probability event that has already happened. So the probability that is really relevant, once you have found one, is P(find a second | found one), and that is at least 10⁻⁴. If you hadn't found a second one, the first one probably wouldn't have been written about. The same fallacy underlies the occasional news story about someone who has already won the lottery winning again. The probability that you win a second time, given that you won once, is the same as the probability that you win once (assuming independence). But the only reason the event made the news is the second win. It is a mistake to be very surprised that occasionally someone wins twice.
The other reason is that four-leaf clovers are (as the person mentioned) due to genetics, or to soil conditions, or to other external factors. That means it probably isn't the case that P(find a second one | found one) = 10⁻⁴. Because of these factors the two findings probably aren't independent, and P(find a second one | found one) may be much larger than P(find one). It's not unlikely that four-leaf clovers grow in proximity, that is, in clusters.
Another person asked about the formula sqrt(N*p*(1-p)) for the expected uncertainty in the number of heads in N coin flips, or in the number of voters voting for a candidate, where p is the true probability in the entire population. I pointed out that this formula isn't a part of the course, but was brought up to answer a question that was asked in class. You are not responsible for this formula, and I will not derive it. But one thing puzzled this person: the uncertainty is smaller the farther p is from 0.5. It really is true. One way to see this is to consider the case p = 0: the voters are unanimous in favoring candidate B, or the coin has two tails. There is no variation at all, so the formula evaluates to 0, as it should. As p moves away from zero, the variation increases, and by symmetry (after all, heads and tails are symmetric states, and voting for A and voting for B are similarly symmetric) it decreases again for values of p greater than 0.5.
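You can watch the formula behave exactly this way by tabulating it; here is a quick sketch for N = 100 flips (again, you're not responsible for this):

```python
import math

# Spread sqrt(N*p*(1-p)) in the number of heads out of N = 100 flips.
# It is 0 at p = 0 and p = 1, largest at p = 0.5, and symmetric about 0.5.
N = 100
spread = {p: math.sqrt(N * p * (1 - p))
          for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]}
for p, s in spread.items():
    print(p, round(s, 2))   # e.g. 0.5 -> 5.0, while 0.1 and 0.9 -> 3.0
```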
Another person also talked about the stock market, and mentioned recent volatility. I did mention that recent volatility, although bad, is by no means a percentage record. That is, a 700 point drop in one day is about 7%. There have been much larger percentage drops in history, although 700 points may be a point record. But really, only percentage changes actually reflect what's really happening.
Several people reported that they'd like more clarification of various points. There are many places where you can get this kind of response. Your journal is one; class is another, and you know by now that I'm happy to get your questions. Or you can talk to me out of class. Or you can ask questions by posting them as comments to this blog. Or you can send me email. I welcome all of these.
One person had read Lewis' book and asked about the Prisoner's Dilemma problem. This is of course a problem in game theory, which isn't really part of this course. But it is an interesting problem nonetheless, as it raises the question of whether there is a way for the prisoners to avoid falling into the jailors' trap, thus ending up with sentences that are more favorable to both. There are approaches that can do this, by embedding this particular game into a larger one. You might find information about this on WikiPedia or on the web.
One person did an analysis of all of the possible shooting orders for the Trewel problem, not just ABC but also all other permutations such as CBA, CAB, BAC, etc. In all of these situations, the result is that the best shooter has the best probability of surviving, and the worst shooter has the second best probability of surviving. It is Bob that has the lowest probability of making it out alive. Very interesting!
One person asked about Fermi problems, "If you are possibly starting with a completely wrong number, what's the point?" The point is that you often are in a situation where a decision must be made on imperfect knowledge, so you have to make such estimates. So, it is a good skill to perfect, and practice makes perfect. The more practice you have, the more skilled and confident you become, the better you'll do.
Another person asked about polls taken over a period of time and asked, does each poll have its own bell-shaped curve? The answer is yes, each poll has some uncertainty and therefore its own bell-shaped curve. Now peoples' opinions change over time, so we can't just average the polls over several months together to figure out what is going to happen next. The average of several polls is more likely to reflect what opinion was halfway through the polling period. More sophisticated approaches (something statisticians call "regresssion", which is the subject of a Burack lecture next Monday afternoon) would be required.
One person made a mathematical mistake, which I want to point out. If we have several probabilities expressed as percentages, e.g., 3% and 2%, you cannot multiply them to get the probability of a joint event as 6%. That's because, expressed as probabilities, these are 0.03 and 0.02, respectively, so the probability of the joint event is 0.0006, 0r 0.06%.
One person mentioned an interest in political science and polls, and I mentioned that Prof. Andrew Gelman at Columbia University has a blog to which he posts daily. He is a Bayesian statistician and a political scientist, author of an interesting book, "Red State, Blue State." Some of what he posts is advanced, but much is quite accessible to nonstatisticians. His blog can be found here. I read it every day.
A very interesting problem was posed by one person, who mentioned hanging out with a friend and finding, within a short distance of each other, two four-leaf clovers. If the probability of finding one four-leaf clover is 10^-4 = 1/10,000, does this mean that the probability of finding two near each other is 10^-8? Actually, it probably isn't, for several reasons. The first is basically a fallacy: the figure might be correct for any two people sitting at random places around the earth, but once you find one four-leaf clover, your attention is drawn to a low-probability event that has already happened. The probability that is really relevant, once you have found one, is P(find a second | found one), and that is at least 10^-4. If you hadn't found a second one, the first one probably wouldn't have been written about. The same fallacy underlies the occasional news story about someone who has already won the lottery winning again. The probability that you win a second time, given that you won once, is the same as the probability that you win once (assuming independence). The only reason the event made the news is the second win; it is a mistake to be very surprised that occasionally someone wins twice.
The other reason is that four-leaf clovers are (as the person mentioned) due to genetics, or to soil conditions, or to other external factors. That means the two finds probably aren't independent, so P(find a second one | found one) might be much larger than P(find one). It's not unlikely that four-leaf clovers grow in proximity, that is, in clusters.
Another person asked about the formula sqrt(N*p*(1-p)) for the expected uncertainty in the number of heads in N coin flips, or of voters voting for a candidate, where p is the true probability in the entire population. I pointed out that this formula isn't part of the course but was brought up to answer a question asked in class. You are not responsible for it, and I will not derive it. But one thing puzzled this person: the uncertainty is smaller the farther p is from 0.5. It really is true. One way to see this is to consider the case p = 0: the voters are unanimous in favoring candidate B, or the coin has two tails. There is no variation at all, so the formula evaluates to 0, as it should. As p moves away from zero the variation increases, reaching its maximum at p = 0.5, and by symmetry (heads and tails are symmetric states; voting for A and voting for B are similarly symmetric) it decreases again for values of p greater than 0.5.
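You can see the shape of this formula for yourself by evaluating it at a few values of p (a quick sketch; the choice N = 1000 is arbitrary):

```python
import math

def spread(N, p):
    """Expected uncertainty in the count out of N trials, each with success probability p."""
    return math.sqrt(N * p * (1 - p))

N = 1000
# The spread is 0 at p = 0 and p = 1 (no variation at all), is largest
# at p = 0.5, and is symmetric about 0.5.
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, spread(N, p))
```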
Class, 10/6
We continued studying the study guide for the quiz on Friday.
I had left you with the "three cards" problem. I brought in my trick coins and we determined that there are three states of nature, HH, HT and TT. We did the calculation in a spreadsheet format, taking a prior of 1/3 on each SON, and recognized that if we observe H, the likelihood of observing H is 1 if it is the HH coin but only 1/2 if it is the HT coin. So this yields a spreadsheet that is similar to the Monty Hall (standard) problem, and if we see a H, the posterior probability is 2/3 that the other side is also H.
We then finished the cancer problem. The probability that a member of the general population who tests positive does not have the gene is 495/593, or about 5/6, and 1/6 that this individual does have the gene. Part (3) of the question asks first, what's the probability that someone with a positive test gets the disease? That is
1/6*0.2+5/6*0.0002, or about 0.03333+0.00017=0.0335. Of those who get the disease, most have the gene; only 0.00017/0.0335, or about 0.005, do not. That's only 1/2 of 1 percent.
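The same step, written out in Python with the numbers from class:

```python
# P(gene | positive) = 1/6 and P(no gene | positive) = 5/6 from the earlier
# part of the problem; the disease rates given gene status are 0.2 and 0.0002.
p_gene, p_no_gene = 1/6, 5/6
p_disease = p_gene * 0.2 + p_no_gene * 0.0002
print(p_disease)                                  # about 0.0335

# fraction of the eventual disease cases that come from people without the gene
frac_without_gene = (p_no_gene * 0.0002) / p_disease
print(frac_without_gene)                          # about 0.005, half of one percent
```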
We discussed the galaxy problem. As many of you pointed out, it is exactly like the Shakespeare and Marlowe problem. The problem sets the prior at P(E)=0.8, P(S)=0.2; the likelihood is gotten by raising the probability for each type of object to a power equal to the number of that type of object that the machine found, and multiplying the values together for the three types of object. We found (using a spreadsheet calculation) that the posterior probability was about equal for E and S after this evidence.
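The mechanics of that likelihood calculation look like this in Python. The per-object probabilities and counts below are invented stand-ins, not the numbers from the problem sheet; the point is the raise-to-the-count-then-multiply structure:

```python
# Galaxy-problem mechanics with made-up numbers.
prior = {'E': 0.8, 'S': 0.2}
p_obj = {'E': (0.7, 0.2, 0.1),    # P(each object type | elliptical) -- assumed
         'S': (0.3, 0.4, 0.3)}    # P(each object type | spiral)     -- assumed
counts = (5, 3, 2)                # objects of each type the machine found -- assumed

likelihood = {}
for son in prior:
    L = 1.0
    for p, n in zip(p_obj[son], counts):
        L *= p ** n               # raise to the count, multiply across the types
    likelihood[son] = L

joint = {son: prior[son] * likelihood[son] for son in prior}
marginal = sum(joint.values())
posterior = {son: joint[son] / marginal for son in joint}
print(posterior)
```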
On the plagiarism problem, a question was asked about what the meaning of the code is. We discussed how mathematical tables are constructed: You calculate the number to more significant digits than you plan to publish, and round up or down according to the digit that follows the one you plan to publish. If that following digit is a '5', you have a choice to round up or down. By flipping a coin you can round up or down in a random way such that it is unlikely to be duplicated by someone else independently putting together a table. So in this way you embed a secret code, known only to you, into the table by the rounding pattern. We calculated that if the prior probability is 1/2 for plagiarism vs. accidental agreement (no cheating), then the posterior probability is about 10^-30 that no cheating was involved if the code is duplicated exactly.
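Here is a rough reconstruction of that calculation in Python. The number of coin-flip roundings, 100, is an assumption on my part (it happens to reproduce a posterior of about 10^-30), not a figure from the problem:

```python
# Assumption: the table's secret code consists of about 100 coin-flip
# roundings, each matched by an independent table-maker with probability 1/2.
n_flips = 100
prior_cheat = prior_honest = 0.5   # equal priors, as in the problem
like_cheat  = 1.0                  # a copier reproduces every rounding exactly
like_honest = 0.5 ** n_flips       # chance agreement on all the flips

marginal = prior_cheat * like_cheat + prior_honest * like_honest
posterior_honest = prior_honest * like_honest / marginal
print(posterior_honest)            # on the order of 10**-30
```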
We discussed the reason for choosing equal priors: the law says that in civil cases the side with the preponderance of evidence wins the case. That is, anything more than 50%.
I also mentioned that this technique is used to prevent plagiarism in other cases, e.g., map making, by putting small but innocuous mistakes in a map. Also, mistakes in the genome from generation to generation can be used as a "clock" to tell how far back in time two present-day organisms had a common ancestor, as well as the degree of relationship between a number of organisms, for example, how closely related human beings from various parts of the world are when traced back in time.
Finally, we got most of the way through the first urn problem. We decided that there are 10 states of nature corresponding to there being 1, 2,...,10 balls in the urn. The probability that the first ball is unmarked is of course 1, independent of the SON. The second ball is unmarked, but here the probability of that given the SON is 0 if the urn contains only one ball (because then the only ball in the urn is marked), 1/2 in the case of 2 balls in the urn, 2/3 in the case of 3 balls in the urn, and so forth. The third ball was marked, and now there are two marked balls in the urn so we're picking one that we'd already popped back in. Now for SONs 2, 3, ..., 10 the probability of picking a marked ball is 2/2, 2/3, 2/4, ..., 2/10. That is where we left it. We'll return to this problem on Wednesday.
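Here is where we left off, in spreadsheet form. The last two lines complete the usual prior-times-likelihood calculation, which we will walk through together on Wednesday:

```python
# Urn problem: n balls in the urn (n = 1..10); each drawn ball is marked and
# returned. Observed so far: unmarked, unmarked, marked.
sons = range(1, 11)
prior = {n: 1 / 10 for n in sons}   # equal prior on each state of nature

def likelihood(n):
    p1 = 1.0             # first draw unmarked: nothing is marked yet
    p2 = (n - 1) / n     # second draw unmarked: one ball is now marked
    p3 = 2 / n           # third draw marked: two balls are now marked
    return p1 * p2 * p3  # (for n = 1 the likelihood is already 0)

joint = {n: prior[n] * likelihood(n) for n in sons}
marginal = sum(joint.values())
posterior = {n: joint[n] / marginal for n in joint}
for n in sons:
    print(n, round(posterior[n], 3))
```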
I mentioned that this sort of thing is used, e.g., by biologists who catch fish, tag them, and release them, then after the population has had a chance to mix up, catching a sample again and seeing what proportion of the fish caught the second time are tagged. This can be used to estimate the size of a population of fish in a lake.
Friday, October 3, 2008
Class, 10/3
We passed over the Fermi problem bullet after I re-explained the geometric mean method.
We reminded everyone of the basic equation of conditional probability that underlies everything we are doing: P(A,B)=P(A|B)P(B)=P(B|A)P(A). We talked about three equivalent characterizations of independence: a distribution is independent if and only if P(A|B)=P(A) for every A and B; equivalently, if P(A,B)=P(A)P(B) for every A and B; and if P(A|B)=P(A) for every A and B, then necessarily P(B|A)=P(B) for every A and B as well.
We then showed how to construct the unique independent table of joint probabilities when we are given the marginal probabilities: just multiply the marginal in a row by the marginal in a column and put the result in the corresponding cell.
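In code, the construction is one line per cell. The marginals below are example values, not ones from class:

```python
# Independent joint table from marginals: each cell is row times column.
row_marginals = (0.3, 0.7)        # example values (must sum to 1)
col_marginals = (0.2, 0.5, 0.3)   # example values (must sum to 1)

joint = [[r * c for c in col_marginals] for r in row_marginals]
for row in joint:
    print(row)
# The cells automatically sum to 1, and every cell equals its row
# marginal times its column marginal -- the definition of independence.
```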
We then took a table of independent probabilities and changed four cells forming a square, adding an arbitrary number to the two cells on one diagonal and subtracting the same number from the two cells on the other diagonal. The result is a table where the probabilities are not independent, even though the marginals are unchanged.
We discussed the Monty Hall problem and variants. We found that if there are four doors, and Regular Monty opens two of them, each showing a goat, then the probability of getting the prize goes from 1/4 to 3/4 if we switch. We found that it goes from 1/4 to 3/8 if Monty opens one door and we switch to one of the others. We did this by a spreadsheet calculation. We then thought of a simpler way: Since your probability of initially picking the right door is 1/4, the probability that one of the other doors has the prize is 3/4. That doesn't change when Monty opens one of them and shows you a goat. So, since there are two doors left, the probability that you'll get the right one if you switch is 1/2 times 3/4, or 3/8.
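The simpler argument translates directly into code:

```python
# Four-door Monty Hall, using the simple argument from class. Your first
# pick is right with probability 1/4, so the other doors share 3/4.
p_first_pick = 1 / 4

# Monty opens two goat doors: one other door is left, carrying all of the 3/4.
p_switch_after_two = 1 - p_first_pick        # 3/4

# Monty opens one goat door: two other doors share the 3/4 equally.
p_switch_after_one = (1 - p_first_pick) / 2  # 3/8

print(p_switch_after_two, p_switch_after_one)
```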
I left you with another related problem to think about: There are three cards which have been made by pasting together two cards so that the backs are visible. One has two red backs, one has a red back and a blue back, and one has two blue backs. The cards are put in a hat and shaken, and you pick one out, looking at only one back. It is red. What is the probability that the other side is red?
We'll discuss that next time.
We went on to the cancer problem. We took a population of 10000 individuals. The problem statement says that 1% of the population has the gene, so that's 100 who have the gene and 9900 who don't. Of those who have the gene, 98 will be detected by the test and 2 missed (false negatives). Of those who don't have the gene, the test will falsely identify 5% as having the gene, or 495 in all (false positives), and will correctly say that the remaining 9405 do not have the gene. Looking at just the positives, there are 98 true positives and 495 false positives, so that the probability that a person has the gene if they test positive is only 98/593, or about 1/6; the remaining 495/593, or about 5/6, do not have the gene.
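The whole count can be done in integers, exactly as on the board:

```python
# The cancer-test counts, in whole numbers, for a population of 10,000.
population   = 10_000
with_gene    = population * 1 // 100              # 1% carry the gene -> 100
without_gene = population - with_gene             # 9,900

true_positives  = with_gene * 98 // 100           # 98 carriers detected
false_negatives = with_gene - true_positives      # 2 carriers missed
false_positives = without_gene * 5 // 100         # 495 non-carriers flagged
true_negatives  = without_gene - false_positives  # 9,405 correctly cleared

positives = true_positives + false_positives      # 593 positive tests in all
print(true_positives / positives)                 # 98/593, about 1/6
```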
We ran out of time here and will continue on Monday, finishing this problem and then going on in the study guide.
In answer to a question, I stated that if there is an item that we don't get to in our review, then similar items will not appear on the test. I also stated that I expected there to be five questions on the test, and that there should be enough time for everyone to do all of them. I pointed out that usual test-taking strategy says to go for the easy ones first and save the bulk of the time for the harder ones. I also said that it is very important to at least try to answer every question, since I cannot give credit if an item goes completely unanswered. We agreed that people who come early (not earlier than 10 AM, please) could start early, and that you may be able to stay an extra 5 minutes (but not more, because of the class that meets next in this room) to finish.
Be sure to bring your calculators. I do not have a loaner calculator!
Class, 10/1
We discussed the fourth problem set, which was done pretty well by you all. I pointed out several errors that were made:
One group forgot, on the second problem, that three different widgets were sampled independently, so that the likelihood had three factors in it, not one.
One group didn't recognize that the third problem had only two states of nature, that is, whether it is Urn #1 or Urn #2. Somehow this group ended up with five states of nature, that is, 1R, 1W, 2R, 2W and 2B, where the number is the urn number and the letter the color. The point is that the thing you don't know and want to learn is always the way to determine what the states of nature are. Here, what we don't know is which urn we've picked, so that tells us what the states of nature are.
One group got the states of nature right, but in the third ball selection forgot that it is the number of balls in the urn when the ball is picked that gives the denominator. True, this is a step made without replacement, but since the first two steps all involved returning the ball (that is, with replacement), there are still ten balls in the urn when the third ball is picked.
On the last problem, one group computed correctly the contribution to the likelihood for each word, but then added them instead of multiplying to get the likelihood. Since the likelihood is the probability that we got 3 of the first word AND 5 of the second AND 3 of the third, you have to multiply. When you compute the probability of one thing AND another thing, you always multiply. Addition is for when you want the probability of one thing OR another thing. For example, when you add the joint probabilities in a spreadsheet calculation, you are computing the probability of (data,SON1) OR (data,SON2) OR ..., to get the probability of the data, regardless of which SON is true.
We then finished the "Trewel" problem and calculated that the best thing for Alan to do is to shoot in the air, letting Bob and Charlie duke it out, and then, with one of them dead for sure, to come in on his second try and try to kill the survivor. We also recalculated the probability of Alan eventually killing Bob when Alan goes first. See Class, 9/29 for a calculation.
Finally, we discussed the first item on the study sheet, Fermi problems. We discussed the length of the Nile river...and I reminded everyone about the geometric mean trick. If you can put a reasonable lower bound on a quantity and a reasonable upper bound, so that you are pretty sure that the true value is between those bounds, then a decent guess at the correct value is to multiply the lower bound by the upper bound and take the square root of that number. For the Nile, a lower bound might be 100 miles and an upper bound 10,000 miles, which would give an estimate of 1000 miles. Wikipedia says 4100 miles, so this is not a great estimate. For the Mississippi, we imagined the U.S. as a box 3000 miles wide and 2000 miles high, so a length of 2000 miles. The actual length is 2340 miles, so that worked better.
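The geometric mean trick is one line of code, shown here with the Nile bounds from class:

```python
import math

def fermi_guess(lower, upper):
    """Fermi estimate: geometric mean of a confident lower and upper bound."""
    return math.sqrt(lower * upper)

print(fermi_guess(100, 10_000))   # Nile guess: 1000 miles (actual about 4,100)
```

Note that the geometric mean, unlike the ordinary average, treats being off by a factor of 10 on either side symmetrically, which is why it suits bounds that span orders of magnitude.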
I pointed out that the important thing with regard to Fermi problems is how you got the answer, not the actual value of the answer.