We spent some time discussing the catch-and-release problem and similar ones. I made several points:
1) Since the likelihood is the probability of data point 1 AND data point 2 AND ...., you must always MULTIPLY the probabilities of the individual data points to get the likelihood of the entire data set. NEVER add them!
2) In this problem, as we draw samples (fish), the total number of fish in the lake and the total number of fish of each type (tagged, untagged) decrease by 1 each time a fish is caught. This means we are sampling without replacement. It also means that each time we catch a tagged fish, both the numerator (tagged fish remaining) and the denominator (total fish remaining) decrease by 1 for the next tagged fish we catch. So, for example, with 100 fish in the lake, 10 of them tagged, the first tagged fish we catch has a probability of 10/100, the second a probability of 9/99, the third 8/98, etc. After catching the five tagged fish, there are 95 fish left, 90 of them untagged. So the first untagged fish we catch has probability 90/95, the second 89/94, and so forth. (A Python sketch of the whole computation follows this list.)
3) It doesn't matter in what order the fish are caught; the likelihood will be the same. So you might as well treat all of the first kind first and then handle those of the remaining kind.
4) After computing the rest of the table to get the posterior, you can add up the posterior probabilities for intervals in the number of fish, e.g., to get the probability that the total number of fish is between 15 and 25 (inclusive), just add the posterior probabilities for each of those numbers of fish.
5) If the number of items (fish, voters) is very large, you can approximate all of the ratios by the same number: just pretend that the number of fish in the lake and the number of each kind don't change as more fish are caught. The error committed will be quite small in this case. What you are doing is approximating sampling without replacement by sampling with replacement.
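To make points 2-5 concrete, here is a minimal sketch of the whole table in Python. The numbers (10 tagged fish, a catch of 8 of which 3 are tagged, candidate lake sizes from 15 to 50) are made up for illustration; the homework problem's numbers may differ.

from fractions import Fraction

# Hypothetical numbers for illustration only:
TAGGED = 10          # fish tagged and released earlier
CAUGHT = 8           # fish caught in the second sample
CAUGHT_TAGGED = 3    # of those, how many carried tags
candidates = range(15, 51)   # candidate values of N, the total number of fish in the lake

def likelihood(N, tagged=TAGGED, k=CAUGHT_TAGGED, n=CAUGHT):
    # P(data | N): multiply the probability of each fish caught (point 1),
    # sampling without replacement (point 2), tagged fish first (point 3).
    if tagged > N or (N - tagged) < (n - k):
        return Fraction(0)
    L = Fraction(1)
    total = N
    for i in range(k):                       # tagged: tagged/N, (tagged-1)/(N-1), ...
        L *= Fraction(tagged - i, total)
        total -= 1
    for i in range(n - k):                   # untagged: (N-tagged)/(N-k), (N-tagged-1)/(N-k-1), ...
        L *= Fraction(N - tagged - i, total)
        total -= 1
    return L

prior = {N: Fraction(1, len(candidates)) for N in candidates}    # uniform prior over N
joint = {N: prior[N] * likelihood(N) for N in candidates}        # prior times likelihood
evidence = sum(joint.values())
posterior = {N: joint[N] / evidence for N in candidates}

# Point 4: P(total number of fish is between 15 and 25, inclusive)
print(float(sum(posterior[N] for N in range(15, 26))))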
We talked about the astrology problem. The states of nature are the numbers p = 0.05, 0.15, ..., 0.95, the way we are setting up the problem. The prior could be uniform, but if your experience is that astrology is probably bunk, you might want to skew the prior toward smaller numbers; or, if your experience is that it works, you might want to skew it toward larger numbers. This is not cheating; it is using information from your past experience.
The likelihood is p^4(1-p)^7 for each of the values of p. The rest of the table is filled out as usual. Then, the answer to the question (the probability that the astrologer is able to predict the future at least 85% of the time) is the sum of the two posterior probabilities for the values p = 0.85 and p = 0.95.
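A short sketch of that table in Python, with a uniform prior (skew it if your experience says otherwise); the p^4(1-p)^7 likelihood and the final sum over p = 0.85 and 0.95 are exactly as described above.

# States of nature: p = 0.05, 0.15, ..., 0.95
ps = [0.05 + 0.10 * i for i in range(10)]
prior = [1.0 / len(ps)] * len(ps)                  # uniform prior
likelihood = [p**4 * (1 - p)**7 for p in ps]       # 4 correct predictions, 7 wrong
joint = [pr * L for pr, L in zip(prior, likelihood)]
posterior = [j / sum(joint) for j in joint]

# P(astrologer predicts correctly at least 85% of the time) = posterior at p = 0.85 plus posterior at p = 0.95
print(posterior[-2] + posterior[-1])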
We discussed the expert systems problem. The basic idea is that you can train a Bayesian system by, for example, telling it the symptoms observed and the diagnosis for a number of patients. This allows the system to estimate the conditional probabilities
p(symptom|diagnosis)
for a lot of symptoms and diagnoses. These can then be used as terms in the likelihood for a new patient whose symptoms are observed and entered into the system. What we've described in class is known as a naive Bayes classifier. It's also the basis of Bayesian spam filters and many other useful practical applications of Bayesian methods (including artificial intelligence systems).
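Here is a rough sketch of that training-and-classifying idea. The symptom and diagnosis names and the training records are invented, and the add-one smoothing is my own addition to avoid zero probabilities; the structure (estimate p(symptom|diagnosis) from past patients, then multiply those terms into the likelihood for a new patient) is the naive Bayes idea described above.

from collections import Counter, defaultdict

# Invented training records: (set of observed symptoms, diagnosis)
records = [
    ({"fever", "cough"}, "flu"),
    ({"fever", "rash"}, "measles"),
    ({"cough"}, "cold"),
    ({"fever", "cough", "ache"}, "flu"),
    ({"cough", "ache"}, "cold"),
]

diagnosis_counts = Counter(d for _, d in records)
symptom_counts = defaultdict(Counter)   # symptom_counts[d][s] = patients with diagnosis d showing symptom s
for symptoms, d in records:
    for s in symptoms:
        symptom_counts[d][s] += 1

def p_symptom_given_diagnosis(s, d):
    # Estimated from the training data, with add-one smoothing (my addition, not from class)
    return (symptom_counts[d][s] + 1) / (diagnosis_counts[d] + 2)

def classify(symptoms):
    # Naive Bayes: prior(diagnosis) times the product of p(symptom|diagnosis) over observed symptoms
    total = sum(diagnosis_counts.values())
    scores = {}
    for d in diagnosis_counts:
        score = diagnosis_counts[d] / total           # prior estimated from past patients
        for s in symptoms:
            score *= p_symptom_given_diagnosis(s, d)  # multiply, never add (point 1 above)
        scores[d] = score
    norm = sum(scores.values())
    return {d: sc / norm for d, sc in scores.items()} # posterior over diagnoses

print(classify({"fever", "cough"}))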
We started discussing basic decision problems by drawing the tree for the "general with two routes" problem we had discussed earlier; this time we postulated that the general might be risk-averse, in which case he would choose the 200 soldiers surviving for certain rather than the gamble between no soldiers surviving with probability 2/3 and all of them surviving with probability 1/3. Alternatively, a risk-seeking general would choose the gamble. We'll pick up on this on Wednesday.
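A tiny sketch of how a utility function captures the two attitudes. The total of 600 soldiers is my assumption (it makes the sure thing and the gamble have the same expected number of survivors); the lecture did not specify the total.

# Hypothetical total number of soldiers
N = 600

def expected_utility(u):
    safe = u(200)                          # 200 soldiers survive for certain
    gamble = (2/3) * u(0) + (1/3) * u(N)   # none survive w.p. 2/3, all survive w.p. 1/3
    return safe, gamble

print(expected_utility(lambda x: x))        # risk-neutral: the two routes tie (200 each)
print(expected_utility(lambda x: x ** 0.5)) # risk-averse (concave utility): the sure 200 wins
print(expected_utility(lambda x: x ** 2))   # risk-seeking (convex utility): the gamble wins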
Monday, November 3, 2008