Wednesday, October 8, 2008

Class, 10/8

We finished discussing the study guide.

First we finished the "take balls out, mark them, and put them back" scenario for estimating the number of balls in the urn. The basic idea here is to picture, for each candidate urn, how many marked balls it must contain at each sampling event. The probability of picking the particular kind of ball (marked or unmarked) is the fraction of balls of that type in that urn. Calculate that for each urn (State of Nature) and multiply the likelihood by that fraction. Then go on to the next sampling event, remembering that a newly marked ball changes the fractions for the next sampling event. Then, you know the routine: Multiply prior times likelihood to get the joint; sum the joint; divide each joint probability by that sum to get the posterior; sum the posterior to verify that the sum is 1.
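The routine above can be sketched in a few lines of Python. This is only an illustration, not the study-guide solution: the function name, the "U"/"M" coding of the draws, the candidate urn sizes, and the flat prior are all assumptions made for the example.

```python
from fractions import Fraction

def mark_recapture_posterior(draws, candidate_sizes):
    """Posterior over urn sizes for a sequence of draws.

    draws: string of 'U' (ball was unmarked; we mark it and put it back)
           and 'M' (ball was already marked; we put it back).
    candidate_sizes: the States of Nature (assumed urn sizes), flat prior.
    """
    sizes = list(candidate_sizes)
    prior = Fraction(1, len(sizes))  # assumed uniform prior
    joint = {}
    for n in sizes:
        marked = 0
        likelihood = Fraction(1)
        for d in draws:
            if d == "M":
                likelihood *= Fraction(marked, n)
            else:
                # Once this factor hits 0 the likelihood stays 0,
                # so urns that are too small are ruled out.
                likelihood *= Fraction(n - marked, n)
                marked += 1  # the ball goes back marked
        joint[n] = prior * likelihood
    total = sum(joint.values())  # sum the joint
    return {n: j / total for n, j in joint.items()}  # divide to get posterior

# Hypothetical data: two unmarked balls, then a marked one.
posterior = mark_recapture_posterior("UUM", [2, 3, 4, 5])
for n, p in posterior.items():
    print(n, float(p))
```

Using exact fractions makes the final check easy: the posterior values sum to exactly 1.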

The same idea is operative for the balls marked by numbers.

As we discussed in class, there are really two possibilities for this. The problem in the study sheet is for sampling without replacement. This means that the number of balls changes after every sampling event. This would be appropriate if, for example, we were interested in using the serial numbers of captured tanks (which are out of commission after capture) to figure out how many German tanks had been produced.

We could have put the balls back in the urn after sampling them; that leads to a slightly different problem, where it may be possible to sample the same object more than once. For example, you might be an airplane-spotter: You want to estimate the number of airplanes owned by an airline, and you might know that the numbers on their tailfins are sequential. Since you might see the same airplane more than once, this would be sampling with replacement.

The difference between the two scenarios is: With sampling with replacement, the number of items in the urn being sampled (airplanes, for example) doesn't change, so the denominator remains constant at the number of items originally in the urn. With sampling without replacement, the denominator decreases by 1 each time an item is sampled. Otherwise, the two cases are the same.
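Here is a small Python sketch of how the denominator behaves in the two cases, for items numbered 1 through N. The function names, the observed serial numbers, and the flat prior over candidate sizes are assumptions for illustration only.

```python
from fractions import Fraction

def serial_likelihood(observed, n, replacement):
    """Probability of seeing these serial numbers, in this order,
    if there are n items numbered 1..n."""
    if max(observed) > n:
        return Fraction(0)  # a number larger than n rules this n out
    if not replacement and len(set(observed)) != len(observed):
        return Fraction(0)  # a repeat is impossible without replacement
    likelihood = Fraction(1)
    remaining = n
    for _ in observed:
        likelihood *= Fraction(1, remaining)
        if not replacement:
            remaining -= 1  # one fewer item left to sample from
    return likelihood

def serial_posterior(observed, candidate_sizes, replacement):
    sizes = list(candidate_sizes)
    prior = Fraction(1, len(sizes))  # assumed uniform prior
    joint = {n: prior * serial_likelihood(observed, n, replacement)
             for n in sizes}
    total = sum(joint.values())
    return {n: j / total for n, j in joint.items()}

# Hypothetical sightings: serial numbers 3, 7 and 5.
print(serial_likelihood([3, 7, 5], 10, replacement=True))   # 1/10 * 1/10 * 1/10
print(serial_likelihood([3, 7, 5], 10, replacement=False))  # 1/10 * 1/9 * 1/8
```

Everything after the likelihood (joint, sum, divide) is identical in the two cases, which is why only `serial_likelihood` looks at the `replacement` flag.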

But these problems have basically the same structure as the "catch-and-release" problem with the unmarked balls that we mark. Identify the states of nature (the unknown number of items in the original set-up), put a prior on them, and calculate the likelihood by multiplying together the probabilities of sampling each item in turn (taking into account the particular method of sampling, the marking, or the number displayed). Then compute the joint, calculate the sum, divide to get the posterior, and check that the posterior sums to 1.

The next problem, ants and beetles, is exactly like the polling problem we discussed earlier. Instead of voters who say they will vote for candidate A or B, we have insects that we identify as ants or beetles. In both cases, the states of nature are all of the true proportions of each kind that exist in the population being sampled (voters, insects). Both of these assume that the number of items in the population is very large.
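As a rough sketch of this set-up in Python: the states of nature are a grid of possible proportions, and the likelihood of seeing a ants and b beetles is proportional to p^a (1-p)^b. The grid spacing, the flat prior, and the counts below are assumptions chosen for the example; the binomial coefficient is left out because it is the same for every state of nature and cancels when we divide by the sum.

```python
def proportion_posterior(n_ants, n_beetles, grid_size=11):
    """Posterior over the true proportion of ants, on an evenly spaced grid."""
    props = [i / (grid_size - 1) for i in range(grid_size)]  # 0.0, 0.1, ..., 1.0
    prior = [1 / grid_size] * grid_size                      # assumed flat prior
    likelihood = [p ** n_ants * (1 - p) ** n_beetles for p in props]
    joint = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    return props, posterior

# Hypothetical sample: 7 ants and 3 beetles.
props, posterior = proportion_posterior(7, 3)
for p, post in zip(props, posterior):
    print(f"{p:.1f}  {post:.4f}")
```

The same function would serve for the polling problem, with "ants" read as voters for candidate A and "beetles" as voters for candidate B.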

Finally, on the Decision Problem: The way we attacked this problem was to look at a simpler problem than the one in the study sheet: Should you just produce parts, or should you sacrifice one part, pay $150, and produce the remaining parts knowing that the machine would be in a "good" state and would produce a much higher proportion of "good" parts?

We set up a decision tree: First, a square box, representing the decision we had to make. The choices are: fix the machine first and produce 23 parts; or just produce 24 parts, regardless.

On the first scenario, we set up a "toll gate" for -$150 on that branch. Since we knew the machine to be in a "good" state to the right of the toll gate, we could then decide that the expected return on that branch would be 0.95*23*$2000 = $43,700, since bad parts aren't worth anything. After paying the $150 toll, that leaves $43,550.

On the second scenario, there is no initial cost of $150 so no "toll gate." But we now have two branches, one with probability 0.9 in which the machine is "good", and one with probability 0.1 in which the machine is "bad". If the machine is "good," 0.95 of the parts will be useful. If it is "bad," only 0.7 of the parts will be useful. By tracing the probability tree backwards, the expected fraction of parts that will be useful is (0.9*0.95+0.1*0.7) = 0.925. This is multiplied by 24*$2000 to get the expected profit, $44,400.

We found that the best scenario under these rules was to abandon caution and just produce parts. The extra part (24 instead of 23) produces more expected profit than the more reliable machine minus the cost of $150 to make sure the machine is OK.
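The two branches of the tree can be checked with a short Python sketch. The numbers are the ones from the simplified problem discussed in class; the constant and function names are just labels made up for this example.

```python
PART_VALUE = 2000   # dollars per good part; bad parts are worth nothing
FIX_COST = 150      # the "toll gate" on the fix-first branch
P_GOOD = 0.9        # probability the machine is already "good"
YIELD_GOOD = 0.95   # fraction of useful parts from a "good" machine
YIELD_BAD = 0.70    # fraction of useful parts from a "bad" machine

def expected_fix_first():
    # Pay the toll, then produce 23 parts on a machine known to be good.
    return YIELD_GOOD * 23 * PART_VALUE - FIX_COST

def expected_just_produce():
    # No toll; average the yield over the two possible machine states.
    expected_yield = P_GOOD * YIELD_GOOD + (1 - P_GOOD) * YIELD_BAD
    return expected_yield * 24 * PART_VALUE

def best_action():
    if expected_just_produce() > expected_fix_first():
        return "just produce"
    return "fix first"

print("fix first:   ", round(expected_fix_first(), 2))
print("just produce:", round(expected_just_produce(), 2))
print("best action: ", best_action())
```

Changing the constants (for example, a lower `P_GOOD` or a cheaper `FIX_COST`) is a quick way to see when the decision would flip.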

One student mentioned that there might be other circumstances, like the need to make a minimum profit. This is quite true, but it wasn't part of the assumptions of the problem. An example might be that if you don't make the minimum profit, someone might come over and break your legs. That's a different problem, but it can be analyzed by the tools we are developing. You just have to build that into the decision tree you build.

Don't forget: Bring your calculators, come early if you can, stay a little late if you can (but not later than 10:05 for the next class) and be sure to attempt to answer as many questions as you can. Budget your time. Make sure you convince me that you know how to answer a question even if you don't have time to do the complete calculation.
