In class I mentioned advice on programming style. I recommend in particular Software Carpentry, which, although it is based on the Python language, has useful information that applies to any programming project in a modern computer language. The website has mp3 files of a number of lectures on various aspects of programming, along with the charts that go with the lectures. Besides topics such as programming style, it covers other very useful points, such as version control software, which allows multiple programmers to work on the same project without stepping on each other's toes, and allows earlier versions to be resurrected in case a later version is "broken" in ways that are hard to trace.
More to come...
Sorry for the delay. I had to perform a complete backup on my computer, which took a long time.
Assignment: Learn R syntax as described by Jeff in his discussion.
With regard to the homework: you noted that the confidence interval coverage was not always as advertised. In fact, coverage falls short when n and p are small; in particular, you need both np and n(1-p) to be at least 10 for the interval to work as advertised.
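If you want to see this for yourself, here is a small simulation sketch (my own illustration, not part of the homework) that estimates the actual coverage of the usual 95% interval, estimate plus or minus 1.96 standard errors, for a binomial proportion:

    # Estimate coverage of the usual (Wald) interval for a proportion by simulation
    coverage <- function(n, p, nsim = 10000) {
      x <- rbinom(nsim, n, p)                       # nsim samples of size n
      p.hat <- x / n
      se <- sqrt(p.hat * (1 - p.hat) / n)
      mean(p.hat - 1.96 * se <= p & p <= p.hat + 1.96 * se)
    }
    coverage(10, 0.1)    # np = 1: coverage well below 95%
    coverage(100, 0.5)   # np = n(1-p) = 50: close to 95%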
We then reviewed some basic statistical notions: the sample mean and its relationship to expectations, how these are used in simple (frequentist) statistical estimation, the meaning of the variance and the sample variance, and the fact that the denominator (n-1) in the sample variance makes it unbiased. (But note: the usual formula for the standard deviation, obtained by taking the square root of the sample variance, is not unbiased!)
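Here is a quick simulation (my own check, not something we did in class) illustrating that last point: with the (n-1) denominator the sample variance is right on average, but the sample standard deviation is not.

    # Many small samples from a normal distribution with variance 4 (sd 2)
    set.seed(1)
    nsim <- 100000
    n <- 5
    samples <- matrix(rnorm(nsim * n, mean = 0, sd = 2), nrow = nsim)
    mean(apply(samples, 1, var))   # close to the true variance, 4
    mean(apply(samples, 1, sd))    # noticeably below the true sd, 2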
If x is a discrete random variable that takes values x_1, ..., x_q with probabilities p(x_1), ..., p(x_q) respectively, then we define the expectation of x as E[x] = Σ_{i=1}^{q} x_i p(x_i).
More generally, for any function g(x) of x, we define the expectation of g(x) as E[g(x)] = Σ_{i=1}^{q} g(x_i) p(x_i).
For continuous random variables, the sum becomes an integral.
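As a small illustration (my own made-up example, not from the charts), here is how these formulas look in R for a discrete random variable, and how the continuous case can be handled by numerical integration:

    # Discrete case: values and their probabilities
    x <- c(1, 2, 3)
    p <- c(0.2, 0.5, 0.3)
    sum(x * p)       # E[x] = 2.1
    sum(x^2 * p)     # E[g(x)] with g(x) = x^2, here 4.9

    # Continuous case: E[x] for a Beta(2, 3) random variable is 2/5
    integrate(function(t) t * dbeta(t, 2, 3), 0, 1)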
Repeated samples: how does the sample mean behave? This leads to confidence intervals and so on, and it rests on the thought experiment "what happens if I take repeated samples?", even though in reality we only take one sample. This is one feature that distinguishes Bayesian reasoning from frequentist reasoning: the Bayesian conditions on the one unique sample that we have, and doesn't ask what would happen if we sampled repeatedly in this thought experiment.
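For those who want to see the thought experiment in code, here is a small sketch (my own, with arbitrary numbers) that draws many samples and looks at how the sample mean varies from sample to sample:

    # 5000 samples of size 25 from a normal(10, 3) population
    set.seed(2)
    means <- replicate(5000, mean(rnorm(25, mean = 10, sd = 3)))
    mean(means)   # near the true mean, 10
    sd(means)     # near 3/sqrt(25) = 0.6
    hist(means)   # roughly normal, centered at 10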
We tried some R code. We discussed in particular the invisible() function, accessing lists, and the fact that one must use print() within a function in order to get R to display something on the screen.
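For reference, here is a minimal illustration of those three points (this is not the code we ran in class, just a reconstruction of the ideas):

    f <- function(x) {
      x^2               # evaluated but NOT auto-printed inside a function
      print(x^2)        # print() forces output to the screen
      invisible(x^2)    # the return value will not be auto-printed
    }
    f(3)        # prints 9 once (from print()); the invisible return value is hidden
    y <- f(3)   # the value can still be captured
    y           # 9

    lst <- list(a = 1, b = 1:3)
    lst$a       # access a component by name
    lst[["b"]]  # the component itself
    lst["b"]    # a one-element list containing the component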
Probability densities are used for continuous random variables. We briefly set out the rules.
Note that a density can be zero over part of the interval but, unlike a probability, can be larger than 1.
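You can check this in R. For example (my own illustration), the Beta(5, 5) density exceeds 1 near its center, the uniform density on [0, 0.5] is zero outside that interval, and the beta density still integrates to 1:

    dbeta(0.5, 5, 5)                     # about 2.46, a density value above 1
    dunif(0.8, min = 0, max = 0.5)       # 0 outside the support [0, 0.5]
    integrate(dbeta, 0, 1, shape1 = 5, shape2 = 5)   # still integrates to 1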
In the definition of the beta density, we need the gamma function, defined as Γ(a) = ∫_0^∞ x^(a-1) e^(-x) dx. For integers, this is a factorial: Γ(n) = (n-1)!. The gamma function is the standard analytic extension of the factorial function and allows us to interpolate the factorial to non-integer arguments.
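R has the gamma function built in, so the factorial relationship is easy to check (my own illustration):

    gamma(5)        # 24, which is 4!
    factorial(4)    # 24
    gamma(2.5)      # about 1.33, interpolating between 1! = 1 and 2! = 2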
We started on the third chart set.
Bayes' picture, although widely seen on the web, is probably not of him.
Jeff gave his rant about naming theorems for the person instead of what it does. I have some sympathy for this viewpoint.
The proof of Bayes' theorem is trivial.
We regard Bayes' theorem as a model of learning. At any point in time, we have an opinion about hypotheses, parameters, etc., which is given by our prior probability distribution. When new data comes in, Bayes' theorem can be used to update our opinion, giving our posterior probability distribution.
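As a concrete illustration of this updating, a standard beta-binomial example rather than something we did in class: start with a flat Beta(1, 1) prior on a success probability, observe 7 successes in 10 trials, and the conjugate update gives a Beta(8, 4) posterior.

    theta <- seq(0, 1, length.out = 200)
    prior <- dbeta(theta, 1, 1)               # flat prior
    posterior <- dbeta(theta, 1 + 7, 1 + 3)   # conjugate beta-binomial update
    plot(theta, posterior, type = "l", xlab = "theta", ylab = "density")
    lines(theta, prior, lty = 2)              # prior for comparison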
Sometimes (as in the simulation approach we will use extensively in this course) you can bypass the computation of the denominator P(D) in Bayes' theorem, so we often see the theorem written as a proportionality. Until about 20 years ago, before the simulation methods were introduced, we could not bypass the computation of P(D), which was a major pain since it usually requires integrating over a high-dimensional space, which is difficult. So, despite its elegance, Bayesian calculation was difficult and was available only for very simple situations (e.g., normal regression). But that has all changed.
In particular, the denominator is given in the discrete case by Σ_i P(D|A_i) P(A_i) and in the continuous case by ∫ p(x|θ) p(θ) dθ.
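Here is a tiny discrete example of that denominator in R (the priors and likelihoods are made up purely for illustration):

    prior <- c(0.5, 0.3, 0.2)                    # P(A_i) for three hypotheses
    likelihood <- c(0.10, 0.40, 0.80)            # P(D | A_i) for the observed data D
    evidence <- sum(likelihood * prior)          # P(D) = sum_i P(D | A_i) P(A_i)
    posterior <- likelihood * prior / evidence   # Bayes' theorem
    posterior
    sum(posterior)                               # 1, as it must be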
Unfortunately, my notation on the slides is inconsistent...we ought to have written X instead of D and Θ instead of A. This will be fixed for the next time the class is taught.
Finally, I want to point out two other blogs that are useful. The author of the book, Jim Albert, has a blog for his course here. You may find his discussion of the election interesting. This blog is not very active. And, I read Andrew Gelman's blog daily. You'll find lots of discussion of Bayesian things there. Andrew is a statistician/political scientist, and he's particularly interested in voting patterns and other issues of this sort.
Thursday, January 22, 2009