In
statistics, 'empirical Bayes methods' are a class of methods which use empirical data to evaluate / approximate the conditional
probability distributions that arise from
Bayes' theorem. These methods allow one to estimate quantities (
probabilities,
averages, etc.) about an individual member of a population by combining information from empirical measurements on the individual and on the entire population.
Empirical Bayes methods involve:
★ An "underlying"
probability distribution of some unobservable quantity is assigned to each member of a
statistical population. This quantity is a
random variable if a member of the population is chosen at random. The probability distribution of this random variable is not known, and is thought of as a property of the population.
★ An observable quantity assigned to each member of the population. When a random sample is taken from the population, it is desired first to estimate the "underlying" probability distribution, and then to estimate the value of the unobservable quantity assigned to each member of the sample.
Introduction
In the Bayesian approach to statistics, we consider the problem of estimating some probability (such as a future outcome or a noisy measurement), based on measurements of our data, a model for these measurements, and some model for our prior beliefs about the system. Let us consider a standard two-stage model, where we write our data measurements as a vector
, and our prior beliefs as some vector of random unknowns
. We assume we can model our measurements with a conditional probability distribution (or
likelihood )
, and also the prior as
, where
is some hyper-parameter. For example, we might choose
to be a
binomial distribution, and
as a
Beta distribution (the
conjugate prior). Empirical Bayes then employs the complete set of empirical data to make inferences about the prior
, and then plugs this into the likelihood
to make estimates for future outcomes of individual measurements.
To see this in action, use
Bayes' theorem to write an expression for the
posterior probability for
as
:
and let us define the
marginal (or
normalizing constant) as
:
Empircal Bayes (EB) combines
Bayesianism and
frequentist approaches to estimation. EB approximates the marginal (and/or full posterior) distribution (with point estimation,
maximum likelihood estimation (MLE), squared error loss (SEL), Monte Carlo/numerical integration, etc.) and then estimates the approximate marginal using the empirical data (
frequency probability). EB takes several forms, including non-parameteric and parameteric forms. We describe a few common examples below.
Point estimation
Robbins method (1956): non-parameteric empirical Bayes (NPEB)
We consider a case of
compound sampling, where probability for each
is specified by a
Poisson distribution,
:
while the prior is unspecified except that it is also i.i.d (call it
). Compound sampling arises in a variety of statistical estimation problems, such as accident rates, clinical trials, etc. We simply seek a point estimate of
.
Because the prior is unspecified, we seek a non-parametric estimate of the posterior.
We may write a point Bayes estimate for the prior (assuming squared error loss (SEL)) as (see Carlin and Louis, Sec. 3.2 and Appendix B):
:
'Proof:'
First, we show that the point estimate for the prior is the estimated mean of the posterior.
Write the posterior risk for a point estimate for
under
squared error loss (SEL) (or
loss function) as
:
To find the minimum error, set the derivative with respect to a equal to zero
:
Solving for a yields
:
A quick check shows that the second derivative is greater than zero, indicating a true minimum.
:
Note that had we chosen a
absolute error loss, the point estimate would instead be the median, and with
0-1 error loss, it would be the mode (definitions pending).
Second, we wish to show that the point estimate is a simple ratio of the estimated marginals for the specific case of the
Poisson distribution. Write the point estimate for the prior as:
:
Multiply the expression by
and use Bayes definition of the marginal to obtain
:
To take advantage of this, Robbins (1955) suggested estimating the marginals with their empirical frequencies ''Y''
''d'', yielding the fully non-parametric estimate as:
:
(see also
Good-Turing frequency estimation).
'Example: Accident rates'
Suppose each customer of an insurance company has an "accident rate" Θ and is insured against "accidents"; the probability distribution of
is the "underlying" distribution, and is unknown. The number of "accidents" suffered by each customer in a specified baseline time period has a
Poisson distribution whose expected value is the particular customer's "accident rate". That number of "accidents" is the observable quantity. A crude way to estimate the underlying probability distribution of the "accident rate" Θ is to estimate the proportion of members of the whole population suffering 0, 1, 2, 3, ... accidents during the specified time period to be equal to the corresponding proportion in the observed random sample. Having done so, it is then desired the "accident rate" of each customer in the sample. One may use the
conditional expected value of the "accident rate" Θ given the observed number ''X'' of "accidents" during the baseline period.
Thus, if a customer suffers six "accidents" during the baseline period, that customer's estimated "accident rate" is 7 × [the proportion of the sample who suffered 7 "accidents"] / [the proportion of the sample who suffered 6 "accidents"].
Parametric empirical Bayes
If the likelihood and its prior take on simple parametric forms (such as 1- or 2-dimensional likelihood functions with simple
conjugate priors), then the empirical Bayes problem is only to estimate the marginal
and the hyperparameters
using the complete set of empirical measurements. For example, one common approach, called (parametric Empirical Bayes point estimation) is to approximate the marginal using the
maximum likelihood estimate (MLE), or a
Moments expansion, which allows one to express the hyperparameters
in terms of the empirical mean and variance. This simplified marginal allows one to plug in the empirical averages into a point estimate for the prior
. The resulting equation for the prior
is greatly simplified, as shown below.
There are several common Parameteric Empirical Bayes models, including the
Poisson-Gamma model (below), the
Beta-binomial model, the
Gaussian-Gaussian model, the
multinomial-Dirichlet model, as well specific models for
Bayesian linear regression (see below) and
Bayesian multivariate linear regression. More advanced approaches include
hierarchial Bayesian models and
Bayesian mixture models.
Poisson-Gamma model
For example, in the example above, let the likelihood be a
Poisson distribution, and let the prior now be specified by the
conjugate prior, which is a
Gamma distribution (
) (where
):
:
It is straightforward to show the posterior is also a Gamma distribution. Write
:
where we have omitted the marginal since it does not depend explicitly on
.
Expanding terms which do depend on
gives the posterior as:
:
So we see that the posterior density is also a
Gamma distribution , where
, and
. Also notice that the marginal is simply the integral of the posterior over all
, which turns out to be a
negative binomial distribution.
To apply Empirical Bayes, we will approximate the marginal using the
maximum likelihood estimate (MLE). But since the posterior is a Gamma distribution, the MLE of the marginal turns out to be just the mean the of posterior, which is the point estimate
we need. Recalling that the mean
of a Gamma distribition
is simply
, we have
:
To obtain the values of
and
, Empirical Bayes prescribes estimating mean
and variance
using the complete set of empirical data.
The resulting point estimate
is therefore like a weighted average of the sample mean
and the prior mean
. This turns out to be a general feature of Empirical Bayes; the point estimates for the prior (i.e mean) will look like a weighted averages of the sample estimate and the prior estimate. (Likewise for estimates of the variance).
: to be continued
References
★ Herbert Robbins, ''An Empirical Bayes Approach to Statistics'', Proceeding of the Third Berkeley Symposium on Mathematical Statistics, volume 1, pages 157-163, University of California Press, Berkeley, 1956.
★ Bradley P. Carlin and Thomas A. Louis, ''Bayes and Empirical Bayes Methods for Data Analysis'', Chapman & Hall/CRC, Second edition 2000,
★ Peter E. Rossi, Greg M. Allenby, and Robert McCulloch, ''Bayesian Statistics and Marketing'', John Wiley & Sons, Ltd, 2006
★ George Casella, ''An Introduction to Empirical Bayes Data Analysis'' American Statistician, Vol. 39, No. 2 (May, 1985), pp. 83-87
External links
★
Use of Empirical Bayes Method in estimating road safety (North America)
★
Empirical Bayes Methods for missing data analysis
★
Using the Beta-Bionomial distribution to assess performance of a biometric identification device
★
A Hierarchical Naive Bayes Classifiers (for continuous and
discrete variables).