9 Bayesian Modelling
In this section we will explore extensions of the Bayesian framework to other classes of model:
regression
latent variable models
hierarchical models
9.1 Regression Models
We can consider an infinite exchangeable sequence of pairs $(X_1, Y_1), (X_2, Y_2), \dots$ whose joint distribution can be factorized as
$$p(x_{1:n}, y_{1:n}) = p(x_{1:n}) \, p(y_{1:n} \mid x_{1:n}),$$
where each term has a de Finetti representation.
Given the above structure, inference splits into two parts: inference for the parameters of the X distribution is done through the marginal model for the X variables, while inference for the regression parameters is done through the conditional model for Y given that X is observed. For the latter, the fact that X is random is irrelevant, as we have conditioned the model on the observed values of X.
When considering the statistical behaviour of Bayesian (and frequentist) procedures, we need to remember that X and Y have a joint structure.
Prediction
9.2 Linear Regression
We can start with the following linear regression model:
$$Y_i = x_i \beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2) \text{ independently},$$
where, for $i = 1, \dots, n$, $Y_i$ is a scalar, $x_i$ is $(1 \times d)$, and $\beta$ is $(d \times 1)$.
With this structure, we can describe the model for the partially exchangeable random variables: the error terms $\varepsilon_1, \varepsilon_2, \dots$ form an exchangeable sequence, i.i.d. $N(0, \sigma^2)$ given $\sigma^2$.
We can look at the vector form of the linear regression model,
$$y = X\beta + \varepsilon,$$
where the response $y$ and the error vector $\varepsilon$ are $(n \times 1)$ and the design matrix $X$ is $(n \times d)$.
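As a quick sanity check of the dimensions above, the vector-form model can be simulated directly. A minimal sketch; the values of $n$, $d$, $\beta$ and $\sigma^2$ are illustrative, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3                                     # illustrative sizes
X = rng.normal(size=(n, d))                      # (n x d) design matrix
beta_true = np.array([1.0, -2.0, 0.5])           # (d x 1) coefficient vector
sigma2 = 0.25                                    # error variance
eps = rng.normal(scale=np.sqrt(sigma2), size=n)  # (n x 1) iid N(0, sigma^2) errors
y = X @ beta_true + eps                          # (n x 1) response, y = X beta + eps
```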
We can then have a conditional model
$$y \mid \beta, \sigma^2 \sim N_n(X\beta, \sigma^2 I_n),$$
where $I_n$ is the $(n \times n)$ identity matrix. With this structure, we know the likelihood to be
$$p(y \mid \beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}(y - X\beta)^T (y - X\beta)\right).$$
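The likelihood can be evaluated numerically. A sketch assuming the Gaussian form written above; the function name and demo values are my own:

```python
import numpy as np

def log_likelihood(y, X, beta, sigma2):
    """Log of p(y | beta, sigma^2) for y = X beta + eps, eps ~ N(0, sigma^2 I)."""
    n = len(y)
    resid = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - (resid @ resid) / (2 * sigma2)

# With a perfect fit the residual term vanishes, leaving only the normalizing constant.
y_demo = np.array([1.0, 2.0])
X_demo = np.array([[1.0], [2.0]])
ll = log_likelihood(y_demo, X_demo, np.array([1.0]), 1.0)
```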
We can derive a joint conjugate prior of normal-inverse-gamma form,
$$p(\beta, \sigma^2) = p(\beta \mid \sigma^2)\, p(\sigma^2),$$
where
$$\beta \mid \sigma^2 \sim N_d(m_0, \sigma^2 M_0), \qquad \sigma^2 \sim \text{Inv-Gamma}(a_0, b_0),$$
with hyperparameters $m_0$ $(d \times 1)$, $M_0$ $(d \times d)$ positive definite, and $a_0, b_0 > 0$.
We can explore the exponent of the resulting posterior as a quadratic form in $\beta$. The expression
$$(y - X\beta)^T (y - X\beta) + (\beta - m_0)^T M_0^{-1} (\beta - m_0)$$
equates to
$$\beta^T (X^T X + M_0^{-1}) \beta - 2\beta^T (X^T y + M_0^{-1} m_0) + y^T y + m_0^T M_0^{-1} m_0.$$
Quadratic term:
$$\beta^T M_n^{-1} \beta = \beta^T (X^T X + M_0^{-1}) \beta,$$
and therefore
$$M_n = (X^T X + M_0^{-1})^{-1}.$$
Linear term:
$$\beta^T M_n^{-1} m_n = \beta^T X^T y + \beta^T M_0^{-1} m_0,$$
and therefore
$$m_n = M_n (X^T y + M_0^{-1} m_0).$$
Constant term:
$$y^T y + m_0^T M_0^{-1} m_0 - m_n^T M_n^{-1} m_n,$$
and therefore
$$a_n = a_0 + \frac{n}{2}, \qquad b_n = b_0 + \frac{1}{2}\left(y^T y + m_0^T M_0^{-1} m_0 - m_n^T M_n^{-1} m_n\right).$$
This gives us the joint posterior (under proportionality) as
$$p(\beta, \sigma^2 \mid y) \propto (\sigma^2)^{-(a_n + d/2 + 1)} \exp\left(-\frac{1}{\sigma^2}\left[b_n + \frac{1}{2}(\beta - m_n)^T M_n^{-1} (\beta - m_n)\right]\right),$$
which tells us that the conditional posterior is
$$\beta \mid \sigma^2, y \sim N_d(m_n, \sigma^2 M_n),$$
and marginalizing the joint posterior over $\beta$ gives
$$\sigma^2 \mid y \sim \text{Inv-Gamma}(a_n, b_n),$$
where $m_n$, $M_n$, $a_n$, $b_n$ are as derived above.
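The update formulas derived above can be collected into a single function. A sketch: the function name and the simulated data are my own, but the algebra follows the derivation:

```python
import numpy as np

def nig_posterior(y, X, m0, M0, a0, b0):
    """Conjugate normal-inverse-gamma update for Bayesian linear regression.

    Returns (m_n, M_n, a_n, b_n) following the derivation above:
      M_n = (X^T X + M_0^{-1})^{-1}
      m_n = M_n (X^T y + M_0^{-1} m_0)
      a_n = a_0 + n/2
      b_n = b_0 + (y^T y + m_0^T M_0^{-1} m_0 - m_n^T M_n^{-1} m_n) / 2
    """
    n = len(y)
    M0_inv = np.linalg.inv(M0)
    Mn_inv = X.T @ X + M0_inv
    Mn = np.linalg.inv(Mn_inv)
    mn = Mn @ (X.T @ y + M0_inv @ m0)
    an = a0 + n / 2
    bn = b0 + 0.5 * (y @ y + m0 @ M0_inv @ m0 - mn @ Mn_inv @ mn)
    return mn, Mn, an, bn

# Illustrative data: with a vague prior the posterior mean should sit near the truth.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
beta_true = np.array([1.0, -1.0])
y = X @ beta_true + rng.normal(scale=0.1, size=200)
mn, Mn, an, bn = nig_posterior(y, X, np.zeros(2), 100.0 * np.eye(2), 1.0, 1.0)
```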
Note: see slides 218-221 for the marginal
Assigning prior ignorance to $(\beta, \sigma^2)$ corresponds to the limits $M_0^{-1} \to 0$ and $a_0, b_0 \to 0$, in which case $m_n \to (X^T X)^{-1} X^T y$, the least-squares estimate.
We can alternatively use a g-prior: $m_0 = 0$ and $M_0 = g\,(X^T X)^{-1}$ with hyperparameter $g > 0$, and hence
$$M_n = \frac{g}{1+g} (X^T X)^{-1}, \qquad m_n = \frac{g}{1+g} (X^T X)^{-1} X^T y,$$
so the posterior mean shrinks the least-squares estimate towards zero by the factor $g/(1+g)$.
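The shrinkage identity for the g-prior is easy to verify numerically. A sketch; the data and the value of $g$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(size=100)
g = 10.0

XtX = X.T @ X
beta_ls = np.linalg.solve(XtX, X.T @ y)  # least-squares estimate (X^T X)^{-1} X^T y
Mn = np.linalg.inv(XtX + XtX / g)        # M_n = (X^T X + M_0^{-1})^{-1}, M_0 = g (X^T X)^{-1}
mn = Mn @ (X.T @ y)                      # m_0 = 0, so m_n = M_n X^T y
shrunk = g / (1.0 + g) * beta_ls         # the claimed closed form
```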
If, instead of the g-prior, we have $m_0 = 0$ and $M_0^{-1} = \lambda I_d$, then we will have
$$m_n = (X^T X + \lambda I_d)^{-1} X^T y,$$
which gives us the procedure for ridge regression.
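The ridge connection can be checked by confirming that this $m_n$ zeroes the gradient of the penalized least-squares objective $\|y - X\beta\|^2 + \lambda\|\beta\|^2$. A sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 4))
y = X @ np.array([1.0, 0.5, 0.0, -1.5]) + rng.normal(size=80)
lam = 2.0

# Posterior mean with m_0 = 0, M_0^{-1} = lam * I -- the ridge estimator.
mn = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)
# Gradient of the ridge objective ||y - X b||^2 + lam ||b||^2, evaluated at b = m_n.
grad = -2.0 * X.T @ (y - X @ mn) + 2.0 * lam * mn
```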
Jeffreys prior for linear regression (see slide 228 for the derivation):