\frame{
\sffamily
\frametitle{Reversible Jump}
\begin{itemize}
\item In the case of the linear model we could integrate out the regression coefficients and work exclusively with the vector $\bfgamma$. Hence, although the size of $\bftheta$ changes with $\bfgamma$, we did not really have to deal with the varying dimension.
\item In the case of the generalized linear model, it is not obvious how we can integrate out the coefficients explicitly. Hence, our algorithm needs to deal with the fact that the number of coefficients can change.
\item A slight generalization of the Metropolis-Hastings algorithm, called reversible jump MCMC \citep{Gr95,brooks2003efficient}, can be used in this context.
\end{itemize}
}
\frame{
\sffamily
\frametitle{Reversible Jump}
\begin{itemize}
\item Let's start by establishing some notation.
\begin{itemize}
\item Let $K$ index the model of interest. Associate with each model a prior $p(K)$.
\item Each model $\mathcal{M}_K$ has a distinct set of parameters $\bftheta_K$. The associated prior is $p(\bftheta_K \mid K)$, which depends (explicitly or implicitly) on the model $K$ and, in particular, on its dimension.
\item Given model $\mathcal{M}_K$ and parameters $\bftheta_K$, data is generated according to a likelihood $p(\bfy \mid K, \bftheta_K)$.
\item We denote the corresponding posterior distribution as $p(K, \bftheta_K \mid \bfy)$ which can be factorized as
$$
p(K, \bftheta_K \mid \bfy) = p(K \mid \bfy) p(\bftheta_K \mid K, \bfy)
$$
\end{itemize}
\item Hence, if we could compute $p(K \mid \bfy)$ for all $K$, model selection would be straightforward. However, this involves integrating over all the parameters of each model, as the next slide makes explicit!
\end{itemize}
}
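\frame{
\sffamily
\frametitle{Reversible Jump}
\begin{itemize}
\item To make the difficulty explicit, note that
$$
p(K \mid \bfy) = \frac{p(K) \int p(\bfy \mid K, \bftheta_K) \, p(\bftheta_K \mid K) \, d\bftheta_K}{\sum_{K'} p(K') \int p(\bfy \mid K', \bftheta_{K'}) \, p(\bftheta_{K'} \mid K') \, d\bftheta_{K'}} ,
$$
so each model contributes a marginal likelihood that is rarely available in closed form (in particular, it is not for the generalized linear model).
\end{itemize}
}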
\frame{
\sffamily
\frametitle{Reversible Jump}
\begin{itemize}
\item To derive the algorithm:
\begin{itemize}
\item First, augment each vector $\bftheta_K$ with auxiliary variables $\bfu_K$, so that the dimension of $(\bftheta_K, \bfu_K)$ \textit{is the same} for all $K$.
\item Second, define a (consistent) collection of deterministic transformations $T_{K,K'} : (\bftheta_K, \bfu_K) \mapsto (\bftheta_{K'}, \bfu_{K'})$ that map between the augmented parameter spaces of different models (this is typically done by defining transformations only between ``neighboring'' models); a simple illustration follows on the next slide.
\end{itemize}
\item Note that, in some sense, the proposals are \textit{deterministic} because the transformation is deterministic. (However, the presence of the auxiliary variables $\bfu_K$ introduces randomness).
\end{itemize}
}
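\frame{
\sffamily
\frametitle{Reversible Jump}
\begin{itemize}
\item A simple illustration of the dimension matching (a standard construction for nested models): suppose $\bftheta_1 = \theta$ is a scalar and $\bftheta_2 = (\theta, \phi)$ is two-dimensional.
\item Take $\bfu_1 = u$ (a scalar) and $\bfu_2$ empty, so both augmented vectors have dimension two, and set
$$
T_{1,2}(\theta, u) = (\theta, u), \qquad T_{2,1}(\theta, \phi) = (\theta, \phi).
$$
\item Moving up proposes $\phi^{(p)} = u^{(c)}$, moving down sets $u^{(p)} = \phi^{(c)}$, and both Jacobians are equal to one.
\end{itemize}
}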
\frame{
\sffamily
\frametitle{Reversible Jump}
\begin{itemize}
\item The acceptance probability becomes
\begin{multline*}
\min\left\{ 1 ,
\frac{p\left(K^{(p)}, \bftheta^{(p)}_{K^{(p)}} \mid \bfy \right) }{p\left(K^{(c)}, \bftheta^{(c)}_{K^{(c)}} \mid \bfy \right)}
%
\frac{g_{K^{(p)}} \left( \bfu^{(p)}_{K^{(p)}} \right)}{g_{K^{(c)}} \left( \bfu^{(c)}_{K^{(c)}} \right)}
%
\frac{d_{K^{(p)}, K^{(c)}}}{d_{K^{(c)}, K^{(p)}}} \right. \\ \left.
%
\left| \frac{\partial T_{K^{(c)}, K^{(p)}} \left( \bftheta^{(c)}_{K^{(c)}}, \bfu^{(c)}_{K^{(c)}} \right) }
{ \partial \left( \bftheta^{(c)}_{K^{(c)}}, \bfu^{(c)}_{K^{(c)}} \right) } \right|
\right\}
\end{multline*}
where $g_K(\bfu_K)$ is the density of the auxiliary variables and $d_{K, K'}$ is the probability of proposing a move to model $K'$ when the chain is currently in model $K$.
\item Note that the auxiliary variables $\bfu_K$ do not need to be stored and can be simulated on the fly ...
\end{itemize}
}
\frame{
\sffamily
\frametitle{Reversible Jump}
\begin{itemize}
\item With these preliminaries, and assuming the current state of your chain is $\left( K^{(c)}, \bftheta_{K^{(c)}}^{(c)} \right)$, one iteration of the RJMCMC algorithm is (a schematic R sketch appears on the next slide):
\begin{enumerate}
\item Propose a move to model $\mathcal{M}_{K^{(p)}}$ with probability $d_{K^{(c)}, K^{(p)}}$.
\item Generate $\bfu^{(c)}_{K^{(c)}} \sim g_{K^{(c)}}$ (the proposed $\bfu^{(p)}_{K^{(p)}}$ is determined by the transformation in the next step).
\item Set $\left( \bftheta^{(p)}_{K^{(p)}}, \bfu^{(p)}_{K^{(p)}} \right) = T_{K^{(c)}, K^{(p)}}\left( \bftheta^{(c)}_{K^{(c)}}, \bfu^{(c)}_{K^{(c)}} \right)$.
\item Accept this proposal with probability
{\scriptsize \begin{multline*}
\min\left\{ 1 ,
\frac{p\left(K^{(p)}, \bftheta^{(p)}_{K^{(p)}} \mid \bfy \right) }{p\left(K^{(c)}, \bftheta^{(c)}_{K^{(c)}} \mid \bfy \right)}
%
\frac{g_{K^{(p)}} \left( \bfu^{(p)}_{K^{(p)}} \right)}{g_{K^{(c)}} \left( \bfu^{(c)}_{K^{(c)}} \right)}
%
\frac{d_{K^{(p)}, K^{(c)}}}{d_{K^{(c)}, K^{(p)}}} \right. \\ \left.
%
\left| \frac{\partial T_{K^{(c)}, K^{(p)}} \left( \bftheta^{(c)}_{K^{(c)}}, \bfu^{(c)}_{K^{(c)}} \right) }
{ \partial \left( \bftheta^{(c)}_{K^{(c)}}, \bfu^{(c)}_{K^{(c)}} \right) } \right|
\right\}
\end{multline*}}
\end{enumerate}
\end{itemize}
}
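\begin{frame}[fragile]
\sffamily
\frametitle{Reversible Jump}
A schematic R sketch of one such iteration (every name here is a hypothetical placeholder: \texttt{d} encodes the move probabilities $d_{K,K'}$, \texttt{g} the auxiliary densities $g_K$, \texttt{Tmap} the transformations $T_{K,K'}$, and \texttt{logPost} the log posterior up to a constant):
{\scriptsize
\begin{verbatim}
rjmcmc.step <- function(K.c, theta.c, d, g, Tmap, logPost) {
  K.p  <- d$sample(K.c)                  # 1. propose model K^(p)
  u.c  <- g$sample(K.c)                  # 2. draw u^(c) ~ g_{K^(c)}
  move <- Tmap(K.c, K.p, theta.c, u.c)   # 3. (theta^(p), u^(p)) via T
  logR <- logPost(K.p, move$theta) - logPost(K.c, theta.c) +
          g$logDensity(K.p, move$u) - g$logDensity(K.c, u.c) +
          log(d$prob(K.p, K.c)) - log(d$prob(K.c, K.p)) +
          move$logJacobian               # 4. log acceptance ratio
  if (log(runif(1)) < logR) list(K = K.p, theta = move$theta)
  else list(K = K.c, theta = theta.c)
}
\end{verbatim}}
\end{frame}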
\frame{
\sffamily
\frametitle{Reversible Jump}
\begin{itemize}
\item The deterministic move can be justified as the limit of random proposals whose variability goes to zero.
\item Remember that the proposals need to be reversible!
\item Note that the acceptance probability above often simplifies considerably; for example, you typically do not need to generate (or evaluate the density of) all the entries of $\bfu^{(c)}_{K^{(c)}}$ and $\bfu^{(p)}_{K^{(p)}}$.
\item Note that the auxiliary variable densities determine the proposed parameter values for model $K^{(p)}$, so they need to be chosen carefully, much as in independence Metropolis-Hastings schemes.
\item The transitions above need to be combined with ``regular'' transition kernels within each dimension!
\end{itemize}
}
\frame{
\sffamily
\frametitle{Reversible Jump}
\begin{itemize}
{\small
\item {\bf Example (variable selection in a GLMM):} Reversible jump here is quite simple. At each iteration, propose to add the variable if it is currently out of the model, or to remove it if it is currently in.
The deterministic transformation between the reduced and full models is $T_{0,1}(\bftheta, u) = (\bftheta, u)$, i.e., $\beta_e^{(p)} = u^{(c)}$, where $\bftheta$ collects the parameters that are always in the model, $\beta_e$ is the enforcement coefficient, and $u$ is a single auxiliary variable.
Here $d_{0,1} = d_{1,0} = 1$ and the Jacobian is one, so the proposals involve:
\begin{itemize}
\item Addition: propose $\beta_e^{(p)} = u^{(c)}$ with $u^{(c)} \sim g_0$, which contributes the factor $1/g_0(u^{(c)})$ to the acceptance ratio
\item Removal: set $u^{(p)} = \beta_e^{(c)}$, which contributes the factor $g_0(\beta_e^{(c)})$ to the acceptance ratio
\end{itemize}
Note that the auxiliary variable density $g_0(u)$ plays the role of a fixed (independence) proposal distribution for $\beta_e$, so it should be chosen carefully.
See \texttt{rjmcmc-childSupport.R} for the implementation using a user-defined sampler in NIMBLE; a simplified R sketch follows on the next slide.
}
\end{itemize}
}
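\begin{frame}[fragile]
\sffamily
\frametitle{Reversible Jump}
A minimal R sketch of this add/remove move (a simplification, not the NIMBLE sampler in \texttt{rjmcmc-childSupport.R}; the helper \texttt{logPost} and the Gaussian choice $g_0 = N(0, \tau^2)$ are illustrative assumptions):
{\scriptsize
\begin{verbatim}
## logPost(inModel, beta.e): log p(K, theta_K | y) up to a constant
## (hypothetical helper); here g0 is N(0, tau^2), d01 = d10 = 1,
## and the Jacobian is one, as on the previous slide
rj.variable <- function(inModel, beta.e, tau, logPost) {
  if (!inModel) {                          # addition move
    u <- rnorm(1, 0, tau)                  # u^(c) ~ g0
    logR <- logPost(TRUE, u) - logPost(FALSE, 0) -
            dnorm(u, 0, tau, log = TRUE)   # 1 / g0(u^(c)) term
    if (log(runif(1)) < logR)
      return(list(inModel = TRUE, beta.e = u))
  } else {                                 # removal move
    logR <- logPost(FALSE, 0) - logPost(TRUE, beta.e) +
            dnorm(beta.e, 0, tau, log = TRUE)  # g0(beta.e^(c)) term
    if (log(runif(1)) < logR)
      return(list(inModel = FALSE, beta.e = 0))
  }
  list(inModel = inModel, beta.e = beta.e)     # proposal rejected
}
\end{verbatim}}
\end{frame}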