Mixture Normal proposal for Metropolis-Hastings algorithm

Abstract. A crucial problem in Bayesian posterior computation is efficient sampling from a univariate distribution, e.g. a full conditional distribution in applications of the Gibbs sampler. In many cases, however, it is difficult to sample from the full conditional distribution directly, so one often uses the Metropolis-Hastings algorithm to sample from the univariate distribution. In this paper, we propose an efficient proposal distribution for the Metropolis-Hastings algorithm based on a mixture normal distribution, i.e. the mixture normal proposal Metropolis-Hastings algorithm. It improves the Metropolis-Hastings algorithm by carrying more information from the target function. Simulation results show that the mixture normal proposal Metropolis-Hastings algorithm outperforms the conventional Metropolis-Hastings algorithm (the random walk Metropolis algorithm) and the parallel tempering algorithm.

Key words: mixture normal proposal distribution; Metropolis-Hastings algorithm; Markov chain Monte Carlo; Gibbs sampler

1. Introduction

With the arrival of modern computers, Monte Carlo methods (i.e. statistical simulation methods) have become the main tools for complicated statistical models and high-dimensional problems [7;8]. After several decades of development, Monte Carlo methods have greatly expanded our scientific horizon. From early computational physics to modern computational biology, Monte Carlo methods allow people to follow complicated systems [2]. In the 1950s, statistical physicists (N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, E. Teller, etc.) introduced Markov chain Monte Carlo methods [8] and used them to simulate the Ising model, the spin glass model, etc. The Metropolis algorithm [9] was perhaps the first sampling algorithm for iterative simulation. Hastings [6] generalized this algorithm by allowing an asymmetric proposal distribution; the result is called the Metropolis-Hastings algorithm.

The heart of Markov chain Monte Carlo methods is sampling. Given the target distribution (often multimodal or high-dimensional), sampling from it efficiently is the crucial step underlying other problems (such as estimation, integration, etc.). The random walk Metropolis algorithm [8;11] is a special case of the Metropolis-Hastings algorithm. Because of its simplicity, it is widely used in practice. Suppose the target distribution is $\pi(x)$ and the starting point is $x^{(0)}$. The random walk Metropolis algorithm chooses a symmetric proposal distribution $q(x^* \mid x^{(t)})$, and each iteration consists of two steps (from $x^{(t)}$ to $x^{(t+1)}$):

• Propose a “proposal sample” $x^*$ from the symmetric proposal distribution $q(x^* \mid x^{(t)})$.

• Calculate the Metropolis ratio:

$$r = \min\left\{ 1, \; \frac{\pi(x^*)}{\pi(x^{(t)})} \right\}.$$

Accept the “proposal sample” with probability $r$, and set $x^{(t+1)} = x^*$; otherwise, reject it, and set $x^{(t+1)} = x^{(t)}$.
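As a concrete illustration, the following is a minimal sketch of the random walk Metropolis algorithm in R, assuming a Gaussian random walk proposal $x^* = x^{(t)} + \sigma \varepsilon$ with $\varepsilon \sim N(0,1)$; the function name `rwm` and its arguments are our own, not the paper's code.

```r
# Minimal random walk Metropolis sampler (a sketch, not the paper's code).
# target: function returning the (possibly unnormalized) target density.
# x0: starting point; sigma: proposal standard deviation; n: chain length.
rwm <- function(target, x0, sigma, n) {
  x <- numeric(n)
  x[1] <- x0
  accepted <- 0
  for (t in 2:n) {
    x_star <- x[t - 1] + sigma * rnorm(1)           # symmetric proposal
    r <- min(1, target(x_star) / target(x[t - 1]))  # Metropolis ratio
    if (runif(1) < r) {                             # accept with probability r
      x[t] <- x_star
      accepted <- accepted + 1
    } else {
      x[t] <- x[t - 1]                              # reject: stay at current state
    }
  }
  list(chain = x, acc_rate = accepted / (n - 1))
}
```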

The acceptance probability is chosen, of course, precisely to ensure that the Markov chain is reversible with respect to the target distribution, so that the target distribution is stationary for the chain. The efficiency of the random walk Metropolis algorithm depends crucially on the variance $\sigma^2$ of the proposal density. If the variance is too large, it is easy to produce a high rejection rate; on the other hand, if it is too small, the Markov chain will converge slowly. Roberts and Rosenthal [12] showed that, for a standard normal target distribution with a normal proposal distribution, the optimal choice of $\sigma$ is approximately 2.38, and the corresponding acceptance rate is approximately 0.44. However, the proposal distribution in the random walk Metropolis algorithm wastes information from the target distribution, since it is a symmetric distribution. Inspired by importance sampling [8], if we choose a proposal distribution that is reasonably close to the target distribution, we can construct an efficient transition scheme. Therefore, we use a mixture normal proposal distribution for the Metropolis-Hastings algorithm. Simulation results show that the mixture normal proposal Metropolis-Hastings algorithm outperforms the random walk Metropolis algorithm and the parallel tempering algorithm [3;7;8].
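Using the `rwm` sketch above, this scaling result can be checked empirically for the standard normal target; a quick illustration under our assumed helper, not code from the paper:

```r
set.seed(1)
out <- rwm(target = dnorm, x0 = 0, sigma = 2.38, n = 1e5)
out$acc_rate  # should be close to the theoretical optimum of about 0.44
```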

This paper is organized as follows. In Section 2, we introduce how to use a mixture normal distribution as the proposal for the Metropolis-Hastings algorithm. In Section 3, the performance of the mixture normal proposal Metropolis-Hastings algorithm and other sampling algorithms (the random walk Metropolis algorithm and the parallel tempering algorithm) is compared in simulation studies. In Section 4, we summarize the paper and discuss further research on this work.

2. Mixture Normal Proposal Metropolis-Hastings Algorithm

From nonparametric smoothing theory [4;5;14], we know that a mixture of normal densities can approximate any smooth curve. In this section, we use a mixture normal distribution to approximate the target distribution. Throughout this section, we let the target distribution be $\pi(x)$. Similar to B-spline approximation [1], a mixture normal distribution can approximate the target distribution locally. We first give the expression of the mixture normal function:

$$f(x) = \sum_{i=0}^{n+1} c_i \phi_i(x),$$

where $c_0, c_1, \ldots, c_{n+1}$ are coefficients and $\phi_0(x), \phi_1(x), \ldots, \phi_{n+1}(x)$ is the sequence of normal density functions: $\phi_i(x)$ denotes the normal density whose mean is the $i$-th knot $t_i$ and whose variance is the knot spacing $h$. Here the knots $t_1 < \cdots < t_n$ are equally spaced with spacing $h$, and $t_0 = t_1 - h$ and $t_{n+1} = t_n + h$ extend the grid by one knot at each end. The number of knots is $n$. Because we know the value of the target function at each knot, we get $n$ equations, one at each knot:

$$f(t_j) = \sum_{i=0}^{n+1} c_i \phi_i(t_j) = \pi(t_j), \qquad j = 1, \ldots, n.$$

In order to obtain the mixture normal approximation, we need two additional equations to solve for all $n+2$ coefficients. We use natural boundary conditions in our approximation:

$$f''(t_1) = 0, \qquad f''(t_n) = 0.$$

From these $n+2$ linear equations, we can solve for the coefficients of the mixture normal approximation. Fig. 1 illustrates how the mixture normal function approximates the target density function; in this figure, the target density function is a mixture of two normal distributions, and the circles indicate the knots. Therefore, using the composition method [2;8], we can sample “proposal values” efficiently from the mixture normal proposal distribution for the Metropolis-Hastings algorithm.
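To make the construction concrete, the following R sketch builds the mixture normal approximation on an equally spaced knot grid and samples from it by the composition method: draw a component index with probability proportional to its weight, then draw from the corresponding normal. All function names are our own assumptions, and clamping negative solved coefficients to zero before renormalizing is our simplification to guarantee a valid mixture; the paper itself gives no code.

```r
# Sketch: mixture normal approximation of a target density pi(x)
# on n equally spaced knots, with natural boundary conditions.
build_mixture <- function(target, a, b, n) {
  t_in <- seq(a, b, length.out = n)          # knots t_1..t_n
  h <- t_in[2] - t_in[1]                     # knot spacing = component variance
  s <- sqrt(h)                               # component standard deviation
  mu <- c(t_in[1] - h, t_in, t_in[n] + h)    # extended means t_0..t_{n+1}
  m <- n + 2                                 # number of coefficients

  # phi_i(x) and its second derivative, for the linear system
  phi  <- function(x, i) dnorm(x, mu[i], s)
  phi2 <- function(x, i) dnorm(x, mu[i], s) * (((x - mu[i])^2 - h) / h^2)

  A <- matrix(0, m, m); rhs <- numeric(m)
  for (j in 1:n) {                           # interpolation: f(t_j) = pi(t_j)
    A[j, ] <- sapply(1:m, function(i) phi(t_in[j], i))
    rhs[j] <- target(t_in[j])
  }
  A[n + 1, ] <- sapply(1:m, function(i) phi2(t_in[1], i))  # f''(t_1) = 0
  A[n + 2, ] <- sapply(1:m, function(i) phi2(t_in[n], i))  # f''(t_n) = 0
  coef <- solve(A, rhs)

  w <- pmax(coef, 0); w <- w / sum(w)        # clamp/renormalize (our choice)
  list(w = w, mu = mu, s = s)
}

# Composition method: pick a component, then draw from that normal.
rmix <- function(k, mix) {
  idx <- sample(seq_along(mix$w), k, replace = TRUE, prob = mix$w)
  rnorm(k, mean = mix$mu[idx], sd = mix$s)
}

# Mixture density, needed for the Metropolis-Hastings correction later.
dmix <- function(x, mix) {
  sapply(x, function(xx) sum(mix$w * dnorm(xx, mix$mu, mix$s)))
}
```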

3. Simulation studies

In this section, we use a multimodal example to illustrate the power of the mixture normal proposal Metropolis-Hastings algorithm. In this example, we suppose the target density is itself a mixture of normal distributions.

Fig. 1: Mixture normal approximation (red dotted curve) and the target distribution (black solid curve). The circles indicate the knots.

We compare the mixture normal proposal Metropolis-Hastings algorithm with the random walk Metropolis algorithm and parallel tempering. For the random walk Metropolis algorithm (RWM), we use a fixed proposal variance, and we use temper() in the R package ‘mcmc’ for parallel tempering (PT). For the mixture normal proposal Metropolis-Hastings algorithm, we use two different knot settings, denoted MNPMH-1 and MNPMH-2. Fig. 2 shows the comparison of their performances on samples generated by these four sampling algorithms. We find in this figure that MNPMH-1 and MNPMH-2 perform significantly better than RWM and PT, and that MNPMH-2 performs almost like independent sampling.
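A sketch of the MNPMH sampler follows, using the mixture approximation as an independence-type proposal with the usual Metropolis-Hastings correction. The bimodal target below is our own illustrative stand-in, not the paper's exact mixture; `build_mixture`, `rmix`, and `dmix` are the helpers sketched in Section 2.

```r
# Example target (our stand-in): a bimodal mixture of two normals.
target <- function(x) 0.5 * dnorm(x, -3, 1) + 0.5 * dnorm(x, 3, 1)

mix <- build_mixture(target, a = -8, b = 8, n = 30)

# MNPMH: Metropolis-Hastings with the mixture normal proposal.
mnpmh <- function(target, mix, x0, n) {
  x <- numeric(n); x[1] <- x0; accepted <- 0
  for (t in 2:n) {
    x_star <- rmix(1, mix)                   # propose from the mixture
    r <- min(1, (target(x_star) * dmix(x[t - 1], mix)) /
                (target(x[t - 1]) * dmix(x_star, mix)))
    if (runif(1) < r) { x[t] <- x_star; accepted <- accepted + 1 }
    else x[t] <- x[t - 1]
  }
  list(chain = x, acc_rate = accepted / (n - 1))
}

out <- mnpmh(target, mix, x0 = 0, n = 1e4)
out$acc_rate   # close to 1 when the mixture approximates the target well
```

The closer the fitted mixture is to the target, the closer the acceptance ratio stays to 1, which is why a good approximation makes the chain behave almost like independent sampling.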

Fig. 2: Comparison of the performance of RWM, PT, MNPMH-1 and MNPMH-2. Panels (a1)-(a4) show histograms, (b1)-(b4) autocorrelation plots, and (c1)-(c4) sample paths for the four algorithms.

Table 1 summarizes the acceptance rate, integrated autocorrelation time (IAT) [8] and CPU time (seconds) for RWM, PT, MNPMH-1 and MNPMH-2. We can see in this table that MNPMH-1 and MNPMH-2 have higher acceptance rates than RWM and PT, and accordingly lower IAT. Although MNPMH-2 consumes more CPU time, it has the highest acceptance rate and the lowest IAT, and it performs almost like independent sampling. Therefore, the MNPMH algorithms outperform RWM and PT in this simulation example.
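For reference, IAT can be estimated from a chain's empirical autocorrelations. A simple truncated-sum estimator in R (a common convention, not necessarily the estimator used in the paper), applied to the `out$chain` from the sketch above:

```r
# Integrated autocorrelation time: 1 + 2 * sum of autocorrelations,
# truncated at the first non-positive lag (a simple, common convention).
iat <- function(chain, max_lag = 200) {
  rho <- acf(chain, lag.max = max_lag, plot = FALSE)$acf[-1]
  cut <- which(rho <= 0)[1]
  if (!is.na(cut)) rho <- rho[seq_len(cut - 1)]
  1 + 2 * sum(rho)
}

iat(out$chain)   # values near 1 indicate close-to-independent sampling
```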

Table 1: Comparison of the acceptance rate, IAT and CPU time (seconds) for RWM, PT, MNPMH-1 and MNPMH-2.

4. Conclusions

In this paper, we have proposed a novel sampling algorithm, the mixture normal proposal Metropolis-Hastings algorithm. Through a simulation example, we have illustrated the superiority of the mixture normal proposal Metropolis-Hastings algorithm over the random walk Metropolis algorithm and the parallel tempering algorithm. Further research on this work can proceed from several angles. First, we can extend the mixture normal proposal Metropolis-Hastings algorithm to the high-dimensional case. Second, combining it with the Gibbs sampling algorithm, we can also develop a mixture normal proposal Gibbs sampling algorithm. We think further analysis along these directions is of interest.

5. Acknowledgements

The authors thank the referees and editors for their careful reading and constructive suggestions. The authors would also like to acknowledge partial support from the National Basic Research Program of China (973 Program), grant no. 2007CB814900 (Financial Risk).

6. References

[1] C. de Boor. A Practical Guide to Splines. Springer, 1978.

[2] G. S. Fishman. Monte Carlo: Concepts, Algorithms, and Applications. Springer, 1996.

[3] C. J. Geyer. Markov chain Monte Carlo maximum likelihood. Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface. 1991, 156-163.

[4] W. Härdle, M. Müller, S. Sperlich, and A. Werwatz. Nonparametric and Semiparametric Models. Springer, 2004.

[5] J. D. Hart. Nonparametric Smoothing and Lack-of-Fit Tests. Springer, 1997.

[6] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika. 1970, 57: 97-109.

[7] F. Liang, C. Liu, and R. J. Carroll. Advanced Markov Chain Monte Carlo Methods: Learning from Past Samples. Wiley, 2010.

[8] J. S. Liu. Monte Carlo Strategies in Scientific Computing. Springer, 2001.

[9] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics. 1953, 21(6): 1087-1092.

[10] G. O. Roberts, and R. L. Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli. 1996, 2: 341-363.

[11] G. O. Roberts, A. Gelman, and W. R. Gilks. Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability. 1997, 7: 110-120.

[12] G. O. Roberts, and J. S. Rosenthal. Optimal scaling for various Metropolis-Hastings algorithms. Statistical Science. 2001, 16: 351-367.

[13] J. S. Rosenthal. Optimal proposal distributions and adaptive MCMC. In Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC Press, 2011.

[14] M. P. Wand, and M. C. Jones. Kernel Smoothing. Chapman and Hall/CRC, 1995.