MCMC convergence for latent factor parameters. #2

Open
tpbilton opened this issue Aug 17, 2020 · 2 comments
@tpbilton

Hi,

I want to say a big thanks for putting in the effort of developing this R package. I quite like the approach you have taken to fit the multivariate LMM.

I have a query regarding convergence of the MCMC chains. I ran the code in the rmd file in the vignette but changed lines 172-187 to

n_iter = 1000;  # how many samples to collect at once?
for(i  in 1:70) {
  print(sprintf('Run %d',i))
  MegaLMM_state = sample_MegaLMM(MegaLMM_state,n_iter)  # run MCMC chain n_iter iterations. grainSize is a parameter for parallelization (smaller = more parallelization)
  
  MegaLMM_state = save_posterior_chunk(MegaLMM_state)  # save any accumulated posterior samples in the database to release memory
  print(MegaLMM_state) # print status of current chain
  plot(MegaLMM_state) # make some diagnostic plots. These are saved in a pdf booklet: diagnostic_plots.pdf
  
  # set of commands to run during burn-in period to help chain converge
  if(MegaLMM_state$current_state$nrun < MegaLMM_state$run_parameters$burn || i < 20) {
    MegaLMM_state = reorder_factors(MegaLMM_state,drop_cor_threshold = 0.6) # Factor order doesn't "mix" well in the MCMC. We can help it by manually re-ordering from biggest to smallest
    MegaLMM_state = clear_Posterior(MegaLMM_state)
    print(MegaLMM_state$run_parameters$burn)
  }
}

The diagnostic plots from the plot function are here. I mainly want to focus on the last page of the pdf. Looking at the code, I'm guessing that the different colored lines correspond to the five traits (out of the 100 traits in this dataset) with the largest mean lambda value. It is a bit disconcerting, however, to see that for a simulated dataset the trace plots for some lambdas do not seem to be settling down to a converged distribution but keep jumping around even after a large number of iterations (though factors 1-6 look good). Personally, I would be uncomfortable with these trace plots (to me, they suggest that there might be some multi-modality in the posterior distributions). However, I'm curious to know whether you have ways to improve the convergence of the MCMC chains in MegaLMM, or whether such trace plots are not a concern (and if so, why).
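
One rough way to put a number on "not settling down" is a split-chain R-hat, which compares the first and second halves of a single trace. The sketch below uses base R only and is not part of MegaLMM: `split_rhat` is a hypothetical helper, and the two simulated traces are stand-ins for real lambda samples.

```r
# Split-chain R-hat for one trace (hypothetical helper, not a MegaLMM function).
# The chain is cut into two halves and the between-half variance is compared
# with the within-half variance. Values near 1 suggest the halves agree;
# values well above 1 suggest drift or switching between modes.
split_rhat <- function(x) {
  n <- floor(length(x) / 2)
  halves <- cbind(x[1:n], x[(length(x) - n + 1):length(x)])
  W <- mean(apply(halves, 2, var))   # mean within-half variance
  B <- n * var(colMeans(halves))     # between-half variance
  var_plus <- (n - 1) / n * W + B / n
  sqrt(var_plus / W)
}

set.seed(1)
# Simulated stand-ins for lambda traces (assumption: not real MegaLMM output)
stationary <- rnorm(2000)                                  # well-behaved trace
drifting   <- rnorm(2000) + seq(0, 3, length.out = 2000)   # trace that keeps moving

rhat_stationary <- split_rhat(stationary)
rhat_drifting   <- split_rhat(drifting)
print(c(stationary = rhat_stationary, drifting = rhat_drifting))
```

The drifting trace should give a value clearly above 1, while the stationary one should stay close to 1. A single number like this is only a screening tool and does not replace looking at the trace plots themselves.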

Many thanks,
Timothy

@deruncie
Owner

deruncie commented Aug 17, 2020 via email

@tpbilton
Author

Hi Dan,

Really appreciate your in-depth response, and yes, what you have said makes a lot of sense and sheds a lot of light on what I'm seeing in the results.

One takeaway, then, is that the trace plots of lambda may not be the most important ones to look at (though they are helpful for determining whether you have enough factors); what matters more are the trace plots of the parameters one is actually interested in estimating. In my case, I want to use MegaLMM mainly for microbiome data and am mostly interested in the beta coefficients, so I'll have a look at the trace plots for these parameters.
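
Alongside the trace plots, a crude effective sample size per coefficient gives a sense of how much independent information the chain holds for each parameter of interest. The sketch below is base R only and makes several assumptions: `ess` is a hypothetical helper, the truncation rule (stop at the first negative autocorrelation) is one simple choice among several, and `beta_samples` is a simulated samples-by-parameters matrix rather than real MegaLMM output.

```r
# Crude effective sample size from the autocorrelation function
# (hypothetical helper, not a MegaLMM function).
# Autocorrelations are summed up to, but not including, the first negative lag.
ess <- function(x, max_lag = 200) {
  rho <- as.numeric(acf(x, lag.max = max_lag, plot = FALSE)$acf)[-1]  # drop lag 0
  first_neg <- which(rho < 0)[1]
  if (!is.na(first_neg)) rho <- rho[seq_len(first_neg - 1)]
  length(x) / (1 + 2 * sum(rho))
}

set.seed(1)
# Simulated posterior samples (assumption: rows = MCMC samples, columns = coefficients)
beta_samples <- cbind(
  beta_1 = rnorm(2000),                                      # nearly independent draws
  beta_2 = as.numeric(arima.sim(list(ar = 0.9), n = 2000))   # strongly autocorrelated draws
)

ess_by_beta <- apply(beta_samples, 2, ess)
print(round(ess_by_beta))
```

A coefficient whose effective sample size is a small fraction of the number of stored samples is mixing slowly, so its posterior summaries deserve more caution (or a longer run) than the raw sample count suggests.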

I think some of the points you have made should go into the documentation, maybe as a separate tutorial for more advanced users (though I'm guessing that's probably a time issue for you). I think these are some really important points from a practical point of view of implementing MegaLMM on real data.

Not to take away from the good work you have done, but it would be nice if these issues could be sorted out, or if there were some mathematical theory showing that the posteriors for the parameters of interest are not hugely affected by these convergence issues. I'm keen to see some more development of MegaLMM and am happy to help out if possible.

I'll do a bit more playing around with some data and look at the convergence for the actual parameters of interest.

Cheers,
Timothy
