MPI for Molecular Genetics - Computational Molecular Biology
bgmm: Belief-based Gaussian Mixture Modeling
bgmm is an R package for knowledge-based mixture modeling. It implements mixture modeling variants that differ in the amount of incorporated knowledge and span the entire range from unsupervised to supervised modeling.
Our focus is on partially supervised modeling, which to our knowledge is not supported by any other open-access software. bgmm is also the only R package that implements semi-supervised modeling. Having all mixture modeling variants available in one package allows estimates obtained with different models to be compared. The accompanying figure schematically illustrates the percentage of labeled observations, and the certainty of their labels, required by each implemented variant.
 
The basic functionality of bgmm, described in Biecek et al., includes:
- belief-based mixture modeling - our theoretical contribution to partially supervised mixture modeling,
- soft-label mixture modeling - a partially supervised mixture modeling method proposed by Côme et al.,
- semi-supervised mixture modeling,
- unsupervised mixture modeling,
- specifying constraints on the fitted model structure,
- simulation of data from user-specified model parameters or model structure,
- plotting of the fitted models of up to two-dimensional data,
- model selection - fitting a range of models with different structures or numbers of components and evaluating them using GIC scores,
- prediction of classes or clusters for a given set of observations using the fitted models.
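
To make the difference between the modeling variants concrete, here is a minimal Python sketch (not bgmm code, and not its R interface) of how component posteriors for a single one-dimensional observation could be computed under three of the variants. It assumes that in belief-based modeling the beliefs of a labeled observation replace the mixing proportions as its prior, and that in soft-label modeling the plausibilities multiply the mixing proportions. The function names `normal_pdf` and `posteriors` and all parameter values are hypothetical; Biecek et al. and Côme et al. give the exact formulations.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(x, means, sds, mixing, knowledge=None, variant="unsupervised"):
    """Posterior component probabilities for one observation x.

    knowledge: per-component beliefs or plausibilities for a labeled
    observation, or None for an unlabeled one (assumed representation).
    """
    densities = [normal_pdf(x, m, s) for m, s in zip(means, sds)]
    if knowledge is None or variant == "unsupervised":
        # Unlabeled observation: the mixing proportions act as the prior.
        weights = list(mixing)
    elif variant == "belief":
        # Assumption: beliefs replace the mixing proportions as the prior.
        weights = list(knowledge)
    elif variant == "soft":
        # Assumption: plausibilities reweight the mixing proportions.
        weights = [p * pi for p, pi in zip(knowledge, mixing)]
    else:
        raise ValueError("unknown variant: " + variant)
    joint = [w * d for w, d in zip(weights, densities)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-component model and one observation halfway between the means.
means, sds, mixing = [-1.0, 1.0], [1.0, 1.0], [0.2, 0.8]
print(posteriors(0.0, means, sds, mixing))
print(posteriors(0.0, means, sds, mixing, [0.9, 0.1], "belief"))
print(posteriors(0.0, means, sds, mixing, [0.9, 0.1], "soft"))
```

In this sketch, a belief vector that puts all its mass on one component fixes the posterior to that component, which is how a fully certain label would behave.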
Additionally, bgmm supports applying mixture modeling to differential gene expression analysis, as proposed in Szczurek et al. The modeled data are one-dimensional log expression ratios of treatment versus control. The labeled observations are genes expected to be differentially expressed, i.e., up- or down-regulated, in the experiment.
bgmm can be applied to fit a two- or three-component mixture model to these input data and knowledge. In the two-component model, a low-variance Gaussian represents the unchanged genes and a high-variance Gaussian represents the differential genes. In the three-component model, a low-mean Gaussian represents the down-regulated genes, a zero-mean Gaussian the unchanged genes, and a high-mean Gaussian the up-regulated genes (illustrated in the accompanying plots).
The posterior probabilities in the fitted model of choice are used to compute the probability of differential expression for each gene in the analyzed experiment.
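
As an illustration of this last step, the following Python sketch (not bgmm code) computes a probability of differential expression from the posteriors of an already fitted model. It assumes one plausible reading of the description above: the probability of differential expression is one minus the posterior of the unchanged component. The function `prob_differential`, the component names, and all parameter values are hypothetical and are not fitted to any data; the two-component example additionally assumes that both Gaussians are centered at zero.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate Gaussian at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def prob_differential(log_ratio, components, unchanged="unchanged"):
    """Probability that a gene is differentially expressed.

    components maps a component name to (mixing proportion, mean, sd).
    Assumption: the result is 1 minus the posterior of the unchanged component.
    """
    joint = {
        name: pi * normal_pdf(log_ratio, mean, sd)
        for name, (pi, mean, sd) in components.items()
    }
    total = sum(joint.values())
    return 1.0 - joint[unchanged] / total


# Hypothetical two-component model: low versus high variance, both zero-mean.
two = {
    "unchanged": (0.8, 0.0, 0.3),
    "differential": (0.2, 0.0, 1.5),
}

# Hypothetical three-component model: down-regulated, unchanged, up-regulated.
three = {
    "down": (0.1, -1.5, 0.5),
    "unchanged": (0.8, 0.0, 0.3),
    "up": (0.1, 1.5, 0.5),
}

for log_ratio in (-2.0, 0.0, 2.0):
    print(log_ratio, prob_differential(log_ratio, two), prob_differential(log_ratio, three))
```

With these illustrative parameters, log ratios near zero receive a low probability of differential expression and log ratios far from zero receive a high one, under either model.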
Download
The latest release of the bgmm package is available from CRAN.
A demo in HTML format presents the basic model-fitting, model selection, data simulation and prediction functionality of the bgmm package. The presented function calls and output plots cover the functions described in Biecek et al. and more, e.g., modeling of one-dimensional data and application to differential gene expression analysis.
For more details about specific functions, refer to the bgmm reference manual.
References
- E. Szczurek, P. Biecek, J. Tiuryn and M. Vingron (2010).
Introducing knowledge into differential expression analysis. J Comput Biol., 17(8):953-967.
- P. Biecek, E. Szczurek, M. Vingron and J. Tiuryn.
The R package bgmm: mixture modeling with uncertain knowledge. Submitted.
- E. Côme, L. Oukhellou, T. Denœux, et al. (2009).
Learning from partially supervised data using mixture models and belief functions. Pattern Recogn., 42:334-348.