Truncated Gaussian Mixtures

This package fits Gaussian mixture models whose kernels are truncated Gaussians. It only supports Gaussians truncated to lie inside a box, i.e. each coordinate is restricted to an interval between a lower and an upper limit.
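For intuition, here is a one-dimensional sketch of the density such a model represents. Each component is a normal density restricted to the interval [a, b] and divided by the probability mass it has inside that interval, and the components are combined with mixture weights. This is plain Julia written for illustration only: tg_pdf and tgmm_pdf are hypothetical helpers, not part of the package API, and the normal CDF is approximated with Simpson's rule so the example needs no extra packages.

```julia
φ(x) = exp(-x^2 / 2) / sqrt(2π)                                # standard normal pdf
simpson_w(i, n) = (i == 0 || i == n) ? 1 : (isodd(i) ? 4 : 2)  # Simpson's rule weights
# Standard normal cdf: 0.5 plus the signed Simpson estimate of ∫₀ˣ φ(t) dt (n must be even)
Φ(x; n = 2000) = 0.5 + x / 3n * sum(simpson_w(i, n) * φ(i * abs(x) / n) for i in 0:n)

# Hypothetical helper: density of N(μ, σ²) truncated to [a, b]
function tg_pdf(x, μ, σ, a, b)
    (a <= x <= b) || return 0.0
    Z = Φ((b - μ) / σ) - Φ((a - μ) / σ)   # mass the untruncated normal puts inside [a, b]
    return φ((x - μ) / σ) / (σ * Z)
end

# Hypothetical helper: mixture of truncated normals with weights w
tgmm_pdf(x, w, μs, σs, a, b) =
    sum(w[k] * tg_pdf(x, μs[k], σs[k], a, b) for k in eachindex(w))

w, μs, σs = [0.3, 0.7], [0.2, 0.7], [0.2, 0.15]
a, b = 0.0, 1.0

# Trapezoid-rule check that the mixture density integrates to ≈ 1 over [a, b]
xs = range(a, b; length = 2001)
ys = [tgmm_pdf(x, w, μs, σs, a, b) for x in xs]
total = (sum(ys) - (ys[1] + ys[end]) / 2) * step(xs)   # ≈ 1.0
```

Outside the box the density is exactly zero, and the division by Z is what makes each truncated kernel integrate to one inside it.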

The algorithm is adapted from the paper by Lee & Scott, combined with an algorithm for computing the first two moments (mean and covariance) of a truncated Gaussian with a full covariance matrix.
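The multivariate, full-covariance moment computation is involved, but its one-dimensional special case has a well-known closed form, sketched below in plain Julia. tn_moments is a hypothetical name chosen for illustration, the sketch assumes finite bounds a and b, and it is not the package's implementation; Φ is again approximated with Simpson's rule.

```julia
φ(x) = exp(-x^2 / 2) / sqrt(2π)                                # standard normal pdf
simpson_w(i, n) = (i == 0 || i == n) ? 1 : (isodd(i) ? 4 : 2)  # Simpson's rule weights
# Standard normal cdf: 0.5 plus the signed Simpson estimate of ∫₀ˣ φ(t) dt (n must be even)
Φ(x; n = 2000) = 0.5 + x / 3n * sum(simpson_w(i, n) * φ(i * abs(x) / n) for i in 0:n)

# Hypothetical helper: mean and variance of N(μ, σ²) truncated to finite [a, b].
# With α = (a-μ)/σ, β = (b-μ)/σ and Z = Φ(β) - Φ(α):
#   mean     = μ + σ (φ(α) - φ(β)) / Z
#   variance = σ² [1 + (α φ(α) - β φ(β)) / Z - ((φ(α) - φ(β)) / Z)²]
function tn_moments(μ, σ, a, b)
    α, β = (a - μ) / σ, (b - μ) / σ
    Z = Φ(β) - Φ(α)
    r = (φ(α) - φ(β)) / Z
    m = μ + σ * r
    v = σ^2 * (1 + (α * φ(α) - β * φ(β)) / Z - r^2)
    return m, v
end

# Kernel centred near the lower edge of [0, 1]: truncation pushes the mean
# inward and shrinks the variance below σ² = 0.09
m, v = tn_moments(0.1, 0.3, 0.0, 1.0)
```

Moments like these are what let an EM update account for the probability mass that lies outside the box.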

Quick Usage

The quickest way to use TruncatedGaussianMixtures is to call the fit_gmm function with your data stored in a DataFrame.

using TruncatedGaussianMixtures
using DataFrames, Distributions

# Mock data: draw samples from a mixture of two truncated Gaussians
μ1 = [0.2, 0.7]; 
Σ1 = [0.05 0.04;0.04 0.05];

μ2 = [0.1, 0.2]; 
Σ2 = [0.05 -0.02;-0.02 0.03];

a = [0.0, 0.0]; b = [1.0, 1.0] # Lower and upper limits of the bounding box

dist = MixtureModel(
    [TruncatedMvNormal(MvNormal(μ1, Σ1), a, b), 
     TruncatedMvNormal(MvNormal(μ2, Σ2), a, b)],
    [0.3, 0.7]
)

df = DataFrame(rand(dist, 80_000)', [:x, :y])

# Let's fit a 2-component truncated Gaussian mixture model
# with full covariance matrices, and show a progress bar
gmm = fit_gmm(df, 2, a, b; cov=:full, tol=1e-5, progress=true);

Advantages

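A standard Gaussian mixture model tends to keep its kernels away from the edges of the box, because any kernel placed near an edge loses part of its probability mass outside the box. Truncated kernels are renormalized to the box, so they also reproduce the probability density at the edges.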
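A one-dimensional calculation illustrates the edge effect. A plain Gaussian kernel centred on the edge of the box places about half of its mass outside the box, so at the edge it has only half the density of a truncated kernel with the same μ and σ, which is renormalized by the mass Z inside the box. The sketch below is plain Julia with illustrative numbers (μ = 0, σ = 0.2 on [0, 1]), not output from the package; Φ is approximated with Simpson's rule.

```julia
φ(x) = exp(-x^2 / 2) / sqrt(2π)                                # standard normal pdf
simpson_w(i, n) = (i == 0 || i == n) ? 1 : (isodd(i) ? 4 : 2)  # Simpson's rule weights
# Standard normal cdf: 0.5 plus the signed Simpson estimate of ∫₀ˣ φ(t) dt (n must be even)
Φ(x; n = 2000) = 0.5 + x / 3n * sum(simpson_w(i, n) * φ(i * abs(x) / n) for i in 0:n)

a, b = 0.0, 1.0     # the box (an interval in 1D)
μ, σ = 0.0, 0.2     # a kernel centred exactly on the lower edge

Z = Φ((b - μ) / σ) - Φ((a - μ) / σ)   # share of the plain kernel's mass inside the box

plain_edge     = φ((a - μ) / σ) / σ   # plain Gaussian density at x = a
truncated_edge = plain_edge / Z       # truncated kernel: same shape, renormalized by Z

println("mass inside box: ", round(Z; digits = 4))                              # prints 0.5
println("edge density ratio: ", round(truncated_edge / plain_edge; digits = 4))  # prints 2.0
```

To match data that is dense right up to the edge, an untruncated mixture has to compensate with extra or shifted kernels, whereas a truncated kernel can sit on the edge directly.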

Usage

# Create the fit
# X: a DataFrame of samples with one column per dimension
# a, b: vectors holding the lower and upper limits of the bounding box
EM = fit_gmm(X, 2, a, b;   # data, n_components, lower, upper
  cov=:diag,  # Choose :diag for diagonal or :full for full covariances
  block_structure=[1,1], # Block structure of the covariances: dimensions sharing a label
                         # may be correlated, so [1,1] puts the first and second dimension
                         # in the same block. Only relevant if one uses cov=:full
  tol=1e-2,   # Tolerance for the stopping criterion
  MAX_REPS=100, # Maximum number of EM update steps
  verbose=false,  # Verbose output, useful for debugging
  progress=true,  # Show a progress bar during the fit
  responsibilities=false) # Set to true to return the EM object rather than a Distributions.jl object
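The comments above describe block_structure as one label per dimension, with dimensions that share a label allowed to be correlated. Assuming that reading, the plain-Julia sketch below shows which covariance entries a given structure leaves free: entries between dimensions in different blocks are set to zero. block_mask and apply_blocks are hypothetical helpers for illustration, not the package's implementation.

```julia
# Hypothetical helper: mask[i, j] is true when dimensions i and j share a block label
block_mask(blocks) = [bi == bj for bi in blocks, bj in blocks]

# Hypothetical helper: zero the covariance entries between dimensions in different blocks
apply_blocks(Σ, blocks) = Σ .* block_mask(blocks)

Σ = [0.05 0.01 0.02;
     0.01 0.04 0.00;
     0.02 0.00 0.03]

apply_blocks(Σ, [1, 1, 2])   # dimensions 1 and 2 may correlate; dimension 3 is on its own
```

Under this reading, giving every dimension the same label leaves a full covariance, and giving every dimension a distinct label reduces it to a diagonal one.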