class pycave.bayes.GaussianMixture(num_components=1, *, covariance_type='diag', init_strategy='kmeans', init_means=None, convergence_tolerance=0.001, covariance_regularization=1e-06, batch_size=None, trainer_params=None)[source]

Bases: lightkit.estimator.configurable.ConfigurableBaseEstimator[pycave.bayes.gmm.model.GaussianMixtureModel], lightkit.estimator.mixins.PredictorMixin[Union[numpy.ndarray, torch.Tensor], torch.Tensor]

Probabilistic model assuming that data is generated from a mixture of Gaussians. The mixture is assumed to be composed of a fixed number of components with individual means and covariances. More information on Gaussian mixture models (GMMs) is available on Wikipedia.

See also

  • GaussianMixtureModel -- PyTorch module for a Gaussian mixture model.

  • GaussianMixtureModelConfig -- Configuration class for a Gaussian mixture model.

Parameters
  • num_components (int) -- The number of components in the GMM. The dimensionality of each component is automatically inferred from the data.

  • covariance_type (Literal['full', 'tied', 'diag', 'spherical']) -- The type of covariance to assume for all Gaussian components.

  • init_strategy (Literal['random', 'kmeans', 'kmeans++']) -- The strategy for initializing component means and covariances.

  • init_means (Optional[Tensor]) -- An optional initial guess for the means of the components. If provided, it must be a tensor of shape [num_components, num_features]. In that case, init_strategy is ignored and the means are treated as if K-means initialization had been run.

  • convergence_tolerance (float) -- Training is considered converged once the change in the per-datapoint negative log-likelihood falls below this value.

  • covariance_regularization (float) -- A small value which is added to the diagonal of the covariance matrix to ensure that it is positive semi-definite.

  • batch_size (Optional[int]) -- The batch size to use when fitting the model. If not provided, the full data will be used as a single batch. Set this if the full data does not fit into memory.

  • num_workers -- The number of workers to use for loading the data. Only used if a PyTorch dataset is passed to fit() or related methods.

  • trainer_params (Optional[Dict[str, Any]]) --

    Initialization parameters for the PyTorch Lightning trainer. By default, various stdout logs are disabled unless PyCave is configured to do verbose logging. Checkpointing and logging are disabled regardless of the log level. This estimator further sets the following overridable defaults:

    • max_epochs=100
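To make convergence_tolerance and covariance_regularization concrete, here is a small pure-Python sketch of the two mechanisms (illustrative only, not PyCave's tensor-based implementation):

```python
def regularize_covariance(cov, reg=1e-6):
    """Add `reg` to the diagonal of a square covariance matrix
    (given as a list of rows) so it stays well-conditioned."""
    n = len(cov)
    return [
        [cov[i][j] + (reg if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]

def has_converged(prev_nll, curr_nll, tol=1e-3):
    """Converged when the per-datapoint NLL changes by less than `tol`."""
    return abs(prev_nll - curr_nll) < tol
```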


The number of epochs passed to the initializer defines only the number of optimization epochs. Prior to that, initialization is run, which may perform additional passes through the data.


For batch training, the number of epochs run (i.e. the number of passes through the data) does not align with the number of epochs passed to the initializer. This is because each EM iteration must be split across two epochs. The actual minimum/maximum number of epochs is thus doubled. Nonetheless, num_iter_ indicates how many EM iterations have been run.
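To see why each EM iteration needs two passes over the data in the batch setting, consider this minimal 1-D EM step in pure Python (an illustrative sketch, not PyCave's code): the E-pass accumulates responsibilities over the whole dataset, and only then can the M-pass re-estimate parameters.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(data, priors, means, vars_):
    """One EM iteration for a 1-D mixture, split into two passes over the data."""
    # E-pass: compute the responsibility of each component for each point.
    resp = []
    for x in data:
        w = [p * gaussian_pdf(x, m, v) for p, m, v in zip(priors, means, vars_)]
        total = sum(w)
        resp.append([wi / total for wi in w])
    # M-pass: re-estimate priors, means and variances from the responsibilities.
    n, k = len(data), len(priors)
    nk = [sum(resp[i][j] for i in range(n)) for j in range(k)]
    priors = [nk[j] / n for j in range(k)]
    means = [sum(resp[i][j] * data[i] for i in range(n)) / nk[j] for j in range(k)]
    vars_ = [
        sum(resp[i][j] * (data[i] - means[j]) ** 2 for i in range(n)) / nk[j] + 1e-6
        for j in range(k)
    ]
    return priors, means, vars_
```

Because the E-pass must finish before the M-pass can begin, a mini-batch implementation spends two epochs per EM iteration, which is why the effective epoch counts are doubled.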


Methods

fit() -- Fits the Gaussian mixture on the provided data, estimating component priors, means and covariances.


predict() -- Computes the most likely component for each of the provided datapoints.


predict_proba() -- Computes a distribution over the components for each of the provided datapoints.
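These two predictions amount to computing the posterior over components and taking its argmax. A 1-D pure-Python sketch (illustrative only; PyCave operates on tensors of arbitrary dimensionality):

```python
import math

def predict_proba(x, priors, means, vars_):
    """Posterior distribution over components for a single 1-D datapoint."""
    w = [
        p * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        for p, m, v in zip(priors, means, vars_)
    ]
    total = sum(w)
    return [wi / total for wi in w]

def predict(x, priors, means, vars_):
    """Index of the most likely component for a single datapoint."""
    probs = predict_proba(x, priors, means, vars_)
    return max(range(len(probs)), key=probs.__getitem__)
```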


sample() -- Samples datapoints from the fitted Gaussian mixture.
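Sampling from a mixture is ancestral: first draw a component index according to the priors, then draw from that component's Gaussian. A 1-D sketch using only the standard library (not PyCave's tensor-based implementation):

```python
import random

def sample_mixture(n, priors, means, vars_, seed=0):
    """Draw n samples: pick a component by its prior, then sample its Gaussian."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        j = rng.choices(range(len(priors)), weights=priors)[0]
        samples.append(rng.gauss(means[j], vars_[j] ** 0.5))
    return samples
```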


score() -- Computes the average negative log-likelihood (NLL) of the provided datapoints.


score_samples() -- Computes the negative log-likelihood (NLL) of each of the provided datapoints.
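Both scores derive from the same mixture log-density: the per-datapoint NLL, and its average over the dataset. A 1-D pure-Python sketch (illustrative; the function names merely mirror the methods above):

```python
import math

def score_samples(data, priors, means, vars_):
    """Negative log-likelihood of each datapoint under a 1-D mixture."""
    nlls = []
    for x in data:
        density = sum(
            p * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            for p, m, v in zip(priors, means, vars_)
        )
        nlls.append(-math.log(density))
    return nlls

def score(data, priors, means, vars_):
    """Average per-datapoint negative log-likelihood."""
    nlls = score_samples(data, priors, means, vars_)
    return sum(nlls) / len(nlls)
```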

Inherited Methods


Clones the estimator without copying any fitted attributes.


Fits the estimator using the provided data and subsequently predicts the labels for the data using the fitted estimator.


Returns the estimator's parameters as passed to the initializer.


Loads the estimator and (if available) the fitted model.


Loads the fitted attributes that are stored at the fitted path.


Initializes this estimator by loading its parameters.


Saves the estimator to the provided directory.


Saves the fitted attributes of this estimator.


Saves the parameters of this estimator.


Sets the provided values on the estimator.


Returns the trainer as configured by the estimator.



Returns the list of fitted attributes that ought to be saved and loaded.


Attributes
model_ -- The fitted PyTorch module with all estimated parameters.


converged_ -- A boolean indicating whether the model converged during training.


num_iter_ -- The number of iterations the model was fitted for, excluding initialization.


nll_ -- The average per-datapoint negative log-likelihood at the last training step.