GaussianMixture

class pycave.bayes.GaussianMixture(num_components=1, *, covariance_type='diag', init_strategy='kmeans', init_means=None, convergence_tolerance=0.001, covariance_regularization=1e-06, batch_size=None, trainer_params=None)

Bases: ConfigurableBaseEstimator[GaussianMixtureModel], PredictorMixin[Union[ndarray, Tensor], Tensor]

Probabilistic model assuming that data is generated from a mixture of Gaussians.

The mixture is composed of a fixed number of components, each with its own prior (mixing weight), mean and covariance. More information on Gaussian mixture models (GMMs) is available on Wikipedia.
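In standard notation, a mixture of K = num_components Gaussians with priors π_k, means μ_k and covariances Σ_k has the density below. The symbols are assumed here and are not taken from PyCave's source. The NLL reported by score is the negative logarithm of this density, averaged over N datapoints.

```latex
\begin{aligned}
p(\mathbf{x}) &= \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
  \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1, \\
\mathrm{NLL} &= -\frac{1}{N} \sum_{n=1}^{N} \log p(\mathbf{x}_n).
\end{aligned}
```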

See also

GaussianMixtureModel

PyTorch module for a Gaussian mixture model.

GaussianMixtureModelConfig

Configuration class for a Gaussian mixture model.

Parameters:
  • num_components (int) -- The number of components in the GMM. The dimensionality of each component is automatically inferred from the data.

  • covariance_type (CovarianceType) -- The type of covariance to assume for all Gaussian components.

  • init_strategy (GaussianMixtureInitStrategy) -- The strategy for initializing component means and covariances.

  • init_means (torch.Tensor | None) -- An optional initial guess for the means of the components. If provided, it must be a tensor of shape [num_components, num_features]. In that case, init_strategy is ignored and the means are treated as if K-means initialization had been run.

  • convergence_tolerance (float) -- The change in the per-datapoint negative log-likelihood below which training is considered to have converged.

  • covariance_regularization (float) -- A small value which is added to the diagonal of the covariance matrix to ensure that it is positive definite.

  • batch_size (int | None) -- The batch size to use when fitting the model. If not provided, the full data will be used as a single batch. Set this if the full data does not fit into memory.

  • num_workers -- The number of workers to use for loading the data. Only used if a PyTorch dataset is passed to fit() or related methods.

  • trainer_params (dict[str, Any] | None) --

    Parameters to use when initializing a PyTorch Lightning trainer. By default, it disables various stdout logs unless PyCave is configured to do verbose logging. Checkpointing and logging are disabled regardless of the log level. This estimator further sets the following overridable defaults:

    • max_epochs=100
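A degenerate component shows why covariance_regularization is needed. Suppose all of a component's points lie on a line, such as y = x with unit variance. Its covariance matrix [[1, 1], [1, 1]] is then singular: positive semi-definite but not invertible, so the Gaussian density is undefined. The pure-Python sketch below adds a small value to the diagonal and checks the 2x2 result for positive definiteness with Sylvester's criterion. It uses hypothetical helper names and is an illustration, not PyCave's code.

```python
def regularize(cov, eps=1e-6):
    """Return a copy of a square covariance matrix with eps added to its diagonal."""
    return [[v + eps if i == j else v for j, v in enumerate(row)] for i, row in enumerate(cov)]


def is_positive_definite_2x2(m):
    """Sylvester's criterion for a symmetric 2x2 matrix: both leading minors are > 0."""
    return m[0][0] > 0 and m[0][0] * m[1][1] - m[0][1] * m[1][0] > 0


# Points on the line y = x (unit variance) give the singular covariance [[1, 1], [1, 1]].
singular = [[1.0, 1.0], [1.0, 1.0]]
regularized = regularize(singular, eps=1e-6)

print(is_positive_definite_2x2(singular))     # False: the determinant is exactly 0
print(is_positive_definite_2x2(regularized))  # True: the determinant is about 2e-6
```

Only the diagonal changes, so the off-diagonal correlation structure is preserved while the matrix becomes invertible.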

Note

The number of epochs passed to the initializer only defines the number of optimization epochs. Before these, initialization is run, which may perform additional passes through the data.

Note

For batch training, the number of epochs run (i.e. the number of passes through the data) does not match the number of epochs passed to the initializer. This is because each EM iteration needs to be split across two epochs, so the actual minimum and maximum numbers of epochs are doubled. Nonetheless, num_iter_ indicates how many EM iterations have been run.
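One plausible way to split an EM iteration across two passes is shown below. PyCave's actual implementation may divide the work differently. With mini-batches, the new means are only known after every batch has been seen, so the variances around those new means need a second pass. A single-pass alternative that accumulates sums of squares also exists, but it is less numerically stable. The pure-Python sketch assumes a 1-D mixture and a hypothetical CountingBatches loader that counts passes over the data.

```python
import math


class CountingBatches:
    """Hypothetical stand-in for a data loader that counts full passes (epochs)."""

    def __init__(self, batches):
        self.batches = batches
        self.passes = 0

    def __iter__(self):
        self.passes += 1  # every new iteration over the batches is one pass
        return iter(self.batches)


def log_normal(x, mean, var):
    """Log-density of a 1-D Gaussian N(x | mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def responsibilities(x, priors, means, variances):
    """E-step for one datapoint: posterior probability of each component."""
    logs = [math.log(p) + log_normal(x, m, v) for p, m, v in zip(priors, means, variances)]
    top = max(logs)
    weights = [math.exp(lp - top) for lp in logs]  # shift by the max for stability
    total = sum(weights)
    return [w / total for w in weights]


def batched_em_iteration(loader, priors, means, variances, reg=1e-6):
    """One EM iteration over mini-batches, split into two passes over the data."""
    k = len(priors)
    mass, weighted_sum = [0.0] * k, [0.0] * k
    for batch in loader:  # pass 1: accumulate statistics for priors and means
        for x in batch:
            r = responsibilities(x, priors, means, variances)
            for j in range(k):
                mass[j] += r[j]
                weighted_sum[j] += r[j] * x
    total = sum(mass)
    new_priors = [m / total for m in mass]
    new_means = [s / m for s, m in zip(weighted_sum, mass)]

    sq_dev = [0.0] * k
    for batch in loader:  # pass 2: variances around the *new* means
        for x in batch:
            # Responsibilities still come from the old parameters, as in exact EM.
            r = responsibilities(x, priors, means, variances)
            for j in range(k):
                sq_dev[j] += r[j] * (x - new_means[j]) ** 2
    new_variances = [d / m + reg for d, m in zip(sq_dev, mass)]
    return new_priors, new_means, new_variances


loader = CountingBatches([[-2.1, -1.9, 2.0], [-2.0, 1.9, 2.1]])
priors, means, variances = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(10):  # 10 EM iterations
    priors, means, variances = batched_em_iteration(loader, priors, means, variances)

print(loader.passes)  # 20: two passes per EM iteration
print([round(m, 2) for m in means])  # approximately [-2.0, 2.0]
```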

Methods

fit

Fits the Gaussian mixture on the provided data, estimating component priors, means and covariances.

predict

Computes the most likely components for each of the provided datapoints.

predict_proba

Computes a distribution over the components for each of the provided datapoints.

sample

Samples datapoints from the fitted Gaussian mixture.

score

Computes the average negative log-likelihood (NLL) of the provided datapoints.

score_samples

Computes the negative log-likelihood (NLL) of each of the provided datapoints.
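For intuition, the quantities these methods return can be written out for a 1-D mixture whose parameters are already known. The pure-Python sketch below uses hypothetical functions that mirror the method names. It does not call PyCave, whose methods operate on tensors and also handle multivariate data and batching.

```python
import math
import random


def log_normal(x, mean, var):
    """Log-density of a 1-D Gaussian N(x | mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_joint(x, priors, means, variances):
    """log(pi_k) + log N(x | mu_k, var_k) for every component k."""
    return [math.log(p) + log_normal(x, m, v) for p, m, v in zip(priors, means, variances)]


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def predict_proba(data, *params):
    """Posterior distribution over the components for each datapoint."""
    out = []
    for x in data:
        lj = log_joint(x, *params)
        norm = logsumexp(lj)
        out.append([math.exp(v - norm) for v in lj])
    return out


def predict(data, *params):
    """Index of the most likely component for each datapoint."""
    return [max(range(len(p)), key=p.__getitem__) for p in predict_proba(data, *params)]


def score_samples(data, *params):
    """Per-datapoint negative log-likelihood, -log p(x)."""
    return [-logsumexp(log_joint(x, *params)) for x in data]


def score(data, *params):
    """Average negative log-likelihood over the datapoints."""
    nll = score_samples(data, *params)
    return sum(nll) / len(nll)


def sample(n, priors, means, variances, rng=random):
    """Draw a component from the priors, then a point from that component."""
    ks = rng.choices(range(len(priors)), weights=priors, k=n)
    return [rng.gauss(means[k], math.sqrt(variances[k])) for k in ks]


params = ([0.3, 0.7], [-2.0, 2.0], [1.0, 1.0])  # priors, means, variances
data = [-2.5, 0.0, 1.8]
print(predict(data, *params))  # [0, 1, 1]
print([round(p, 3) for p in predict_proba([0.0], *params)[0]])  # [0.3, 0.7]
print(score(data, *params))
print(sample(3, *params, rng=random.Random(0)))
```

At x = 0 both components are equally far from the point, so the posterior falls back to the priors. This is why predict_proba returns [0.3, 0.7] there.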

Inherited Methods

clone

Clones the estimator without copying any fitted attributes.

fit_predict

Fits the estimator using the provided data and subsequently predicts the labels for the data using the fitted estimator.

get_params

Returns the estimator's parameters as passed to the initializer.

load

Loads the estimator and (if available) the fitted model.

load_attributes

Loads the fitted attributes that are stored at the fitted path.

load_parameters

Initializes this estimator by loading its parameters.

save

Saves the estimator to the provided directory.

save_attributes

Saves the fitted attributes of this estimator.

save_parameters

Saves the parameters of this estimator.

set_params

Sets the provided values on the estimator.

trainer

Returns the trainer as configured by the estimator.

Attributes

persistent_attributes

Returns the list of fitted attributes that ought to be saved and loaded.

model_

The fitted PyTorch module with all estimated parameters.

converged_

A boolean indicating whether the model converged during training.

num_iter_

The number of iterations the model was fitted for, excluding initialization.

nll_

The average per-datapoint negative log-likelihood at the last training step.
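A full-batch EM loop can relate converged_, num_iter_ and nll_ to convergence_tolerance and the epoch limit. The pure-Python sketch below is an illustration under stated assumptions, not PyCave's implementation. It fits a 1-D mixture with a hypothetical fit_gmm function and takes initial means in the role of init_means. It uses max_iter in place of max_epochs and treats training as converged once the absolute change in average NLL between consecutive iterations drops below tol.

```python
import math


def log_normal(x, mean, var):
    """Log-density of a 1-D Gaussian N(x | mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gmm(data, init_means, max_iter=100, tol=1e-3, reg=1e-6):
    """Full-batch EM for a 1-D GMM (hypothetical helper).

    Returns the parameters plus analogues of converged_, num_iter_ and nll_.
    """
    k, n = len(init_means), len(data)
    priors, means, variances = [1.0 / k] * k, list(init_means), [1.0] * k
    prev_nll, nll, converged, num_iter = math.inf, math.inf, False, 0
    for num_iter in range(1, max_iter + 1):
        # E-step: responsibilities and the average NLL under the current parameters.
        resp, total_ll = [], 0.0
        for x in data:
            logs = [math.log(p) + log_normal(x, m, v) for p, m, v in zip(priors, means, variances)]
            top = max(logs)
            log_px = top + math.log(sum(math.exp(lp - top) for lp in logs))
            resp.append([math.exp(lp - log_px) for lp in logs])
            total_ll += log_px
        nll = -total_ll / n
        # M-step: re-estimate priors, means and regularized variances.
        mass = [sum(r[j] for r in resp) for j in range(k)]
        priors = [m / n for m in mass]
        means = [sum(r[j] * x for r, x in zip(resp, data)) / mass[j] for j in range(k)]
        variances = [
            sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / mass[j] + reg
            for j in range(k)
        ]
        # Converged once the per-datapoint NLL changes by less than the tolerance.
        if abs(prev_nll - nll) < tol:
            converged = True
            break
        prev_nll = nll
    return (priors, means, variances), converged, num_iter, nll


data = [-2.1, -1.9, -2.0, 2.0, 1.9, 2.1]
(priors, means, variances), converged, num_iter, nll = fit_gmm(data, init_means=[-1.0, 1.0])
print(converged, num_iter, nll)
print([round(m, 2) for m in means])  # approximately [-2.0, 2.0]
```

If the loop exhausts max_iter without meeting the tolerance, converged stays False while num_iter and nll still describe the last step. This mirrors how the attributes above are defined.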