GaussianMixture
class pycave.bayes.GaussianMixture(num_components=1, *, covariance_type='diag', init_strategy='kmeans', init_means=None, convergence_tolerance=0.001, covariance_regularization=1e-06, batch_size=None, trainer_params=None)
Bases: lightkit.estimator.configurable.ConfigurableBaseEstimator[pycave.bayes.gmm.model.GaussianMixtureModel], lightkit.estimator.mixins.PredictorMixin[Union[numpy.ndarray, torch.Tensor], torch.Tensor]
Probabilistic model assuming that data is generated from a mixture of Gaussians. The mixture is assumed to be composed of a fixed number of components with individual means and covariances. More information on Gaussian mixture models (GMMs) is available on Wikipedia.
See also
PyTorch module for a Gaussian mixture model.
Configuration class for a Gaussian mixture model.
Parameters
num_components (int): The number of components in the GMM. The dimensionality of each component is automatically inferred from the data.
covariance_type (Literal['full', 'tied', 'diag', 'spherical']): The type of covariance to assume for all Gaussian components.
init_strategy (Literal['random', 'kmeans', 'kmeans++']): The strategy for initializing component means and covariances.
init_means (Optional[Tensor]): An optional initial guess for the means of the components. If provided, must be a tensor of shape [num_components, num_features]. If this is given, init_strategy is ignored and the means are handled as if K-means initialization has been run.
convergence_tolerance (float): The change in the per-datapoint negative log-likelihood which implies that training has converged.
covariance_regularization (float): A small value which is added to the diagonal of the covariance matrix to ensure that it is positive semidefinite.
batch_size (Optional[int]): The batch size to use when fitting the model. If not provided, the full data will be used as a single batch. Set this if the full data does not fit into memory.
num_workers: The number of workers to use for loading the data. Only used if a PyTorch dataset is passed to fit() or related methods.
trainer_params (Optional[Dict[str, Any]]): Initialization parameters to use when initializing a PyTorch Lightning trainer. By default, it disables various stdout logs unless PyCave is configured to do verbose logging. Checkpointing and logging are disabled regardless of the log level. This estimator further sets the following overridable defaults:
max_epochs=100
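To make the effect of covariance_regularization concrete, the following is a minimal standard-library sketch of adding a small value to the diagonal of a covariance matrix. The helper regularize_covariance is hypothetical and written only for illustration; it is not PyCave's implementation, which operates on tensors.

```python
def regularize_covariance(cov, covariance_regularization=1e-6):
    """Return a copy of a square covariance matrix (list of rows)
    with a small value added to every diagonal entry.

    Hypothetical helper for illustration; not part of PyCave.
    """
    return [
        [value + (covariance_regularization if i == j else 0.0)
         for j, value in enumerate(row)]
        for i, row in enumerate(cov)
    ]


# A rank-deficient 2x2 covariance: both features are perfectly correlated.
singular = [[1.0, 1.0], [1.0, 1.0]]
regularized = regularize_covariance(singular)


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# The determinant is zero before regularization and strictly positive after,
# so the regularized matrix can be inverted.
det_before = det2(singular)
det_after = det2(regularized)
```

Only the diagonal changes; the off-diagonal entries and the input matrix itself are left untouched.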
Note
The number of epochs passed to the initializer only defines the number of optimization epochs. Prior to that, initialization is run, which may perform additional iterations through the data.
Note
For batch training, the number of epochs run (i.e. the number of passes through the data) does not align with the number of epochs passed to the initializer. This is because the EM algorithm needs to be split up across two epochs. The actual number of minimum/maximum epochs is, thus, doubled. Nonetheless, num_iter_ indicates how many EM iterations have been run.
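As a worked example of the note above, the arithmetic can be sketched as follows. The function optimization_passes is a hypothetical helper, not part of PyCave, and it counts only optimization passes; any passes performed during initialization are excluded.

```python
def optimization_passes(max_epochs, batch_training):
    """Upper bound on the optimization passes through the data.

    Hypothetical helper for illustration. With batch training, each EM
    iteration is split across two epochs, so the configured limit doubles.
    """
    return 2 * max_epochs if batch_training else max_epochs


# With the default max_epochs=100:
full_data_limit = optimization_passes(100, batch_training=False)
batch_limit = optimization_passes(100, batch_training=True)
```

In both cases the number of EM iterations is bounded by the configured max_epochs, which is the quantity that num_iter_ reports.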
Methods
Fits the Gaussian mixture on the provided data, estimating component priors, means and covariances. 

Computes the most likely components for each of the provided datapoints. 

Computes a distribution over the components for each of the provided datapoints. 

Samples datapoints from the fitted Gaussian mixture. 

Computes the average negative log-likelihood (NLL) of the provided datapoints. 

Computes the negative log-likelihood (NLL) of each of the provided datapoints. 
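The quantities described above (a distribution over components, the most likely component, and the per-datapoint and average NLL) can be illustrated with a small one-dimensional mixture. This is a standard-library sketch under assumed mixture parameters; all function names and values are hypothetical, and it does not call or reproduce PyCave's implementation.

```python
import math


def log_gaussian(x, mean, variance):
    """Log-density of a 1-D Gaussian at x."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mean) ** 2 / variance)


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def posteriors_and_nll(x, priors, means, variances):
    """Return (distribution over components, negative log-likelihood) for x."""
    log_joint = [
        math.log(p) + log_gaussian(x, m, v)
        for p, m, v in zip(priors, means, variances)
    ]
    log_marginal = logsumexp(log_joint)
    return [math.exp(lj - log_marginal) for lj in log_joint], -log_marginal


# Assumed (not fitted) parameters of a two-component mixture.
priors, means, variances = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
data = [-2.1, 1.9, 0.0]

results = [posteriors_and_nll(x, priors, means, variances) for x in data]
posteriors = [r[0] for r in results]          # distribution over components
nll_per_point = [r[1] for r in results]       # NLL of each datapoint
most_likely = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
average_nll = sum(nll_per_point) / len(nll_per_point)
```

Each entry of posteriors sums to one, and the most likely component is simply the index of its largest value.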
Inherited Methods
Clones the estimator without copying any fitted attributes. 

Fits the estimator using the provided data and subsequently predicts the labels for the data using the fitted estimator. 

Returns the estimator's parameters as passed to the initializer. 

Loads the estimator and (if available) the fitted model. 

Loads the fitted attributes that are stored at the fitted path. 

Initializes this estimator by loading its parameters. 

Saves the estimator to the provided directory. 

Saves the fitted attributes of this estimator. 

Saves the parameters of this estimator. 

Sets the provided values on the estimator. 

Returns the trainer as configured by the estimator. 
Attributes

Returns the list of fitted attributes that ought to be saved and loaded. 

The fitted PyTorch module with all estimated parameters. 

A boolean indicating whether the model converged during training. 

The number of iterations the model was fitted for, excluding initialization. 

The average per-datapoint negative log-likelihood at the last training step. 