KMeans

class pycave.clustering.KMeans(num_clusters=1, *, init_strategy='kmeans++', convergence_tolerance=0.0001, batch_size=None, trainer_params=None)

Bases: ConfigurableBaseEstimator[KMeansModel], TransformerMixin[Union[ndarray, Tensor], Tensor], PredictorMixin[Union[ndarray, Tensor], Tensor]

Model for clustering data into a predefined number of clusters. More information on K-means clustering is available on Wikipedia.

See also

`KMeansModel`	PyTorch module for the K-Means model.
`KMeansModelConfig`	Configuration class for a K-Means model.

Parameters:

num_clusters (int) -- The number of clusters.
init_strategy (KMeansInitStrategy) -- The strategy for initializing centroids.
convergence_tolerance (float) -- Training is conducted until the Frobenius norm of the change in cluster centroids between iterations falls below this threshold. Before the comparison, the tolerance is multiplied by the average variance of the features.
batch_size (int | None) -- The batch size to use when fitting the model. If not provided, the full data will be used as a single batch. Set this if the full data does not fit into memory.
trainer_params (dict[str, Any] | None) -- Initialization parameters to use when initializing a PyTorch Lightning trainer. By default, various stdout logs are disabled unless PyCave is configured to do verbose logging. Checkpointing and logging are disabled regardless of the log level. This estimator further sets the following overridable defaults:
- max_epochs=300

Note

The number of epochs passed to the initializer defines only the number of optimization epochs. Before optimization starts, initialization is run, which may perform additional iterations through the data.
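
The stopping rule described for convergence_tolerance can be illustrated with a short, library-independent sketch. The helper names below (frobenius_norm, mean_feature_variance, has_converged) are hypothetical and not part of PyCave, and the use of the population variance is an assumption; the actual implementation may differ in such details.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a 2D list."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def mean_feature_variance(data):
    """Average of the per-feature variances (population variance is assumed)."""
    n, d = len(data), len(data[0])
    variances = []
    for j in range(d):
        column = [row[j] for row in data]
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / n)
    return sum(variances) / d


def has_converged(old_centroids, new_centroids, data, convergence_tolerance=1e-4):
    """Compare the centroid shift against the variance-scaled tolerance."""
    shift = [
        [new - old for old, new in zip(old_row, new_row)]
        for old_row, new_row in zip(old_centroids, new_centroids)
    ]
    threshold = convergence_tolerance * mean_feature_variance(data)
    return frobenius_norm(shift) < threshold
```

Scaling the tolerance by the feature variance makes the same convergence_tolerance value behave comparably on datasets measured in different units.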

Methods

`fit`	Fits the KMeans model on the provided data by running Lloyd's algorithm.
`predict`	Predicts the closest cluster for each item provided.
`score`	Computes the average inertia of all the provided datapoints.
`score_samples`	Computes the inertia for each of the provided datapoints.
`transform`	Transforms the provided data into the cluster-distance space.
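
To make the relationship between these methods concrete, the following is a minimal pure-Python sketch of what each one computes conceptually. It does not call PyCave; the free functions merely mirror the method names. It assumes that the cluster-distance space holds Euclidean distances and that the inertia of a datapoint is its squared distance to the closest centroid.

```python
import math


def transform(data, centroids):
    """Distance from every datapoint to every centroid (one row per datapoint)."""
    return [[math.dist(x, c) for c in centroids] for x in data]


def predict(data, centroids):
    """Index of the closest centroid for each datapoint."""
    return [
        min(range(len(row)), key=row.__getitem__)
        for row in transform(data, centroids)
    ]


def score_samples(data, centroids):
    """Per-datapoint inertia: squared distance to the closest centroid (assumed)."""
    return [min(row) ** 2 for row in transform(data, centroids)]


def score(data, centroids):
    """Average inertia over all provided datapoints."""
    inertias = score_samples(data, centroids)
    return sum(inertias) / len(inertias)
```

With centroids [[0, 0], [10, 0]] and datapoints [[3, 4], [10, 1]], this sketch assigns the first point to cluster 0 and the second to cluster 1, with per-point inertias of 25 and 1.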

Inherited Methods

`clone`	Clones the estimator without copying any fitted attributes.
`fit_predict`	Fits the estimator using the provided data and subsequently predicts the labels for the data using the fitted estimator.
`fit_transform`	Fits the estimator using the provided data and subsequently transforms the data using the fitted estimator.
`get_params`	Returns the estimator's parameters as passed to the initializer.
`load`	Loads the estimator and (if available) the fitted model.
`load_attributes`	Loads the fitted attributes that are stored at the fitted path.
`load_parameters`	Initializes this estimator by loading its parameters.
`save`	Saves the estimator to the provided directory.
`save_attributes`	Saves the fitted attributes of this estimator.
`save_parameters`	Saves the parameters of this estimator.
`set_params`	Sets the provided values on the estimator.
`trainer`	Returns the trainer as configured by the estimator.

Attributes

`persistent_attributes`	Returns the list of fitted attributes that ought to be saved and loaded.
`model_`	The fitted PyTorch module with all estimated parameters.
`converged_`	A boolean indicating whether the model converged during training.
`num_iter_`	The number of iterations the model was fitted for, excluding initialization.
`inertia_`	The mean squared distance of all datapoints to their closest cluster centers.
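
The fitted attributes can be related to Lloyd's algorithm with a small sketch. This is a simplified illustration, not PyCave's implementation: the function lloyd is hypothetical, the initial centroids are passed in explicitly instead of being chosen by an initialization strategy, the tolerance is not scaled by the feature variance, and the whole dataset is processed as a single batch.

```python
import math


def lloyd(data, centroids, max_epochs=300, tolerance=1e-4):
    """Run a plain Lloyd loop and return (centroids, converged, num_iter, inertia)."""
    centroids = [list(c) for c in centroids]
    converged, num_iter = False, 0
    for _ in range(max_epochs):
        num_iter += 1
        # Assignment step: label each datapoint with its closest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))
            for x in data
        ]
        # Update step: move each centroid to the mean of its assigned datapoints.
        updated = []
        for k, old in enumerate(centroids):
            members = [x for x, label in zip(data, labels) if label == k]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(old)  # keep the centroid of an empty cluster in place
        # Frobenius norm of the centroid change in this iteration.
        shift = math.sqrt(
            sum((n - o) ** 2 for new, old in zip(updated, centroids) for n, o in zip(new, old))
        )
        centroids = updated
        if shift < tolerance:
            converged = True
            break
    # Mean squared distance of all datapoints to their closest centroid.
    inertia = sum(min(math.dist(x, c) for c in centroids) ** 2 for x in data) / len(data)
    return centroids, converged, num_iter, inertia
```

In this sketch, the returned values play the roles of the centroids held by `model_`, and of `converged_`, `num_iter_`, and `inertia_`. If the loop exhausts max_epochs without the shift dropping below the tolerance, converged stays False.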