This class implements a k-means clustering algorithm.

#include <ClusteringKMeans.h>

Data Structures
class	Cluster
	This class implements one cluster that holds the mean values and the indices of all observations belonging to this cluster.

Public Types
enum	InitializationStrategy { IS_LARGEST_DISTANCE , IS_RANDOM }
	Definition of individual initialization strategies.

typedef Clustering< tUseIndices >::template Data< T, tDimension >	Data
	(Re-)Definition of a data object providing the data which will be clustered.

typedef Data::DataIndex	DataIndex
	(Re-)Definition of an index that addresses one specific observation element in the data object that stores all observations.

typedef Data::DataIndices	DataIndices
	(Re-)Definition of a vector holding (size_t) indices.

typedef Data::Observation	Observation
	(Re-)Definition of an observation object.

typedef std::vector< Cluster >	Clusters
	Definition of a vector holding cluster objects.

Public Member Functions
	ClusteringKMeans ()
	Creates an empty k-means object.

	ClusteringKMeans (ClusteringKMeans &&clustering) noexcept
	Move constructor.

	ClusteringKMeans (const Data &data)
	Creates a new k-means object from a given data object.

	ClusteringKMeans (Data &&data)
	Creates a new k-means object from a given data object, which is moved.

const Clusters &	clusters () const
	Returns the clusters of this k-means clustering object.

void	sortClusters ()
	Sorts the clusters by their number of elements.

TSquareDistance	maximalSqrDistance () const
	Calculates the maximal square distance between the mean observation value of each cluster and all observations belonging to that cluster.

void	determineClustersByNumber (const size_t numberClusters, const InitializationStrategy strategy=IS_LARGEST_DISTANCE, const size_t iterations=5, Worker *worker=nullptr)
	Determines the clusters for this object; ensure that this object has been initialized with a valid set of observations.

void	determineClustersByDistance (const TSquareDistance maximalSqrDistance, size_t maximalClusters=0, const size_t iterations=5, Worker *worker=nullptr)
	Determines the clusters for this object; ensure that this object has been initialized with a valid set of observations.

bool	addCluster (const size_t iterations=5, TSquareDistance sqrDistance=TSquareDistance(0), Worker *worker=nullptr)
	Adds a new cluster to this object.

void	removeCluster (const size_t iterations=5, Worker *worker=nullptr)
	Removes one cluster from this object.

size_t	findCluster (const Observation &observation)
	Finds a best matching cluster for a given independent observation.

void	applyOptimizationIteration ()
	Explicitly applies one further optimization iteration for an existing set of clusters.

void	applyOptimizationIteration (Worker *worker)
	Explicitly applies one further optimization iteration for an existing set of clusters.

void	clear ()
	Clears all determined clusters; the registered data information is left untouched.

bool	isValid () const
	Returns whether this object holds a valid set of observations.

	operator bool () const
	Returns whether this object holds a valid set of observations.

ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices > &	operator= (ClusteringKMeans &&clustering)
	Move operator.

Protected Member Functions
void	determineInitialClustersLargestDistance (const size_t numberClusters)
	Determines the initial clusters for this object with the IS_LARGEST_DISTANCE strategy.

void	determineInitialClustersRandom (const size_t numberClusters)
	Determines the initial clusters for this object with the IS_RANDOM strategy.

void	applyOptimizationIterationSubset (Lock *lock, const unsigned int firstObservation, const unsigned int numberObservations)
	Explicitly applies one further optimization iteration for an existing set of clusters.

Static Protected Member Functions
static DataIndex	smallestObservation (const Data &data)
	Determines the smallest observation (Euclidean distance to the origin) from a set of observations.

static TSquareDistance	sqrDistance (const Observation &observation)
	Returns the square distance between an observation and the origin.

Protected Attributes
Data	data_
	The data that stores the observations of this clustering object, either with index-access or pointer-access.

Clusters	clusters_
	The current clusters of this object.

Detailed Description

template<typename T, size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>
class Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >

This class implements a k-means clustering algorithm.

Beware: For performance reasons, this class does not copy the given observation values; the observation values must therefore remain valid for as long as the k-means object exists.

Template Parameters

T	The data type of each element of an observation
tDimension	The dimension of each observation (the number of elements in each observation), with range [1, infinity)
TSum	The data type of the intermediate sum values, that is necessary to determine e.g. the mean parameters
TSquareDistance	The data type of the square distance value, might be different from T
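The separate TSum and TSquareDistance types matter most for small integer element types, where sums and squared differences quickly exceed the range of T. The following is a minimal standalone sketch of that idea, not the Ocean implementation; `meanObservation` and `sqrDistanceBetween` are hypothetical helpers introduced only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Ocean): mean of all observations, accumulated in TSum.
template <typename T, std::size_t tDimension, typename TSum = T>
std::array<T, tDimension> meanObservation(const std::vector<std::array<T, tDimension>>& observations)
{
    assert(!observations.empty());

    std::array<TSum, tDimension> sum{}; // value-initialized to zero

    for (const std::array<T, tDimension>& observation : observations)
    {
        for (std::size_t n = 0; n < tDimension; ++n)
        {
            sum[n] += TSum(observation[n]);
        }
    }

    std::array<T, tDimension> mean{};

    for (std::size_t n = 0; n < tDimension; ++n)
    {
        mean[n] = T(sum[n] / TSum(observations.size()));
    }

    return mean;
}

// Hypothetical helper (not part of Ocean): square distance accumulated in TSquareDistance.
template <typename T, std::size_t tDimension, typename TSquareDistance = T>
TSquareDistance sqrDistanceBetween(const std::array<T, tDimension>& a, const std::array<T, tDimension>& b)
{
    TSquareDistance result = TSquareDistance(0);

    for (std::size_t n = 0; n < tDimension; ++n)
    {
        const TSquareDistance difference = TSquareDistance(a[n]) - TSquareDistance(b[n]);
        result += difference * difference;
    }

    return result;
}
```

With T = uint8_t, a sum such as 200 + 250 does not fit into T, so a wider TSum (here unsigned int) is needed; likewise the squared difference of 0 and 255 needs a signed, wider TSquareDistance.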

Member Typedef Documentation

◆ Clusters

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>

typedef std::vector<Cluster> Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::Clusters

Definition of a vector holding cluster objects.

◆ Data

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>

typedef Clustering<tUseIndices>::template Data<T, tDimension> Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::Data

(Re-)Definition of a data object providing the data which will be clustered.

◆ DataIndex

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>

typedef Data::DataIndex Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::DataIndex

(Re-)Definition of an index that addresses one specific observation element in the data object that stores all observations.

◆ DataIndices

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>

typedef Data::DataIndices Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::DataIndices

(Re-)Definition of a vector holding (size_t) indices.

◆ Observation

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>

typedef Data::Observation Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::Observation

(Re-)Definition of an observation object.

Member Enumeration Documentation

◆ InitializationStrategy

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>

enum Ocean::ClusteringKMeans::InitializationStrategy

Definition of individual initialization strategies.

Enumerator
IS_LARGEST_DISTANCE	The first cluster is determined by selecting the smallest observation (in terms of Euclidean distance to the origin); the remaining clusters are defined by the observations with the largest distance to the already existing clusters.
IS_RANDOM	All clusters are selected randomly.

Constructor & Destructor Documentation

◆ ClusteringKMeans() [1/4]

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::ClusteringKMeans ( )

inline

Creates an empty k-means object.

◆ ClusteringKMeans() [2/4]

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::ClusteringKMeans ( ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices > && clustering )

inline, noexcept

Move constructor.

Parameters

clustering The clustering object to be moved

◆ ClusteringKMeans() [3/4]

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::ClusteringKMeans ( const Data & data )

inline, explicit

Creates a new k-means object from a given data object.

Parameters

data	The data object to be used to determine the clusters.

See also: determineClustersByNumber(), determineClustersByDistance().

◆ ClusteringKMeans() [4/4]

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::ClusteringKMeans ( Data && data )

inline, explicit

Creates a new k-means object from a given data object, which is moved.

Parameters

data	The data object that will be moved and used to determine the clusters.

See also: determineClustersByNumber(), determineClustersByDistance().

Member Function Documentation

◆ addCluster()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

bool Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::addCluster	(	const size_t	iterations = `5`,
		TSquareDistance	sqrDistance = `TSquareDistance(0)`,
		Worker *	worker = `nullptr`
	)

Adds a new cluster to this object.

Parameters

iterations	The number of optimization iterations that are applied after the new cluster has been added, with range [1, infinity)
sqrDistance	The minimal square distance between a cluster's mean and one of its observations that is required for the cluster to be divided into two clusters
worker	Optional worker object to distribute the computation

Returns: True, if a new cluster has been added; False, if no further cluster could be added or if the provided distance was too large

◆ applyOptimizationIteration() [1/2]

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::applyOptimizationIteration ( )

Explicitly applies one further optimization iteration for an existing set of clusters.

Do not call this function before initial clusters have been found.

See also: clusters(), determineClustersByNumber(), determineClustersByDistance().
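For orientation, one optimization iteration of textbook k-means (Lloyd's algorithm) reassigns every observation to its nearest cluster mean and then recomputes the means. The sketch below shows that standard step under the assumption that this is what an iteration does here; the source does not spell out the details, and `SketchCluster`, `Observation2`, and `sketchOptimizationIteration` are hypothetical names, not Ocean types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Observation2 = std::array<double, 2>; // hypothetical 2D observation type

// Hypothetical stand-in for a cluster: a mean plus the indices of its observations.
struct SketchCluster
{
    Observation2 mean{};
    std::vector<std::size_t> indices;
};

double sqrDistance2(const Observation2& a, const Observation2& b)
{
    const double dx = a[0] - b[0];
    const double dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// One standard k-means iteration: reassign all observations, then recompute the means.
void sketchOptimizationIteration(const std::vector<Observation2>& observations, std::vector<SketchCluster>& clusters)
{
    assert(!clusters.empty()); // initial clusters must exist before iterating

    for (SketchCluster& cluster : clusters)
    {
        cluster.indices.clear();
    }

    // Assignment step: each observation goes to the cluster with the nearest mean.
    for (std::size_t i = 0; i < observations.size(); ++i)
    {
        std::size_t best = 0;
        double bestDistance = std::numeric_limits<double>::max();

        for (std::size_t c = 0; c < clusters.size(); ++c)
        {
            const double distance = sqrDistance2(observations[i], clusters[c].mean);

            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = c;
            }
        }

        clusters[best].indices.push_back(i);
    }

    // Update step: an empty cluster keeps its previous mean in this sketch.
    for (SketchCluster& cluster : clusters)
    {
        if (cluster.indices.empty())
        {
            continue;
        }

        Observation2 sum{};

        for (const std::size_t index : cluster.indices)
        {
            sum[0] += observations[index][0];
            sum[1] += observations[index][1];
        }

        const double count = double(cluster.indices.size());
        cluster.mean = {sum[0] / count, sum[1] / count};
    }
}
```

The precondition asserted at the top mirrors the note above: the step is only meaningful once initial clusters exist.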

◆ applyOptimizationIteration() [2/2]

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::applyOptimizationIteration ( Worker * worker )

Explicitly applies one further optimization iteration for an existing set of clusters.

Do not call this function before initial clusters have been found.

Parameters

worker The worker object to distribute the computation

See also: clusters(), determineClustersByNumber(), determineClustersByDistance().

◆ applyOptimizationIterationSubset()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::applyOptimizationIterationSubset	(	Lock *	lock,
		const unsigned int	firstObservation,
		const unsigned int	numberObservations
	)

protected

Explicitly applies one further optimization iteration for an existing set of clusters.

This function operates on a subset of all observations.

Parameters

lock	Optional lock object if this function is executed on multiple threads in parallel
firstObservation	The first observation that will be handled
numberObservations	The number of observations that will be handled

See also: clusters(), determineClustersByNumber(), determineClustersByDistance().
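A common way to run such a subset function on several threads is to accumulate partial results locally and to take the lock only while merging them into shared state. The sketch below illustrates that pattern for 1D observations. It is an assumption about the general approach, not the Ocean implementation: `std::mutex` stands in for Ocean's Lock, and `SketchAccumulator` and `sketchIterationSubset` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical shared accumulator for one cluster (1D observations for brevity).
struct SketchAccumulator
{
    double sum = 0.0;
    std::size_t count = 0;
};

// Assigns the observations [firstObservation, firstObservation + numberObservations) to the
// nearest mean and merges the partial sums into 'shared'; the optional lock guards only the merge.
void sketchIterationSubset(const std::vector<double>& observations, const std::vector<double>& means,
                           std::vector<SketchAccumulator>& shared, std::mutex* lock,
                           const unsigned int firstObservation, const unsigned int numberObservations)
{
    assert(!means.empty() && shared.size() == means.size());

    const std::size_t begin = firstObservation;
    const std::size_t end = begin + numberObservations;
    assert(end <= observations.size());

    std::vector<SketchAccumulator> local(means.size());

    for (std::size_t i = begin; i < end; ++i)
    {
        std::size_t best = 0;
        double bestDistance = (observations[i] - means[0]) * (observations[i] - means[0]);

        for (std::size_t c = 1; c < means.size(); ++c)
        {
            const double distance = (observations[i] - means[c]) * (observations[i] - means[c]);

            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = c;
            }
        }

        local[best].sum += observations[i];
        ++local[best].count;
    }

    // Take the lock only if one was provided (single-threaded callers pass nullptr).
    std::unique_lock<std::mutex> guard;
    if (lock != nullptr)
    {
        guard = std::unique_lock<std::mutex>(*lock);
    }

    for (std::size_t c = 0; c < means.size(); ++c)
    {
        shared[c].sum += local[c].sum;
        shared[c].count += local[c].count;
    }
}
```

After all subsets have been processed, each new mean would be `sum / count` of the corresponding accumulator.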

◆ clear()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::clear ( )

Clears all determined clusters; the registered data information is left untouched.

◆ clusters()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

const ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::Clusters & Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::clusters ( ) const

inline

Returns the clusters of this k-means clustering object.

Returns: The determined k-means clusters

◆ determineClustersByDistance()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineClustersByDistance	(	const TSquareDistance	maximalSqrDistance,
		size_t	maximalClusters = `0`,
		const size_t	iterations = `5`,
		Worker *	worker = `nullptr`
	)

Determines the clusters for this object; ensure that this object has been initialized with a valid set of observations.

This function adds new clusters over several iterations until the defined maximalSqrDistance is larger than the largest square distance within every cluster, or until the defined maximal number of clusters is reached.

Parameters

maximalSqrDistance	The maximal square distance in the final clusters between the clusters' mean observation values and the observations in the clusters
maximalClusters	The maximal number of clusters that will be created (even if maximalSqrDistance is not reached), with range [0, infinity), define 0 to ignore this parameter
iterations	The number of optimization iterations that are applied each time a new cluster is added, with range [1, infinity)
worker	Optional worker object to distribute the computation
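The control flow described above can be sketched as a loop that keeps adding clusters until the distance criterion is met or the cluster limit is reached. The following 1D sketch is a simplified illustration, not the Ocean implementation: seeding the new cluster at the worst-fitting observation and starting from a single cluster are assumptions, and all names (`SketchClusters`, `sketchIterate`, `sketchClustersByDistance`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D stand-in for the clustering state, not an Ocean type.
struct SketchClusters
{
    std::vector<double> means;           // one mean per cluster
    std::vector<std::size_t> assignment; // cluster index per observation
};

static double sqr(const double value)
{
    return value * value;
}

// One optimization pass: assign each observation to its nearest mean, then update the means.
static void sketchIterate(const std::vector<double>& observations, SketchClusters& clusters)
{
    std::vector<double> sums(clusters.means.size(), 0.0);
    std::vector<std::size_t> counts(clusters.means.size(), 0);

    for (std::size_t i = 0; i < observations.size(); ++i)
    {
        std::size_t best = 0;
        for (std::size_t c = 1; c < clusters.means.size(); ++c)
        {
            if (sqr(observations[i] - clusters.means[c]) < sqr(observations[i] - clusters.means[best]))
            {
                best = c;
            }
        }

        clusters.assignment[i] = best;
        sums[best] += observations[i];
        ++counts[best];
    }

    for (std::size_t c = 0; c < clusters.means.size(); ++c)
    {
        if (counts[c] != 0)
        {
            clusters.means[c] = sums[c] / double(counts[c]);
        }
    }
}

// Index of the observation with the largest square distance to its cluster mean.
static std::size_t sketchWorstObservation(const std::vector<double>& observations, const SketchClusters& clusters)
{
    std::size_t worst = 0;
    for (std::size_t i = 1; i < observations.size(); ++i)
    {
        if (sqr(observations[i] - clusters.means[clusters.assignment[i]]) > sqr(observations[worst] - clusters.means[clusters.assignment[worst]]))
        {
            worst = i;
        }
    }
    return worst;
}

SketchClusters sketchClustersByDistance(const std::vector<double>& observations, const double maximalSqrDistance,
                                        const std::size_t maximalClusters = 0, const std::size_t iterations = 5)
{
    assert(!observations.empty() && iterations >= 1 && maximalSqrDistance >= 0.0);

    SketchClusters clusters;
    clusters.means.push_back(observations[0]);
    clusters.assignment.assign(observations.size(), 0);

    for (std::size_t n = 0; n < iterations; ++n)
    {
        sketchIterate(observations, clusters);
    }

    // maximalClusters == 0 means "no limit"; never create more clusters than observations.
    while ((maximalClusters == 0 || clusters.means.size() < maximalClusters) && clusters.means.size() < observations.size())
    {
        const std::size_t worst = sketchWorstObservation(observations, clusters);

        if (sqr(observations[worst] - clusters.means[clusters.assignment[worst]]) <= maximalSqrDistance)
        {
            break; // every observation is close enough to its cluster mean
        }

        clusters.means.push_back(observations[worst]); // seed a new cluster at the worst-fitting observation

        for (std::size_t n = 0; n < iterations; ++n)
        {
            sketchIterate(observations, clusters);
        }
    }

    return clusters;
}
```

For three well-separated groups of values and a small maximalSqrDistance, the loop stops at three clusters; passing maximalClusters = 2 stops it earlier even though the distance criterion is not yet met.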

◆ determineClustersByNumber()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineClustersByNumber	(	const size_t	numberClusters,
		const InitializationStrategy	strategy = `IS_LARGEST_DISTANCE`,
		const size_t	iterations = `5`,
		Worker *	worker = `nullptr`
	)

Determines the clusters for this object; ensure that this object has been initialized with a valid set of observations.

Parameters

numberClusters	The number of clusters that will be created, with range [1, numberObservations())
strategy	The initialization strategy for the first clusters
iterations	The number of optimization iterations that are applied after the initial clusters have been determined, with range [1, infinity)
worker	Optional worker object to distribute the computation

See also: clusters().

◆ determineInitialClustersLargestDistance()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineInitialClustersLargestDistance ( const size_t numberClusters )

protected

Determines the initial clusters for this object with the IS_LARGEST_DISTANCE strategy.

First the smallest observation object is selected as first cluster,
all following clusters are determined by observations that have the largest distance to the already existing clusters.

Parameters

numberClusters The number of initial clusters that will be created.
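The selection described above resembles farthest-point seeding. The sketch below is one possible reading, not the Ocean implementation: it assumes that "largest distance to the already existing clusters" means the largest distance to the nearest already selected observation, and `Point`, `sqrDist`, and `sketchInitialLargestDistance` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::array<double, 2>; // hypothetical 2D observation type

double sqrDist(const Point& a, const Point& b)
{
    const double dx = a[0] - b[0];
    const double dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// Returns the indices of the observations selected as initial cluster centers.
std::vector<std::size_t> sketchInitialLargestDistance(const std::vector<Point>& observations, const std::size_t numberClusters)
{
    assert(numberClusters >= 1 && numberClusters <= observations.size());

    const Point origin{}; // (0, 0)

    // First center: the observation closest to the origin.
    std::size_t first = 0;
    for (std::size_t i = 1; i < observations.size(); ++i)
    {
        if (sqrDist(observations[i], origin) < sqrDist(observations[first], origin))
        {
            first = i;
        }
    }

    std::vector<std::size_t> centers(1, first);

    // Square distance of every observation to its nearest selected center.
    std::vector<double> nearest(observations.size());
    for (std::size_t i = 0; i < observations.size(); ++i)
    {
        nearest[i] = sqrDist(observations[i], observations[first]);
    }

    while (centers.size() < numberClusters)
    {
        // Next center: the observation farthest from all centers selected so far.
        std::size_t farthest = 0;
        for (std::size_t i = 1; i < observations.size(); ++i)
        {
            if (nearest[i] > nearest[farthest])
            {
                farthest = i;
            }
        }

        centers.push_back(farthest);

        for (std::size_t i = 0; i < observations.size(); ++i)
        {
            nearest[i] = std::min(nearest[i], sqrDist(observations[i], observations[farthest]));
        }
    }

    return centers;
}
```

Unlike random initialization, this selection is deterministic for a given set of observations.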

◆ determineInitialClustersRandom()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineInitialClustersRandom ( const size_t numberClusters )

protected

Determines the initial clusters for this object with the IS_RANDOM strategy.

All clusters are created randomly.

Parameters

numberClusters The number of initial clusters that will be created.

◆ findCluster()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

size_t Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::findCluster ( const Observation & observation )

Finds a best matching cluster for a given independent observation.

However, the observation is not added to this cluster; the function simply looks up the best matching cluster.

Parameters

observation The observation for which the best matching cluster is determined

Returns: The index of the best matching cluster, size_t(-1) if no cluster could be found

See also: clusters().
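A lookup of this kind is typically a nearest-mean search. The sketch below shows the idea on a plain vector of cluster means. It is an illustration under that assumption, not the Ocean implementation; `sketchFindCluster` is a hypothetical free function, and it returns `size_t(-1)` when there are no clusters.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Ocean): index of the mean nearest to the observation.
template <typename T, std::size_t tDimension, typename TSquareDistance = T>
std::size_t sketchFindCluster(const std::vector<std::array<T, tDimension>>& means, const std::array<T, tDimension>& observation)
{
    std::size_t bestCluster = std::size_t(-1); // "no cluster found"
    TSquareDistance bestDistance = TSquareDistance(0);

    for (std::size_t c = 0; c < means.size(); ++c)
    {
        TSquareDistance distance = TSquareDistance(0);

        for (std::size_t n = 0; n < tDimension; ++n)
        {
            const TSquareDistance difference = TSquareDistance(means[c][n]) - TSquareDistance(observation[n]);
            distance += difference * difference;
        }

        if (bestCluster == std::size_t(-1) || distance < bestDistance)
        {
            bestCluster = c;
            bestDistance = distance;
        }
    }

    assert(bestCluster == std::size_t(-1) || bestCluster < means.size());

    return bestCluster;
}
```

Because the return type is unsigned, callers should compare the result against `size_t(-1)` rather than test for a negative value.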

◆ isValid()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

bool Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::isValid ( ) const

inline

Returns whether this object holds a valid set of observations.

Returns: True, if so

◆ maximalSqrDistance()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

TSquareDistance Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::maximalSqrDistance ( ) const

Calculates the maximal square distance between the mean observation value of each cluster and all observations belonging to that cluster.

Returns: Maximal square distance for all clusters
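As an illustration of the quantity described above, the following standalone sketch computes the largest square distance between any observation and the mean of its own cluster. It is not the Ocean implementation; `SketchCluster` and `sketchMaximalSqrDistance` are hypothetical names, and returning 0 when there are no clusters is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a cluster: a mean plus the indices of its observations.
struct SketchCluster
{
    std::array<double, 2> mean;
    std::vector<std::size_t> indices;
};

// Largest square distance between an observation and the mean of the cluster it belongs to.
double sketchMaximalSqrDistance(const std::vector<std::array<double, 2>>& observations, const std::vector<SketchCluster>& clusters)
{
    double maximal = 0.0;

    for (const SketchCluster& cluster : clusters)
    {
        for (const std::size_t index : cluster.indices)
        {
            assert(index < observations.size());

            const double dx = observations[index][0] - cluster.mean[0];
            const double dy = observations[index][1] - cluster.mean[1];
            const double distance = dx * dx + dy * dy;

            if (distance > maximal)
            {
                maximal = distance;
            }
        }
    }

    return maximal;
}
```

A value of this kind is what a distance-driven clustering loop would compare against its threshold.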

◆ operator bool()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::operator bool ( ) const

inline, explicit

Returns whether this object holds a valid set of observations.

Returns: True, if so

◆ operator=()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices > & Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::operator= ( ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices > && clustering )

inline

Move operator.

Parameters

clustering The clustering object to be moved

Returns: Reference to this object

◆ removeCluster()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::removeCluster	(	const size_t	iterations = `5`,
		Worker *	worker = `nullptr`
	)

Removes one cluster from this object.

The cluster with the smallest maximal distance between its observations and its mean observation value is removed.

Parameters

iterations	The number of optimization iterations that are applied after the cluster has been removed, with range [1, infinity)
worker	Optional worker object to distribute the computation

◆ smallestObservation()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::DataIndex Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::smallestObservation ( const Data & data )

inline, static, protected

Determines the smallest observation (Euclidean distance to the origin) from a set of observations.

Parameters

data	The observation data in which the smallest observation is determined, must be valid

Returns: The index of the smallest observation
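A minimal sketch of this selection: compute each observation's square distance to the origin and keep the index of the minimum. Comparing square distances avoids the square root without changing the ordering. The names `sketchSqrDistance` and `sketchSmallestObservation` are hypothetical stand-ins that operate on a plain vector rather than on the Data object.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: square distance between an observation and the origin.
template <typename T, std::size_t tDimension, typename TSquareDistance = T>
TSquareDistance sketchSqrDistance(const std::array<T, tDimension>& observation)
{
    TSquareDistance result = TSquareDistance(0);

    for (std::size_t n = 0; n < tDimension; ++n)
    {
        result += TSquareDistance(observation[n]) * TSquareDistance(observation[n]);
    }

    return result;
}

// Hypothetical helper: index of the observation with the smallest distance to the origin.
template <typename T, std::size_t tDimension, typename TSquareDistance = T>
std::size_t sketchSmallestObservation(const std::vector<std::array<T, tDimension>>& observations)
{
    assert(!observations.empty()); // the data must be valid

    std::size_t smallest = 0;
    TSquareDistance smallestDistance = sketchSqrDistance<T, tDimension, TSquareDistance>(observations[0]);

    for (std::size_t i = 1; i < observations.size(); ++i)
    {
        const TSquareDistance distance = sketchSqrDistance<T, tDimension, TSquareDistance>(observations[i]);

        if (distance < smallestDistance)
        {
            smallestDistance = distance;
            smallest = i;
        }
    }

    return smallest;
}
```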

◆ sortClusters()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::sortClusters ( )

Sorts the clusters by their number of elements.
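The source does not state the sort direction. The sketch below assumes descending order (largest cluster first) purely for illustration; `SketchCluster` and `sketchSortClusters` are hypothetical names, not the Ocean implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a cluster: only the observation indices matter for sorting.
struct SketchCluster
{
    std::vector<std::size_t> indices;
};

// Sorts the clusters by their number of observations, largest first (assumed direction).
void sketchSortClusters(std::vector<SketchCluster>& clusters)
{
    std::stable_sort(clusters.begin(), clusters.end(),
                     [](const SketchCluster& a, const SketchCluster& b)
                     {
                         return a.indices.size() > b.indices.size();
                     });
}
```

Note that sorting changes cluster indices, so an index obtained from a lookup before sorting no longer refers to the same cluster afterwards.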

◆ sqrDistance()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>

TSquareDistance Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::sqrDistance ( const Observation & observation )

inline, static, protected

Returns the square distance between an observation and the origin.

Parameters

observation The observation for which the square distance is determined

Returns: Resulting square distance

Field Documentation

◆ clusters_

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>

Clusters Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::clusters_

protected

The current clusters of this object.

◆ data_

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>

Data Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::data_

protected

The data that stores the observations of this clustering object, either with index-access or pointer-access.

The documentation for this class was generated from the following file:

ClusteringKMeans.h
