Ocean
Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices > Class Template Reference

This class implements a k-means clustering algorithm.


Data Structures

class  Cluster
 This class implements one cluster that holds the mean values of all observations belonging to this cluster and the indices of all observations belonging to this cluster.

Public Types

enum  InitializationStrategy { IS_LARGEST_DISTANCE , IS_RANDOM }
 Definition of individual initialization strategies.

typedef Clustering< tUseIndices >::template Data< T, tDimension > Data
 (Re-)Definition of a data object providing the data which will be clustered.

typedef Data::DataIndex DataIndex
 (Re-)Definition of an index that addresses one specific observation element in the data object that stores all observations.

typedef Data::DataIndices DataIndices
 (Re-)Definition of a vector holding (size_t) indices.

typedef Data::Observation Observation
 (Re-)Definition of an observation object.

typedef std::vector< Cluster > Clusters
 Definition of a vector holding cluster objects.

Public Member Functions

 ClusteringKMeans ()
 Creates an empty k-means object.

 ClusteringKMeans (ClusteringKMeans &&clustering) noexcept
 Move constructor.

 ClusteringKMeans (const Data &data)
 Creates a new k-means object by a given data object.

 ClusteringKMeans (Data &&data)
 Creates a new k-means object by a given data object.

const Clusters & clusters () const
 Returns the clusters of this k-means clustering object.

void sortClusters ()
 Sorts the clusters by their number of elements.

TSquareDistance maximalSqrDistance () const
 Calculates the maximal square distance between the mean observation value of each cluster and all observations belonging to that cluster.

void determineClustersByNumber (const size_t numberClusters, const InitializationStrategy strategy=IS_LARGEST_DISTANCE, const size_t iterations=5, Worker *worker=nullptr)
 Determines the clusters for this object; ensure that this object has been initialized with a valid set of observations.

void determineClustersByDistance (const TSquareDistance maximalSqrDistance, size_t maximalClusters=0, const size_t iterations=5, Worker *worker=nullptr)
 Determines the clusters for this object; ensure that this object has been initialized with a valid set of observations.

bool addCluster (const size_t iterations=5, TSquareDistance sqrDistance=TSquareDistance(0), Worker *worker=nullptr)
 Adds a new cluster to this object.

void removeCluster (const size_t iterations=5, Worker *worker=nullptr)
 Removes one cluster from this object.

size_t findCluster (const Observation &observation)
 Finds the best matching cluster for a given independent observation.

void applyOptimizationIteration ()
 Explicitly applies one further optimization iteration for an existing set of clusters.

void applyOptimizationIteration (Worker *worker)
 Explicitly applies one further optimization iteration for an existing set of clusters.

void clear ()
 Clears all determined clusters; the registered data remains untouched.

bool isValid () const
 Returns whether this object holds a valid set of observations.

 operator bool () const
 Returns whether this object holds a valid set of observations.

ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices > & operator= (ClusteringKMeans &&clustering)
 Move assignment operator.

Protected Member Functions

void determineInitialClustersLargestDistance (const size_t numberClusters)
 Determines the initial clusters for this object with the IS_LARGEST_DISTANCE strategy.

void determineInitialClustersRandom (const size_t numberClusters)
 Determines the initial clusters for this object with the IS_RANDOM strategy.

void applyOptimizationIterationSubset (Lock *lock, const unsigned int firstObservation, const unsigned int numberObservations)
 Explicitly applies one further optimization iteration for an existing set of clusters, restricted to a subset of all observations.

Static Protected Member Functions

static DataIndex smallestObservation (const Data &data)
 Determines the smallest observation (Euclidean distance to the origin) from a set of observations.

static TSquareDistance sqrDistance (const Observation &observation)
 Returns the square distance between an observation and the origin.

Protected Attributes

Data data_
 The data that stores the observations of this clustering object, either with index-access or pointer-access.

Clusters clusters_
 The current clusters of this object.

Detailed Description

template<typename T, size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>
class Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >

This class implements a k-means clustering algorithm.

Beware: For performance reasons, this class does not copy the given observation values; it expects the given observation values to exist as long as the KMeans object exists.

Template Parameters
T: The data type of each element of an observation
tDimension: The dimension of each observation (the number of elements in each observation), with range [1, infinity)
TSum: The data type of the intermediate sum values that are necessary to determine, e.g., the mean values
TSquareDistance: The data type of the square distance value, which may differ from T
tUseIndices: True, if the data object accesses the observations by index; False, if it accesses them by pointer
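Why TSum can differ from T is easiest to see with narrow element types. The following is a minimal, self-contained sketch, not Ocean's implementation. The helper `meanObservation` is hypothetical. It accumulates in `TSum` so that summing many `std::uint8_t` values does not wrap around:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Ocean: computes the element-wise mean of a
// set of observations, accumulating in TSum instead of T so that the sum of
// many small-typed values (e.g., std::uint8_t) does not wrap around.
template <typename T, std::size_t tDimension, typename TSum = T>
std::array<T, tDimension> meanObservation(const std::vector<std::array<T, tDimension>>& observations)
{
    assert(!observations.empty());

    std::array<TSum, tDimension> sums{}; // zero-initialized accumulators
    for (const std::array<T, tDimension>& observation : observations)
    {
        for (std::size_t n = 0; n < tDimension; ++n)
        {
            sums[n] += TSum(observation[n]);
        }
    }

    std::array<T, tDimension> mean{};
    for (std::size_t n = 0; n < tDimension; ++n)
    {
        mean[n] = T(sums[n] / TSum(observations.size()));
    }

    return mean;
}
```

Take ten observations with the value 200. With `TSum = std::uint32_t` the mean is 200. With the default `TSum = std::uint8_t`, the running sum wraps modulo 256 (2000 becomes 208), so the "mean" comes out as 20.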

Member Typedef Documentation

◆ Clusters

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>
typedef std::vector<Cluster> Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::Clusters

Definition of a vector holding cluster objects.

◆ Data

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>
typedef Clustering<tUseIndices>::template Data<T, tDimension> Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::Data

(Re-)Definition of a data object providing the data which will be clustered.

◆ DataIndex

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>
typedef Data::DataIndex Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::DataIndex

(Re-)Definition of an index that addresses one specific observation element in the data object that stores all observations.

◆ DataIndices

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>
typedef Data::DataIndices Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::DataIndices

(Re-)Definition of a vector holding (size_t) indices.

◆ Observation

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>
typedef Data::Observation Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::Observation

(Re-)Definition of an observation object.

Member Enumeration Documentation

◆ InitializationStrategy

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>
enum Ocean::ClusteringKMeans::InitializationStrategy

Definition of individual initialization strategies.

Enumerator
IS_LARGEST_DISTANCE 

The first cluster is determined by selecting the smallest observation (Euclidean distance to the origin); the remaining clusters are defined by the observations with the largest distance to the already existing clusters.

IS_RANDOM 

All clusters are selected randomly.
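The IS_LARGEST_DISTANCE description corresponds to farthest-point seeding. The sketch below is a minimal, self-contained illustration of that idea under the assumption of `double` elements. It is not Ocean's code, and `largestDistanceSeeds` is a hypothetical name:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

template <std::size_t tDimension>
using Observation = std::array<double, tDimension>;

// Square Euclidean distance between two observations.
template <std::size_t tDimension>
double sqrDistance(const Observation<tDimension>& a, const Observation<tDimension>& b)
{
    double sum = 0.0;
    for (std::size_t n = 0; n < tDimension; ++n)
    {
        sum += (a[n] - b[n]) * (a[n] - b[n]);
    }
    return sum;
}

// Hypothetical sketch of the IS_LARGEST_DISTANCE idea (farthest-point seeding):
// returns the indices of the observations chosen as initial cluster centers.
template <std::size_t tDimension>
std::vector<std::size_t> largestDistanceSeeds(const std::vector<Observation<tDimension>>& observations, const std::size_t numberClusters)
{
    assert(numberClusters >= 1 && numberClusters <= observations.size());

    // First seed: the observation closest to the origin (the "smallest" one).
    const Observation<tDimension> origin{};
    std::size_t first = 0;
    for (std::size_t i = 1; i < observations.size(); ++i)
    {
        if (sqrDistance(observations[i], origin) < sqrDistance(observations[first], origin))
        {
            first = i;
        }
    }

    std::vector<std::size_t> seeds{first};

    // nearestSqr[i]: square distance from observation i to its closest seed so far.
    std::vector<double> nearestSqr(observations.size());
    for (std::size_t i = 0; i < observations.size(); ++i)
    {
        nearestSqr[i] = sqrDistance(observations[i], observations[first]);
    }

    while (seeds.size() < numberClusters)
    {
        // Next seed: the observation with the largest distance to all existing seeds.
        std::size_t next = 0;
        for (std::size_t i = 1; i < observations.size(); ++i)
        {
            if (nearestSqr[i] > nearestSqr[next])
            {
                next = i;
            }
        }

        seeds.push_back(next);
        for (std::size_t i = 0; i < observations.size(); ++i)
        {
            const double candidate = sqrDistance(observations[i], observations[next]);
            if (candidate < nearestSqr[i])
            {
                nearestSqr[i] = candidate;
            }
        }
    }

    return seeds;
}
```

Keeping one "distance to nearest seed" value per observation makes each new seed cost a single pass over the data. The alternative, recomputing distances to every seed, costs a pass per existing seed.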

Constructor & Destructor Documentation

◆ ClusteringKMeans() [1/4]

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::ClusteringKMeans
inline

Creates an empty k-means object.

◆ ClusteringKMeans() [2/4]

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::ClusteringKMeans ( ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices > &&  clustering)
inline noexcept

Move constructor.

Parameters
clustering: The clustering object to be moved

◆ ClusteringKMeans() [3/4]

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::ClusteringKMeans ( const Data &  data)
inline explicit

Creates a new k-means object by a given data object.

Parameters
data: The data object to be used to determine the clusters
See also
determineClustersByNumber(), determineClustersByDistance().

◆ ClusteringKMeans() [4/4]

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::ClusteringKMeans ( Data &&  data)
inline explicit

Creates a new k-means object by a given data object.

Parameters
data: The data object that will be moved and used to determine the clusters
See also
determineClustersByNumber(), determineClustersByDistance().

Member Function Documentation

◆ addCluster()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
bool Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::addCluster ( const size_t  iterations = 5,
TSquareDistance  sqrDistance = TSquareDistance(0),
Worker *  worker = nullptr 
)

Adds a new cluster to this object.

Parameters
iterations: The number of optimization iterations that are applied after the new cluster has been added, with range [1, infinity)
sqrDistance: The minimal square distance between a cluster's mean and one of its observations so that the cluster is divided into two clusters
worker: Optional worker object to distribute the computation
Returns
True, if a new cluster has been added; False, if no further cluster could be added or if the provided distance was too large

◆ applyOptimizationIteration() [1/2]

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::applyOptimizationIteration

Explicitly applies one further optimization iteration for an existing set of clusters.

Do not call this function before the initial clusters have been determined.

See also
clusters(), determineClustersByNumber(), determineClustersByDistance().
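A k-means optimization iteration is commonly the Lloyd step: assign, then update. The sketch below is a minimal, self-contained illustration of that step and not necessarily Ocean's exact procedure. The function name is hypothetical, and the sketch assumes that a cluster receiving no observations keeps its previous mean:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

template <std::size_t tDimension>
using Observation = std::array<double, tDimension>;

template <std::size_t tDimension>
double sqrDistance(const Observation<tDimension>& a, const Observation<tDimension>& b)
{
    double sum = 0.0;
    for (std::size_t n = 0; n < tDimension; ++n)
        sum += (a[n] - b[n]) * (a[n] - b[n]);
    return sum;
}

// Hypothetical sketch of one Lloyd-style iteration: assign every observation to
// its nearest mean, then recompute each mean from its assigned observations.
// Assumption: a cluster that receives no observation keeps its previous mean.
// Returns the cluster index assigned to each observation.
template <std::size_t tDimension>
std::vector<std::size_t> optimizationIteration(const std::vector<Observation<tDimension>>& observations, std::vector<Observation<tDimension>>& means)
{
    assert(!means.empty()); // initial clusters must exist

    std::vector<std::size_t> assignment(observations.size());
    std::vector<Observation<tDimension>> sums(means.size(), Observation<tDimension>{});
    std::vector<std::size_t> counts(means.size(), 0);

    for (std::size_t i = 0; i < observations.size(); ++i)
    {
        std::size_t best = 0;
        for (std::size_t c = 1; c < means.size(); ++c)
            if (sqrDistance(observations[i], means[c]) < sqrDistance(observations[i], means[best]))
                best = c;

        assignment[i] = best;
        ++counts[best];
        for (std::size_t n = 0; n < tDimension; ++n)
            sums[best][n] += observations[i][n];
    }

    for (std::size_t c = 0; c < means.size(); ++c)
        if (counts[c] != 0)
            for (std::size_t n = 0; n < tDimension; ++n)
                means[c][n] = sums[c][n] / double(counts[c]);

    return assignment;
}
```

The `assert` mirrors the note above: the step is only meaningful once initial clusters exist.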

◆ applyOptimizationIteration() [2/2]

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::applyOptimizationIteration ( Worker *  worker)

Explicitly applies one further optimization iteration for an existing set of clusters.

Do not call this function before the initial clusters have been determined.

Parameters
worker: The worker object to distribute the computation
See also
clusters(), determineClustersByNumber(), determineClustersByDistance().

◆ applyOptimizationIterationSubset()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::applyOptimizationIterationSubset ( Lock *  lock,
const unsigned int  firstObservation,
const unsigned int  numberObservations 
)
protected

Explicitly applies one further optimization iteration for an existing set of clusters.

This function operates on a subset of all observations.

Parameters
lock: Optional lock object, to be provided if this function is executed on multiple threads in parallel
firstObservation: The first observation that will be handled
numberObservations: The number of observations that will be handled
See also
clusters(), determineClustersByNumber(), determineClustersByDistance().
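The subset-plus-lock signature suggests a common pattern: each thread handles a contiguous range of observations, and results are merged into shared state under a lock. The sketch below shows that pattern using `std::thread` and `std::mutex` in place of Ocean's `Worker` and `Lock`, whose interfaces are not described here. It is self-contained, and every name in it is hypothetical:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

template <std::size_t tDimension>
using Observation = std::array<double, tDimension>;

// Per-cluster sums and counts gathered during one optimization iteration.
template <std::size_t tDimension>
struct Accumulator
{
    std::vector<Observation<tDimension>> sums;
    std::vector<std::size_t> counts;

    explicit Accumulator(const std::size_t numberClusters)
        : sums(numberClusters, Observation<tDimension>{}), counts(numberClusters, 0) {}
};

// Hypothetical analogue of a subset function: handles the observations
// [firstObservation, firstObservation + numberObservations), accumulates
// locally, then merges into 'shared' while holding the lock (if one is given).
template <std::size_t tDimension>
void accumulateSubset(const std::vector<Observation<tDimension>>& observations, const std::vector<Observation<tDimension>>& means,
                      Accumulator<tDimension>& shared, std::mutex* lock, const std::size_t firstObservation, const std::size_t numberObservations)
{
    Accumulator<tDimension> local(means.size());

    for (std::size_t i = firstObservation; i < firstObservation + numberObservations; ++i)
    {
        std::size_t best = 0;
        double bestSqr = -1.0;
        for (std::size_t c = 0; c < means.size(); ++c)
        {
            double sqr = 0.0;
            for (std::size_t n = 0; n < tDimension; ++n)
                sqr += (observations[i][n] - means[c][n]) * (observations[i][n] - means[c][n]);
            if (bestSqr < 0.0 || sqr < bestSqr) { best = c; bestSqr = sqr; }
        }
        ++local.counts[best];
        for (std::size_t n = 0; n < tDimension; ++n)
            local.sums[best][n] += observations[i][n];
    }

    // Merge once per subset so that the lock is held only briefly.
    std::unique_lock<std::mutex> guard;
    if (lock != nullptr)
        guard = std::unique_lock<std::mutex>(*lock);
    for (std::size_t c = 0; c < means.size(); ++c)
    {
        shared.counts[c] += local.counts[c];
        for (std::size_t n = 0; n < tDimension; ++n)
            shared.sums[c][n] += local.sums[c][n];
    }
}

// Splits the observations into contiguous subsets, one per thread.
template <std::size_t tDimension>
Accumulator<tDimension> accumulateParallel(const std::vector<Observation<tDimension>>& observations,
                                           const std::vector<Observation<tDimension>>& means, const std::size_t threads)
{
    assert(threads >= 1 && !means.empty());

    Accumulator<tDimension> shared(means.size());
    std::mutex lock;
    std::vector<std::thread> workers;

    const std::size_t chunk = (observations.size() + threads - 1) / threads;
    for (std::size_t first = 0; first < observations.size(); first += chunk)
    {
        const std::size_t number = std::min(chunk, observations.size() - first);
        workers.emplace_back([&, first, number] { accumulateSubset(observations, means, shared, &lock, first, number); });
    }
    for (std::thread& worker : workers)
        worker.join();
    return shared;
}
```

Accumulating into thread-local buffers and merging once per subset keeps lock contention low. The means can then be recomputed from `sums` and `counts` exactly as in the single-threaded case.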

◆ clear()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::clear

Clears all determined clusters; the registered data remains untouched.

◆ clusters()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
const ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::Clusters & Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::clusters
inline

Returns the clusters of this k-means clustering object.

Returns
The determined k-means clusters

◆ determineClustersByDistance()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineClustersByDistance ( const TSquareDistance  maximalSqrDistance,
size_t  maximalClusters = 0,
const size_t  iterations = 5,
Worker *  worker = nullptr 
)

Determines the clusters for this object; ensure that this object has been initialized with a valid set of observations.

This function iteratively adds new clusters until the square distance between each cluster's mean observation value and the observations of that cluster is smaller than maximalSqrDistance, or until the defined maximal number of clusters is reached.

Parameters
maximalSqrDistance: The maximal square distance in the final clusters between the clusters' mean observation values and the observations in the clusters
maximalClusters: The maximal number of clusters that will be created (even if maximalSqrDistance is not reached), with range [0, infinity); define 0 to ignore this parameter
iterations: The number of optimization iterations that are applied each time a new cluster is added, with range [1, infinity)
worker: Optional worker object to distribute the computation
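The distance-driven loop can be sketched as follows. It starts with one cluster, checks the worst observation, adds a cluster and re-optimizes, and stops on the distance or count limit. This is a self-contained illustration, not Ocean's algorithm. In particular, the sketch *assumes* that each new cluster is seeded at the observation farthest from its nearest mean, and it stops once no observation is farther than `maximalSqrDistance`. All function names are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

template <std::size_t tDimension>
using Observation = std::array<double, tDimension>;

template <std::size_t tDimension>
double sqrDistance(const Observation<tDimension>& a, const Observation<tDimension>& b)
{
    double sum = 0.0;
    for (std::size_t n = 0; n < tDimension; ++n)
        sum += (a[n] - b[n]) * (a[n] - b[n]);
    return sum;
}

// Index of the mean nearest to the observation.
template <std::size_t tDimension>
std::size_t nearest(const Observation<tDimension>& observation, const std::vector<Observation<tDimension>>& means)
{
    std::size_t best = 0;
    for (std::size_t c = 1; c < means.size(); ++c)
        if (sqrDistance(observation, means[c]) < sqrDistance(observation, means[best]))
            best = c;
    return best;
}

// One Lloyd iteration: recompute each mean from the observations nearest to it.
template <std::size_t tDimension>
void lloydIteration(const std::vector<Observation<tDimension>>& observations, std::vector<Observation<tDimension>>& means)
{
    std::vector<Observation<tDimension>> sums(means.size(), Observation<tDimension>{});
    std::vector<std::size_t> counts(means.size(), 0);
    for (const Observation<tDimension>& observation : observations)
    {
        const std::size_t c = nearest(observation, means);
        ++counts[c];
        for (std::size_t n = 0; n < tDimension; ++n)
            sums[c][n] += observation[n];
    }
    for (std::size_t c = 0; c < means.size(); ++c)
        if (counts[c] != 0)
            for (std::size_t n = 0; n < tDimension; ++n)
                means[c][n] = sums[c][n] / double(counts[c]);
}

// Hypothetical distance-driven loop: adds clusters until no observation lies
// farther than maximalSqrDistance from its nearest mean, or until
// maximalClusters is reached (0 = no limit).
template <std::size_t tDimension>
std::vector<Observation<tDimension>> clustersByDistance(const std::vector<Observation<tDimension>>& observations, const double maximalSqrDistance,
                                                        const std::size_t maximalClusters = 0, const std::size_t iterations = 5)
{
    assert(!observations.empty() && iterations >= 1);

    std::vector<Observation<tDimension>> means{observations.front()};
    lloydIteration(observations, means); // one cluster: the mean of all observations

    while (maximalClusters == 0 || means.size() < maximalClusters)
    {
        std::size_t worst = 0;
        double worstSqr = -1.0;
        for (std::size_t i = 0; i < observations.size(); ++i)
        {
            const double d = sqrDistance(observations[i], means[nearest(observations[i], means)]);
            if (d > worstSqr) { worst = i; worstSqr = d; }
        }
        if (worstSqr <= maximalSqrDistance)
            break; // every cluster is compact enough

        means.push_back(observations[worst]); // assumption: seed at the worst observation
        for (std::size_t k = 0; k < iterations; ++k)
            lloydIteration(observations, means);
    }
    return means;
}
```

Seeding at the worst-fitting observation guarantees that each added cluster immediately captures at least that observation in the next assignment step.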

◆ determineClustersByNumber()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineClustersByNumber ( const size_t  numberClusters,
const InitializationStrategy  strategy = IS_LARGEST_DISTANCE,
const size_t  iterations = 5,
Worker *  worker = nullptr 
)

Determines the clusters for this object; ensure that this object has been initialized with a valid set of observations.

Parameters
numberClusters: The number of clusters that will be created, with range [1, numberObservations())
strategy: The initialization strategy for the first clusters
iterations: The number of optimization iterations that are applied after the initial clusters have been determined, with range [1, infinity)
worker: Optional worker object to distribute the computation
See also
clusters().

◆ determineInitialClustersLargestDistance()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineInitialClustersLargestDistance ( const size_t  numberClusters)
protected

Determines the initial clusters for this object with the IS_LARGEST_DISTANCE strategy.

First, the smallest observation is selected as the first cluster; all following clusters are determined by the observations that have the largest distance to the already existing clusters.

Parameters
numberClusters: The number of initial clusters that will be created

◆ determineInitialClustersRandom()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineInitialClustersRandom ( const size_t  numberClusters)
protected

Determines the initial clusters for this object with the IS_RANDOM strategy.

All clusters are created randomly.

Parameters
numberClusters: The number of initial clusters that will be created

◆ findCluster()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
size_t Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::findCluster ( const Observation &  observation)

Finds the best matching cluster for a given independent observation.

The observation is not added to this cluster; this is simply a lookup for the best matching cluster.

Parameters
observation: The observation for which the best matching cluster is determined
Returns
The index of the best matching cluster, -1 (i.e., size_t(-1)) if no cluster could be found
See also
clusters().
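The lookup can be sketched as a read-only nearest-mean search that returns `size_t(-1)` when there are no clusters. This is a minimal, self-contained illustration that assumes the clusters are represented by their means. `findNearestCluster` is a hypothetical name, not the Ocean function:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

template <std::size_t tDimension>
using Observation = std::array<double, tDimension>;

// Hypothetical lookup: returns the index of the mean nearest to the
// observation, or size_t(-1) if there are no clusters. Nothing is modified.
template <std::size_t tDimension>
std::size_t findNearestCluster(const std::vector<Observation<tDimension>>& means, const Observation<tDimension>& observation)
{
    std::size_t best = std::size_t(-1);
    double bestSqr = 0.0;
    for (std::size_t c = 0; c < means.size(); ++c)
    {
        double sqr = 0.0;
        for (std::size_t n = 0; n < tDimension; ++n)
            sqr += (observation[n] - means[c][n]) * (observation[n] - means[c][n]);
        if (best == std::size_t(-1) || sqr < bestSqr)
        {
            best = c;
            bestSqr = sqr;
        }
    }
    return best;
}
```

Because the return type is unsigned, comparing against `size_t(-1)` is clearer than comparing against a signed `-1` literal.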

◆ isValid()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
bool Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::isValid
inline

Returns whether this object holds a valid set of observations.

Returns
True, if so

◆ maximalSqrDistance()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
TSquareDistance Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::maximalSqrDistance

Calculates the maximal square distance between the mean observation value of each cluster and all observations belonging to that cluster.

Returns
The maximal square distance over all clusters
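The quantity described above is a maximum over clusters and over their members. The sketch below is a minimal, self-contained illustration of it. `SketchCluster` (a mean plus member indices) and `maximalSqrDistanceOf` are hypothetical stand-ins, not Ocean's `Cluster` class:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

template <std::size_t tDimension>
using Observation = std::array<double, tDimension>;

// Hypothetical cluster representation: a mean plus the indices of its members.
template <std::size_t tDimension>
struct SketchCluster
{
    Observation<tDimension> mean;
    std::vector<std::size_t> members;
};

// Largest square distance between any cluster's mean and any of that cluster's
// members (0 if no cluster has members).
template <std::size_t tDimension>
double maximalSqrDistanceOf(const std::vector<SketchCluster<tDimension>>& clusters, const std::vector<Observation<tDimension>>& observations)
{
    double maximalSqr = 0.0;
    for (const SketchCluster<tDimension>& cluster : clusters)
    {
        for (const std::size_t index : cluster.members)
        {
            assert(index < observations.size());

            double sqr = 0.0;
            for (std::size_t n = 0; n < tDimension; ++n)
            {
                const double difference = observations[index][n] - cluster.mean[n];
                sqr += difference * difference;
            }
            maximalSqr = std::max(maximalSqr, sqr);
        }
    }
    return maximalSqr;
}
```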

◆ operator bool()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::operator bool
inline explicit

Returns whether this object holds a valid set of observations.

Returns
True, if so

◆ operator=()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices > & Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::operator= ( ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices > &&  clustering)
inline

Move assignment operator.

Parameters
clustering: The clustering object to be moved
Returns
Reference to this object

◆ removeCluster()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::removeCluster ( const size_t  iterations = 5,
Worker *  worker = nullptr 
)

Removes one cluster from this object.

The cluster with the smallest maximal distance between its observations and its mean observation value is removed.

Parameters
iterations: The number of optimization iterations that are applied after the cluster has been removed, with range [1, infinity)
worker: Optional worker object to distribute the computation
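The removal criterion above selects the most compact cluster. The sketch below is a minimal, self-contained illustration. It uses the same hypothetical `SketchCluster` representation, and it *assumes* that the removed cluster's members are handed to their nearest remaining mean. The follow-up optimization iterations (the `iterations` parameter) are omitted:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

template <std::size_t tDimension>
using Observation = std::array<double, tDimension>;

// Hypothetical cluster representation: a mean plus the indices of its members.
template <std::size_t tDimension>
struct SketchCluster
{
    Observation<tDimension> mean;
    std::vector<std::size_t> members;
};

template <std::size_t tDimension>
double sqrDistance(const Observation<tDimension>& a, const Observation<tDimension>& b)
{
    double sum = 0.0;
    for (std::size_t n = 0; n < tDimension; ++n)
        sum += (a[n] - b[n]) * (a[n] - b[n]);
    return sum;
}

// Hypothetical sketch: removes the cluster whose farthest member is closest to
// its mean, reassigns its members to the nearest remaining mean, and returns
// the index of the removed cluster. Optimization iterations are not shown.
template <std::size_t tDimension>
std::size_t removeMostCompactCluster(std::vector<SketchCluster<tDimension>>& clusters, const std::vector<Observation<tDimension>>& observations)
{
    assert(!clusters.empty());

    std::size_t victim = 0;
    double victimSqr = 0.0;
    for (std::size_t c = 0; c < clusters.size(); ++c)
    {
        double maximalSqr = 0.0;
        for (const std::size_t index : clusters[c].members)
            maximalSqr = std::max(maximalSqr, sqrDistance(observations[index], clusters[c].mean));
        if (c == 0 || maximalSqr < victimSqr)
        {
            victim = c;
            victimSqr = maximalSqr;
        }
    }

    const std::vector<std::size_t> orphans = clusters[victim].members;
    clusters.erase(clusters.begin() + static_cast<std::ptrdiff_t>(victim));

    for (const std::size_t index : orphans)
    {
        if (clusters.empty())
            break; // nothing left to reassign to
        std::size_t best = 0;
        for (std::size_t c = 1; c < clusters.size(); ++c)
            if (sqrDistance(observations[index], clusters[c].mean) < sqrDistance(observations[index], clusters[best].mean))
                best = c;
        clusters[best].members.push_back(index);
    }

    return victim;
}
```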

◆ smallestObservation()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::DataIndex Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::smallestObservation ( const Data &  data)
inline static protected

Determines the smallest observation (Euclidean distance to the origin) from a set of observations.

Parameters
data: The observation data in which the smallest observation is determined, must be valid
Returns
The index of the smallest observation
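Both static helpers rest on the square distance to the origin. The sketch below is a minimal, self-contained illustration over `std::array` observations, not Ocean's `Data` type. `sqrDistanceToOrigin` and `smallestObservationIndex` are hypothetical names. The sketch accumulates in `TSquareDistance`, which may be wider than `T`:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Square distance between an observation and the origin, accumulated in
// TSquareDistance (which may be wider than T to avoid overflow).
template <typename T, std::size_t tDimension, typename TSquareDistance = T>
TSquareDistance sqrDistanceToOrigin(const std::array<T, tDimension>& observation)
{
    TSquareDistance sum = TSquareDistance(0);
    for (const T value : observation)
    {
        sum += TSquareDistance(value) * TSquareDistance(value);
    }
    return sum;
}

// Hypothetical sketch: index of the observation closest to the origin.
template <typename T, std::size_t tDimension, typename TSquareDistance = T>
std::size_t smallestObservationIndex(const std::vector<std::array<T, tDimension>>& observations)
{
    assert(!observations.empty()); // the data must be valid

    std::size_t smallest = 0;
    TSquareDistance smallestSqr = sqrDistanceToOrigin<T, tDimension, TSquareDistance>(observations[0]);
    for (std::size_t i = 1; i < observations.size(); ++i)
    {
        const TSquareDistance sqr = sqrDistanceToOrigin<T, tDimension, TSquareDistance>(observations[i]);
        if (sqr < smallestSqr)
        {
            smallest = i;
            smallestSqr = sqr;
        }
    }
    return smallest;
}
```

Comparing square distances rather than distances avoids a square root per observation without changing which observation is smallest.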

◆ sortClusters()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::sortClusters

Sorts the clusters by their number of elements.
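The sketch below uses the same hypothetical `SketchCluster` type as the other sketches and is self-contained. It *assumes* a largest-first order, which the reference above does not state, and uses `std::stable_sort` so that equally sized clusters keep their relative order:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cluster representation: a mean plus the indices of its members.
template <std::size_t tDimension>
struct SketchCluster
{
    std::array<double, tDimension> mean;
    std::vector<std::size_t> members;
};

// Sorts clusters by their number of members. Assumption: largest first.
// std::stable_sort keeps equally sized clusters in their original order.
template <std::size_t tDimension>
void sortClustersBySize(std::vector<SketchCluster<tDimension>>& clusters)
{
    std::stable_sort(clusters.begin(), clusters.end(),
        [](const SketchCluster<tDimension>& a, const SketchCluster<tDimension>& b)
        {
            return a.members.size() > b.members.size();
        });
}
```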

◆ sqrDistance()

template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
TSquareDistance Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::sqrDistance ( const Observation &  observation)
inline static protected

Returns the square distance between an observation and the origin.

Parameters
observation: The observation for which the square distance is determined
Returns
Resulting square distance

Field Documentation

◆ clusters_

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>
Clusters Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::clusters_
protected

The current clusters of this object.

◆ data_

template<typename T , size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>
Data Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::data_
protected

The data that stores the observations of this clustering object, either with index-access or pointer-access.

