Ocean
This class implements a k-means clustering algorithm.
#include <ClusteringKMeans.h>
Data Structures | |
class | Cluster |
This class implements one cluster, holding the mean values and the indices of all observations belonging to it. | |
Public Types | |
enum | InitializationStrategy { IS_LARGEST_DISTANCE , IS_RANDOM } |
Definition of individual initialization strategies. | |
typedef Clustering< tUseIndices >::template Data< T, tDimension > | Data |
(Re-)Definition of a data object providing the data which will be clustered. | |
typedef Data::DataIndex | DataIndex |
(Re-)Definition of an index that addresses one specific observation element in the data object that stores all observations. | |
typedef Data::DataIndices | DataIndices |
(Re-)Definition of a vector holding (size_t) indices. | |
typedef Data::Observation | Observation |
(Re-)Definition of an observation object. | |
typedef std::vector< Cluster > | Clusters |
Definition of a vector holding cluster objects. | |
Public Member Functions | |
ClusteringKMeans () | |
Creates an empty k-means object. | |
ClusteringKMeans (ClusteringKMeans &&clustering) noexcept | |
Move constructor. | |
ClusteringKMeans (const Data &data) | |
Creates a new k-means object from a given data object. | |
ClusteringKMeans (Data &&data) | |
Creates a new k-means object from a given data object that will be moved. | |
const Clusters & | clusters () const |
Returns the clusters of this k-means clustering object. | |
void | sortClusters () |
Sorts the clusters by their number of elements. | |
TSquareDistance | maximalSqrDistance () const |
Calculates the maximal square distance between the mean observation value of each cluster and all observations belonging to that cluster. | |
void | determineClustersByNumber (const size_t numberClusters, const InitializationStrategy strategy=IS_LARGEST_DISTANCE, const size_t iterations=5, Worker *worker=nullptr) |
Determines the clusters for this object. Ensure that this object has been initialized with a valid set of observations. | |
void | determineClustersByDistance (const TSquareDistance maximalSqrDistance, size_t maximalClusters=0, const size_t iterations=5, Worker *worker=nullptr) |
Determines the clusters for this object. Ensure that this object has been initialized with a valid set of observations. | |
bool | addCluster (const size_t iterations=5, TSquareDistance sqrDistance=TSquareDistance(0), Worker *worker=nullptr) |
Adds a new cluster to this object. | |
void | removeCluster (const size_t iterations=5, Worker *worker=nullptr) |
Removes one cluster from this object. | |
size_t | findCluster (const Observation &observation) |
Finds a best matching cluster for a given independent observation. | |
void | applyOptimizationIteration () |
Explicitly applies one further optimization iteration for an existing set of clusters. | |
void | applyOptimizationIteration (Worker *worker) |
Explicitly applies one further optimization iteration for an existing set of clusters. | |
void | clear () |
Clears all determined clusters; the registered data information remains untouched. | |
bool | isValid () const |
Returns whether this object holds a valid set of observations. | |
operator bool () const | |
Returns whether this object holds a valid set of observations. | |
ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices > & | operator= (ClusteringKMeans &&clustering) |
Move assignment operator. | |
Protected Member Functions | |
void | determineInitialClustersLargestDistance (const size_t numberClusters) |
Determines the initial clusters for this object with the IS_LARGEST_DISTANCE strategy. | |
void | determineInitialClustersRandom (const size_t numberClusters) |
Determines the initial clusters for this object with the IS_RANDOM strategy. | |
void | applyOptimizationIterationSubset (Lock *lock, const unsigned int firstObservation, const unsigned int numberObservations) |
Explicitly applies one further optimization iteration for an existing set of clusters. | |
Static Protected Member Functions | |
static DataIndex | smallestObservation (const Data &data) |
Determines the smallest observation (Euclidean distance to the origin) from a set of observations. | |
static TSquareDistance | sqrDistance (const Observation &observation) |
Returns the square distance between an observation and the origin. | |
Protected Attributes | |
Data | data_ |
The data that stores the observations of this clustering object, either with index-access or pointer-access. | |
Clusters | clusters_ |
The current clusters of this object. | |
This class implements a k-means clustering algorithm.
Beware: For performance reasons, this class does not copy the given observation values. The observation values must therefore remain valid for as long as the k-means object exists.
T | The data type of each element of an observation |
tDimension | The dimension of each observation (the number of elements in each observation), with range [1, infinity) |
TSum | The data type of the intermediate sum values, which are needed e.g. to determine the mean values |
TSquareDistance | The data type of the square distance value, might be different from T |
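The separate TSum parameter matters when T is a narrow type. The following is a minimal stand-alone sketch, not code from Ocean: the helper `meanOf` is a hypothetical name introduced only to illustrate why the accumulator type may need to be wider than the element type.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Ocean): averages values of type T while
// accumulating the intermediate sum in a separate type TSum.
template <typename T, typename TSum>
T meanOf(const std::vector<T>& values)
{
    if (values.empty())
    {
        return T(0);
    }

    TSum sum = TSum(0);

    for (const T& value : values)
    {
        sum += TSum(value);
    }

    return T(sum / TSum(values.size()));
}
```

For the 8-bit values 200, 250 and 240, an `unsigned int` accumulator yields the mean 230, whereas an 8-bit accumulator would wrap around modulo 256 and produce a wrong mean.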
typedef std::vector<Cluster> Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::Clusters |
Definition of a vector holding cluster objects.
typedef Clustering<tUseIndices>::template Data<T, tDimension> Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::Data |
(Re-)Definition of a data object providing the data which will be clustered.
typedef Data::DataIndex Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::DataIndex |
(Re-)Definition of an index that addresses one specific observation element in the data object that stores all observations.
typedef Data::DataIndices Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::DataIndices |
(Re-)Definition of a vector holding (size_t) indices.
typedef Data::Observation Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::Observation |
(Re-)Definition of an observation object.
enum Ocean::ClusteringKMeans::InitializationStrategy |
Definition of individual initialization strategies.
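Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::ClusteringKMeans | ( | ) |
inline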
Creates an empty k-means object.
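Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::ClusteringKMeans | ( | ClusteringKMeans && | clustering | ) |
inline noexcept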
Move constructor.
clustering | The clustering object to be moved |
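Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::ClusteringKMeans | ( | const Data & | data | ) |
inline explicit
Creates a new k-means object from a given data object.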
data | The data object to be used to determine the clusters. |
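Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::ClusteringKMeans | ( | Data && | data | ) |
inline explicit
Creates a new k-means object from a given data object that will be moved.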
data | The data object that will be moved and used to determine the clusters. |
bool Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::addCluster | ( | const size_t | iterations = 5 , |
TSquareDistance | sqrDistance = TSquareDistance(0) , |
||
Worker * | worker = nullptr |
||
) |
Adds a new cluster to this object.
iterations | The number of optimization iterations that are applied after the new cluster has been added, with range [1, infinity) |
sqrDistance | The minimal square distance between the cluster's mean and an observation of this cluster so that this cluster is divided into two clusters |
worker | Optional worker object to distribute the computation |
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::applyOptimizationIteration | ( | ) |
Explicitly applies one further optimization iteration for an existing set of clusters.
Do not call this function before initial clusters have been found.
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::applyOptimizationIteration | ( | Worker * | worker | ) |
Explicitly applies one further optimization iteration for an existing set of clusters.
Do not call this function before initial clusters have been found.
worker | The worker object to distribute the computation |
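As an illustration of what a single k-means optimization iteration typically does, here is a minimal stand-alone sketch of one assignment-and-update step (Lloyd's algorithm). It is not the Ocean implementation: `Point`, `sqrDistanceBetween` and `lloydIteration` are hypothetical names, the observations are fixed to two float elements, and the handling of empty clusters is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::array<float, 2>; // hypothetical 2D observation

// Squared Euclidean distance between two points.
inline float sqrDistanceBetween(const Point& a, const Point& b)
{
    const float dx = a[0] - b[0];
    const float dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// One iteration: assigns every point to its nearest mean, then returns the
// means recomputed from the assigned points.
std::vector<Point> lloydIteration(const std::vector<Point>& points, const std::vector<Point>& means)
{
    if (means.empty())
    {
        return means; // nothing to optimize without initial clusters
    }

    std::vector<Point> sums(means.size(), Point{0.0f, 0.0f});
    std::vector<std::size_t> counts(means.size(), 0);

    for (const Point& point : points)
    {
        std::size_t best = 0;
        float bestDistance = std::numeric_limits<float>::max();

        for (std::size_t n = 0; n < means.size(); ++n)
        {
            const float distance = sqrDistanceBetween(point, means[n]);

            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = n;
            }
        }

        sums[best][0] += point[0];
        sums[best][1] += point[1];
        ++counts[best];
    }

    std::vector<Point> newMeans(means);

    for (std::size_t n = 0; n < means.size(); ++n)
    {
        if (counts[n] != 0) // assumption: an empty cluster keeps its previous mean
        {
            newMeans[n] = Point{sums[n][0] / float(counts[n]), sums[n][1] / float(counts[n])};
        }
    }

    return newMeans;
}
```

The sketch also shows why initial clusters are required first: without existing means there is nothing to assign the observations to.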
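void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::applyOptimizationIterationSubset | ( | Lock * | lock, |
const unsigned int | firstObservation, |
||
const unsigned int | numberObservations |
||
) |
protected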
Explicitly applies one further optimization iteration for an existing set of clusters.
This function operates on a subset of all observations.
lock | Optional lock object if this function is executed on multiple threads in parallel |
firstObservation | The first observation that will be handled |
numberObservations | The number of observations that will be handled |
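The optional lock suggests a common pattern: each thread processes its own range of observations and only synchronizes when merging its partial result into shared state. The following is a minimal sketch of that pattern, assuming `std::mutex` in place of Ocean's Lock class. `SharedSum` and `accumulateSubset` are hypothetical names, and the data is one-dimensional for brevity.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical shared accumulator for one cluster (1D for brevity).
struct SharedSum
{
    double sum = 0.0;
    std::size_t count = 0;
};

// Accumulates values[first, first + number) locally and merges the partial
// result into 'shared'. The merge is guarded only if a lock is provided.
void accumulateSubset(std::mutex* lock, const std::vector<double>& values, const std::size_t first, const std::size_t number, SharedSum& shared)
{
    double localSum = 0.0;
    std::size_t localCount = 0;

    for (std::size_t n = first; n < first + number && n < values.size(); ++n)
    {
        localSum += values[n];
        ++localCount;
    }

    std::unique_lock<std::mutex> guard; // owns nothing by default

    if (lock != nullptr)
    {
        guard = std::unique_lock<std::mutex>(*lock); // single-threaded callers pass nullptr
    }

    shared.sum += localSum;
    shared.count += localCount;
}
```

Accumulating locally first keeps the critical section short, so threads spend little time waiting on each other.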
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::clear | ( | ) |
Clears all determined clusters; the registered data information remains untouched.
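const Clusters & Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::clusters | ( | ) | const
inline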
Returns the clusters of this k-means clustering object.
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineClustersByDistance | ( | const TSquareDistance | maximalSqrDistance, |
size_t | maximalClusters = 0 , |
||
const size_t | iterations = 5 , |
||
Worker * | worker = nullptr |
||
) |
Determines the clusters for this object. Ensure that this object has been initialized with a valid set of observations.
This function adds new clusters over several iterations until the maximal square distance within all clusters is smaller than the defined maximalSqrDistance, or until the defined maximal number of clusters is reached.
maximalSqrDistance | The maximal square distance in the final clusters between the clusters' mean observation values and the observations in the clusters |
maximalClusters | The maximal number of clusters that will be created (even if maximalSqrDistance is not reached), with range [0, infinity), define 0 to ignore this parameter |
iterations | The number of optimization iterations that are applied each time a new cluster is added, with range [1, infinity) |
worker | Optional worker object to distribute the computation |
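The grow-until-tight-enough behavior described above can be sketched as follows. This is a simplified stand-alone illustration, not Ocean's code: it works on one-dimensional float values, the names `nearestMean` and `clusterByDistance` are hypothetical, and seeding each new cluster at the worst outlier is an assumption about how a cluster could be added.

```cpp
#include <cstddef>
#include <vector>

// Index of the mean nearest to 'value' (means must not be empty).
inline std::size_t nearestMean(const std::vector<float>& means, const float value)
{
    std::size_t best = 0;

    for (std::size_t n = 1; n < means.size(); ++n)
    {
        const float candidate = (value - means[n]) * (value - means[n]);
        const float current = (value - means[best]) * (value - means[best]);

        if (candidate < current)
        {
            best = n;
        }
    }

    return best;
}

// Adds clusters until every value lies within maximalSqrDistance of its mean,
// or until maximalClusters is reached (0 means no limit). Returns the means.
std::vector<float> clusterByDistance(const std::vector<float>& values, const float maximalSqrDistance, const std::size_t maximalClusters = 0, const std::size_t iterations = 5)
{
    if (values.empty())
    {
        return {};
    }

    std::vector<float> means(1, values.front());

    while (true)
    {
        // Refine the current means with a few assignment/update steps.
        for (std::size_t i = 0; i < iterations; ++i)
        {
            std::vector<float> sums(means.size(), 0.0f);
            std::vector<std::size_t> counts(means.size(), 0);

            for (const float value : values)
            {
                const std::size_t index = nearestMean(means, value);
                sums[index] += value;
                ++counts[index];
            }

            for (std::size_t n = 0; n < means.size(); ++n)
            {
                if (counts[n] != 0)
                {
                    means[n] = sums[n] / float(counts[n]);
                }
            }
        }

        // Find the value that lies farthest from its nearest mean.
        float worstDistance = 0.0f;
        float worstValue = values.front();

        for (const float value : values)
        {
            const float offset = value - means[nearestMean(means, value)];

            if (offset * offset > worstDistance)
            {
                worstDistance = offset * offset;
                worstValue = value;
            }
        }

        // Stop at the cluster limit; never create more clusters than values.
        const bool limitReached = means.size() == maximalClusters || means.size() >= values.size();

        if (worstDistance <= maximalSqrDistance || limitReached)
        {
            return means;
        }

        means.push_back(worstValue); // assumption: seed the new cluster at the worst outlier
    }
}
```

With three well-separated groups of values and a small distance threshold, the loop ends with three means. Passing a cluster limit of 2 stops it earlier, even though the threshold is not yet met.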
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineClustersByNumber | ( | const size_t | numberClusters, |
const InitializationStrategy | strategy = IS_LARGEST_DISTANCE , |
||
const size_t | iterations = 5 , |
||
Worker * | worker = nullptr |
||
) |
Determines the clusters for this object. Ensure that this object has been initialized with a valid set of observations.
numberClusters | The number of clusters that will be created, with range [1, numberObservations()) |
strategy | The initialization strategy for the first clusters |
iterations | The number of optimization iterations that are applied after the initial clusters have been determined, with range [1, infinity) |
worker | Optional worker object to distribute the computation |
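void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineInitialClustersLargestDistance | ( | const size_t | numberClusters | ) |
protected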
Determines the initial clusters for this object with the IS_LARGEST_DISTANCE strategy.
First, the smallest observation is selected as the first cluster.
All following clusters are determined by the observations that have the largest distance to the already existing clusters.
numberClusters | The number of initial clusters that will be created. |
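One way to read this strategy is as farthest-point seeding: start with the observation closest to the origin, then repeatedly pick the observation whose distance to its nearest already-chosen seed is largest. The sketch below follows that reading. It is an interpretation, not Ocean's code, and `Point`, `sqrDistanceBetween` and `seedLargestDistance` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::array<float, 2>; // hypothetical 2D observation

// Squared Euclidean distance between two points.
inline float sqrDistanceBetween(const Point& a, const Point& b)
{
    const float dx = a[0] - b[0];
    const float dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// Returns the indices of the points selected as initial cluster seeds.
std::vector<std::size_t> seedLargestDistance(const std::vector<Point>& points, const std::size_t numberClusters)
{
    std::vector<std::size_t> seeds;

    if (points.empty() || numberClusters == 0)
    {
        return seeds;
    }

    // The first seed is the point with the smallest distance to the origin.
    const Point origin{0.0f, 0.0f};
    std::size_t first = 0;

    for (std::size_t i = 1; i < points.size(); ++i)
    {
        if (sqrDistanceBetween(points[i], origin) < sqrDistanceBetween(points[first], origin))
        {
            first = i;
        }
    }

    seeds.push_back(first);

    // minDistances[i] holds the squared distance of point i to its nearest seed.
    std::vector<float> minDistances(points.size(), std::numeric_limits<float>::max());

    while (seeds.size() < numberClusters && seeds.size() < points.size())
    {
        const Point& latest = points[seeds.back()];
        std::size_t farthest = 0;

        for (std::size_t i = 0; i < points.size(); ++i)
        {
            const float distance = sqrDistanceBetween(points[i], latest);

            if (distance < minDistances[i])
            {
                minDistances[i] = distance;
            }

            if (minDistances[i] > minDistances[farthest])
            {
                farthest = i;
            }
        }

        seeds.push_back(farthest);
    }

    return seeds;
}
```

Keeping a running minimum per point means each new seed costs only one pass over the observations.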
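void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineInitialClustersRandom | ( | const size_t | numberClusters | ) |
protected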
Determines the initial clusters for this object with the IS_RANDOM strategy.
All clusters are created randomly.
numberClusters | The number of initial clusters that will be created. |
size_t Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::findCluster | ( | const Observation & | observation | ) |
Finds a best matching cluster for a given independent observation.
The observation is not added to the cluster; this function is purely a lookup of the best matching cluster.
observation | The observation for which the best matching cluster is determined |
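A lookup of this kind usually amounts to a nearest-mean search. The sketch below illustrates it with template parameters that mirror T, tDimension and TSquareDistance from this page. The function `findNearestMean` itself is hypothetical, and returning `size_t(-1)` for an empty set of means is an assumption of this sketch.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

// Returns the index of the mean with the smallest squared distance to the
// observation, or size_t(-1) if there are no means. Nothing is modified.
template <typename TSquareDistance, typename T, std::size_t tDimension>
std::size_t findNearestMean(const std::vector<std::array<T, tDimension>>& means, const std::array<T, tDimension>& observation)
{
    std::size_t bestIndex = std::size_t(-1);
    TSquareDistance bestDistance = std::numeric_limits<TSquareDistance>::max();

    for (std::size_t n = 0; n < means.size(); ++n)
    {
        TSquareDistance distance = TSquareDistance(0);

        for (std::size_t d = 0; d < tDimension; ++d)
        {
            // Convert before subtracting so that unsigned T cannot underflow.
            const TSquareDistance difference = TSquareDistance(observation[d]) - TSquareDistance(means[n][d]);
            distance += difference * difference;
        }

        if (distance < bestDistance)
        {
            bestDistance = distance;
            bestIndex = n;
        }
    }

    return bestIndex;
}
```

A call such as `findNearestMean<float>(means, observation)` names only the distance type; the element type and the dimension are deduced from the arguments.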
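bool Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::isValid | ( | ) | const
inline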
Returns whether this object holds a valid set of observations.
TSquareDistance Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::maximalSqrDistance | ( | ) | const |
Calculates the maximal square distance between the mean observation value of each cluster and all observations belonging to that cluster.
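Read as a single value over all clusters, this quantity is the largest squared distance between any observation and the mean of the cluster it belongs to. The following minimal sketch assumes that reading. `Point`, `SketchCluster` and `maximalSqrDistanceOf` are hypothetical names, with each cluster storing a mean and the indices of its observations, as the Cluster class is described on this page.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Point = std::array<float, 2>; // hypothetical 2D observation

// Hypothetical stand-in for a cluster: a mean plus the indices of its members.
struct SketchCluster
{
    Point mean;
    std::vector<std::size_t> indices;
};

// Largest squared distance between any member and the mean of its cluster.
float maximalSqrDistanceOf(const std::vector<Point>& points, const std::vector<SketchCluster>& clusters)
{
    float maximum = 0.0f;

    for (const SketchCluster& cluster : clusters)
    {
        for (const std::size_t index : cluster.indices)
        {
            const float dx = points[index][0] - cluster.mean[0];
            const float dy = points[index][1] - cluster.mean[1];
            const float distance = dx * dx + dy * dy;

            if (distance > maximum)
            {
                maximum = distance;
            }
        }
    }

    return maximum;
}
```

A value like this can serve as the stopping criterion when clusters are added until a distance threshold is met.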
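Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::operator bool | ( | ) | const
inline explicit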
Returns whether this object holds a valid set of observations.
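ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices > & Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::operator= | ( | ClusteringKMeans && | clustering | ) |
inline
Move assignment operator.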
clustering | The clustering object to be moved |
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::removeCluster | ( | const size_t | iterations = 5 , |
Worker * | worker = nullptr |
||
) |
Removes one cluster from this object.
The cluster with the smallest maximal distance between its observations and its mean observation value is removed.
iterations | The number of optimization iterations that are applied after the cluster has been removed, with range [1, infinity) |
worker | Optional worker object to distribute the computation |
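static DataIndex Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::smallestObservation | ( | const Data & | data | ) |
inline static protected
Determines the smallest observation (Euclidean distance to the origin) from a set of observations.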
data | The observation data in which the smallest observation is determined, must be valid |
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::sortClusters | ( | ) |
Sorts the clusters by their number of elements.
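static TSquareDistance Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::sqrDistance | ( | const Observation & | observation | ) |
inline static protected
Returns the square distance between an observation and the origin.
observation | The observation for which the square distance is determined |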
Clusters Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::clusters_ |
protected
The current clusters of this object.
Data Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::data_ |
protected
The data that stores the observations of this clustering object, either with index-access or pointer-access.