| ClusteringKMeans () |
| Creates an empty k-means object.
|
| ClusteringKMeans (ClusteringKMeans &&clustering) noexcept |
| Move constructor.
|
| ClusteringKMeans (const Data &data) |
| Creates a new k-means object from a given data object.
|
| ClusteringKMeans (Data &&data) |
| Creates a new k-means object from a given data object.
|
const Clusters & | clusters () const |
| Returns the clusters of this k-means clustering object.
|
void | sortClusters () |
| Sorts the clusters by their number of elements.
|
TSquareDistance | maximalSqrDistance () const |
| Calculates the maximal square distance between the mean observation value of each cluster and all observations belonging to that cluster.
|
void | determineClustersByNumber (const size_t numberClusters, const InitializationStrategy strategy=IS_LARGEST_DISTANCE, const size_t iterations=5, Worker *worker=nullptr) |
| Determines the clusters for this object; ensure that this object has been initialized with a valid set of observations.
|
void | determineClustersByDistance (const TSquareDistance maximalSqrDistance, size_t maximalClusters=0, const size_t iterations=5, Worker *worker=nullptr) |
| Determines the clusters for this object; ensure that this object has been initialized with a valid set of observations.
|
bool | addCluster (const size_t iterations=5, TSquareDistance sqrDistance=TSquareDistance(0), Worker *worker=nullptr) |
| Adds a new cluster to this object.
|
void | removeCluster (const size_t iterations=5, Worker *worker=nullptr) |
| Removes one cluster from this object.
|
size_t | findCluster (const Observation &observation) |
| Finds the best matching cluster for a given independent observation.
|
void | applyOptimizationIteration () |
| Explicitly applies one further optimization iteration for an existing set of clusters.
|
void | applyOptimizationIteration (Worker *worker) |
| Explicitly applies one further optimization iteration for an existing set of clusters.
|
void | clear () |
| Clears all determined clusters but leaves the registered data information untouched.
|
bool | isValid () const |
| Returns whether this object holds a valid set of observations.
|
| operator bool () const |
| Returns whether this object holds a valid set of observations.
|
ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices > & | operator= (ClusteringKMeans &&clustering) |
| Move operator.
|
template<typename T, size_t tDimension, typename TSum = T, typename TSquareDistance = T, bool tUseIndices = true>
class Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >
This class implements a k-means clustering algorithm.
Beware: For performance reasons, this class does not copy the given observation values. The given observation values must therefore remain valid for as long as the k-means object exists.
- Template Parameters
-
T | The data type of each element of an observation |
tDimension | The dimension of each observation (the number of elements in each observation), with range [1, infinity) |
TSum | The data type of the intermediate sum values, which are necessary to determine, e.g., the mean parameters |
TSquareDistance | The data type of the square distance value, might be different from T |
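The role of a separate sum type can be illustrated with a small standalone sketch. It is not taken from this class: `meanValue` is a hypothetical helper that only mirrors the idea of accumulating elements of type `T` in a wider type `TSum` before dividing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Ocean): averages values of type T while
// accumulating the intermediate sum in TSum.
template <typename T, typename TSum = T>
T meanValue(const std::vector<T>& values)
{
    if (values.empty())
    {
        return T(0);
    }

    TSum sum = TSum(0);
    for (const T value : values)
    {
        // The cast makes a possible wrap-around of a narrow TSum explicit.
        sum = TSum(sum + TSum(value));
    }

    return T(sum / TSum(values.size()));
}
```

With 8-bit elements, a sum type of `unsigned int` keeps the intermediate sum exact, whereas reusing the 8-bit element type lets the sum wrap around and yields a wrong mean.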
template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::applyOptimizationIteration ( )
Explicitly applies one further optimization iteration for an existing set of clusters.
Do not call this function before initial clusters have been found.
- See also
- clusters(), determineClustersByNumber(), determineClustersByDistance().
template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::applyOptimizationIteration (Worker *worker)
Explicitly applies one further optimization iteration for an existing set of clusters.
Do not call this function before initial clusters have been found.
- Parameters
-
worker | The worker object to distribute the computation |
- See also
- clusters(), determineClustersByNumber(), determineClustersByDistance().
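For orientation, one optimization iteration of a textbook k-means can be sketched as follows. This is a standalone illustration under assumptions, not the implementation of this class: `Observation`, `sqrDistance`, and `optimizationIteration` are hypothetical names, observations are fixed to two dimensions, and clusters that receive no observation keep their previous mean.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

using Observation = std::array<double, 2>; // assumption: 2D observations

// Squared Euclidean distance between two observations.
double sqrDistance(const Observation& a, const Observation& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const double difference = a[i] - b[i];
        sum += difference * difference;
    }
    return sum;
}

// One iteration: assign each observation to its nearest mean, then update the means.
void optimizationIteration(const std::vector<Observation>& observations, std::vector<Observation>& means)
{
    if (means.empty())
    {
        return; // initial clusters must exist before iterating
    }

    std::vector<Observation> sums(means.size(), Observation{0.0, 0.0});
    std::vector<std::size_t> counts(means.size(), 0);

    for (const Observation& observation : observations)
    {
        std::size_t best = 0;
        double bestDistance = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < means.size(); ++c)
        {
            const double distance = sqrDistance(observation, means[c]);
            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = c;
            }
        }

        for (std::size_t i = 0; i < observation.size(); ++i)
        {
            sums[best][i] += observation[i];
        }
        ++counts[best];
    }

    for (std::size_t c = 0; c < means.size(); ++c)
    {
        if (counts[c] != 0) // empty clusters keep their previous mean
        {
            for (std::size_t i = 0; i < means[c].size(); ++i)
            {
                means[c][i] = sums[c][i] / double(counts[c]);
            }
        }
    }
}
```

The early return for an empty set of means reflects the precondition stated above: the iteration only refines clusters that already exist.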
template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::applyOptimizationIterationSubset (Lock *lock, const unsigned int firstObservation, const unsigned int numberObservations)
protected
Explicitly applies one further optimization iteration for an existing set of clusters.
This function operates on a subset of all observations.
- Parameters
-
lock | Optional lock object if this function is executed on multiple threads in parallel |
firstObservation | The first observation that will be handled |
numberObservations | The number of observations that will be handled |
- See also
- clusters(), determineClustersByNumber(), determineClustersByDistance().
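A common way to process a subset under an optional lock is to accumulate into local buffers first and merge into the shared state only while holding the lock. The sketch below shows that pattern under assumptions: it uses `std::mutex` in place of the `Lock` type, one-dimensional observations, and the hypothetical names `Accumulator` and `accumulateSubset`. Whether this class works the same way internally is not stated here.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical shared state: per-cluster sums and observation counts.
struct Accumulator
{
    std::vector<double> sums;
    std::vector<std::size_t> counts;
};

// Handles observations [firstObservation, firstObservation + numberObservations).
// lock may be nullptr if the function is not executed in parallel.
void accumulateSubset(const std::vector<double>& observations, const std::vector<double>& means, Accumulator& shared, std::mutex* lock, const std::size_t firstObservation, const std::size_t numberObservations)
{
    if (means.empty())
    {
        return;
    }

    // Thread-local accumulation, no synchronization needed.
    std::vector<double> localSums(means.size(), 0.0);
    std::vector<std::size_t> localCounts(means.size(), 0);

    for (std::size_t n = firstObservation; n < firstObservation + numberObservations; ++n)
    {
        std::size_t best = 0;
        for (std::size_t c = 1; c < means.size(); ++c)
        {
            const double current = (observations[n] - means[c]) * (observations[n] - means[c]);
            const double smallest = (observations[n] - means[best]) * (observations[n] - means[best]);
            if (current < smallest)
            {
                best = c;
            }
        }
        localSums[best] += observations[n];
        ++localCounts[best];
    }

    // Merge into the shared state, holding the lock only if one was provided.
    std::unique_lock<std::mutex> guard;
    if (lock != nullptr)
    {
        guard = std::unique_lock<std::mutex>(*lock);
    }

    for (std::size_t c = 0; c < means.size(); ++c)
    {
        shared.sums[c] += localSums[c];
        shared.counts[c] += localCounts[c];
    }
}
```

Keeping the locked region limited to the final merge means the per-observation work runs without contention.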
template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineClustersByDistance (const TSquareDistance maximalSqrDistance, size_t maximalClusters = 0, const size_t iterations = 5, Worker *worker = nullptr)
Determines the clusters for this object; ensure that this object has been initialized with a valid set of observations.
This function adds new clusters over several iterations until the defined maximalSqrDistance is larger than the square distance within all clusters, or until the defined maximal number of clusters is reached.
- Parameters
-
maximalSqrDistance | The maximal square distance in the final clusters between the clusters' mean observation values and the observations in the clusters |
maximalClusters | The maximal number of clusters that will be created (even if maximalSqrDistance is not reached), with range [0, infinity), define 0 to ignore this parameter |
iterations | The number of optimization iterations that are applied each time a new cluster has been added, with range [1, infinity) |
worker | Optional worker object to distribute the computation |
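The stopping rule described above can be sketched as a loop that keeps adding clusters until the largest squared distance no longer exceeds the threshold. This is a simplified standalone illustration, not this class's code: observations are one-dimensional, each new cluster is seeded at the currently worst-fitting observation (an assumption), and `nearestMean`, `iterateOnce`, and `clusterByDistance` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Index of the mean nearest to a 1D value; means must not be empty.
std::size_t nearestMean(const std::vector<double>& means, const double value)
{
    std::size_t best = 0;
    for (std::size_t c = 1; c < means.size(); ++c)
    {
        const double current = (value - means[c]) * (value - means[c]);
        const double smallest = (value - means[best]) * (value - means[best]);
        if (current < smallest)
        {
            best = c;
        }
    }
    return best;
}

// One assignment and update step; empty clusters keep their previous mean.
void iterateOnce(const std::vector<double>& observations, std::vector<double>& means)
{
    std::vector<double> sums(means.size(), 0.0);
    std::vector<std::size_t> counts(means.size(), 0);
    for (const double observation : observations)
    {
        const std::size_t c = nearestMean(means, observation);
        sums[c] += observation;
        ++counts[c];
    }
    for (std::size_t c = 0; c < means.size(); ++c)
    {
        if (counts[c] != 0)
        {
            means[c] = sums[c] / double(counts[c]);
        }
    }
}

// Returns the cluster means; maximalClusters == 0 disables the cluster limit.
std::vector<double> clusterByDistance(const std::vector<double>& observations, const double maximalSqrDistance, const std::size_t maximalClusters = 0, const std::size_t iterations = 5)
{
    if (observations.empty())
    {
        return std::vector<double>();
    }

    // Start with one cluster holding the mean of all observations.
    double total = 0.0;
    for (const double observation : observations)
    {
        total += observation;
    }
    std::vector<double> means(1, total / double(observations.size()));

    while (means.size() < observations.size())
    {
        // Find the observation with the largest squared distance to its mean.
        double worstDistance = 0.0;
        double worstObservation = observations.front();
        for (const double observation : observations)
        {
            const double mean = means[nearestMean(means, observation)];
            const double distance = (observation - mean) * (observation - mean);
            if (distance > worstDistance)
            {
                worstDistance = distance;
                worstObservation = observation;
            }
        }

        if (worstDistance <= maximalSqrDistance)
        {
            break; // every cluster satisfies the distance threshold
        }
        if (maximalClusters != 0 && means.size() >= maximalClusters)
        {
            break; // cluster limit reached before the threshold was met
        }

        means.push_back(worstObservation);
        for (std::size_t i = 0; i < iterations; ++i)
        {
            iterateOnce(observations, means);
        }
    }

    return means;
}
```

The extra bound on the loop (never more clusters than observations) is a safety measure of this sketch only.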
template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineInitialClustersLargestDistance (const size_t numberClusters)
protected
Determines the initial clusters for this object with the IS_LARGEST_DISTANCE strategy.
First, the smallest observation is selected as the first cluster.
All following clusters are then determined by the observations that have the largest distance to the already existing clusters.
- Parameters
-
numberClusters | The number of initial clusters that will be created. |
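A farthest-point seeding of this kind can be sketched as below. Two points are assumptions of the sketch rather than statements about this class: "smallest" is interpreted as lexicographically smallest, and "largest distance to the already existing clusters" as the largest distance to the nearest already chosen seed. `Observation`, `sqrDistance`, and `initialClustersLargestDistance` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

using Observation = std::array<double, 2>; // assumption: 2D observations

// Squared Euclidean distance between two observations.
double sqrDistance(const Observation& a, const Observation& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    }
    return sum;
}

// Returns the indices of the observations chosen as initial cluster seeds.
std::vector<std::size_t> initialClustersLargestDistance(const std::vector<Observation>& observations, std::size_t numberClusters)
{
    std::vector<std::size_t> seeds;
    if (observations.empty() || numberClusters == 0)
    {
        return seeds;
    }
    numberClusters = std::min(numberClusters, observations.size());

    // First seed: the lexicographically smallest observation (assumed meaning of "smallest").
    const std::size_t first = std::size_t(std::min_element(observations.begin(), observations.end()) - observations.begin());
    seeds.push_back(first);

    // minDistances[n]: squared distance from observation n to its nearest seed.
    std::vector<double> minDistances(observations.size());
    for (std::size_t n = 0; n < observations.size(); ++n)
    {
        minDistances[n] = sqrDistance(observations[n], observations[first]);
    }

    while (seeds.size() < numberClusters)
    {
        // Next seed: the observation farthest from all seeds chosen so far.
        const std::size_t next = std::size_t(std::max_element(minDistances.begin(), minDistances.end()) - minDistances.begin());
        seeds.push_back(next);

        for (std::size_t n = 0; n < observations.size(); ++n)
        {
            minDistances[n] = std::min(minDistances[n], sqrDistance(observations[n], observations[next]));
        }
    }

    return seeds;
}
```

If the input contains fewer distinct observations than requested clusters, this sketch may select duplicate seeds.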
template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::determineInitialClustersRandom (const size_t numberClusters)
protected
Determines the initial clusters for this object with the IS_RANDOM strategy.
All clusters are created randomly.
- Parameters
-
numberClusters | The number of initial clusters that will be created. |
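One common way to create random initial clusters is to pick distinct observations at random as seeds. The sketch below assumes exactly that, which the description above does not confirm. `initialClustersRandom` and its `seed` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns the indices of distinct, randomly chosen observations to be used as seeds.
std::vector<std::size_t> initialClustersRandom(const std::size_t numberObservations, const std::size_t numberClusters, const unsigned int seed)
{
    // Shuffle all observation indices and keep the first numberClusters of them.
    std::vector<std::size_t> indices(numberObservations);
    std::iota(indices.begin(), indices.end(), std::size_t(0));

    std::mt19937 generator(seed);
    std::shuffle(indices.begin(), indices.end(), generator);

    indices.resize(std::min(numberClusters, numberObservations));
    return indices;
}
```

Shuffling the index range guarantees that no observation is chosen twice, at the cost of touching every index once.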
template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
size_t Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::findCluster (const Observation &observation)
Finds a best matching cluster for a given independent observation.
The observation is not added to this cluster; the function is only a lookup for the best matching cluster.
- Parameters
-
observation | The observation for which the best matching cluster is determined |
- Returns
- The index of the best matching cluster, size_t(-1) if no cluster could be found
- See also
- clusters().
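A pure lookup of this kind amounts to a nearest-mean search. The following standalone sketch assumes that "best matching" means the smallest squared Euclidean distance to the cluster mean. `Observation` and `findBestCluster` are hypothetical names, and the invalid index is modeled as `std::size_t(-1)`.

```cpp
#include <array>
#include <cstddef>
#include <limits>
#include <vector>

using Observation = std::array<double, 2>; // assumption: 2D observations

// Returns the index of the mean nearest to the observation,
// or std::size_t(-1) if there are no clusters. Nothing is modified.
std::size_t findBestCluster(const std::vector<Observation>& means, const Observation& observation)
{
    std::size_t bestIndex = std::size_t(-1);
    double bestDistance = std::numeric_limits<double>::max();

    for (std::size_t c = 0; c < means.size(); ++c)
    {
        double distance = 0.0;
        for (std::size_t i = 0; i < observation.size(); ++i)
        {
            const double difference = observation[i] - means[c][i];
            distance += difference * difference;
        }

        if (distance < bestDistance)
        {
            bestDistance = distance;
            bestIndex = c;
        }
    }

    return bestIndex;
}
```

Because the return type is unsigned, callers should compare against `std::size_t(-1)` rather than test for a negative value.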
template<typename T , size_t tDimension, typename TSum , typename TSquareDistance , bool tUseIndices>
void Ocean::ClusteringKMeans< T, tDimension, TSum, TSquareDistance, tUseIndices >::removeCluster (const size_t iterations = 5, Worker *worker = nullptr)
Removes one cluster from this object.
The cluster with the smallest maximal distance between its observations and its mean observation value is removed.
- Parameters
-
iterations | The number of optimization iterations that are applied after the cluster has been removed, with range [1, infinity) |
worker | Optional worker object to distribute the computation |
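The selection rule stated above can be sketched as follows: compute, for each cluster, the maximal squared distance between its observations and its mean, then pick the cluster for which that value is smallest. The sketch is a simplified standalone illustration with one-dimensional observations. `clusterToRemove` is a hypothetical name, and the subsequent optimization iterations are omitted.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Returns the index of the cluster with the smallest maximal squared distance,
// or std::size_t(-1) if there are no clusters.
std::size_t clusterToRemove(const std::vector<double>& observations, const std::vector<double>& means)
{
    if (means.empty())
    {
        return std::size_t(-1);
    }

    // Maximal squared distance per cluster; clusters without observations keep 0.
    std::vector<double> maximalSqrDistances(means.size(), 0.0);

    for (const double observation : observations)
    {
        std::size_t best = 0;
        double bestDistance = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < means.size(); ++c)
        {
            const double distance = (observation - means[c]) * (observation - means[c]);
            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = c;
            }
        }

        if (bestDistance > maximalSqrDistances[best])
        {
            maximalSqrDistances[best] = bestDistance;
        }
    }

    std::size_t result = 0;
    for (std::size_t c = 1; c < means.size(); ++c)
    {
        if (maximalSqrDistances[c] < maximalSqrDistances[result])
        {
            result = c;
        }
    }
    return result;
}
```

In this sketch a cluster without any observations has a maximal distance of zero and is therefore the first candidate for removal.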