PG-means: learning the number of clusters in data.
This thesis presents PG-means, a novel algorithm that automatically determines the number of clusters in a classical Gaussian mixture model. PG-means applies efficient statistical hypothesis tests to one-dimensional projections of the data and of the model to decide whether the examples are well represented by the model. In doing so, it tests the entire model at once rather than one cluster at a time. We show that this method works well in difficult cases such as overlapping clusters, eccentric clusters and high-dimensional clusters. PG-means also works well on non-Gaussian clusters and on data with many true clusters. Furthermore, it provides a much more stable estimate of the number of clusters than existing methods.
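The abstract describes the projection test only in outline. The sketch below illustrates the idea under stated assumptions. It projects both the data and a fitted Gaussian mixture onto random unit directions (a component N(mu, Sigma) projects onto a unit vector p as N(p.mu, p^T Sigma p)). It then compares each projected sample with the projected mixture's CDF using a Kolmogorov-Smirnov (KS) test. The KS test, the number of projections, the Bonferroni correction and the asymptotic critical value are all assumptions made here, not details taken from the thesis. The helper names (`project_model`, `model_fits`, etc.) are hypothetical. Fitting the mixture (for example with EM) and searching over the number of clusters are omitted.

```python
import math
import random


def mixture_cdf_1d(x, weights, means, variances):
    """CDF of a one-dimensional Gaussian mixture evaluated at x."""
    total = 0.0
    for w, m, v in zip(weights, means, variances):
        total += w * 0.5 * (1.0 + math.erf((x - m) / math.sqrt(2.0 * v)))
    return total


def project_model(weights, means, covs, p):
    """Project a d-dimensional Gaussian mixture onto the unit vector p.

    Each component N(mu, Sigma) becomes N(p . mu, p^T Sigma p);
    the mixing weights are unchanged.
    """
    d = len(p)
    proj_means = [sum(p[i] * mu[i] for i in range(d)) for mu in means]
    proj_vars = [
        sum(p[i] * cov[i][j] * p[j] for i in range(d) for j in range(d))
        for cov in covs
    ]
    return list(weights), proj_means, proj_vars


def ks_statistic(samples, cdf):
    """KS distance between the empirical CDF of samples and a model cdf."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def random_unit_vector(dim, rng):
    """Uniformly random direction: normalize a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def model_fits(data, weights, means, covs, n_projections=12, alpha=0.05, seed=0):
    """Hypothetical helper: False if any 1-D projection rejects the whole mixture.

    Assumption: asymptotic KS critical value sqrt(-ln(a/2)/2)/sqrt(n),
    with a Bonferroni-corrected level a = alpha / n_projections.
    """
    rng = random.Random(seed)
    n = len(data)
    a = alpha / n_projections
    critical = math.sqrt(-0.5 * math.log(a / 2.0)) / math.sqrt(n)
    for _ in range(n_projections):
        p = random_unit_vector(len(data[0]), rng)
        w, m, v = project_model(weights, means, covs, p)
        projected = [sum(pi * xi for pi, xi in zip(p, x)) for x in data]
        d = ks_statistic(projected, lambda t: mixture_cdf_1d(t, w, m, v))
        if d > critical:
            return False  # this projection reveals a mismatch
    return True


# Example: two well-separated clusters along the first axis.
rng = random.Random(1)
data = [[rng.gauss(-5.0 if rng.random() < 0.5 else 5.0, 1.0), rng.gauss(0.0, 1.0)]
        for _ in range(1000)]

# A single Gaussian with the same overall mean and variance cannot reproduce two modes.
one_cluster = ([1.0], [[0.0, 0.0]], [[[26.0, 0.0], [0.0, 1.0]]])
two_clusters = ([0.5, 0.5], [[-5.0, 0.0], [5.0, 0.0]], [[[1.0, 0.0], [0.0, 1.0]]] * 2)
print(model_fits(data, *one_cluster))
print(model_fits(data, *two_clusters))
```

Because each projected sample is compared against the projection of the whole mixture, one test covers all components together. This matches the abstract's point about testing the entire model rather than each cluster separately.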