This is C source code for a simple implementation of the popular k-means clustering algorithm. It is based on MATLAB's implementation, which was in turn based on G. A. F. Seber, Multivariate Observations, 1984, and H. Späth, Cluster Dissection and Analysis: Theory, FORTRAN Programs, Examples.
The algorithm runs in two phases. The first phase repeats a "batch update": every point is assigned to its nearest centroid, and then all centroids are recomputed at once. The second phase repeats a "point-by-point" (or "online") update, in which individual points are moved to another cluster whenever the move lowers the total within-cluster sum of distances. The online phase does not seem to be working, although that may just be a consequence of the particular datasets I have been working with. It is currently commented out; I welcome feedback on this, especially from anyone who manages to fix it.
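The two phases can be sketched as follows. This is a minimal illustration, not the code in this repository: it assumes 2D points stored as a flat array of (x, y) pairs, squared Euclidean distance, and initial centroids supplied by the caller. The names `batch_step`, `online_pass`, `kmeans_two_phase`, `DIM` and `SKETCH_MAX_K` are hypothetical. The online step uses the standard single-point-move criterion for squared Euclidean distance, so it may also serve as a reference point when debugging the commented-out phase.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Illustrative constants for this sketch only. */
#define DIM 2            /* points are (x, y) pairs */
#define SKETCH_MAX_K 16  /* upper bound on k for the fixed-size arrays */

/* Squared Euclidean distance between two DIM-dimensional points. */
static double sqdist(const double *a, const double *b) {
    double s = 0.0;
    for (int d = 0; d < DIM; d++) {
        double t = a[d] - b[d];
        s += t * t;
    }
    return s;
}

/* Batch update: assign every point to its nearest centroid, then replace
   each centroid by the mean of its members (an empty cluster keeps its old
   centroid). Returns the number of labels that changed. */
static int batch_step(const double *x, int n, double *c, int k, int *label) {
    double sum[SKETCH_MAX_K][DIM] = {{0.0}};
    int cnt[SKETCH_MAX_K] = {0};
    int changed = 0;
    assert(k > 0 && k <= SKETCH_MAX_K);
    for (int i = 0; i < n; i++) {
        const double *xi = x + (size_t)i * DIM;
        int best = 0;
        double bestdist = DBL_MAX;
        for (int j = 0; j < k; j++) {
            double dist = sqdist(xi, c + j * DIM);
            if (dist < bestdist) { bestdist = dist; best = j; }
        }
        if (label[i] != best) { label[i] = best; changed++; }
        for (int d = 0; d < DIM; d++) sum[best][d] += xi[d];
        cnt[best]++;
    }
    for (int j = 0; j < k; j++)
        if (cnt[j] > 0)
            for (int d = 0; d < DIM; d++) c[j * DIM + d] = sum[j][d] / cnt[j];
    return changed;
}

/* Online update: move a point from cluster a to the cluster b that most
   reduces the within-cluster sum of squares. For squared Euclidean distance
   the change is n_b/(n_b+1)*|x-c_b|^2 - n_a/(n_a-1)*|x-c_a|^2; both
   centroids are then updated incrementally. Labels must already be valid.
   Returns the number of points moved. */
static int online_pass(const double *x, int n, double *c, int k, int *label) {
    int cnt[SKETCH_MAX_K] = {0};
    int moved = 0;
    assert(k > 0 && k <= SKETCH_MAX_K);
    for (int i = 0; i < n; i++) cnt[label[i]]++;
    for (int i = 0; i < n; i++) {
        const double *xi = x + (size_t)i * DIM;
        int a = label[i];
        if (cnt[a] <= 1) continue;  /* never empty a cluster */
        double na = cnt[a];
        double removal = na / (na - 1.0) * sqdist(xi, c + a * DIM);
        int best = a;
        double bestdelta = 0.0;     /* only strictly improving moves */
        for (int b = 0; b < k; b++) {
            if (b == a) continue;
            double nb = cnt[b];
            double delta = nb / (nb + 1.0) * sqdist(xi, c + b * DIM) - removal;
            if (delta < bestdelta) { bestdelta = delta; best = b; }
        }
        if (best == a) continue;
        double nb = cnt[best];
        for (int d = 0; d < DIM; d++) {
            c[a * DIM + d] = (na * c[a * DIM + d] - xi[d]) / (na - 1.0);
            c[best * DIM + d] = (nb * c[best * DIM + d] + xi[d]) / (nb + 1.0);
        }
        cnt[a]--;
        cnt[best]++;
        label[i] = best;
        moved++;
    }
    return moved;
}

/* Two-phase driver: batch steps until no label changes, then online passes
   until no point moves; each phase runs at least once and at most max_iter
   times. c must hold k initial centroids on entry. */
static void kmeans_two_phase(const double *x, int n, double *c, int k,
                             int *label, int max_iter) {
    int it = 0;
    for (int i = 0; i < n; i++) label[i] = -1;
    while (batch_step(x, n, c, k, label) > 0 && ++it < max_iter) {}
    it = 0;
    while (online_pass(x, n, c, k, label) > 0 && ++it < max_iter) {}
}
```

A call such as `kmeans_two_phase(x, n, c, k, label, max_iter)` leaves the final centroids in `c` and the cluster indices in `label`. Keeping the online phase for the end means the cheap batch steps do most of the work, while each online move is accepted only if it strictly lowers the objective.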
So far, this code has been tested on a 2D dataset of tens of millions of points grouped into fewer than 10 clusters. Note that the maximum number of clusters and the maximum number of iterations are hard-coded with #define; you may need to change them for your application.
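If you need different limits, one option is to guard the macros with #ifndef so they can be overridden from the compiler command line instead of by editing the source. The sketch below uses hypothetical macro names (`KMEANS_MAX_CLUSTERS`, `KMEANS_MAX_ITER`), placeholder default values, a placeholder file name `kmeans.c`, and a hypothetical helper `check_k`; check the source for the actual names and values.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical macro names and placeholder defaults; the real source may
   use different ones. The #ifndef guards allow overriding at build time:
     cc -DKMEANS_MAX_CLUSTERS=32 -DKMEANS_MAX_ITER=500 kmeans.c        */
#ifndef KMEANS_MAX_CLUSTERS
#define KMEANS_MAX_CLUSTERS 10
#endif
#ifndef KMEANS_MAX_ITER
#define KMEANS_MAX_ITER 100
#endif

/* Catch nonsensical overrides at compile time (C11 static_assert). */
static_assert(KMEANS_MAX_CLUSTERS >= 1, "KMEANS_MAX_CLUSTERS must be >= 1");
static_assert(KMEANS_MAX_ITER >= 1, "KMEANS_MAX_ITER must be >= 1");

/* Reject a requested k beyond the compiled-in limit instead of overrunning
   fixed-size per-cluster arrays. Returns 1 if k is usable, 0 otherwise. */
static int check_k(int k) {
    if (k < 1 || k > KMEANS_MAX_CLUSTERS) {
        fprintf(stderr, "k=%d is outside [1, %d]; rebuild with a larger "
                        "KMEANS_MAX_CLUSTERS\n", k, KMEANS_MAX_CLUSTERS);
        return 0;
    }
    return 1;
}
```

Checking k at run time against the compiled-in limit turns a silent buffer overrun into a clear error message when the limit is too small for a given dataset.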
Test cases coming soon.