Non-probabilistic variant of Expectation Maximization: each data point has hard (all-or-nothing) cluster membership rather than soft membership probabilities. Given data $D = \{x_i\}$, $i = 1 \dots N$, and the number of clusters $K$:

- Initialize

  - Randomly select K data points and use them as the initial "means", or
  - Randomly assign each data point to one of the K clusters, then compute the mean of each cluster
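Both initialization strategies above can be sketched in a few lines of NumPy. This is a minimal illustration, not a canonical implementation; the function names are placeholders (`init_forgy` is the common name for the "pick K data points" method):

```python
import numpy as np

def init_forgy(X, K, rng):
    """Pick K distinct data points and use them as the initial means."""
    idx = rng.choice(len(X), size=K, replace=False)
    return X[idx].copy()

def init_random_partition(X, K, rng):
    """Randomly assign each point to one of K clusters, then take each
    cluster's mean as its initial center. (Note: a cluster could come up
    empty for tiny datasets, which this sketch does not guard against.)"""
    labels = rng.integers(0, K, size=len(X))
    return np.array([X[labels == k].mean(axis=0) for k in range(K)])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))   # toy data: 100 points in 2-D
means = init_forgy(X, 3, rng)   # shape (3, 2)
```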

- Update (repeat until converged)
  - Assign each data point to its closest mean (or center) using Euclidean distance
  - Recompute the K means from their member data points
  - Check for convergence
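One update iteration (assignment followed by mean recomputation) can be sketched as below, assuming points and means are NumPy arrays of shape `(N, d)` and `(K, d)`; `kmeans_step` is a hypothetical helper name:

```python
import numpy as np

def kmeans_step(X, means):
    """One K-means update: assign each point to its nearest mean
    (Euclidean distance), then recompute each mean from its members."""
    # Pairwise squared distances, shape (N, K), via broadcasting.
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # Keep a mean unchanged if its cluster happens to be empty.
    new_means = np.array([
        X[labels == k].mean(axis=0) if np.any(labels == k) else means[k]
        for k in range(len(means))
    ])
    return labels, new_means

X = np.array([[0., 0.], [0., 1.], [10., 10.], [10., 11.]])
means = np.array([[0., 0.], [10., 10.]])
labels, new_means = kmeans_step(X, means)
```

Squared distances suffice for the assignment step, since the square root does not change which mean is closest.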

- Convergence

  - When the sum of squared errors (distances from each point to its assigned mean) stops decreasing
  - When no data point changes cluster membership between iterations
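Putting the pieces together, here is a minimal sketch of the full loop with both convergence tests (membership unchanged, or sum-of-squared-errors no longer decreasing). The `init` parameter and `tol` threshold are illustrative choices, not part of the algorithm's definition:

```python
import numpy as np

def kmeans(X, K, init=None, max_iter=100, tol=1e-8, seed=0):
    """Hard-assignment K-means. Stops when memberships stop changing
    or the sum of squared errors (SSE) stops decreasing (within tol)."""
    rng = np.random.default_rng(seed)
    means = (X[rng.choice(len(X), K, replace=False)].copy()
             if init is None else init.copy())
    labels = np.full(len(X), -1)
    prev_sse = np.inf
    for _ in range(max_iter):
        # Assignment step: nearest mean by (squared) Euclidean distance.
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        sse = d2[np.arange(len(X)), new_labels].sum()
        # Convergence: no membership change, or SSE stopped decreasing.
        if np.array_equal(new_labels, labels) or prev_sse - sse < tol:
            labels = new_labels
            break
        labels, prev_sse = new_labels, sse
        # Update step: recompute each mean (keep it if its cluster is empty).
        means = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else means[k]
            for k in range(K)
        ])
    return labels, means

# Toy usage: two well-separated blobs, seeded with one point from each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, means = kmeans(X, 2, init=X[[0, 50]])
```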