A student asked how to define initial cluster centres in SPSS K-means clustering, and it proved surprisingly hard to find a reference to this online. It turns out to be very easy, but I’m posting it here to save everyone else the trouble of working it out from scratch.
SPSS offers both Hierarchical clustering and K-means clustering. K-means clustering is often used to ‘fine tune’ the results of Hierarchical clustering, taking the cluster solution from Hierarchical clustering as its starting point.
The easiest way to set this up is to read the cluster centres in from an external SPSS data file. The problem is finding out how this data file should be formatted.
The answer is that SPSS requires one row of data for each cluster, and one column of cluster means for each variable. The first column must be called CLUSTER_ and simply holds the cluster number for each row. So a two-cluster solution with five variables needs two rows and six columns: CLUSTER_ followed by the five columns of variable means.
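As a rough illustration of that layout, here is a small Python sketch that builds the rows for a two-cluster, five-variable file and prints them as CSV. The variable names (var1 to var5) and all of the means are made up for the example, and it assumes you would then import the CSV into SPSS and save it as a data file. It is only one way of preparing the file, not something SPSS requires.

```python
import csv
import io

# Hypothetical variable names and cluster means, for illustration only.
VARIABLES = ["var1", "var2", "var3", "var4", "var5"]
CENTRES = {
    1: [2.0, 3.5, 1.0, 4.0, 2.5],
    2: [4.5, 1.5, 3.0, 2.0, 5.0],
}


def centre_rows(centres, variables):
    """Return a header row plus one row per cluster, with CLUSTER_ first."""
    rows = [["CLUSTER_"] + list(variables)]
    for cluster in sorted(centres):
        means = centres[cluster]
        # Each cluster needs exactly one mean per variable.
        if len(means) != len(variables):
            raise ValueError(f"cluster {cluster} needs {len(variables)} means")
        rows.append([cluster] + list(means))
    return rows


def to_csv_text(rows):
    """Serialise the rows as CSV text (assumed to be imported into SPSS later)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = centre_rows(CENTRES, VARIABLES)
print(to_csv_text(rows))
```

The number of data rows (here two) is the value that ‘Number of Clusters’ has to match in the K-means dialog.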
The K-means clustering procedure can then be pointed to this file by ticking the Cluster Centers ‘Read initial’ option and telling SPSS where the ‘External data file’ is saved. Note that the ‘Number of Clusters’ also has to be set to the same number as defined in the data file.
See Jane Clatworthy’s paper here for further details on different clustering methods.