A student asked how to define initial cluster centres in SPSS K-means clustering, and it proved surprisingly hard to find a reference to this online. It turns out to be very easy, but I’m posting it here to save everyone else the trouble of working it out from scratch.

SPSS offers both Hierarchical and K-means clustering. K-means clustering is often used to ‘fine tune’ the results of Hierarchical clustering, using the cluster centres from the Hierarchical solution as its starting points.

The easiest way to set this up is to read the initial cluster centres from an external SPSS data file. The problem is finding out how this data file should be formatted.

The answer is that SPSS requires one row of data for each cluster and one column of cluster means for each variable. The first column must be called CLUSTER_ and simply holds the cluster number for each row. So, for a two-cluster solution with five variables, the file should look like this:

| CLUSTER_ | Var_A | Var_B | Var_C | Var_D | Var_E |
|---|---|---|---|---|---|
| 1 | 2.99 | 3.00 | 2.99 | 2.83 | 2.87 |
| 2 | 2.15 | 2.72 | 2.13 | 1.87 | 2.52 |
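The centres in this table are simply the mean of each clustering variable within each cluster of the hierarchical solution. If you have the hierarchical cluster memberships to hand, a minimal Python sketch like the one below can build the table in this layout. The function names `cluster_centres` and `write_centres_csv` and the toy data are hypothetical, and the output is a CSV file, not an SPSS data file: you would still need to open it in SPSS and save it as a .sav file before pointing K-means at it.

```python
import csv
import io
from collections import defaultdict


def cluster_centres(rows, labels, var_names):
    """Return one row per cluster: [cluster number, mean of each variable].

    rows      -- observations, each a list of numbers in var_names order
    labels    -- hierarchical cluster membership of each observation (1, 2, ...)
    var_names -- names of the clustering variables (hypothetical here)
    """
    sums = defaultdict(lambda: [0.0] * len(var_names))
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        counts[label] += 1
        for j, value in enumerate(row):
            sums[label][j] += value
    # Divide each running sum by the cluster size to get the cluster means.
    return [[label] + [s / counts[label] for s in sums[label]]
            for label in sorted(counts)]


def write_centres_csv(centres, var_names, out):
    """Write the centres with CLUSTER_ first, then one column per variable."""
    writer = csv.writer(out)
    writer.writerow(["CLUSTER_"] + list(var_names))
    for centre in centres:
        writer.writerow(centre)


if __name__ == "__main__":
    names = ["Var_A", "Var_B"]                               # hypothetical variables
    data = [[3.0, 2.0], [2.0, 4.0], [1.0, 1.0], [3.0, 3.0]]  # toy data
    membership = [1, 1, 2, 2]                                # toy hierarchical solution
    buf = io.StringIO()
    write_centres_csv(cluster_centres(data, membership, names), names, buf)
    print(buf.getvalue())
```

Keeping full precision in the file (rather than rounding to two decimals as in the table) avoids nudging the starting centres away from the hierarchical solution.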

The K-means clustering procedure can then be pointed to this file by ticking the Cluster Centers ‘Read initial’ option and telling SPSS where the ‘External data file’ is saved. Note that ‘Number of Clusters’ must also be set to the number of clusters defined in the data file.
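To illustrate what the K-means step does with the file, here is a minimal pure-Python sketch of batch K-means (Lloyd's algorithm) seeded with initial centres read from a CSV laid out like the table above. This is not SPSS's implementation: SPSS has its own iteration and convergence settings, so its results may differ in detail. The function names, the CSV input and the toy data are all hypothetical. The number of clusters is taken from the number of rows in the file, which is the same constraint as setting ‘Number of Clusters’ to match.

```python
import csv
import io
import math


def read_centres_csv(text):
    """Parse a CLUSTER_-first table; return (variable names, centres ordered by cluster)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if header[0] != "CLUSTER_":
        raise ValueError("first column must be CLUSTER_")
    rows = sorted((int(r[0]), [float(v) for v in r[1:]]) for r in reader if r)
    return header[1:], [centre for _, centre in rows]


def nearest(row, centres):
    """Index of the centre closest to row by Euclidean distance."""
    return min(range(len(centres)), key=lambda j: math.dist(row, centres[j]))


def kmeans_from_centres(rows, initial_centres, max_iter=10, tol=0.0):
    """Batch K-means starting from the given centres; k = number of centres.

    Returns (labels, centres); labels are 1-based, like CLUSTER_.
    """
    centres = [list(c) for c in initial_centres]
    for _ in range(max_iter):
        # Assignment step: each case joins its nearest centre.
        labels = [nearest(row, centres) for row in rows]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for j, old in enumerate(centres):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(old)  # leave an empty cluster's centre where it was
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift <= tol:
            break
    # Final assignment so the labels match the returned centres.
    return [nearest(row, centres) + 1 for row in rows], centres


if __name__ == "__main__":
    names, seeds = read_centres_csv("CLUSTER_,Var_A,Var_B\n1,2.0,2.0\n2,7.0,7.0\n")
    data = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]]  # toy data
    print(kmeans_from_centres(data, seeds))
```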

See Jane Clatworthy’s paper here for further details on different clustering methods.


I am so grateful for this explanation – you have no idea! It has been impossible to find this information anywhere else. I still have a query, though, regarding the initial cluster centers. Are these the mean values of the clustering variables for each cluster obtained in the hierarchical cluster analysis? I’m using Ward’s method and squared Euclidean distance. Thank you in advance if you have time to respond.

Sorry not to have got back to you sooner – I don’t visit the blog that often. The values are, as you say, the mean values for the variables for each cluster in the HCA. You probably worked this out already!

I did, but thank you anyway. Kind regards.

Thank you very much for posting this!