A talk I gave at KCL in 2006, obviously hoping that the visual allusion to both William Blake and Isaac Newton would rub off on my own reputation.
So, thanks to Math for the Masses I can now do equations using LaTeX, a typesetting language I have avoided learning to date, if only because so many people have told me that I simply must learn LaTeX.
To return to the problem at hand, why do we get a negative correlation between preoperative score & health gain?
The reason is that ‘health gain’ is a change score, calculated from the postoperative score minus preoperative score:
Health Gain = Postoperative Score – Preoperative Score
I haven’t worked out how to make LaTeX do ‘friendly’ equations, so we will have to simplify this to:
c = b – a
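The correlation between preoperative score (a) and health gain (c) is therefore the same as the correlation of a with b – a. Writing sd_a and sd_b for the standard deviations of a and b, and r_ab for the correlation between them, we can write the equation for this as follows:

$$ r_{ac} = \frac{r_{ab}\,sd_b - sd_a}{\sqrt{sd_a^2 + sd_b^2 - 2\,r_{ab}\,sd_a\,sd_b}} $$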
The important thing about this equation is the top part (the numerator). The standard deviations of pre- and postoperative scores are likely to be very similar, because the same people are filling in the same questionnaire on both occasions. Meanwhile r_ab, the correlation between pre- and postoperative scores, is likely to be positive but less than 1.0.
This means that r_ab × sd_b will almost always be smaller than sd_a, so the numerator will be negative, making the correlation between a and c negative.
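Where the formula comes from: a short derivation using standard covariance rules, with c = b – a. The last line shows the special case where the pre- and postoperative standard deviations are equal.

```latex
\begin{aligned}
\operatorname{cov}(a, c) &= \operatorname{cov}(a, b) - \operatorname{var}(a)
                          = r_{ab}\,sd_a\,sd_b - sd_a^2 \\
sd_c &= \sqrt{sd_a^2 + sd_b^2 - 2\,r_{ab}\,sd_a\,sd_b} \\
r_{ac} &= \frac{\operatorname{cov}(a, c)}{sd_a\,sd_c}
        = \frac{r_{ab}\,sd_b - sd_a}{\sqrt{sd_a^2 + sd_b^2 - 2\,r_{ab}\,sd_a\,sd_b}} \\
\text{if } sd_a = sd_b:\quad r_{ac} &= -\sqrt{\frac{1 - r_{ab}}{2}}
\end{aligned}
```

With equal standard deviations, the correlation is negative for any r_ab below 1.0.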
Note also that the correlation between pre- and postoperative scores (r_ab) will be smaller if the questionnaire is unreliable, which makes the numerator even more negative: the greater the measurement error, the greater the negative bias.
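The formula is easy to explore numerically. Below is a small Python sketch that evaluates it for a range of pre/post correlations, holding both standard deviations equal. The function name is my own, not part of any library.

```python
import math


def pre_change_correlation(r_ab: float, sd_a: float, sd_b: float) -> float:
    """Correlation between a (preoperative score) and c = b - a (health gain),
    computed only from the pre/post correlation r_ab and the two SDs."""
    numerator = r_ab * sd_b - sd_a
    denominator = math.sqrt(sd_a**2 + sd_b**2 - 2 * r_ab * sd_a * sd_b)
    return numerator / denominator


if __name__ == "__main__":
    # Same spread before and after surgery; only the pre/post correlation varies.
    for r_ab in (0.9, 0.8, 0.7, 0.6, 0.5):
        r_ac = pre_change_correlation(r_ab, sd_a=1.0, sd_b=1.0)
        print(f"r_ab = {r_ab:.1f} -> r_ac = {r_ac:+.3f}")
```

With equal standard deviations, the correlation runs from about -0.22 at r_ab = 0.9 to -0.50 at r_ab = 0.5. The less reliable the questionnaire, the stronger the spurious negative correlation. It can only turn positive if sd_b is large enough for r_ab × sd_b to exceed sd_a.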
Let’s imagine we have a Patient-Reported Outcome Measure (PROM) for surgical outcomes, and we give it to patients before and after surgery to see whether health status improves. In common with many questionnaire measures, our PROM has a reliability of 0.7.
Let’s also suppose that the surgery is equally effective for everyone, increasing health status by 3 points. In this example, that’s an effect size of d=0.36.
We calculate ‘health gain’ as the difference between pre- and postoperative scores:
Health Gain = Postoperative Score – Preoperative Score
We know this should be around 3 points for each person, give or take a bit due to measurement error.
What happens when we plot health gain against preoperative score?
The correlation between preoperative score and health gain is negative and significant (r = -0.46, p < 0.05). Can we conclude that surgery is more effective for patients with the lowest initial health status?
NO: we know that everyone improved by the same amount, 3 points, so the correlation should be zero.
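The scenario is easy to reproduce in a quick simulation. This is a minimal sketch, and all of its parameter choices are my own assumptions. Each observed score is a true score plus random error, with the error variance set so that reliability is 0.7. The observed SD is 3/0.36 ≈ 8.3 points, so that a 3-point gain matches d = 0.36. The baseline mean is an arbitrary 50, and the sample is a hypothetical 10,000 patients. Every patient's true status rises by exactly 3 points.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation, to avoid any third-party dependency."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=10_000, reliability=0.7, gain=3.0, effect_size=0.36, seed=1):
    """Return (preoperative scores, health gains) under a constant true gain."""
    rng = random.Random(seed)
    sd_observed = gain / effect_size              # ~8.33, so 3 points is d ~ 0.36
    sd_true = math.sqrt(reliability) * sd_observed
    sd_error = math.sqrt(1 - reliability) * sd_observed
    pre, gains = [], []
    for _ in range(n):
        true_status = rng.gauss(50.0, sd_true)    # arbitrary baseline mean of 50
        pre_score = true_status + rng.gauss(0.0, sd_error)
        # Surgery adds exactly `gain` points to everyone's true status.
        post_score = true_status + gain + rng.gauss(0.0, sd_error)
        pre.append(pre_score)
        gains.append(post_score - pre_score)
    return pre, gains


if __name__ == "__main__":
    pre, gains = simulate()
    print(f"mean health gain: {sum(gains) / len(gains):.2f}")
    print(f"r(pre, gain):     {pearson(pre, gains):+.2f}")
```

Under these assumptions, the covariance between preoperative score and health gain is minus the error variance. The correlation should therefore settle near -√((1 - 0.7)/2) ≈ -0.39 rather than zero, even though the true gain is identical for everyone. Smaller samples will scatter around that value.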
Why this happens is quite easy to explain but will have to wait until I work out how to do equations in WordPress.
UPDATE: just in case you were wondering whether this sort of thing happens in real life, this graph is taken from a BMJ article on PROMs (1), which concludes that “better preoperative health tends to be associated with smaller, not larger, health gains”.
1. J. Appleby, “Patient reported outcome measures: how are we feeling today?” BMJ 344 (January 11, 2012): d8191.
Patient-reported outcome measures (PROMs) are essentially questionnaire scales designed to measure the treatment outcomes that matter to patients. These are usually characterised as ‘health status’ or ‘health-related quality of life’, but really there’s no reason to limit the scope of PROMs to these domains: you could argue, for example, that the numerous measures of severity of depression (PHQ-9, BDI etc.) are PROMs in their own right.
The NHS Information Centre states that:
“The health status information collected from patients by way of PROMs questionnaires before and after an intervention provides an indication of the outcomes or quality of care delivered to NHS Patients”
A key issue in PROMs analysis is the reliability of the data: the degree of measurement error entailed in collecting data using a PROM. It’s tempting to dismiss concerns about reliability, because reliability is an index of random (unbiased) measurement error and, well, that just cancels out in large samples, doesn’t it? Well, not always. And especially not if you use ‘change scores’.
The natural thing to do with a set of before and after scores is to subtract the ‘before’ from the ‘after’ and call that a ‘change score’ or ‘health gain’:
change score = health gain = [post-treatment score] – [pre-treatment score]
There are two related problems when you do this with PROMs data:
- The reliability of the change score is usually much lower than the reliability of the pre- and post- scores
- The correlation between pre-score and change score is usually strongly biased in the negative direction
What this means in practice is:
- Change scores tend to have greater measurement error than the scores they were derived from
- They tend also to have moderate (but spurious) negative correlations with pre-scores
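The first problem can be quantified with the classical test theory formula for the reliability of a difference score. The formula assumes that measurement errors on the two occasions are uncorrelated with each other and with the true scores. Below is a minimal Python sketch; the function name is my own.

```python
def change_score_reliability(rel_a, rel_b, r_ab, sd_a=1.0, sd_b=1.0):
    """Classical test theory reliability of the change score c = b - a.

    rel_a, rel_b: reliabilities of the pre- and post-treatment scores
    r_ab:         observed correlation between pre- and post-treatment scores
    sd_a, sd_b:   standard deviations of the pre- and post-treatment scores
    """
    # True-score variance of the difference (errors assumed uncorrelated).
    true_var = rel_a * sd_a**2 + rel_b * sd_b**2 - 2 * r_ab * sd_a * sd_b
    # Total observed variance of the difference.
    total_var = sd_a**2 + sd_b**2 - 2 * r_ab * sd_a * sd_b
    return true_var / total_var


if __name__ == "__main__":
    for rel, r_ab in ((0.9, 0.5), (0.8, 0.6), (0.7, 0.7)):
        print(rel, r_ab, round(change_score_reliability(rel, rel, r_ab), 3))
```

For example, two scores that each have reliability 0.8 and correlate 0.6 give a change score with a reliability of only 0.5. Suppose everyone's true change is identical and the error variance is the same on both occasions. Then the pre/post correlation equals the reliability, and the change score's reliability falls to zero: all of its variance is measurement error.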
The second point is interesting because it can be misinterpreted as an interaction effect: the treatment appears most effective for patients with the lowest pre-treatment scores. But the apparent interaction is entirely spurious.
What’s also interesting is that the size of the (spurious) negative correlation between pre-treatment score and change score increases as the reliability of the PROM decreases: the greater the measurement error, the larger the (spurious) correlation, and the more likely it is to be significant.
So, to conclude: change scores are generally a bad idea for PROMs data, and the more unreliable the PROMs data, the worse the problem is.
What the researchers say:
“The sample size of 102 participants will be sufficient to detect an effect size of d=0.5 or greater with 80% power and significance of 0.05”
What the researchers mean:
“We have cut & pasted this from our last application as we were short of time. The effect size of d=0.5 was chosen because Cohen said something about it being a ‘medium’ effect size and it doesn’t really relate to the hypothesis of this study at all. Even if it did, 0.5 would probably be a point estimate, so really we should have used the lower bound of its confidence interval for the sample size calculation. This would have made the sample size very large, however, so we didn’t. Likewise, we chose 80% power and 0.05 significance because everyone else does. Hopefully no-one on the funding committee will notice the similarity of this sample size calculation to every other sample size calculation.”
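To see why using a lower bound would inflate the sample size, here is a rough sketch. The quoted calculation doesn't say what design it assumes, so this assumes a two-sided comparison of two equal-sized groups. It uses the usual normal approximation, n per group = 2(z₁₋α/₂ + z₁₋β)²/d². Exact t-test calculations give slightly larger numbers, for example 64 per group for d = 0.5.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided,
    two-sample comparison of means with standardised effect size d."""
    z = NormalDist()                  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d**2)


if __name__ == "__main__":
    for d in (0.5, 0.4, 0.3, 0.2):
        print(f"d = {d}: {n_per_group(d)} per group")
```

By this approximation, d = 0.5 needs 63 per group while d = 0.3 needs 175 per group. Because the required sample size scales with 1/d², modest reductions in the assumed effect size inflate it quickly.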
A student asked how to define initial cluster centres in SPSS K-means clustering and it proved surprisingly hard to find a reference to this online. It turns out to be very easy but I’m posting here to save everyone else the trouble of working it out from scratch.
SPSS offers both hierarchical and K-means clustering. K-means clustering is often used to ‘fine tune’ the results of hierarchical clustering, using the cluster centres from the hierarchical solution as its starting point.
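For readers unfamiliar with K-means, the ‘fine tuning’ step amounts to iterating from a given set of starting centres. Below is a minimal Python sketch of the standard (Lloyd) K-means iteration, seeded with user-supplied centres. It illustrates the general technique, not SPSS’s QUICK CLUSTER implementation, whose convergence criteria and options may differ.

```python
import math


def kmeans_from_centres(points, centres, max_iter=10):
    """Lloyd-style K-means: assign each point to its nearest centre,
    recompute each centre as the mean of its points, and repeat until
    the assignments stop changing (or max_iter is reached)."""
    centres = [list(c) for c in centres]
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centre
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


if __name__ == "__main__":
    points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    # Starting centres, e.g. taken from a hierarchical solution (illustrative values).
    labels, centres = kmeans_from_centres(points, [(1.0, 1.0), (8.0, 8.0)])
    print(labels, centres)
```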
The easiest way to set this up is to read the cluster centres in from an external SPSS datafile: the problem is finding out how this data file should be formatted.
The answer is that SPSS requires one row of data for each cluster and one column of cluster means for each variable. The first column must be called CLUSTER_ and simply holds the cluster number for each row. So for a two-cluster solution with five variables, the file has two rows and six columns: CLUSTER_ followed by the five variable means.
The K-means clustering procedure can then be pointed to this file by ticking the Cluster Centers ‘Read initial’ option and telling SPSS where the ‘External data file’ is saved. Note that the ‘Number of Clusters’ also has to be set to the same number as defined in the data file.
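Below is one way to build such a file, sketched in Python as an assumption rather than the method described above. It computes each cluster’s means from cluster memberships (for example, those saved from a hierarchical run) and writes them out with CLUSTER_ as the first column. The output is CSV, so it would still need opening in SPSS and saving as an SPSS data file. The variable names v1 to v5 are hypothetical. I’d expect them to need to match the clustering variables used in the K-means run, but check that against the SPSS documentation.

```python
import csv
import sys
from collections import defaultdict


def cluster_centres_table(rows, labels):
    """Return one row per cluster: the cluster number (for CLUSTER_),
    then the mean of each clustering variable within that cluster."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        counts[label] += 1
        for j, value in enumerate(row):
            sums[label][j] += value
    return [[label] + [s / counts[label] for s in sums[label]] for label in sorted(sums)]


def write_centres_csv(handle, table, variables):
    """Write the table to an open file-like object, CLUSTER_ column first."""
    writer = csv.writer(handle)
    writer.writerow(["CLUSTER_"] + list(variables))
    writer.writerows(table)


if __name__ == "__main__":
    variables = ["v1", "v2", "v3", "v4", "v5"]   # hypothetical variable names
    data = [[1, 2, 3, 4, 5], [3, 4, 5, 6, 7], [10, 9, 8, 7, 6], [12, 11, 10, 9, 8]]
    labels = [1, 1, 2, 2]                        # hypothetical cluster memberships
    write_centres_csv(sys.stdout, cluster_centres_table(data, labels), variables)
```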
See Jane Clatworthy’s paper for further details on different clustering methods.