So, thanks to Math for the Masses I can now do equations using LaTeX, a typesetting language I have avoided learning to date, if only because so many people have told me that I simply must learn LaTeX.
To return to the problem at hand, why do we get a negative correlation between preoperative score and health gain?
The reason is that ‘health gain’ is a change score, calculated as the postoperative score minus the preoperative score:
Health Gain = Postoperative Score – Preoperative Score
I haven’t worked out how to make LaTeX do ‘friendly’ equations so we will have to simplify this to:
c = b – a
The correlation between preoperative score (a) and health gain (c) is therefore the same as the correlation of a with b – a. We can write the equation for this as follows:
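Written out in these symbols, with r_ab the correlation between a and b, and sd_a and sd_b their standard deviations, the standard result for the correlation of a variable with a difference score is sketched below. It is reconstructed here from cov(a, b – a) = cov(a, b) – var(a) rather than copied from the original post, so treat the exact notation as an assumption:

```latex
r_{a,(b-a)} =
  \frac{r_{ab}\,\mathrm{sd}_b - \mathrm{sd}_a}
       {\sqrt{\mathrm{sd}_a^{2} + \mathrm{sd}_b^{2} - 2\,r_{ab}\,\mathrm{sd}_a\,\mathrm{sd}_b}}
```

The bottom part is the standard deviation of b – a, which is always positive, so the sign of the whole expression is decided by the top part alone.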
The important thing about this equation is the top part (the numerator). The standard deviations of pre- and postoperative scores are likely to be very similar: we have the same people filling in the same questionnaire on both occasions. Similarly, r_ab is likely to be positive, because it’s the correlation between pre- and postoperative scores, and less than 1.0.
This means that r_ab × sd_b will almost always be smaller than sd_a, so the top part, and with it the whole expression, will be negative, making the correlation between a and c negative.
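To put some numbers on this, here is a minimal Python sketch that evaluates the standard formula for the correlation of a with b – a. The function name `corr_pre_with_change` and the input values are made up for illustration; they are not taken from any real dataset.

```python
import math


def corr_pre_with_change(r_ab, sd_a, sd_b):
    """Correlation of a with (b - a), given r_ab and the two standard deviations.

    Hypothetical helper for illustration only.
    """
    # Variance of the change score b - a
    var_change = sd_a ** 2 + sd_b ** 2 - 2 * r_ab * sd_a * sd_b
    if var_change <= 0:
        raise ValueError("change score has no variance (e.g. r_ab = 1 with equal SDs)")
    # Top part: r_ab * sd_b - sd_a; bottom part: SD of the change score
    return (r_ab * sd_b - sd_a) / math.sqrt(var_change)


# Equal standard deviations, as assumed in the text; illustrative r_ab values
for r_ab in (0.8, 0.5, 0.2):
    print(r_ab, round(corr_pre_with_change(r_ab, 1.0, 1.0), 3))
```

With equal standard deviations the result is negative for every r_ab below 1.0, and it becomes more negative as r_ab falls.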
Note also that the correlation between pre- and postoperative scores might be quite small if the questionnaire is unreliable, so the greater the measurement error, the greater the negative bias.
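A small simulation can illustrate the effect of measurement error. This is only a sketch under made-up assumptions: every patient has a 'true' score, everyone truly improves by exactly the same amount (so true gain is unrelated to the true preoperative score), and the questionnaire adds independent random error on each occasion. The names `pearson` and `simulate`, and all the numbers, are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(noise_sd, n=20000, seed=1):
    """Correlation between observed preoperative score and observed health gain.

    noise_sd is the SD of the measurement error and must be greater than 0,
    otherwise the observed gain is constant and the correlation is undefined.
    """
    rng = random.Random(seed)
    pre, gain = [], []
    for _ in range(n):
        true_score = rng.gauss(0.0, 1.0)
        a = true_score + rng.gauss(0.0, noise_sd)        # observed preoperative score
        b = true_score + 1.0 + rng.gauss(0.0, noise_sd)  # same true gain for everyone
        pre.append(a)
        gain.append(b - a)
    return pearson(pre, gain)


for noise_sd in (0.2, 0.5, 1.0):
    print(noise_sd, round(simulate(noise_sd), 2))
```

Under these assumptions any correlation between preoperative score and health gain comes purely from measurement error, and it should grow more negative as `noise_sd` increases.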