Why change scores are generally a bad idea for PROMs data

Patient-reported outcome measures (PROMs) are essentially questionnaire scales designed to measure the treatment outcomes that matter to patients. These are usually characterised as ‘health status’ or ‘health-related quality of life’, but really there’s no reason to limit the scope of PROMs to these domains: you could argue, for example, that the numerous measures of severity of depression (PHQ-9, BDI etc.) are PROMs in their own right.

The NHS Information Centre states that:

“The health status information collected from patients by way of PROMs questionnaires before and after an intervention provides an indication of the outcomes or quality of care delivered to NHS Patients”

A key issue in PROMs analysis is the reliability of the data: the degree of measurement error entailed in collecting data using a PROM. Concerns about reliability are often dismissed on the grounds that reliability is an index of unbiased measurement error and, well, that just cancels out with large samples, doesn’t it? Well, not always. And especially not if you use ‘change scores’.

The natural thing to do with a set of before and after scores is to subtract the ‘before’ from the ‘after’ and call that a ‘change score’ or ‘health gain’:

change score = health gain = [post-treatment score] – [pre-treatment score]

There are two related problems when you do this with PROMs data:

  1. The reliability of the change score is usually much lower than the reliability of the pre- and post-treatment scores
  2. The correlation between pre-treatment score and change score is usually strongly biased in the negative direction

What this means in practice is:

  1. Change scores tend to have greater measurement error than the scores they were derived from
  2. They tend also to have moderate (but spurious) negative correlations with pre-scores
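
Point (1) can be illustrated with the standard classical test theory formula for the reliability of a difference score. The sketch below is an illustration rather than anything taken from a particular PROMs dataset: the function name and the input values are made up, and the formula assumes that pre- and post-treatment scores have equal observed variances and uncorrelated measurement errors.

```python
def change_score_reliability(rel_pre, rel_post, r_pre_post):
    """Reliability of (post - pre) under classical test theory.

    Assumes equal observed variances for pre and post scores and
    uncorrelated measurement errors; r_pre_post must be below 1.
    """
    return ((rel_pre + rel_post) / 2 - r_pre_post) / (1 - r_pre_post)


# Illustrative (made-up) values: both scores have reliability 0.8
# and correlate 0.6 with each other.
print(round(change_score_reliability(0.8, 0.8, 0.6), 2))  # prints 0.5
```

Under these assumptions, two scores that each have a respectable reliability of 0.8 give a change score with a reliability of only about 0.5, and the higher the correlation between pre- and post-treatment scores, the lower it gets.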

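Point (2) is interesting because it can be misinterpreted as an interaction effect: the treatment appears most effective for patients with the lowest pre-treatment scores. But the effect is entirely spurious. It arises because the measurement error in the pre-treatment score enters the change score with the opposite sign: a pre-treatment score that happens to be too low by chance produces a change score that is too high by the same amount.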

What’s also interesting is that the size of the (spurious) negative correlation between pre-treatment score and change score increases as the reliability of the PROM decreases: the greater the measurement error, the larger the (spurious) correlation, and the more likely it is to be statistically significant.
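
A small simulation makes this concrete. It is a minimal sketch under an assumed model, not an analysis of real PROMs data: each observed score is a true score plus independent random error, the true score is identical before and after (so there is no real change at all), and the error variance is set to give a chosen reliability strictly between 0 and 1. The function names, sample size and seed are arbitrary. Under this model the correlation between pre-treatment score and change score should be about -sqrt((1 - reliability) / 2).

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_pre_change_correlation(reliability, n=20000, seed=1):
    """Correlation between pre-score and change score when nothing truly changes.

    Assumed model: observed = true + error, with true-score variance 1 and
    error variance chosen so that var(true) / var(observed) = reliability.
    """
    rng = random.Random(seed)
    error_sd = math.sqrt((1 - reliability) / reliability)
    pre, change = [], []
    for _ in range(n):
        true_score = rng.gauss(0, 1)
        pre_obs = true_score + rng.gauss(0, error_sd)
        post_obs = true_score + rng.gauss(0, error_sd)  # same true score: no real change
        pre.append(pre_obs)
        change.append(post_obs - pre_obs)
    return pearson(pre, change)


for rel in (0.9, 0.7, 0.5):
    simulated = simulate_pre_change_correlation(rel)
    theoretical = -math.sqrt((1 - rel) / 2)
    print(f"reliability={rel:.1f}  simulated r={simulated:.3f}  theoretical r={theoretical:.3f}")
```

Even though no patient's true score changes in this simulation, the correlation comes out negative every time, and it moves further from zero as the reliability drops.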

So, to conclude: change scores are generally a bad idea for PROMs data, and the more unreliable the PROMs data, the worse the problem is.

