Still Not Significant


What to do if your p-value is just over the arbitrary threshold for ‘significance’ of p=0.05?

You don’t need to play the significance testing game – there are better methods, like quoting the effect size with a confidence interval – but if you do, the rules are simple: the result is either significant or it isn’t.
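As a rough illustration of what “quoting the effect size with a confidence interval” can look like, here is a minimal sketch. The function name `mean_difference_ci` and the sample numbers are invented for illustration, and the interval uses a normal approximation; for small samples a t-based interval would usually be more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_difference_ci(group_a, group_b, confidence=0.95):
    """Return the difference in means (a - b) and an approximate CI.

    Assumption: a normal approximation with unpooled variances.
    This is a sketch, not a recommendation for any particular study design.
    """
    diff = mean(group_a) - mean(group_b)
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(
        stdev(group_a) ** 2 / len(group_a) + stdev(group_b) ** 2 / len(group_b)
    )
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, (diff - z * se, diff + z * se)


# Invented numbers, purely illustrative.
treated = [5.1, 6.3, 5.8, 7.0, 6.1, 5.5, 6.7, 6.0]
control = [5.0, 5.4, 6.2, 5.1, 5.9, 4.8, 5.6, 5.3]
effect, (low, high) = mean_difference_ci(treated, control)
print(f"difference = {effect:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

Reported this way, the reader sees both the size of the estimated effect and how precisely it was measured, without any need for a significant/non-significant label.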

So if your p-value remains stubbornly higher than 0.05, you should call it ‘non-significant’ and write it up as such. The problem for many authors is that this just isn’t the answer they were looking for: publishing so-called ‘negative results’ is harder than publishing ‘positive results’.

The solution is to apply the time-honoured tactic of circumlocution to disguise the non-significant result as something more interesting. The following list is culled from peer-reviewed journal articles in which (a) the authors set themselves the threshold of 0.05 for significance, (b) failed to achieve that threshold value for p and (c) described it in such a way as to make it seem more interesting.

As well as being statistically flawed (results are either significant or not and can’t be qualified), the wording is linguistically interesting, often describing an aspect of the result that just doesn’t exist. For example, “a trend towards significance” expresses non-significance as some sort of motion towards significance, which it isn’t: there is no ‘trend’, in any direction, and nowhere for the trend to be ‘towards’.

Some further analysis will follow, but for now here is the list in full (UPDATE: now in alpha-order):

(barely) not statistically significant (p=0.052)
a barely detectable statistically significant difference (p=0.073)
a borderline significant trend (p=0.09)
a certain trend toward significance (p=0.08)
a clear tendency to significance (p=0.052)
a clear trend (p<0.09)
a clear, strong trend (p=0.09)
a considerable trend toward significance (p=0.069)
a decreasing trend (p=0.09)
a definite trend (p=0.08)
a distinct trend toward significance (p=0.07)
a favorable trend (p=0.09)
a favourable statistical trend (p=0.09)
a little significant (p<0.1)
a margin at the edge of significance (p=0.0608)
a marginal trend (p=0.09)
a marginal trend toward significance (p=0.052)
a marked trend (p=0.07)
a mild trend (p<0.09)
a moderate trend toward significance (p=0.068)
a near-significant trend (p=0.07)
a negative trend (p=0.09)
a nonsignificant trend (p<0.1)
a nonsignificant trend toward significance (p=0.1)
a notable trend (p<0.1)
a numerical increasing trend (p=0.09)
a numerical trend (p=0.09)
a positive trend (p=0.09)
a possible trend (p=0.09)
a possible trend toward significance (p=0.052)
a pronounced trend (p=0.09)
a reliable trend (p=0.058)
a robust trend toward significance (p=0.0503)
a significant trend (p=0.09)
a slight slide towards significance (p<0.20)
a slight tendency toward significance (p<0.08)
a slight trend (p<0.09)
a slight trend toward significance (p=0.098)
a slightly increasing trend (p=0.09)
a small trend (p=0.09)
a statistical trend (p=0.09)
a statistical trend toward significance (p=0.09)
a strong tendency towards statistical significance (p=0.051)
a strong trend (p=0.077)
a strong trend toward significance (p=0.08)
a substantial trend toward significance (p=0.068)
a suggestive trend (p=0.06)
a trend close to significance (p=0.08)
a trend significance level (p=0.08)
a trend that approached significance (p<0.06)
a very slight trend toward significance (p=0.20)
a weak trend (p=0.09)
a weak trend toward significance (p=0.12)
a worrying trend (p=0.07)
all but significant (p=0.055)
almost achieved significance (p=0-065)
almost approached significance (p=0.065)
almost attained significance (p<0.06)
almost became significant (p=0.06)
almost but not quite significant (p=0.06)
almost clinically significant (p<0.10)
almost insignificant (p>0.065)
almost marginally significant (p>0.05)
almost non-significant (p=0.083)
almost reached statistical significance (p=0.06)
almost significant (p=0.06)
almost significant tendency (p=0.06)
almost statistically significant (p=0.06)
an adverse trend (p=0.10)
an apparent trend (p=0.286)
an associative trend (p=0.09)
an elevated trend (p<0.05)
an encouraging trend (p<0.1)
an established trend (p<0.10)
an evident trend (p=0.13)
an expected trend (p=0.08)
an important trend (p=0.066)
an increasing trend (p<0.09)
an interesting trend (p=0.1)
an inverse trend toward significance (p=0.06)
an observed trend (p=0.06)
an obvious trend (p=0.06)
an overall trend (p=0.2)
an unexpected trend (p=0.09)
an unexplained trend (p=0.09)
an unfavorable trend (p<0.10)
appeared to be marginally significant (p<0.10)
approached acceptable levels of statistical significance (p=0.054)
approached but did not quite achieve significance (p>0.05)
approached but fell short of significance (p=0.07)
approached conventional levels of significance (p<0.10)
approached near significance (p=0.06)
approached our criterion of significance (p>0.08)
approached significant (p=0.11)
approached the borderline of significance (p=0.07)
approached the level of significance (p=0.09)
approached trend levels of significance (p0.05)
approached, but did reach, significance (p=0.065)
approaches but fails to achieve a customary level of statistical significance (p=0.154)
approaches statistical significance (p>0.06)
approaching a level of significance (p=0.089)
approaching an acceptable significance level (p=0.056)
approaching borderline significance (p=0.08)
approaching borderline statistical significance (p=0.07)
approaching but not reaching significance (p=0.53)
approaching clinical significance (p=0.07)
approaching close to significance (p<0.1)
approaching conventional significance levels (p=0.06)
approaching conventional statistical significance (p=0.06)
approaching formal significance (p=0.1052)
approaching independent prognostic significance (p=0.08)
approaching marginal levels of significance (p<0.107)
approaching marginal significance (p=0.064)
approaching more closely significance (p=0.06)
approaching our preset significance level (p=0.076)
approaching prognostic significance (p=0.052)
approaching significance (p=0.09)
approaching the traditional significance level (p=0.06)
approaching to statistical significance (p=0.075)
approaching, although not reaching, significance (p=0.08)
approaching, but not reaching, significance (p<0.09)
approximately significant (p=0.053)
approximating significance (p=0.09)
arguably significant (p=0.07)
as good as significant (p=0.0502)
at the brink of significance (p=0.06)
at the cusp of significance (p=0.06)
at the edge of significance (p=0.055)
at the limit of significance (p=0.054)
at the limits of significance (p=0.053)
at the margin of significance (p=0.056)
at the margin of statistical significance (p<0.07)
at the verge of significance (p=0.058)
at the very edge of significance (p=0.053)
barely below the level of significance (p=0.06)
barely escaped statistical significance (p=0.07)
barely escapes being statistically significant at the 5% risk level (0.1>p>0.05)
barely failed to attain statistical significance (p=0.067)
barely fails to attain statistical significance at conventional levels (p<0.10)
barely insignificant (p=0.075)
barely missed statistical significance (p=0.051)
barely missed the commonly acceptable significance level (p<0.053)
barely outside the range of significance (p=0.06)
barely significant (p=0.07)
below (but verging on) the statistical significant level (p>0.05)
better trends of improvement (p=0.056)
bordered on a statistically significant value (p=0.06)
bordered on being significant (p>0.07)
bordered on being statistically significant (p=0.0502)
bordered on but was not less than the accepted level of significance (p>0.05)
bordered on significant (p=0.09)
borderline conventional significance (p=0.051)
borderline level of statistical significance (p=0.053)
borderline significant (p=0.09)
borderline significant trends (p=0.099)
close to a marginally significant level (p=0.06)
close to being significant (p=0.06)
close to being statistically significant (p=0.055)
close to borderline significance (p=0.072)
close to the boundary of significance (p=0.06)
close to the level of significance (p=0.07)
close to the limit of significance (p=0.17)
close to the margin of significance (p=0.055)
close to the margin of statistical significance (p=0.075)
closely approaches the brink of significance (p=0.07)
closely approaches the statistical significance (p=0.0669)
closely approximating significance (p>0.05)
closely not significant (p=0.06)
closely significant (p=0.058)
close-to-significant (p=0.09)
did not achieve conventional threshold levels of statistical significance (p=0.08)
did not exceed the conventional level of statistical significance (p<0.08)
did not quite achieve acceptable levels of statistical significance (p=0.054)
did not quite achieve significance (p=0.076)
did not quite achieve the conventional levels of significance (p=0.052)
did not quite achieve the threshold for statistical significance (p=0.08)
did not quite attain conventional levels of significance (p=0.07)
did not quite reach a statistically significant level (p=0.108)
did not quite reach conventional levels of statistical significance (p=0.079)
did not quite reach statistical significance (p=0.063)
did not reach the traditional level of significance (p=0.10)
did not reach the usually accepted level of clinical significance (p=0.07)
difference was apparent (p=0.07)
direction heading towards significance (p=0.10)
does not appear to be sufficiently significant (p>0.05)
does not narrowly reach statistical significance (p=0.06)
does not reach the conventional significance level (p=0.098)
effectively significant (p=0.051)
equivocal significance (p=0.06)
essentially significant (p=0.10)
extremely close to significance (p=0.07)
failed to reach significance on this occasion (p=0.09)
failed to reach statistical significance (p=0.06)
fairly close to significance (p=0.065)
fairly significant (p=0.09)
falls just short of standard levels of statistical significance (p=0.06)
fell (just) short of significance (p=0.08)
fell barely short of significance (p=0.08)
fell just short of significance (p=0.07)
fell just short of statistical significance (p=0.12)
fell just short of the traditional definition of statistical significance (p=0.051)
fell marginally short of significance (p=0.07)
fell narrowly short of significance (p=0.0623)
fell only marginally short of significance (p=0.0879)
fell only short of significance (p=0.06)
fell short of significance (p=0.07)
fell slightly short of significance (p>0.0167)
fell somewhat short of significance (p=0.138)
felt short of significance (p=0.07)
flirting with conventional levels of significance (p>0.1)
heading towards significance (p=0.086)
highly significant (p=0.09)
hint of significance (p>0.05)
hovered around significance (p = 0.061)
hovered at nearly a significant level (p=0.058)
hovering closer to statistical significance (p=0.076)
hovers on the brink of significance (p=0.055)
in the edge of significance (p=0.059)
in the verge of significance (p=0.06)
inconclusively significant (p=0.070)
indeterminate significance (p=0.08)
indicative significance (p=0.08)
is just outside the conventional levels of significance
just about significant (p=0.051)
just above the arbitrary level of significance (p=0.07)
just above the margin of significance (p=0.053)
just at the conventional level of significance (p=0.05001)
just barely below the level of significance (p=0.06)
just barely failed to reach significance (p<0.06)
just barely insignificant (p=0.11)
just barely statistically significant (p=0.054)
just beyond significance (p=0.06)
just borderline significant (p=0.058)
just escaped significance (p=0.07)
just failed significance (p=0.057)
just failed to be significant (p=0.072)
just failed to reach statistical significance (p=0.06)
just failing to reach statistical significance (p=0.06)
just fails to reach conventional levels of statistical significance (p=0.07)
just lacked significance (p=0.053)
just marginally significant (p=0.0562)
just missed being statistically significant (p=0.06)
just missing significance (p=0.07)
just on the verge of significance (p=0.06)
just outside accepted levels of significance (p=0.06)
just outside levels of significance (p<0.08)
just outside the bounds of significance (p=0.06)
just outside the conventional levels of significance (p=0.1076)
just outside the level of significance (p=0.0683)
just outside the limits of significance (p=0.06)
just outside the traditional bounds of significance (p=0.06)
just over the limits of statistical significance (p=0.06)
just short of significance (p=0.07)
just shy of significance (p=0.053)
just skirting the boundary of significance (p=0.052)
just tendentially significant (p=0.056)
just tottering on the brink of significance at the 0.05 level
just very slightly missed the significance level (p=0.086)
leaning towards significance (p=0.15)
leaning towards statistical significance (p=0.06)
likely to be significant (p=0.054)
loosely significant (p=0.10)
marginal significance (p=0.07)
marginally and negatively significant (p=0.08)
marginally insignificant (p=0.08)
marginally nonsignificant (p=0.096)
marginally outside the level of significance
marginally significant (p>=0.1)
marginally significant tendency (p=0.08)
marginally statistically significant (p=0.08)
may not be significant (p=0.06)
medium level of significance (p=0.051)
mildly significant (p=0.07)
missed narrowly statistical significance (p=0.054)
moderately significant (p>0.11)
modestly significant (p=0.09)
narrowly avoided significance (p=0.052)
narrowly eluded statistical significance (p=0.0789)
narrowly escaped significance (p=0.08)
narrowly evaded statistical significance (p>0.05)
narrowly failed significance (p=0.054)
narrowly missed achieving significance (p=0.055)
narrowly missed overall significance (p=0.06)
narrowly missed significance (p=0.051)
narrowly missed standard significance levels (p<0.07)
narrowly missed the significance level (p=0.07)
narrowly missing conventional significance (p=0.054)
near limit significance (p=0.073)
near miss of statistical significance (p>0.1)
near nominal significance (p=0.064)
near significance (p=0.07)
near to statistical significance (p=0.056)
near/possible significance (p=0.0661)
near-borderline significance (p=0.10)
near-certain significance (p=0.07)
nearing significance (p<0.051)
nearly acceptable level of significance (p=0.06)
nearly approaches statistical significance (p=0.079)
nearly borderline significance (p=0.052)
nearly negatively significant (p<0.1)
nearly positively significant (p=0.063)
nearly reached a significant level (p=0.07)
nearly reaching the level of significance (p<0.06)
nearly significant (p=0.06)
nearly significant tendency (p=0.06)
nearly, but not quite significant (p>0.06)
near-marginal significance (p=0.18)
near-significant (p=0.09)
near-to-significance (p=0.093)
near-trend significance (p=0.11)
nominally significant (p=0.08)
non-insignificant result (p=0.500)
non-significant in the statistical sense (p>0.05)
not absolutely significant but very probably so (p>0.05)
not as significant (p=0.06)
not clearly significant (p=0.08)
not completely significant (p=0.07)
not completely statistically significant (p=0.0811)
not conventionally significant (p=0.089), but..
not currently significant (p=0.06)
not decisively significant (p=0.106)
not entirely significant (p=0.10)
not especially significant (p>0.05)
not exactly significant (p=0.052)
not extremely significant (p<0.06)
not formally significant (p=0.06)
not fully significant (p=0.085)
not globally significant (p=0.11)
not highly significant (p=0.089)
not insignificant (p=0.056)
not markedly significant (p=0.06)
not moderately significant (P>0.20)
not non-significant (p>0.1)
not numerically significant (p>0.05)
not obviously significant (p>0.3)
not overly significant (p>0.08)
not quite borderline significance (p>=0.089)
not quite reach the level of significance (p=0.07)
not quite significant (p=0.118)
not quite within the conventional bounds of statistical significance (p=0.12)
not reliably significant (p=0.091)
not remarkably significant (p=0.236)
not significant by common standards (p=0.099)
not significant by conventional standards (p=0.10)
not significant by traditional standards (p<0.1)
not significant in the formal statistical sense (p=0.08)
not significant in the narrow sense of the word (p=0.29)
not significant in the normally accepted statistical sense (p=0.064)
not significantly significant but..clinically meaningful (p=0.072)
not statistically quite significant (p<0.06)
not strictly significant (p=0.06)
not strictly speaking significant (p=0.057)
not technically significant (p=0.06)
not that significant (p=0.08)
not to an extent that was fully statistically significant (p=0.06)
not too distant from statistical significance at the 10% level
not too far from significant at the 10% level
not totally significant (p=0.09)
not unequivocally significant (p=0.055)
not very definitely significant (p=0.08)
not very definitely significant from the statistical point of view (p=0.08)
not very far from significance (p<0.092)
not very significant (p=0.1)
not very statistically significant (p=0.10)
not wholly significant (p>0.1)
not yet significant (p=0.09)
not strongly significant (p=0.08)
noticeably significant (p=0.055)
on the border of significance (p=0.063)
on the borderline of significance (p=0.0699)
on the borderlines of significance (p=0.08)
on the boundaries of significance (p=0.056)
on the boundary of significance (p=0.055)
on the brink of significance (p=0.052)
on the cusp of conventional statistical significance (p=0.054)
on the cusp of significance (p=0.058)
on the edge of significance (p>0.08)
on the limit to significant (p=0.06)
on the margin of significance (p=0.051)
on the threshold of significance (p=0.059)
on the verge of significance (p=0.053)
on the very borderline of significance (0.05<p<0.06)
on the very fringes of significance (p=0.099)
on the very limits of significance (0.1>p>0.05)
only a little short of significance (p>0.05)
only just failed to meet statistical significance (p=0.051)
only just insignificant (p>0.10)
only just missed significance at the 5% level
only marginally fails to be significant at the 95% level (p=0.06)
only marginally nearly insignificant (p=0.059)
only marginally significant (p=0.9)
only slightly less than significant (p=0.08)
only slightly missed the conventional threshold of significance (p=0.062)
only slightly missed the level of significance (p=0.058)
only slightly missed the significance level (p=0·0556)
only slightly non-significant (p=0.0738)
only slightly significant (p=0.08)
partial significance (p>0.09)
partially significant (p=0.08)
partly significant (p=0.08)
perceivable statistical significance (p=0.0501)
possible significance (p<0.098)
possibly marginally significant (p=0.116)
possibly significant (0.05<p>0.10)
possibly statistically significant (p=0.10)
potentially significant (p>0.1)
practically significant (p=0.06)
probably not experimentally significant (p=0.2)
probably not significant (p>0.25)
probably not statistically significant (p=0.14)
probably significant (p=0.06)
provisionally significant (p=0.073)
quasi-significant (p=0.09)
questionably significant (p=0.13)
quite close to significance at the 10% level (p=0.104)
quite significant (p=0.07)
rather marginal significance (p>0.10)
reached borderline significance (p=0.0509)
reached near significance (p=0.07)
reasonably significant (p=0.07)
remarkably close to significance (p=0.05009)
resides on the edge of significance (p=0.10)
roughly significant (p>0.1)
scarcely significant (0.05<p>0.1)
significant at the .07 level
significant tendency (p=0.09)
significant to some degree (0<p>1)
significant, or close to significant effects (p=0.08, p=0.05)
significantly better overall (p=0.051)
significantly significant (p=0.065)
similar but not nonsignificant trends (p>0.05)
slight evidence of significance (0.1>p>0.05)
slight non-significance (p=0.06)
slight significance (p=0.128)
slight tendency toward significance (p=0.086)
slightly above the level of significance (p=0.06)
slightly below the level of significance (p=0.068)
slightly exceeded significance level (p=0.06)
slightly failed to reach statistical significance (p=0.061)
slightly insignificant (p=0.07)
slightly less than needed for significance (p=0.08)
slightly marginally significant (p=0.06)
slightly missed being of statistical significance (p=0.08)
slightly missed statistical significance (p=0.059)
slightly missed the conventional level of significance (p=0.061)
slightly missed the level of statistical significance (p<0.10)
slightly missed the margin of significance (p=0.051)
slightly not significant (p=0.06)
slightly outside conventional statistical significance (p=0.051)
slightly outside the margins of significance (p=0.08)
slightly outside the range of significance (p=0.09)
slightly outside the significance level (p=0.077)
slightly outside the statistical significance level (p=0.053)
slightly significant (p=0.09)
somewhat marginally significant (p>0.055)
somewhat short of significance (p=0.07)
somewhat significant (p=0.23)
somewhat statistically significant (p=0.092)
strong trend toward significance (p=0.08)
sufficiently close to significance (p=0.07)
suggestive but not quite significant (p=0.061)
suggestive of a significant trend (p=0.08)
suggestive of statistical significance (p=0.06)
suggestively significant (p=0.064)
tailed to insignificance (p=0.1)
tantalisingly close to significance (p=0.104)
technically not significant (p=0.06)
teetering on the brink of significance (p=0.06)
tend to significant (p>0.1)
tended to approach significance (p=0.09)
tended to be significant (p=0.06)
tended toward significance (p=0.13)
tendency toward significance (p approaching 0.1)
tendency toward statistical significance (p=0.07)
tends to approach significance (p=0.12)
tentatively significant (p=0.107)
too far from significance (p=0.12)
trend bordering on statistical significance (p=0.066)
trend in a significant direction (p=0.09)
trend in the direction of significance (p=0.089)
trend significance level (p=0.06)
trend toward (p>0.07)
trending towards significance (p>0.15)
trending towards significant (p=0.099)
uncertain significance (p>0.07)
vaguely significant (p>0.2)
verged on being significant (p=0.11)
verging on significance (p=0.056)
verging on the statistically significant (p<0.1)
verging-on-significant (p=0.06)
very close to approaching significance (p=0.060)
very close to significant (p=0.11)
very close to the conventional level of significance (p=0.055)
very close to the cut-off for significance (p=0.07)
very close to the established statistical significance level of p=0.05 (p=0.065)
very close to the threshold of significance (p=0.07)
very closely approaches the conventional significance level (p=0.055)
very closely brushed the limit of statistical significance (p=0.051)
very narrowly missed significance (p<0.06)
very nearly significant (p=0.0656)
very slightly non-significant (p=0.10)
very slightly significant (p<0.1)
virtually significant (p=0.059)
weak significance (p>0.10)
weakened..significance (p=0.06)
weakly non-significant (p=0.07)
weakly significant (p=0.11)
weakly statistically significant (p=0.0557)
well-nigh significant (p=0.11)

126 responses to “Still Not Significant”

  1. Reblogged this on Mr Epidemiology and commented:
    A handy alphabetized list of various different ways of stating your results when p > .05! I think my favourites are “teetering on the brink of significance (P=0.06)” and “not significant in the narrow sense of the word (P=0.29)”

  2. Pingback: Oh yeah, it’s significant. REALLY significant. | The Mad Scientist Confectioner's Club

  3. Pingback: Tantalisingly close to significance | Quomodocumque

  4. I love the 0.23 as “somewhat significant”. Um, no.

  5. I would love to see the list sorted by p value instead of alphabetically.

  6. Thanks for the chuckle – the list is indeed amusing, but the key point above is that the p-value threshold is arbitrary. This fact is now widely accepted, so a strict dichotomy between “significant” and “non-significant” no longer makes sense. It is a bit of a fudge – and one completely unnecessary if (e.g.) a Bayesian approach is adopted – but we prefer to see “significance” as a continuum; phrases such as “marginally significant” represent uncertainty in the threshold location and therefore do make some sort of sense.

    It is *always* the case that calling p=0.49 “significant” and p=0.51 “non-significant” is just plain silly.

    • Good reply Mark. I concur completely and often tell clients to use the term marginally significant for values close to 0.05 (on either side). It is better that they talk about these things than just sweep them under the rug and ignore them because they are “not significant”.

      • Joanne Yaffe

        Significance is not really the important question. How important is the finding? Report effect sizes!

    • Thanks for the comment. I think there is confusion over the threshold being arbitrary, i.e. 0.05 rather than 0.06, and the arbitrariness of having a threshold at all.
      If we agree that there isn’t really a need for a threshold and just discuss the p-values directly, then ‘significant’ and ‘marginally significant’ both become meaningless.

  7. Pingback: A borderline definite marginally mild notably numerically increasing suggestively verging on significant result | Scientific News

  8. Reblogged this on JHND NOTES: The Journal of Human Nutrition and Dietetics Editor's Blog and commented:
    Prospective authors and students take note. Not significant means not significant, no matter how much you wish it otherwise.

    • Sorry, a result that is not statistically significant can indeed have practical or clinical significance. It’s the flip side of the typical admonishment about statistical significance not necessarily having practical significance. So, it has nothing to do with wishes. It’s odd to hold your viewpoint when NHST itself will reveal that a test statistic that results in a p less than .05 is not significantly different from many values of a test statistic that result in p values greater than .05. It seems that the wishing is that the values on each side of the .05 fence are actually different from one another. You can wish all you want, but a great number of them are not, by your own criterion.

      • That should read: ” that a test statistic that results in a p less than .05 is not significantly different from many values of a test statistic that result in p values greater than .05. ”

        I must have deleted the end of the sentence while editing.

      • I do not agree. The problem is that a result that is not statistically significant is a “no result” or, perhaps better, the absence of a result. It doesn’t make much sense to talk about the practical significance of something whose existence has not been shown. This situation is not the mirror image of the other one, in which we discuss the practical importance of an effect that has been demonstrated (i.e. is statistically significant).

  9. Pingback: Somewhere else, part 56 | Freakonometrics

  10. Pingback: Some Links | Meta Rabbit

  11. in APA there should not be a 0 before the decimal point

  12. Pingback: On the present problems of publications, and possibly the coming futue? Some Labyrinthine musings. | Åse Fixes Science

  13. Pingback: I’ve got your missing links right here (01 June 2013) – Phenomena: Not Exactly Rocket Science

  14. I made a word cloud of the list with all the variations of the word “significant” removed and creative spelling standardized:

  15. Pingback: A borderline definite marginally mild notably numerically increasing suggestively verging on significant result |

  16. Pingback: “Presuntuosos” y “remilgados” en estadística | psy'n'thesis

  17. To say a result is either significant or not is glib. First of all many people, especially non-scientists, will confuse statistical significance with substantive significance (economic, psychological, whatever). So what if an effect is statistically significant but tiny? Statistical “insignificance” means you can’t reject the null of 0, but you can’t reject other hypotheses either: so why the fetish about one? The standard error tells you how *precisely* determined the result is.
    Say you measure two effects, A with size 1 and confidence interval (-1, 3) and B with size 0.5 and CI (0.3, 0.8). Would you really conclude that B has a bigger effect than A? This would be silly, but the practice of only counting “significant” results commonly leads to this.

    • What I conclude is that, most likely, there is an effect “B” of positive sign, and that it is much less likely that there is an effect “A” of positive sign. I believe it is not appropriate to compare the magnitude of an effect that reasonably exists with that of one whose existence I am not sufficiently sure of.

  18. Pingback: borderline significance | Game Dasein

  19. Truly, G-d loves the .06 nearly as much as the .05!

  20. Pingback: The Messy Machine » “Although our results are not significant…” (a rant)

  21. Pingback: P-values: Destroying the Barrier Between Scientific and Creative Writing | Instead of Facebook

  22. Pingback: Comment être sûr qu’un résultat scientifique est vrai ? | Science étonnante

  23. @Aaron Levitt–I agree! Love to know the journals these were published in.

  24. Pingback: Not-So-Critical Analysis | University of Glasgow SLS

  25. Pingback: Rebecca D. Gill » Blog Archive » If it’s not significant,

  26. Pingback: [轉載] When p-value is slightly larger than 0.05…. | 生活的紀念冊

  27. Pingback: A Significantly Improved Significance Test. Not! | Patient 2 Earn

  28. I incorporated your list in a test of significance (implemented in R). Every time the p-value is between 0.12-0.5 it randomly selects one of your “p excuses” :)

  29. Pingback: I’m Using the New Statistics |

  30. Pingback: On the hazards of significance testing. Part 2: the false discovery rate, or how not to make a fool of yourself with P values

  31. Pingback: The Cult of p(0.05) |

  32. Reblogged this on Le blog de Michaël.

  33. I’ve never understood the statisticians’ overly dogmatic objections to the way these p-values are discussed. All a p-value of 0.05 means is there’s a 95% chance that the hypothesis was indeed working. A p-value of 0.051 means there was a 94.9% chance that the hypothesis was working. I agree that if one prespecifies 0.05 as the threshold then, yes, the p-value of 0.051 is not significant. But is saying it was “almost significant” such a travesty? Statisticians often treat a p-value of 0.051 the same as a p-value of 0.70…which makes little sense to anyone with some connection to logic. Should, for example, the FDA have pretty strict inflexibility on p-values? Yes!! But please relax and use common sense when talking about the way some talk about these statistics. And, no, I’m not defending the p=0.29 example!! :)

    • No, the p-value of 0.05 doesn’t mean what you’re saying at all. You’re committing a logical fallacy. It tells you about the probability of results at least as extreme as yours if the null were actually true and has little to nothing to say about the probability that your alternative is true. Furthermore, statistical tests have philosophies that underpin the very nature of the test and what you can take out of them for meaning. The 0.05 is arbitrary, yes, but modifying what values are important after you calculate them completely changes their meaning.
      What you do imply is that you want a couple of different kinds of uses of p-values and that’s fine (although it’s not great as an evidence measure it’s been used as one); but people need to be clear a priori how they’re using them. Thus, statements like “almost significant” usually have very little meaning because they’re post hoc efforts to cram one philosophy of statistics into another. State at the outset that your p-value is a measure of evidence, that you have no pre-conceived test per se, don’t mention testing, and you’d be fine using 0.06 in a qualitative statement about how believable the null is. But trying to do that afterwards corrupts the value of doing any testing at all.
      You might want to look at Gigerenzer’s “Mindless Statistics”.

      • This stems from the hybrid logic used in psychology in general. Rather than using a Fisherian “Report P-observed, replicate” or a Neyman-Pearsonian “Fix alpha, set sample size sufficient to detect departures of interest”, we have a “Pseudo fix alpha, mumble something about significance, complain a lot about the procedure not detecting a difference despite not doing prospective investigation about how the procedure would do”.

      • THANK YOU. I was starting to rip my hair out reading many of the other responses.

      • Bingo! You nailed it, John! The p-value has NOTHING to do with the alternate hypothesis. It’s easy to demonstrate. Generate a list of 1000 random numbers in Excel in column A. Then extend this column across the spreadsheet for 1000 columns (so you have 1000×1000 random numbers). Then plug this dataset into a stats program (I used JMP) and ask the software to detect “significant” associations between the outcome variable (column A) and any other predictor variable (all other individual columns). Amazingly (NOT!!!) about 5% of the “predictor variables” will have a significant association with the outcome variable (at P<0.05, and some way less than that). Now think back – you created random sets of numbers. There is NO way that the data in Column XXX is associated with Column A. Those “significant” p-values occurred purely by chance.
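The spreadsheet experiment described in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard library rather than Excel and JMP, it tests each column with a Pearson correlation, and it approximates the two-sided 5% critical value of the t statistic by 1.96 (reasonable for 1000 rows). The function names `pearson_r` and `fraction_significant` are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fraction_significant(n_rows=1000, n_cols=1000, seed=1):
    """Share of pure-noise predictors 'significantly' correlated with column A."""
    rng = random.Random(seed)
    # Every column is independent uniform noise, like RAND() in a spreadsheet.
    columns = [[rng.random() for _ in range(n_rows)] for _ in range(n_cols)]
    outcome, predictors = columns[0], columns[1:]
    crit = 1.96  # assumption: normal approximation to the t critical value
    hits = 0
    for col in predictors:
        r = pearson_r(outcome, col)
        t = r * math.sqrt((n_rows - 2) / (1 - r * r))
        if abs(t) > crit:
            hits += 1
    return hits / len(predictors)


frac = fraction_significant()
print(f"{frac:.1%} of the random predictors reach p < 0.05")
```

The exact share varies with the seed, but it should sit close to 5%, because that is the false-positive rate the 0.05 threshold permits when the null hypothesis is true.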

  34. Pingback: Somewhere else, part 123 | Freakonometrics

  35. Pingback: Does researching casual marijuana use cause brain abnormalities? | Bits of DNA

  36. Pingback: The Futility of Significance (Statistical, that is) | The Couch Psychologist

  37. Tapio Branvinn

    “(results are either significant or not and can’t be qualified)”

    How’s that? There’s no reason why you would have to make a clear cut decision, for or against, in a scientific paper. It’s perfectly legitimate to report that a p-value of 0.051 for example provides weak (but clearly inconclusive) evidence against the H0.

    • But in all the examples the authors elected to do exactly that: make a clear-cut decision based on a threshold they themselves chose.

    • Sometimes we need clear and predefined cut-off points. For example, the decision on public funding of a medication can be based on an “adequate” proof of the existence of an effect (the significance compared to H0 or, what is the same, the limits of the CI95%) of a magnitude “sufficient” (the difference with the control group).

  38. Pingback: Does Researching Casual Marijuana Use Cause Brain Abnormalities? | The Falling Darkness

  39. Pingback: Verging on a borderline trend | Stats Chat

  40. Pingback: Felix Schönbrodt's website

  41. Pingback: Reanalyzing the Schnall/Johnson “cleanliness” data sets: New insights from Bayesian and robust approaches ← Patient 2 Earn

  42. Pingback: Why Have Female Hurricanes Killed More People Than Male Ones? – Phenomena: Not Exactly Rocket Science

  43. Do you have a list of references from where you picked up this phrasing?

  44. Pingback: Female-Named Hurricanes Are Deadlier Than Males, But Why? – | Premium News Update

  45. Pingback: Why Have Female Hurricanes Killed More People Than Male Ones? | Gaia Gazette

  46. In psychology, given how frequently we’re doing things like assuming normality when what we’ve got is approximate normality, it is definitely worth reporting nearly-significant results as such; minor violations of our assumptions and similar things are ubiquitous, so .002 away from significance is absolutely not the same as .5 away. For exactly the same reason that .001 and .05 are two different significance levels and we don’t just say “significant”.

  47. Can I make a t-shirt of this list? I promise not to wear it to dissertation defenses.

  48. Pingback: Hurricanes with feminine names are probably NOT more destructive | Matter Of Facts

  49. Pingback: The RedPen/BlackPen Guide To The Sciencing! | The Mad Scientist Confectioner's Club

  50. Curious what anyone would have to say about this article by David Healy:

  51. Reblogged this on Gauss17gon and commented:
    A must for a stats student starting research!

  52. This blog was… how do I say it? Relevant!! Finally I’ve found something which helped me.
    Many thanks!

  53. I almost understand…………………………………………………………………….

  54. more information needed………………………………………

  55. Hi there – was this review ever published formally? I’d love to cite it if it were published in a journal somewhere.


  56. Thank you for this blog – just what I needed!

  57. I feel that this article is a bit misguided (although the list is funny). In the real world there is no single godlike level for alpha beyond which all p values are meaningless. Surely we can decide the level we are willing to accept for alpha based on our knowledge of the experiment and the data or we can ignore any arbitrary cutoff completely and just report the actual p value and explain what it means in a way the reader can understand?

    • Well, that’s sort of the point: all the studies here had 0.05 as their level of significance. So it is arbitrary, but the authors all chose it. And then decided it was a movable feast only when the results weren’t what they were hoping for. The wording wasn’t chosen to help the reader understand; it’s a rhetorical device to mislead the reader.

  58. Pingback: Science in the Abstract: Don’t Judge a Study by its Cover | Absolutely Maybe

  59. I encourage my psychology students to comment on non-significant trends if p<.10 (unless they already have enough significant results to address). If the result is over .05 then it isn't significant at the accepted level but does provide weak evidence for an effect (i.e. weaker than if the p value was significant).

    Following today's xkcd (1478) I'm wondering if there's a better way of putting it. However, I don't like most of the phrases in the above list, so maybe I should just stick with the ones I already use (non-significant trend, weak evidence for an effect).

    • I think that would be a mistake, starting with the notion of ‘non-significant trends’. There’s no ‘trend’ at all, just a ‘near-miss’, which is not the same thing. ‘Trend’ implies some movement towards significance, and you don’t get this from a single p-value. ‘Non-significant trend’ just means the same thing as ‘non-significant’, and ‘weak evidence for an effect’ is misleading. The distribution of p under the null hypothesis is flat, i.e. a p value of 0.06 is just as likely as 0.96; neither is a ‘trend’.

      • Whether or not there is a trend in the data (e.g. one mean is different from the others) is separate from the significance test which calculates the probability that the observed trend might have been due to chance, if the null hypothesis of no differences is correct. A value just above .05 still means that the null hypothesis is likely to be incorrect, but the evidence is weaker than the accepted cut-off.

      • That’s correct, but then what is the cut-off for? Treating the p-value as a sliding scale is justifiable, but the concept of ‘significance’ is then unnecessary. It’s inconsistent to choose a cut-off and then only apply it if the results go the way we want: ‘weak evidence for an effect’ is not the same as ‘by the standards set in this experiment the evidence was insufficient to conclude that there was an effect’.
        I misread your phrasing on ‘trend’ as being applied to the significance, but I think the same argument applies (and also, I wouldn’t describe a difference in means as a trend).

      • I agree that “non-significant trend” doesn’t really fit when talking about the effect of an IV with 3+ levels. However, I stand by my advice to students about appropriately reporting effects that just miss the .05 criterion. If the effect is significant at the .05 level then they should discuss it with confidence. If the evidence for an effect is only significant at .10 then they can make tentative conclusions about it. I don’t like “weakly significant” because, as you say, any given p value is either significant at the given level or it isn’t. On the other hand, changing “there was a significant effect of IV on DV (…, p=.040)” to “there was weak evidence for an effect of IV on DV (…, p=.060)” is a reasonable and simple change to make.

        Two caveats.

        1) I am making no comment here about whether a study with no significant results is publishable in a peer reviewed journal, though perhaps it would make sense for small studies showing weak evidence for effects to be accepted to reduce the negative correlation between study size and reported effect size. My advice is primarily directed at psychology students completing coursework and dissertations.

        2) If a Bonferroni correction has been applied to keep the family-wise error rate at .05 then I would ignore all effects with p values of over .05 and use “weak evidence” to refer to effects where p<.05 but is not under the corrected value.
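The point made earlier in this thread, that the distribution of p under the null hypothesis is flat, can be checked with a small simulation. The sketch below is illustrative only: it assumes two groups of 20 drawn from the same normal distribution with a known standard deviation of 1, and uses a two-sample z-test so that only the standard library is needed. The helper name `two_sided_p` is hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(sample_a, sample_b, sigma=1.0):
    """Two-sample z-test p-value, assuming a known common sigma."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = sum(sample_a) / n_a - sum(sample_b) / n_b
    se = sigma * (1 / n_a + 1 / n_b) ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(diff / se)))


rng = random.Random(42)
p_values = []
for _ in range(10_000):
    # Both groups come from the same population, so the null is true.
    a = [rng.gauss(0, 1) for _ in range(20)]
    b = [rng.gauss(0, 1) for _ in range(20)]
    p_values.append(two_sided_p(a, b))

# Share of p-values falling in each of ten equal-width bins.
bins = [0] * 10
for p in p_values:
    bins[min(int(p * 10), 9)] += 1
shares = [count / len(p_values) for count in bins]

for i, share in enumerate(shares):
    print(f"{i / 10:.1f} to {(i + 1) / 10:.1f}: {share:.3f}")
```

Each bin should hold roughly a tenth of the p-values, so a value near 0.06 is about as common as one near 0.96. This holds only when the null is true and the test's assumptions are met; when there is a real effect, small p-values become more common.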

  60. Pingback: P-Values interpretation | Poisoned Coffee

  61. Pingback: A new word for statistical significance: ‘psignificant’ | Liam Shaw – Blog

  62. Pingback: Statistical significance and clinical importance | Anne Bruton's Blog

  63. Should the approach used be important? For instance, if a within-subject design is used, then anything bigger than .05 should be deemed insignificant and not subject to further discussion. This is because of the power of the design. Are there any cites on the inappropriateness of going higher than .05?

  64. Pingback: Saturday assorted links

  65. Pingback: Saturday assorted links | Homines Economici

  66. The significance testing business has been recommended against by the National Academy of Sciences. Two months ago, the first journal, a journal of applied social psychology, banned significance tests. Many years ago, a PLOS article showed that the probability a published article has an actual relationship is
    PPV = 1 / [1 + alpha / ((1 – beta) * R)]
    where alpha is often .05, 1 – beta is the power, and R is the ratio of actual relationships to null relationships among such tests in the field (close to the proportion of actual relationships when these are rare; you can get creative with “field”).
    PPV is the Positive Predictive Value of published articles in the field.
    While alpha is usually set by the statistician at .05, 1 – beta is bounded between 0 and 1, and for better designs 1- beta is near 1.
    You would think better power would improve matters, but we’re looking at published articles, so the best 1 – beta can do is 1.
    As a result, for fields where the proportion of true relationships is small, journals publish almost no actual relationships (shall we say, “barely any acceptable results”). You find this in many fields. For example, in genetics, if 30 of 30,000 genes cause a disease, then R is .001 and PPV is .02.
    Yet, we imagine significance tests pulling out of just such situations relationships we wouldn’t otherwise notice. We’ve deceived ourselves. Although, a followup study in a redefined field with only previously significant relationships has a much better PPV.
    This is similar to the problem of random numbers published in the back of books — random for the individual, but not random if we observe results when many people use the same random number table.
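The PPV arithmetic in the comment above can be reproduced with a short function. This is a minimal sketch that reads the formula as alpha divided by the product of power and R, which is the reading that reproduces the genetics figure quoted in the comment; the function name `ppv` and the extra values of R in the loop are illustrative assumptions.

```python
def ppv(alpha, power, r):
    """Positive predictive value: PPV = 1 / (1 + alpha / (power * r)).

    alpha: significance threshold
    power: 1 - beta
    r:     ratio of true relationships to null relationships in the field
    """
    return 1.0 / (1.0 + alpha / (power * r))


# The genetics example from the comment: R = .001, with power at its best case of 1.
genetics = ppv(alpha=0.05, power=1.0, r=0.001)
print(round(genetics, 3))  # 0.02

# How PPV changes as true relationships become more common (power fixed at 0.8).
for r in (0.001, 0.01, 0.1, 1.0):
    print(r, round(ppv(alpha=0.05, power=0.8, r=r), 3))
```

Even with perfect power, only about 2% of the significant findings in the genetics example would reflect actual relationships, which is the comment's point that better power cannot rescue a field where true relationships are rare.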

  67. Dears,


    Some of you have “teetered on the brink” of the main point about Null Hypothesis Tests of Significance in the Absence of a Loss Function, which is, as has been known since Edgeworth and Gosset (“Student”) and Neyman and the younger Pearson, that any level of significance is not the same as importance. Fit is not the same as oomph. It just isn’t, unless you have some way of translating probability space into consequence space. You might want to read, slowly, McCloskey and Ziliak, The Cult of Statistical Significance (University of Michigan Press, 2008).

    Deirdre N McCloskey

  68. Pingback: Long time no blog…! | maria r. andersen

  69. Pingback: PSA: p-Values are Thresholds, Not Approximations |

  70. A lot of these are as ludicrous as “almost not pregnant” or “nearly a virgin”

  71. Pingback: Friday AM Reads | The Big Picture

  72. The problem comes from the need to spin 0.06 into a positive result for your research question.
    The answer comes from publishing negative/non-significant results, but we all know journals don’t do this.
    The full solution comes from publishing your results as a Data Note at e.g. GigaScience or Scientific Data. By releasing the data in a curated, peer-reviewed and citable manner, you increase the chance of citations for your non-significant result because your data is still useful for method development, meta-analyses, increasing the numbers [of controls at least] in other studies etc. The publication is about your methods of data collection, not your specific research question – although you will record the non-significance in your publication.

  73. Love it! Am sharing with student network. Personally am looking forward to time out to learn R

  74. Reblogged this on In the Dark and commented:
    I just couldn’t resist reblogging this post because of the wonderful list of meaningless convoluted phrases people use when they don’t get a “statistically significant” result. I particularly like

    “a robust trend toward significance”

    It’s scary to think that these were all taken from peer-reviewed scientific journals…

  75. Reblogged this on fluffysciences and commented:
    There has been a lack of posting lately, mostly because a very busy marking schedule has caught up with me.

    So I hope you will enjoy this link to ‘Probable Error’, which has spent a very likely significant amount of time rounding up all the ways scientists use to describe P values which aren’t anywhere near significant at all.

    Given the paper I’m currently reviewing reporting a tendency of P=0.07, I was highly amused!

  76. Pingback: Nerdcore › Von P-Hacking, Clickbait-Bullshit und Schoko-Diäten

  77. Pingback: Fabulous Finds II | Spatialists

  78. Pingback: The language of insignificance | Management Briefs

  79. Eugene Allevato

    I think we should just state the significance of the p-value as 0.049 or 0.051 and let the reader make a decision of how much risk the reader wants to assume.

  80. Reblogged this on Sciception and commented:
    Just when I was discussing significant p-values at work! Someone was insisting that values slightly above 0.05 are still believably significant… and then I found the exact terms used in this very list!

  81. Pingback: Links & misc #4 | Hypermagical Ultraomnipotence

  82. Why don’t authors ever claim that P=0.049 was approaching “non-significance”?

  83. Ah, the p-value… very useful if you are tossing a coin.

  84. Pingback: Bookmarks for October 14th | Chris's Digital Detritus

  85. Funny!!
    Thanks for making my online search for levels of significance surprisingly amusing.

  86. Pingback: Does your data 'hover on the brink of significance?' - an insignificant, but hilarious detour

  87. Reblogged this on Chaos Theory and Pharmacology and commented:

  88. This is precisely why “studies” are overemphasized as the number one “evidence” but can be easily skewed in interpretation. This is precisely why individual lived experience with what is being tested is more valuable. We need to hear more lived experience stories, precisely what it felt like, whether it helped, if anything went wrong, and if they’d recommend it to someone they loved. You can’t replicate any of that in a “study.” Yet these studies are the tools that are shaping medicine. Funny, the ones in power, the decision-makers, rarely ask the guinea pigs, nor do they want to hear these very real stories.

  89. I haven’t read all the comments, but found this whole post embarrassing, though some of the comments seem to understand the point I’m about to make.

    As some have pointed out the threshold of statistical significance is arbitrary. ARBITRARY – capiche?

    So as the p value rises from 0.000001 it becomes less statistically significant – that’s all you can say.

    Beyond that you need to use the most rigorous reasoning you can in the context you are in, consider the risks you are taking both to act on the knowledge you have or not to act. Sometimes – very rarely – the threshold between acting and not acting on the knowledge you have might occur at or around p = 0.05.

    The rest of the time, if you proceeded according to what I’ve set out, which ought to be the most basic commonsense, especially amongst those who have been through years of training and practice in these matters, you’ll come up with a different number.

    • I think you’ve missed the point.

      In all of these papers the authors adopted a threshold of 0.05: yes, it’s an arbitrary threshold but that’s the one they chose. Having declared that they would accept/reject based on a threshold of 0.05, the authors then fail to report a “non-significant” result as such.

      That’s all this post is about – it’s not about how you choose the significance threshold, nor whether a threshold for a binary decision is how it should be done.

      In short, it’s not the argument you think it is.

  90. “Having declared that they would accept/reject based on a threshold of 0.05, the authors then fail to report a “non-significant” result as such.”

    I don’t think that’s quite right, is it? Perhaps that should read “Having chosen to become academic researchers and submit to journals the referees of which tend to impose arbitrary thresholds on tests, the academics sought to verbally justify (usually) minor excesses of the threshold.”

    • That’s simultaneously quite a harsh accusation (academics knowingly publish incorrect interpretations of their analyses) and quite a generous one (their obfuscation is merely drawing attention to the arbitrariness of the threshold).

      • They don’t publish “knowingly incorrect” interpretations, they find ways of putting it that sound good. They lack candour – candour being the first casualty of bureaucracy.

      • But “they find ways of putting it that sound good” is the point of the post – so I don’t understand why you found it embarrassing.

  91. If p < .05 is significant, then is .05 < p < .10 your significant other?

  92. Just built a Shiny app based on the data in this blog post, have fun :)

  93. Reblogged this on Dr Geoff Kushnick and commented:
    Great list of ways to refer to “close to, but not really, significant results.” Given how much P values can jump around using samples from the same population, my suggestion is to give the actual P value and talk about the effect size. No need to describe the P value itself.

  94. Here’s Ziliak’s depressing latest on the state of affairs.

    Significance Controversy in the Past
    This is not the first time in history that statistical significance has been on trial. “Significance” was only a partial argument from odds from the beginning, Francis Ysidro Edgeworth (1885, p. 208), who coined the term, clearly perceived. Galton and Pearson saw in the test more security than they might have. But by 1905 Student himself—that is William Sealy Gosset aka “Student”, the inventor of Student’s t, and eventual Head Brewer of Guinness—warned in a letter to Karl Pearson about economic and other substantive losses that can be caused by following a bright line rule of statistical significance:

    When I first reported on the subject [of “The Application of the ‘Law of Error’ to the Work of the Brewery” (1904)], I thought that perhaps there might be some degree of probability which is conventionally treated as sufficient in such work as ours and I advised that some outside authority [such as you, Karl Pearson] should be consulted as to what certainty is required to aim at in large scale work. However it would appear that in such work as ours the degree of certainty to be aimed at must depend on the pecuniary advantage to be gained by following the result of the experiment, compared with the increased cost of the new method, if any, and the cost of each experiment (quoted in Ziliak 2008, p. 207).

    Student’s rejection of a bright-line accept-reject standard was echoed a few years on by Harvard psychologist Edwin Boring (1919), warning about the difference between substantive and merely statistical significance in psychological research. Yet mindless tests and uses of statistical significance raged on, heedless of warnings from its eminent discoverers.

  95. I’m finishing an essay on this subject (soon hopefully coming out as a self-published book) in an attempt to introduce students to these concepts. I used some examples on this page and it was a delight to find.

    I also came across a paper by Wood et al. called “Trap of trends to statistical significance: likelihood of near significant P value becoming more significant with extra data”, published in the BMJ. The paper is very relevant to the discussions surrounding p-values and this page.

    One of their points (directly quoted) is as follows: “Describing near significant P values as ‘trends towards significance’ (or similar) is not just inappropriate but actively misleading, as such P values would be quite likely to become less significant if extra data were collected”

    On the other hand they also mention that p-values around 0.05 show modest degrees of evidence, no matter which side of the threshold they fall. Which leads them to say calling (for example) p=0.06 an “interesting hint” may be a good choice.

    I also became aware of technical discussions between Bayesians and Frequentists on the subject. I’m no statistician but I hope to learn more about this subject since it is very interesting, not to mention the impact that it can have on inferential statistics and scientific reporting.
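The quoted claim, that a near-significant p-value is quite likely to become less significant with extra data, can be illustrated for one simple case. The sketch below is not the analysis from the Wood et al. paper. It assumes the null hypothesis is true, uses a one-sample z-test with known standard deviation 1, and doubles the sample size after a first result with 0.05 < p < 0.10. The function names and sample sizes are hypothetical choices.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def near_miss_followup(n_first=50, n_extra=50, n_studies=200_000, seed=7):
    """Track what happens to 'near misses' when more data are collected."""
    rng = random.Random(seed)
    near = less_significant = crossed = 0
    for _ in range(n_studies):
        # The sum of n independent N(0, 1) draws is N(0, n), so draw it directly.
        total = rng.gauss(0, n_first ** 0.5)
        p_first = two_sided_p(total / n_first ** 0.5)
        if not 0.05 < p_first < 0.10:
            continue  # keep only the near-significant first looks
        near += 1
        total += rng.gauss(0, n_extra ** 0.5)
        p_second = two_sided_p(total / (n_first + n_extra) ** 0.5)
        if p_second > p_first:
            less_significant += 1
        if p_second < 0.05:
            crossed += 1
    return near, less_significant / near, crossed / near


near, share_less, share_crossed = near_miss_followup()
print(f"near misses: {near}")
print(f"became less significant: {share_less:.1%}")
print(f"crossed below 0.05: {share_crossed:.1%}")
```

Under these assumptions most near misses drift away from the threshold rather than towards it. If a real effect exists the proportions change, so this shows only that a single p-value just above 0.05 carries no built-in direction of travel.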

  96. Pingback: First Post – Approaching Significance
