Still Not Significant

Posted on April 21, 2013 | 153 Comments

What to do if your p-value is just over the arbitrary threshold for ‘significance’ of p=0.05?

You don’t need to play the significance testing game – there are better methods, like quoting the effect size with a confidence interval – but if you do, the rules are simple: the result is either significant or it isn’t.
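As a rough illustration of the "effect size with a confidence interval" alternative, here is a minimal sketch in Python. The data are invented, the function name `mean_difference_ci` is hypothetical, and the interval uses a large-sample normal approximation (a t-based interval would be somewhat wider for samples this small), so treat it as a sketch under those assumptions rather than a recommended analysis.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(a, b, level=0.95):
    """Difference in means (a - b) with a normal-approximation confidence interval."""
    diff = mean(a) - mean(b)
    # Standard error of the difference, allowing unequal variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    # Two-sided critical value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical measurements, invented purely for illustration.
treated = [5.1, 6.3, 5.8, 6.9, 5.5, 6.1, 6.6, 5.9]
control = [5.0, 5.4, 5.9, 5.2, 6.0, 5.3, 5.7, 5.6]

diff, low, high = mean_difference_ci(treated, control)
print(f"difference = {diff:.2f}, 95% CI ({low:.2f}, {high:.2f})")
```

Reporting the estimate and its interval shows both the size of the effect and how precisely it was measured, which a bare "significant" or "non-significant" label does not.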

So if your p-value remains stubbornly higher than 0.05, you should call it ‘non-significant’ and write it up as such. The problem for many authors is that this just isn’t the answer they were looking for: publishing so-called ‘negative results’ is harder than publishing ‘positive results’.

The solution is to apply the time-honoured tactic of circumlocution to disguise the non-significant result as something more interesting. The following list is culled from peer-reviewed journal articles in which (a) the authors set themselves the threshold of 0.05 for significance, (b) failed to achieve that threshold value for p and (c) described it in such a way as to make it seem more interesting.

As well as being statistically flawed (results are either significant or not and can’t be qualified), the wording is linguistically interesting, often describing an aspect of the result that just doesn’t exist. For example, “a trend towards significance” expresses non-significance as some sort of motion towards significance, which it isn’t: there is no ‘trend’, in any direction, and nowhere for the trend to be ‘towards’.

Some further analysis will follow, but for now here is the list in full (UPDATE: now in alpha-order):

(barely) not statistically significant (p=0.052)
a barely detectable statistically significant difference (p=0.073)
a borderline significant trend (p=0.09)
a certain trend toward significance (p=0.08)
a clear tendency to significance (p=0.052)
a clear trend (p<0.09)
a clear, strong trend (p=0.09)
a considerable trend toward significance (p=0.069)
a decreasing trend (p=0.09)
a definite trend (p=0.08)
a distinct trend toward significance (p=0.07)
a favorable trend (p=0.09)
a favourable statistical trend (p=0.09)
a little significant (p<0.1)
a margin at the edge of significance (p=0.0608)
a marginal trend (p=0.09)
a marginal trend toward significance (p=0.052)
a marked trend (p=0.07)
a mild trend (p<0.09)
a moderate trend toward significance (p=0.068)
a near-significant trend (p=0.07)
a negative trend (p=0.09)
a nonsignificant trend (p<0.1)
a nonsignificant trend toward significance (p=0.1)
a notable trend (p<0.1)
a numerical increasing trend (p=0.09)
a numerical trend (p=0.09)
a positive trend (p=0.09)
a possible trend (p=0.09)
a possible trend toward significance (p=0.052)
a pronounced trend (p=0.09)
a reliable trend (p=0.058)
a robust trend toward significance (p=0.0503)
a significant trend (p=0.09)
a slight slide towards significance (p<0.20)
a slight tendency toward significance (p<0.08)
a slight trend (p<0.09)
a slight trend toward significance (p=0.098)
a slightly increasing trend (p=0.09)
a small trend (p=0.09)
a statistical trend (p=0.09)
a statistical trend toward significance (p=0.09)
a strong tendency towards statistical significance (p=0.051)
a strong trend (p=0.077)
a strong trend toward significance (p=0.08)
a substantial trend toward significance (p=0.068)
a suggestive trend (p=0.06)
a trend close to significance (p=0.08)
a trend significance level (p=0.08)
a trend that approached significance (p<0.06)
a very slight trend toward significance (p=0.20)
a weak trend (p=0.09)
a weak trend toward significance (p=0.12)
a worrying trend (p=0.07)
all but significant (p=0.055)
almost achieved significance (p=0-065)
almost approached significance (p=0.065)
almost attained significance (p<0.06)
almost became significant (p=0.06)
almost but not quite significant (p=0.06)
almost clinically significant (p<0.10)
almost insignificant (p>0.065)
almost marginally significant (p>0.05)
almost non-significant (p=0.083)
almost reached statistical significance (p=0.06)
almost significant (p=0.06)
almost significant tendency (p=0.06)
almost statistically significant (p=0.06)
an adverse trend (p=0.10)
an apparent trend (p=0.286)
an associative trend (p=0.09)
an elevated trend (p<0.05)
an encouraging trend (p<0.1)
an established trend (p<0.10)
an evident trend (p=0.13)
an expected trend (p=0.08)
an important trend (p=0.066)
an increasing trend (p<0.09)
an interesting trend (p=0.1)
an inverse trend toward significance (p=0.06)
an observed trend (p=0.06)
an obvious trend (p=0.06)
an overall trend (p=0.2)
an unexpected trend (p=0.09)
an unexplained trend (p=0.09)
an unfavorable trend (p<0.10)
appeared to be marginally significant (p<0.10)
approached acceptable levels of statistical significance (p=0.054)
approached but did not quite achieve significance (p>0.05)
approached but fell short of significance (p=0.07)
approached conventional levels of significance (p<0.10)
approached near significance (p=0.06)
approached our criterion of significance (p>0.08)
approached significant (p=0.11)
approached the borderline of significance (p=0.07)
approached the level of significance (p=0.09)
approached trend levels of significance (p0.05)
approached, but did reach, significance (p=0.065)
approaches but fails to achieve a customary level of statistical significance (p=0.154)
approaches statistical significance (p>0.06)
approaching a level of significance (p=0.089)
approaching an acceptable significance level (p=0.056)
approaching borderline significance (p=0.08)
approaching borderline statistical significance (p=0.07)
approaching but not reaching significance (p=0.53)
approaching clinical significance (p=0.07)
approaching close to significance (p<0.1)
approaching conventional significance levels (p=0.06)
approaching conventional statistical significance (p=0.06)
approaching formal significance (p=0.1052)
approaching independent prognostic significance (p=0.08)
approaching marginal levels of significance (p<0.107)
approaching marginal significance (p=0.064)
approaching more closely significance (p=0.06)
approaching our preset significance level (p=0.076)
approaching prognostic significance (p=0.052)
approaching significance (p=0.09)
approaching the traditional significance level (p=0.06)
approaching to statistical significance (p=0.075)
approaching, although not reaching, significance (p=0.08)
approaching, but not reaching, significance (p<0.09)
approximately significant (p=0.053)
approximating significance (p=0.09)
arguably significant (p=0.07)
as good as significant (p=0.0502)
at the brink of significance (p=0.06)
at the cusp of significance (p=0.06)
at the edge of significance (p=0.055)
at the limit of significance (p=0.054)
at the limits of significance (p=0.053)
at the margin of significance (p=0.056)
at the margin of statistical significance (p<0.07)
at the verge of significance (p=0.058)
at the very edge of significance (p=0.053)
barely below the level of significance (p=0.06)
barely escaped statistical significance (p=0.07)
barely escapes being statistically significant at the 5% risk level (0.1>p>0.05)
barely failed to attain statistical significance (p=0.067)
barely fails to attain statistical significance at conventional levels (p<0.10)
barely insignificant (p=0.075)
barely missed statistical significance (p=0.051)
barely missed the commonly acceptable significance level (p<0.053)
barely outside the range of significance (p=0.06)
barely significant (p=0.07)
below (but verging on) the statistical significant level (p>0.05)
better trends of improvement (p=0.056)
bordered on a statistically significant value (p=0.06)
bordered on being significant (p>0.07)
bordered on being statistically significant (p=0.0502)
bordered on but was not less than the accepted level of significance (p>0.05)
bordered on significant (p=0.09)
borderline conventional significance (p=0.051)
borderline level of statistical significance (p=0.053)
borderline significant (p=0.09)
borderline significant trends (p=0.099)
close to a marginally significant level (p=0.06)
close to being significant (p=0.06)
close to being statistically significant (p=0.055)
close to borderline significance (p=0.072)
close to the boundary of significance (p=0.06)
close to the level of significance (p=0.07)
close to the limit of significance (p=0.17)
close to the margin of significance (p=0.055)
close to the margin of statistical significance (p=0.075)
closely approaches the brink of significance (p=0.07)
closely approaches the statistical significance (p=0.0669)
closely approximating significance (p>0.05)
closely not significant (p=0.06)
closely significant (p=0.058)
close-to-significant (p=0.09)
did not achieve conventional threshold levels of statistical significance (p=0.08)
did not exceed the conventional level of statistical significance (p<0.08)
did not quite achieve acceptable levels of statistical significance (p=0.054)
did not quite achieve significance (p=0.076)
did not quite achieve the conventional levels of significance (p=0.052)
did not quite achieve the threshold for statistical significance (p=0.08)
did not quite attain conventional levels of significance (p=0.07)
did not quite reach a statistically significant level (p=0.108)
did not quite reach conventional levels of statistical significance (p=0.079)
did not quite reach statistical significance (p=0.063)
did not reach the traditional level of significance (p=0.10)
did not reach the usually accepted level of clinical significance (p=0.07)
difference was apparent (p=0.07)
direction heading towards significance (p=0.10)
does not appear to be sufficiently significant (p>0.05)
does not narrowly reach statistical significance (p=0.06)
does not reach the conventional significance level (p=0.098)
effectively significant (p=0.051)
equivocal significance (p=0.06)
essentially significant (p=0.10)
extremely close to significance (p=0.07)
failed to reach significance on this occasion (p=0.09)
failed to reach statistical significance (p=0.06)
fairly close to significance (p=0.065)
fairly significant (p=0.09)
falls just short of standard levels of statistical significance (p=0.06)
fell (just) short of significance (p=0.08)
fell barely short of significance (p=0.08)
fell just short of significance (p=0.07)
fell just short of statistical significance (p=0.12)
fell just short of the traditional definition of statistical significance (p=0.051)
fell marginally short of significance (p=0.07)
fell narrowly short of significance (p=0.0623)
fell only marginally short of significance (p=0.0879)
fell only short of significance (p=0.06)
fell short of significance (p=0.07)
fell slightly short of significance (p>0.0167)
fell somewhat short of significance (p=0.138)
felt short of significance (p=0.07)
flirting with conventional levels of significance (p>0.1)
heading towards significance (p=0.086)
highly significant (p=0.09)
hint of significance (p>0.05)
hovered around significance (p = 0.061)
hovered at nearly a significant level (p=0.058)
hovering closer to statistical significance (p=0.076)
hovers on the brink of significance (p=0.055)
in the edge of significance (p=0.059)
in the verge of significance (p=0.06)
inconclusively significant (p=0.070)
indeterminate significance (p=0.08)
indicative significance (p=0.08)
is just outside the conventional levels of significance
just about significant (p=0.051)
just above the arbitrary level of significance (p=0.07)
just above the margin of significance (p=0.053)
just at the conventional level of significance (p=0.05001)
just barely below the level of significance (p=0.06)
just barely failed to reach significance (p<0.06)
just barely insignificant (p=0.11)
just barely statistically significant (p=0.054)
just beyond significance (p=0.06)
just borderline significant (p=0.058)
just escaped significance (p=0.07)
just failed significance (p=0.057)
just failed to be significant (p=0.072)
just failed to reach statistical significance (p=0.06)
just failing to reach statistical significance (p=0.06)
just fails to reach conventional levels of statistical significance (p=0.07)
just lacked significance (p=0.053)
just marginally significant (p=0.0562)
just missed being statistically significant (p=0.06)
just missing significance (p=0.07)
just on the verge of significance (p=0.06)
just outside accepted levels of significance (p=0.06)
just outside levels of significance (p<0.08)
just outside the bounds of significance (p=0.06)
just outside the conventional levels of significance (p=0.1076)
just outside the level of significance (p=0.0683)
just outside the limits of significance (p=0.06)
just outside the traditional bounds of significance (p=0.06)
just over the limits of statistical significance (p=0.06)
just short of significance (p=0.07)
just shy of significance (p=0.053)
just skirting the boundary of significance (p=0.052)
just tendentially significant (p=0.056)
just tottering on the brink of significance at the 0.05 level
just very slightly missed the significance level (p=0.086)
leaning towards significance (p=0.15)
leaning towards statistical significance (p=0.06)
likely to be significant (p=0.054)
loosely significant (p=0.10)
marginal significance (p=0.07)
marginally and negatively significant (p=0.08)
marginally insignificant (p=0.08)
marginally nonsignificant (p=0.096)
marginally outside the level of significance
marginally significant (p>=0.1)
marginally significant tendency (p=0.08)
marginally statistically significant (p=0.08)
may not be significant (p=0.06)
medium level of significance (p=0.051)
mildly significant (p=0.07)
missed narrowly statistical significance (p=0.054)
moderately significant (p>0.11)
modestly significant (p=0.09)
narrowly avoided significance (p=0.052)
narrowly eluded statistical significance (p=0.0789)
narrowly escaped significance (p=0.08)
narrowly evaded statistical significance (p>0.05)
narrowly failed significance (p=0.054)
narrowly missed achieving significance (p=0.055)
narrowly missed overall significance (p=0.06)
narrowly missed significance (p=0.051)
narrowly missed standard significance levels (p<0.07)
narrowly missed the significance level (p=0.07)
narrowly missing conventional significance (p=0.054)
near limit significance (p=0.073)
near miss of statistical significance (p>0.1)
near nominal significance (p=0.064)
near significance (p=0.07)
near to statistical significance (p=0.056)
near/possible significance (p=0.0661)
near-borderline significance (p=0.10)
near-certain significance (p=0.07)
nearing significance (p<0.051)
nearly acceptable level of significance (p=0.06)
nearly approaches statistical significance (p=0.079)
nearly borderline significance (p=0.052)
nearly negatively significant (p<0.1)
nearly positively significant (p=0.063)
nearly reached a significant level (p=0.07)
nearly reaching the level of significance (p<0.06)
nearly significant (p=0.06)
nearly significant tendency (p=0.06)
nearly, but not quite significant (p>0.06)
near-marginal significance (p=0.18)
near-significant (p=0.09)
near-to-significance (p=0.093)
near-trend significance (p=0.11)
nominally significant (p=0.08)
non-insignificant result (p=0.500)
non-significant in the statistical sense (p>0.05)
not absolutely significant but very probably so (p>0.05)
not as significant (p=0.06)
not clearly significant (p=0.08)
not completely significant (p=0.07)
not completely statistically significant (p=0.0811)
not conventionally significant (p=0.089), but..
not currently significant (p=0.06)
not decisively significant (p=0.106)
not entirely significant (p=0.10)
not especially significant (p>0.05)
not exactly significant (p=0.052)
not extremely significant (p<0.06)
not formally significant (p=0.06)
not fully significant (p=0.085)
not globally significant (p=0.11)
not highly significant (p=0.089)
not insignificant (p=0.056)
not markedly significant (p=0.06)
not moderately significant (P>0.20)
not non-significant (p>0.1)
not numerically significant (p>0.05)
not obviously significant (p>0.3)
not overly significant (p>0.08)
not quite borderline significance (p>=0.089)
not quite reach the level of significance (p=0.07)
not quite significant (p=0.118)
not quite within the conventional bounds of statistical significance (p=0.12)
not reliably significant (p=0.091)
not remarkably significant (p=0.236)
not significant by common standards (p=0.099)
not significant by conventional standards (p=0.10)
not significant by traditional standards (p<0.1)
not significant in the formal statistical sense (p=0.08)
not significant in the narrow sense of the word (p=0.29)
not significant in the normally accepted statistical sense (p=0.064)
not significantly significant but..clinically meaningful (p=0.072)
not statistically quite significant (p<0.06)
not strictly significant (p=0.06)
not strictly speaking significant (p=0.057)
not technically significant (p=0.06)
not that significant (p=0.08)
not to an extent that was fully statistically significant (p=0.06)
not too distant from statistical significance at the 10% level
not too far from significant at the 10% level
not totally significant (p=0.09)
not unequivocally significant (p=0.055)
not very definitely significant (p=0.08)
not very definitely significant from the statistical point of view (p=0.08)
not very far from significance (p<0.092)
not very significant (p=0.1)
not very statistically significant (p=0.10)
not wholly significant (p>0.1)
not yet significant (p=0.09)
not strongly significant (p=0.08)
noticeably significant (p=0.055)
on the border of significance (p=0.063)
on the borderline of significance (p=0.0699)
on the borderlines of significance (p=0.08)
on the boundaries of significance (p=0.056)
on the boundary of significance (p=0.055)
on the brink of significance (p=0.052)
on the cusp of conventional statistical significance (p=0.054)
on the cusp of significance (p=0.058)
on the edge of significance (p>0.08)
on the limit to significant (p=0.06)
on the margin of significance (p=0.051)
on the threshold of significance (p=0.059)
on the verge of significance (p=0.053)
on the very borderline of significance (0.05<p<0.06)
on the very fringes of significance (p=0.099)
on the very limits of significance (0.1>p>0.05)
only a little short of significance (p>0.05)
only just failed to meet statistical significance (p=0.051)
only just insignificant (p>0.10)
only just missed significance at the 5% level
only marginally fails to be significant at the 95% level (p=0.06)
only marginally nearly insignificant (p=0.059)
only marginally significant (p=0.9)
only slightly less than significant (p=0.08)
only slightly missed the conventional threshold of significance (p=0.062)
only slightly missed the level of significance (p=0.058)
only slightly missed the significance level (p=0·0556)
only slightly non-significant (p=0.0738)
only slightly significant (p=0.08)
partial significance (p>0.09)
partially significant (p=0.08)
partly significant (p=0.08)
perceivable statistical significance (p=0.0501)
possible significance (p<0.098)
possibly marginally significant (p=0.116)
possibly significant (0.05<p>0.10)
possibly statistically significant (p=0.10)
potentially significant (p>0.1)
practically significant (p=0.06)
probably not experimentally significant (p=0.2)
probably not significant (p>0.25)
probably not statistically significant (p=0.14)
probably significant (p=0.06)
provisionally significant (p=0.073)
quasi-significant (p=0.09)
questionably significant (p=0.13)
quite close to significance at the 10% level (p=0.104)
quite significant (p=0.07)
rather marginal significance (p>0.10)
reached borderline significance (p=0.0509)
reached near significance (p=0.07)
reasonably significant (p=0.07)
remarkably close to significance (p=0.05009)
resides on the edge of significance (p=0.10)
roughly significant (p>0.1)
scarcely significant (0.05<p>0.1)
significant at the .07 level
significant tendency (p=0.09)
significant to some degree (0<p>1)
significant, or close to significant effects (p=0.08, p=0.05)
significantly better overall (p=0.051)
significantly significant (p=0.065)
similar but not nonsignificant trends (p>0.05)
slight evidence of significance (0.1>p>0.05)
slight non-significance (p=0.06)
slight significance (p=0.128)
slight tendency toward significance (p=0.086)
slightly above the level of significance (p=0.06)
slightly below the level of significance (p=0.068)
slightly exceeded significance level (p=0.06)
slightly failed to reach statistical significance (p=0.061)
slightly insignificant (p=0.07)
slightly less than needed for significance (p=0.08)
slightly marginally significant (p=0.06)
slightly missed being of statistical significance (p=0.08)
slightly missed statistical significance (p=0.059)
slightly missed the conventional level of significance (p=0.061)
slightly missed the level of statistical significance (p<0.10)
slightly missed the margin of significance (p=0.051)
slightly not significant (p=0.06)
slightly outside conventional statistical significance (p=0.051)
slightly outside the margins of significance (p=0.08)
slightly outside the range of significance (p=0.09)
slightly outside the significance level (p=0.077)
slightly outside the statistical significance level (p=0.053)
slightly significant (p=0.09)
somewhat marginally significant (p>0.055)
somewhat short of significance (p=0.07)
somewhat significant (p=0.23)
somewhat statistically significant (p=0.092)
strong trend toward significance (p=0.08)
sufficiently close to significance (p=0.07)
suggestive but not quite significant (p=0.061)
suggestive of a significant trend (p=0.08)
suggestive of statistical significance (p=0.06)
suggestively significant (p=0.064)
tailed to insignificance (p=0.1)
tantalisingly close to significance (p=0.104)
technically not significant (p=0.06)
teetering on the brink of significance (p=0.06)
tend to significant (p>0.1)
tended to approach significance (p=0.09)
tended to be significant (p=0.06)
tended toward significance (p=0.13)
tendency toward significance (p approaching 0.1)
tendency toward statistical significance (p=0.07)
tends to approach significance (p=0.12)
tentatively significant (p=0.107)
too far from significance (p=0.12)
trend bordering on statistical significance (p=0.066)
trend in a significant direction (p=0.09)
trend in the direction of significance (p=0.089)
trend significance level (p=0.06)
trend toward (p>0.07)
trending towards significance (p>0.15)
trending towards significant (p=0.099)
uncertain significance (p>0.07)
vaguely significant (p>0.2)
verged on being significant (p=0.11)
verging on significance (p=0.056)
verging on the statistically significant (p<0.1)
verging-on-significant (p=0.06)
very close to approaching significance (p=0.060)
very close to significant (p=0.11)
very close to the conventional level of significance (p=0.055)
very close to the cut-off for significance (p=0.07)
very close to the established statistical significance level of p=0.05 (p=0.065)
very close to the threshold of significance (p=0.07)
very closely approaches the conventional significance level (p=0.055)
very closely brushed the limit of statistical significance (p=0.051)
very narrowly missed significance (p<0.06)
very nearly significant (p=0.0656)
very slightly non-significant (p=0.10)
very slightly significant (p<0.1)
virtually significant (p=0.059)
weak significance (p>0.10)
weakened..significance (p=0.06)
weakly non-significant (p=0.07)
weakly significant (p=0.11)
weakly statistically significant (p=0.0557)
well-nigh significant (p=0.11)


153 responses to “Still Not Significant”

Mr Epidemiology | April 23, 2013 at 2:44 pm | Reply

Reblogged this on Mr Epidemiology and commented:
A handy alphabetized list of various different ways of stating your results when p > .05! I think my favourites are “teetering on the brink of significance (P=0.06)” and “not significant in the narrow sense of the word (P=0.29)”
Daniel Ezra Johnson | May 24, 2013 at 12:48 am | Reply

Funny but read this: http://library.mpib-berlin.mpg.de/ft/gg/GG_Null_2004.pdf
katiedid | May 24, 2013 at 11:22 pm | Reply

I love the 0.23 as “somewhat significant”. Um, no.
Chad Jones (@TheCollapsedPsi) | May 25, 2013 at 7:24 am | Reply

I would love to see the list sorted by p value instead of alphabetically.
- wyowanderer | April 13, 2015 at 7:26 pm | Reply
  
  Call it two requests…
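For anyone wanting to try the sort requested above, here is a minimal sketch in Python. It uses a handful of entries copied from the list, and it makes two loudly stated assumptions: an inequality such as `p<0.1` is sorted by its bound, and entries with no parseable p-value are pushed to the end. The names `P_VALUE` and `p_key` are hypothetical, and the pattern only handles the simple `(p=…)`, `(p<…)` and `(p>…)` forms, not oddities like `(0.1>p>0.05)`.

```python
import re

# A handful of entries copied from the list in the post.
entries = [
    "a robust trend toward significance (p=0.0503)",
    "an apparent trend (p=0.286)",
    "teetering on the brink of significance (p=0.06)",
    "not significant in the narrow sense of the word (p=0.29)",
    "somewhat significant (p=0.23)",
    "a little significant (p<0.1)",
    "just tottering on the brink of significance at the 0.05 level",
]

# Matches "(p=0.06)", "(p<0.1)", "(p > 0.05)" and similar simple forms.
P_VALUE = re.compile(r"\(\s*p\s*(?:<=|>=|=|<|>)\s*(\d*\.\d+)\s*\)", re.IGNORECASE)


def p_key(entry):
    """Sort key: the quoted p-value (or its bound), or infinity if none is found."""
    match = P_VALUE.search(entry)
    return float(match.group(1)) if match else float("inf")


by_p = sorted(entries, key=p_key)
for entry in by_p:
    print(entry)
```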
Mark Brewer | May 25, 2013 at 10:27 am | Reply

Thanks for the chuckle – the list is indeed amusing, but the key point above is that the p-value threshold is arbitrary. This fact is now widely accepted, so a strict dichotomy between “significant” and “non-significant” no longer makes sense. It is a bit of a fudge – and one completely unnecessary if (e.g.) a Bayesian approach is adopted – but we prefer to see “significance” as a continuum; phrases such as “marginally significant” represent uncertainty in the threshold location and therefore do make some sort of sense.

It is *always* the case that calling p=0.049 “significant” and p=0.051 “non-significant” is just plain silly.
- pdiff | May 25, 2013 at 5:56 pm | Reply
  
  Good reply, Mark. I concur completely and often tell clients to use the term marginally significant for values close to 0.05 (on either side). It is better that they talk about these things than just sweep them under the rug and ignore them because they are “not significant”.
  - Joanne Yaffe | May 26, 2015 at 9:00 pm |
    
    Significance is not really the important question. How important is the finding? Report effect sizes!
- mchankins | May 25, 2013 at 9:04 pm | Reply
  
  Thanks for the comment. I think there is confusion over the threshold being arbitrary, i.e. 0.05 rather than 0.06, and the arbitrariness of having a threshold at all.
  If we agree that there isn’t really a need for a threshold and just discuss the p-values directly, then ‘significant’ and ‘marginally significant’ both become meaningless.
  - jlee | October 14, 2016 at 1:01 am |
    
    If “significance” is defined as 0.05, then it seems logical that 0.06 would be “almost significant,” just like 0.06 is almost 0.05. Just like scoring 99% on a test is like almost getting 100%. It’s like those carnival games where you pound the giant hammer on the scale, and if you pound it hard enough, the ball hits the bell and you win the prize. If someone pounds the hammer but the ball just barely runs out of energy before hitting the bell, then I think most people would say that the contestant almost won the prize.
  - David | May 11, 2019 at 5:03 am |
    
    “If someone pounds the hammer but the ball just barely runs out of energy before hitting the bell, then I think most people would say that the contestant almost won the prize.” Yes. The word “almost” has a meaning that isn’t wiped out when you are writing about statistics or any other subject. Of the many formulations listed, some seem like fudging and others are just factual. The blogger makes no distinction, and states flatly, as self-evident, that results are significant or not and “can’t be qualified.” Gee, why is that?
Simon Langley-Evans | May 25, 2013 at 5:41 pm | Reply

Reblogged this on JHND NOTES: The Journal of Human Nutrition and Dietetics Editor's Blog and commented:
Prospective authors and students take note. Not significant means not significant, no matter how much you wish it otherwise.
- Jackson D. | December 28, 2013 at 4:39 am | Reply
  
  Sorry, a result that is not statistically significant can indeed have practical or clinical significance. It’s the flip side of the typical admonishment about statistical significance not necessarily having practical significance. So, it has nothing to do with wishes. It’s odd to hold your viewpoint when NHST itself will reveal that a test statistic that results in a p less than .05 is not significantly different from many values of a test statistic that result in p values greater than .05. It seems that the wishing is that the values on each side of the .05 fence are actually different from one another. You can wish all you want, but a great number of them are not, by your own criterion.
  - Jackson D. | December 28, 2013 at 4:42 am |
    
    That should read: ” that a test statistic that results in a p less than .05 is not significantly different from many values of a test statistic that result in p values greater than .05. ”
    
    I must have deleted the end of the sentence while editing.
  - Rafael | September 29, 2014 at 9:14 am |
    
    I do not agree. The problem is that a result that is not statistically significant is a “no result” or, perhaps better, the absence of a result. It doesn’t make much sense to talk about the practical significance of something whose existence has not been shown. This situation is not the mirror image of the other one, in which we discuss the practical importance of an effect that has been demonstrated (is statistically significant).
Robert King | May 27, 2013 at 2:05 pm | Reply

in APA there should not be a 0 before the decimal point
Wordcloud | June 3, 2013 at 3:02 am | Reply

I made a word cloud of the list with all the variations of the word “significant” removed and creative spelling standardized:
http://www.wordle.net/show/wrdl/6789217/Still_Not_Significant
kevin denny | June 12, 2013 at 10:52 am | Reply

To say a result is either significant or not is glib. First of all, many people, especially non-scientists, will confuse statistical significance with substantive significance (economic, psychological, whatever). So what if an effect is statistically significant but tiny? Statistical “insignificance” means you can’t reject the null of 0, but you can’t reject other hypotheses either: so why the fetish about one? The standard error tells you how *precisely* determined the result is.
Say you measure two effects, A with size 1 and confidence interval (-1, 3) and B with size 0.5 and CI (0.3, 0.8). Would you really conclude that B has a bigger effect than A? This would be silly, but the practice of only counting “significant” results commonly leads to this.
- Rafael | September 29, 2014 at 9:25 am | Reply
  
  What I conclude is that, most likely, there is an effect “B” of positive sign, and that it is much less likely that there is an effect “A” of positive sign. I believe it is not appropriate to compare the magnitude of an effect that reasonably exists with that of one whose existence I am not sufficiently sure of.
Aaron Levitt | October 8, 2013 at 10:05 pm | Reply

Truly, G-d loves the .06 nearly as much as the .05!
Kay | November 27, 2013 at 9:21 pm | Reply

@Aaron Levitt–I agree! Love to know the journals these were published in.
rasmusab | February 13, 2014 at 1:07 pm | Reply

I incorporated your list in a test of significance (implemented in R). Every time the p-value is between 0.12-0.5 it randomly selects one of your “p excuses” 🙂

http://sumsar.net/blog/2014/02/a-significantly-improved-test/
mimiryudo | March 31, 2014 at 1:04 pm | Reply

Reblogged this on Le blog de Michaël.
Matt | March 31, 2014 at 1:33 pm | Reply

I’ve never understood the statisticians’ overly dogmatic objections to the way these p-values are discussed. All a p-value of 0.05 means is there’s a 95% chance that the hypothesis was indeed working. A p-value of 0.051 means there was a 94.9% chance that the hypothesis was working. I agree that if one prespecifies 0.05 as the threshold then, yes, the p-value of 0.051 is not significant. But is saying it was “almost significant” such a travesty? Statisticians often treat a p-value of 0.051 the same as a p-value of 0.70…which makes little sense to anyone with some connection to logic. Should, for example, the FDA be pretty strictly inflexible on p-values? Yes!! But please relax and use common sense when talking about the way some talk about these statistics. And, no, I’m not defending the p=0.29 example!! 🙂
- John | June 16, 2014 at 8:28 pm | Reply
  
  No, the p-value of 0.05 doesn’t mean what you’re saying at all. You’re committing a logical fallacy. It tells you about the probability of your results if the null were actually true and has little to nothing to say about the probability that your alternative is true. Furthermore, statistical tests have philosophies that underpin the very nature of the test and what meaning you can take out of them. The 0.05 is arbitrary, yes, but modifying which values are important after you calculate them completely changes their meaning.
  What you do imply is that you want a couple of different kinds of uses of p-values and that’s fine (although it’s not great as an evidence measure, it’s been used as one); but people need to be clear a priori how they’re using them. Thus, statements like “almost significant” usually have very little meaning because they’re post hoc efforts to cram one philosophy of statistics into another. State at the outset that your p-value is a measure of evidence, that you have no pre-conceived test per se, don’t mention testing, and you’d be fine using 0.06 in a qualitative statement about how believable the null is. But trying to do that afterwards corrupts the value of doing any testing at all.
  You might want to look at Gigerenzer’s “Mindless Statistics”.
  - David | September 26, 2014 at 5:11 am |
    
    This stems from the hybrid logic used in psychology in general. Rather than using a Fisherian “Report P-observed, replicate” or a Neyman-Pearsonian “Fix alpha, set sample size sufficient to detect departures of interest”, we have a “Pseudo fix alpha, mumble something about significance, complain a lot about the procedure not detecting a difference despite not doing prospective investigation about how the procedure would do”.
  - Amanda | January 7, 2015 at 9:16 pm |
    
    THANK YOU. I was starting to rip my hair out reading many of the other responses.
  - Mark R | October 7, 2015 at 7:38 pm |
    
    Bingo! You nailed it, John! The p-value has NOTHING to do with the alternate hypothesis. It’s easy to demonstrate. Generate a list of 1000 random numbers in XL in column A. Then extend this column across the spreadsheet for 1000 columns (so you have 1000×1000 random numbers). Then plug this dataset into a stats program (I used JMP) and ask the software to detect “significant associations” between the outcome variable (column A) and any other predictor variable (all other individual columns). Amazingly (NOT!!!) 5% of the “predictor variables” will have a significant association with the outcome variable (at P<0.05, and some way less than that). Now think back – you created random sets of numbers. There is NO way that the data in Column XXX is associated with Column A. And that's because the p-value simply told us that this result occurred by chance.
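The spreadsheet experiment described above can be approximated in a few lines of Python. This is only a rough sketch, not what was run in XL and JMP: it uses fewer rows to keep the run short, it assumes a Pearson correlation tested with the Fisher z approximation (the comment does not say which test JMP applied), and the names `pearson_r`, `two_sided_p` and `fraction_significant` are invented for the illustration.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_sided_p(r, n):
    """Approximate two-sided p-value for r via the Fisher z transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def fraction_significant(n_rows=300, n_predictors=999, alpha=0.05, seed=1):
    """Share of pure-noise predictor columns 'associated' with a noise outcome."""
    rng = random.Random(seed)
    outcome = [rng.random() for _ in range(n_rows)]  # "column A"
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.random() for _ in range(n_rows)]  # unrelated noise
        if two_sided_p(pearson_r(outcome, predictor), n_rows) < alpha:
            hits += 1
    return hits / n_predictors
```

Calling `fraction_significant()` should return a value close to `alpha`, i.e. roughly one noise column in twenty comes out "significant" even though no column is related to the outcome.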
couchpsychologist | April 23, 2014 at 5:10 am | Reply

This and “Marginally Significant” reblogged on http://couchpsychologist.wordpress.com/2014/04/23/the-futility-of-significance-statistical-that-is/.
Tapio Branvinn | April 27, 2014 at 3:40 pm | Reply

“(results are either significant or not and can’t be qualified)”

How’s that? There’s no reason why you would have to make a clear cut decision, for or against, in a scientific paper. It’s perfectly legitimate to report that a p-value of 0.051 for example provides weak (but clearly inconclusive) evidence against the H0.
- mchankins | April 27, 2014 at 5:40 pm | Reply
  
  But in all the examples the authors elected to do exactly that: make a clear-cut decision based on a threshold they themselves chose.
- Rafael | September 29, 2014 at 9:42 am | Reply
  
  Sometimes we need clear and predefined cut-off points. For example, the decision on public funding of a medication can be based on an “adequate” proof of the existence of an effect (the significance compared to H0 or, what is the same, the limits of the CI95%) of a magnitude “sufficient” (the difference with the control group).
Akshat Rathi | June 2, 2014 at 10:49 pm | Reply

Do you have a list of references from where you picked up this phrasing?
- mchankins | June 2, 2014 at 11:12 pm | Reply
  
  Yes – all of the expressions are from journal articles
  - Akshat Rathi | June 2, 2014 at 11:17 pm |
    
    Indeed. Do you have a list of those references matched to the expressions? Would be handy.
  - mchankins | June 4, 2014 at 8:58 pm |
    
    It would be a lengthy and joyless task to assemble them, but I might get around to it one day
oetpay | June 3, 2014 at 4:10 pm | Reply

In psychology, given how frequently we’re doing things like assuming normality when what we’ve got is approximate normality, it is definitely worth reporting nearly-significant results as such; minor violations of our assumptions and similar things are ubiquitous, so .002 away from significance is absolutely not the same as .5 away. For exactly the same reason that .001 and .05 are two different significance levels and we don’t just say “significant”.
- A fellow psychologist | January 28, 2017 at 2:20 pm | Reply
  
  Actually, that is precisely why it is even more flawed: the strictness of the test has already been lowered, and lowering it even further by accepting results above 0.05 makes the tests almost meaningless, since the probability of interpreting a random difference as a real effect rises quite a lot.
  
  The issue here is that the authors chose in the beginning to follow the statistical testing standards, including the 0.05 significance border, in order to gain some degree of confidence in their results. They present the results within this “frame”, but they actually fail to stick to the rules. Practically, they are misleading their readers: they rely on a statistical testing practice (which they don’t abide by) to make their claims more convincing.
  
  While I agree that 0.002 is not the same as 0.5, there is again an issue of how far from 0.05 is too far to be considered a marginal difference. Marginally significant results are a valid issue indeed, but I guess that a replication study to confirm the questionable result, when possible, would be a far better solution than bending the data to fit the desired interpretation. In any case, you can’t label something as “a trend towards significance” and then treat it in further text as statistically significant (and usually not even return to the fact that the results are not significant in the discussion).
  
  By the way, we do have 0.01 and 0.001, but there is no level of significance higher than 0.05.
Andrew M. Byrne (@ByrneJournal) | June 5, 2014 at 8:14 pm | Reply

Can I make a t-shirt of this list? I promise not to wear it to dissertation defenses.
Richard Warner | August 15, 2014 at 1:14 am | Reply

Curious what anyone would have to say about this article by David Healy: http://apt.rcpsych.org/content/12/5/320.full.pdf+html
kevinwang09 | September 19, 2014 at 7:31 am | Reply

Reblogged this on Gauss17gon and commented:
A must for a stats student starting research!
diagnosis kanker usus 12 jari | October 8, 2014 at 5:51 pm | Reply

This blog was… how do I say it? Relevant!! Finally I’ve found something which helped me.
Many thanks!
Alfonso Freda | November 3, 2014 at 7:18 pm | Reply

I almost understand…………………………………………………………………….
Alfonso Freda | November 3, 2014 at 7:22 pm | Reply

more information needed………………………………………..lol
Tom Martin | November 24, 2014 at 2:54 pm | Reply

Hi there – was this review ever published formally? I’d love to cite it if it were published in a journal somewhere.

Thanks!
- mchankins | November 25, 2014 at 8:31 am | Reply
  
  You’re right, I really should get around to writing this up…it can be cited as a blog entry though.
Nikki | December 6, 2014 at 7:09 pm | Reply

Thank you for this blog – just what I needed!
Tim Fischer | December 12, 2014 at 3:21 am | Reply

I feel that this article is a bit misguided (although the list is funny). In the real world there is no single godlike level for alpha beyond which all p values are meaningless. Surely we can decide the level we are willing to accept for alpha based on our knowledge of the experiment and the data or we can ignore any arbitrary cutoff completely and just report the actual p value and explain what it means in a way the reader can understand?
- mchankins | December 12, 2014 at 7:56 pm | Reply
  
  Well, that’s sort of the point: all the studies here had 0.05 as their level of significance. So it is arbitrary, but the authors all chose it. And then decided it was a movable feast only when the results weren’t what they were hoping for. The wording wasn’t chosen to help the reader understand; it’s a rhetorical device to mislead the reader.
wakecarter | January 26, 2015 at 1:14 pm | Reply

I encourage my psychology students to comment on non-significant trends if p<.10 (unless they already have enough significant results to address). If the result is over .05 then it isn't significant at the accepted level but does provide weak evidence for an effect (i.e. weaker than if the p value was significant).

Following today's xkcd (1478) I'm wondering if there's a better way of putting it. However, I don't like most of the phrases in the above list, so maybe I should just stick with the ones I already use (non-significant trend, weak evidence for an effect).
- mchankins | January 26, 2015 at 7:43 pm | Reply
  
  I think that would be a mistake, starting with the notion of ‘non-significant trends’. There’s no ‘trend’ at all, just a ‘near-miss’, which is not the same thing. ‘Trend’ implies some movement towards significance, and you don’t get this from a single p-value. ‘Non-significant trend’ just means the same thing as ‘non-significant’, and ‘weak evidence for an effect’ is misleading. The distribution of p under the null hypothesis is flat, i.e. a p value of 0.06 is just as likely as 0.96; neither is a ‘trend’.
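  The claim that p is flat under the null can be checked by simulation. The following is a minimal sketch under stated assumptions, not anything from the original post: it assumes a two-sided z-test on normally distributed data with known variance and a true mean of 0, and the names `null_p_values` and `decile_shares` are made up for the example.

```python
import math
import random
from statistics import NormalDist


def null_p_values(n_experiments=5000, n_per_sample=20, seed=42):
    """Two-sided z-test p-values for samples whose true mean really is 0."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_experiments):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_per_sample)]
        # With known sd = 1, the test statistic is mean * sqrt(n).
        z = (sum(sample) / n_per_sample) * math.sqrt(n_per_sample)
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return p_values


def decile_shares(p_values):
    """Share of p-values in [0, 0.1), [0.1, 0.2), ..., [0.9, 1.0]."""
    counts = [0] * 10
    for p in p_values:
        counts[min(int(p * 10), 9)] += 1
    return [c / len(p_values) for c in counts]
```

  `decile_shares(null_p_values())` should give ten shares that are all close to 0.10: when the null is true, a p-value near 0.06 turns up about as often as one near 0.96.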
  - wakecarter | January 27, 2015 at 6:45 pm |
    
    Whether or not there is a trend in the data (e.g. one mean is different from the others) is separate from the significance test which calculates the probability that the observed trend might have been due to chance, if the null hypothesis of no differences is correct. A value just above .05 still means that the null hypothesis is likely to be incorrect, but the evidence is weaker than the accepted cut-off.
  - mchankins | January 27, 2015 at 7:33 pm |
    
    That’s correct, but then what is the cut-off for? Treating the p-value as a sliding scale is justifiable, but the concept of ‘significance’ is then unnecessary. It’s inconsistent to choose a cut-off and then only apply it if the results go the way we want: ‘weak evidence for an effect’ is not the same as ‘by the standards set in this experiment the evidence was insufficient to conclude that there was an effect’.
    I misread your phrasing on ‘trend’ as being applied to the significance, but I think the same argument applies (and also, I wouldn’t describe a difference in means as a trend).
  - wakecarter | January 28, 2015 at 2:45 pm |
    
    I agree that “non-significant trend” doesn’t really fit when talking about the effect of an IV with 3+ levels. However, I stand by my advice to students about appropriately reporting effects that just miss the .05 criterion. If the effect is significant at the .05 level then they should discuss it with confidence. If the evidence for an effect is only significant at .10 then they can make tentative conclusions about it. I don’t like “weakly significant” because, as you say, any given p value is either significant at the given level or it isn’t. On the other hand, changing “there was a significant effect of IV on DV (…, p=.040)” to “there was weak evidence for an effect of IV on DV (…, p=.060)” is a reasonable and simple change to make.
    
    Two caveats.
    
    1) I am making no comment here about whether a study with no significant results is publishable in a peer reviewed journal, though perhaps it would make sense for small studies showing weak evidence for effects to be accepted to reduce the negative correlation between study size and reported effect size. My advice is primarily directed at psychology students completing coursework and dissertations.
    
    2) If a Bonferroni correction has been applied to keep the family-wise error rate at .05 then I would ignore all effects with p values of over .05 and use “weak evidence” to refer to effects where p<.05 but is not under the corrected value.
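    The reporting rule proposed in this comment, including the Bonferroni caveat, can be written down as a small function. This is only a sketch of that one commenter's rule, not a recommendation of the post or a standard procedure; the function name `describe_result` and the three labels it returns are hypothetical.

```python
def describe_result(p, n_tests=1, alpha=0.05, tentative=0.10):
    """Label a p-value following the rule sketched in the comment above.

    With a single test: p < alpha is "significant", alpha <= p < tentative
    is "weak evidence". With a Bonferroni correction over n_tests: only
    p < alpha / n_tests is "significant", p between the corrected value
    and alpha is "weak evidence", and anything at or above alpha is ignored.
    """
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    corrected = alpha / n_tests  # Bonferroni-corrected per-test threshold
    if p < corrected:
        return "significant"
    upper = alpha if n_tests > 1 else tentative
    return "weak evidence" if p < upper else "not discussed"
```

    For example, `describe_result(0.03, n_tests=5)` falls between the corrected threshold of .01 and .05, so it is labelled "weak evidence", while `describe_result(0.06, n_tests=5)` is ignored.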
  - notabiostatsperson | April 27, 2016 at 4:20 pm |
    
    Would it be acceptable to use the phrase “there was an association between x and y, though not statistically significant” if the p-value is between 0.5-0.1? Is it true that “p-values tend to become smaller as sample size increases, unless H0 is true”? Is it possible to assume that a similar study with a larger sample might yield a statistically significant result for the same association?
Jill Hartmann | March 27, 2015 at 6:10 pm | Reply

Should the approach used be important? For instance, if a within-subject design is used, then anything bigger than .05 should be deemed insignificant and not subject to further discussion. This is because of the power of the design. Are there any cites on the inappropriateness of going higher than .05?
Jameson Burt | April 12, 2015 at 5:41 am | Reply

The significance testing business has been recommended against by the National Academy of Sciences. Two months ago, the first journal, a journal of applied social psychology, banned significance tests. Many years ago, a PLOS article showed that the probability a published article has an actual relationship is
PPV = 1 / [1 + alpha / ((1 – beta) * R)]
where alpha is often .05, 1-beta is the power, and R is the proportion of actual relationships among such tests in the field (you can get creative with “field”).
PPV is the Positive Predictive Value of published articles in the field.
While alpha is usually set by the statistician at .05, 1 – beta is bounded between 0 and 1, and for better designs 1- beta is near 1.
You would think better power would improve matters, but we’re looking at published articles, so the best 1 – beta can do is 1.
As a result, for fields with the proportion of true relationships small, journals publish almost no actual relationships (shall we say, “barely any acceptable results”). You find this in many fields. For example, in genetics, if 30 of 30,000 genes cause a disease, then R is .001 and PPV is .02.
Yet, we imagine significance tests pulling out of just such situations relationships we wouldn’t otherwise notice. We’ve deceived ourselves. Although, a followup study in a redefined field with only previously significant relationships has a much better PPV.
This is similar to the problem of random numbers published in the back of books — random for the individual, but not random if we observe results when many people use the same random number table.
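The arithmetic behind the PPV figure quoted above is easy to check. The sketch below assumes the formula is read as PPV = 1 / (1 + alpha / ((1 - beta) * R)), which is the reading that reproduces the .02 given for the genetics example; the function name `ppv` is made up for the illustration.

```python
def ppv(alpha, power, r):
    """Positive predictive value: 1 / (1 + alpha / (power * r)).

    alpha : significance threshold (e.g. 0.05)
    power : 1 - beta
    r     : R as described in the comment above
    """
    if not (0 < power <= 1) or r <= 0:
        raise ValueError("power must be in (0, 1] and r must be positive")
    return 1.0 / (1.0 + alpha / (power * r))


# Genetics example: 30 causal genes out of 30,000, best case of perfect power.
genetics = ppv(alpha=0.05, power=1.0, r=30 / 30000)
```

Here `genetics` works out to 1/51, about 0.02, matching the figure in the comment. Any power below 1 makes it smaller still, which is the point that better power alone cannot rescue a field where true relationships are rare.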
Deirdre McCloskey | April 12, 2015 at 1:47 pm | Reply

Dears,

Funny!

Some of you have “teetered on the brink” of the main point about Null Hypothesis Tests of Significance in the Absence of a Loss Function, which is, as has been known since Edgeworth and Gosset (“Student”) and Neyman and the younger Pearson, that any level is not the same as importance. Fit is not the same as oomph. It just isn’t, unless you have some way of translating probability space into consequence space. You might want to read, slowly, McCloskey and Ziliak, The Cult of Statistical Significance (University of Michigan Press, 2008).

Deirdre N McCloskey
fergdoug | April 21, 2015 at 5:47 pm | Reply

A lot of these are as ludicrous as “almost not pregnant” or “nearly a virgin”
RLD | May 27, 2015 at 2:18 am | Reply

The problem comes from the need to spin 0.06 into a positive result for your research question.
The answer comes from publishing negative/non-significant results, but we all know journals don’t do this.
The full solution comes from publishing your results as a Data Note at e.g. GigaScience or Scientific Data. By releasing the data in a curated, peer-reviewed and citable manner, you increase the chance of citations for your non-significant result because your data is still useful for method development, meta-analyses, increasing the numbers [of controls at least] in other studies etc. The publication is about your methods of data collection, not your specific research question – although you will record the non-significance in your publication.
Psych Stats Tutor | May 27, 2015 at 8:38 am | Reply

Love it! Am sharing with student network. Personally am looking forward to time out to learn R
telescoper | May 27, 2015 at 8:40 am | Reply

Reblogged this on In the Dark and commented:
I just couldn’t resist reblogging this post because of the wonderful list of meaningless convoluted phrases people use when they don’t get a “statistically significant” result. I particularly like

“a robust trend toward significance”

It’s scary to think that these were all taken from peer-reviewed scientific journals…
jilly | May 27, 2015 at 11:23 am | Reply

Reblogged this on fluffysciences and commented:
There has been a lack of posting lately, mostly because a very busy marking schedule has caught up with me.

So I hope you will enjoy this link to ‘Probable Error’, which has spent a very likely significant amount of time rounding up all the ways scientists use to describe P values which aren’t anywhere near significant at all.

Given the paper I’m currently reviewing reporting a tendency of P=0.07, I was highly amused!
Veronica | May 27, 2015 at 3:27 pm | Reply

See also:
http://xkcd.com/1478/
Eugene Allevato | July 19, 2015 at 2:59 pm | Reply

I think we should just state the significance of the p-value as 0.049 or 0.051 and let the reader make a decision of how much risk the reader wants to assume.
reflectiveponderer | July 23, 2015 at 2:54 pm | Reply

Reblogged this on Sciception and commented:
Just when I was discussing significant p-values at work! Someone was insisting that values slightly above 0.05 are still believably significant... and then I found the exact terms used in this very list!
Mark R | October 7, 2015 at 7:32 pm | Reply

Why don’t authors ever claim that P=0.049 was approaching “non-significance”?
Huinca | October 12, 2015 at 7:38 pm | Reply

Ah, the p-value… very useful if you are tossing a coin.
Oje | October 27, 2015 at 11:43 am | Reply

Funny!!
Thanks for making my online search for levels of significance surprisingly amusing.
Jorge Ramírez | November 18, 2015 at 8:05 pm | Reply

Reblogged this on Chaos Theory and Pharmacology and commented:
HT:

Still Not Significant https://t.co/miXfGf8IGa via @mc_hankins #alltrials

— pharmagossip (@pharmagossip) November 18, 2015
juliemadblogger | November 18, 2015 at 8:19 pm | Reply

This is precisely why “studies” are overemphasized as the number one “evidence” but can be easily skewed in interpretation. This is precisely why individual lived experience with what is being tested is more valuable. We need to hear more lived experience stories, precisely what it felt like, whether it helped, if anything went wrong, and if they’d recommend it to someone they loved. You can’t replicate any of that in a “study.” Yet these studies are the tools that are shaping medicine. Funny, the ones in power, the decision-makers, rarely ask the guinea pigs, nor do they want to hear these very real stories.
ngruen | December 6, 2015 at 1:07 pm | Reply

I haven’t read all the comments, but found this whole post embarrassing, though some of the comments seem to understand the point I’m about to make.

As some have pointed out the threshold of statistical significance is arbitrary. ARBITRARY – capiche?

So as the p value rises from 0.000001 it becomes less statistically significant – that’s all you can say.

Beyond that you need to use the most rigorous reasoning you can in the context you are in, consider the risks you are taking both to act on the knowledge you have or not to act. Sometimes – very rarely – the threshold between acting and not acting on the knowledge you have might occur at or around p = 0.05.

The rest of the time, if you proceeded according to what I’ve set out, which ought to be the most basic commonsense, especially amongst those who have been through years of training and practice in these matters, you’ll come up with a different number.
- mchankins | December 6, 2015 at 3:24 pm | Reply
  
  I think you’ve missed the point.
  
  In all of these papers the authors adopted a threshold of 0.05: yes, it’s an arbitrary threshold but that’s the one they chose. Having declared that they would accept/reject based on a threshold of 0.05, the authors then fail to report a “non-significant” result as such.
  
  That’s all this post is about – it’s not about how you choose the significance threshold, nor whether a threshold for a binary decision is how it should be done.
  
  In short, it’s not the argument you think it is.
ngruen | December 7, 2015 at 3:16 pm | Reply

“Having declared that they would accept/reject based on a threshold of 0.05, the authors then fail to report a “non-significant” result as such.”

I don’t think that’s quite right, is it? Perhaps that should read “Having chosen to become academic researchers and submit to journals the referees of which tend to impose arbitrary thresholds on tests, the academics sought to verbally justify (usually) minor excesses of the threshold.”
- mchankins | December 7, 2015 at 4:28 pm | Reply
  
  That’s simultaneously quite a harsh accusation (academics knowingly publish incorrect interpretations of their analyses) and quite a generous one (their obfuscation is merely drawing attention to the arbitrariness of the threshold).
  - ngruen | December 8, 2015 at 2:25 am |
    
    They don’t publish “knowingly incorrect” interpretations, they find ways of putting it that sound good. They lack candour – candour being the first casualty of bureaucracy.
  - mchankins | December 8, 2015 at 12:04 pm |
    
    But “they find ways of putting it that sound good” is the point of the post – so I don’t understand why you found it embarrassing.
Snowdawgjack | December 15, 2015 at 10:29 pm | Reply

If p < .05 is significant, then is .05 < p < .10 your significant other?
Nan Xiao | December 23, 2015 at 11:20 pm | Reply

Just built a Shiny app based on the data in this blog post, have fun 🙂

Signify is a web application for making your (>0.05) p-values sound significant. https://t.co/1t122H4VlQ #shiny #rstats

— Nan Xiao 肖楠 (@nanxstats) December 22, 2015
geokush | January 2, 2016 at 11:35 pm | Reply

Reblogged this on Dr Geoff Kushnick and commented:
Great list of ways to refer to “close to, but not really, significant results.” Given how much P values can jump around using samples from the same population, my suggestion is to give the actual P value and talk about the effect size. No need to describe the P value itself.
ngruen | January 6, 2016 at 3:31 am | Reply

Here’s Ziliak’s depressing latest on the state of affairs.

Significance Controversy in the Past
This is not the first time in history that statistical significance has been on trial. “Significance” was only a partial argument from odds from the beginning, Francis Ysidro Edgeworth (1885, p. 208), who coined the term, clearly perceived. Galton and Pearson saw in the test more security than they might have. But by 1905 Student himself—that is William Sealy Gosset aka “Student”, the inventor of Student’s t, and eventual Head Brewer of Guinness—warned in a letter to Karl Pearson about economic and other substantive losses that can be caused by following a bright line rule of statistical significance:

When I first reported on the subject [of “The Application of the ‘Law of Error’ to the Work of the Brewery” (1904)], I thought that perhaps there might be some degree of probability which is conventionally treated as sufficient in such work as ours and I advised that some outside authority [such as you, Karl Pearson] should be consulted as to what certainty is required to aim at in large scale work. However it would appear that in such work as ours the degree of certainty to be aimed at must depend on the pecuniary advantage to be gained by following the result of the experiment, compared with the increased cost of the new method, if any, and the cost of each experiment (quoted in Ziliak 2008, p. 207).

Student’s rejection of a bright-line accept-reject standard was echoed a few years on by Harvard psychologist Edwin Boring (1919), warning about the difference between substantive and merely statistical significance in psychological research. Yet mindless tests and uses of statistical significance raged on, heedless of warnings from its eminent discoverers.
thetruthfulheretic | January 7, 2016 at 2:24 am | Reply

I’m finishing an essay on this subject (soon hopefully coming out as a self-published book) in an attempt to introduce students to these concepts. I used some examples on this page and it was a delight to find.

I also came across a paper by Wood et al. called “Trap of trends to statistical significance: likelihood of near significant P value becoming more significant with extra data” Published in BMJ. The paper is very relevant to the discussions surrounding p-value and this page.

One of their points (directly quoted) is as follows: “Describing near significant P values as ‘trends towards significance’ (or similar) is not just inappropriate but actively misleading, as such P values would be quite likely to become less significant if extra data were collected”

On the other hand they also mention that p-values around 0.05 show modest degrees of evidence, no matter which side of the threshold they fall. Which leads them to say calling (for example) p=0.06 an “interesting hint” may be a good choice.

I also became aware of technical discussions between Bayesians and Frequentists on the subject. I’m no statistician but I hope to learn more about this subject since it is very interesting, not to mention the impact that it can have on inferential statistics and scientific reporting.
gshenaut | March 7, 2016 at 12:22 am | Reply

At the risk of embarrassing myself, I personally do two things: I report exact p values to three decimal places, and I do use terms like “[marginally] significant”. My purpose in the former is to avoid using levels of significance, which can obscure the pattern of results; my purpose in the latter is to provide useful information to the reader about the pattern of results. Note that a strict “significant/not significant” dichotomy would lump results such as p = .234 with those such as p = .051, which in my view would usually be misleading. Terminology such as significant, highly significant, and marginally significant, when used in the absence of alpha levels or other specific levels of significance, return to their original colloquial, non-technical meaning (meaningful, remarkable, etc.).
Theo Hewitt | June 5, 2016 at 6:05 am | Reply

I believe this site has got some very great info for everyone :D. “This is an age in which one cannot find common sense without a search warrant.” by George Will.
Matt | September 12, 2016 at 5:50 pm | Reply

Just saw one in a paper I’m reviewing “Although statistically was in borderline (p=0.05)”
Momo Liu | March 1, 2017 at 10:19 pm | Reply

Thank you for putting this together!!! I’m using it right now!
staphy | March 13, 2017 at 4:33 pm | Reply

I don’t want to defend any of these 500 phrases, but I have an idea why researchers are so apt to use them. It comes down to one thing not really mentioned here: the 2nd-order error (the Type II error). If you get a p-value of 0.051, it is intuitively clear to most scientists that they are at risk of committing this kind of error. So they try to find a phrase that mirrors their intuition, namely that a difference, though not significant, is still quite probable. Since it was a real near-miss, those authors wanted to make clear with (inapt) words that they did not find “no difference”, but simply could not show a difference (though having in mind that stating a difference would still be correct in 18 of 20 cases).
Let’s say a certain chemical could not be shown to cause cancer, with a p-value of 0.051. I think none of the people around here would swallow that stuff anyway, because intuition tells them it would not be a good idea. I think a near-miss p-value has to be discussed in this respect, though maybe with other, more adequate words.
When being as strict as most people here, always keep in mind the 2nd-order error! The negative consequences of a medical study might be more severe when a near-miss is not named and discussed and the authors just stick to “significant” or “not significant”. The world is not black and white.
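The “2nd-order error” raised in this comment is usually called a Type II error: failing to detect an effect that really exists. The sketch below is only an illustration, not anything from the post or the comment. It assumes a two-sided, two-sample z-test with a standardized effect size, and the function name `type_ii_error` and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(effect_size, n_per_group, alpha=0.05):
    """Chance of missing a true standardized effect in a two-sided,
    two-sample z-test (a normal approximation; a t-test differs slightly)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic when the effect is real.
    shift = effect_size * sqrt(n_per_group / 2)
    power = (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
    return 1 - power


# Hypothetical medium effect (0.5 standard deviations) at three sample sizes.
for n in (20, 64, 200):
    print(n, round(type_ii_error(0.5, n), 3))
```

Under these assumptions the miss rate falls as the sample grows, which is why a non-significant result from a small study says little about whether an effect is absent.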
A good Biologist | July 30, 2017 at 12:55 pm | Reply

I do not agree with this article. Statistics should be the last resort for showing differences between data sets. If it looks different, we do not need statistics at all to show differences! The p-value is there to make statisticians happy. It also depends greatly on the area of study. In physiological studies, statistics is not so important compared with the actual physiological changes. For example, if you see color differences visually, you simply can’t say two things are the same because P>0.05.
- mchankins | September 29, 2018 at 10:27 pm | Reply
  
  You should probably look into ‘sampling error’.
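For readers unfamiliar with sampling error, here is a rough simulation sketch. It is an illustration only: the function name `share_of_large_gaps`, the gap of 0.5, and the sample sizes are all assumptions chosen for the example. It draws two samples from the same population and counts how often their means still look different.

```python
import random


def share_of_large_gaps(n, gap=0.5, trials=1000, seed=1):
    """Draw two samples of size n from the SAME normal population
    (mean 0, sd 1) and return how often their means differ by more than gap."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean_a = sum(rng.gauss(0, 1) for _ in range(n)) / n
        mean_b = sum(rng.gauss(0, 1) for _ in range(n)) / n
        if abs(mean_a - mean_b) > gap:
            hits += 1
    return hits / trials


# Small samples from identical populations often "look different".
for n in (5, 50, 500):
    print(n, share_of_large_gaps(n))
```

With only a handful of observations per group, a visible gap appears frequently even though no real difference exists, so “it looks different” is not reliable on its own.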
Jason McGovern (@maccgizzle) | October 27, 2018 at 3:35 am | Reply

I find this enlightening while also perplexing. Sure, if we’re trying to launch a manned rocket into space, we need to be absolutely sure it will succeed. But in business, we have to make decisions in a gray area while competition is disrupting our landscape, and we don’t usually have the luxury of waiting for a perfect p < 0.05 result. In business, it's about risk and reward. If the risk is low, and the reward is high, I might take that gamble based upon p < 0.20. Now if the risk is high, as in I might go bankrupt if I'm wrong, sure I'm going for p < 0.05 (if not lower). I believe that the key is where you set your threshold. I don't feel that it should just always default to p <= 0.05. Yet that seems to be what everyone does, without any thought for the risk/reward ratio.
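The risk/reward argument in this comment can be made concrete with a small expected-value sketch. Everything here is a hypothetical assumption made for illustration: the function name `expected_payoff`, a one-sided z-test, a 50% prior chance that the effect is real, the effect size, the sample size, and the gain and loss figures.

```python
from math import sqrt
from statistics import NormalDist


def expected_payoff(alpha, gain, loss, prior=0.5, effect_size=0.3, n_per_group=50):
    """Expected payoff of the rule 'act whenever p < alpha' (one-sided z-test).

    All inputs are hypothetical: prior is the assumed chance the effect is
    real, gain is the reward for acting on a real effect, and loss is the
    cost of acting on a false positive.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Probability of detecting the effect when it is real.
    power = 1 - std_normal.cdf(z_crit - effect_size * sqrt(n_per_group / 2))
    return prior * power * gain - (1 - prior) * alpha * loss


# Low risk, high reward versus high risk, low reward.
for gain, loss in ((10, 1), (1, 5)):
    for alpha in (0.05, 0.20):
        print(gain, loss, alpha, round(expected_payoff(alpha, gain, loss), 3))
```

Under these made-up numbers the looser threshold pays off better when a false positive is cheap, and the stricter one pays off better when a false positive is costly. The threshold is then fixed in advance from the stakes, which is different from relabelling a result after the fact.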
David | May 11, 2019 at 4:42 am | Reply

If .05 is an “arbitrary” threshold to begin with, why is it incorrect to say something like .06 is “almost” significant? How does that wording obscure the fact that the result is nonsignificant by the definition of .05? If I have 99 kittens, is it “misrepresenting” something to say that I have almost 100 kittens? “Almost” has a specific meaning too.
Twinkieboy | July 23, 2019 at 7:40 pm | Reply

To those who insist on using these terms (and thereby clearly do not understand what a P value represents), we only have to ask: “Have you ever described a result with a P value of 0.049 as ‘approaching insignificance’ or ‘almost insignificant’?” Surely, if one values a P=0.06 or P=0.055, then one should equally be skeptical about a P=0.048.
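One way to see the symmetry described here is to convert p-values back into the test statistic behind them. This is a minimal sketch that assumes a two-sided z-test; the helper name `z_from_p` is hypothetical.

```python
from statistics import NormalDist


def z_from_p(p_value):
    """Absolute z statistic that yields a given two-sided p-value."""
    return NormalDist().inv_cdf(1 - p_value / 2)


for p in (0.048, 0.050, 0.055, 0.234):
    print(f"p = {p:.3f}  ->  |z| = {z_from_p(p):.2f}")
```

Under that assumption, P=0.048 and P=0.055 correspond to z statistics of roughly 1.98 and 1.92, which sit almost equally close to the 1.96 cut-off on either side of it.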