What to do if your p-value is just over the arbitrary threshold for ‘significance’ of p=0.05?

You don’t need to play the significance testing game – there are better methods, like quoting the effect size with a confidence interval – but if you do, the rules are simple: the result is either significant or it isn’t.

So if your p-value remains stubbornly higher than 0.05, you should call it ‘non-significant’ and write it up as such. The problem for many authors is that this just isn’t the answer they were looking for: publishing so-called ‘negative results’ is harder than ‘positive results’.

The solution is to apply the time-honoured tactic of circumlocution to disguise the non-significant result as something more interesting. The following list is culled from peer-reviewed journal articles in which (a) the authors set themselves the threshold of 0.05 for significance, (b) failed to achieve that threshold value for p and (c) described it in such a way as to make it seem more interesting.

As well as being statistically flawed (results are either significant or not and can’t be qualified), the wording is linguistically interesting, often describing an aspect of the result that just doesn’t exist. For example, “a trend towards significance” expresses non-significance as some sort of motion towards significance, which it isn’t: there is no ‘trend’, in any direction, and nowhere for the trend to be ‘towards’.

Some further analysis will follow, but for now here is the list in full (UPDATE: now in alpha-order):

(barely) not statistically significant (p=0.052)

a barely detectable statistically significant difference (p=0.073)

a borderline significant trend (p=0.09)

a certain trend toward significance (p=0.08)

a clear tendency to significance (p=0.052)

a clear trend (p<0.09)

a clear, strong trend (p=0.09)

a considerable trend toward significance (p=0.069)

a decreasing trend (p=0.09)

a definite trend (p=0.08)

a distinct trend toward significance (p=0.07)

a favorable trend (p=0.09)

a favourable statistical trend (p=0.09)

a little significant (p<0.1)

a margin at the edge of significance (p=0.0608)

a marginal trend (p=0.09)

a marginal trend toward significance (p=0.052)

a marked trend (p=0.07)

a mild trend (p<0.09)

a moderate trend toward significance (p=0.068)

a near-significant trend (p=0.07)

a negative trend (p=0.09)

a nonsignificant trend (p<0.1)

a nonsignificant trend toward significance (p=0.1)

a notable trend (p<0.1)

a numerical increasing trend (p=0.09)

a numerical trend (p=0.09)

a positive trend (p=0.09)

a possible trend (p=0.09)

a possible trend toward significance (p=0.052)

a pronounced trend (p=0.09)

a reliable trend (p=0.058)

a robust trend toward significance (p=0.0503)

a significant trend (p=0.09)

a slight slide towards significance (p<0.20)

a slight tendency toward significance (p<0.08)

a slight trend (p<0.09)

a slight trend toward significance (p=0.098)

a slightly increasing trend (p=0.09)

a small trend (p=0.09)

a statistical trend (p=0.09)

a statistical trend toward significance (p=0.09)

a strong tendency towards statistical significance (p=0.051)

a strong trend (p=0.077)

a strong trend toward significance (p=0.08)

a substantial trend toward significance (p=0.068)

a suggestive trend (p=0.06)

a trend close to significance (p=0.08)

a trend significance level (p=0.08)

a trend that approached significance (p<0.06)

a very slight trend toward significance (p=0.20)

a weak trend (p=0.09)

a weak trend toward significance (p=0.12)

a worrying trend (p=0.07)

all but significant (p=0.055)

almost achieved significance (p=0.065)

almost approached significance (p=0.065)

almost attained significance (p<0.06)

almost became significant (p=0.06)

almost but not quite significant (p=0.06)

almost clinically significant (p<0.10)

almost insignificant (p>0.065)

almost marginally significant (p>0.05)

almost non-significant (p=0.083)

almost reached statistical significance (p=0.06)

almost significant (p=0.06)

almost significant tendency (p=0.06)

almost statistically significant (p=0.06)

an adverse trend (p=0.10)

an apparent trend (p=0.286)

an associative trend (p=0.09)

an elevated trend (p<0.05)

an encouraging trend (p<0.1)

an established trend (p<0.10)

an evident trend (p=0.13)

an expected trend (p=0.08)

an important trend (p=0.066)

an increasing trend (p<0.09)

an interesting trend (p=0.1)

an inverse trend toward significance (p=0.06)

an observed trend (p=0.06)

an obvious trend (p=0.06)

an overall trend (p=0.2)

an unexpected trend (p=0.09)

an unexplained trend (p=0.09)

an unfavorable trend (p<0.10)

appeared to be marginally significant (p<0.10)

approached acceptable levels of statistical significance (p=0.054)

approached but did not quite achieve significance (p>0.05)

approached but fell short of significance (p=0.07)

approached conventional levels of significance (p<0.10)

approached near significance (p=0.06)

approached our criterion of significance (p>0.08)

approached significant (p=0.11)

approached the borderline of significance (p=0.07)

approached the level of significance (p=0.09)

approached trend levels of significance (p0.05)

approached, but did reach, significance (p=0.065)

approaches but fails to achieve a customary level of statistical significance (p=0.154)

approaches statistical significance (p>0.06)

approaching a level of significance (p=0.089)

approaching an acceptable significance level (p=0.056)

approaching borderline significance (p=0.08)

approaching borderline statistical significance (p=0.07)

approaching but not reaching significance (p=0.53)

approaching clinical significance (p=0.07)

approaching close to significance (p<0.1)

approaching conventional significance levels (p=0.06)

approaching conventional statistical significance (p=0.06)

approaching formal significance (p=0.1052)

approaching independent prognostic significance (p=0.08)

approaching marginal levels of significance (p<0.107)

approaching marginal significance (p=0.064)

approaching more closely significance (p=0.06)

approaching our preset significance level (p=0.076)

approaching prognostic significance (p=0.052)

approaching significance (p=0.09)

approaching the traditional significance level (p=0.06)

approaching to statistical significance (p=0.075)

approaching, although not reaching, significance (p=0.08)

approaching, but not reaching, significance (p<0.09)

approximately significant (p=0.053)

approximating significance (p=0.09)

arguably significant (p=0.07)

as good as significant (p=0.0502)

at the brink of significance (p=0.06)

at the cusp of significance (p=0.06)

at the edge of significance (p=0.055)

at the limit of significance (p=0.054)

at the limits of significance (p=0.053)

at the margin of significance (p=0.056)

at the margin of statistical significance (p<0.07)

at the verge of significance (p=0.058)

at the very edge of significance (p=0.053)

barely below the level of significance (p=0.06)

barely escaped statistical significance (p=0.07)

barely escapes being statistically significant at the 5% risk level (0.1>p>0.05)

barely failed to attain statistical significance (p=0.067)

barely fails to attain statistical significance at conventional levels (p<0.10)

barely insignificant (p=0.075)

barely missed statistical significance (p=0.051)

barely missed the commonly acceptable significance level (p<0.053)

barely outside the range of significance (p=0.06)

barely significant (p=0.07)

below (but verging on) the statistical significant level (p>0.05)

better trends of improvement (p=0.056)

bordered on a statistically significant value (p=0.06)

bordered on being significant (p>0.07)

bordered on being statistically significant (p=0.0502)

bordered on but was not less than the accepted level of significance (p>0.05)

bordered on significant (p=0.09)

borderline conventional significance (p=0.051)

borderline level of statistical significance (p=0.053)

borderline significant (p=0.09)

borderline significant trends (p=0.099)

close to a marginally significant level (p=0.06)

close to being significant (p=0.06)

close to being statistically significant (p=0.055)

close to borderline significance (p=0.072)

close to the boundary of significance (p=0.06)

close to the level of significance (p=0.07)

close to the limit of significance (p=0.17)

close to the margin of significance (p=0.055)

close to the margin of statistical significance (p=0.075)

closely approaches the brink of significance (p=0.07)

closely approaches the statistical significance (p=0.0669)

closely approximating significance (p>0.05)

closely not significant (p=0.06)

closely significant (p=0.058)

close-to-significant (p=0.09)

did not achieve conventional threshold levels of statistical significance (p=0.08)

did not exceed the conventional level of statistical significance (p<0.08)

did not quite achieve acceptable levels of statistical significance (p=0.054)

did not quite achieve significance (p=0.076)

did not quite achieve the conventional levels of significance (p=0.052)

did not quite achieve the threshold for statistical significance (p=0.08)

did not quite attain conventional levels of significance (p=0.07)

did not quite reach a statistically significant level (p=0.108)

did not quite reach conventional levels of statistical significance (p=0.079)

did not quite reach statistical significance (p=0.063)

did not reach the traditional level of significance (p=0.10)

did not reach the usually accepted level of clinical significance (p=0.07)

difference was apparent (p=0.07)

direction heading towards significance (p=0.10)

does not appear to be sufficiently significant (p>0.05)

does not narrowly reach statistical significance (p=0.06)

does not reach the conventional significance level (p=0.098)

effectively significant (p=0.051)

equivocal significance (p=0.06)

essentially significant (p=0.10)

extremely close to significance (p=0.07)

failed to reach significance on this occasion (p=0.09)

failed to reach statistical significance (p=0.06)

fairly close to significance (p=0.065)

fairly significant (p=0.09)

falls just short of standard levels of statistical significance (p=0.06)

fell (just) short of significance (p=0.08)

fell barely short of significance (p=0.08)

fell just short of significance (p=0.07)

fell just short of statistical significance (p=0.12)

fell just short of the traditional definition of statistical significance (p=0.051)

fell marginally short of significance (p=0.07)

fell narrowly short of significance (p=0.0623)

fell only marginally short of significance (p=0.0879)

fell only short of significance (p=0.06)

fell short of significance (p=0.07)

fell slightly short of significance (p>0.0167)

fell somewhat short of significance (p=0.138)

felt short of significance (p=0.07)

flirting with conventional levels of significance (p>0.1)

heading towards significance (p=0.086)

highly significant (p=0.09)

hint of significance (p>0.05)

hovered around significance (p=0.061)

hovered at nearly a significant level (p=0.058)

hovering closer to statistical significance (p=0.076)

hovers on the brink of significance (p=0.055)

in the edge of significance (p=0.059)

in the verge of significance (p=0.06)

inconclusively significant (p=0.070)

indeterminate significance (p=0.08)

indicative significance (p=0.08)

is just outside the conventional levels of significance

just about significant (p=0.051)

just above the arbitrary level of significance (p=0.07)

just above the margin of significance (p=0.053)

just at the conventional level of significance (p=0.05001)

just barely below the level of significance (p=0.06)

just barely failed to reach significance (p<0.06)

just barely insignificant (p=0.11)

just barely statistically significant (p=0.054)

just beyond significance (p=0.06)

just borderline significant (p=0.058)

just escaped significance (p=0.07)

just failed significance (p=0.057)

just failed to be significant (p=0.072)

just failed to reach statistical significance (p=0.06)

just failing to reach statistical significance (p=0.06)

just fails to reach conventional levels of statistical significance (p=0.07)

just lacked significance (p=0.053)

just marginally significant (p=0.0562)

just missed being statistically significant (p=0.06)

just missing significance (p=0.07)

just on the verge of significance (p=0.06)

just outside accepted levels of significance (p=0.06)

just outside levels of significance (p<0.08)

just outside the bounds of significance (p=0.06)

just outside the conventional levels of significance (p=0.1076)

just outside the level of significance (p=0.0683)

just outside the limits of significance (p=0.06)

just outside the traditional bounds of significance (p=0.06)

just over the limits of statistical significance (p=0.06)

just short of significance (p=0.07)

just shy of significance (p=0.053)

just skirting the boundary of significance (p=0.052)

just tendentially significant (p=0.056)

just tottering on the brink of significance at the 0.05 level

just very slightly missed the significance level (p=0.086)

leaning towards significance (p=0.15)

leaning towards statistical significance (p=0.06)

likely to be significant (p=0.054)

loosely significant (p=0.10)

marginal significance (p=0.07)

marginally and negatively significant (p=0.08)

marginally insignificant (p=0.08)

marginally nonsignificant (p=0.096)

marginally outside the level of significance

marginally significant (p>=0.1)

marginally significant tendency (p=0.08)

marginally statistically significant (p=0.08)

may not be significant (p=0.06)

medium level of significance (p=0.051)

mildly significant (p=0.07)

missed narrowly statistical significance (p=0.054)

moderately significant (p>0.11)

modestly significant (p=0.09)

narrowly avoided significance (p=0.052)

narrowly eluded statistical significance (p=0.0789)

narrowly escaped significance (p=0.08)

narrowly evaded statistical significance (p>0.05)

narrowly failed significance (p=0.054)

narrowly missed achieving significance (p=0.055)

narrowly missed overall significance (p=0.06)

narrowly missed significance (p=0.051)

narrowly missed standard significance levels (p<0.07)

narrowly missed the significance level (p=0.07)

narrowly missing conventional significance (p=0.054)

near limit significance (p=0.073)

near miss of statistical significance (p>0.1)

near nominal significance (p=0.064)

near significance (p=0.07)

near to statistical significance (p=0.056)

near/possible significance (p=0.0661)

near-borderline significance (p=0.10)

near-certain significance (p=0.07)

nearing significance (p<0.051)

nearly acceptable level of significance (p=0.06)

nearly approaches statistical significance (p=0.079)

nearly borderline significance (p=0.052)

nearly negatively significant (p<0.1)

nearly positively significant (p=0.063)

nearly reached a significant level (p=0.07)

nearly reaching the level of significance (p<0.06)

nearly significant (p=0.06)

nearly significant tendency (p=0.06)

nearly, but not quite significant (p>0.06)

near-marginal significance (p=0.18)

near-significant (p=0.09)

near-to-significance (p=0.093)

near-trend significance (p=0.11)

nominally significant (p=0.08)

non-insignificant result (p=0.500)

non-significant in the statistical sense (p>0.05)

not absolutely significant but very probably so (p>0.05)

not as significant (p=0.06)

not clearly significant (p=0.08)

not completely significant (p=0.07)

not completely statistically significant (p=0.0811)

not conventionally significant (p=0.089), but..

not currently significant (p=0.06)

not decisively significant (p=0.106)

not entirely significant (p=0.10)

not especially significant (p>0.05)

not exactly significant (p=0.052)

not extremely significant (p<0.06)

not formally significant (p=0.06)

not fully significant (p=0.085)

not globally significant (p=0.11)

not highly significant (p=0.089)

not insignificant (p=0.056)

not markedly significant (p=0.06)

not moderately significant (P>0.20)

not non-significant (p>0.1)

not numerically significant (p>0.05)

not obviously significant (p>0.3)

not overly significant (p>0.08)

not quite borderline significance (p>=0.089)

not quite reach the level of significance (p=0.07)

not quite significant (p=0.118)

not quite within the conventional bounds of statistical significance (p=0.12)

not reliably significant (p=0.091)

not remarkably significant (p=0.236)

not significant by common standards (p=0.099)

not significant by conventional standards (p=0.10)

not significant by traditional standards (p<0.1)

not significant in the formal statistical sense (p=0.08)

not significant in the narrow sense of the word (p=0.29)

not significant in the normally accepted statistical sense (p=0.064)

not significantly significant but..clinically meaningful (p=0.072)

not statistically quite significant (p<0.06)

not strictly significant (p=0.06)

not strictly speaking significant (p=0.057)

not technically significant (p=0.06)

not that significant (p=0.08)

not to an extent that was fully statistically significant (p=0.06)

not too distant from statistical significance at the 10% level

not too far from significant at the 10% level

not totally significant (p=0.09)

not unequivocally significant (p=0.055)

not very definitely significant (p=0.08)

not very definitely significant from the statistical point of view (p=0.08)

not very far from significance (p<0.092)

not very significant (p=0.1)

not very statistically significant (p=0.10)

not wholly significant (p>0.1)

not yet significant (p=0.09)

not strongly significant (p=0.08)

noticeably significant (p=0.055)

on the border of significance (p=0.063)

on the borderline of significance (p=0.0699)

on the borderlines of significance (p=0.08)

on the boundaries of significance (p=0.056)

on the boundary of significance (p=0.055)

on the brink of significance (p=0.052)

on the cusp of conventional statistical significance (p=0.054)

on the cusp of significance (p=0.058)

on the edge of significance (p>0.08)

on the limit to significant (p=0.06)

on the margin of significance (p=0.051)

on the threshold of significance (p=0.059)

on the verge of significance (p=0.053)

on the very borderline of significance (0.05<p<0.06)

on the very fringes of significance (p=0.099)

on the very limits of significance (0.1>p>0.05)

only a little short of significance (p>0.05)

only just failed to meet statistical significance (p=0.051)

only just insignificant (p>0.10)

only just missed significance at the 5% level

only marginally fails to be significant at the 95% level (p=0.06)

only marginally nearly insignificant (p=0.059)

only marginally significant (p=0.9)

only slightly less than significant (p=0.08)

only slightly missed the conventional threshold of significance (p=0.062)

only slightly missed the level of significance (p=0.058)

only slightly missed the significance level (p=0·0556)

only slightly non-significant (p=0.0738)

only slightly significant (p=0.08)

partial significance (p>0.09)

partially significant (p=0.08)

partly significant (p=0.08)

perceivable statistical significance (p=0.0501)

possible significance (p<0.098)

possibly marginally significant (p=0.116)

possibly significant (0.05<p>0.10)

possibly statistically significant (p=0.10)

potentially significant (p>0.1)

practically significant (p=0.06)

probably not experimentally significant (p=0.2)

probably not significant (p>0.25)

probably not statistically significant (p=0.14)

probably significant (p=0.06)

provisionally significant (p=0.073)

quasi-significant (p=0.09)

questionably significant (p=0.13)

quite close to significance at the 10% level (p=0.104)

quite significant (p=0.07)

rather marginal significance (p>0.10)

reached borderline significance (p=0.0509)

reached near significance (p=0.07)

reasonably significant (p=0.07)

remarkably close to significance (p=0.05009)

resides on the edge of significance (p=0.10)

roughly significant (p>0.1)

scarcely significant (0.05<p>0.1)

significant at the .07 level

significant tendency (p=0.09)

significant to some degree (0<p>1)

significant, or close to significant effects (p=0.08, p=0.05)

significantly better overall (p=0.051)

significantly significant (p=0.065)

similar but not nonsignificant trends (p>0.05)

slight evidence of significance (0.1>p>0.05)

slight non-significance (p=0.06)

slight significance (p=0.128)

slight tendency toward significance (p=0.086)

slightly above the level of significance (p=0.06)

slightly below the level of significance (p=0.068)

slightly exceeded significance level (p=0.06)

slightly failed to reach statistical significance (p=0.061)

slightly insignificant (p=0.07)

slightly less than needed for significance (p=0.08)

slightly marginally significant (p=0.06)

slightly missed being of statistical significance (p=0.08)

slightly missed statistical significance (p=0.059)

slightly missed the conventional level of significance (p=0.061)

slightly missed the level of statistical significance (p<0.10)

slightly missed the margin of significance (p=0.051)

slightly not significant (p=0.06)

slightly outside conventional statistical significance (p=0.051)

slightly outside the margins of significance (p=0.08)

slightly outside the range of significance (p=0.09)

slightly outside the significance level (p=0.077)

slightly outside the statistical significance level (p=0.053)

slightly significant (p=0.09)

somewhat marginally significant (p>0.055)

somewhat short of significance (p=0.07)

somewhat significant (p=0.23)

somewhat statistically significant (p=0.092)

strong trend toward significance (p=0.08)

sufficiently close to significance (p=0.07)

suggestive but not quite significant (p=0.061)

suggestive of a significant trend (p=0.08)

suggestive of statistical significance (p=0.06)

suggestively significant (p=0.064)

tailed to insignificance (p=0.1)

tantalisingly close to significance (p=0.104)

technically not significant (p=0.06)

teetering on the brink of significance (p=0.06)

tend to significant (p>0.1)

tended to approach significance (p=0.09)

tended to be significant (p=0.06)

tended toward significance (p=0.13)

tendency toward significance (p approaching 0.1)

tendency toward statistical significance (p=0.07)

tends to approach significance (p=0.12)

tentatively significant (p=0.107)

too far from significance (p=0.12)

trend bordering on statistical significance (p=0.066)

trend in a significant direction (p=0.09)

trend in the direction of significance (p=0.089)

trend significance level (p=0.06)

trend toward (p>0.07)

trending towards significance (p>0.15)

trending towards significant (p=0.099)

uncertain significance (p>0.07)

vaguely significant (p>0.2)

verged on being significant (p=0.11)

verging on significance (p=0.056)

verging on the statistically significant (p<0.1)

verging-on-significant (p=0.06)

very close to approaching significance (p=0.060)

very close to significant (p=0.11)

very close to the conventional level of significance (p=0.055)

very close to the cut-off for significance (p=0.07)

very close to the established statistical significance level of p=0.05 (p=0.065)

very close to the threshold of significance (p=0.07)

very closely approaches the conventional significance level (p=0.055)

very closely brushed the limit of statistical significance (p=0.051)

very narrowly missed significance (p<0.06)

very nearly significant (p=0.0656)

very slightly non-significant (p=0.10)

very slightly significant (p<0.1)

virtually significant (p=0.059)

weak significance (p>0.10)

weakened..significance (p=0.06)

weakly non-significant (p=0.07)

weakly significant (p=0.11)

weakly statistically significant (p=0.0557)

well-nigh significant (p=0.11)

Reblogged this on Mr Epidemiology and commented:

A handy alphabetized list of various different ways of stating your results when p > .05! I think my favourites are “teetering on the brink of significance (P=0.06)” and “not significant in the narrow sense of the word (P=0.29)”

Funny but read this: http://library.mpib-berlin.mpg.de/ft/gg/GG_Null_2004.pdf

I love the 0.23 as “somewhat significant”. Um, no.

I would love to see the list sorted by p value instead of alphabetically.

Call it two requests…

Thanks for the chuckle – the list is indeed amusing, but the key point above is that the p-value threshold is arbitrary. This fact is now widely accepted, so a strict dichotomy between “significant” and “non-significant” no longer makes sense. It is a bit of a fudge – and one completely unnecessary if (e.g.) a Bayesian approach is adopted – but we prefer to see “significance” as a continuum; phrases such as “marginally significant” represent uncertainty in the threshold location and therefore do make some sort of sense.

It is *always* the case that calling p=0.049 “significant” and p=0.051 “non-significant” is just plain silly.

Good reply, Mark. I concur completely and often tell clients to use the term “marginally significant” for values close to 0.05 (on either side). It is better that they talk about these things than just sweep them under the rug and ignore them because they are “not significant”.

Significance is not really the important question. How important is the finding? Report effect sizes!

Thanks for the comment. I think there is confusion over the threshold being arbitrary, i.e. 0.05 rather than 0.06, and the arbitrariness of having a threshold at all.

If we agree that there isn’t really a need for a threshold and just discuss the p-values directly, then ‘significant’ and ‘marginally significant’ both become meaningless.

Reblogged this on JHND NOTES: The Journal of Human Nutrition and Dietetics Editor's Blog and commented:

Prospective authors and students take note. Not significant means not significant, no matter how much you wish it otherwise.

Sorry, a result that is not statistically significant can indeed have practical or clinical significance. It’s the flip side of the typical admonishment about statistical significance not necessarily having practical significance. So, it has nothing to do with wishes. It’s odd to hold your viewpoint when NHST itself will reveal that a test statistic that results in a p less than .05 is not significantly different from many values of a test statistic that result in p values greater than .05. It seems that the wishing is that the values on each side of the .05 fence are actually different from one another. You can wish all you want, but a great number of them are not, by your own criterion.

That should read: “that a test statistic that results in a p less than .05 is not significantly different from many values of a test statistic that result in p values greater than .05.”

I must have deleted the end of the sentence while editing.

I do not agree. The problem is that a result that is not statistically significant is a “no result” or, perhaps better, the absence of a result. It doesn’t make much sense to talk about the practical significance of something whose existence has not been shown. This situation is not the mirror image of the other one, in which we discuss the practical importance of an effect that has been demonstrated (is statistically significant).

in APA there should not be a 0 before the decimal point

I made a word cloud of the list with all the variations of the word “significant” removed and creative spelling standardized:

http://www.wordle.net/show/wrdl/6789217/Still_Not_Significant

To say a result is either significant or not is glib. First of all, many people, especially non-scientists, will confuse statistical significance with substantive significance (economic, psychological, whatever). So what if an effect is statistically significant but tiny? Statistical “insignificance” means you can’t reject the null of 0, but you can’t reject other hypotheses either: so why the fetish about one? The standard error tells you how *precisely* determined the result is.

Say you measure two effects, A with size 1 and confidence interval (-1, 3) and B with size 0.5 and confidence interval (0.3, 0.8). Would you really conclude that B has a bigger effect than A? This would be silly, but the practice of only counting “significant” results commonly leads to this.
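A rough numerical sketch of the A-versus-B comparison above. It assumes, which the comment does not state, that both intervals are 95% intervals from approximately normal and independent estimates; the helper names are made up for illustration. B's interval is not symmetric about 0.5, so the standard error recovered from its width is only approximate.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.975)  # about 1.96

def se_from_ci(lower, upper):
    """Standard error implied by the width of an (assumed) 95% normal interval."""
    return (upper - lower) / (2 * Z95)

def two_sided_p(estimate, se):
    """Two-sided p-value for the null hypothesis that the true value is 0."""
    return 2 * (1 - STD_NORMAL.cdf(abs(estimate) / se))

# Numbers taken from the comment; the 95% level is an assumption.
effect_a, ci_a = 1.0, (-1.0, 3.0)
effect_b, ci_b = 0.5, (0.3, 0.8)

se_a = se_from_ci(*ci_a)
se_b = se_from_ci(*ci_b)

p_a = two_sided_p(effect_a, se_a)   # A against zero
p_b = two_sided_p(effect_b, se_b)   # B against zero

# Test the difference A - B directly instead of comparing two verdicts.
se_diff = sqrt(se_a**2 + se_b**2)   # valid only if A and B are independent
p_diff = two_sided_p(effect_a - effect_b, se_diff)

print(f"A vs 0: p = {p_a:.3f}")
print(f"B vs 0: p = {p_b:.5f}")
print(f"A vs B: p = {p_diff:.3f}")
```

Under these assumptions B clears the 0.05 threshold and A does not, yet the direct test of A against B comes nowhere near it: the data cannot tell the two effect sizes apart.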

What I conclude is that, most likely, there is an effect “B” of positive sign, and that it is much less likely that there is an effect “A” of positive sign. I do not believe it is appropriate to compare the magnitude of an effect that reasonably exists with that of one whose existence I am not sufficiently sure of.

Truly, G-d loves the .06 nearly as much as the .05!

@Aaron Levitt–I agree! Love to know the journals these were published in.

I incorporated your list in a test of significance (implemented in R). Every time the p-value is between 0.12-0.5 it randomly selects one of your “p excuses” 🙂

http://sumsar.net/blog/2014/02/a-significantly-improved-test/

I’ve never understood the statisticians’ overly dogmatic objections to the way these p-values are discussed. All a p-value of 0.05 means is there’s a 95% chance that the hypothesis was indeed working. A p-value of 0.051 means there was a 94.9% chance that the hypothesis was working. I agree that if one prespecifies 0.05 as the threshold then, yes, the p-value of 0.051 is not significant. But is saying it was “almost significant” such a travesty? Statisticians often treat a p-value of 0.051 the same as a p-value of 0.70…which makes little sense to anyone with some connection to logic. Should, for example, FDA have pretty strict inflexibility on p-values? Yes!! But please relax and use common sense when talking about the way some talk about these statistics. And, no, I’m not defending the p=0.29 example!! 🙂

No, the p-value of 0.05 doesn’t mean what you’re saying at all. You’re committing a logical fallacy. It tells you about the probability of your results if the null were actually true and has little to nothing to say about the probability that your alternative is true. Furthermore, statistical tests have philosophies of that underpin the very nature of the test and what you can take out of them for meaning. The 0.05 is arbitrary, yes, but modifying what values are important after you calculate them complete changes their meaning.

What you do imply is that you want a couple of different kinds of uses of p-values and that’s fine (although it’s not great as an evidence measure it’s been used as one); but people need to be clear a priori how they’re using them. Thus, statements like “almost significant” usually have very little meaning because they’re post hoc efforts to cram one philosophy of statistics into another. State at the outset that your p-value is a measure of evidence, that you have no pre-conceived test per se, don’t mention testing, and you’d be fine using 0.06 in a qualitative statement about how believable the null is. But trying to do that afterwards corrupts the value of doing any testing at all.

You might want to look at Gigerenzer’s “Mindless Statistics”.

This stems from the hybrid logic used in psychology in general. Rather than using a Fisherian “Report P-observed, replicate” or a Neyman-Pearsonian “Fix alpha, set sample size sufficient to detect departures of interest”, we have a “Pseudo fix alpha, mumble something about significance, complain a lot about the procedure not detecting a difference despite not doing prospective investigation about how the procedure would do”.

THANK YOU. I was starting to rip my hair out reading many of the other responses.

Bingo! You nailed it, John! The p-value has NOTHING to do with the alternate hypothesis. It’s easy to demonstrate. Generate a list of 1000 random numbers in XL in column A. Then extend this column across the spreadsheet for 1000 columns (so you have 1000×1000 random numbers). Then plug this dataset into a stats program (I used JMP) and ask the software to detect “significant associations” between the outcome variable (column A) and any other predictor variable (all other individual columns). Amazingly (NOT!!!) about 5% of the “predictor variables” will have a significant association with the outcome variable (at P<0.05, and some way less than that). Now think back – you created random sets of numbers. There is NO way that the data in Column XXX is associated with Column A. Those “significant” results are simply what chance produces about 5% of the time when the null is true.
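
For anyone without JMP to hand, the experiment described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it uses 999 predictor columns of uniform random numbers, it tests each Pearson correlation with the Fisher z approximation (not necessarily the test JMP would run), and the function names `pearson_r`, `two_sided_p` and `false_positive_rate` are made up for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_sided_p(r, n):
    """Approximate two-sided p-value for H0: no correlation (Fisher z)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_rows=1000, n_predictors=999, alpha=0.05, seed=1):
    """Share of pure-noise predictors 'significantly' associated with the outcome."""
    rng = random.Random(seed)
    outcome = [rng.random() for _ in range(n_rows)]  # "column A"
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.random() for _ in range(n_rows)]  # unrelated noise
        if two_sided_p(pearson_r(outcome, predictor), n_rows) < alpha:
            hits += 1
    return hits / n_predictors


if __name__ == "__main__":
    print(f"share of 'significant' predictors: {false_positive_rate():.3f}")
```

With the defaults, the share of “significant” predictors should land close to the chosen alpha of 0.05, even though every column is pure noise.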

This and “Marginally Significant” reblogged on http://couchpsychologist.wordpress.com/2014/04/23/the-futility-of-significance-statistical-that-is/.

“(results are either significant or not and can’t be qualified)”

How’s that? There’s no reason why you would have to make a clear cut decision, for or against, in a scientific paper. It’s perfectly legitimate to report that a p-value of 0.051 for example provides weak (but clearly inconclusive) evidence against the H0.

But in all the examples the authors elected to do exactly that: make a clear-cut decision based on a threshold they themselves chose.

Sometimes we need clear and predefined cut-off points. For example, a decision on public funding of a medication can be based on “adequate” proof that an effect exists (significance against H0 or, equivalently, the limits of the 95% CI) and that its magnitude is “sufficient” (the difference from the control group).

Do you have a list of references from where you picked up this phrasing?

Yes – all of the expressions are from journal articles

Indeed. Do you have a list of those references matched to the expressions? Would be handy.

It would be a lengthy and joyless task to assemble them, but I might get around to it one day

In psychology, given how frequently we’re doing things like assuming normality when what we’ve got is approximate normality, it is definitely worth reporting nearly-significant results as such; minor violations of our assumptions and similar things are ubiquitous, so .002 away from significance is absolutely not the same as .5 away. For exactly the same reason that .001 and .05 are two different significance levels and we don’t just say “significant”.

Actually, that’s precisely why it is even more flawed: the strictness of the test has already been lowered, and lowering it even further by accepting results above 0.05 makes the tests almost meaningless, since the probability of interpreting a random difference as a real effect rises quite a lot.

The issue here is that the authors chose in the beginning to follow the statistical testing standards, including the 0.05 significance border, in order to gain some degree of confidence in their results. They present the results within this “frame”, but they actually fail to stick to the rules. Practically, they are misleading their reading audience – they rely on a statistical testing practice (which they don’t abide by) to make their claims more convincing to the readers.

While I agree that 0.002 is not the same as 0.5, there is again an issue of how far from 0.05 is too far to be considered a marginal difference. Marginally significant results are a valid issue indeed, but I guess that a replication study to confirm the questionable result, when possible, would be a far better solution than bending the data to fit the desired interpretation. In any case, you can’t label something as “a trend towards significance” and then treat it in further text as statistically significant (and usually not even return to the fact that the results are not significant in the discussion).

By the way, we do have 0.01 and 0.001, but there is no level of significance higher than 0.05.

Can I make a t-shirt of this list? I promise not to wear it to dissertation defenses.

Curious what anyone would have to say about this article by David Healy: http://apt.rcpsych.org/content/12/5/320.full.pdf+html

Reblogged this on Gauss17gon and commented:

A must for a stats student starting research!

This blog was… how do I say it? Relevant!! Finally I’ve found something which helped me.

Many thanks!

I almost understand…………………………………………………………………….

more information needed………………………………………..lol

Hi there – was this review ever published formally? I’d love to cite it if it were published in a journal somewhere.

Thanks!

You’re right, I really should get around to writing this up…it can be cited as a blog entry though.

Thank you for this blog – just what I needed!

I feel that this article is a bit misguided (although the list is funny). In the real world there is no single godlike level for alpha beyond which all p values are meaningless. Surely we can decide the level we are willing to accept for alpha based on our knowledge of the experiment and the data or we can ignore any arbitrary cutoff completely and just report the actual p value and explain what it means in a way the reader can understand?

Well, that’s sort of the point: all the studies here had 0.05 as their level of significance. So it is arbitrary, but the authors all chose it. And then decided it was a movable feast only when the results weren’t what they were hoping for. The wording wasn’t chosen to help the reader understand; it’s a rhetorical device to mislead the reader.

I encourage my psychology students to comment on non-significant trends if p<.10 (unless they already have enough significant results to address). If the result is over .05 then it isn't significant at the accepted level but does provide weak evidence for an effect (i.e. weaker than if the p value was significant).

Following today's xkcd (1478) I'm wondering if there's a better way of putting it. However, I don't like most of the phrases in the above list, so maybe I should just stick with the ones I already use (non-significant trend, weak evidence for an effect).

I think that would be a mistake, starting with the notion of ‘non-significant trends’. There’s no ‘trend’ at all, just a ‘near-miss’, which is not the same thing. ‘Trend’ implies some movement towards significance, and you don’t get this from a single p-value. ‘Non-significant trend’ just means the same thing as ‘non-significant’, and ‘weak evidence for an effect’ is misleading. The distribution of p under the null hypothesis is flat, i.e. a p value of 0.06 is just as likely as 0.96; neither is a ‘trend’.
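
The flatness of p under the null can be checked by simulation. A minimal sketch, assuming a two-sided z-test on normal data with a known standard deviation of 1 and a true mean of 0; the helper names `null_p_value` and `decile_shares` are illustrative only.

```python
import math
import random
from statistics import NormalDist, fmean


def null_p_value(rng, n=30):
    """Two-sided z-test p-value for one sample drawn with true mean 0, sd 1."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = fmean(sample) * math.sqrt(n)  # known sd of 1, so no estimation step
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def decile_shares(n_sims=5000, seed=7):
    """Share of simulated null p-values falling in each tenth of [0, 1]."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(n_sims):
        p = null_p_value(rng)
        counts[min(int(p * 10), 9)] += 1  # p == 1.0 goes in the last bin
    return [c / n_sims for c in counts]


if __name__ == "__main__":
    for i, share in enumerate(decile_shares()):
        print(f"{i / 10:.1f} to {(i + 1) / 10:.1f}: {share:.3f}")
```

Each of the ten bins should hold roughly 10% of the simulated p-values, so when the null is true a value between 0.0 and 0.1 is no more likely than one between 0.9 and 1.0.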

Whether or not there is a trend in the data (e.g. one mean is different from the others) is separate from the significance test which calculates the probability that the observed trend might have been due to chance, if the null hypothesis of no differences is correct. A value just above .05 still means that the null hypothesis is likely to be incorrect, but the evidence is weaker than the accepted cut-off.

That’s correct, but then what is the cut-off for? Treating the p-value as a sliding scale is justifiable, but the concept of ‘significance’ is then unnecessary. It’s inconsistent to choose a cut-off and then only apply it if the results go the way we want: ‘weak evidence for an effect’ is not the same as ‘by the standards set in this experiment the evidence was insufficient to conclude that there was an effect’.

I misread your phrasing on ‘trend’ as being applied to the significance, but I think the same argument applies (and also, I wouldn’t describe a difference in means as a trend).

I agree that “non-significant trend” doesn’t really fit when talking about the effect of an IV with 3+ levels. However, I stand by my advice to students about appropriately reporting effects that just miss the .05 criterion. If the effect is significant at the .05 level then they should discuss it with confidence. If the evidence for an effect is only significant at .10 then they can make tentative conclusions about it. I don’t like “weakly significant” because, as you say, any given p value is either significant at the given level or it isn’t. On the other hand, changing “there was a significant effect of IV on DV (…, p=.040)” to “there was weak evidence for an effect of IV on DV (…, p=.060)” is a reasonable and simple change to make.

Two caveats.

1) I am making no comment here about whether a study with no significant results is publishable in a peer reviewed journal, though perhaps it would make sense for small studies showing weak evidence for effects to be accepted to reduce the negative correlation between study size and reported effect size. My advice is primarily directed at psychology students completing coursework and dissertations.

2) If a Bonferroni correction has been applied to keep the family-wise error rate at .05 then I would ignore all effects with p values of over .05 and use “weak evidence” to refer to effects where p<.05 but is not under the corrected value.
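
The reporting rule in caveat 2 can be written out explicitly. A sketch only, assuming a plain Bonferroni correction (the per-test threshold is alpha divided by the number of tests); the function name `classify` and its three labels are hypothetical shorthand for the wording described above.

```python
def classify(p, n_tests, alpha=0.05):
    """Label a p-value under a Bonferroni-corrected family-wise alpha."""
    corrected = alpha / n_tests  # Bonferroni threshold for each individual test
    if p < corrected:
        return "significant"
    if p < alpha:
        return "weak evidence"  # under .05 but not under the corrected value
    return "ignore"


if __name__ == "__main__":
    for p in (0.004, 0.03, 0.06):
        print(p, classify(p, n_tests=10))
```

With ten tests the corrected threshold is .005, so p=.004 counts as significant, p=.03 as weak evidence, and p=.06 is ignored.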

Would it be acceptable to use the phrase “there was an association between x and y, though not statistically significant” if the p-value is between 0.05 and 0.1? Is it true that “p-values tend to become smaller as sample size increases, unless H0 is true”? Is it possible to assume that a similar study with a larger sample might yield a statistically significant result for the same association?

Should the approach used be important? For instance, if a within-subject design is used, then anything bigger than .05 should be deemed insignificant and not subject to further discussion. This is because of the power of the design. Are there any cites on the inappropriateness of going higher than .05?

The significance testing business has been recommended against by the National Academy of Sciences. Two months ago, the first journal, a journal of applied social psychology, banned significance tests. Many years ago, a PLOS article showed that the probability a published article has an actual relationship is

PPV = 1 / [1 + alpha / ((1 – beta) × R)]

where alpha is often .05, 1 – beta is the power, and R is the ratio of actual relationships to null relationships among such tests in the field (you can get creative with “field”).

PPV is the Positive Predictive Value of published articles in the field.

While alpha is usually set by the statistician at .05, 1 – beta is bounded between 0 and 1, and for better designs 1 – beta is near 1.

You would think better power would improve matters, but we’re looking at published articles, so the best 1 – beta can do is 1.

As a result, for fields where the proportion of true relationships is small, journals publish almost no actual relationships (shall we say, “barely any acceptable results”). You find this in many fields. For example, in genetics, if 30 of 30,000 genes cause a disease, then R is .001 and PPV is .02.

Yet, we imagine significance tests pulling out of just such situations relationships we wouldn’t otherwise notice. We’ve deceived ourselves. Although, a followup study in a redefined field with only previously significant relationships has a much better PPV.
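
The arithmetic behind the genetics example and the follow-up remark can be checked directly. A small sketch, taking R as the pre-study odds of a true relationship (an assumption here; when true relationships are rare it is nearly the same as the proportion) and assuming perfect power; the function name `ppv` is illustrative.

```python
def ppv(alpha, power, odds):
    """Positive predictive value: 1 / (1 + alpha / (power * odds))."""
    return 1.0 / (1.0 + alpha / (power * odds))


# The genetics example: pre-study odds of about .001, power of 1
first_round = ppv(alpha=0.05, power=1.0, odds=0.001)

# Follow-up restricted to previously significant findings: the new
# pre-study odds are the odds implied by the first-round PPV
followup_odds = first_round / (1.0 - first_round)
second_round = ppv(alpha=0.05, power=1.0, odds=followup_odds)

if __name__ == "__main__":
    print(round(first_round, 3), round(second_round, 3))
```

Under these assumptions the first round gives a PPV of about .02, and a follow-up limited to the previously significant findings lifts it to about .29: much better, though still far from certainty.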

This is similar to the problem of random numbers published in the back of books — random for the individual, but not random if we observe results when many people use the same random number table.

Dears,

Funny!

Some of you have “teetered on the brink” of the main point about Null Hypothesis Tests of Significance in the Absence of a Loss Function, which is, as has been known since Edgeworth and Gosset (“Student”) and Neyman and the younger Pearson, that any level is not the same as importance. Fit is not the same as oomph. It just isn’t, unless you have some way of translating probability space into consequence space. You might want to read, slowly, McCloskey and Ziliak, The Cult of Statistical Significance (University Of Michigan Press, 2008).

Deirdre N McCloskey

A lot of these are as ludicrous as “almost not pregnant” or “nearly a virgin”

The problem comes from the need to spin 0.06 into a positive result for your research question.

The answer comes from publishing negative/non-significant results, but we all know journals don’t do this.

The full solution comes from publishing your results as a Data Note at e.g. GigaScience or Scientific Data. By releasing the data in a curated, peer-reviewed and citable manner, you increase the chance of citations for your non-significant result because your data is still useful for method development, meta-analyses, increasing the numbers [of controls at least] in other studies etc. The publication is about your methods of data collection, not your specific research question – although you will record the non-significance in your publication.

Love it! Am sharing with student network. Personally am looking forward to time out to learn R

Reblogged this on In the Dark and commented:

I just couldn’t resist reblogging this post because of the wonderful list of meaningless convoluted phrases people use when they don’t get a “statistically significant” result. I particularly like

“a robust trend toward significance”

It’s scary to think that these were all taken from peer-reviewed scientific journals…

Reblogged this on fluffysciences and commented:

There has been a lack of posting lately, mostly because a very busy marking schedule has caught up with me.

So I hope you will enjoy this link to ‘Probable Error’, which has spent a very likely significant amount of time rounding up all the ways scientists use to describe P values which aren’t anywhere near significant at all.

Given the paper I’m currently reviewing reporting a tendency of P=0.07, I was highly amused!

See also:

http://xkcd.com/1478/

I think we should just state the significance of the p-value as 0.049 or 0.051 and let the reader make a decision of how much risk the reader wants to assume.

Reblogged this on Sciception and commented:

Just when I was discussing significant p-values at work! Someone was insisting that values slightly above 0.05 are still believably significant... and then I found the exact terms used in this very list!

Why don’t authors ever claim that P=0.049 was approaching “non-significance”?

Ah, the p-value… very useful if you are tossing a coin.

Funny!!

Thanks for making my online search for levels of significance surprisingly amusing.

Reblogged this on Chaos Theory and Pharmacology and commented:

HT:

This is precisely why “studies” are overemphasized as the number one “evidence” but can be easily skewed in interpretation. This is precisely why individual lived experience with what is being tested is more valuable. We need to hear more lived experience stories, precisely what it felt like, whether it helped, if anything went wrong, and if they’d recommend it to someone they loved. You can’t replicate any of that in a “study.” Yet these studies are the tools that are shaping medicine. Funny, the ones in power, the decision-makers, rarely ask the guinea pigs, nor do they want to hear these very real stories.

I haven’t read all the comments, but found this whole post embarrassing, though some of the comments seem to understand the point I’m about to make.

As some have pointed out the threshold of statistical significance is arbitrary. ARBITRARY – capiche?

So as the p value rises from 0.000001 it becomes less statistically significant – that’s all you can say.

Beyond that you need to use the most rigorous reasoning you can in the context you are in, consider the risks you are taking both to act on the knowledge you have or not to act. Sometimes – very rarely – the threshold between acting and not acting on the knowledge you have might occur at or around p = 0.05.

The rest of the time, if you proceeded according to what I’ve set out, which ought to be the most basic commonsense, especially amongst those who have been through years of training and practice in these matters, you’ll come up with a different number.

I think you’ve missed the point.

In all of these papers the authors adopted a threshold of 0.05: yes, it’s an arbitrary threshold but that’s the one they chose. Having declared that they would accept/reject based on a threshold of 0.05, the authors then fail to report a “non-significant” result as such.

That’s all this post is about – it’s not about how you choose the significance threshold, nor whether a threshold for a binary decision is how it should be done.

In short, it’s not the argument you think it is.

“Having declared that they would accept/reject based on a threshold of 0.05, the authors then fail to report a “non-significant” result as such.”

I don’t think that’s quite right is it? Perhaps that should read “Having chosen to become academic researchers and submit to journals the referees of which tend to impose arbitrary thresholds on tests, the academics sought to verbally justify (usually) minor excesses of the threshold.”

That’s simultaneously quite a harsh accusation (academics knowingly publish incorrect interpretations of their analyses) and quite a generous one (their obfuscation is merely drawing attention to the arbitrariness of the threshold).

They don’t publish “knowingly incorrect” interpretations, they find ways of putting it that sound good. They lack candour – candour being the first casualty of bureaucracy.

But “they find ways of putting it that sound good” is the point of the post – so I don’t understand why you found it embarrassing.

If p < .05 is significant, then is .05 < p < .10 your significant other?

Just built a Shiny app based on the data in this blog post, have fun 🙂

Reblogged this on Dr Geoff Kushnick and commented:

Great list of ways to refer to “close to, but not really, significant results.” Given how much P values can jump around using samples from the same population, my suggestion is to give the actual P value and talk about the effect size. No need to describe the P value itself.
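
How much P values jump around between samples from the same population is easy to see by simulation. A rough sketch, assuming repeated two-sided z-tests on normal samples of 30 with a known standard deviation of 1 and a true effect of 0.36 (values picked only so that power is about 50%); the function name `replicate_p_values` is illustrative.

```python
import math
import random
from statistics import NormalDist, fmean


def replicate_p_values(effect=0.36, n=30, n_studies=200, seed=11):
    """p-values from repeated two-sided z-tests on samples from one population."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_studies):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        z = fmean(sample) * math.sqrt(n)  # known sd of 1
        p_values.append(2.0 * (1.0 - NormalDist().cdf(abs(z))))
    return p_values


if __name__ == "__main__":
    p_values = sorted(replicate_p_values())
    share = sum(p < 0.05 for p in p_values) / len(p_values)
    print(f"smallest p: {p_values[0]:.4f}, largest p: {p_values[-1]:.4f}")
    print(f"share below 0.05: {share:.2f}")
```

The same true effect yields p-values scattered on both sides of 0.05, which is one reason to report the estimate and its interval instead of characterising the P value.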

Here’s Ziliak’s depressing latest on the state of affairs.

Significance Controversy in the Past

This is not the first time in history that statistical significance has been on trial. “Significance” was only a partial argument from odds from the beginning, Francis Ysidro Edgeworth (1885, p. 208), who coined the term, clearly perceived. Galton and Pearson saw in the test more security than they might have. But by 1905 Student himself—that is William Sealy Gosset aka “Student”, the inventor of Student’s t, and eventual Head Brewer of Guinness—warned in a letter to Karl Pearson about economic and other substantive losses that can be caused by following a bright line rule of statistical significance:

Student’s rejection of a bright-line accept-reject standard was echoed a few years on by Harvard psychologist Edwin Boring (1919), warning about the difference between substantive and merely statistical significance in psychological research. Yet mindless tests and uses of statistical significance raged on, heedless of warnings from its eminent discoverers.

I’m finishing an essay on this subject (soon hopefully coming out as a self-published book) in an attempt to introduce students to these concepts. I used some examples on this page and it was a delight to find.

I also came across a paper by Wood et al. called “Trap of trends to statistical significance: likelihood of near significant P value becoming more significant with extra data”, published in BMJ. The paper is very relevant to the discussions surrounding p-values and this page.

One of their points (directly quoted) is as follows: “Describing near significant P values as ‘trends towards significance’ (or similar) is not just inappropriate but actively misleading, as such P values would be quite likely to become less significant if extra data were collected.”

On the other hand they also mention that p-values around 0.05 show modest degrees of evidence, no matter which side of the threshold they fall. Which leads them to say calling (for example) p=0.06 an “interesting hint” may be a good choice.

I also became aware of technical discussions between Bayesians and Frequentists on the subject. I’m no statistician but I hope to learn more about this subject since it is very interesting, not to mention the impact that it can have on inferential statistics and scientific reporting.

At the risk of embarrassing myself, I personally do two things: I report exact p values to three decimal places, and I do use terms like “[marginally] significant”. My purpose in the former is to avoid using levels of significance, which can obscure the pattern of results; my purpose in the latter is to provide useful information to the reader about the pattern of results. Note that a strict “significant/not significant” dichotomy would lump results such as p = .234 with those such as p = .051, which in my view would usually be misleading. Terms such as significant, highly significant, and marginally significant, when used in the absence of alpha levels or other specific levels of significance, return to their original colloquial, non-technical meaning (meaningful, remarkable, etc.).

I believe this site has got some very great info for everyone :D. “This is an age in which one cannot find common sense without a search warrant.” by George Will.

Thank you for putting this together!!! I’m using it right now!

I don’t want to defend any of these 500 phrases, but I have an impression of why researchers are so apt to use them. It is about one thing not really mentioned here: the 2nd-order (type II) error. If you get a p-value of 0.051 it is intuitively clear to most scientists that they are at risk of this kind of error. So they try to find a phrase that mirrors their intuition, namely that a difference, though not significant, is still quite probable. Since it was a real near-miss, those guys wanted to make clear with (inapt) words that they did not find “no difference”, but simply could not show a difference (though having in mind that stating a difference would still be correct in 18 of 20 cases).

Let’s say a certain chemical could not be shown to cause cancer, with a p-value of 0.051. I think none of the people around here would swallow that stuff anyway, because intuition tells them it would not be a good idea. I think a near-miss p-value has to be discussed in this respect, though maybe with other and more adequate words.

When being as strict as most people here, always keep in mind the 2nd-order error! Negative consequences of a medical study might be more severe when not naming and discussing a near-miss and just sticking to “significant” or “not significant”. The world is not black and white.
