I Tweeted this graph last week. I had been messing around with density plots in R, and it seemed a neat illustration of the phrase ‘marginally significant’ being used to mean ‘nearly, but not actually, significant’: the phrase is rare below p=0.05, peaks at p=0.06, then declines sharply before another peak at p=0.1.

The ensuing discussion highlighted a couple of good points: (a) where did the data come from? and (b) it should have been a histogram.

Where did the data come from? From a very cursory search of Google Scholar for the phrase “marginally significant (p=x)”, where x is 0.01, 0.02, ..., 0.15 in steps of 0.01. That is probably good enough for a quick Tweet, but not for sustained discussion.

Should it have been a histogram? Yes, if only because the smooth curve of a density plot misrepresents the data: the search only covered p-values in steps of 0.01, so there are no intermediate values between, say, 0.05 and 0.06.
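To make the point concrete, here is a minimal Python sketch of what a density plot does to values that only exist at discrete steps. The counts are hypothetical placeholders, not my search results, and the hand-rolled Gaussian kernel and its bandwidth are assumptions for illustration rather than what R's density function does by default.

```python
import math

# Hypothetical placeholder counts, NOT the real Google Scholar results.
counts = {0.05: 3, 0.06: 10, 0.07: 6}


def gaussian_kde(x, counts, bandwidth=0.005):
    """Weighted Gaussian kernel density estimate at x (bandwidth is an assumed value)."""
    total = sum(counts.values())
    norm = total * bandwidth * math.sqrt(2 * math.pi)
    kernel_sum = sum(
        n * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
        for p, n in counts.items()
    )
    return kernel_sum / norm


between = 0.055  # a p-value that was never searched for

# A histogram (or bar chart) has nothing to draw at 0.055 ...
print("count at 0.055:", counts.get(between, 0))

# ... but the smoothed curve is well above zero there.
print("density at 0.055 is positive:", gaussian_kde(between, counts) > 0)
```

The curve is drawn between 0.05 and 0.06 even though no searches were made there, which is exactly why bars are the more honest choice.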

So I re-did the Google Scholar search. This time I looked for statements of the form “marginally significant (p=x)”, where x is every equivalent way of writing 0.001, 0.002, ..., 0.200 in steps of 0.001. So, for example, p=0.01 might be in the format 0.01, 0.010, .01 or .010.
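For anyone wanting to repeat this, the list of search phrases can be generated rather than typed out. The following is a rough Python sketch, not the procedure I actually followed: the function names `p_value_variants` and `search_phrases` are made up for illustration, and it assumes the only variations are trailing zeros (up to three decimal places) and an optional leading zero. Formats with spaces, such as "p = 0.01", are not covered.

```python
def p_value_variants(thousandths):
    """Return the ways of writing p = thousandths/1000, for 1 <= thousandths <= 999.

    Assumption: variants differ only in trailing zeros and the leading zero.
    """
    if not 1 <= thousandths <= 999:
        raise ValueError("thousandths must be between 1 and 999")

    digits = f"{thousandths:03d}"  # e.g. 10 -> "010"

    # Strip trailing zeros one at a time: "010" -> "01"
    forms = [digits]
    while forms[-1].endswith("0"):
        forms.append(forms[-1][:-1])

    variants = []
    for d in forms:
        variants.append("0." + d)  # with leading zero
        variants.append("." + d)   # without leading zero
    return variants


def search_phrases(max_thousandths=200):
    """Build one quoted phrase per variant for p = 0.001 ... max_thousandths/1000."""
    phrases = []
    for t in range(1, max_thousandths + 1):
        for v in p_value_variants(t):
            phrases.append(f'"marginally significant (p={v})"')
    return phrases


if __name__ == "__main__":
    print(p_value_variants(10))
    print(len(search_phrases()), "phrases to search for")
```

Each phrase would still have to be searched by hand, and the hit counts for all variants of the same p-value added together before plotting.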

Here are the data:

And here is the histogram:

It’s still not perfect, since the search misses examples if the p-value isn’t cited directly after the phrase. But until automated searches on Google Scholar are possible, it’s probably the best I can do for now.

