Originally published in Techdirt.
The open access movement believes that academic publications should be freely available to all, not least because most of the research is paid for by the public purse. Open access supporters see the high cost of many academic journals, whose subscriptions often run into thousands of dollars per year, as unsustainable for cash-strapped libraries, and unaffordable for researchers in emerging economies. The high profit margins of leading academic publishers – typically 30-40%1 – seem even more outrageous when you take into account the fact that publishers get almost everything done for free. They don’t pay the authors of the papers they publish, and rely on the unpaid efforts of public-spirited academics to carry out crucial editorial functions like choosing and reviewing submissions.
Academic publishers justify their high prices and fat profit margins by claiming that they “add value” as papers progress through the publication process. Although many have wondered whether that is really true – does a bit of sub-editing and design really justify the ever-rising subscription costs? – hard evidence has been lacking that could be used to challenge the publishers’ narrative. A paper from researchers at the University of California and Los Alamos National Laboratory is particularly relevant here. It appeared first on arXiv.org in 2016 (pdf), but has only just been “officially” published2 (paywall). It does something really obvious but also extremely valuable: it takes around 12,000 academic papers as they were originally released in preprint form, and compares them in detail with the final versions that appeared in the professional journals, sometimes years later, as the paper’s own history demonstrates. The results are unequivocal:
We apply five different similarity measures to individual extracted sections from the articles’ full text contents and analyze their results. We have shown that, within the boundaries of our corpus, there are no significant differences in aggregate between pre-prints and their corresponding final published versions. In addition, the vast majority of pre-prints (90%-95%) are published by the open access pre-print service first and later by a commercial publisher.
That is, for the papers considered, which were taken from the arXiv.org preprint repository, and compared with the final versions that appeared, mostly in journals published by Elsevier, there were rarely any important additions. That applies to titles, abstracts and the main body of the articles. The five metrics applied looked at letter-by-letter changes between the two versions, as well as more subtle semantic differences. All five agreed that the publishers made almost no changes to the initial preprint, which nearly always appeared before the published version, minimizing the possibility that the preprint merely reflected the edited version.
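To give a feel for what such comparisons involve, here is a minimal sketch of two of the simpler kinds of measure the study describes: a character-level similarity and a crude word-overlap (Jaccard) score. This is not the paper’s actual pipeline – the function names and sample strings are purely illustrative.

```python
from difflib import SequenceMatcher

def char_similarity(a: str, b: str) -> float:
    """Character-level similarity ratio in [0.0, 1.0] via difflib."""
    return SequenceMatcher(None, a, b).ratio()

def jaccard_similarity(a: str, b: str) -> float:
    """Word-overlap (Jaccard) similarity: shared words over total distinct words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

# Invented example texts, standing in for a preprint and its published version.
preprint = "We compare preprints with their final published versions."
published = "We compare pre-prints with their final published versions."

print(char_similarity(preprint, published))   # near 1.0: only one character differs
print(jaccard_similarity(preprint, published))
```

A near-identical pair scores close to 1.0 on the character measure, while the word-level score is more sensitive to small spelling changes (here, “preprints” vs. “pre-prints”); real studies combine several such metrics, applied section by section, to guard against any single measure’s blind spots.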
The authors of the paper point out a number of ways in which their research could be improved and extended. For example, the reference section of papers before and after editing was not compared, so it is possible that academic publishers add more value in this section; the researchers plan to investigate this aspect. Similarly, since the arXiv.org papers are heavily slanted towards physics, mathematics, statistics, and computer science, further work will look at articles from other fields, such as economics and biology.
Such caveats aside, this is an important result that has not received the attention it deserves. It provides hard evidence of something that many have long felt: that academic publishers add almost nothing during the process of disseminating research in their high-profile products. The implications are that libraries should not be paying for expensive subscriptions to academic journals, but simply providing access to the equivalent preprints, which offer almost identical texts free of charge, and that researchers should concentrate on preprints, and forget about journals. Of course, that means that academic institutions must do the same when it comes to evaluating the publications of scholars applying for posts.
If it were felt that more user-friendly formats were needed than the somewhat austere preprints, it would be enough for funding organizations to pay third-party design companies to take the preprint texts as-is and reformat them more attractively. Given the relatively straightforward skills required, the cost of doing so would be far lower than the high page charges that are the main model used to fund so-called “gold” open access journals, as opposed to the “green” open access based on preprints freely available from repositories.
In theory, gold open access offers “better” quality texts than green open access, which supposedly justifies the higher cost of the former. What the research shows is that when it comes to academic publishing, as in many other spheres, all that glitters is not gold: humble preprints turn out to be almost identical to the articles later published in big-name journals, but available sooner, and much more cheaply.
- Larivière, V., Haustein, S., & Mongeon, P. (2015). The oligopoly of academic publishers in the digital era. PLoS ONE, 10(6), e0127502. ↩
- Grappone, T., Farb, S. E., Broadwell, P., & Klein, M. (2016, June). Comparing published scientific journal articles to their pre-print versions. In Proceedings of the 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL) (pp. 153-162). IEEE. ↩