November 27, 2015 | Rachael Lammey, Product Manager, CrossRef

Getting to grips with Crossref Similarity Check

Introducing Crossref

The Crossref Similarity Check service is available to publishers who are members of Crossref, a not-for-profit organization for scholarly publishing that works on tools to improve research communication. For example, when Taylor & Francis assign Digital Object Identifiers (DOIs) to journal articles, they do so by depositing the authoritative article information with Crossref. By taking direction from the scholarly community, Crossref services extend beyond DOI assignment. This is why Crossref Similarity Check, designed to help publishers, journal editors and staff actively engage in efforts to prevent plagiarism, was launched in 2008.

The Crossref Similarity Check service consists of two parts: an increasingly comprehensive database of full-text content to screen documents against, and an interface where manuscripts can be uploaded, checked against this database, and summarized in a similarity report.

The Crossref database

To screen scholarly content effectively for potential issues, it needs to be checked against other academic content, which can sit with many different publishers and providers. To enable this, when a Crossref member publisher starts to participate in the service, they allow the full text of their journals to be indexed in the database so that they and other publishers can check against it. The service also checks the manuscripts uploaded to it against a growing repository of online and offline content, including databases from Gale and EBSCO, and sites such as PubMed and arXiv. A manuscript will also be checked against web content - over eight billion web pages have been indexed, with an archive of web content going back around eight years. You can find more detailed information here.

The publishers that currently participate in Crossref are listed on the Crossref website, and their number is growing quickly. In October 2012, 359 publishers were Crossref members, but by October 2015 this number had risen to nearly 700. Each new member adds its content to the database, which makes the database more useful to everyone who checks against it. Currently there are over 132,000 titles from participating publishers (books, journals and conference proceedings) in the database, accounting for over 43 million individual items, such as articles and chapters, that have DOIs assigned to them. Crossref works with publishers to set up and maintain this process, to make sure the indexing is as thorough as possible.

Screening manuscripts

Crossref partners with a company called Turnitin, which provides the iThenticate system for the service. The iThenticate system is what many people refer to as Crossref Similarity Check - the interface where manuscripts are uploaded and which generates the similarity reports, showing where text in the uploaded paper matches other text in the database.

There is a lot to say about the similarity reports, and training and help on interpreting them is available via publishers, Crossref and Turnitin. However, there is one key point and one note of advice that are worth flagging up. The key point is that the ‘similarity score’, i.e., the percentage match that iThenticate displays, is always worth looking at in more detail. A 20% match doesn’t mean a paper is 20% plagiarism - it means that 20% of the paper you have uploaded matches other sources in the Crossref database. What that 20% is made up of, and why it matches, is more important than the figure itself. Does it contain a 15% match to a single source from another journal that isn’t referenced, or is it lots of smaller matches to content that is properly referenced or uses set terminology/phrasing? It is always worth looking at the reports in detail to try to ascertain the intent of the author.
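To make the arithmetic behind that point concrete, here is a minimal sketch in Python. It is not iThenticate's method or data format: the Match record, the summarize function and the two manuscripts are all hypothetical, and the sketch assumes matches do not overlap so that word counts can simply be added up. It shows two 5,000-word papers that both score 20% but have very different make-up.

```python
from dataclasses import dataclass


@dataclass
class Match:
    source: str       # hypothetical label for the matched source
    words: int        # manuscript words that match this source
    referenced: bool  # whether the manuscript cites this source


def summarize(matches, total_words):
    """Return (overall %, largest source, largest source %)."""
    overall = 100 * sum(m.words for m in matches) / total_words
    largest = max(matches, key=lambda m: m.words)
    return overall, largest.source, 100 * largest.words / total_words


# Paper A: one large, unreferenced match plus one smaller, referenced one.
paper_a = [Match("Journal X article", 750, False), Match("Review Y", 250, True)]

# Paper B: twenty small matches, all to sources the paper references.
paper_b = [Match(f"Source {i}", 50, True) for i in range(20)]

for name, paper in (("A", paper_a), ("B", paper_b)):
    overall, source, share = summarize(paper, total_words=5000)
    print(f"Paper {name}: {overall:.0f}% overall, largest match {share:.0f}% ({source})")
```

Both papers report the same overall figure, yet Paper A has a single 15% match to an unreferenced source while no source accounts for more than 1% of Paper B, which is why the breakdown deserves more attention than the headline score.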

The note of advice concerns using the iThenticate tool. With such a large database of content, it can be difficult to focus on the parts of a paper that you think may be problematic. Settings are available to help with this: the bibliography/references can be removed from the matches that the report finds, matches to specific sources (or specific URLs) that aren’t of interest can be excluded, and small matches can be left out. All of these remove the ‘background noise’ that can otherwise make the reports very information-heavy.
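The effect of those three kinds of setting can be illustrated with a short Python sketch. This is only an illustration of the idea, not the iThenticate interface or API: the filter_matches function, its parameter names, the record layout and the example sources are all assumptions made for this example.

```python
def filter_matches(matches, exclude_bibliography=True,
                   excluded_sources=(), min_words=0):
    """Keep only the matches an editor is likely to want to review."""
    kept = []
    for m in matches:
        if exclude_bibliography and m["section"] == "bibliography":
            continue  # match falls in the reference list
        if m["source"] in excluded_sources:
            continue  # source the editor has chosen to ignore
        if m["words"] < min_words:
            continue  # match too small to be meaningful
        kept.append(m)
    return kept


# Hypothetical matches found in one manuscript.
matches = [
    {"source": "journal-x.example/article-1", "words": 400, "section": "body"},
    {"source": "journal-z.example/article-9", "words": 120, "section": "bibliography"},
    {"source": "author-preprint.example", "words": 300, "section": "body"},
    {"source": "news-site.example", "words": 8, "section": "body"},
]

focused = filter_matches(
    matches,
    excluded_sources={"author-preprint.example"},
    min_words=10,
)
print([m["source"] for m in focused])
```

In this made-up case, four matches are reduced to the one body-text match that still needs a closer look, which is the kind of focus the exclusion settings are meant to give.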

Using the service

Most publishers who use Crossref Similarity Check integrate iThenticate with their manuscript tracking systems so that the reports can be run as part of the peer-review lifecycle of the paper. Publishers and individual journals also take a range of approaches as to when in the review process they run the checks: some run everything upon submission, some run everything upon acceptance, and some check only the papers that arouse concerns (with a whole range of approaches in between). There is no right or wrong answer; it depends on the best fit for the journal.

Integrations like those with manuscript tracking systems, as well as growth in the number of participating publishers and journals, mean that over 270,000 manuscripts are being uploaded to iThenticate every month for screening. This shows that both publishers and individual journals are taking responsibility for the content that they publish and adding value through the peer-review process. At Crossref, we want to support members using the service, through training, documentation and continued improvements to the iThenticate service, so please stay tuned.

