The recent increase in Open Access (OA) policies has brought forth important questions concerning the effect these policies have on the practice of publishing Open Access. In particular, is there evidence to support that mandating OA increases the proportion of OA outputs (in other words, do authors comply with relevant policies)? Furthermore, does mandating OA reduce the time from acceptance to the public availability of research outputs, and can compliance with OA mandates be effectively tracked? This work studies compliance with the UK REF 2021 Open Access policy. We use data from CrossRef and from CORE to create a dataset containing 1.6 million publications. We show that after the introduction of the UK OA policy, the proportion of OA research outputs in the UK has increased significantly, and the time lag between the acceptance of a publication and its Open Access availability has decreased, although there are significant differences in compliance between different repositories. We have developed a tool that can be used to assess publications' compliance with the policy based on a list of DOIs.
Do Authors Deposit on Time? Tracking Open Access Policy Compliance
1. Do Authors Deposit on Time? Tracking Open Access Policy Compliance
Drahomira Herrmannova
Nancy Pontika
Petr Knoth
June 4, 2019 – JCDL 2019, Urbana-Champaign, IL
Big Scientific Data and Text Analytics Group
Knowledge Media Institute, The Open University
3. Introduction
• Why we want Open Access (OA)
• Taxpayers should be able to read publicly funded research
• Help researchers at poorer institutions without access to subscriptions
• Institutions suffer from rising journal subscription prices
• Funders introduce policies to encourage OA
• Notable examples:
• U.S. Public Access Plan
• U.S. NIH Public Access Policy
• UK REF 2021 Open Access Policy
• EC H2020 Open Access Policy
4. Growing number of OA policies
Source: http://roarmap.eprints.org/
Currently close to 1 thousand funder and institutional OA policies
5. OA policies
• Provide criteria for making papers OA
• Requirements, such as:
• Where should papers be made available (publication or deposit)
• When should papers be deposited
• What version should be deposited (e.g. pre-print vs. post-print)
• Allowed embargo periods
• Etc.
7. Research questions
• Piwowar et al. (2018): At least 28% of all research papers are OA
• Lariviere and Sugimoto (2018): More than two thirds of papers from selected funders (with an OA policy) were OA
• Gargouri et al. (2012): OA growth often due to retroactive self-archiving (often years after publication)
10. Deposit time lag
• What is deposit time lag?
• The difference between date of publication and date of deposit in a repository expressed in days
• We study deposit time lag across
• Country
• Time
• Repository
• Discipline
15. Deposit time lag calculation
• Deposit time lag = deposit date – publication date
• The difference was expressed in days
• Positive values: article deposited after publication
• Negative values: article deposited prior to publication
• Best: as low a value as possible
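The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study; the function name and the example dates are assumptions made up for the sketch.

```python
from datetime import date


def deposit_time_lag(publication_date: date, deposit_date: date) -> int:
    """Deposit time lag in days: deposit date minus publication date."""
    return (deposit_date - publication_date).days


# Hypothetical dates, for illustration only.
# Deposited after publication: positive lag.
late = deposit_time_lag(date(2017, 3, 1), date(2017, 3, 31))
# Deposited before publication (e.g. a pre-print): negative lag.
early = deposit_time_lag(date(2017, 3, 1), date(2017, 2, 14))

print(late, early)
```

Zero or negative values are the desirable case, since the paper is then available no later than its publication date.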
16. Dataset
• 2013-2018 publications
• Metadata from Crossref and CORE
Final dataset size:
Publications: 808,984
Repositories: 728
Countries: 70
[Figure: Year of publication distribution]
18. Results: Deposit time lag per country/year
• How has deposit time lag changed over time?
• Average deposit time lag per year of publication
26. Results: Deposit time lag per country/year
• Two options:
1. Use all data
• Underestimates deposit time lag for all, but especially for newer publications
2. Put a maximum limit on deposit time lag for the analysis (for comparability)
• E.g. deposit at most a year later
• Underestimates deposit time lag for all, but especially for older publications
[Figure: Yearly deposits – toy example; x-axis 2011–2018; series: 2013 publications, 2017 publications]
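The two options can be compared on toy data. The sketch below is illustrative only: the records are invented, the function name is hypothetical, and the 365-day cap is just the "at most a year later" example from the slide.

```python
from collections import defaultdict
from statistics import mean

# Toy records: (publication_year, deposit_time_lag_days). Not real data.
records = [
    (2013, 20), (2013, 400), (2013, 1500),
    (2017, 10), (2017, 100), (2017, 300),
]


def average_lag_per_year(records, max_lag_days=None):
    """Average lag per publication year.

    With max_lag_days=None all deposits are used (Option 1).
    Otherwise deposits later than max_lag_days are dropped (Option 2).
    """
    by_year = defaultdict(list)
    for year, lag in records:
        if max_lag_days is None or lag <= max_lag_days:
            by_year[year].append(lag)
    return {year: mean(lags) for year, lags in sorted(by_year.items())}


all_data = average_lag_per_year(records)                 # Option 1
capped = average_lag_per_year(records, max_lag_days=365)  # Option 2

print(all_data)
print(capped)
```

In this toy example the cap changes only the 2013 average, because the late deposits it removes belong to the older publications; late deposits for 2017 publications have not happened yet and are missing under both options.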
27. Results: Deposit time lag per year/country
[Figure panels: Option 1: All data; Option 2: Max deposit time lag limit (1 yr)]
29. REF 2021 OA Policy
• In 2014, the UK introduced an OA Policy for its next research assessment exercise (REF)
• Requirements
• Deposit final manuscript in an OA repository
• Deposit on publication/acceptance or within 3 months from it
• Papers published since April 2016
• Sanction
• The OA requirement is linked to performance review
• Did the introduction of this mandatory policy affect deposit time lag in the UK compared to other countries?
30. Single vs any repository deposit time lag
1. Single repository deposit time lag
• Deposit time lag with respect to the publications’ deposit date in a given repository
2. Any repository deposit time lag
• Deposit time lag with respect to the publications’ deposit date in any repository
Example: a publication deposited in Repository 1 in 05/2017 and in Repository 2 in 09/2017
• Single repository deposit time lag for Repository 1 = 05/2017 – publication date
• Any repository deposit time lag for Repository 1 = min(05/2017, 09/2017) – publication date
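The two definitions differ only in which deposit date is used. A minimal sketch, assuming a hypothetical publication date of 1 April 2017 and the two deposit months from the example (the day of month and the function names are also assumptions):

```python
from datetime import date


def single_repository_lag(publication_date, deposits, repository):
    """Lag in days using the deposit date in the given repository only."""
    return (deposits[repository] - publication_date).days


def any_repository_lag(publication_date, deposits):
    """Lag in days using the earliest deposit date across all repositories."""
    return (min(deposits.values()) - publication_date).days


publication_date = date(2017, 4, 1)  # hypothetical
deposits = {
    "Repository 1": date(2017, 5, 1),
    "Repository 2": date(2017, 9, 1),
}

single_r1 = single_repository_lag(publication_date, deposits, "Repository 1")
single_r2 = single_repository_lag(publication_date, deposits, "Repository 2")
any_repo = any_repository_lag(publication_date, deposits)

print(single_r1, single_r2, any_repo)
```

When Repository 2 is analysed with the any repository definition, the publication counts with the earlier deposit date from Repository 1, since the paper was already openly available from that point.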
31. Results: UK REF compliance per year
Any repository deposit time lag
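A per-year compliance rate could be derived from the any repository deposit time lag roughly as follows. This is a sketch under stated assumptions, not the tool or the method used in the study: "within 3 months" is approximated as 90 days, lag is measured from the publication date, and the records are invented.

```python
from collections import defaultdict

# Assumption: "within 3 months" approximated as 90 days.
LIMIT_DAYS = 90


def compliance_rate_per_year(records, limit_days=LIMIT_DAYS):
    """records: iterable of (publication_year, any_repository_lag_days).

    Returns the share of publications per year deposited within the limit.
    """
    counts = defaultdict(lambda: [0, 0])  # year -> [compliant, total]
    for year, lag in records:
        counts[year][0] += lag <= limit_days
        counts[year][1] += 1
    return {year: c / t for year, (c, t) in sorted(counts.items())}


# Toy records, not real data.
records = [
    (2016, -5), (2016, 200),
    (2017, 0), (2017, 45), (2017, 90), (2017, 91),
]
rates = compliance_rate_per_year(records)
print(rates)
```

Negative lags (deposits made before publication) count as compliant here, which matches the reading that earlier deposit is always acceptable.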
32. Results: Deposit time lag per repository
Full lines: Single repository deposit time lag
Dashed lines: Any repository deposit time lag
33. Results: Deposit time lag per year/country
[Figure panels: Option 1: All data; Option 2: Max deposit time lag limit]
2014: UK introduces REF 2021 OA policy
36. Discussion
• Study assumption: if metadata deposited, then the full text is also deposited
• Validation of full text deposits complicated due to the way the OAI-PMH works
• Our study excludes publications that were never deposited
• To quantify missing deposits we would have to correctly match all CORE publications to their Crossref metadata
• Focus on deposit time lag rather than the proportion of missing deposits
• Matching between Crossref and CORE was done using metadata (titles, authors, publication years)
• Strict approach, results in high accuracy (~95.27%) but lower recall
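A strict metadata match of the kind described above might look like the sketch below. It is an assumption-laden illustration, not the matching procedure used in the study: the record layout, the normalisation rules, the "First Last" author format and the DOI are all hypothetical.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def match_key(record):
    """Strict key: normalized title, sorted author surnames, publication year."""
    # Assumes author names are written "First Last".
    surnames = tuple(sorted(normalize(a.split()[-1]) for a in record["authors"]))
    return (normalize(record["title"]), surnames, record["year"])


def match(crossref_records, core_records):
    """Pair CORE records with Crossref records that share the exact same key."""
    index = {match_key(r): r for r in crossref_records}
    return [(index[match_key(r)], r) for r in core_records if match_key(r) in index]


authors = ["Drahomira Herrmannova", "Nancy Pontika", "Petr Knoth"]
crossref = [{
    "doi": "10.1234/example",  # hypothetical DOI
    "title": "Do Authors Deposit on Time? Tracking Open Access Policy Compliance",
    "authors": authors,
    "year": 2019,
}]
core = [
    {"core_id": 1, "year": 2019, "authors": authors,
     "title": "Do authors deposit on time?  Tracking open access policy compliance."},
    {"core_id": 2, "year": 2018, "authors": authors,
     "title": "Do Authors Deposit on Time? Tracking Open Access Policy Compliance"},
]

matches = match(crossref, core)
print([(c["doi"], r["core_id"]) for c, r in matches])
```

Requiring all three fields to agree exactly keeps false matches rare, but a record with a wrong year or a differently formatted author name is missed, which is the accuracy versus recall trade-off noted above.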
39. Conclusions
• Time between publication and deposit has decreased significantly in the 2013-2017 period globally
• By 472 days per country on average across all countries in our dataset
• After introduction of the UK REF 2021 OA Policy this decrease in the UK has accelerated
• As of early 2018, UK publications are deposited immediately upon publication or even slightly before
• Key messages:
• Our observations support the argument for the inclusion of a time-limited deposit requirement in OA policies
• Institutional practices play an important role in supporting OA policy adoption
Many reasons why OA is important and why it is wanted, such as…
Many funders starting to introduce policies to encourage OA
Many people in the room probably are or have been subject to some OA policies
I listed some you might have heard of
Source: Registry of Open Access Repository Mandates and Policies (ROARMAP)
What matters in this figure is that each bar represents how many policies were in the database in a given quarter
Rightmost bar – first quarter of 2019
Currently close to 1 thousand funder and institutional OA policies
Provide criteria
Requirements typically include
I’ve shown there are many policies and the numbers keep growing
Many people tried to answer this question, for example this study from 2018 has shown that…
Researchers require access to recent literature
OA is not enough, we need fast OA
Is there evidence to show that if there is a deadline authors deposit sooner?
To answer this question we use something we call deposit time lag
We define it as …
We study deposit time lag (that’s what I’ll show in this presentation) across…
* Explanation of how the data we need gets created and how we collect it
Negative value – for example if the author deposited a pre-print before review, or the post-print after review but still before publication
Best case scenario is if the value is as low as possible, we want zero or negative value so that everybody has early access
There is a drop in 2018 simply because of when we collected the data – we have data until May 2018
Overall deposit time lag for top 5 countries with most publications
Each bar = one week
Publications at zero were deposited within the same week as they were published – we don’t care when they were published here, just when deposited
Significant differences between countries, some have many early deposits, some have many late deposits
We wanted to know how deposit time lag has changed over time, and whether it is improving or not
We wanted to calculate average deposit time lag per year of publication
To do this we have two options, both have some limitations that I’ll explain because they affect how we can interpret the results
First option is to use all data
To explain the limitations I have this mock example – not real data
Explain what bars represent
To calculate average deposit time lag per year of publication
The problem is that because of where we are right now for 2017 publications we don’t know about these late deposits yet
We know about some from 2018, but we don’t know how many will be deposited in the future
This approach still underestimates, because now we’re not including late deposits at all
Especially underestimates for older papers because we know from the data late deposits are becoming less common over time
Because we are not aware of any better method that would alleviate the limitations of both approaches at the same time, we use these approaches in conjunction
Every point in both figures represents average deposit time lag for papers published in a given year
This point tells us papers published in 2013 were deposited on average around 700 days later
Numbers on the right are lower because, if you remember, we have removed very late deposits
We can see that deposit time lag is decreasing for all countries, especially since 2015-16
Different countries have seen different speed of decrease
Looks optimistic, similar pattern in all countries
Are there differences in subjects rather than countries?
Compare publications from two years, explain figure
We can see that in 2013 average deposit time lag was quite large for all subjects except physics and math
All subjects have seen significant decrease
Computer science third best in 2017, might be due to depositing pre-prints becoming common practice in recent years, looks like it’s having an effect
Key message – differences between subjects are substantial but not dramatic
Are the differences elsewhere and can they be driven by policies?
Chose REF 2021 OA policy as a case study because it includes a timeframe for deposit
Introduce the policy
It requires …
Explain example
Any repository deposit time lag – looking at everywhere the paper was deposited, taking the first deposit date and using that to analyse the repository we are looking at
We do it this way because once a publication is in an OA repository, it’s already OA, so doing this we want to see if aggregating data from all repositories helps to get access faster
Compliance with the REF OA policy across all UK publications
Compliance is increasing
Results per repository, each point on each line is one repository, value represents proportion of publications in that repository that are compliant with the REF OA policy
Explain the rest of figure
Key message – significant differences between institutions, institutional policies matter (more than subject)
“This is not a game of medicine vs physics or mathematics, this is a game of institutional policies and practices.”
Some institutions make sure it happens and some don’t. It’s in the hands of institutions
Deposit time lag per country, point out where UK is
REF OA policy introduced in 2014, it looks like in 2014 there was some change in behavior and deposit time lag starts decreasing and it has been continuously decreasing since then