APIv3: Announcing a new API to access CORE data

Since the start (10 years ago!) CORE’s mission has been to aggregate and facilitate access to open access scientific research at an unprecedented scale to both humans and machines. To achieve this aim, we are always refining and improving our methods for access and use of the CORE data. 

A key consideration in making improvements is that CORE users hail from many different backgrounds and are applying the CORE tools in a variety of use-cases. At the last count we had over 40 broad industry types (including academic research, education, publishing, software and technology companies) applying the CORE tools to their work across the world. Applications of CORE tools and data are growing and constantly changing. 

To meet the needs of our diverse user-base, CORE offers four levels of access to enable a wide variety of use cases:

  • CORE Dataset: A periodically generated snapshot of the CORE data. 
  • CORE FastSync: A continuously updated data feed.
  • CORE Search: A search interface allowing users to search and browse through the content.  
  • CORE API: An API for machine-readable access. 

We are now delighted to bring our users further developments to the CORE services and announce the new CORE APIv3. The full details of the new API can be found in the documentation, and you can register for an API key online.

Below is a quick overview of the new features in APIv3:

A new way to look at CORE’s data

CORE data mirrors the content that the various data providers expose, including the different ways these providers interpret the multiple standards. CORE then works on making the data homogeneous and searchable. It is also common practice for researchers from multiple affiliations to submit their papers to their own institutional and disciplinary repositories. As a result, CORE often mirrors two (or more) records representing the same paper.

In the new API we decided to take a step forward in the way that CORE exposes data, and so we are introducing Work. Work is an entity that represents all the knowledge that CORE holds about a particular piece of research. A Work will include, and link to, one or more outputs provided by the different data providers.

Work not only offers the possibility of deduplicating research objects, it also allows us to use the best possible knowledge about a single piece of research. In short, a Work is the total representation of a research work in CORE. Work contains links to the “Outputs” from the data providers and enrichments collected using additional sources already included in the CORE pipeline. 

We want Work to represent a research article and be a gateway to all other possible references and enrichments for the same item available online.

We further redesigned the way we offer CORE data. Thus, in our new API, you will be able to access the following entities:

  • Journals – all the journals available in CORE – the primary data sources for this data are DOAJ and Crossref.
  • Data providers – all the sources from which CORE aggregates data. Most of the data providers have an OAI-PMH endpoint and are either repositories (institutional or disciplinary) or journals. The list further includes customized sources, such as arXiv, Crossref, PubMed, and special collections for Open Access resources from publishers. The list is constantly growing as we discover more and more sources.
  • Outputs – corresponds to the records exposed by the data providers. The documents are then processed and integrated with full text (if available), while additional information from the full text enriches the metadata records exposed by the data providers.
  • Work – a deduplicated entity that represents one or more outputs. It is the most complete version of a research work (based on CORE knowledge). 

All entities are searchable and accessible from all the services of the new CORE API. 
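
To make the relationship between Outputs and a Work more concrete, here is a small Python sketch. It is only an illustration: the class names, field names and the choice to group outputs by DOI are assumptions made for this example, and they do not describe how CORE actually deduplicates records or what the API returns.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Output:
    """One record as exposed by a single data provider (hypothetical fields)."""
    output_id: int
    data_provider: str
    title: str
    doi: str


@dataclass
class Work:
    """A deduplicated entity that links to one or more outputs."""
    doi: str
    title: str
    outputs: List[Output] = field(default_factory=list)


def group_into_works(outputs: List[Output]) -> List[Work]:
    """Group outputs that share a DOI (case-insensitive) into one Work.

    Grouping by DOI is an assumption for illustration only.
    """
    works: Dict[str, Work] = {}
    for out in outputs:
        key = out.doi.lower()
        if key not in works:
            works[key] = Work(doi=key, title=out.title)
        works[key].outputs.append(out)
    return list(works.values())


# Two repositories hold the same paper; a third record is a different paper.
records = [
    Output(1, "Institutional repository A", "A study of X", "10.1234/ABC"),
    Output(2, "Disciplinary repository B", "A study of X", "10.1234/abc"),
    Output(3, "Journal C", "Another paper", "10.5678/xyz"),
]
works = group_into_works(records)
for work in works:
    print(work.doi, [o.data_provider for o in work.outputs])
```

In this sketch the two records for the same paper end up under a single Work, while each Output keeps its own data provider.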

A fully searchable dataset

It’s easy to get lost when navigating through 200 million records with differing data quality. During the APIv2 years, we listened to thousands of users and helped them shape their queries in the best way. We have now built the feedback we received into the new query language for searching the API. The following is an example of how a query could look:

(title: coronavirus OR covid) AND year >= 2020 AND _exists_:fullText

The example will search for CORE papers that:

  • include “coronavirus” in the title
  • or include “covid” anywhere in the text
  • were published in 2020 or later
  • have extracted full text

The query above will search for “covid” anywhere in the article, while searching for “coronavirus” only within the title. All the details about creating search queries are available in the documentation section.
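
As a rough illustration, a query like the one above can also be assembled in code. The Python sketch below uses a hypothetical helper, build_query, whose name and parameters are invented for this example; only the query syntax itself is taken from the example above. Passing the query as a URL parameter named q is likewise an assumption, so check the documentation for the exact request format.

```python
from urllib.parse import urlencode


def build_query(title_term: str, anywhere_term: str, min_year: int,
                require_full_text: bool = True) -> str:
    """Compose a search query in the style of the example above.

    Hypothetical helper: the field names (title, year, fullText) are the
    ones used in the example query.
    """
    clauses = [
        f"(title: {title_term} OR {anywhere_term})",  # title match OR match anywhere
        f"year >= {min_year}",                        # published in min_year or later
    ]
    if require_full_text:
        clauses.append("_exists_:fullText")           # only records with full text
    return " AND ".join(clauses)


query = build_query("coronavirus", "covid", 2020)
print(query)

# URL-encode the query so it can be sent as a request parameter.
# The parameter name "q" is an assumption made for this sketch.
encoded = urlencode({"q": query})
print(encoded)
```

Building the query from parts keeps each condition readable and makes it easy to drop the full-text requirement when it is not needed.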

Easier access to larger datasets

At CORE we like to do things at scale, and we are privileged to be able to implement large-scale solutions across millions of widely varied articles. CORE’s existing services support independent access to the dataset; the CORE Dataset gives users access to the complete CORE dataset from a certain point in time. Additionally, with CORE FastSync users can keep up to date with the latest records from CORE.

For smaller datasets, the best access route is to use our paginated queries and download the results by stepping through the limit and offset parameters in the search query. However, where the full CORE Dataset is too big and the paginated queries are too small, a middle ground is required.
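
The limit and offset pattern can be sketched in Python as follows. This is a minimal example under stated assumptions: fetch_page is a hypothetical callable that returns one page of results as a list, and the in-memory fake used here stands in for a real request. The actual endpoint and response format are described in the API documentation and are not reproduced here.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, int]


def paginate(fetch_page: Callable[[int, int], List[Record]],
             limit: int = 100,
             max_records: int = 1000) -> Iterator[Record]:
    """Yield records page by page by advancing the offset by `limit`."""
    offset = 0
    yielded = 0
    while yielded < max_records:
        page = fetch_page(offset, limit)
        if not page:
            break  # no more results
        for record in page:
            yield record
            yielded += 1
            if yielded >= max_records:
                return  # stop once the requested cap is reached
        if len(page) < limit:
            break  # a short page means this was the last one
        offset += limit


# Stand-in for a real API call: slices an in-memory list of 25 records.
FAKE_RESULTS: List[Record] = [{"id": i} for i in range(25)]


def fake_fetch_page(offset: int, limit: int) -> List[Record]:
    return FAKE_RESULTS[offset:offset + limit]


results = list(paginate(fake_fetch_page, limit=10))
print(len(results))
```

Capping the total with max_records keeps a paginated download small, which is the case this access route is intended for.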

We are happy to introduce a new method supported by our APIs, which allows users to run larger queries. These types of queries have a bigger impact on CORE’s systems, so we are keeping an eye on any impact and will limit the number of queries of this type that can be performed in a short timeframe. 

For the technical specifications and the instructions on how to perform these queries, please head over to the related page.

A more accessible user experience

We reviewed our management of API keys and have created four ‘profiles’, so that users receive different levels of service and support depending on their requirements. This also helps us to be more in touch with our users and to understand their use cases and how we can better facilitate their usage of CORE. The four new profiles are:

  • Individual – free use for the general public. Any query can be performed, although to keep our servers snappy, we limit the number of requests that can be performed in a short amount of time (especially if they are particularly large). 
  • Researcher/Community – for users who have an exciting research project that will help the community. You can perform more queries than with an Individual profile and you will have access to our newly created CORE Researcher Community for extra support and collaboration opportunities (when you use CORE tools in your research, remember to cite us!).
  • Institutional – for users who are working on behalf of an academic institution and are developing new tools and implementations using CORE data. We give you the full support of our research and development teams and provide a tailored rate limit based on your needs.
  • Enterprise – for organisations looking to purchase (or that have already purchased) a license to develop a product, service or software using CORE data. We give you a fully supported product with unlimited queries and a direct line for technical support, and our developers will help you shape your use of the CORE data in the best way. Talk to our partnership team now to learn more.

APIv3: inclusive and open for all

Using an API involves a fair amount of background knowledge: you need to know a bit of programming and understand some basic concepts of interaction. For this reason, alongside refreshing the technical specifications, we also decided to take a step forward and create a series of tools and services to help you navigate our API and get started as quickly as possible.

The main tools that we are introducing to support this are:

  • A new Researcher Community – we have created a new Researcher Community for CORE users where people can collaborate, discuss and share their ideas and research. Community members will have access to tips, can share knowledge with other researchers and will get first-hand access to more exciting ways of applying the CORE data. If you’re interested in joining the Researcher Community, we want to hear from you.
  • Updated documentation – we have upgraded our API documentation to make it more accessible and easy to understand.
  • An API gallery – this will showcase all the exciting uses for the API and share some reproducible code to get you started. We’re looking forward to your ideas and making our list grow. For example, one can see how to collect identifiers for research outputs in a handy CSV, or discover how to perform some bibliometric analysis of the coronavirus pandemic publications.
  • A feedback form – so we can hear your opinions, bug reports, ideas and suggestions on how to move the CORE API and CORE forward.

The new CORE API is ready for you to use – happy searching! 

The CORE Team
