Semantic Scholar
Open Research Corpus
Download Instructions
The papers are provided as JSON objects, one per line. Archives are partitioned in batches and shared as a collection of gzipped files.
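Since each archive is a gzipped file holding one JSON object per line, it can be read as a stream without unpacking it to disk first. The following is a minimal sketch using only the Python standard library; the function name `iter_papers` is hypothetical, and UTF-8 encoding is an assumption, not something the release notes state.

```python
import gzip
import json
from typing import Any, Dict, Iterator


def iter_papers(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one paper (as a dict) per non-empty line of a gzipped JSON-lines file."""
    # "rt" opens the gzip stream in text mode; UTF-8 is assumed here.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines defensively
                yield json.loads(line)
```

Iterating with `for paper in iter_papers(path): ...` keeps only one record in memory at a time, which matters for large batches. Inspecting `paper.keys()` on the first record is a simple way to see which fields a given release provides.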
Note: Entities have been deprecated. The most recent release that includes them is 2019-01-31.
A license agreement is provided. By downloading this data you acknowledge that you have read and agreed to all the terms in this license.
To download the partitioned archives, you have two choices:
The preferred method for download is to use the AWS CLI to download directly from S3:
aws s3 cp --no-sign-request --recursive s3://ai2-s2-research-public/open-corpus/2021-09-01/ destinationPath
Alternatively, download the manifest over HTTP and use it to fetch all archive files over HTTP as well. For example, using wget:
wget https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/2021-09-01/manifest.txt
wget -B https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/2021-09-01/ -i manifest.txt
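Where wget is not available, the manifest-based download can be approximated in Python. This is a sketch under stated assumptions, not an official client: it assumes the manifest lists one relative file name per line (which is what resolving entries against a base URL with `wget -B` implies), and the names `manifest_urls` and `download_release` are hypothetical.

```python
import urllib.request
from pathlib import Path
from typing import List
from urllib.parse import urljoin

# Base URL of the 2021-09-01 release, taken from the wget example above.
BASE_URL = "https://s3-us-west-2.amazonaws.com/ai2-s2-research-public/open-corpus/2021-09-01/"


def manifest_urls(manifest_text: str, base_url: str = BASE_URL) -> List[str]:
    """Resolve each non-empty manifest line against the release base URL."""
    names = [line.strip() for line in manifest_text.splitlines() if line.strip()]
    return [urljoin(base_url, name) for name in names]


def download_release(dest_dir: str, base_url: str = BASE_URL) -> List[Path]:
    """Fetch manifest.txt, then download every file it lists into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(urljoin(base_url, "manifest.txt")) as response:
        manifest_text = response.read().decode("utf-8")
    saved = []
    for url in manifest_urls(manifest_text, base_url):
        # Name the local file after the last path segment of the URL.
        target = dest / url.rsplit("/", 1)[-1]
        urllib.request.urlretrieve(url, str(target))
        saved.append(target)
    return saved
```

Calling `download_release("destinationPath")` would download the whole release sequentially. The AWS CLI remains the preferred route, so this is best treated as a fallback.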
Most Recent Release
2021-09-01: Sample | S3 URL | Manifest

Previous Releases
2021-08-01: Sample | S3 URL | Manifest
2021-07-08: Sample | S3 URL | Manifest
2021-06-01: Sample | S3 URL | Manifest
2021-04-01: Sample | S3 URL | Manifest
2021-03-01: Sample | S3 URL | Manifest
2021-02-01: Sample | S3 URL | Manifest
2021-01-01: Sample | S3 URL | Manifest
2020-12-01: Sample | S3 URL | Manifest
2020-11-06: Sample | S3 URL | Manifest
2020-05-27: Sample | S3 URL | Manifest
2020-04-10: Sample | S3 URL | Manifest
2020-03-01: Sample | S3 URL | Manifest
2020-02-01: Sample | S3 URL | Manifest
2020-01-13: Sample | S3 URL | Manifest
2020-01-01: Sample | S3 URL | Manifest
Provided by Semantic Scholar • Built by AI2