Semantic Scholar Open Research Corpus
Semantic Scholar's records for research papers published in all fields, provided as an easy-to-use JSON archive.
If you are interested in one-off, request-based data, please see our RESTful API.
Download full, sampled, and archived versions of the corpus.
Example Paper Record
{
  "id": "4cd223df721b722b1c40689caa52932a41fcc223",
  "title": "Knowledge-rich, computer-assisted composition of Chinese couplets",
  "paperAbstract": "Recent research effort in poem composition has focused on the use of automatic language generation...",
  "entities": [],
  "fieldsOfStudy": ["Computer Science"],
  "s2Url": "",
  "pdfUrls": [""],
  "s2PdfUrl": "",
  "authors": [
    { "name": "John Lee", "ids": ["3362353"] },
    "..."
  ],
  "inCitations": ["c789e333fdbb963883a0b5c96c648bf36b8cd242"],
  "outCitations": [
    "abe213ed63c426a089bdf4329597137751dbb3a0",
    "..."
  ],
  "year": 2016,
  "venue": "DSH",
  "journalName": "DSH",
  "journalVolume": "31",
  "journalPages": "152-163",
  "sources": ["DBLP"],
  "doi": "10.1093/llc/fqu052",
  "doiUrl": "",
  "pmid": "",
  "magId": "2050850752"
}
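As a rough illustration of reading records like the one above, the following Python sketch parses records and prints a one-line summary. It assumes the archive stores one JSON object per line; the page only says "JSON archive", so that layout is an assumption. The helper names iter_records and summarize are hypothetical and not part of any Semantic Scholar tooling.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one paper record per non-empty line (assumed one-object-per-line layout)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(record: dict) -> str:
    """Format authors, year, title, and venue as a one-line summary."""
    # Skip non-dict entries such as the "..." elision in the example record.
    names = [a["name"] for a in record.get("authors", []) if isinstance(a, dict)]
    authors = ", ".join(names) or "Unknown author"
    year = record.get("year") or "n.d."
    return f"{authors} ({year}). {record.get('title', '')}. {record.get('venue', '')}"


# Trimmed copy of the example record, serialized as a single line.
sample_line = json.dumps({
    "id": "4cd223df721b722b1c40689caa52932a41fcc223",
    "title": "Knowledge-rich, computer-assisted composition of Chinese couplets",
    "authors": [{"name": "John Lee", "ids": ["3362353"]}, "..."],
    "year": 2016,
    "venue": "DSH",
})

records = list(iter_records([sample_line, "\n"]))
print(summarize(records[0]))
```

For a downloaded file, the same iter_records function could be given an open text-mode file handle instead of a list. If the file is gzip-compressed (also an assumption), gzip.open(path, "rt", encoding="utf-8") from the standard library returns such a handle.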
Attribute Definitions
id  string
S2 generated research paper ID.
title  string
Research paper title.
paperAbstract  string
Extracted abstract of the paper.
entities  list
Extracted entities (deprecated on 2019-09-17).
s2Url  string
URL to S2 research paper details page.
pdfUrls  list
URLs related to this PDF scraped from the web.
s2PdfUrl  string
Usable PDF URL (deprecated on 2020-05-27).
authors  list
List of authors with an S2 generated author ID and name.
inCitations  list
List of S2 paper IDs which cited this paper.
outCitations  list
List of S2 paper IDs which this paper cited.
fieldsOfStudy  list
Zero or more fields of study this paper addresses.
year  int
Year this paper was published, as an integer.
venue  string
Extracted publication venue for this paper.
journalName  string
Name of the journal that published this paper.
journalVolume  string
The volume of the journal where this paper was published.
journalPages  string
The pages of the journal where this paper was published.
sources  list
Identifies papers sourced from DBLP or Medline.
doi  string
Digital Object Identifier registered at
doiUrl  string
DOI link for registered objects.
pmid  string
Unique identifier used by PubMed.
magId  string
Unique identifier used by Microsoft Academic Graph.
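The attributes above can be combined in simple ways. The sketch below, in Python, derives directed citation edges from inCitations and outCitations and filters papers by fieldsOfStudy and year. The function names citation_edges and filter_papers and the short paper IDs p1 to p3 are hypothetical, chosen only for illustration. The code also assumes that any attribute may be missing or null in a given record, which the page does not state.

```python
from typing import Iterable, List, Set, Tuple


def citation_edges(records: Iterable[dict]) -> Set[Tuple[str, str]]:
    """Return (citing_id, cited_id) pairs found in the given records."""
    edges: Set[Tuple[str, str]] = set()
    for record in records:
        paper_id = record["id"]
        # outCitations: papers that this paper cited.
        for cited in record.get("outCitations") or []:
            edges.add((paper_id, cited))
        # inCitations: papers that cited this paper.
        for citing in record.get("inCitations") or []:
            edges.add((citing, paper_id))
    return edges


def filter_papers(records: Iterable[dict], field: str, min_year: int) -> List[dict]:
    """Keep papers tagged with `field` and published in or after `min_year`."""
    kept = []
    for record in records:
        year = record.get("year")
        in_field = field in (record.get("fieldsOfStudy") or [])
        if in_field and isinstance(year, int) and year >= min_year:
            kept.append(record)
    return kept


# Hypothetical toy records using only attributes defined above.
papers = [
    {"id": "p1", "fieldsOfStudy": ["Computer Science"], "year": 2016,
     "inCitations": ["p2"], "outCitations": ["p3"]},
    {"id": "p2", "fieldsOfStudy": ["Medicine"], "year": 2018,
     "inCitations": [], "outCitations": ["p1"]},
    {"id": "p3", "fieldsOfStudy": [], "year": None,
     "inCitations": ["p1"], "outCitations": []},
]

edges = citation_edges(papers)
recent_cs = filter_papers(papers, "Computer Science", 2015)
print(sorted(edges))
print([p["id"] for p in recent_cs])
```

Storing edges in a set means a citation recorded on both sides (as an outCitation of the citing paper and an inCitation of the cited paper) is counted once.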

Semantic Scholar Open Research Corpus is licensed under ODC-BY.
When using the Semantic Scholar Open Research Corpus (“S2 ORC”) in a product or service, or including data in a redistribution, please cite the following paper:
Waleed Ammar et al. 2018. Construction of the Literature Graph in Semantic Scholar. NAACL.
BibTeX format:
@inproceedings{ammar:18,
  title={Construction of the Literature Graph in Semantic Scholar},
  author={Waleed Ammar and Dirk Groeneveld and Chandra Bhagavatula and Iz Beltagy and Miles Crawford and Doug Downey
    and Jason Dunkelberger and Ahmed Elgohary and Sergey Feldman and Vu Ha and Rodney Kinney
    and Sebastian Kohlmeier and Kyle Lo and Tyler Murray and Hsu-Han Ooi and Matthew Peters and Joanna Power
    and Sam Skjonsberg and Lucy Lu Wang and Chris Wilhelm and Zheng Yuan and Madeleine van Zuylen and Oren Etzioni},
  booktitle={NAACL},
  year={2018},
  url={}
}
Provided by Semantic Scholar • Built by AI2