Data for Research

Welcome to JSTOR's Data for Research (DfR) service.

The DfR service is provided by JSTOR for use by the research community. It provides a set of web-based tools for selecting and interacting with content from the JSTOR archive. The service also provides the ability to obtain data sets via bulk downloads or using a REST API.

Features provided by the site include:

  • Full-text and fielded searching of the entire JSTOR archive using a powerful faceted search interface. Using this interface one can quickly and easily define content of interest through an iterative process of searching and results filtering.
  • Online viewing of document-level data including word frequencies, citations, key terms, and ngrams.
  • Requesting and downloading datasets containing word frequencies, citations, key terms, or ngrams associated with the selected content.
  • API for content selection and retrieval.

New in this release

Data for Research beta #3 was released on May 27th, 2009. The new features provided in this release include:

  • The addition of references for search and export. The JSTOR archive contains nearly 35 million parsed citations from approximately 1.25 million articles. The DfR Explore tool now supports searching and filtering on reference text, and provides tools for viewing reference patterns over time. The "References Profile" tab on the Explore screen provides graphs depicting the average number and age of references per article by year for all documents in a search result set. The age of a reference is the difference between the publication date of the citing work and the publication date of the referenced work. For instance, if an article was published in 1990 and references a work from 1980, the age of the reference is 10 years. Full-text searching of reference text is performed by selecting the "in References" option in the 'Field' selector in the search box. As an example, searching on the term "new york times" using the "in References" option returns hits for over 58,000 articles that reference the New York Times.
  • The addition of new facets for searching and filtering. The search interface has been enhanced by the addition of facets for 'Publisher', 'Reviewed Work', 'Reviewed Author', and 'Has References'. The 'Reviewed Work' and 'Reviewed Author' facets provide greater visibility into the nearly 1.6 million review articles in the JSTOR archive. Using the 'Reviewed Author' facet, one finds that the archive has over 250 reviews of works authored by William Shakespeare. The combination of the 'Reviewed Author' and 'Reviewed Work' facets shows that JSTOR has 19 reviews of Shakespeare's 'Hamlet'. The 'Has References' facet provides a convenient way of filtering content based on whether or not it has parsed references.
  • More options for sorting search results. In addition to relevance, publication date, and dataset order, search results may now be sorted by reference count, average reference age, and the number of times that an article has been cited by other articles in the JSTOR archive.
  • Downloadable chart data. The DfR tool provides a number of charts that visually summarize the contents of result sets. These include charts that depict the distribution of documents by discipline and year of publication, as well as the new charts provided in this version for parsed references. The data used to build these charts can now be downloaded as Excel-compatible CSV files.
  • Application Programming Interface (API) support. In addition to the bulk download option supported in earlier versions, the DfR site now provides the means to search and download data programmatically using a REST-based API. The API is based on the SRU (Search/Retrieve via URL) and CQL (Contextual Query Language) standards. More information on the API can be found here.
  • Disabling of email notifications. By default, an email notification is generated when a dataset request has been completed. While this can be useful, some users have expressed an interest in being able to disable this feature. The site now provides an option in a user's account preferences to turn this feature off.
  • Performance improvements. In addition to the visual changes to the site, the underlying infrastructure has been largely reimplemented using a new framework, resulting in a noticeable improvement in response times for most operations.
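
As a rough illustration of two items in the list above, the following Python sketch computes the age of a reference as defined there and assembles an SRU searchRetrieve URL carrying a CQL query. It is not the DfR implementation. The endpoint address BASE_URL is a placeholder, since the real API address is not given in this text, and the SRU version and the function names are assumptions chosen for the example. The parameter names (operation, version, query, maximumRecords, startRecord) are the ones defined by the SRU standard.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): the real DfR API address is not stated here.
BASE_URL = "http://example.org/sru"


def reference_age(citing_year, referenced_year):
    """Age of a reference: citing work's publication year minus the referenced work's."""
    return citing_year - referenced_year


def average_reference_age(citing_year, referenced_years):
    """Mean reference age for one article, or None if it has no parsed references."""
    if not referenced_years:
        return None
    ages = [reference_age(citing_year, year) for year in referenced_years]
    return sum(ages) / len(ages)


def build_sru_url(cql_query, max_records=10, start_record=1):
    """Build an SRU searchRetrieve URL for a CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed SRU version
        "query": cql_query,
        "maximumRecords": max_records,
        "startRecord": start_record,
    }
    return BASE_URL + "?" + urlencode(params)


# The example from the text: a 1990 article citing a 1980 work.
print(reference_age(1990, 1980))

# A CQL phrase search; any index or field names the service defines are not shown here.
print(build_sru_url('"new york times"'))
```

The query string is URL-encoded by urlencode, so the quotation marks and spaces in a CQL phrase search are escaped automatically. Restricting a search to reference text through the API would depend on index names defined by the service, which are not covered in this text.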

About JSTOR

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship.

As of May 2009, JSTOR contains nearly 5 million journal articles covering a broad range of disciplines.

Data for Research has been developed by JSTOR's Advanced Technology Research (ATR) group. The ATR group is dedicated to discovering and using relevant technologies in support of JSTOR and the broader scholarly community.

To view other projects built by ATR and the greater scholarly community, please visit us at http://showcase.jstor.org.