Applications Programming Interface

An applications programming interface (API) is provided for programmatically searching the DfR database and retrieving data.

Searching

Searching the DfR database is accomplished using the Search/Retrieve via URL (SRU) protocol. SRU is a RESTful protocol designed for querying remote indexes: requests are submitted as simple URL strings, and results are returned as XML. The SRU protocol defines three primary functions, or "operations". The first, explain, lets a client auto-discover the type of content and the capabilities of an SRU server. The second, scan, lets a client browse an index, much like perusing the index at the back of a book. The third, searchRetrieve, sends a query to the server and returns a response. SRU is a streamlined redesign of the venerable Z39.50 protocol for the Web. At present the DfR site does not support the scan operation. Additional information about the SRU protocol can be found on the Library of Congress web site.
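Because every SRU operation is just a URL, the explain and searchRetrieve requests can be built with the Python standard library alone. The sketch below does that. The parameter names (operation, version, query, startRecord, maximumRecords) are standard SRU parameters; the assumption that DfR accepts SRU version 1.1 is ours and should be checked against the server's explain response.

```python
from urllib.parse import urlencode

# Base address of the DfR SRU service, as given in this document.
SRU_BASE = "http://dfr.jstor.org/sru"


def explain_url(version="1.1"):
    """URL for the SRU explain operation (server self-description).

    The default version "1.1" is an assumption about what DfR accepts.
    """
    return SRU_BASE + "?" + urlencode({"operation": "explain", "version": version})


def search_url(cql_query, start_record=1, maximum_records=10, version="1.1"):
    """URL for the SRU searchRetrieve operation.

    startRecord is 1-based in SRU; maximumRecords caps the page size.
    urlencode escapes spaces, '=' and quotes inside the CQL query.
    """
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return SRU_BASE + "?" + urlencode(params)
```

For example, `search_url('dc.title = "evolution"')` produces a URL whose query string contains `query=dc.title+%3D+%22evolution%22`, which can be pasted into a browser or fetched from any HTTP client.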

SRU utilizes CQL (Contextual Query Language), a standard syntax for representing queries. A formal definition of CQL can also be found on the Library of Congress web site.
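A CQL query is built from clauses of the form index relation "term", combined with boolean operators such as and, or and not. The minimal sketch below assembles clauses safely by quoting each term and escaping embedded backslashes and double quotes, as CQL requires inside quoted strings. The helper names are hypothetical, and only AND combination is shown.

```python
def cql_term(value):
    """Quote a search term for CQL, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def cql_clause(index, term, relation="="):
    """Build one CQL search clause: index relation "term"."""
    return f"{index} {relation} {cql_term(term)}"


def cql_and(*clauses):
    """Combine clauses with CQL's boolean 'and', parenthesising each one."""
    return " and ".join(f"({c})" for c in clauses)


query = cql_and(
    cql_clause("dc.title", "natural selection"),
    cql_clause("dc.creator", "Darwin"),
)
print(query)
# (dc.title = "natural selection") and (dc.creator = "Darwin")
```

Parenthesising every clause costs nothing and keeps precedence unambiguous if the helper is later extended to mix and with or.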

Searchable fields

  • dc.creator - Article author(s)
  • dc.date - Searches the article's date of publication, format="YYYY-MM-DDT00:00:00Z"
  • dc.description - Searches article abstract, if available
  • dc.identifier - Article UID
  • dc.language - Searches the language field using an ISO 639-2 three-letter language code
  • dc.publisher - Refer to Publisher facet on DfR Explore page for publisher names
  • dc.subject - Searches auto-extracted keywords associated with article
  • dc.title - Article title
  • jstor.articletype - JSTOR article type, recognized values are "research-article", "book-review", "misc", "news", and "editorial"
  • jstor.discipline - Refer to Discipline facet on DfR Explore page for discipline names
  • jstor.journaltitle - Refer to Journal facet on DfR Explore page for journal titles
  • jstor.text - Searches everything
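The dc.date format and the fixed set of jstor.articletype values lend themselves to small helpers that catch mistakes before a query is sent. The sketch below is hypothetical: the function names are ours, and the date-range helper assumes the DfR server accepts the CQL relations >= and <= on dc.date, which this document does not state.

```python
from datetime import date

# Article types listed for jstor.articletype in this document.
ARTICLE_TYPES = {"research-article", "book-review", "misc", "news", "editorial"}


def dfr_date(d):
    """Format a date in the dc.date form YYYY-MM-DDT00:00:00Z.

    isoformat() always zero-pads the year, month and day.
    """
    return f"{d.isoformat()}T00:00:00Z"


def date_range_clause(start, end):
    """CQL restricting dc.date to [start, end].

    Assumes the server supports >= and <= on dc.date (not confirmed here).
    """
    return f'(dc.date >= "{dfr_date(start)}") and (dc.date <= "{dfr_date(end)}")'


def article_type_clause(article_type):
    """CQL clause for jstor.articletype, rejecting unrecognised values."""
    if article_type not in ARTICLE_TYPES:
        raise ValueError(f"unrecognised jstor.articletype: {article_type!r}")
    return f'jstor.articletype = "{article_type}"'
```

For instance, `dfr_date(date(1950, 1, 1))` returns `"1950-01-01T00:00:00Z"`, and `article_type_clause("letter")` raises `ValueError` instead of silently matching nothing.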

Sample queries

The DfR SRU service is found at http://dfr.jstor.org/sru. Pointing a browser at this address produces a web form for defining CQL queries. The form is generated by applying an XSL stylesheet to the results of the explain operation. Submitting the form invokes the SRU searchRetrieve operation with the specified parameters. Query results are returned in the searchRetrieve response, which is formatted for web presentation by another XSL stylesheet.
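Outside the browser, no stylesheet is applied, so a program has to read the searchRetrieve XML itself. The sketch below uses `xml.etree.ElementTree` and assumes two things not stated in this document: the SRU 1.1/1.2 response namespace (http://www.loc.gov/zing/srw/) and Dublin Core elements inside each record's recordData. Confirm both against a live response before relying on them.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: SRU 1.1/1.2 responses and Dublin Core record data.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_search_response(xml_text):
    """Return (total hit count, list of records) from a searchRetrieve response.

    Each record is a dict holding the first dc:identifier and dc:title found
    anywhere inside its recordData (None if absent).
    """
    root = ET.fromstring(xml_text)
    total = int(root.findtext("srw:numberOfRecords", default="0", namespaces=NS))
    records = []
    for rec in root.iterfind("srw:records/srw:record", NS):
        data = rec.find("srw:recordData", NS)
        if data is None:  # skip records without a data payload
            continue
        records.append({
            "id": data.findtext(".//dc:identifier", namespaces=NS),
            "title": data.findtext(".//dc:title", namespaces=NS),
        })
    return total, records
```

The `id` value extracted here is the kind of resource identifier the data retrieval operation expects.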

Interacting with the SRU service through a browser is a good way to experiment with SRU and CQL. However, the real value of an API is realized when it is integrated into a larger application. Because SRU is web based, it is programming-language neutral: it can be accessed from Java, Python, Ruby, Perl, or virtually any other language capable of communicating over HTTP. A Python client is provided as an example.
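The following is not the Python client referred to above but an independent minimal sketch of the same idea: it pages through a result set by following the standard SRU nextRecordPosition element until the server stops returning one. The function names are hypothetical, and SRU version 1.1, UTF-8 responses, and the presence of nextRecordPosition in DfR responses are all assumptions. The HTTP call is passed in as a parameter so the paging logic can be exercised without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SRU_BASE = "http://dfr.jstor.org/sru"
SRW = "{http://www.loc.gov/zing/srw/}"  # assumed SRU 1.1/1.2 namespace


def http_get(url, timeout=30):
    """Fetch a URL and return the body as text (UTF-8 assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def iter_result_pages(cql_query, page_size=25, fetch=http_get):
    """Yield the raw XML of each searchRetrieve page for cql_query.

    Following nextRecordPosition, rather than adding page_size, keeps paging
    correct even if the server returns fewer records than requested.
    """
    start = 1
    while start is not None:
        url = SRU_BASE + "?" + urlencode({
            "operation": "searchRetrieve",
            "version": "1.1",
            "query": cql_query,
            "startRecord": start,
            "maximumRecords": page_size,
        })
        xml_text = fetch(url)
        yield xml_text
        nxt = ET.fromstring(xml_text).findtext(SRW + "nextRecordPosition")
        start = int(nxt) if nxt else None
```

A caller would typically combine this with a response parser, for example `for page in iter_result_pages('dc.creator = "Darwin"'): ...`.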

Data Retrieval

The data returned by an SRU searchRetrieve query contains basic bibliographic data, including a unique document identifier, the document title, author names, publisher, and date of publication. The DfR service also supports a rich set of data representations (such as word counts and n-grams) that would be difficult to incorporate directly into an SRU response. To retrieve these formats, the DfR API provides a data retrieval operation. The request path is http://dfr.jstor.org/resource/<Resource ID>, and it takes a single optional parameter, view, which names the requested resource view. The resource ID is obtained from the SRU query response.

For example, the following request would retrieve bigrams for the article with the id '10.2307/20000002':

    http://dfr.jstor.org/resource/10.2307/20000002?view=bigrams
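Since the resource ID itself contains a slash, it is appended to the path unchanged, exactly as in the request above. A minimal helper for building such URLs might look like this; the function name is hypothetical.

```python
from urllib.parse import urlencode

RESOURCE_BASE = "http://dfr.jstor.org/resource/"


def resource_url(resource_id, view=None):
    """Build a DfR data retrieval URL for a resource ID and optional view.

    The ID is appended to the path as-is (slash included), matching the
    example request in this document.
    """
    url = RESOURCE_BASE + resource_id
    if view is not None:
        url += "?" + urlencode({"view": view})
    return url
```

`resource_url("10.2307/20000002", "bigrams")` reproduces the example request exactly.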

Views

Views currently supported by DfR include:

  • wordcount
  • bigrams
  • trigrams
  • quadgrams
  • keyterms
  • references
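The list of views can double as a guard against typos. The sketch below (function names hypothetical) requests each named view for one resource and returns the raw response bodies unparsed, because this section does not describe the format of each view; decoding responses as UTF-8 is an assumption. As in the paging sketch, the HTTP call is injectable.

```python
import urllib.request

# Views listed in this document as currently supported by DfR.
SUPPORTED_VIEWS = ("wordcount", "bigrams", "trigrams", "quadgrams", "keyterms", "references")


def http_get(url, timeout=30):
    """Fetch a URL and return the body as text (UTF-8 assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def fetch_views(resource_id, views=SUPPORTED_VIEWS, fetch=http_get):
    """Return {view: raw response text} for one resource.

    Unknown view names are rejected before any request is made.
    """
    unknown = [v for v in views if v not in SUPPORTED_VIEWS]
    if unknown:
        raise ValueError(f"unsupported view(s): {', '.join(unknown)}")
    base = "http://dfr.jstor.org/resource/" + resource_id
    return {v: fetch(f"{base}?view={v}") for v in views}
```

For example, `fetch_views("10.2307/20000002", views=("wordcount", "keyterms"))` issues two requests, while a misspelled view such as `"bigram"` fails immediately with `ValueError`.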