An application programming interface (API) is provided for programmatically searching the DfR database and retrieving data.
The DfR database is searched using the Search/Retrieve via URL (SRU) protocol, a streamlined implementation of the venerable Z39.50 protocol redesigned for the Web. SRU is a RESTful protocol designed to query remote indexes: requests are submitted as simple URL strings and results are returned in XML. The protocol defines three primary functions, or "operations". The first, explain, lets a client auto-discover the type of content and the capabilities of an SRU server. The second, scan, provides a mechanism for browsing an index, much like perusing the index at the back of a book. The third, searchRetrieve, sends a query to the server and returns a response. At present the DfR site does not support the scan operation. Additional information about the SRU protocol can be found on the Library of Congress web site.
SRU utilizes CQL (Contextual Query Language), a standard syntax for representing queries. A formal definition of CQL can also be found on the Library of Congress web site.
The DfR SRU service is found at http://dfr.jstor.org/sru. Pointing a browser at this address produces a web form for defining CQL queries. This form is generated by applying an XSL stylesheet to the results of the explain operation. Submitting the form invokes the SRU searchRetrieve operation with the specified parameters. Query results are returned in the searchRetrieve response, which is formatted for web presentation using another XSL stylesheet.
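The request that the web form submits can also be assembled by hand. The following is a minimal sketch, not the DfR project's own client: it builds a searchRetrieve URL from the parameter names defined by the SRU standard (operation, version, query, maximumRecords). The function name build_search_url, the protocol version "1.1", and the bare-term query are assumptions made for illustration; the explain response is the authoritative source for what the server accepts.

```python
from urllib.parse import urlencode

# Service address as given in the text above.
SRU_BASE = "http://dfr.jstor.org/sru"


def build_search_url(cql_query, max_records=10, version="1.1", base=SRU_BASE):
    """Return a searchRetrieve request URL for a CQL query.

    The version default is an assumption; check the explain response
    for the version the server actually speaks.
    """
    params = {
        "operation": "searchRetrieve",   # SRU operation name
        "version": version,              # SRU protocol version (assumed)
        "query": cql_query,              # CQL query string, URL-encoded below
        "maximumRecords": max_records,   # cap on records returned per request
    }
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # A bare term is a valid CQL query; indexed searches depend on the
    # index names the server advertises.
    print(build_search_url("darwin"))
```

The resulting string can be pasted into a browser or passed to any HTTP library.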
Interacting with the SRU service via a browser is a good way to experiment with SRU and CQL. However, the real value of an API is realized when it is integrated into a larger application. Because the SRU protocol is web based, it is programming-language neutral: it can be accessed from Java, Python, Ruby, Perl, or virtually any other language capable of communicating over HTTP. A Python client is provided as an example.
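To give a sense of what such integration involves, here is a small sketch, separate from the client mentioned above, that reads the envelope of a searchRetrieve response using only the Python standard library. It assumes the response uses the standard SRU 1.1/1.2 namespace (http://www.loc.gov/zing/srw/); the sample XML is a hand-written stand-in, not real DfR output, and the function name summarize_response is hypothetical. The layout of the bibliographic data inside each record is not assumed here.

```python
import xml.etree.ElementTree as ET

# Namespace used by SRU 1.1/1.2 responses (assumed to apply to DfR).
SRW = "{http://www.loc.gov/zing/srw/}"

# Hand-written stand-in for a searchRetrieve response; not real DfR output.
SAMPLE_RESPONSE = """\
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>42</srw:numberOfRecords>
  <srw:records>
    <srw:record><srw:recordPosition>1</srw:recordPosition></srw:record>
    <srw:record><srw:recordPosition>2</srw:recordPosition></srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
"""


def summarize_response(xml_text):
    """Return (total hit count, positions of the records in this response)."""
    root = ET.fromstring(xml_text)
    # numberOfRecords is the total number of matches, not the page size.
    total = int(root.findtext(SRW + "numberOfRecords", default="0"))
    positions = [
        int(record.findtext(SRW + "recordPosition"))
        for record in root.iter(SRW + "record")
    ]
    return total, positions


if __name__ == "__main__":
    total, positions = summarize_response(SAMPLE_RESPONSE)
    print(total, positions)
```

In a real application the XML text would come from an HTTP request to the SRU service rather than from a constant.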
The data returned by an SRU searchRetrieve query contains basic bibliographic data, including a unique document identifier, the document title, author names, publisher, and date of publication. The DfR service also supports a rich set of data representations (such as word counts and ngrams) that would be difficult to incorporate directly into an SRU response. To retrieve these representations, the DfR API provides a separate data retrieval operation. The request path is http://dfr.jstor.org/resource/<Resource ID>, and it takes a single optional parameter (view), the name of the resource view requested. The resource id is obtained from the SRU query response.
For example, the following request would retrieve bigrams for the article with the id '10.2307/20000002':
http://dfr.jstor.org/resource/10.2307/20000002?view=bigrams
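The same request can be built in code. This is a minimal sketch under the assumption that the resource id is placed in the path with its slash left unescaped, as in the example request; the helper name build_resource_url is hypothetical.

```python
from urllib.parse import quote, urlencode

# Request path prefix as given in the text.
RESOURCE_BASE = "http://dfr.jstor.org/resource/"


def build_resource_url(resource_id, view=None):
    """Return the data retrieval URL for a resource id and optional view."""
    # Keep "/" unescaped so a DOI-style id matches the path form shown above.
    url = RESOURCE_BASE + quote(resource_id, safe="/")
    if view is not None:
        # view is the single optional parameter described in the text.
        url += "?" + urlencode({"view": view})
    return url


if __name__ == "__main__":
    print(build_resource_url("10.2307/20000002", view="bigrams"))
```

Omitting view yields the bare resource path, which corresponds to calling the operation without its optional parameter.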
Views currently supported by DfR include: