On Seams, Seamlessness, and Methodology
February 25, 2016
Roger C. Schonfeld
Digital scholarship and data management
Discovery and access
Research practices
Data mining
Text mining
Earlier this month, I encountered a thought-provoking talk by Tim Sherratt making the very strong argument that seamlessness should not be our only goal in designing digital library systems. The talk is a year old but it is well worth reading today. I thank Donna Lanclos for tweeting about it recently.
I have argued strongly that we need to reduce the barriers to the use of e-resources for the academic community (I’ve shared some of my thinking in both an issue brief and a presentation to a major group of publishers and platform providers). Library, publisher, and vendor systems introduce far too many stumbling blocks for no good reason, impeding research and driving users away from scholarly sources. Discovery and access have improved tremendously over the past two decades, but they remain fundamentally broken today.
At the same time, there are places where efforts to create a simple user experience will lead to a misleading outcome or a methodological flaw. Sherratt’s piece offers a lovely illustration of this, showing the dramatic apparent spike in newspapers published in Victoria, Australia during World War I, as recorded in Trove. As he points out, the spike is not real but simply an artifact of libraries that “have chosen to invest in the digitisation of newspapers from the World War I period.”
Or, to take another example: in computational analysis of large digital corpora, when is it possible to describe the dataset, both its contents and the methods of its creation, sufficiently to assess any skew in the findings? The “culturomics” effort, for instance, described its underlying Google Books source, but is it reasonable to draw conclusions about the English language globally based on 4% of the books published in that language drawn from just 40 university libraries? Is there skew or is the corpus representative? How would we even know? Google’s ngram viewer provides a sophisticated set of search tools but not information on the “seams” of the dataset, such as how it was assembled and what bias may have been introduced as a result.
In reflecting on where seamlessness and simplicity should be foregrounded and where an acknowledgment of complexity and methodology should be encouraged, I would propose a set of principles for discussion.
Seamlessness is desirable in discovery and access for reading and research purposes. Of course, discovery services that are made available through a library should offer a transparent indication of their sources and contents, so that librarians can understand these tools and advise their user communities—a principle that NISO’s Open Discovery Initiative adopted as recommended practice 3.3.1. While researchers may not typically need access to this information themselves, it can be made available should the issue arise. And even in the absence of such full disclosure, a discovery corpus such as Google Books can provide a major advance in discovery for historians and other researchers. Artificial stumbling blocks should be systematically removed to meet researchers’ desire to discover and access content as readily as possible.
Increasingly we build and use corpora not only for discovery and access, but also for text mining, data mining, and other forms of computational analysis. This includes not only Google’s ngram viewer, but also the HathiTrust Research Center, JSTOR’s Data for Research, and Elsevier’s API, among others. When a digital corpus is made available for use as a primary source, it should provide as much information as possible to help researchers assess its suitability for the particular research project to be undertaken. This builds upon the need to describe what is missing from a collection before it is contributed to an archive (see 3.1.5) and is appropriate given that archival collections are almost by definition primary sources.
Seams like these take time and cost money for archivists and digital collection managers, but they are essential to the work of scholars. Seamlessness in discovery and access also takes time and adds costs for libraries, publishers, and vendors, but it saves the time and improves the experience of scholars. Resources are not unlimited and tradeoffs are frequently made. Still, both seams and seamlessness can be worthy and valuable.
Roger C. Schonfeld
Director, Libraries and Scholarly Communication Program

Libraries & Scholarly Communication
Issue Brief
Meeting Researchers Where They Start
Streamlining Access to Scholarly Resources
March 26, 2015
Roger C. Schonfeld
Ithaka S+R
©2004-2016 ITHAKA
Ithaka S+R is a not-for-profit service that helps the academic community navigate economic and technological change. We deliver strategic guidance, research, and publications through two program areas: Educational Transformation and Libraries & Scholarly Communication.
2 Rector Street, 18th Floor New York, NY 10006
212.500.2355 info@sr.ithaka.org
Ithaka S+R is part of ITHAKA.