The Internet and Unrefereed Scholarly Publishing
For: Annual Review of Information Science and Technology (ARIST), volume 38
Blaise Cronin and Debora Shaw (Eds.)
Draft V 7.1R (12,700 words)
February 24, 2003
In the early 1990s, much of the enthusiasm for the use of electronic media to enhance scholarly communication focused on electronic journals, especially electronic-only (pure) e-journals1 (see, for example, Peek and Newby's (1996) anthology). Much of the systematic research about the use of electronic media to enhance scholarly communication also focused on electronic journals. However, by the late 1990s, numerous scientific publishers had transformed their paper journals (p-journals) into paper and electronic journals (p-e journals) and sold them via subscription models that did not provide the significant cost savings, speed of access, and breadth of audience that pure e-journal advocates had expected (Okerson, 1996).
In 2001, some senior life scientists led a campaign to have publishers make online access to their journals freely available after six months (Russo, 2001). The campaign leaders, using the name "Public Library of Science," asked scientists to boycott journals that did not comply with their demands for open access to online articles after six months. While the proposal was discussed in scientific magazines and at conferences, it did not seem to influence any journal publishers to comply (Young, 2002). Most productive scientists, who work for major universities and research institutes that have adequate to excellent scientific journal collections, would have little incentive to boycott top journals such as Science, the Journal of Biological Chemistry, Proceedings of the National Academy of Sciences, or the New England Journal of Medicine.
Some of the major improvements in the speed and openness of scholarly communication via the Internet are most likely to come from outside of the peer-reviewed journal system. In this chapter, the term “unrefereed manuscript” refers to a manuscript that has not yet been accepted for publication through peer review2. The “unrefereed manuscript” may not yet have been submitted to a peer-reviewed venue, may be under review at a peer-reviewed venue, or may have been rejected by one peer-reviewed venue and not yet accepted by another.
Some enthusiasts for using electronic media such as Internet forums to enhance scholarly communication emphasized the value of scholars' exchanging research manuscripts prior to their being accepted for publication in peer-reviewed venues, such as journals or conferences (Harnad, 1999; Halpern, 2000). By the late 1990s, the discussions among e-publishing enthusiasts had shifted from a primary focus on e-journals to also include repositories of research e-scripts.
The literatures of scholarly electronic communication rest on some key terms that various authors use with subtle but important differences in their meanings. These terms include:
- Publication: can range from a one-day posting on a Web site to appearing in print in a large-circulation, prestigious, peer-reviewed scientific journal
- Preprint: can range from any article that a scholar circulates for comment, to an article that has been submitted to a journal, accepted for publication, and that has not yet been formally published
- E-print: an electronic version of a manuscript, used as an equivalent to an electronic preprint.
Unfortunately, these differing conceptions of publication and preprints across the literatures seem to sow considerable confusion and ambiguity about the questions raised, issues addressed, claims made, and answers provided. This chapter will examine two major ways of organizing e-script collections, and some of the research about e-script publishing practices.
Conceptions of Scholarly Publishing and Scholarly Communication
Scholarly publishing and scholarly communication are often used interchangeably, as their meanings are similar, but they have some distinguishing differences. Formal publication is often based on the assumption that an article will be read, but it is possible that the article will not attract attention and that the communication process will cease. Formal journal policies that prohibit submissions of articles that have been previously published assume that an author’s entire intended audience has read them. In practice, many scholarly articles are read by only a small fraction of their potential audiences, and publishing may be primarily a one-way process.
Scholarly publishing is one formal part of scholarly communication, and serves as a basis for scholarly evaluation. Scholars and academic programs are often reviewed, in part, based on the quality and quantity of their research published in journals; the quality of journals is often assessed by their "impact factors," measured by citation analyses.
Scholarly communication can be described informally as a two-way process consisting of its communicators and its content. Communication involves ‘receivers’ and ‘senders’. Communicators can take on roles such as authors and readers or speakers and listeners. Content may vary from pure scholarly content (research, teaching) to supporting activities like conference organizing, journal editing, etc., although the content must be related to academic activities. Authors, readers, editors, publishers, academic associations, and librarians are all participants in the process.
When scholarly communication is discussed, the scholarly community is often mistakenly treated as a homogeneous unit, without consideration of the differences in practices among different fields. These disciplinary differences are, however, readily visible in the traditional model of scholarly communication and are reported and emphasized in some of the research that is reviewed here.
Kling and McKim (1999) developed an analytical publishing framework which is based on the idea that publication is a multidimensional continuum. They observe that when a scholarly document is effectively published within a scholarly community, it seems to satisfy three criteria: publicity, trustworthiness, and accessibility. They described their three criteria as follows:
Publicity – the document has to be announced to scholars so that they may learn about its existence. Publicity can be represented by a continuum of activities like subscriptions, reports lists, abstract databases, and citation.
Trustworthiness – the document has been subject to a social process that assures readers that the content of the document satisfies the norms of quality accepted by the community. Trustworthiness is typically marked by the peer review process, the social status of the journal, and the quality of the publishing house, but less formally may also be based on the author's reputation and institutional affiliation.
Accessibility – readers must be able to access the document in a stable manner over time. Libraries, publishers and clearinghouses typically assure accessibility by distributing and storing the documents.
This framework analyzes the publishing process from a social perspective, and emphasizes its communicative role. Kling and McKim developed their framework to help answer questions about whether any article that is posted on an Internet site should be considered to have been published. They analyzed different types of postings, such as articles that are posted on the author’s personal Web site and articles that are posted in the technical report series of well-known academic departments, and showed how these differ in their publicity, trustworthiness, and accessibility, both near the time of posting and five years after their original posting. They also examined a number of paper publishing practices. They showed that, from a behavioral perspective, publishing is a continuum rather than binary (yes/no) and that the relationship between electronic publishing and paper publishing is relatively complex. The Kling and McKim publishing framework will be used throughout this chapter.
Kling and McKim's conceptualization of publishing as a continuum influenced a recent proposal to define the ends of the continuum. In 1999-2000 an International Working Group (Report: Excerpts from 'Defining and certifying electronic publication in science.' 2000; Frankel, Elliott, Blume, Bourgois, Hugenholz, Lindquist et al., 2000) was invited by the International Association of STM Publishers3 to clarify some of the confusion about nomenclature that is confounding the discussions of electronic publishing. This International Working Group proposed a distinction between the "first publication" of a work and a (possibly subsequent) "definitive publication." They write:
The crucial fixed point, in our view, remains the final published version of an article after peer review (or any future equivalent). We have called this the Definitive Publication and believe that it should be clearly identified as such. In the electronic environment, certain other characteristics are also required in addition to peer review:
- It must be publicly available.
- The relevant community must be made aware of its existence.
- A system for long-term access and retrieval must be in place (e.g. Handle).
- It must not be changed (technical protection and/or certification are desirable).
- It must not be removed (unless legally unavoidable).
- It must be unambiguously identified (e.g. by a SICI or DOI).
- It must have a bibliographic record (metadata) containing certain minimal information.
- Archiving and long-term preservation must be provided for.
This is the version to which citations, secondary services and so forth should ideally point. However, we recognize (sic) that earlier versions of an author's work may be made available, and that in some disciplines these are already being cited by other authors. Such early versions might be all that is available to an author for citation at the time of submission of the author's work. However, versions which are not durably recorded in some form, or which do not have a mechanism for continuing location and access, or which are altered over time (without due provision for version control, as outlined below), should not be regarded as 'publications' in the sense that publication has been defined here, even if cited by an author.
The International Working Group refers to these possibly multiple early versions as a singular "first publication." This is not completely satisfactory, since a "first publication" should refer to a unique document, rather than a ghost trail of unidentified revisions. If an author refers to a definitive publication as "a publication," what label(s) should be used to characterize a first publication? I will examine this nomenclature in the next section.
Research Manuscripts and Preprints
Even in the paper-only world, publishing was a continuum. The famous Garvey-Griffith (Garvey, 1979) publishing model, based on careful empirical studies of research communications in the field of psychology, treats the appearance of an article in printed conference proceedings or in a journal as the only forms of communication that warrant the label "publication." Although Garvey and Griffith were not explicit, they use the term "publication" to refer to what the International Working Group calls a definitive publication. In many fields, scholars circulate "first publications," either informally to colleagues or more formally as publications in a series of working papers, technical reports, occasional papers, or research memoranda.
While many scholars believe that the trajectory of publication described by Garvey and Griffith fits many fields, there are important variations in sequence and nomenclature across disciplines. For example, MIT's Artificial Intelligence (AI) Lab started its series of research articles, called "AI Memos," in the late 1950s. Some of these AI Memos became conference papers and/or journal articles and/or book chapters. However, some AI Memos remained research manuscripts without subsequent publication in other forums. In the 1960s, the first research-oriented computer science departments often organized paper technical report series of articles that might subsequently appear in printed conference proceedings and/or in journals. Some of the manuscripts in these series, such as dissertations, were not expected to be published elsewhere in the form in which they appeared in the series.
When the Stanford Linear Accelerator Center was established in 1962, its first director, W. Panofsky, requested that the library staff collect unpublished research reports in high energy physics (Kreitz, Addis, Galic, & Johnson, 1997; Till, 2001). In the 1970s in the field of economics, several academic departments developed working paper series. This practice became common in other fields such as demography and mathematics. These collections were heterogeneous in their contents. Many of these articles would be subsequently published in printed conference proceedings, in journals, or as book chapters. Those articles, which were variously labelled in different disciplines as (research) manuscripts, technical reports, or working papers, were also at some stage arguably preprints if their subsequent publication did not entail substantial revisions. However, some would not be published in any other form, and consequently should not be called preprints at all. What would they be preprints of if they were not subsequently published? Further, if a research memo or technical report was significantly revised during editorial review, the original version should not be called a preprint.
In 1969, the American Physical Society Division of Particles and Fields and the U.S. Atomic Energy Commission sponsored a community-wide distribution of a weekly list of new research manuscripts received by the Stanford Linear Accelerator Center (SLAC). This listing was named Preprints in Particles and Fields (PPF). PPF listed authors, titles, abstracts and author contact information to enable subscribers to request the full text of an article of interest to them. Hundreds of physicists paid an annual subscription fee to receive PPF weekly by airmail (Till, 2001; Addis, 2002).4 Not all of the manuscripts that were listed in PPF were published afterward. This leaves open the question of what, exactly, these subsequently unpublished research manuscripts should be considered preprints of.
These differences in the nomenclature for research articles (preprints for high-energy physicists; manuscripts, technical reports, or working papers for others) continue today. Unfortunately, some of this terminological diversity clouds the discussions of alternative ways to organize Internet forums to support scholarly communication. It is amplified by the terms used by some advocates of more open exchanges of research articles via Internet forums, such as Stevan Harnad (1998), who often refers to "unrefereed preprints."
Harnad's discussion of "unrefereed preprints" is generally misleading. If he changed his nomenclature to "unrefereed research reports," "unrefereed technical reports," "unrefereed research manuscripts," or similar terms, his enthusiastic arguments for enabling scholars to share these documents would be much more lucid.
In the Garvey-Griffith publishing model, preprints are distributed when an article has been submitted to a journal and has also been accepted for publication. The preprint precedes a formally published printed version. Before an article is accepted for publication in a specific venue, it is not a preprint. It may be referred to as a manuscript, a research memorandum (or research memo), a working paper, a technical report, or an occasional paper. I believe that this linguistic usage should be retained, even though the term preprint is often casually used to refer to articles in any of these categories.
Consider the unusual case in which a scholar writes an article, submits it to a journal, and has it both accepted for publication and finally published with no changes (including copyediting and updating references). A copy of the article in the scholar's file starts out as a research memorandum (or working paper or technical report) on the day that she submits it to the journal for publication. When it is accepted for publication, with no changes, its status changes to that of a preprint (i.e., a preprint of a forthcoming definitive publication). That is, it spawns a copy of itself that will appear as a definitive publication in the journal. When the journal issue that includes the article is published, the copy becomes a reprint of that definitive publication.
It is more common for the authors of articles submitted to journals to be asked to make some changes requested by peer reviewers and editors, or to initiate some changes on their own. In the social sciences, where many of the most prestigious journals accept less than 20% of the articles that are submitted for review, many authors will submit their rejected articles to other journals. This practice is common in the natural sciences as well. Of course, some articles are never accepted for publication. These articles do not merit the label preprint in any stage before there is a clear relationship to the article that will be accepted for definitive publication in a conference proceedings, journal or book. As an article travels through a peer review process, value is added to it by a combination of the editorial work that can lead to major or minor changes, as well as by the "peer-reviewed" status that is bestowed upon it by the conference or journal.
The Oxford English Dictionary (Oxford English Dictionary, 2nd Edition [Electronic version], 1996) defines a preprint as "something printed in advance; a portion of a work printed and issued before the publication of the whole." High-energy physicists gave their research manuscripts a status boost by referring to them as preprints before they were submitted for and accepted for publication. For example, according to the journal's official description, "Recently, fewer than 40% of submitted papers have been finally accepted for publication in Physical Review Letters (PRL)." It should not surprise us if many of the research manuscripts that are listed on PPF and that were originally submitted to PRL were not accepted for publication in PRL. Perhaps many of these manuscripts rejected by PRL would be accepted elsewhere, but few of the manuscripts listed on PPF are guaranteed to be preprints of any specific publication when they are first listed.
Unfortunately, physicists have casually used the term preprint to refer to research manuscripts whose publication status is similar to that of articles that are called research manuscripts, working papers and technical reports in other fields. For example, the “PREPRINT Network” at Oak Ridge National Laboratory defines the documents that it helps readers to obtain in these terms:
preprints, or 'e-prints,' are manuscripts that have not yet been published, but may have been reviewed and accepted; submitted for publication; or intended for publication and being circulated for comment.
The PREPRINT Network is a valuable service in the physical sciences; but its definition of preprint is so elastic that it can refer to any manuscript, even one that is only posted on an author's personal Web site, and not subsequently published anywhere else.
In this article, I will try to use terminology to describe research documents that can work across many disciplines:
Manuscript – Manuscript is the primary candidate for labeling documents that authors circulate prior to their acceptance for publication. The term manuscript is still widely used by journal editors to refer to documents that are to be submitted and/or are under review.5 I will use the term manuscript to refer to documents that have not yet been accepted for publication in a specific venue as well as to documents that have been published in an institutionally sponsored venue, such as a working paper series or an online server for research documents, such as arXiv.org. Electronic versions may be called e-scripts.
Preprint – I believe that the term preprint should be used in a strict sense to refer to articles that have been accepted for a specific venue. Preprint refers to a relationship between two documents, rather than to a feature of a document in isolation. The first of these two documents is what the International Working Group referred to as a “first publication,” and the second document is what it characterized as a “definitive publication.” I will use the terms preprint and e-print conservatively, to refer to manuscripts in the form in which they are likely to appear in a conference proceedings, journal or book (whether in printed form, electronic form, or both). E-print, which some scientists use to refer to electronic manuscripts, plays off of its resonance with preprint, and I believe that e-prints should refer to electronic versions of preprints.
Article – The common term “article” can implicitly refer to a publication venue. The Oxford English Dictionary defines an article as “a literary composition forming materially part of a journal, magazine, encyclopædia, or other collection, but treating a specific topic distinctly and independently.” I will use the term article in a broader way to refer to any document that fits the OED’s definition, or that is in a form that could fit the OED’s definition if it were published.
The International Working Group carefully avoided calling preprints, as used by high-energy physicists, a "definitive publication." In short, many of today’s "preprint networks" and "preprint servers" should be called "e-script networks" and "e-script servers." These services may include some preprints and even definitive publications in their corpuses. However, their defining characteristic is to make research manuscripts available rapidly, and usually inexpensively, to readers, not to publish actual “preprints.”
E-script Nomenclature
In the Research Manuscripts and Preprints section I discussed nomenclatures for articles that have not been accepted for publication in a specific venue. Scholars employ a variety of labels to refer to these documents: manuscripts, drafts, working papers, research reports, technical reports, and research manuscripts. I criticized the use of the term preprint to characterize these kinds of documents, and proposed to call them all research manuscripts (or e-scripts).
My usage is contrary to the growing convention of referring to all of these documents as preprints (or e-prints). However, the elastic extension of the term preprint to refer to any memo that an author releases for discussion or review for publication blurs categories that most scholars treat as fundamentally different: documents that have been accepted for publication in a specific venue and those that have not (yet) been accepted, or that have been reviewed and rejected (and will possibly never be published in a different venue). The coy use of the term e-print to refer to electronic manuscripts borrows its semantics from preprint, and suffers from the same limitations.
Paper Precursors of E-script Repositories
Many people believe that the systematic exchange and publication of unrefereed research manuscripts began with the Internet and even more specifically with Paul Ginsparg's development of an e-script server at Los Alamos National Laboratory in 1991. Tomaiuolo and Packer (2000:53), for example, claim that "Paul Ginsparg ... developed the first preprint archive in August 1991."
However, scholars in a number of fields had already developed semi-formal processes for exchanging paper manuscripts. I have not found a complete history of these practices. MIT's Research Laboratory of Electronics (RLE), its oldest and largest interdisciplinary research laboratory, has been issuing paper-based technical reports since it was founded in 1946.6
Some academic departments had established paper manuscript publication series by the 1960s. These were more common in fields such as artificial intelligence, computer science, economics, demography, linguistics, and high-energy physics. The variety of fields in which such series were common increased somewhat through the 1970s and 1980s, to include some new fields such as information systems. However, these practices of paper manuscript publishing did not sweep across academia. The majority of academic departments and schools in research universities do not sponsor their own research manuscript series.
In the mid to late 1990s, many of the academic units that had published manuscript series in paper began to publish their new manuscripts as e-scripts on their Web sites. In some cases, they also "backfilled" their publication lists with e-scripts of manuscripts that predated their shift to routinely publishing e-scripts.
I call the institutionally organized publishing strategy a "Guild Publishing Model" (Kling, Spector & McKim, 2002). It is a generalization of the practice of publishing manuscript series that are referred to as working papers, technical reports, research reports, and occasional papers and that are sponsored by academic departments or research institutes. The term guild was chosen from among a number of synonyms for groups or associations that had very restricted membership based upon common topical interests. Like any term that is chosen for metaphorical purposes, it is imperfect and is not meant to indicate that an academic department or research institute shares all of the features of traditional guilds. The key feature of guilds is that potential members are screened through some kind of careful "career review." However, individual publications for the guild's series are not strictly reviewed. Rather, the author’s entry into the guild is carefully reviewed, and their manuscripts may be lightly reviewed (or not reviewed at all) before posting in the series.
The Guild Publishing Model's signature character is that publications in its series may be authored only by "guild members" -- those who are formally affiliated with the academic department, school or research institute that published the series. Usually, the manuscripts are published in the series before they are accepted for formal publication in conferences, journals, or books. The conventions about retaining articles in these series after they appear in print vary by field. It appears common in particle physics and computer science to retain articles in the series, even when they are published in conference proceedings or in a journal. In other fields, such as demography, only the abstracts and citations to the more formally reviewed work are retained online.
Two examples, from economics and physics, may help to make the Guild Publishing Model more vivid. Economics research is read by economists and non-economists. The field of economics has a history of sharing research manuscripts; economists realize that it benefits them to have a wide readership, including high-level policy-makers. The likelihood of continued support (and increased research funding) is increased with public knowledge of their contributions. The Berkeley Roundtable on the International Economy (BRIE) is a research institute that comprises faculty at the University of California at Berkeley and selected members from a few other elite universities. BRIE publishes its work in a number of forums including its own series of working and research papers. BRIE publications are free to download. The following passage, from the BRIE working paper series, notes who authors BRIE manuscripts: "All of the papers posted are written by BRIE members -- or are from BRIE conferences" (BRIE, 2000).
I like showing how high energy physicists publish with a guild model, since they are usually associated with field-wide repositories, such as arXiv.org. In truth, they multipublish in both kinds of repositories. Fermilab is a major experimental particle physics facility that supports over three dozen active collaborations. One major collaboration is DZero, which has its own Web site within the Fermilab Experiments and Projects site. The DZero Web site offers options for selecting published (appeared in print), accepted (accepted for publication), or submitted manuscripts. All of these manuscripts appear to be available online. According to Harry Weerts (1997), who was the top-level science manager for the DZero collaboration, the general criterion for determining authorship on any publication is whether that collaborator is a "serious" participant in DZero. Weerts (1997) goes on to describe the criteria for a serious member of DZero:
To become eligible for authorship on a physics publication, a scientist is expected to contribute "significantly" to DZero for one year prior to the submission of that publication. To maintain good standing after the initial year (that is, to remain an active author), all scientists on the experiment are expected to continue to contribute the major fraction of their research time to DZero.
DZero has tightly controlled membership and restricts authorship to guild members; thus, high quality research manuscripts are very likely. In addition to publishing e-scripts on its collaboration Web site at Fermilab, DZero often publishes a copy of the same e-script at arXiv.org. Many e-scripts from Fermilab are subsequently published in key high-energy physics journals such as Physical Review Letters and Physical Review D. Publication in a key journal is not a guarantee of quality; rather it is one of many quality indicators. Similarly, if an article is published in a lower-ranked journal, that is only one of many indicators of the article's quality.
Another Fermilab collaboration that illustrates strict membership guidelines is BTeV. The process of becoming a BTeV collaborator includes discussions with the membership committee, recommendation by the executive committee, and acceptance (by a two-thirds vote) of the full BTeV collaboration. The public portion of the BTeV Document Database includes research manuscripts dating from June 1996 through December 2001, as well as a list of some conference proceedings, published manuscripts, abstracts, publication information, talks, figures, photographs, and reference information.
Kling, Spector and McKim (2002) discuss the history of Guild Publishing, and suggest that it became common in some fields, such as physics and computer science in the late 1950s (and in paper media). I will discuss the Guild Publishing Model in greater detail later in this section.
In contrast with Guild Publishing, the e-script publishing model that has received the greatest attention is organized around "disciplinary repositories" or archives with which authors need not have any formal affiliations (Crawford, Hurd and Weller, 1996). In this publishing model, the authors post their articles to an electronic space -- today a Web site -- that is organized for a specific discipline, such as astrophysics, particle physics, mathematics, economics, or linguistics. Advocates of disciplinary repositories usually stress that any author may post articles, and they contrast their model with more strictly reviewed peer-reviewed journals. In practice, the organizers of these repositories do reserve the right to filter out or to remove e-scripts that they deem to be inappropriate (Krichel and Warner, 2001). They are usually vague about their criteria, stress a willingness to err on the side of expanding communication, and sometimes mention filtering out advertising or e-scripts that are not scholarly in their view. Most significantly, they emphasize that posting e-scripts and reading them should be free of charge to authors and readers. Advocates of the disciplinary repository publishing model usually do not fully address the question of who will pay the operating costs for larger disciplinary repositories, and whether the benign dictatorship of volunteer editors is an adequate model for governing them.
ArXiv.org -- which was organized by physicist Paul Ginsparg around 1991 -- has been the most visible (or at least most written about) exemplar of a disciplinary e-script repository. It started as an e-script resource for high-energy physics, but has been expanded to include all of physics, as well as mathematics and computer science. I will discuss arXiv.org in more detail below. Here, a key point is that the concept of disciplinary repositories of unrefereed research manuscripts predates the Internet and seems to be anchored in important science library practices of the 1960s.
According to Kreitz, Addis, Galic, and Johnson (1996), the librarians of the Stanford Linear Accelerator Center (SLAC) began systematically collecting unrefereed research manuscripts from the time that this experimental high energy physics laboratory opened in 1962. Their efforts were mandated by the founding director, W.K.H. Panofsky. However, SLAC's librarians did not usually have to seek research manuscripts; authors (or their research institutes) would send copies of the manuscripts to the SLAC library. The librarians had to organize and index their growing manuscript collection. In 1962, Mme. Luisella Goldschmidt-Clermont, CERN's manuscript librarian, was invited to spend a month helping the SLAC librarians "to establish very strong manual systems for obtaining, cataloging, announcing, and discarding (when published)" manuscripts (Addis, 2000).
I believe that the libraries of other major experimental high energy physics facilities, such as DESY (Deutsches Elektronen-Synchrotron) and Fermilab, also collected substantial numbers of paper research manuscripts in the 1960s. However, they also published their own series of research manuscripts. By 1970, Fermilab was publishing three series of research manuscripts: Preprints, Technical Memos, and Physics Notes. In our nomenclature, these series of research manuscripts were organized with a Guild Publishing Model. But they helped to set the stage for field-wide repositories in particle physics because they facilitated the circulation of research manuscripts that had not yet been peer-reviewed.
Kreitz, Addis, Galic, and Johnson (1996) note that "As some visionary librarians began to acquire, organize, and provide access to the preprint literature, physicists also came to recognize the value of an organized and centralized system of bibliographic control." Their article describes how librarians at the major experimental particle physics laboratories developed a bibliographic infrastructure over a period of years. Earlier I mentioned Preprints in Particles and Fields, which SLAC published in 1969 with support from the American Physical Society's Division of Particles and Fields and the U.S. Atomic Energy Commission. These paper-based developments played a critical role in setting the stage for the development and acceptance by particle physicists of Paul Ginsparg's e-script server, arXiv.org.
The Growth of Unrefereed E-Script Publishing
There is no question that e-script publishing exploded in the 1990s. But it exploded selectively -- much more in some fields, such as computer science, mathematics and physics, than in fields such as chemistry and psychology. It also exploded more in some kinds of venues, such as technical report series, than in field-wide repositories such as arXiv.org. Kling and McKim (1999) conceptualize some of the differences between disciplines that lead to more or less support for publishing unrefereed e-scripts. Kling, Fortuna, and King (2000) examine some of the political difficulties that bio-scientists encountered in their efforts to extend the field-wide repository model to the publishing of biomedical research.
It would be a gargantuan task to actually try to estimate the number of e-scripts that were published, read, and cited in various fields during each year, commencing in, say, 1990. Part of the difficulty is identifying the majority of active venues, since some venues have quietly ceased operation. There are numerous on-line indexes of e-script servers and disciplinary e-script collections that could help, but the enumerative activity would be a major undertaking that no one seems to have taken on.
Some venues are vast, and their collections are so heterogeneous that it would be hard to know what to count. As an extreme example, consider CERN's description of its "Articles & Preprints collection," which notes that it:
aims to cover as far as possible the published and pre-published literature in particle physics and its related technologies. The collection contains something like 400,000 documents, out of which about 50% can be accessed electronically. The documents originate from articles published in journals, preprints, technical reports, conference presentations, scientific committee documents and theses .... The collection starts ... from the mid of the 19th century. The full coverage starts from 1980 onwards. (http://weblib.cern.ch/Home/Library_Catalogue/Articles_and_Preprints/)
It would take considerable work to identify which fraction of CERN's 200,000 electronic documents were originally published as e-scripts. While CERN's e-script collection is larger than most, it illustrates the practical challenges of answering some of the relevant research questions.
The following subsections will briefly examine two major socio-technical architectures for e-script publications: one that is based on e-script series where control is localized to a sponsoring organization (Guild Publishing Model) and a major alternative where repositories are organized for whole research fields or disciplines (Disciplinary Repository Publishing Model).
Guild Publishing Models
Kling, Spector and McKim (2002) note that guild publishing series are most common in fields such as artificial intelligence, computer science, mathematics, economics, demography, linguistics, and physics. They note that there are scattered examples in other fields, including political science and policy studies.
The availability of e-scripts seems to have become the norm in many of these fields, and they trace each of these to the prior development of paper-based guild publishing series in these fields. They suggest, however, that the Guild Publishing Model may be adopted in the next few decades in some fields that did not rely upon it prior to the widespread use of the Internet. They argue that Guild Publishing is more likely than disciplinary repositories to be adopted in some fields because it is subject to local adoption and experimentation -- a specific research institute or department can create a working paper or research memorandum series without requiring that other institutes or departments in the same field do so. Prior-publishing restrictions regarding e-scripts vary from field to field; this can dampen or speed adoption of the Guild Model (and posting in field-wide repositories). Further, publication in these series seems to be at an author's discretion, thus enabling experimentation by those organizations, and the scholars affiliated with them, who are willing to be "early adopters."
However, the practice of shifting paper research manuscript series into e-scripts is not universal. For example, even at the end of 2001, reports from MIT's Research Laboratory of Electronics seem to be available only in paper form from MIT.
More seriously, I do not see a rapid movement in the adoption of Guild Publishing (or archival repositories) in certain fields, such as biology, medicine, and some of the natural sciences. There does seem to be an incremental expansion of Guild Publishing.
Disciplinary Repository Publishing Model
ArXiv.org is the most famous of the centralized disciplinary e-script repositories. According to Merman (1992), the first database, hep-th (for High Energy Physics -- Theory), was started in August of 1991 and was intended for use by a small subcommunity of fewer than 200 physicists, then working on a so-called "matrix model" approach to studying string theory and two-dimensional gravity. ArXiv.org grew rapidly in scope and size in the 1990s, as it added more physics specializations and expanded to some other disciplines as well.
In at least two cases it superseded other, more specialized e-script repositories. In the mid-1990s, the American Mathematical Society sponsored an e-script repository for mathematics. By February 1999, the society had suspended its operation and endorsed arXiv.org as a repository for mathematical e-scripts. Similarly, the American Physical Society developed an e-script server for research e-scripts in physics in 1996 and closed its server for new submissions as of May 2000. In this same period, the ACM partnered with arXiv.org to develop a topical section for computer science research e-scripts. By February 2003 about 240,000 e-scripts had been posted on arXiv.org.
ArXiv.org has demonstrably played a strong role as a scientific communication service in some areas of physics, mathematics, and astronomy. Unfortunately, it is too common for analysts to overstate its contribution (see, for example, Harnad 1999). The top level index of arXiv.org divides physics into 13 sub areas: four of these relate to high-energy physics and two of them cover nuclear physics.
Youngen (1998) examined the number of citations to articles with arXiv.org identifiers in various physics and astronomy journals, and concluded that the use of e-scripts is the greatest in high-energy (especially particle) physics and astrophysics. More recently, Luce (2001) examined the distribution of e-scripts across the major categories of arXiv.org's structure and found that by December 2000, 37% of the submissions were in the four high-energy physics databases. In contrast, only 4% of the e-scripts were in nuclear physics.7
The general physics category, which is divided into 19 sub areas such as chemical physics, optics and space physics, accounted for only 2% of arXiv.org's e-scripts in Luce's study. This distribution does not reflect the distribution of research publications across the subfields of physics.
Brown (2001a) tried to evaluate the importance of arXiv.org in physics by carefully examining citations to each of the 12 top-level physics sections of arXiv.org between 1991 and 1999, using data from the Institute for Scientific Information's SciSearch. Generally, the total number of citations to e-scripts in arXiv.org grew each year through 1997 (Brown 2001a, Fig. 3), and two of the high energy physics databases were by far the most cited by journals. Brown also examined the citation policies of journals, and noted that most of the 37 high impact journals that she studied via SciSearch allowed authors to use arXiv identifiers in their bibliographies. Overall, over one-third of the articles in arXiv.org were cited in other articles in these journals. Brown's study is a very meticulous examination of variations in the number of e-scripts posted in these 12 subfields on arXiv.org and citations to them. She concludes that "it is, therefore, evident that arXiv.org e-prints have evolved into an important facet of the scholarly communication of physics and astronomy."
My perusal of some of these journals yields a somewhat more complex picture of citation practices. For example, the article "Holographic Probes of Anti-de Sitter Spacetimes" by Vijay Balasubramanian, Per Kraus, Albion Lawrence, and Sandip Trivedi was posted with arXiv.org identifier hep-th/9808017 and was also issued as a Fermilab report (#FERMILAB-PUB-98-240-T, http://fnalpubs.fnal.gov/archive/1998/pub/Pub-98-240-T.html). It was published in Physical Review D59 (1999) 104021. It is cited in "Spacetime and the Holographic Renormalization Group" by Vijay Balasubramanian and Per Kraus, Physical Review Letters (November 1, 1999) 83(18):3605-3608, as "Vijay Balasubramanian, Per Kraus, Albion Lawrence, Sandip Trivedi. Phys.Rev. D59 104021 (1999)" with no reference to its availability as an e-script at Fermilab's e-script archive or at arXiv.org. This "Holographic Probes" article would have been counted by Brown as an arXiv.org e-script that has been cited by Physical Review Letters.
This self-citation could be an anomaly, since the Physical Review Letters article does include arXiv.org identifiers for some articles in its bibliography. But it also raises intriguing questions about when an e-script that has been posted on arXiv.org should be counted as "being cited" and whether the use of a bibliographic database to estimate e-script citations will overestimate their frequency.8
Brown (2001b) extended her analysis with an additional bibliographic source: the high-energy physics bibliography that is curated at the Stanford Linear Accelerator Center, HEP-SPIRES (see Kling and McKim (2000)). The SLAC librarians work hard to maintain complete bibliographic records, and the "Holographic Probes" article is indexed as: HOLOGRAPHIC PROBES OF ANTI-DE SITTER SPACE-TIMES. By Vijay Balasubramanian (Harvard U.), Per Kraus (Caltech), Albion E. Lawrence (Harvard U.), Sandip P. Trivedi (Fermilab). HUTP-98-A057, CALT-68-2189, FERMILAB-PUB-98-240-T, Aug 1998. 28pp. Published in Phys.Rev.D59:104021,1999 e-Print Archive: hep-th/9808017. Brown (2001b) reports much higher rates of citation to e-scripts from arXiv.org than are found via SciSearch for each of the 12 top-level arXiv.org topics. These citation rates are, on average, 20 times higher across all 12 topics (and 15.4 times higher for hep-th, the original archive). However, when an e-script is available in four locations, including arXiv.org, it is inaccurate to treat arXiv.org as its only source.
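The counting problem can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not a reconstruction of Brown's procedure: the `Record` class and the two function names are hypothetical, and the matching is naive substring matching. It contrasts a record-level count, which credits the e-script whenever any identifier in a complete bibliographic record is cited, with a string-level count, which credits it only when the citation itself names the arXiv.org identifier.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One bibliographic record listing every known identifier of a paper."""
    title: str
    journal_ref: str
    eprint_id: str
    report_numbers: list = field(default_factory=list)


def cited_as_eprint(citation: str, record: Record) -> bool:
    """String-level test: the citation text names the e-script identifier."""
    return record.eprint_id in citation


def cited_in_any_form(citation: str, record: Record) -> bool:
    """Record-level test: the citation matches any identifier in the record."""
    identifiers = [record.journal_ref, record.eprint_id, *record.report_numbers]
    return any(identifier in citation for identifier in identifiers)


# Identifiers taken from the HEP-SPIRES entry quoted in the text.
probes = Record(
    title="Holographic Probes of Anti-de Sitter Space-Times",
    journal_ref="Phys.Rev. D59 104021",
    eprint_id="hep-th/9808017",
    report_numbers=["HUTP-98-A057", "CALT-68-2189", "FERMILAB-PUB-98-240-T"],
)

# The citation as it appears in the Physical Review Letters article.
citation = (
    "Vijay Balasubramanian, Per Kraus, Albion Lawrence, Sandip Trivedi. "
    "Phys.Rev. D59 104021 (1999)"
)

record_level = cited_in_any_form(citation, probes)  # counts toward the e-script
string_level = cited_as_eprint(citation, probes)    # does not
```

Under these assumptions the same citation is counted by the record-level test and missed by the string-level test, which is the gap between the two kinds of estimate discussed above.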
Brown is trying to support her main thesis, that "e-prints have become an integral and valid component of the literature of physics." Again, I suggest some caution in interpreting her data: for example, experimental nuclear physics is not well represented among arXiv.org's e-scripts. Even condensed matter research, which produces about 50% as many e-scripts as high energy physics for arXiv.org, has citation rates that are less than half of those in high energy physics. The data from HEP-SPIRES are more difficult to interpret, since SLAC librarians include only those articles that they deem to be relevant to high-energy physics. Further, they maintain complete bibliographic records. Using HEP-SPIRES data, Brown must count all of the citations to the "Holographic Probes" article as accruing to an arXiv.org e-script. The socio-technical strength of HEP-SPIRES for research physicists (i.e., on-line access to complete bibliographic records) limits its usefulness for information scientists who are researching scientific communication practices.
To informally examine the uptake of arXiv.org in one important area of physics, condensed matter, I searched arXiv.org for the e-scripts of MIT's Prof. Wolfgang Ketterle, 2001 Nobel laureate in physics. As of March 2002, only four of his articles had been published via arXiv.org, all within a one-year period between the summer of 1999 and the summer of 2000. This contrasts with 26 articles that Ketterle authored or co-authored between 1999 and 2001, and four for early 2002, that are listed in the Science Citation Index. Someone who is interested in Ketterle's research will be much better informed if they search the Science Citation Index than if they search arXiv.org. Many of Ketterle's articles are published in Physical Review Letters, a p-e journal that is available by subscription and site license from the American Physical Society.
One conclusion that I draw from these little bibliographic exercises is that librarians or academic administrators would be making a major mistake if they canceled subscriptions to physics journals to save on serials budgets because they had read that arXiv.org has replaced, or will soon replace, the journal literature. The distribution reported by Luce will almost certainly change over time. But, as the example of Wolfgang Ketterle suggests, it is much too soon to claim that arXiv.org has become the medium of e-script research communication in all areas of physics.9
In fact, James Langer (2000), president of the American Physical Society (APS) in 2000, made a special effort to acknowledge the importance of arXiv.org for communication in some of physics' subfields while also making the strong claim that it was not a universalizable model for all of physics. He reported a monotonic rise in the number of manuscripts that were submitted to APS journals between 1980 and 1998 (from 5,000 to almost 25,000 manuscripts per year over that period). Langer noted that the APS could not rely upon arXiv.org (or similar e-script repositories) to replace their journals. Rather they had to support multiple publication systems: both e-script repositories and peer reviewed p-e journals. I shall turn to this key point below.
Halpern (2000) published an intriguing review of the choices made by computer scientists in organizing an e-script repository in 1998 through a partnership that included arXiv.org's sponsor (then, the Los Alamos National Laboratory) and the ACM. Computer science departments in research universities had often developed paper-based technical report series that fit a Guild Publishing Model as early as the 1960s. In the late 1980s and early 1990s, some departments began publishing new technical reports as e-scripts on their Internet sites. In the 1990s, the Department of Defense's Advanced Research Projects Agency (DARPA), a major funder of computer science research in the U.S., initiated a project -- NCSTRL (Networked Computer Science Technical Report Library) -- to link these computer science e-script series together with a common search engine. NCSTRL was also a partner in this effort to develop CoRR, the online Computing Research Repository. Halpern describes the alternatives for many technical architectural choices as well as legal practices, such as copyright. Since computer science departments had experience with their own e-script series, some of these -- such as copyright -- could be readily resolved based on prior disciplinary practices, negotiations, and experiences. CoRR was organized so that it could be accessed either through the top level of arXiv.org, or as just "another node" in NCSTRL's linked collection of e-script series.
Halpern reported that some socio-technical choices stimulated "bitter complaints" from computer scientists. In particular, some prospective authors were reluctant to provide their e-scripts in the markup language TeX, because they feared potential plagiarism. Computer scientists were used to publishing their e-scripts in Postscript format -- a format that is much more difficult to plagiarize from than TeX. CoRR's organizers wanted source documents so that they could convert the e-script collection to whatever new formats would prove popular in the future. Halpern notes that the physicists who published on arXiv.org had been willing to provide their source text for nearly a decade. He attributes these differences to "cultural differences" between physicists and computer scientists. Kling and McKim (2000) characterize high-energy physics as a "high visibility field" in which active researchers are keenly aware of the work of others. This would be particularly true in experimental research conducted at the major national and international facilities, such as Fermilab and CERN. Computer Science is a much "lower visibility" field; there may be less mutual awareness amongst researchers about the topics that they study. For example, Kock (1999) published an article in Communications of the ACM in which he detailed how he accidentally became aware of an author who plagiarized one of his research articles. Since all ACM members, including the North American computer scientists that Halpern was trying to mobilize in 1999-2000, receive Communications of the ACM, some of them may have become especially sensitized to the possibilities of undetected academic plagiarism.
Halpern also noted that some authors -- research computer scientists -- found the CoRR/LANL interface rather awkward, and that it could take as long as 45 minutes to post an e-script for the first time! This brief observation is worth noting, since all too many observers of e-publishing characterize the pragmatics as nearly effortless.
Halpern also made arrangements with editors of the ACM's mathematical computer science journals -- Journal of the ACM and the ACM Transactions on Computational Logic -- to encourage prospective authors to submit their papers electronically by posting their e-scripts on CoRR and sending summary information about each paper and its CoRR URL to the journal's editors. Overall, Halpern notes that CoRR has not been rapidly embraced by computer scientists. He suggests that the requirement for TeX source formats, fears of plagiarism and LANL's unfriendly interface for submissions were the most readily identifiable impediments during CoRR's first year of operation.
Carr, Hitchcock, Hall and Harnad (2000) examined the growth of CoRR's e-script collection. They note that it was created from a combination of new postings, e-scripts from arXiv.org's computation and language (cmp-lg) archive that began in 1994, and papers from the electronic Journal of AI Research (JAIR), all of which are archived according to date of publication (not date of archiving). Hence CoRR appears to have a history of posting prior to its launch.
They found that CoRR attracted over 120 postings when it was launched formally in September 1998, but that the number of e-scripts posted immediately dropped to 20-40 per month and totaled about 1000 by December 2000. These numbers are much smaller than the 27,000 e-scripts that they believe are available through NCSTRL's other nodes. More seriously, Carr and his colleagues carefully compare the growth rates of CoRR and hep-th, the original arXiv.org e-script section. Their data show that e-script postings to hep-th grew steadily in its first two years and stabilized at 200-300 new submissions each month.
Carr et al. speculate about how Halpern could more effectively "build a community of authors." They are not convinced by his emphasis upon interfaces and source text requirements as the major impediments to CoRR's initial adoption. They claim that Halpern should build stronger relationships with journal publishers, and are not confident that the ACM will offer strong continuing support for CoRR. Their "most powerful argument for CoRR, for its authors and users, and ultimately for publishers, especially those that recognize this early, is the ability of free-to-post, free-to view archives to transform access to the scholarly journal literature." Unfortunately, Carr and his colleagues don't address two important features of computer science publishing. On one hand, NCSTRL had been a viable e-script publishing service through the year 2000. Halpern does not have to convince computer scientists to publish e-scripts; he has to convince them to publish via CoRR instead of (or, more interestingly, in addition to) their departments' research e-script series, their research project Web sites, and/or their own Web pages! On the other hand, the ACM has developed a substantial "Digital Library" of articles from its journals, which it sells as a service to its members and also site licenses to organizations, such as universities. At some point, the vision for CoRR advanced by Carr and his colleagues would directly clash with the ACM's interest in offering its Digital Library as a member service. I cannot resolve the intriguing conflicts that NCSTRL and the ACM Digital Library provide for CoRR. I do, however, note them and wish that Carr and his colleagues had noted (or even engaged) them as well.
I have focused upon arXiv.org because it is the most active, largest and best known disciplinary repository. There are smaller and generally less well known e-script repositories in linguistics (at least two: one for semantics and another for "optimality theory"), in economics (EconWPA) and in the cognitive sciences (Cogprints). This list is not exhaustive, and doubtless some scholars will develop new disciplinary repositories for some other disciplines. However, proposals for a disciplinary e-script repository have been substantially controversial in some fields, such as chemistry and biomedical research -- a topic that I will examine later in this chapter.
A Hybrid Publishing Model
The strength of the Guild Publishing Model is that it enables local adoption without requiring field-wide consensus about the value of communication via unrefereed e-scripts. It also has several limitations. For example, if a major publisher places a prior publishing restriction on e-scripts, local authors may be helpless to turn that tide. In chemistry, for example, 31 editors of journals that are sponsored by the American Chemical Society (ACS) announced in 2001 that they would not accept manuscripts for review that had been posted on Web sites.10
However, a chemical-physicist who publishes in physics journals may be undeterred by the ACS's ban.
Many readers will often evaluate the quality of a specific Guild e-script series by the research reputations of its members and its sponsoring institution. Thus, the e-script series sponsored by Stanford University's departments may be more frequently searched than the e-script series of California State University departments in the same field. If e-script series become common in a discipline, then the work of searching them for specific topics becomes proportionally more time consuming. For example, by the early 1990s, well over 100 research computer science departments sponsored their own technical report series.
One strategy to simplify the problems of search (and also increase the potential visibility of all researchers who study a specific topic) is to develop a uniform front end with a search engine that links all of the research e-script series together in a specific field. Developing software for interoperability is discussed on the September98 list.11
There are at least three major efforts of this kind: NCSTRL (Davis, 1995), the U.S. Department of Energy's "Preprint Network" and Working Papers in Economics (WoPEc) (Krichel and Warner, 2001). Each has had a different history.
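The idea of a uniform front end over independently run e-script series can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the design of NCSTRL, WoPEc or the Preprint Network: the class names, the metadata fields, and the sample series and titles are all hypothetical. The point is only that each series keeps control of its own records while a single query function fans the search out across all of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EScript:
    """Minimal shared metadata that every participating series exposes."""
    series: str
    title: str
    year: int


class Series:
    """One locally controlled e-script series (for example, a department's)."""

    def __init__(self, name, records):
        self.name = name
        self._records = list(records)

    def search(self, term):
        """Return the records in this series whose title contains the term."""
        needle = term.lower()
        return [r for r in self._records if needle in r.title.lower()]


def federated_search(series_list, term):
    """Query every series and merge the hits, newest first."""
    hits = []
    for series in series_list:
        hits.extend(series.search(term))
    return sorted(hits, key=lambda r: (-r.year, r.series, r.title))


# Hypothetical sample data for two departmental series.
dept_a = Series("Dept-A-TR", [
    EScript("Dept-A-TR", "Caching in Distributed File Systems", 1997),
    EScript("Dept-A-TR", "A Note on Type Inference", 1998),
])
dept_b = Series("Dept-B-TR", [
    EScript("Dept-B-TR", "Distributed Consensus Revisited", 1999),
])

results = federated_search([dept_a, dept_b], "distributed")
```

A reader who would otherwise visit each series in turn issues one query instead; in this sketch the cost of adding a series is only that it expose the same minimal metadata.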
NCSTRL began as a project of DARPA. It grew rapidly in the period 1997-1998 to include well over 100 e-script series that could be searched through a uniform interface and by December 2000 included about 27,000 e-scripts (Carr et al., 2001). However, DARPA's support of NCSTRL ended around 2000, and by 2001 it seemed to be malfunctioning (Krichel and Warner, 2001). There have been recent efforts by a small consortium of computer science departments to resuscitate it and to expand its scope to include a library of electronic dissertations and theses. By 2002, NCSTRL seemed to be very much a work in progress, with an expanded agenda, searches operational for only a few e-script series, and its actual state of development not reported on its site (see www.ncstrl.org).
WoPEc was started in the early 1990s by Thomas Krichel as part of a small economics research bibliographic project and has mushroomed into a much larger set of services with links to about 30,000 working papers in the e-script series of over 100 departments and research institutes. It received most of its research funding from a British digital library program in 1998. However, today it appears to be a self-funded volunteer project (Krichel and Warner, 2001).
The U.S. Department of Energy announced its Preprint Network in January 2000 (Warnick, 2001). It is organized by science librarians at the Oak Ridge National Laboratory who have linked the e-script series of over 7,600 individuals, departments and institutes to cover diverse science and technology topics (Warnick, 2001). As of March 2002, its organizers estimate that they provide access to over 400,000 e-scripts. It is worth noting that arXiv.org is a node within the Preprint Network that can be browsed from the set of linked e-script series. But one cannot search arXiv.org from the Preprint Network's integrated search engine because of arXiv's approach to blocking robot searchers! Since the Preprint Network is funded as part of the U.S. Department of Energy's Office of Scientific and Technical Information (OSTI), its future may seem secure. However, there have been periodic attacks upon OSTI's e-script services by U.S. Congressmen who argue that it competes with private sector scientific publishing ventures. Thus it faces a continual political risk.
The vision of federating e-script collections, whether they are organized as locally controlled e-script series or disciplinary repositories, is a kind of "natural vision" for many advocates of research digital libraries. The vision goes back to at least the late 18th century, when Étienne-Louis Boullée proposed a Royal Library for King Louis XVI ("Deuxième projet pour la Bibliothèque du Roi") that would contain all of the world's knowledge (Chartier, 1994). The technological superstructure for a contemporary version of this project is developing under the banner of "Open Archives" -- a set of documentary and metadata protocols that will make it easier for services such as NCSTRL, WoPEc and the Preprint Network to effectively search large distributed e-script collections (Van de Sompel and Lagoze, 2000).
Controversies about Communication via Unrefereed Manuscripts
These arrangements for publishing unrefereed research reports in paper or sharing them via libraries seem to have been relatively uncontroversial in some fields, such as economics, demography, electrical engineering, computer science and high energy physics. Perhaps, more accurately, I have been unable to find published records of whatever controversies may have been vetted about these publishing practices.
However, some efforts to organize systematic circulation or publishing of unrefereed manuscripts were controversial and some were terminated. Till (2001) reviews the history of a project by the National Institutes of Health to operate a paper-based service for exchanging research manuscripts that started in 1961. The NIH's "Information Exchange Groups" seemed to be moderately successful. But the service was also the subject of some controversy -- especially criticism by journal editors about a potential erosion of peer review -- and was closed in 1967. This NIH venture of the 1960s predates an NIH sponsored proposal of the late 1990s to sponsor an electronic exchange for e-scripts, called E-Biomed. E-Biomed was also the subject of substantial controversy, and I will discuss it later in this chapter.
The debates about the NIH service seemed to have been long forgotten in the early 1990s, when advocates of distributing unrefereed e-scripts were excited about the Internet as an inexpensive scholarly communication medium.
In the 1990s, Stevan Harnad became the most visible advocate of emphasizing scholarly communication via e-scripts. In the mid-1990s, he advocated the following "subversive proposals":
If every esoteric author in the world this very day established a globally accessible local ftp archive for every piece of esoteric writing from this day forward, the long-heralded transition from paper publication to purely electronic publication (of esoteric research) would follow suit almost immediately (Okerson and O'Donnell, 1995) http://www.arl.org/scomm/subversive/sub01.html
If right now every esoteric scholar/scientist were to make available on the Net, in a public ftp or http archive, the preprint of every paper he wrote from this day forward, the rest would take care of itself, and in short order (Harnad, 1995b).
If from this day forward, each and every one of you were to make available on the Net, in publicly accessible archives on the World Wide Web, the texts of all your current papers (and whichever past ones are still sitting on your word processors' disks) then the transition to the PostGutenberg Galaxy would happen virtually overnight (Harnad, 1995b).
Harnad generally advocated e-script publishing to be organized with centralized repositories based on the architecture of arXiv.org. But Harnad's terse "subversive proposal" doesn't limit e-script publishing to centralized field-wide e-script archives.
Harnad posted his "subversive proposal" to the discussion list VPIEJ-L in June, 1994, and stimulated a wide-ranging discussion that was edited by Ann Okerson and James O'Donnell (1995). Harnad also debated Stephen Fuller in 1995 in the journal, The Information Society.
Legal scholar Bernard Hibbitts was clearly influenced by Harnad's writings when he published "Last Writes," in which he proposed that legal scholars should ignore law reviews and self-publish their articles on their own Web sites or in a field-wide archive that would be modeled on arXiv.org (then known as xxx.lanl.gov) (Hibbitts, 1996a, 1997a). Hibbitts' 1996 article was published in First Monday, a pure e-journal, and was the subject of a debate there between him and Archie Zariski (1997). Hibbitts' proposal also was the focus of a special issue of the Akron Law Review. Halpern's (2000) description of CoRR -- the Computing Research Repository -- appeared in a journal that also included three? articles that were skeptical of some aspects of its operation and value. When proposals such as Hibbitts', Harnad's, or Halpern's are raised in specific disciplines, such as law, computer science, or biology, they are the subject of debates that typically emphasize some common issues, especially concerns about quality control (absent peer review), the complexities of long-term archiving, and their likely long-term costs.
Walker's (1998) article in American Scientist advocated the position that pure e-journals sponsored by scientific societies can be published very inexpensively and can help to solve the research library's serials crisis. Walker's article was the stimulus for American Scientist to sponsor a lively on-line forum about scholarly electronic publishing, moderated by Stevan Harnad, that continues to be active into early 2003. However, under Harnad's moderation, the discussion emphasizes issues of centralized e-script repositories, such as costs, quality control, copyright, interoperability, and long-term archiving.
The dominant analyses of the role of e-scripts in scholarly communication focus on the information processing costs and speeds of different media. These “information processing” analyses are field-independent: the costs and speed of publishing in paper or electronic media in two fields, such as chemistry and physics, should be similar (except for differences in production expenses for artwork or color). Information processing analyses lead one to predict that differences in communication practices across fields should diminish over time. Hars (1999a) develops an information processing analysis of corpuses of scientific preprints. He comments:
my argument is generic and should apply to any scientific discipline. Thus I expect (the fields of) Information Systems and Chemistry to embrace online publishing in a similar way as physics, etc. It may just take them longer. (Certainly, this is a naive view and there may be factors rooted in power, tradition etc. which may hinder this development. But I don't think that there is a systematic difference between chemistry/Information Systems and other disciplines which prevents the former from adopting similar structures for online knowledge infrastructures.) (Hars, 1999b).
Similarly, Paul Ginsparg, developer of arXiv.org, wrote:
Regardless of how different research areas move into the future (perhaps by some parallel and ultimately convergent evolutionary paths), I strongly suspect that on the one- to two-decade time scale, serious research biologists will also have moved to some form of global unified archive system, without the current partitioning and access restrictions familiar from the paper medium, for the simple reason that it is the best way to communicate knowledge, and hence to create new knowledge (Ginsparg, 1999).
According to analysts such as Hars and Ginsparg, it is "Just a Matter of Time" for scholars in all disciplines to readily circulate their e-scripts, and preferably via the disciplinary repository model.
Kling and McKim (2000) examined communication forums in the fields of high-energy physics, molecular biology, and information systems to determine how traditional forums differ across fields and how those differences affect the utilization of electronic media. Kling and McKim argue that "it is not just a matter of time" for all fields and disciplines to adopt the arXiv.org repository model of research results distribution. Kling and McKim did not examine the Guild Publishing Model, but their analyses could be applied to e-script series organized via local guilds (academic departments, research institutes) as well.
Their argument rests on four key ideas:
There is a dual dialectic of trust between authors and their readers.
Authors may wish to reach readers, but do not wish to be plagiarized or scooped (i.e., allow others to do the follow-on studies that they plan before they can turn to them). On the other hand, readers want to read interesting and trustworthy scholarship.
There is an institutional embedding of trust-supporting processes in different fields.
As an extreme example, they note that experimental particle physics research is organized through about 100 collaborations of 30-1700 physicists. Their articles are based on data that is collected at a few facilities worldwide. Plagiarism is implausible: these physicists are keenly aware of which collaborations are studying which phenomena at which research facility. Each experiment has been vetted through many funding reviews by research agencies and reviewers at the experimental facility. The e-scripts that are posted on arXiv.org and elsewhere (such as each collaboration's Web site)
Relatively few fields are organized with such high mutual visibility of others' research by participating researchers.
For example, Kock (1999), who learned by accident that his research had been plagiarized, studies the use of information systems in organizations. Empirical information systems research can be conducted in thousands of organizations, and the field is one where scholars are much less cognizant of each other's ongoing research. In addition, the field has dozens of journals. Thus, the person who plagiarized Kock's article could act under the belief that his fraud had little chance of being detected by the editors and reviewers of some of the less prominent journals.
The scholarly communication system of a field is embedded in its scholarly work system.
The contrast between the visibility of research in progress in experimental high-energy physics versus the field of information systems is based on the way that empirical research work is organized. As an extreme example, the Large Hadron Collider (LHC) at CERN is currently under construction. However, it is relatively easy to identify four major experimental collaborations that will gather data from it (see http://public.web.cern.ch/Public/SCIENCE/lhccolexp.html ) and to identify lists of participating physicists. The ATLAS collaboration's Web site is illustrative ( http://atlasinfo.cern.ch/ATLAS/internal/Welcome.html ). CERN's collection of electronic documents includes reports of the physics research that the ATLAS collaborators expect to conduct (see for example, Tapprogge, 2000). Another variation is a page for "The Neutrino Oscillation Industry" with links to dozens of current and planned neutrino experiments (see http://www.hep.anl.gov/ndk/hypertext/nuindustry.html ).
In contrast, in the field of information systems, there is no easy way to learn who is examining knowledge management practices of, say, international engineering design firms. There are many such firms, and the research can be relatively invisible to those who are not very familiar with the participants. Thus, the risk of having such research plagiarized or scooped after early e-script publication is much higher than is the case for neutrino oscillation studies. Further, an e-script that is based on empirical data pertinent to neutrino oscillation would come from a study that has been reviewed for its likely scientific competence and significance at many early stages of conception, planning, and execution by a variety of physicists. In contrast, a knowledge management study may be carefully reviewed only by its participants, until there is some form of publication. Thus readers of a corpus of unrefereed knowledge management e-scripts are likely to see a higher percentage of studies that fall below their thresholds of scholarly quality than would experimental high-energy physicists reading a corpus in their own field.
Kling and McKim suggest that electronic forums must suit the practices of the field; otherwise they will not be socially accepted, and will stagnate or die. They predicted that "The divide between fields where researchers share unrefereed articles quite freely (‘open flow fields’) and those where peer review creates a kind of chastity belt (‘restricted flow fields’) is likely to change slowly, if at all." They identified biology, psychology, and chemistry as three important restricted flow fields.
The Slow Transition to E-Scripts
In May 1999, NIH Director Harold Varmus proposed an electronic repository for biomedical research literature called “E-biomed.” E-biomed reflected the visions of scholarly electronic publishing advocates: it would be fully searchable, be free to readers, and contain full-text versions of both preprint and post-publication biomedical research articles. Had Varmus' proposal been accepted, it would have undermined Kling and McKim's (2000) analysis.12
Varmus created a Web site at NIH to post comments about his proposal.
Within four months, the E-biomed proposal was radically transformed: the preprint section was eliminated, delays were instituted between article publication and posting to the archive, and the name was changed to “PubMed Central.” First, PubMed Central would not contain a preprint server that would have enabled researchers to post e-scripts without going through traditional peer-review and editorial processes. Second, unlike the original E-biomed proposal, scientific societies and commercial publishers would have central roles in the control and dissemination of content in PubMed Central. PubMed Central would still be free to readers, but publishers would control the content posted in the archive, and the time the articles would be posted. The changes between the E-biomed proposal and the PubMed Central version ran counter to the “inevitable”13 outcomes predicted by many electronic publishing enthusiasts.
Kling, Fortuna and King (2001) examined the remarkable transformation of the E-biomed proposal into PubMed Central by analyzing comments about the proposal that were posted to Varmus' E-biomed online forum and discussions in face-to-face forums where E-biomed deliberations took place. They counted supporters of E-biomed as outnumbering its critics by two to one. However, the absolute numbers were small: 125 supportive postings and 60 critical ones. They found that the transformation of the E-biomed proposal into PubMed Central was the result of highly visible and highly influential position statements against the proposal made by scientific societies and the editorial boards of prestigious scientific journals. Most of the scientific societies that responded publicly to the E-biomed proposal belong to the Federation of American Societies for Experimental Biology (FASEB), a coalition of 21 biomedical societies representing more than 66,000 scientists. The officers of these societies could each claim to speak for thousands of scientists, and together for tens of thousands, while only 125 individuals wrote in support of E-biomed!
After the new PubMed Central architecture was announced in August 1999, the editorial board of the prestigious Proceedings of the National Academy of Sciences announced its willingness to participate in it. PNAS participation was a major coup for the proposal, but it was contingent upon several conditions. PNAS required that only peer-reviewed materials appear on the server: “Participation in PubMed Central is contingent upon its not including reports that have been screened but not formally peer-reviewed.”14
This requirement imposed a major redefinition of PubMed Central. Some observers had expressed hope that one part of the site would host non-peer-reviewed papers, and that over time PubMed Central might evolve into a service resembling arXiv.org. These hopes were dashed by PNAS's demand that all materials appearing on PubMed Central be peer-reviewed.
The literature about scholarly electronic publishing usually emphasizes a binary conflict between (trade) publishers and scholars/scientists. In contrast, Kling, Fortuna and King concluded that: 1) scientific societies and the individual scientists they represent do not always have identical interests in regard to scientific e-publishing; 2) stakeholder politics and personal interests reign supreme in e-publishing debates, even in a supposedly status-free online forum; and 3) multiple communication forums must be considered in examinations of e-publishing deliberations.
PubMed Central has been in operation since February 2000. As of November 2002, the archive includes abstracts, full texts, PDF files, and sometimes “Supplemental Data” and/or “Video Material” for 40 journals and BioMed Central (BMC), the last containing 57 separate titles. The BMC journal collection, which deposits immediately to PubMed Central, is expected to grow rapidly. This repository is probably valuable to many bioscientists, especially those who work in colleges or laboratories with small libraries and limited inter-library loan. PubMed Central may make an important scientific literature available in an unprecedented way in developing countries. But it is far from the arXiv.org-style corpus of unrefereed e-scripts that was originally proposed as part of E-biomed.
In an independent project, the British Medical Journal announced the launch of an e-script repository, Clinmed Netprints (http://clinmed.netprints.org/home.dtl), in December 1999. Perhaps the British Medical Journal could succeed where Varmus failed.
Clinmed Netprints established specific ground rules for e-scripts to be posted. Before posting, articles would be screened to ensure that they contain original research into clinical medicine or health and that they don't breach patient confidentiality or libel anyone. All articles fulfilling these minimal conditions would be posted, usually within 24 hours of receipt. They stressed that
"the appearance of an article on this server is therefore not intended to convey approval of its assumptions, methods, or conclusions. Each preprint will be prefaced by the following disclaimer: 'Warning: This article has not yet been accepted for publication by a peer reviewed journal. It is presented here mainly for the benefit of fellow researchers. Casual readers should not act on its findings, and journalists should be wary of reporting them.'" (http://clinmed.netprints.org/misc/netprints.shtml, March 21, 2002).
Between December 1999 and mid-March 2002, a total of 53 e-scripts were posted, primarily by authors outside of the U.S. In the first three months of 2002, a total of four e-scripts were posted. This is a remarkable level of underperformance! It is possible that publishing unrefereed e-scripts does not appeal to many medical scientists.
ChemWeb, an e-script repository that is modeled on arXiv.org and hosted by Elsevier Publishing, collected 425 e-scripts between July 2000 and March 2002. Its most active subarchive, physical chemistry, averaged 10 e-scripts posted per month in 2001. In the first three weeks of March 2002, 13 e-scripts were posted in physical chemistry and five were posted for the nine other areas of chemistry identified (including "miscellaneous"). While ChemWeb is not dying, as Clinmed Netprints appears to be, it is not very lively.
Conclusions
In the early 1990s, a number of librarians and scholars fell in love with the possibilities of reorganizing scholarly publishing regimes with the Internet, so as to enable scholars to communicate more rapidly, with wider readership, and with reduced costs (when compared with paper-based publishing). Some of these scenarios were organized through peer-reviewed electronic journals (Kling & Callahan, 2002), while others dreamed of less filtered "desktop to desktop" communication. This chapter has examined some of the complexities of realizing that second kind of scenario of scholarly communication via unrefereed e-scripts.
During the last decade there have been a number of projects that advance this second scenario -- from individuals posting their manuscripts on their Web sites, through guild publishing sites in about a dozen disciplines, to disciplinary repositories in a handful of disciplines. We do not have a good theory to explain the ways in which different disciplines have selected these different architectures for communicating via unrefereed e-scripts. We also lack a good theory of why these practices are confined to a minority of academic disciplines. Kling and McKim's (2000) effort, which examines different features of a discipline (i.e., relative visibility of projects, patentability of research products) as they pertain to trust between authors and readers, stands out as the only systematic effort to conceptualize such disciplinary differences.
It is clear that techno-economic analyses -- which would predict comparable advantages and costs for all disciplines -- have not been good predictors of shifts in disciplinary communication practices. Kling and McKim's analysis is a form of "institutional embedding": it holds that scholarly publication is part of a much more complex set of scholarly working arrangements that include credit-assignment and financing arrangements. Since these working arrangements change slowly in a discipline, only those disciplines whose working arrangements are congruent with communicating via unrefereed e-scripts will adopt and extend the practice. For example, while the proposed E-biomed was not supported by the editorial board of PNAS until unreviewed articles were banned, its members have not publicly criticized arXiv.org or the PrePrint Network. In fact, the PrePrint Network quietly began its operations while E-biomed was being transformed into a repository that would never host a "preprint."
Even here, there are some anomalies. For example, the Research Laboratory for Electronics at MIT, which was a pioneer in publishing an unrefereed set of technical reports (starting in 1946), still distributes them only in paper. In the field of information science -- which is disproportionately the scholarly center for studies of scholarly communication -- researchers are sympathetic to communicating via unrefereed e-scripts; but few information science programs support a local working paper site!
While it is hard to predict the precise details, we have a sufficient history of debates, experiments and projects for communicating via unrefereed e-scripts that we should expect incremental change rather than the liquification of paper and the withering of peer review by the electronic power of the Internet. If the very recent past is prologue to the future, we should expect relatively slow growth in the number of fields that adopt disciplinary repositories. The local control of the Guild Publishing Model leads me to expect to see many more e-script repositories that fit that model develop in the next decade.
ACKNOWLEDGEMENTS
Funding was provided in part by NSF Grant #SBR-9872961 and with support from SLIS at Indiana University. This article benefited from helpful discussions about electronic scholarly communication with a number of colleagues, including Blaise Cronin and KyoungHee Jung. Lisa Spector and KyoungHee Jung provided important editorial assistance.
1. Pure e-journals are originally distributed only in digital form (Kling and Callahan, 2002). Examples include the Electronic Journal of Communication, the Journal of Digital Information, the Internet Journal of Archaeology, and the Journal of Electronic Publishing.
2. The basic theme of peer review is rather simple: journal editors solicit the reviews of topical experts to advise them about whether an article that is being considered for publication in their journal should be published. Journals differ in many important practices in their conduct of a peer review: there are variations in the number of reviews solicited, the specific questions asked of reviewers, whether the editors attempt to hide an author's identity from the reviewers, etc. (See Weller, 2001).
3. The term STM refers to science, technology and medicine.
4. PPF continued hardcopy publication until the Fall 1993. PPF is available on SLAC's website, at http://www.slac.stanford.edu/library/documents/newppf.html
5. This term is strangely anachronistic, since between the 16th and mid-20th centuries, it referred to documents that were handwritten and "not printed" ((1996)). In the 20th century, the "manuscript" (handscript) was replaced by the term typescript to refer to typed documents. In today's parlance, the term for electronic documents might be electro-scripts or e-scripts, although I have not found that usage in the context of scholarly communication.
6. Within three years RLE had issued 64 technical reports, and by the end of 2001, the RLE had issued over 650 reports. Remarkably, they are available only in paper formats!
7. Some of these differences may be influenced by the editorial practices of specific journals. For example, two particle physics journals, the American Physical Society's Physical Review D and Elsevier's Physics Letters B, encourage prospective authors to post their articles on arXiv.org and to send the arXiv identifiers to the journal rather than send in the complete text of e-scripts.
8. This example also raises questions about why the authors (Kraus and Balasubramanian) didn't simplify their readers' work in locating an e-script version.
9. It would be interesting to match the publication records of a set of distinguished physicists (say Nobel laureates and winners of the National Medal of Science) with the list of their publications that are published on arXiv.org to better understand its adoption by very elite physicists.
10. In contrast, Elsevier created ChemWeb to invite open publishing and indicated that e-scripts on ChemWeb might be reviewed for its journals.
11. September 98 is a public forum about scientific electronic publishing that was hosted by Sigma Xi, the publisher of the magazine, American Scientist. Thomas J. Walker initiated this forum in the September-October 1998 issue of American Scientist. However, due to the popularity of the debate, the forum has continued up to the present, and is moderated by Stevan Harnad ( http://amsci-forum.amsci.org/archives/september98-forum.html ).
12. Kling and McKim drafted their paper in 1998, before Varmus' proposal. It was under review and in press during the debates about E-biomed.
13. The electronic publishing movement was energized by a number of enthusiasts who state that free online access to all peer reviewed research is inevitable. Bradley (2002) wrote a news story for The Scientist in which he quotes Stevan Harnad as saying "The optimal and inevitable outcome for research and researchers in view of the new possibilities offered by the online age is open access to all peer-reviewed research."
14. As quoted from the NAS Site, “Of Current Interest” section, see Web site for full statement: http://www4.nationalacademies.org/nas/nashome.nsf/c1c341e2c7507cdb852568080067195a/935da99ebcb56e018525681f0078e050?OpenDocument