
Brief guidelines on authority control decision-making

  1. What is authority control?
  2. Why is authority control important?
  3. What are controlled vocabularies?
  4. How do I know when I need authority control?
  5. Architectural issues for authority file building & linking
  6. Sources for name headings
  7. Sources for geographic headings
  8. Sources for subject terms
  9. Sources for media types, genre/format, and other resource attributes
  10. How the NCSU Libraries Metadata & Cataloging department can help you
What is authority control?

Authority control is concerned with building and maintaining controlled vocabularies of terms, such as names, subjects, media types, and titles, to be used as headings in bibliographic records.

Why is authority control important?

Authority control is invaluable to search and discovery because it standardizes multiple forms of a given search term, increasing the likelihood that the search will return all relevant items.

On the end-user side, authority control can be used to lead to authorized forms of names or to expose the entire thesaurus of authorized terms so that a selection can be made from that list. This helps the user to better understand the scope of a collection and how it has been described, and also gives them some confidence that the metadata has been created in a consistent and well thought-out manner. In addition, authority control enables direct linking between terms of relevance to the searcher in one resource and other resources with the same subject, author, or format.

Authority control benefits both metadata creators and the users of that metadata. For metadata creators it saves time, since data entry can be set up so that typing the first few words of an entry brings up an already established term or phrase. This works against simple lists of names or subject terms, and also with long phrases that contain policy statements, such as rights or copyright declarations. Entering an unauthorized term or phrase produces no match against the authority file, which tells the metadata creator either that the term is new to the list or that it has not been entered in its authorized form. If cross-references are built into the authority file, entering an unauthorized term or name can lead you to the authorized form and, depending on how your system is designed, change your entry accordingly.
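
The data-entry behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the authorized headings are borrowed from the example table later in this page, while the variant forms and the function names (normalize, resolve, suggest) are hypothetical and do not come from any particular system.

```python
# Hypothetical authority file. The headings appear in this page's example
# table; the variant (unauthorized) forms are invented for illustration.
AUTHORIZED = {
    "4-H clubs",
    "African American college athletes",
    "Albumen prints",
}
SEE_REFERENCES = {
    "4h clubs": "4-H clubs",
    "albumen photographs": "Albumen prints",
}


def normalize(term):
    """Lowercase and collapse whitespace so trivial variation is ignored."""
    return " ".join(term.lower().split())


def resolve(term):
    """Return (status, heading) for a term typed by a metadata creator."""
    key = normalize(term)
    for heading in AUTHORIZED:
        if normalize(heading) == key:
            return ("authorized", heading)
    if key in SEE_REFERENCES:
        # Cross-reference: point the creator at the authorized form.
        return ("see", SEE_REFERENCES[key])
    # Either a new term or a form not yet recorded in the file.
    return ("no match", None)


def suggest(prefix):
    """Type-ahead: authorized headings that begin with the typed prefix."""
    key = normalize(prefix)
    return sorted(h for h in AUTHORIZED if normalize(h).startswith(key))
```

A "no match" result is the cue for the metadata creator to check whether the term is genuinely new or simply not in its authorized form.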

What are controlled vocabularies?

Controlled vocabularies are lists of terms used to provide consistency in form and definition within a given field for retrieval purposes. These can be either short lists developed internally to guide metadata creators in categorizing materials, such as lists of date ranges, material formats, or even metadata creators, or they can be large external lists, such as structured thesauri, subject and name authority files, or standardized technical term lists. These external lists are often maintained by agencies that are expert in their fields, such as the National Library of Medicine, the Getty Research Institute, the United States Geological Survey, IEEE, W3C, and so on, and they are offered as public services through user-friendly Web interfaces.

How do I know when I need authority control?

Authority control can be expensive and time-consuming work, although it can also save a great deal of time and money, particularly when you can reuse vocabularies already developed and maintained externally. Because of the cost, it should be reserved primarily for fields that affect retrieval or for which you want to generate tailored administrative reports and other outputs. A good example would be where you receive data from multiple agencies or individuals and you want to track the source of your data for tax, acknowledgement, or other purposes. Another example would be where you have only five possible digital rights or copyright statements and you don’t want to reenter this data each time you create a new record. Of course, the most obvious example is where you want to use an externally maintained vocabulary, such as Library of Congress Subject Headings, Medical Subject Headings, the Art and Architecture Thesaurus, or GeoNames, so that you can build on the work of others in your field of interest and present terms that are familiar to your end users. In some cases, these external sources also provide Web services that may enable your IT staff to automatically extract changes made to the vocabulary by the thesaurus maintenance agency, or to provide additional metadata, such as maps based on coordinate information that is stored in the parent thesaurus but not locally.

Architectural issues for authority file building & linking

The size and location of the authority file and the structure of your data store will largely determine what your options are for building, maintaining, and linking controlled vocabularies to your metadata creation project.

For small vocabularies, lists might be built into field definitions within a database, programmed as drop-down values on a data entry form, or listed next to the appropriate box on a Web input form. This method should only be used where you don’t need to store additional administrative information about the terms used, e.g., source or creator of data, source vocabulary, scope and description. The ideal would be just the values themselves, or a code plus the term you want displayed. Here are examples where this solution works best:

  • Date ranges (by decade, century, artistic or cultural period, etc.)
  • Format, data type, data source
  • Metadata creators or editors
  • Rights statements

Initials  Metadata creator
ab        Black, Adrian
sc        Cole, Stephen
tm        Matthews, Teresa
as        Smith, Albert

Code  Date range
3     1931-1940
4     1941-1950
5     1951-1960
6     1961-1970
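
As a rough sketch of the "code plus displayed term" approach, the two lists above could be held as simple lookup tables behind a data entry form. The function names display_value and code_for_year are hypothetical, chosen only for this example.

```python
# The two small controlled lists above, stored as code -> display value.
METADATA_CREATORS = {
    "ab": "Black, Adrian",
    "sc": "Cole, Stephen",
    "tm": "Matthews, Teresa",
    "as": "Smith, Albert",
}
DATE_RANGES = {
    3: "1931-1940",
    4: "1941-1950",
    5: "1951-1960",
    6: "1961-1970",
}


def display_value(code_list, code):
    """Return the display term for a code, rejecting codes not in the list."""
    try:
        return code_list[code]
    except KeyError:
        raise ValueError(f"{code!r} is not in the controlled list") from None


def code_for_year(year):
    """Find the date-range code that covers a year, or None if none does."""
    for code, label in DATE_RANGES.items():
        start, end = (int(part) for part in label.split("-"))
        if start <= year <= end:
            return code
    return None
```

Because nothing is stored except the code and the value, this only suits lists that need no administrative information about each term.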

When using larger vocabularies where the vocabulary is dynamic or needs to support cross-references from variant forms of name, you will need easy access to the vocabulary for maintenance purposes. Here, it is better to have the list separated from the main data store itself, either as a separate table within an underlying database structure, or as pointers from your metadata to an external source of terms using a URI, system control number, or other unique identifier. A relational database structure is ideal for this, in that changes to the separate vocabulary can effect global changes to linked records without having to individually edit them all. This is how most integrated library systems handle name, subject and series entries, where a change from one term to another in the authority file can cause thousands, or even millions, of headings to change in related bibliographic records with little or no human intervention. Here are examples of where this solution works best:

  • Personal, corporate, and geographic names
  • Subject terms
  • Standardized data/mime types
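
The global-change behavior of a separate authority table can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table and column names, the record titles, and the misspelled earlier form of the heading are all assumptions made up for this example; a real integrated library system will be structured differently.

```python
import sqlite3

# In-memory database with the vocabulary kept apart from the records.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authority (
        local_id INTEGER PRIMARY KEY,
        heading  TEXT NOT NULL UNIQUE
    );
    CREATE TABLE record (
        record_id  INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        heading_id INTEGER NOT NULL REFERENCES authority(local_id)
    );
""")

# Hypothetical data: one heading (entered with a misspelling) and three
# records that point to it by ID rather than repeating the text.
conn.execute("INSERT INTO authority VALUES (399, 'Big Savanna (N.C.)')")
conn.executemany(
    "INSERT INTO record (title, heading_id) VALUES (?, 399)",
    [("Photograph 1",), ("Photograph 2",), ("Field notes",)],
)


def headings_in_use(connection):
    """Return (title, heading) pairs as a user of the records would see them."""
    return connection.execute(
        "SELECT r.title, a.heading FROM record r "
        "JOIN authority a ON a.local_id = r.heading_id "
        "ORDER BY r.record_id"
    ).fetchall()


# One edit to the authority table changes the heading on every linked record.
conn.execute(
    "UPDATE authority SET heading = ? WHERE local_id = ?",
    ("Big Savannah (N.C.)", 399),
)
```

The records store only the identifier, so no record is edited individually when the heading changes.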

When using an external vocabulary, such as Library of Congress Subject Headings (LCSH), GeoNames, or the Getty Research Institute’s Union List of Artist Names (ULAN), you should record not only the form of heading that you are using from that vocabulary, but also whatever system number applies within the external file. This will enable you to build Web services so that the terms can be maintained automatically as the source vocabulary changes. If you are unable to find a term within the selected vocabulary that matches your needs, mark the term as “local” so that it can be checked against the source vocabulary periodically, in case the term is incorporated there later.

For larger vocabularies, here are some elements you will typically want to capture:

  • Authorized term or heading
  • Internal (local) ID number
  • Source vocabulary (this itself can be linked to its own authority file, where you might store serviceable information such as base URL)
  • Source ID number
  • Metadata creator
  • Date captured/last revised
  • Type of term
  • Unauthorized equivalent terms (cross-references)
  • Broader and/or narrower terms
  • Scope/description of resource named in heading (historic/descriptive notes)

So, an example of this type of controlled vocabulary could look like this:

Local ID  Subject                            Type        Source  Source ID     Creator  DateRev   Scope
57894     4-H clubs                          topical     lcsh    sh 85000002   cp       20020718  Use for the 4-H movement in general
08756     African American college athletes  topical     lcsh    sh2004010437  ed       20050620
23111     Albumen prints                     genre       lctgm   tgm000227     sh       20090330
00399     Big Savannah (N.C.)                geographic  local                 cp       20100823

Links to this table from controlled lists of “type”, “source”, and “creator” could also be made, to ensure consistency in form of these terms.

Source ID  Source Code  Source                                   Base URI
0093       oclc         WorldCat Identities                      http://www.worldcat.org/identities/
0009       lcnaf        Library of Congress Name Authority File  http://id.loc.gov/authorities/
0033       lcsh         Library of Congress Subject Headings     http://id.loc.gov/authorities/
0002       lctgm        Thesaurus for Graphic Materials          http://www.loc.gov/pictures/item/
0001       local        Local use
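
The term table and the source table above could be modeled along the following lines. This is a sketch, not a prescribed schema: the class name AuthorityTerm, its field names, and the local_terms helper are hypothetical, and the data is copied from the two example tables.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Source vocabulary list from the table above: code -> (name, base URI).
SOURCES = {
    "oclc": ("WorldCat Identities", "http://www.worldcat.org/identities/"),
    "lcnaf": ("Library of Congress Name Authority File", "http://id.loc.gov/authorities/"),
    "lcsh": ("Library of Congress Subject Headings", "http://id.loc.gov/authorities/"),
    "lctgm": ("Thesaurus for Graphic Materials", "http://www.loc.gov/pictures/item/"),
    "local": ("Local use", None),
}


@dataclass
class AuthorityTerm:
    local_id: str              # kept as text so leading zeros survive
    heading: str
    term_type: str
    source: str                # must be a code listed in SOURCES
    source_id: Optional[str]   # system number in the external file
    creator: str
    date_revised: str
    scope: str = ""
    see_from: List[str] = field(default_factory=list)  # cross-references

    def __post_init__(self):
        if self.source not in SOURCES:
            raise ValueError(f"unknown source code: {self.source!r}")
        if self.source != "local" and not self.source_id:
            raise ValueError("terms from an external source need a source ID")


TERMS = [
    AuthorityTerm("57894", "4-H clubs", "topical", "lcsh", "sh 85000002",
                  "cp", "20020718", "Use for the 4-H movement in general"),
    AuthorityTerm("08756", "African American college athletes", "topical",
                  "lcsh", "sh2004010437", "ed", "20050620"),
    AuthorityTerm("23111", "Albumen prints", "genre", "lctgm", "tgm000227",
                  "sh", "20090330"),
    AuthorityTerm("00399", "Big Savannah (N.C.)", "geographic", "local", None,
                  "cp", "20100823"),
]


def local_terms(terms):
    """Terms marked local, to be rechecked against the source vocabulary."""
    return [term for term in terms if term.source == "local"]
```

Validating the source code when a term is created is one way to keep that column consistent with the source list.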

Consultation in building these types of data structures is available from Metadata and Data Quality within Metadata and Cataloging, as well as from the Digital Libraries Initiatives and Information Technology Departments.

Sources for name headings

Name headings include personal, corporate, series, title, and meeting headings. Personal names include the names of individuals and families. Corporate names may be names of government agencies, universities or departments within universities, business names, churches, associations, musical groups, and any other aggregates with a corporate identity. In Library of Congress metadata practices, named boats and buildings are considered corporate names. Meetings include academic conferences, as well as specific sporting events, World’s Fairs, and similar activities.

Sources for name headings             Abbreviation  URI
Library of Congress Name Authorities  lcnaf         http://authorities.loc.gov/, http://id.loc.gov/authorities/
Union List of Artist Names            ulan          http://www.getty.edu/research/conducting_research/vocabularies/ulan/
WorldCat Identities                                 http://www.worldcat.org/identities/

Sources for geographic headings

Geographic headings include names of planets, continents, countries, states, counties, cities, islands, bodies of water, and other topographic features. Depending on how you intend to use these headings, they can be used directly, as in “Raleigh, N.C.”, or hierarchically, as in “United States—North Carolina—Wake County—Raleigh.” How you structure these headings will depend on how you want to be able to search and/or retrieve them. If you wish to take the hierarchical approach, it can be useful to build that into your authority file so that headings could be searched in either way, but metadata creators only have to enter them directly. This will speed up data entry, without losing the power to narrow the search starting at the highest level and proceeding to the local level.
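
One way to build the hierarchy into the authority file, sketched here under assumptions, is to store a parent link with each place so that the hierarchical form is derived rather than typed. The entries, the "--" separator, and the function names below are illustrative only.

```python
# Hypothetical geographic authority entries:
# direct heading -> (name used in the hierarchy, parent heading or None).
PLACES = {
    "United States": ("United States", None),
    "North Carolina": ("North Carolina", "United States"),
    "Wake County (N.C.)": ("Wake County", "North Carolina"),
    "Raleigh, N.C.": ("Raleigh", "Wake County (N.C.)"),
}


def hierarchical(heading, separator="--"):
    """Expand a directly entered heading into its hierarchical form."""
    parts = []
    current = heading
    while current is not None:
        name, parent = PLACES[current]
        parts.append(name)
        current = parent
    return separator.join(reversed(parts))


def is_within(heading, ancestor):
    """True if the heading sits anywhere below the ancestor."""
    current = PLACES[heading][1]
    while current is not None:
        if current == ancestor:
            return True
        current = PLACES[current][1]
    return False


def places_within(ancestor):
    """All direct headings located under a broader place."""
    return sorted(h for h in PLACES if is_within(h, ancestor))
```

The metadata creator enters only "Raleigh, N.C.", while a search can still start at "United States" or "North Carolina" and narrow down.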

Sources for geographic headings           Abbreviation  URI
GeoNames                                  geonames      http://www.geonames.org/
Library of Congress Name Authorities      lcnaf         http://authorities.loc.gov/, http://id.loc.gov/authorities/
USGS Geographic Names Information System  gnis          http://geonames.usgs.gov/
Getty Thesaurus of Geographic Names       tgn           http://www.getty.edu/research/conducting_research/vocabularies/tgn/

Sources for subject terms

Subject terms or headings can be at various levels of specificity, depending on the size of your data store and on your retrieval system’s search capabilities. The larger the data store, the more granular the subject system should be, to limit the number of postings on any given term. Obviously, if your entire data store is on the topic of “North Carolina” you don’t need to supply that as a subject heading on each record if the file will only reside within a local retrieval system. However, if the entire collection is going to be added to a more general data store, such as the Library’s catalog or OCLC WorldCat, then you will need that subject to distinguish this collection from others.

Bear in mind that many subject lists that are available to the public have complex rules for formulation of headings. Some allow subdivisions that can be either precoordinated (the entire string is specified in the vocabulary) or post-coordinated (you can synthesize the heading by combining a topical heading with a geographic, genre, form or date subdivision). Allowing for and controlling this sort of term combination is one of the more difficult architectural issues for most retrieval systems.
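
To make the post-coordinated case concrete, the sketch below synthesizes a heading from separately controlled parts and rejects any part that is not in its list. The lists, the "--" separator, and the build_heading function are assumptions for illustration; real vocabularies have far more detailed rules about which subdivisions may follow which headings.

```python
# Hypothetical controlled lists, one per facet of the heading.
TOPICS = {"4-H clubs", "African American college athletes"}
GEOGRAPHIC_SUBDIVISIONS = {"North Carolina"}
FORM_SUBDIVISIONS = {"Periodicals", "Photographs"}


def build_heading(topic, geographic=None, form=None):
    """Combine a topical heading with optional subdivisions."""
    if topic not in TOPICS:
        raise ValueError(f"unauthorized topical heading: {topic!r}")
    parts = [topic]
    if geographic is not None:
        if geographic not in GEOGRAPHIC_SUBDIVISIONS:
            raise ValueError(f"unauthorized geographic subdivision: {geographic!r}")
        parts.append(geographic)
    if form is not None:
        if form not in FORM_SUBDIVISIONS:
            raise ValueError(f"unauthorized form subdivision: {form!r}")
        parts.append(form)
    return "--".join(parts)
```

A precoordinated vocabulary would instead store each complete string as a single entry and match against it as a whole.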

In assigning subject terms, you first need to establish a framework within which to do this. The cataloging community has created complex guidelines for subject analysis of books and other print materials, and has attempted to extend this to new media over the last few years, with varying degrees of success. Subject analysis for books is based on the “whole book” concept, which says that subject headings should be at the level of specificity of the material in hand. If the book is on “winter sports” and there is a subject heading at that level, then there is no need to also use the subject headings “Hockey”, “Skis and skiing”, “Skating”, “Bobsledding”, “Tobogganing”, and “Snowmobiling”. These are narrower terms on the authority record for “Winter sports”. Additional subject headings may be assigned for portions of the book, but a portion must make up a required percentage of the book before its subjects can be assigned. Of course, local needs override all of these rules, and this may be the case for your project as well. Metadata and Cataloging can help you work through these issues and achieve a balance between overassignment and underassignment of subject terms.
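
If narrower terms are recorded in the authority file, the winter sports example above can be checked mechanically. The following is a minimal sketch that looks only one level down; the NARROWER mapping and the prune_narrower function are hypothetical, and local policy may well call for keeping some of the narrower headings.

```python
# Narrower terms from the "Winter sports" example above.
NARROWER = {
    "Winter sports": {
        "Hockey",
        "Skis and skiing",
        "Skating",
        "Bobsledding",
        "Tobogganing",
        "Snowmobiling",
    },
}


def prune_narrower(assigned):
    """Drop headings already covered by a broader heading in the same list."""
    covered = set()
    for heading in assigned:
        covered |= NARROWER.get(heading, set())
    return [heading for heading in assigned if heading not in covered]
```

When only narrower terms are assigned, the list is returned unchanged, since nothing broader covers them.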

Sources for subject terms             Abbrev  Discipline                            URI
Library of Congress Subject Headings  lcsh    general                               http://authorities.loc.gov/, http://id.loc.gov/authorities/
Art & Architecture Thesaurus          aat     art, design, architecture             http://www.getty.edu/research/conducting_research/vocabularies/aat/
CAB Thesaurus                         cab     agriculture, life sciences            http://www.cabi.org/cabthesaurus/
INSPEC Thesaurus                      inspec  computer science, telecommunications  http://www.theiet.org/publishing/inspec/products/range/thesaurus.cfm, http://www.theiet.org/publishing/inspec/products/range/thesxml.cfm
Medical Subject Headings              mesh    health, life sciences, medicine       http://www.nlm.nih.gov/mesh/
Thesaurus for Graphic Materials I     tgm1    graphics                              http://lcweb.loc.gov/rr/print/tgm1/

Sources for additional controlled subject vocabularies  URI
Online Thesauri & Authority Files                       http://www.asindexing.org/site/thesonet.shtml
Taxonomy Warehouse                                      http://www.taxonomywarehouse.com/
Taxonomy ShareSpace                                     http://www.taxobank.org/
Taxonomies & Controlled Vocabularies SIG, ALA           http://www.taxonomies-sig.org/links.htm

Sources for media types, genre/format, and other resource attributes

Depending on the size of your project and the nature of its contents, you may or may not want to set up lists to control the attributes of your data. If your entire project is describing streaming media, you probably don’t need to identify the format of individual resources. However, if you are attempting to control more than one file type, media type, genre and so on, there are vocabularies available for that.

Sources for media types, genre/format, and other resource attributes       Abbreviation  URI
LC Genre and Form Thesaurus                                                lcgft         http://authorities.loc.gov/, http://id.loc.gov/authorities/
MARC Genre Term List                                                       marcgt        http://www.loc.gov/standards/valuelist/marcgt.html
Mime media types                                                           mime          http://www.iana.org/assignments/media-types/, http://www.htmlquick.com/reference/mime-types.html
Moving Image Genre List                                                    miggen        http://www.loc.gov/rr/mopic/miggen.html
RDA carrier types                                                          rdacarrier    http://www.loc.gov/standards/valuelist/rdacarrier.html
RDA content types                                                          rdacontent    http://www.loc.gov/standards/valuelist/rdacontent.html
RDA media types                                                            rdamedia      http://www.loc.gov/standards/valuelist/rdamedia.html
Source codes for vocabularies, rules & schemes                                           http://www.loc.gov/standards/sourcelist/
Thesaurus for Graphic Materials II: Genre & Physical Characteristic Terms  tgm2          http://www.loc.gov/rr/print/tgm2/

Added by jcchapm2 , last edited by jcchapm2 on Mar 17, 2011 15:30