This article was written in mid-2007 as a proposal for a now-defunct reference project. I tried to provide an overview of digital newspaper sources from a national and international perspective. Because my background is in American literature, the international perspective is not as strong. I provide links to digital projects and titles of monumental periodical research guides.
Newspapers record immediate responses to events of contemporary significance and provide detailed information about historical figures, literary works, and social trends. But the cost of making all of this information accessible in the present (preserving individual issues, binding larger sets, indexing, microfilming, digitizing, purchasing access rights to copyrighted material) has meant that much newspaper content remains inaccessible. Digital newspaper projects face many constraining decisions: On what basis should content be included or excluded? Which newspapers should be made available? Should content be made available in subscription databases or on public access platforms? Should content be accessed as transcriptions or as page images? Digital newspaper resources, whether made freely available by research institutions or published by for-profit vendors, always require those who prepare them to conform to legal restrictions and to judge cultural significance in order to justify investments in digitization.
The long-term trend of gathering current newspaper content into databases helpfully illuminates the types of decisions that face designers of such systems. The LexisNexis Academic database, for example, gathers content from prominent regional newspapers in the United States and combines it with content from magazines and journals. The definition of the content that is excluded from LexisNexis is instructive. Inserts, classified advertisements, and unique matter in alternate daily editions are not considered content at all and are omitted. Editorial columns and feature stories may be excluded due to copyright restrictions. Another highly ambitious subscription database, Newsbank’s America’s Newspapers, provides access to three years of content from over 2000 prominent local and regional newspapers, but even its “tens of millions” of items are a small slice of daily newspaper production in the United States. And even for those items that are publicly available, access will be influenced by the news search engines from Google and, to a lesser extent, Yahoo.
Library vendors offer many subscription-based services for historical content of U.S. newspapers. Subscription-based tools include Readex’s America’s Historical Newspapers, Thomson-Gale’s 19th Century U.S. Newspapers, and ProQuest Historical Newspapers. Content in historical databases is not omitted on a category basis, but low-quality reproductions from microfilm and low accuracy rates for character recognition limit content retrieval. Subscription databases for historical content offer sophisticated tools to sort results and identify matches. A commercial vendor, NewspaperArchive.com, has assembled large collections of unique content and sells subscriptions to individuals, though its search tools are less sophisticated. Another commercial vendor, Paper of Record, was recently acquired by Google (2009), and Google may well become a dominant player in the field of historical newspapers as well.
The preparation of freely accessible national collections and the spread of commercial search sites for newspaper archives deserve special attention. National collections will provide great benefits to scholars. The United States National Digital Newspaper Program (NDNP) and the British Library Online Newspaper Archive are currently available, as are Tiden – A Nordic Digital Newspaper Library and Australian Periodical Publications 1840-1845.
The contemporary emphasis on digital access builds on previous preservation efforts. Digital projects share with their antecedents an emphasis on newspapers with wide distribution, which are more carefully cataloged and preserved. Low-circulation newspapers, which may be saved in haphazard sets, are less likely to be cataloged or duplicated for preservation purposes. Printed and microfilm versions of national union catalogs and prominent library catalogs remain important resources, and their digital descendants now serve as finding aids online, among them the catalogs of the Library of Congress, the British Library, and the International Coalition on Newspapers. Nonetheless, printed indexes remain indispensable. Specialized print bibliographies also remain essential for the study of Latin America, the Caribbean, and Africa.
User expectations for convenient access will mean that items not digitized are less likely to be studied. Thus, project definitions are likely to have an important influence on scholarship. For example, the scope of the NDNP is limited to “significant newspapers” published between 1836 and 1922. The opening year of coverage marks a convenient technological barrier: typefaces used before 1836 increase the cost of automatic optical character recognition. Such a technological barrier can be overcome, as the Tiden project demonstrates. The closing year, 1922, is not a technical boundary but a legal one, established by the Sonny Bono Copyright Term Extension Act. Works published in 1923 or later may still have copyright protection.
Given current digital newspaper resources, the greatest challenges face scholars interested in marginal or regional voices, in small linguistic enclaves, in aggregations of large data sets, and in paper documents as material objects. But all scholars need to be aware that digital tools may obscure facts about original documents. Portions may be omitted due to legal restrictions or editorial decisions about importance, accuracy rates are defined against ideal source materials, and twentieth-century generic categories may not easily apply to newspapers of the nineteenth century or earlier. The transcription that is the basis of the search may not be accessible to users, so the only evidence of an incorrect transcription may be the absence of search results. A pressing task is to improve catalogs for access to current digital resources. Also pressing is a study that has yet to begin, that of newspaper databases themselves as tools of representation.
Please note that many of these sources are available only to members of subscribing institutions.
British Newspapers 1800-1900
Chronicling America: Historic American Newspapers
19th Century U.S. Newspapers (Thomson-Gale)
Australian Periodical Publications 1840-1845
America’s Historical Newspapers (Readex)
Historical Newspapers (ProQuest)
Tiden — A Nordic Digital Newspaper Library
Nineteenth-Century Serials Edition (NCSE)
ICON: International Coalition on Newspapers
Google News Archive
LexisNexis Academic
Yahoo! Search, News.
British Library Catalogue of the Newspaper Library (1975)
British Union-Catalogue of Periodicals (1955-58, 1962)
Catalogue Collectif des Périodiques du Début du XVIIe Siècle à 1939 (1967-81)
Gesamtverzeichnis Ausländischer Zeitschriften und Serien (1959-68)
Catalogue Général des Périodiques du début du XVIIe Siècle à 1959 (1988)
Serials in Australian Libraries (1963)
Union List of Canadian Newspapers (1987)