Fill His Head First with a Thousand Questions

July 28, 2008

Bhutan and I

Filed under: Uncategorized — wraabe @ 2:03 pm

The Walter Havighurst Special Collection in King Library at Miami University (Ohio) holds a copy of Bhutan: A Visual Odyssey Across the Himalayan Kingdom. It was lauded in the New York Times as the “largest commercial book ever published.” The Special Collections staff is kind enough to allow visitors to have their picture taken with the book. I’m to the right, with my legs deliberately obscuring the “Please Don’t Touch” sign.

Wesley Raabe with Bhutan


A librarian assures me that this book is a “big deal” in Oxford, Ohio. For a book lover within 50 miles or so of campus, I think it’s worth the trip. Be sure to drop by the library during Special Collections hours (check the Miami web site). The Library keeps a weekly page-turn schedule.

And thanks to Amy Earhart, I can now show you where I am on a Google map. [Note: This link will expire because it is a training course.]

July 26, 2008

Collation in Scholarly Editing: An Introduction

Filed under: Uncategorized — wraabe @ 7:03 pm

The term scholarly editing can serve as a catch-all for a reprint of a culturally significant work with an introduction by a scholar. In addition to the reprinted text, such editions usually include an introduction that surveys scholarly opinion and a brief bibliography for further study. Ambitious editors may also provide explanatory annotation, glossaries of unfamiliar terms, historical notes, and an author time-line. I have no quarrel with such work (and would observe that its value is often underestimated), but I use scholarly editing to refer to work in which the editor illuminates the variable forms in which the text has appeared and provides commentary that explains the significance of textual alterations.

When scholarly editing means merely an edition prepared by a scholar, the editor’s concern for the text (if the editor realizes at all that texts are often printed in multiple forms) is often limited to choosing the right text to reproduce according to some standard (best manuscript, first printed, thoroughly revised by the author, best typography, most influential, etc.). The type of scholarly editing that is my subject, which has also been called critical editing, has as one of its primary emphases the arrangement and ordering of multiple texts: to establish the relationships among them, to analyze and classify the variants, and (in many cases) to offer an authoritative text for study by other scholars. A characteristic activity of scholarly editors (from here forward I use the term in the limited sense) is to compare and classify the forms in which the versions of the text have appeared: we collate.

Although in the study of manuscript culture one of the characteristic activities is to align parallel parts of a work (and this is a common definition of collation), I speak of collation in the narrower sense of identifying differences between texts (i.e., after passages are already “aligned”). There are three methods to collate texts: 1) read two texts side by side and note the differences; 2) compare printed page images (by allowing your eyes to merge two page images, often with a device designed for that purpose); 3) transcribe the texts and compare the transcriptions with the aid of a computer. The first method is probably as old as humane study in the Western tradition, though whether what the Alexandrian librarians did counts as “textual editing” is a debate that cannot be pursued here. Tanselle says yes, basically, but Greetham is hesitant to trace the tradition from Alexandria to today as a single discipline.

The second method is a comparatively recent development (if you’re not too young to think of the 20th century as “recent”). The process of comparing page images predates the arrival of computers in the humanities, but it is suitable only for comparing two documents that were printed from the same setting of type. If you doubt that two documents printed from the same setting of type can differ, you are wrong. All type, like flesh, is heir to ills, both accidental and deliberate. Pieces of type slip in their forme, corrections are made during printing runs (stop-press corrections), errors are noticed and corrected for a second printing, and authors choose to alter wording after a work has gone to print (sometimes in response to readers’ reactions). Hand-set type, metal stereotype plates, and linotype slugs can be altered, and they are altered.

One should read two texts side by side and take notes (method 1) only when there is no other choice. Scholars who do such work create monuments of diligence. I have reviewed collations prepared by scholars who did not have access to computers (E. Bruce Kirkham’s collation notes comparing the Jewett edition and the newspaper version of Uncle Tom’s Cabin are at Ball State). The dedication required for such a task boggles the mind: let us not underestimate how difficult (physically and mentally taxing) textual comparison was for a previous generation of scholars. A similarly impressive study is a Kansas doctoral dissertation that compared over 20 British editions of Uncle Tom’s Cabin. The scholar, Harry Opperman, worked from microfilm. As far as I can tell, he never published anything after his 1972 dissertation, but he created a monument to diligence. For scholars who choose not to use electronic collation tools or page-image comparison tools, side-by-side comparison is as difficult today as it was in the mid-20th century. One should not be uncharitable toward work done with tremendous dedication and the finest tools available (humility ought to stop us), but the method of side-by-side reading and note-taking can be retired. It should (must) be retired because its accuracy has been surpassed by computer-based comparison of digital transcriptions. Methods 2 (compare page images) and 3 (compare digital transcriptions) remain important for all types of textual work that involve printed matter.
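At its simplest, method 3 amounts to splitting each transcription into tokens, aligning the two token sequences, and reporting where they diverge. The sketch below assumes plain-text transcriptions and uses Python's standard difflib module as a stand-in aligner; the function names tokenize and collate are hypothetical, and the sketch does not describe how PC-CASE, Juxta, or any other tool discussed here works internally.

```python
import difflib
import re


def tokenize(text):
    """Split a transcription into words and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def collate(base, witness):
    """Return (base_reading, witness_reading) pairs wherever the texts diverge."""
    a, b = tokenize(base), tokenize(witness)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    variants = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # "replace", "delete", or "insert"
            variants.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


if __name__ == "__main__":
    base = "Pieces of type slip in their forme; corrections are made."
    witness = "Pieces of type slipt in the forme, corrections are made."
    print(collate(base, witness))
    # [('slip', 'slipt'), ('their', 'the'), (';', ',')]
```

Treating punctuation as separate tokens is a design choice: it lets a semicolon-to-comma change surface as its own variant rather than as a change to the neighbouring word.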

The second method, merging side-by-side page images, requires two copies from the same setting of type. It can be undertaken with or without the aid of an analog device. One focuses each eye on one of two page images (or partial page images, to be more exact). If the two images differ (because the setting of type differs, type has been damaged, or the paper has a water stain), the viewer of two otherwise identical images experiences a three-dimensional flickering effect at the site of difference, as the mind (one brain, two eyes) attempts to resolve the conflicting portions of the images. One does not simply “experience” this as a matter of course; it is a skill acquired with practice. The amount of practice depends on the device and the user.
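The flicker a viewer sees at a point of difference has a rough digital analogue: compare the two page images pixel by pixel and look where they disagree. This minimal sketch assumes the two scans have already been registered (aligned pixel for pixel) and reduced to black-and-white pixels, represented here as hypothetical 2-D lists of 0s and 1s. In practice, registering two scans is the hard part, and the function name diff_pages is illustrative only.

```python
def diff_pages(page_a, page_b):
    """Compare two registered, binarized page images of identical size.

    Returns the list of (row, col) positions where the images disagree and the
    bounding box (top, left, bottom, right) enclosing them, or None if identical.
    """
    if len(page_a) != len(page_b) or any(
        len(ra) != len(rb) for ra, rb in zip(page_a, page_b)
    ):
        raise ValueError("page images must have identical dimensions")
    diffs = [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(page_a, page_b))
        for c, (pa, pb) in enumerate(zip(row_a, row_b))
        if pa != pb
    ]
    if not diffs:
        return diffs, None
    rows = [r for r, _ in diffs]
    cols = [c for _, c in diffs]
    return diffs, (min(rows), min(cols), max(rows), max(cols))


if __name__ == "__main__":
    # Hypothetical 3x4 crops: one "inked" pixel has shifted down a row,
    # as a loose piece of type might between impressions.
    copy_a = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    copy_b = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    print(diff_pages(copy_a, copy_b))
    # ([(1, 2), (2, 2)], (1, 2, 2, 2))
```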

Many devices are available, and I can only briefly summarize here the ones with which I’m familiar. First, there are the analog devices: Hailey’s Comet, the Lindstrand Comparator, the Hinman Collator, the McLeod Collator, and transparencies. I have used Hailey’s Comet with some difficulty. The initial setup (an hour or two of work with the mirrors and stands) can be accomplished with the aid of Carter Hailey’s instructions. The Comet is portable (its box weighs about 25 pounds) and could be carried with luggage. The Lindstrand Comparator is the easiest to use. It is a square wooden box with an opera glass-style eye-piece, each side of which is trained on the same page in one of two different books. Most users can achieve some success within 5 minutes, and within an hour almost anyone can become accustomed to a Lindstrand. The Lindstrand is not portable in the Comet sense (though I suppose the box would fit in a Volkswagen Beetle). The Hinman is not merely non-portable; it is a hulking monstrosity (itself about the size of the old Beetle, propped upright). The two page images are superimposed by a mechanical switch that alternates light and dark on each image, and they are viewed through binocular eye-pieces. While the Hinman may work for others, my poor vision without corrective lenses made the binocular eye-pieces unusable. I have not used the McLeod Collator, but I have heard (secondhand) that the Comet is derived from its design. Comparison of images (analog or digital) using transparencies is another possibility, but I have not yet found the method practical; the California Twain editors claim success with a version of it. The mechanical collators (the Hinman and Lindstrand particularly) can be hindered by large books: if a book is larger than Shakespeare’s First Folio, it won’t fit in the device. I was unable to benefit from such devices when I started working with newspapers.

Although these devices provide useful mechanical assistance for merging page images, all you actually need are two eyes, the mind, and a bit of training in how to collate without a device. You probably already know how to collate with two eyes: if as a child you indulged in the optical illusion of the floating finger, you have collated your fingers. The full method of device-free collating is described as the “Human Hinman” in Joseph A. Dane’s The Myth of Print Culture, pp. 94-95. This method’s chief advantages are cheapness and portability. You need only two impressions from the same setting of type (or one actual impression and a photocopy of another impression). In response to an inquiry, Randall McLeod offered detailed instructions on this method of collation, which have been posted here.

The third method (computer-based comparison of transcriptions) has seen a growth of new tools, but side-by-side page-image comparison cannot be discarded. Bethany Nowviskie, in a description of a collation tool called Juxta, has spoken of the challenges posed by the “flashing lights” of the Hinman. But Juxta is not a substitute for side-by-side page-image comparison. Just as a medievalist concerned with manuscript exemplars would have no use for a technique for comparing type, a technique useful for comparing different transcriptions cannot supersede, but can only supplement, a technique for comparing typesettings. Barring the availability of a nearly perfect transcription, correcting transcription errors in two different transcriptions from the same setting of type would be far more time-consuming (and error-prone) than comparing page images from two copies of the same setting of type.
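A back-of-the-envelope estimate shows why. Suppose, hypothetically, that each of two transcriptions of the same N-word setting has an independent error rate p per word. Every word position where at least one transcription errs shows up as a spurious variant that must be checked against the page, so the expected count is roughly:

```latex
E[\text{spurious variants}] = N\bigl(1 - (1-p)^{2}\bigr) \approx 2pN
% e.g. N = 100{,}000 words and p = 0.001 (99.9\% accuracy per transcription):
% 100{,}000 \times (1 - 0.999^{2}) = 100{,}000 \times 0.001999 \approx 200
```

The numbers are illustrative, not measurements. The point is the ratio: genuine differences between two copies of one setting (stop-press corrections, damaged or shifted type) may number only a few, and they can easily be buried among the roughly 200 false leads introduced by transcription itself.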

Below is a list of electronic collation tools known to me, and I offer brief observations on purpose, cost, appropriate computing environment, function, and ease of use for those with which I’ve worked to any extent. If I have not used the tool, I offer brief observations based on the documentation about the tool’s intended purpose.

  • Microsoft Word–A widely available application whose Compare Documents feature could serve as a way to compare two transcriptions. But its native format is not suitable for scholarly editing, and its automatic correction features can introduce unexpected variations. Careful and sophisticated users can learn to control these variations, but an update to a new version of Word or a conversion to another program would require careful checking. The Compare Documents feature allows only rudimentary discrimination between significant and insignificant features, but it could be useful for comparing an original and an altered version of the same file. The major inadequacy of Word is its proprietary encoding, which has the potential to introduce unanticipated alterations, especially when versions of the software are updated. In my experience, the Compare Documents feature is too idiosyncratic to inspire trust, especially as its terminology and interface can lead to confusion about which document is the original and which the comparison text.
  • UNIX DIFF–The DIFF utility for UNIX-compatible machines (including Macintosh OS X) is an excellent way to identify file variations. Don’t be afraid of the command line: you can easily compare two versions of a file. That useful function is not equivalent to identifying textual variants (because the tools that control the display of variants are rudimentary), but DIFF, like Word’s Compare Documents feature, can serve to confirm file alterations.
  • PC-CASE–This is the tool that I used for my dissertation. With minimal encoding (though you must observe the encoding guidelines), you can compare multiple electronic transcriptions. It is a fast application, but it works only from a command-line interface (MS-DOS). One can encode manuscripts (additions, deletions, replacements), prepare lists of manuscript alterations, conflate two versions to list alterations, sort alterations according to whatever criteria suit, and generate sorted lists of variants. Peter Shillingsburg has said the software can be made freely available. If you’d like a copy, send me an email (see the Kent State email search screen), or request a copy in a comment to this post. My copy now includes a PDF manual. Even if you are not put off by the command line, the requirement for 8.3 file names, the rigid requirements for consistency when encoding textual features, and the not infrequent head-scratching as you try to decipher what encoding error is causing the collation to fail, expect to need a few weeks of practice to develop confidence. Your text files must not be saved as Unicode (which may be the default in your plain text editor); you are limited to the older ANSI standard. PC-CASE is a 16-bit application: while it can run in a 32-bit OS (Windows 2000 and XP), it will not run in the 64-bit versions of XP. I have not tested it on Vista (and would appreciate it if someone with access could run a test for me).
  • MacCASE–A version of PC-CASE for Macintosh computers. I understand that it works only in pre-UNIX versions of the Mac OS (or in an emulation mode), but I need to ask Paul Eggert whether that is true. If you’re interested in MacCASE, download it from the Australian Defence Force Academy site.
  • Collate–Susan Hockey, in Electronic Texts in the Humanities, described Peter Robinson’s COLLATE 2.0 (for Macintosh) as one of the most advanced collation systems available. I am not familiar with it, but my ignorance should dissipate, as I will soon join a project that uses it. A version that supports XML is under development; Peter Robinson has an Anastasia and Collate blog on which development of CollateX is chronicled. This software must be purchased. It has been over two years since I attempted to learn how to use COLLATE, but a new user should expect to spend a weekend (or more) getting started and to devote weeks to achieving reasonable competence.
  • DV-COLL–The Donne Variorum textual collation program, designed for scholarly editions of poetry. I have not used it, but like PC-CASE it requires an application-dependent style of encoding. The encoding is embedded in the line, but it is simple to read (and, like the encoding for CASE and COLLATE, presumably easy to adjust or convert for other systems with scripting languages like Perl or Ruby).
  • Juxta–A tool from NINES for comparing texts with minimal encoding. It is a great tool for quick comparisons of texts that have been scanned and made available as OCR plain text, and its documentation is adequate, though its warning messages are sometimes cryptic. You can read a longer discussion of my experience with Juxta in a previous post. Juxta is well suited to preliminary analysis of texts prepared by library archives or by scanning and OCR projects. If you’ve ever wondered about alterations between a serial and a book version of a 19th-century story (and both have been scanned and converted by OCR, whether the OCR output is released in raw form or encoded in XML), Juxta is a good choice for taking a first look. The developers of Juxta have suggested that scholarly projects produce raw text versions for Juxta comparison. Scholars should consider this option carefully, as the interface offers an intuitive method for displaying textual alterations.
  • Versioning Machine–A tool for comparing versions of documents that have been encoded in TEI. It is relatively easy to use with basic skills in XML (i.e., if you can create a valid TEI text, you can compare two versions of a poem with relative ease). The V-Machine site provides extensive web-based documentation. As the documentation page notes, the Versioning Machine’s stylesheet is written in XSLT 1.0, so you must use Saxon 6.0.
  • Edition Production & Presentation Technology–An ambitious, freely available project for scholarly editing that includes collation facilities among its tools. I have set up an account but am not familiar enough with it to speak knowledgeably about its ability to produce collations. Prepare to spend a day orienting yourself to the application; I anticipate that achieving competence will take weeks.
  • TUSTEP–Wilhelm Ott’s application is an ambitious and (as I understand from Susan Hockey’s book) complete tool for scholarly editing, whether print or digital. Its beauty is a system by which changes in the source files are automatically propagated through collations and later stages of analysis and production. Using TUSTEP is an expensive undertaking (in time and money) for individual scholars working independently of institutions with site licenses. Windows and Linux versions are available. For anyone with funding for a license and training, basic fluency in German, and long-term funding for a project, spending a month (or a few months) exploring this route should be seriously considered. For scholarly editors working on large-scale editions in Germany, it seems to be the first choice.
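Several of the tools above are judged partly by how well they separate significant from insignificant differences. One way to see what that involves is a minimal sketch, assuming two line-aligned plain-text transcriptions: normalize both texts according to an explicit editorial policy, then report only the lines that still differ. The policy dictionary and the function names normalize and significant_lines are hypothetical and do not reproduce the behavior of Word, DIFF, PC-CASE, or Juxta.

```python
import unicodedata
from itertools import zip_longest

# Hypothetical editorial policy: which kinds of difference count as insignificant.
POLICY = {
    "fold_quotes": True,     # treat curly and straight quotation marks as the same
    "collapse_space": True,  # treat any run of whitespace as one space
    "ignore_case": False,    # capitalization is usually significant, so keep it
}

# Map curly quotation marks to their straight equivalents.
QUOTE_FOLD = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def normalize(text, policy=POLICY):
    """Apply the policy to one line of transcription."""
    text = unicodedata.normalize("NFC", text)  # unify composed/decomposed accents
    if policy.get("fold_quotes"):
        text = text.translate(QUOTE_FOLD)
    if policy.get("collapse_space"):
        text = " ".join(text.split())
    if policy.get("ignore_case"):
        text = text.casefold()
    return text


def significant_lines(a_lines, b_lines, policy=POLICY):
    """Return (line_number, a, b) for line pairs that still differ after normalization.

    Assumes line-aligned transcriptions; a missing line is treated as empty.
    """
    return [
        (n, a, b)
        for n, (a, b) in enumerate(zip_longest(a_lines, b_lines, fillvalue=""), start=1)
        if normalize(a, policy) != normalize(b, policy)
    ]


if __name__ == "__main__":
    serial = ["\u201cDon\u2019t,\u201d  she said.", "The boy ran."]
    book = ['"Don\'t," she said.', "The boy run."]
    print(significant_lines(serial, book))
    # [(2, 'The boy ran.', 'The boy run.')]
```

Making the policy explicit is the point: folding curly and straight quotes, for example, is convenient for OCR text but would erase exactly the apostrophe and closing-quote distinctions that some editors care about.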

While the description of sight-based collation methods and tools should be fairly reliable, the list of electronic tools (in its current draft form) is not intended to be comprehensive, and I welcome additions or corrections. This quick overview was developed in response to a request. I use “quick” in the negative sense: this post was prepared quickly. It is not “quick” in the sense of short and sweet, for which I apologize. Had I more time, the post would have been shorter (and better documented). I hope to expand it over time. May this little post be useful to some in its current state.

UPDATE: 5 April 2009, revised in response to Bethany Nowviskie’s comment.

July 24, 2008

XML and Adding Information

Filed under: Uncategorized — wraabe @ 8:41 pm

In the WWP seminar (see previous post), the presentation includes a slide on information gain [IE does not support this page, so use Firefox], to which I’d like to add a small caveat.

I’d like to note here that the metaphor of “information” (in the line drawing) is applied also to the source. While source documents include features that can be modeled as information, the supposed equivalence of source and information, though a useful abstraction, participates in a rhetoric in which information is more valuable than the material objects that are its source. The process of information gain, we must remind ourselves, includes information loss. Some aspects of the original material object are abstracted, and some aspects of its information content are lost: aspects that editors and directors of conscientious projects will seek to specify.

My concern is not with the areas of information loss of which we’re conscious but with those of which we are semi-conscious at best. Through the process of transcribing multiple versions of Uncle Tom’s Cabin, I became consciously aware that the transmission of type space in reprint documents could have meaning. I have no quarrel with another project’s decision not to represent differences in type space in prose documents. For me, the sense of its importance was a comparatively recent discovery, and I doubt that, even were I concerned, I would choose to encode every aspect of the spatial features of prose. But I would submit that the informational content on this matter (implicit in the original) would be discarded in most transcriptions. This is an old saw, of course, so all together now: every act of representation includes loss.
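The loss can be made concrete. In this minimal sketch, a hypothetical line of reprint text carries an en space (U+2002) after a sentence and a thin space (U+2009) before a semicolon. A common transcription step that collapses whitespace leaves only ordinary spaces, because Python's str.split() treats both characters as whitespace. The helper space_inventory is illustrative only.

```python
import unicodedata

# Hypothetical line of reprint text with its typographic spacing preserved.
source = "He wept.\u2002Then he spoke\u2009; but softly."


def space_inventory(text):
    """Count each kind of whitespace character by its Unicode name."""
    counts = {}
    for ch in text:
        if ch.isspace():
            name = unicodedata.name(ch)
            counts[name] = counts.get(name, 0) + 1
    return counts


# A typical normalization step: every run of whitespace becomes one plain space.
transcribed = " ".join(source.split())

if __name__ == "__main__":
    print(space_inventory(source))       # {'SPACE': 5, 'EN SPACE': 1, 'THIN SPACE': 1}
    print(space_inventory(transcribed))  # {'SPACE': 7}
```

Nothing in the transcribed string records that two of its spaces were once different sorts; that information survives only in the original.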

Nothing demands that information in the original that is not transcribed and not encoded remain irretrievably lost, so long as the original and other materials remain accessible (although past history, i.e., microfilm, has shown that originals are discarded when acts of representation are presumed to preserve information content). But let us resist the shorthand version, in which the process of conversion is always a gain. The informational content of a transcription is of a different order than the informational content of an original document. I hope this claim is unnecessary, but my instinct is that the shorthand is a trap one must continually resist falling into.

Kudos to Syd Bauman and Julia Flanders’ TEI Course

Filed under: Uncategorized — wraabe @ 3:44 am

The Women Writers’ Project course on TEI encoding is well worth the attention of any scholar who is already involved in digital humanities or who is considering taking a leap into a project that includes encoding a text. While it’s not for everyone–I know some serious coding junkies–Syd and Julia succeed at casting a wide net.

If you have some background (you’ve gone beyond handholding baby steps and wrangled a few texts on your own, you’ve drafted portions of an XSLT style sheet, and you’ve been involved in a digital project), the introductory sessions will be a bit slow. But the more advanced sessions would probably be helpful even to seasoned hands on electronic projects. Unless you’re interested in text encoding as an intellectual exercise of its own (and thus devour guidelines with relish), work on a project can give you only a limited sense of the coding possibilities available. It is easy to repeat what you already know how to do rather than venturing out.

This course encourages ventures. The emphasis on oXygen has also been useful. Although I’ve long known about oXygen, I’ve always been annoyed by its interface, and I’m not going to spend time learning an annoying interface when something needs to be done, so the sustained use of it in the course has been helpful. I had learned to encode XML using NoteTab clip libraries and a tool directory (part of Rare Book School). While I won’t discard NoteTab (because I like its simplicity and quick loading), I’m rapidly becoming agnostic about the tool used for encoding. oXygen is a robust application, one that I expect would appeal to undergraduates familiar with commercial software.

I, on the other hand, was more pleased by the use of CSS to display encoding. I had not thought of that, because I had always forced myself to write XSLT style sheets for simple as well as complex tasks. So the course has given me perspective on alternating between tools. One caveat: external entity references ruin the simplicity of reviewing CSS display of XML in the Firefox browser. It has been a few years since my last serious effort at writing ambitious XSLT style sheets, but I think a refresher is coming up in tomorrow’s sessions.

The class is highly recommended. If you have not done so already–and you have the opportunity–sign up.

July 23, 2008

Apostrophes, Single Quotes and TEI entities

Filed under: Uncategorized — wraabe @ 8:45 pm

Syd Bauman has again rescued me from faulty encoding, an encoding issue that dates from the preparation of my dissertation edition. I used a workaround to resolve an issue of display, but it turns out that an alternate fix (a better one) is available.

The problem revolved around what I consider to be a necessary distinction between closing single quotes and apostrophes. The logical distinction between closing quotes and apostrophes (one closes quoted speech, one usually indicates missing letters) relies on a reader’s basic conceptual distinction between identical type characters.

One crux of the issue can be found in the Unicode standard. The entry for the curly right single quotation mark (U+2019) states that “this is the preferred character to use for apostrophe.” Likewise, the entry for the straight apostrophe (U+0027) states that U+2019 is “preferred for apostrophe.” The standard is clear and consistent. We will pass over our aggravation that the character named APOSTROPHE (U+0027) will remain straight, and recognize that it is a historical problem not to be solved by someone with my interests.

In my dissertation edition, I created two entity references in my DTD to handle these. The apos entity resolved to U+2019, and the rsquo entity also resolved to U+2019. But the display never worked: while rsquo always displayed as a curly quote, apos always displayed as a straight quote.

My workaround was to replace apos entities with numeric character references within my document. That workaround fixed the display problem, but a simpler issue was at work.

The apos entity is reserved in XML (it is one of the five predefined entities). So the parser always resolved it to the default (the straight quote) rather than to my DTD definition (the curly quote). The solution is NOT to use the apos entity for curly apostrophes but to choose an entity name that is not reserved. So I chose “apost”. Now the parser resolves apost to U+2019, the curly apostrophe that is identical to the closing single quote.
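The distinction can be checked with an XML parser. This minimal sketch uses Python's xml.etree.ElementTree (an expat-based, non-validating parser) on a hypothetical one-paragraph document whose internal DTD subset declares apost and rsquo. Per the XML specification, the predefined &apos; always stands for U+0027, so only the custom names yield the curly character. Browser display behavior is not tested here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares two custom
# entities, both mapped to U+2019 RIGHT SINGLE QUOTATION MARK.
DOC = """<!DOCTYPE p [
  <!ENTITY apost "&#x2019;">
  <!ENTITY rsquo "&#x2019;">
]>
<p>don&apost;t | don&apos;t | &rsquo;</p>"""

text = ET.fromstring(DOC).text

if __name__ == "__main__":
    for name, reading in zip(("apost", "apos", "rsquo"), text.split(" | ")):
        # Report the code point of the first non-letter character in each reading.
        mark = next(ch for ch in reading if not ch.isalpha())
        print(f"&{name}; -> U+{ord(mark):04X}")
    # &apost; -> U+2019
    # &apos; -> U+0027
    # &rsquo; -> U+2019
```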

Schema and DTD Entity References: You Can Have it All

Filed under: Uncategorized — wraabe @ 4:02 pm

Syd Bauman’s enthusiasm for the RELAX NG schema is infectious, so I decided to use a schema instead of the P5 DTD I prepared in the Roma application. He has almost convinced me that I can discard my DTD and use only a schema.

But I’m really partial to my entity references (which are not supported in schemas). He told me that I can have it all, and I now have a way to include DTD entity references with my schema. The code within the TEI document is as follows:

A small external DTD (UTC_Eds_Ents.dtd) need only provide the list of entities. Alternatively, the entity declarations can be embedded within the document.
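The general arrangement (a schema association plus entity-only declarations) can be sketched in Python. This is a hypothetical illustration, not a reconstruction of the author's markup: it builds a prolog that points to a RELAX NG schema through an xml-model processing instruction and embeds entity declarations, and nothing else, in the internal DTD subset. The entity table, the schema path tei_utc.rng, and the function name prolog are assumptions, and the sample body is skeletal rather than valid TEI.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity table; in a real project these declarations might live
# in an external file such as UTC_Eds_Ents.dtd instead.
ENTITIES = {"apost": "&#x2019;", "lsquo": "&#x2018;"}

# Hypothetical path to a RELAX NG schema (for example, one generated by Roma).
SCHEMA = "tei_utc.rng"

TEI_NS = "http://www.tei-c.org/ns/1.0"


def prolog(root="TEI", entities=ENTITIES, schema=SCHEMA):
    """Return a prolog: an xml-model schema association plus an entity-only DTD subset."""
    decls = "\n".join(f'  <!ENTITY {name} "{value}">' for name, value in entities.items())
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-model href="{schema}" type="application/xml" '
        'schematypens="http://relaxng.org/ns/structure/1.0"?>\n'
        f"<!DOCTYPE {root} [\n{decls}\n]>\n"
    )


if __name__ == "__main__":
    body = (f'<TEI xmlns="{TEI_NS}"><text><body>'
            "<p>&lsquo;Tis Tom&apost;s cabin.</p></body></text></TEI>")
    root = ET.fromstring((prolog() + body).encode("utf-8"))
    print(root.find(f".//{{{TEI_NS}}}p").text)  # ‘Tis Tom’s cabin.
```

ElementTree ignores the xml-model instruction; validation against the schema would happen separately (in an editor such as oXygen, for instance), while entity expansion happens at parse time, which is why the two can coexist.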

I’m very happy at the moment, but I’m slightly worried that Syd is about to convince me to discard entity references completely, in favor of encoded quotations with render attributes. Naah. That’ll never happen, but I’ll explain why in another post.

Theme: Rubric. Blog at WordPress.com.
