Collation in Scholarly Editing: An Introduction

The term scholarly editing can serve as a catch-all for a reprint of a culturally significant work with an introduction by a scholar. Such editions usually include, in addition to a reprinted text, an introduction that surveys scholarly opinion and a brief bibliography for further study. Ambitious editors may also provide explanatory annotation, glossaries of unfamiliar terms, historical notes, and an author timeline. While I have no quarrel with such work–and would observe that its value is often underestimated–I use scholarly editing to mean work in which the editor illuminates the variable forms in which the text has appeared and provides commentary that explains the significance of textual alterations.

When scholarly editing is used in the sense merely of a work by a scholar, the editor’s concern for the text chosen–if such editors are concerned and realize that texts are often printed in multiple forms–is often limited to choosing the right text to reproduce according to some standard (best manuscript, first printed, thoroughly revised by the author, best typography, most influential, etc.). The type of scholarly editing that is my subject, which has also been called critical editing, has as one of its primary emphases the arrangement and ordering of multiple texts to establish the relationship among them, to analyze and classify the variants, and (in many cases) to offer an authoritative text for study to other scholars. A characteristic activity of scholarly editors–from here forward I mean the term in the limited sense–is to compare and classify the forms in which the versions of the text have appeared: we collate.

Although in the study of manuscript culture one of the characteristic activities is to align parallel parts of a work–and this is a common definition of collation–I speak of collation in the narrower sense of identifying differences between texts (i.e., after passages are already “aligned”). There are three methods to collate texts: 1) read two texts side by side and note the differences; 2) compare printed page images (by allowing your eyes to merge two page images, often with a device made especially for that purpose); 3) transcribe and compare transcriptions with the aid of a computer. The first method is probably as old as humane study in a Western tradition–though whether what the Alexandrian librarians did counts as “textual editing” is a debate that cannot be pursued here. Tanselle says yes, basically, but Greetham is hesitant to trace the tradition from Alexandria to today as a single discipline. The second method is a comparatively recent development (if you’re not too young to think of the 20th Century as “recent”). The process of comparing page images predates the arrival of computers in the humanities, but it is only suitable for comparing two documents that were printed from the same setting of type. If you doubt that two documents printed from the same setting of type would differ, you are wrong. All type, like flesh, is heir to ills, both accidental and deliberate. Pieces of type slip in their forme, corrections are made during printing runs (stop-press corrections), errors are noticed and corrected for a second printing, and authors choose to alter wording after a work has gone to print (sometimes in response to readers’ reactions). Hand-set type, metal stereotype plates, and linotype slugs can be altered, and they are altered.

One reads two texts side by side and takes notes (method 1) only when one has no other choice. Scholars who do such work create monuments of diligence. I have reviewed collations prepared by scholars who did not have access to computers (E. Bruce Kirkham’s collation notes on Uncle Tom’s Cabin in the Jewett edition and the newspaper version are at Ball State). The dedication required for such a task boggles the mind: let us not underestimate how difficult (physically and mentally taxing) textual comparison was for a previous generation of scholars. A similarly impressive study is a Kansas doctoral dissertation that compared over 20 British editions of Uncle Tom’s Cabin. The scholar used microfilm. As far as I can tell, Harry Opperman (the scholar) never published anything after his 1972 dissertation. But he created a monument to diligence. If current scholars choose not to use electronic collation tools or page-image comparison tools, the side-by-side comparison is as difficult today as it was for scholars in the mid-20th century. One should not be uncharitable to work done with tremendous dedication and the finest tools available–humility ought to stop us–but the method of side-by-side reading and note-taking can be retired. It should (must) be retired because its accuracy has been surpassed by computer-based comparison of digital transcriptions. Methods 2 (compare page images) and 3 (compare digital transcriptions) remain important for all types of textual work that involve printed matter.

The second method, merging of side-by-side page images, requires two copies from the same setting of type. This process can be undertaken with the aid of an analog device–or without the aid of such devices. One focuses each eye on a version of a page image (or partial page images, to be more exact). If the two images differ (i.e., the setting of type differs, type has been damaged, or the paper has a water stain), the viewer of two otherwise identical images experiences a 3-dimensional flickering effect at the site of difference, as the mind (one brain, two eyes) attempts to resolve the conflicting portions of the images. One does not just “experience” this as a matter of course–it is a skill that must be acquired with practice. The amount of practice depends on the device and the user.

Many devices are available, and I can only briefly summarize here the ones with which I’m familiar. First, there are the analog devices: Hailey’s Comet, Lindstrand Comparator, Hinman Collator, McLeod Collator, and transparencies. I have used the Hailey’s Comet with some difficulty. The initial setup–an hour or two of work with the mirrors and stands–can be accomplished with the aid of Carter Hailey’s instructions. The Comet is portable (the box weighs about 25 pounds), and it could be carried with luggage. The Lindstrand Comparator is the easiest to use. It is a square wooden box with an opera glass-style eye-piece. Each eye-piece focuses on the same page in two different books. Within 5 minutes most users can achieve some success. Within an hour almost anyone can become accustomed to a Lindstrand. The Lindstrand is not portable in the Comet sense (though I suppose the box would fit in a Volkswagen Beetle). The Hinman is not merely non-portable–it is a hulking monstrosity (itself about the size of the old Beetle, propped upright). The two page images are superimposed by a mechanical switch that alternates light and dark on each image, and the images are viewed through binocular eye-pieces. While the Hinman may work for others, my poor vision without corrective lenses made the binocular eye-pieces unusable. I have not used the McLeod Collator, but I have heard (secondhand) that the Comet is derived from its design. Comparison of images (analog or digital) using transparencies is another possibility, but I have not yet found the method to be practical–the California Twain editors claim success with a version of this method. The mechanical collators (Hinman and Lindstrand particularly) can be hindered by large books. If a book is larger than Shakespeare’s First Folio, it won’t fit in the device. I was unable to benefit from such devices when I started working with newspapers.

Although these devices provide useful mechanical assistance for merging page images, all you actually need are two eyes, the mind, and a bit of training on how to collate with no device. You probably already know how to collate with two eyes. If as a child you indulged in the optical illusion of the floating finger, you have collated your fingers. The full method of machine-free collating is described as the “Human Hinman” in Joseph A. Dane’s The Myth of Print Culture, pp. 94-95. This method’s chief advantages are cheapness and portability. You need only two impressions from the same setting of type (or one actual impression and a photocopy of another impression). In response to an inquiry, Randall McLeod offered detailed instructions on this method of collation, which have been posted here.

The third method (computer-based comparison of transcriptions) has seen a growth of new tools, but side-by-side page image comparison cannot be discarded. Bethany Nowviskie, in a description of a collation tool called Juxta, has spoken of the challenges posed by the “flashing lights” of the Hinman. Juxta is not a substitute for side-by-side page image comparison. Just as a medievalist who is concerned with manuscript exemplars would have no use for a technique for comparing type, a technique useful for comparing different transcriptions cannot supersede (it can only supplement) a technique for comparing typesettings. Barring the availability of a nearly perfect transcription, the process of correcting transcription errors in two different transcriptions from the same setting of type would be far more time-consuming (and error-prone) than the task of comparing page images from two copies of the same setting of type.
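
To make the third method concrete, here is a minimal sketch of word-by-word comparison of two transcriptions, using only Python’s standard difflib module. The function name collate_words and the two sample readings are invented for illustration; none of the tools discussed in this post is claimed to work this way internally.

```python
from difflib import SequenceMatcher


def collate_words(base: str, witness: str) -> list[tuple[str, str, str]]:
    """Return (kind, base_reading, witness_reading) for each point of variation."""
    a, b = base.split(), witness.split()
    # autojunk is switched off so that frequent words are never ignored.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


# Hypothetical readings, invented for this sketch (not taken from any edition).
base_text = "the old house stood upon the hill, alone"
witness_text = "the old grey house stood on the hill alone"

for kind, left, right in collate_words(base_text, witness_text):
    print(f"{kind:8} {left!r} -> {right!r}")
```

For these two sample strings the sketch reports one insertion (“grey”) and two replacements (“upon” for “on”, and “hill,” for “hill”). Note that it treats the dropped comma exactly like a changed word: sorting variants into significant and insignificant classes is the part that dedicated collation tools add on top of a raw comparison.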

Below is a list of electronic collation tools known to me, and I offer brief observations on purpose, cost, appropriate computing environment, function, and ease of use for those with which I’ve worked to any extent. If I have not used the tool, I offer brief observations based on the documentation about the tool’s intended purpose.

  • Microsoft Word–A widely available application, its Compare Documents feature could serve as a way to compare two transcriptions. But its native format is not suitable for scholarly editing. Its automatic correction features could introduce unexpected variations. Careful and sophisticated users can learn to control these variations, but an update to a new version of Word or a conversion to another program would require careful checking. The Compare Documents feature allows only rudimentary discrimination between significant and insignificant features, but it could be useful to compare an original and an altered version of the same file. The major inadequacy of Word is its proprietary encoding, which has the potential to introduce unanticipated alterations, especially when versions of the software are updated. The Compare Documents feature is too idiosyncratic to inspire trust, especially as its terminology and interface standards can lead to confusion about which document is the original and which the comparison text.
  • UNIX DIFF–The DIFF utility for UNIX-compatible machines (including recent Macintosh operating systems) is an excellent way to identify file variations. Don’t be afraid of the command line: you can easily compare two versions of a file. That useful function is not equivalent to identifying textual variants (because the tools that control the display of variants are rudimentary), but DIFF, like Word’s Compare Documents feature, can serve to confirm file alterations.
  • PC-CASE–This is the tool that I used for my dissertation. With minimal encoding (though you must observe the encoding guidelines), you can compare multiple electronic transcriptions. It is a fast application, but it works only from a command-line interface (MS-DOS). One can encode manuscripts (additions, deletions, replacements), prepare lists of manuscript alterations, conflate two versions to list alterations, sort alterations according to whatever criteria suit, and generate sorted lists of variants. Peter Shillingsburg has said the software can be made freely available. If you’d like a copy, send me an email (see the Kent State email search screen, or request a copy in a comment to this post). My copy now includes a PDF manual. Even if you are not put off by the command line, the requirement for 8-3 file names, the rigid requirements for consistency when encoding textual features, and not infrequent head-scratching as you try to decipher what encoding error is causing the collation to fail, expect that you will need a few weeks of practice to develop confidence. Your text files must not be saved using the UNICODE standard (which may be set by default in your plain text editor); you are limited to the older ANSI standard. It is a 16-bit application. While it can run in a 32-bit OS (Windows 2000 and XP), it will not run in the 64-bit versions of XP. It also works on Vista 32-bit. And it will work on Windows 7 64-bit if you install it in the Windows XP emulator, the virtual machine.
  • MacCASE is a version of PC-CASE for Macintosh computing systems. I understand that it only works in pre-UNIX versions of the Mac OS (or an emulation mode). I believe MacCASE is now defunct. Formerly, it was available from the Australian Defence Force Academy [now dead link].
  • COLLATE–Susan Hockey in Electronic Texts in the Humanities described Peter Robinson’s COLLATE 2.0 (for Macintosh) as one of the most advanced collation systems available. A version that supports output to JSON, XML, and GraphViz, CollateX, is now available. A new user should expect to spend a weekend (or more) getting started and to devote several hours a day for several days to achieve reasonable competence. I have a post on installing CollateX and its dependencies to run on Macintosh OS X.
  • DV-COLL–The Donne Variorum textual collation program, designed for scholarly editions of poetry. I have not used it. But like PC-CASE it requires an application-dependent style of encoding. The encoding is embedded in the line, but it is simple to read (and presumably, like that of CASE and COLLATE, easy to adjust or alter for other systems with scripting languages like Perl or Ruby).
  • Juxta, from NINES, a tool for comparing texts with minimal encoding. This is a great tool for quick comparisons of texts that have been scanned and made available in OCR plain text. The documentation is adequate, though the warning messages are sometimes cryptic. You can read a longer discussion of my experience with Juxta in a previous post. Juxta is a great tool for preliminary analysis of texts prepared by library archives or scanning and OCR projects. If you’ve ever wondered about alterations between a serial and book version of a 19th-century story (and both have been scanned and converted to OCR, whether the OCR is subsequently released in raw form or encoded in XML), Juxta is a good choice for taking a first look. The developers of Juxta have suggested that scholarly projects produce raw text versions for Juxta comparison. Scholars should consider this option carefully, as the interface permits an intuitive method for displaying textual alterations. October 2014 Update: The new and ongoing Juxta is a web-based client, which is more robust and interesting than the download (which is aging because of its dependence on old Java). Go to Juxta Commons and set up an account.
  • Versioning Machine–A tool for comparing two versions of documents that have been encoded using TEI. It is relatively easy to use (with basic skills in XML, i.e., if you can create a valid TEI text, you can compare two versions of a poem with relative ease). The V-Machine site provides extensive web-based documentation. Note on the documentation page that the XSLT stylesheet used by the Versioning Machine is in XSLT 1.0, so you must use Saxon 6.0 (but it’s built into the project without your needing to bother with installation and setup).
  • Edition Production & Presentation Technology–An ambitious project for scholarly editing–freely available–which includes collation facilities among its tools. I have set up an account but do not have sufficient familiarity to speak knowledgeably about its abilities to produce collations. Prepare to spend a day to orient yourself to the application. I anticipate that achieving competence will take weeks. Warning: EPPT was at [dead link], and then at [dead link, referenced at CNI], but it may no longer be an active project (Sep. 2013).
  • TUSTEP–Wilhelm Ott’s application is an ambitious and (as I understand from Susan Hockey’s book) complete tool for scholarly editing, whether print or digital. Its beauty is a system by which changes in the source files are automatically propagated throughout collations and later stages of analysis and production. Until about 2012, using TUSTEP was an expensive undertaking (in time and money) for individual scholars working independently from institutions with site licenses. Anyone ready to tackle training, basic fluency in German, and long-term funding for a project should seriously consider spending a month (or a few months) exploring this route. For scholarly editors working on large-scale editions with institutional support, it seems like the first option. You can now [February 2015] download it.
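
Two of the PC-CASE constraints noted in the list above (8-3 file names and text saved in ANSI rather than Unicode) can be checked before a collation is attempted. The following is a minimal sketch, not part of PC-CASE or any other tool named here. The function names are hypothetical, the file-name pattern is deliberately conservative, and treating “ANSI” as the Windows-1252 code page (cp1252) is my assumption; substitute another code page if your setup differs.

```python
import re
from pathlib import PurePath

# Assumption: "8-3" means up to 8 characters, then optionally a dot and up to
# 3 characters. Only letters, digits, underscore, and hyphen are accepted here,
# which is stricter than DOS itself.
EIGHT_DOT_THREE = re.compile(r"^[A-Za-z0-9_\-]{1,8}(\.[A-Za-z0-9]{1,3})?$")


def is_8_3_name(path: str) -> bool:
    """Check whether the final component of a path fits the 8-3 pattern."""
    return bool(EIGHT_DOT_THREE.match(PurePath(path).name))


def non_ansi_characters(text: str, encoding: str = "cp1252") -> list[tuple[int, str]]:
    """List (line_number, character) pairs that the code page cannot represent.

    Assumption: "ANSI" is taken to mean Windows-1252 unless told otherwise.
    """
    problems = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for char in line:
            try:
                char.encode(encoding)
            except UnicodeEncodeError:
                problems.append((line_number, char))
    return problems


# Hypothetical file names and sample text, for illustration only.
print(is_8_3_name("UTC1852.TXT"))
print(is_8_3_name("uncle_toms_cabin_1852.txt"))
print(non_ansi_characters("caf\u00e9\nm\u0101cron"))
```

A check like this only flags characters that would be lost or garbled when a transcription is saved in a single-byte code page; it says nothing about whether the tool-specific encoding of additions and deletions is consistent.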

While the description of sight-based collation methods and tools should be fairly reliable, the list of electronic tools (in its current draft form) is not intended to be comprehensive, and I welcome additions or corrections. This quick overview was developed in response to a request. I use “quick” in the negative sense–this post was prepared quickly. It is not “quick” in the sense of short and sweet, for which I apologize. Had I more time, the post would have been shorter (and documented better). I hope to expand it over time. May this little post be useful to some in its current state.

UPDATE: 2 February 2015, Noted dead links for EPPT and MacCASE. Updated links for CollateX, TUSTEP, and Juxta Commons, the three projects that have remained in active development.

UPDATE: 11 September 2013, added that PC-CASE runs in Windows 7 (64-bit) under the XP emulation environment, the virtual machine.

UPDATE: 5 April 2009, revised in response to Bethany Nowviskie’s comment.


12 Responses to Collation in Scholarly Editing: An Introduction

  1. You mention that PC-Case (which I had not heard about) can handle manuscript encodings. I wonder though what level of complexity this tool can handle. Is it suited for modern manuscripts that have intricate layers of revision such as, e.g., James Joyce’s?

  2. wraabe says:

    Dr. Van Mierlo,

    I have limited experience encoding manuscripts with PC-CASE (I have encoded a fair copy draft MS of an Alcott short story). And I would be inclined, for a text with multiple layers of manuscript revision, to choose TUSTEP or COLLATE (and maybe MacCASE). TUSTEP was used for Gabler’s Ulysses. Though I will need to check, I think PC-CASE is limited to a conflation of 10 versions. MacCASE may not have that limit. For study of revision, if you do not intend to prepare an edition, I would investigate JUXTA.


  3. Pingback: Harriet Beecher Stowe Revising Uncle Tom’s Cabin: Topsy in the Jewett Paperback « Fill His Head First with a Thousand Questions

  4. Hi, Wesley — this is an old post of yours, but I’ve just discovered it and wanted to write to say that I think you misread me.

    I’ll address the larger point last, but first want to say that I’m certainly not “dismissive” of the Hinman Collator — like you, I simply find it unusable for physical reasons (I feel like I’m going to have an epileptic fit!). I’m a Lindstrand girl, myself, and have spent many a day happily “hunkered down.” I’ve also made a couple of shameless knock-offs of the Hailey’s Comet, one of which I drive around with in the trunk of my car.

    Which brings me to this: by taking your long-ago assessment of Juxta as “textual collation for dummies” as a compliment — and by stating that our goal was to open up the practice of collation to literary scholars who might be less likely, as a first foray, to locate and use a Hinman, Lindstrand, or other device — I was not suggesting that digital tools that operate only on transcriptions are a substitute for optical collation of page images.

    My point was that Juxta could be a gateway drug to a level of bibliographic engagement that has been on the wane in English departments. A piece of software that lets you easily pop in even plain text files and generate both manipulable visualizations and a scholarly apparatus that looks familiar from the book world is a pretty cool thing. Juxta, like all the other methods, analog and digital, has its pros and cons.

    I haven’t been actively involved with the project for more than a year, but I understand that work on it has continued and that there are a lot of new features you might be interested in. I do know that, from the start, we built Juxta to include the ability to display page images (manuscript or print) alongside transcriptions for exactly the reasons you cite.


    • wraabe says:


      Thanks for the note. Point taken. I’ll revise both posts. Due to my lack of mechanical skills, I have no Comet knock-off. I’ve taken to Randall McLeod’s device-less method of collating with crossed eyes. And I’ll take another closer look at Juxta.


  5. Pingback: scholarly editing blog « EdRLS

  6. Sophia says:

    I would like to collate about 20-25 verses of an Old High German work, contained in 3 manuscripts. Which software for Windows can I use? Please help me!

    • wraabe says:

      I have the equivalent of two semesters of introductory German, so I do not know the requirements for encoding Old High German. But I believe that any one of the following Windows applications could do it. You would need to test each one.

      JUXTA (free, simple encoding, just plain text)

      Versioning Machine (free, TEI encoding)

      TUSTEP. I’m quite confident that you must purchase a license. But if you can read German well, you will soon know more than I do.

      Wesley Raabe

      • Godfried Croenen says:

        I don’t think the Versioning Machine would be that useful for the situation you describe. It is a very useful tool, but it works with a text in which all the differences have already been encoded. So unless somebody else has done it already, you need to compare all the versions, and encode the differences in a single XML TEI file. The Versioning Machine will read this file, then reconstruct the different versions and will allow you to read them side by side and highlight the differences. What it will not do for you is to read multiple texts and create the single XML file.

  9. Pingback: Juxta Desktop, Juxta Commons | Center for Scholarly Communication & Digital Curation
