An OCR cliche: Into his/her anus

To improve the quality of my transcription of the 1879 edition of Uncle Tom’s Cabin, I collate the text against the Google Books version. Although the OCR is faulty, it is helpful for catching places where I have silently altered the original, including its own errors. For example, on page 93 the book has “pertistent,” but while transcribing I unconsciously corrected it to “persistent.” Thanks to the OCR, which is machine-dumb accurate, I caught my error and restored the original reading.

But one of the machine-dumb OCR errors is arresting. Stowe, not one to avoid a cliche, uses one when she writes of Eliza gathering up little Harry in her arms, and again when she writes of Eva’s reunion with Mammy. Because the letters “rm” resemble “nu,” OCR software frequently confuses them, and the error turns up in at least two editions of Uncle Tom’s Cabin. In the digital version of the 1879 edition, Eliza squeezes up little Harry “in her anus” (pg. 7). And in the 1962 Harvard edition, Eva, on seeing Mammy, throws herself “into her anus” (pg. 176).
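The confusion is easy to model as a string substitution. The sketch below is a toy, assuming a one-entry confusion table (“rm” read as “nu”); the names `CONFUSIONS`, `misread`, and `candidate_corrections` are hypothetical, and real OCR engines work on glyph shapes rather than on text.

```python
# Toy model of the OCR confusion described above: the glyph pair "rm"
# is misread as "nu". Hypothetical; not how any real OCR engine works.
CONFUSIONS = {"rm": "nu"}  # true glyphs -> what the OCR reads


def misread(text: str) -> str:
    """Apply every confusion in the table, as a careless OCR pass might."""
    for true_glyphs, misread_glyphs in CONFUSIONS.items():
        text = text.replace(true_glyphs, misread_glyphs)
    return text


def candidate_corrections(word: str) -> set[str]:
    """Undo one confusion at a time to propose what the page may really say."""
    candidates = set()
    for true_glyphs, misread_glyphs in CONFUSIONS.items():
        start = word.find(misread_glyphs)
        while start != -1:
            candidates.add(word[:start] + true_glyphs + word[start + len(misread_glyphs):])
            start = word.find(misread_glyphs, start + 1)
    return candidates


print(misread("in her arms"))          # in her anus
print(candidate_corrections("anus"))   # {'arms'}
```

The reverse substitution can only propose candidates; telling a genuine word from a misread one still takes a dictionary or the surrounding context.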

On a quoted-phrase search of Google Books (6 March 2009), about five percent of the time someone with a “child in her arms” (2360 results) ends up after OCR with a “child in her anus” (115 results). The ratio for a child staying out of his anus and in his arms is a bit better (1434 to 36, about 2.5 percent). Some may be legit (fairy tales and the like), but the vast majority are OCR errors. Below are some selected gems:

Wives and Daughters: An Every-day Story – Page 519
by Elizabeth Cleghorn Gaskell – 1890 – 637 pages
When the old servant opened the door, a lady with a child in her anus stood there. She gasped out her ready-prepared English sentence. …

The Atlantic Monthly – Page 166
“… with the child in her anus, she followed her husband down-stairs, across the back-yard, hitting lier feet against stones and logs in the darkness, …”

The works of Daniel Defoe: with a memoir of his life and writings – Page 46
by Daniel Defoe, William Hazlitt – 1840
But when the man who had the child in his anus, had been told by signs that this was the mother, he beckoned to have her come to him,

The Wesleyan-Methodist Magazine – Page 433
“carried this child in his anus to Derry”

Waverley Novels: Library Edition – Page 350
by Walter Scott – 1853
“tne seeming parson took the child in his anus, and performed the ceremony of baptism,”

Dr. William Smith’s Dictionary of the Bible: Comprising Its Antiquities … – Page 1824
by William George Smith, Horatio Balch Hackett, Ezra Abbot – Bible – 1888
Then she carried the child in her anus to her people; but they said that it was a strange thing she had done.

Romance writers, be forewarned: the child who leaps into her arms, or the heroine who leaps into his, has as much as a five percent chance of ending up elsewhere when Google Books OCRs your text.
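The percentages come from simple ratios of the search counts quoted in the post. A minimal sketch, assuming the “arms” and “anus” hit counts are disjoint (the post does not say), computes both the misreads per correct hit, which appears to be the measure behind “about five percent,” and the misreads as a share of all hits. The function name `error_rates` is hypothetical.

```python
# Google Books phrase-search counts quoted in the post (6 March 2009).
HITS = {
    "her": {"arms": 2360, "anus": 115},
    "his": {"arms": 1434, "anus": 36},
}


def error_rates(correct_hits: int, misread_hits: int) -> tuple[float, float]:
    """Return (misreads per correct hit, misreads as a share of all hits).

    Assumes the two counts are disjoint, i.e. no book is counted under both phrases.
    """
    per_correct = misread_hits / correct_hits
    share_of_all = misread_hits / (correct_hits + misread_hits)
    return per_correct, share_of_all


for pronoun, counts in HITS.items():
    per_correct, share = error_rates(counts["arms"], counts["anus"])
    print(f"child in {pronoun} arms: {per_correct:.1%} per correct hit, {share:.1%} of all hits")

# child in her arms: 4.9% per correct hit, 4.6% of all hits
# child in his arms: 2.5% per correct hit, 2.4% of all hits
```

Raw hit counts are rough (they count books, not occurrences), so either measure is only a ballpark figure.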


4 Responses to An OCR cliche: Into his/her anus

  1. hlharrison says:

    That, my friend, is hilarious. Definitely worth filing away for discussions of OCR in digital projects. It seems the scale of Google Books presents particular problems here, as the wide variety of texts prevents any OCR training. I wonder if Jim Mussell ran into similar problems with the NCSE project?

  2. Let the flame of love bum ever higher.

  3. There are more in the Google Books results:
    Brothers in anus: 30
    Take up anus: 469
    Threw her anus: 285

  4. Pingback: Buck naked to butt naked, arms to anus, 19th century iPhones and other Google Ngram oddities
