Blog Events

Visiting Scholar Recap

Tweets and pics from events by our first “Visiting Scholar in Digital Humanities.”


Blog Doing DH

Producing and Consuming Digital History

At the American Historical Association’s annual meeting in early January, “digital humanities” and “digital history” were among the prominent buzzwords around the Marriott and Hilton hotels, surfacing in panels, presentations, and small conversations. It was a “nice to meet you” conversation that stuck with me most, though. It went like this:


Serenity: Hi, I work on a digital humanities project at the University of Rochester.

Stranger: Oh? What project do you work on?
Serenity: I’m digitizing the correspondence of William Henry Seward’s family. We’re transcribing, annotating, and editing thousands of letters from the 1830s to the 1870s with the goal of displaying them on a website.

Stranger (skeptical): Are digitizing projects considered digital humanities?

Serenity (with slight indignation): Why, yes! It’s making documents accessible to a wider audience via a digital platform.

Stranger: When I use the website am I doing the digital humanities?

Serenity: ….ummm….


I eventually acceded in my teeniest, tiniest voice that “yes,” the stranger was doing digital humanities when using the website. But in the back of my mind I wanted to protest, “NO!” There was a difference between easily consuming that content and all of the legwork it took to actually get it onto the website: the wrangling of data and images, the tagging in TEI, the three rounds of editing, annotation, and transcription, all very hard work. Those of us on the project were producing the work of digital humanities. The end-user was merely consuming it.


To make sense of my responses to these questions of when one is “doing” digital humanities, I returned to Patrik Svensson’s article “Beyond the Big Tent” in Debates in the Digital Humanities (2012, digital edition). Basing his critique on the 2011 Stanford digital humanities conference theme “Big Tent Digital Humanities,” Svensson concludes that asking who’s in and who’s out, or which DH projects are or aren’t included in the Big Tent of the Digital Humanities, may not be the best way to visualize what we are doing and what’s at stake in DH. He argues instead for the concept of a “trading zone” or “meeting place” to symbolize the types of work digital humanists do. I’ll discuss the usefulness of these terms later on, but first I want to unpack my “slightly indignant” response in the exchange above.


My slight indignation at the stranger’s question “Are digitizing projects considered digital humanities?” stems from my sense that many hard-core DH-ers see the digitization of archival materials as mere “baby” projects that fail to reflect the level of tech that DH can, and perhaps should, achieve. This is part and parcel of the Big Tent philosophy, where one method of deciding who’s inside the flaps of the tent and who’s outside, peeking through the cracks, is based on the tools one uses. As Svensson points out:

While tools themselves can be epistemologically predisposed, it could be argued that placing tools and tool-related methodology at the base of digital humanities work implies a particular view of the field and, within big-tent digital humanities, possibly an exclusive stance. One central question is whether the tent can naturally be taken to include critical work construing the digital as an object of inquiry rather than as a tool.


With this in mind, I’m inclined to lean towards Svensson’s portrayal of DH as either a “trading zone” or “meeting place,” but then these have implications for the question of consumer and producer, the origin of my second hesitation during my conversation at the AHA.


A trading zone implies that groups of people are present to do business and make exchanges. These exchanges are built on producer and consumer models in which one person brings items to trade for another’s: ten pomegranates for one knitted hat. In this exchange there are clear producers and consumers. Yet all parties likely had to “produce” something, be it labor for money or the product itself, so in a trading zone all members are producers and consumers in one way or another.


Meeting places, on the other hand, send a more democratic or egalitarian message. They are where people come to be heard and to listen. I’m not sure that the meeting place eliminates the producer/consumer question, though, as I’ve surely attended meetings intending to contribute very little, only to listen and learn. In that situation I’m entirely consuming the meeting’s content, perhaps for my own future production of work, but perhaps not. I’ve also led meetings where I do much of the talking while participants scribble down notes, consuming what I say to put to use elsewhere, or else to fall into a recycling bin a few months later as they clean out their desks.


So while Svensson’s retooling of the Big Tent into meeting place or trading zone alleviates the question of who’s in and who’s out, the metaphors may be too limited to help me make sense of the consumer/producer of digital humanities.


To better understand this idea of consumer and producer in an academic context, I look to one of the West’s most cherished forms of academic communication: the book. When scholars produce books, they spend a great deal of time researching and thinking about their topics, not to mention agonizing late nights writing to communicate their thoughts clearly. Reading a book on North American sparrows obviously does not make me a scholar of sparrows or any other kind of bird; reading many books, doing a good amount of field research, and attending classes and conferences on ornithology does. Likewise, someone who reads a book on the reign of Queen Elizabeth I would hardly qualify as “doing” monarchical history solely by reading it. In academia, one only validly “does” monarchical history by reading many texts on kings and queens and producing research that contributes to current conversations about the topic.

All this to say that, in my mind, the question of who “does” DH depends on the types of questions we employ to make sense of a digital resource. Digital humanities projects are like other source materials: how we use them depends on the questions we ask and the furthering of knowledge that results. Scholars doing digital history may have a variety of questions, ranging from how the digital informs our reading of the manuscripts to where issues of gender, medical, family, political and social history find resonance through keyword searches of the database. Using a digital resource to answer these questions is more a matter of “doing” intellectual work in general than of “doing” the digital. And this, for me, is where John Unsworth’s concept of “scholarly primitives” is useful. Good scholars are both producing and consuming knowledge through “Discovering, Annotating, Comparing, Referring, Sampling, Illustrating and Representing.”


I do think the question of production vs. consumption in DH requires further unpacking, and it parallels, if not entwines with, questions of labor and reward for work. In the example above especially, I balked at counting the consumer as “doing” DH when I thought of the hard, hard work of organizing, transcribing, annotating, editing and TEI-encoding the Seward letters. It struck me as rather unfair that the consumer’s “doing” of DH could pass for the same as my laboring at it. The question of labor is not limited to producer/consumer, though. For many DH projects I know of, extensive library support is required for the project to get off the ground at all. Programmers, librarians, support staff (not to mention shamelessly exploited undergrads and grads!) all do a good amount of work to help academic humanists produce fancy projects in the first place. Where is their recognition and pay-off for such labor? Is it at all equitable with that of the principal investigators and academics who will go on to use the project to build their CVs and argue for tenure, raises, or other recognition? Is an hourly, somewhat livable wage recognition enough for the support staff who sustain our technically heavy projects? Where do “digital” acknowledgment sections get us?


Perhaps the next blog post?


Serenity Sutherland is a PhD student in the Department of History at University of Rochester.


Blog Meta DH

DH Makes Explicit

“I’m still looking for that nugget, that thing I can take away from DH and say ‘here’s the contribution; this is how it relates to someone like me.'”

After a digitally inclined guest lecture on campus last week, a fellow grad student pressed me with this basic question about what digital humanities brings to the scholarly table. I understood her mild bewilderment. For those who are not technologically inclined, DH in practice can seem like a heap of technobabble and gimmickry haphazardly tossed over, at best, quasi-scholarly inquiry. Likewise, for those who are not humanistically inclined, DH in practice can seem like a misguided use of technological equipment and computational methods.

It’s difficult to even label these reactions as misconceptions. The [intellectual, monetary] hype surrounding DH unfortunately tends to connote lofty ideals, revolutionary ontologies, the latest tools, as well as unnerving intimations of “progress” and the future of the humanities. Furthermore, Matthew Kirschenbaum, on several occasions, has reminded readers about the formation of the term “digital humanities” and its specific relationship with “marketing and uptake” (1) and its current use as a “tactical” term to “get things done” in the academy, be it obtain funding or promote a career (2).

In short, there’s a need both to reconcile and to promote the term’s more meaningful usages, particularly as a label that describes new practices in the humanities for curious and skeptical onlookers alike. If DH is to be inclusive, its practitioners should take care to articulate their goals and methods clearly to colleagues in all humanities disciplines, not only those who are digitally literate. If DH is to be an advocate for the humanities to the public–as Alan Liu thinks it can be (3)–clear articulation becomes more important still.

DH Makes Explicit

In a 2013 interview, Johanna Drucker recollected that the “mantra of 1990s Digital Humanities” required “making everything explicit we, as humanists, have long left implicit” (4). Her comments refer to the logic of programming–“coding” as a structure for inquiry–but this sentiment also offers an attractive and powerful model for an inclusive DH, and for its full partnership in the humanities in general.

Quickly, let’s apply that framework to a variety of examples:

Screenshot of XML markup.
While hierarchical markup languages like XML make texts machine-readable, their use first requires that textual scholars consistently analyze and describe their texts’ discrete physical characteristics.
  • Textual Studies: TEI, markup, editing. Like its analog counterpart, digital editing requires its practitioners to throw into relief bibliographic data embedded in physical texts. Markup languages like XML require the attribution of values to textual data. In a simplistic view, explaining a text to a computer requires us to explain it first to ourselves. (We’re having quite a time with William Blake’s Four Zoas manuscript over in the Blake Archive.)
  • Literary History: or, the Moretti movement. When I first read Moretti’s now-mandatory Graphs, Maps, Trees (5), I found the middle “Maps” section to be the least provocative. Perhaps that initial reading holds up, but only because the methods described are also the most familiar. When asking “Do maps add anything to our knowledge of literature?”, Moretti illustrates the “centric composition” of Mary Mitford’s Our Village. The map is not an answer to anything, but rather evidence of a narrative feature that requires explanation. In other words, the map makes explicit what our brains are already doing in constructing the narrative.
  • Collaboration: DH in practice, in theory. At a recent inter-departmental panel on “Evaluating Digital Projects as Scholarship,” I was stunned to see a senior faculty member cling so tightly to the image of the isolated scholar, the sole author. Yet the incident is also evidence of the disarming effectiveness of DH in explicating, even exaggerating, the collaborative nature of scholarly inquiry, of “work,” of language. While the monograph remains the normative argumentative structure in the humanities, DH has the ability to critique these modes of production through a kind of processional remediation. In other words, while DH often remediates a variety of texts, it also remediates a variety of roles in the production of those texts and the production of knowledge. Publishers and editors give way to IT directors and programmers; grad seminars give way to graduate research assistantships. This collaborative stance also makes DH an exceptionally natural partner for critical theorists from a variety of backgrounds, whether poststructuralists, McGannian editors, or feminists. The “digital” is so inherently problematic for hegemonic, centralized [hermeneutic] authority that its position as a polemic is limited only by its increasing prevalence as common practice. New endeavors like “open peer review,” crowdsourcing, and collaborative authorship represent only the tip of the iceberg.
  • The University: departments and disciplines. The idea of “interdisciplinary study” has long been used as shorthand for “diverse research interests,” but how diverse has it usually been? Maybe an English prof who crosses the quad to visit the history department. DH has proven to be an effective identifier of false boundaries within the university structure, particularly the “big one” between sciences and the humanities. Increasingly common vocabularies and technologies have made it possible for humanists to approach the sciences, and vice versa, with informed critical perspectives. It’s happening at the undergraduate level, too, with examples like Stanford’s new CS+English dual major or U of R’s newly revised Digital Media Studies major.
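The textual-studies point in the first item above can be made concrete with a toy example. The sketch below uses Python’s standard xml.etree with invented, TEI-flavored element names (not the Blake Archive’s actual markup): even tagging a single manuscript line forces explicit editorial decisions that print reading leaves implicit.

```python
import xml.etree.ElementTree as ET

# Marking up even one manuscript line forces explicit decisions:
# Which line is this? Is the deleted word part of the text? Whose hand?
line = ET.Element("line", attrib={"n": "1"})   # line number: made explicit
line.text = "The song of "

# A deletion the reading eye glides past must be declared and attributed.
deletion = ET.SubElement(line, "del", attrib={"hand": "author"})
deletion.text = "sorrow"
deletion.tail = " begins"

xml = ET.tostring(line, encoding="unicode")
```

In the serialized result, every choice (line number, deletion, attribution) is now machine-visible data rather than an editor’s private judgment.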

Where Have All the Computers Gone?

For the most part, computer technology is de-emphasized in this outward-facing characterization of DH, and yet it’s because of this de-emphasis that I believe this strategy to be the most advantageous for communicating with peers–and the public–outside of our communities of digital scholars. Coding and building are important for practitioners of DH, but the mere use of technology can’t be why DH is important for the humanities. Instead, when we use DH to “make explicit,” we appeal to a common method of all critical inquiry: to identify and articulate underlying ideological operations, whether they exist in cultural structures, like gender, or cultural artifacts, like literature.

DH’s unique contribution, then, comes with the specific manifestation of this “classic” line of inquiry through new technologies that help us ask, ideally, better questions. And, as we can see with even a cursory list of examples, it’s not simply the “products” of DH that make explicit, but the practice as well.


Eric Loy is a PhD student in the Dept. of English at the University of Rochester.


1. Kirschenbaum, Matthew. “What is Digital Humanities and What’s It Doing in English Departments?” Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis: GC CUNY, 2013. Web.

2. Kirschenbaum, Matthew. “Digital Humanities As/Is a Tactical Term.” Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis: GC CUNY, 2013. Web.

3. Liu, Alan. “Where is Cultural Criticism in the Digital Humanities?” Debates in the Digital Humanities. Ed. Matthew K. Gold. Minneapolis: GC CUNY, 2013. Web.

4. Berdan, Jennifer. “The Emerging Field of Digital Humanities: An Interview with Johanna Drucker.” InterActions: UCLA Journal of Education and Information Studies 9.2 (2013). Web.

5. Moretti, Franco. Graphs, Maps, Trees: Abstract Models for Literary History. New York: Verso, 2007. Print.

Blog Teaching

Loving Dido

ENG 396 ‘Loving Dido’ is an Honors seminar structured around Virgil’s story of Dido in the Aeneid and its long literary and cultural afterlife. Students are provided with readings from over two millennia of commentaries, poetry, and novels – and viewings of 12 different operatic productions – and discuss the material every week in the context of the original text. There is also a digital component to the course. Professor Tom Hahn believes that a non-traditional course like ‘Loving Dido’, which considers vast arrays of unsorted data on a single subject, is ideally placed to make use of digital tools like content management systems, and the opportunities therein for establishing links and networks between individual items. The Omeka-based website run by the class contains nearly 2,000 images and video footage inspired by the Dido story. Every week, students post commentaries on selected images and argue for relationships between the disparate items that move beyond historical or archival principles. As well as being useful pedagogical tools, such websites promise remarkable possibilities for exploring how humanities material transforms its audience, and is itself transformed, through digital resources.

Blog Research

Blake Archive Forever

As Mellon Fellows, Eitan, Serenity, Chris, and I have been semi-strategically embedded into a few faculty research projects that feature strong digital characteristics. We’re there to assist and learn as much as possible.

Since I had already attached myself to the William Blake Archive when I first arrived at UR last year, it was decided that I would continue with Blake and take on more challenging projects.

In one respect, working with the Blake Archive is a considerably different endeavor than working with the other affiliated projects because, well, it’s been around forever (in DH-years). As a landmark editorial project first conceptualized in the early ‘90s–the first digital edition to receive MLA’s “CSE Approved Edition” seal in 2005–the Blake Archive has been subsequently scrutinized as a case study in countless theoretical and pragmatic contexts.

Blog Research

Digital Mapping and the “Sense of Place”

Map overlay

The fetid musk of South Side slaughterhouses, the eclectic sprawl of Dublin, the muck of the Everglades: these sensual ambiences enwrap readers of The Jungle, Ulysses, and Their Eyes Were Watching God. Between those pages, space and atmosphere seem to “thicken, take on flesh,” as Mikhail Bakhtin wrote. These novels are exemplars, of course, but in general we don’t hesitate to label great fiction “immersive”; prose, at its best, can produce a powerful corporeal experience as well as a cognitive one. Why are we reluctant to believe that historiography could do the same?

Historical research, we presume, benefits from coolness, neutrality, and critical distance. But the appeal to a sense of place, not just describing but making palpable distant or bygone scenery in all its spatial and social complexity, is not the responsibility of novelists alone. Reenactors, cultural preservationists, and open-air museum curators have demonstrated for more than a century that interactive history has not only entertainment value but also real heuristic potential, and it’s refreshing to work among academic historians eager to enrich historical narrative on—and beyond—the printed page.

For digital humanists working on histories of space and place, representing practice appears to be the current frontier of the technologically possible. Practice is French Marxist geographer and cultural critic Henri Lefebvre’s term, one of three composing the iconic “spatial triad” he unveils in The Production of Space. By practice, he refers not to the perceivable patterns and physical structures that demarcate our lived environments—Chicago’s elegant gridiron, for example, or the boggling angles and inclines of a suburban parking garage—but rather to the everyday activities that inform and shape our experience of space. Out of the mute fabric of open terrain, we sew complexly textured quilts of public and private meaning sensible only to us; memory and affect attach themselves to familiar sites and await their resuscitation each time we draw near.

Tangible patterns and structures are, of course, rather easily reproducible in virtual space. Many digital humanities projects succeed in generating multilayer, customizable, information-dense, yet highly legible maps that show, for example, patterns of German-Jewish emigration or mafia territory during Prohibition. These interactive diagrams are inarguably useful, and can provide necessary context and a sense of scale to otherwise dry historical narratives. But experience and memory remain notoriously hard to incorporate into digital interfaces. The challenge today is to push digital mapping technologies (also known as Geographic Information Systems, or GIS) beyond the ontic limitations of the map, before the map “pushes us back,” as Lefebvre predicted, “towards a purely descriptive understanding” of history.

As an Andrew W. Mellon Fellow in Digital Humanities, I have the good fortune to work with Dr. Michael Jarvis, a historian at the University of Rochester specializing in the Atlantic maritime world, in particular the cultural and geopolitical role played by Bermuda during the eighteenth and nineteenth centuries. Virtual St. George’s, his ambitious, year-old digital history project, makes use of multiple media and platforms—architectural rendering, digital cartography, drone photography, 3-D scanning—in an effort to electronically, interactively, immersively reconstruct space-as-experienced and life-as-lived across multiple eras in St. George’s, the colonial capital of the mid-Atlantic island. Jarvis summarizes the project’s objective best:

The project’s various historicized 3D townscapes will help visitors visualize how St. George’s evolved through adaptations to environmental change, world events, fluctuating global markets, local demographic shifts and architectural influences. Engagement can vary from particular exploration of individual building interiors using probate inventories (like a virtual house museum in the style of Colonial Williamsburg, Sturbridge Village, Greenfield Village) to an open-ended urban exploration of the town’s docks, warehouses, and streets filled with animated St. Georgian avatars. We plan ultimately to incorporate game-play missions (such as delivering letters to a royal governor, haggling with a ship captain or merchant, aiding an enslaved sailor to escape) to engage users of different ages in order to give direction and purpose to their spatial explorations, teach social science skills, and represent historical realities.


I see Virtual St. George’s as more than an opportunity to experiment with historical storytelling methods and to spark conversation about the potentials—and practical limits—of the virtual sensorium. It will also model the interoperability of multiple DH platforms that are now used primarily in isolation, and demonstrate the value of the virtual to preservationist efforts. As the Virtual St. George’s graduate assistant, I’ll be blogging here in the future about the project’s progress, as well as about the intersections and interstices of digital history, video games, virtual reality technology (e.g., Oculus Rift), drone photography, critical theory, and phenomenology.

Eitan Freedenberg is an Andrew W. Mellon Fellow in Digital Humanities and a PhD student in the Graduate Program in Visual and Cultural Studies at the University of Rochester.

Blog Research

Prime Time: Diving into TV Guide


In its first semester, Televisual Time encountered some of the problems that face many DH projects, specifically around securing a data set; after all, the time-sensitivity of TV Guide epitomizes the ephemerality of the weekly magazine. Case in point: we procured the first few decades on microfilm, but they were reproduced at such a small scale—up to 4 pages vertically per 35mm reel—that they were difficult for us to read, let alone a computer. The next stop, both oddly and predictably, was eBay, where we procured a selection of issues from each decade at random. Our next step was to scan these issues and submit them to OCR, a task that has proved to have its own complications on account of typeface, symbol usage, and the presence of advertising, to name only a few.

1998 grid

Because preparing these files for digital analysis remains very much a work in progress, our work this semester, as well as last, has taken a different page from Williams: that of distribution. In his 1973 analysis, Williams worked with a selection of categories: News, Documentaries, Education, Arts and Music, Children’s Programs, Drama, Movies, General Entertainment, Sport, Religion, Publicity, and Commercials. He doesn’t say where his “conventional” categories come from, but for us, TV Guide’s evolving categorization of programming offered a fairly straightforward mode of reading. In view of our interest in time, we calculated the general distribution of programming, according to contemporary TV Guide categories, in one issue from each decade, and constructed (admittedly rudimentary) charts to display our results.

Genre distribution charts: 1953, 1966, 1977, 1985, 1998, 2001.

I say “fairly straightforward” because even this is not entirely so. Throughout its decades-long run, TV Guide’s categorization of shows is inconsistent and incomplete: not every show gets a category. Upon cursory examination, it seems possible that categories were more often applied to less popular shows—or perhaps local ones—and left off of well-known shows, putting a premium on “culturally relevant” information rather than comprehensive detail. By the 2000s, the magazine had stopped providing genres at all except for movies, and even those genres are distinctly fewer in number.
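Once categories are extracted, computing the distribution is a simple count. The sketch below (with invented sample listings, not our actual data) shows the one real decision the magazine’s inconsistency forces on us: what to do with shows that carry no category at all.

```python
from collections import Counter

# Invented sample listings (program, category-or-None); None stands in
# for the well-known shows TV Guide left unlabeled.
listings = [
    ("I Love Lucy", None),
    ("Hawkins Falls", "Serial Tale"),
    ("Lucky 7 Ranch", "Western Film"),
    ("Goin' To Town", "Movie"),
    ("Local Roundup", "News"),
]

# Count categories, tracking the unlabeled shows explicitly rather than
# silently dropping them from the distribution.
counts = Counter(cat if cat else "Uncategorized" for _, cat in listings)
total = sum(counts.values())
distribution = {cat: n / total for cat, n in counts.items()}
```

Keeping an explicit “Uncategorized” bucket means the charts reflect the magazine’s gaps instead of hiding them.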

comedy time

That being said, there are some interesting trends that we hope to explore through further analysis. As one example, the chart for the 1980s shows a dramatic uptick in comedies, a development we hypothesize owes to the expansion of cable and the concomitant increase in re-runs. Comedies, as serial shows, are perhaps the most easily syndicated, since they can continue to attract new audiences every week. But whether or not this is the cause—a question still worth exploring—we can also ask how this distribution allows us to think about television’s structuring of time in this period. Another way of thinking about the distribution of comedies, then, is to calculate it in minutes: how long, with the aid of recording technology, one could spend watching all of the comedies on air in a given week. For 1998, this number—13,560 minutes—far exceeds the total number of minutes in a week: 10,080. It’s an odd comparison, but striking nonetheless.
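The arithmetic behind that comparison is easy to check:

```python
# 1998 comedy airtime, from our count, against the length of a week
minutes_per_week = 7 * 24 * 60        # 10,080 minutes in a calendar week
comedy_minutes_1998 = 13_560          # comedy minutes on air that week
ratio = comedy_minutes_1998 / minutes_per_week
```

The ratio comes out to roughly 1.35: more than a full week of comedy per calendar week.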


As Televisual Time develops, we hope that distant reading will bring new insight to these kinds of qualitative questions and, in turn, to a different way of looking at television’s changing presence over time.

Tracy Stuber is a PhD student in Visual and Cultural Studies at the University of Rochester. She is a 2015-2017 Andrew W. Mellon Fellow in the Digital Humanities.

Blog Research

Research Report: The Family We Thought We Knew

How Digital Annotations Challenge Historians’ Assumptions

Blog Research

Learning As We Go Along

DH projects can be fickle beasts. Of course, any sort of research can be unpredictable. Progress often comes in fits and starts, and the path forward is rarely clear. Unforeseen obstacles are part of the game. But the application of digital methods to humanistic questions adds a twist: digital tools have a life all their own, and sometimes things don’t go as planned.

I encountered this personally when working with Dr. Joel Burges on his Televisual Time project. The aim of the project is to employ digital, distant reading techniques to TV Guide Magazine in order to discover how time is structured in and through televisual experience. Like many digital projects, the first step was to build a data set.

Two possibilities for building the data set presented themselves. The first was to transcribe the content by hand, an inconvenient and time-consuming prospect. Since hand transcription wasn’t a live option, a second course was chosen: scanning paper copies of TV Guide and employing Optical Character Recognition (OCR) technology to generate searchable text from the images.
Tracy Stuber, who preceded me as research assistant for the TV Guide project, started the process. She scanned a selection of TV Guide issues and made some first attempts at running OCR software on the images.

But the task of generating a searchable text version of the scans proved surprisingly difficult. Tracy’s first attempt used Adobe’s built-in OCR, but the results were mixed. Take, for example, page 5 from the May 1–7, 1953 edition.

Notice the layout of the page. The stylized title of the article, the multiple columns of text, the images laid out with captions. These visual features, while very familiar to the modern, human eye, proved to be only somewhat readable by a machine. Here is what Adobe’s OCR technology delivered when run on this page:

old favorites
take heart
ed mack’s triumphant return to
television with his Original Amateur
Hour, largely in answer to the demands
of loyal viewers, has raised
hopes of scores of other former favorites
now absent from video screens.
When Mack’s new show was scheduled
for the NBC network, it touched
off speculation that many other shows,
formerly very popular, .were on the
way back.
In Mack’s case, thousands of letters,

There is a lot that worked well here. We have some strange artifacts, like the hard break after “T” and the extra period added between “popular” and “were” in the third-to-last line, but the text is largely in good order. Notice, though, that the image captions were completely omitted; the program didn’t register them at all. A puzzling outcome, possibly due to the layout of the captions. But this was a harbinger of things to come. For although the OCR seemed to work fairly well on articles like this one, the core of TV Guide Magazine is the schedule. And here things got worse. Take, for example, this page from the same issue.

Here’s a close-up of the top of the page.

Note the elaborate visual structure. The top of the page includes captioned images to feature certain programs. The schedule itself is laid out in such a way as to make things easy to understand, but hard for a machine to process. For instance, we have a column indicating the time, a column indicating the channel, and a column listing the program with a description. But the times aren’t repeated for each show. And not every channel airs a program at every time. This visual structure is completely lost on the OCR technology. Here is an excerpt of the Adobe OCR of this page.

4:30 P.M. (5) 5:30 P.M. (7) 6 P.M. (4) 6:15 P.M. (5) 7 P.M. (5)
4 "Goin' To Town"
MOVIE—Lum & Abner run the general
store at Pine Ridge & become
victims of a practical joke perpetrated
by a visiting oil promoter.
7 News With Ulmer Turner
5 Hawkins Falls—Serial Tale
Spec Bassett finds that the honeymoon
is over & so is his marriage.
7 Lucky 7 Ranch—Western Film
“Toll of the Desert”

Obviously, we’ve got some problems. The OCR pulled all of the times into a single line of their own, completely divorcing the times from the listings. It then included the names of the featured programs from the top of the page, again reading them as one run of text while the times associated with them are, once again, separated from their targets. Things go a bit better with the text of the program descriptions: here, at least, we get full sentences, proper capitalization, and the associated channel number. But this information isn’t much help without connecting listings to their times.
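One response would be a post-processing pass that re-imposes the structure the OCR flattened. The sketch below is speculative, with regular expressions tuned to the excerpt above rather than a general solution; it only recovers the (time, channel) slots and (channel, title) listings, and matching the two back together would still remain.

```python
import re

# A few lines of the Adobe OCR output quoted above
ocr_lines = [
    '4:30 P.M. (5) 5:30 P.M. (7) 6 P.M. (4) 6:15 P.M. (5) 7 P.M. (5)',
    '4 "Goin\' To Town"',
    '7 News With Ulmer Turner',
    '5 Hawkins Falls—Serial Tale',
]

# The header line fuses every time slot; split it back into (time, channel).
slot_re = re.compile(r"(\d{1,2}(?::\d{2})? P\.M\.) \((\d+)\)")
slots = slot_re.findall(ocr_lines[0])

# Listing lines begin with a channel number; recover (channel, title).
listing_re = re.compile(r"^(\d+) (.+)$")
listings = [m.groups() for line in ocr_lines[1:]
            if (m := listing_re.match(line))]
```

Even this toy repair depends on the OCR getting the channel digits right, which, as the Tesseract output below the fold shows, is far from guaranteed.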

At this point, I attempted a switch in technologies. Tesseract is an OCR engine originally developed at Hewlett-Packard and now open source, with development long sponsored by Google. I was hopeful that Tesseract might improve upon things. Alas, it was not to be. Here’s Tesseract’s output on the same passage. Forgive the length; I wanted to preserve the output exactly as it was generated.










4:30 P.M. (5)



("8 00 NM‘NUI

UthlUIAVOVUlhul .Q Who N


5:30 P.M. (7)

"Goin' To Town"

MOVIE—Lum & Abner run the gen-
eral store at Pine Ridge & become
victims of a practical joke perpe-
trated by a visiting oil promoter.

News With Ulmer Turner

Hawkins Falls—Serial Tale

Spec Bassett finds that the honey-
moon is over & so is his marriage.

lucky 7 Ranch—Western Film
“Toll of the Desert"

Here, we discover even more problems! And things only get worse when we run more recent issues through the OCR. Take this page from a 2001 edition of the magazine.

And here, in its entirety, is the text obtained from this page via one of the OCR runs:

48 I TV GUIDE Oakland Rebuild (8044/03)

That’s it. Just two paltry lines. And literally none of the information from the grid was registered as characters, let alone correctly. Other pages with a similar structure produced semi-readable results, but Tesseract didn’t see what the human eye sees: the grid structure. So the text read straight across each line without regard for the borders dividing the cells.

All of this led me to the inevitable conclusion. Our data set was not going to be created via the OCR technologies in their present state. And, given how much time hand entering the data would take, the prospect of obtaining a usable data set for a distant reading approach seemed a pipe dream. So Dr. Burges declared the experiment over and moved on to a different research method. And I walked away having learned about the limitations of OCR technologies, the complexity of visually displayed data in print formats, and the way that DH projects have a life of their own.