[D66] Literature is not Data: Against Digital Humanities
Antid Oto
protocosmos66 at gmail.com
Sun Oct 28 18:41:14 CET 2012
http://lareviewofbooks.org/article.php?type&id=1040&fulltext=1&media
Literature is not Data: Against Digital Humanities by Stephen Marche
October 28th, 2012
BIG DATA IS COMING for your books. It’s already come for everything
else. All human endeavor has by now generated its own monadic mass of
data, and through these vast accumulations of ciphers the robots now
endlessly scour for significance much the way cockroaches scour for
nutrition in the enormous bat dung piles hiding in Bornean caves. The
recent Automate This, a smart book with a stupid title, offers a
fascinatingly general look at the new algorithmic culture: 60 percent of
trades on the stock market today take place with virtually no human
oversight. Artificial intelligence has already changed health care and
pop music, baseball, electoral politics and several aspects of the law.
And now, as an afterthought to an afterthought, the algorithms have
arrived at literature, like an army which, having conquered Italy, turns
its attention to San Marino.
The story of how literature became data in the first place is a story of
several related intellectual failures.
In 2002, on a Friday, Larry Page began to end the book as we know it.
Using the 20 percent of his time that Google then allotted to its
engineers for personal projects, Page and Vice-President Marissa Mayer
developed a machine for turning books into data. The original was a
crude plywood affair with simple clamps, a metronome, a scanner, and a
blade for cutting the books into sheets. The process took 40 minutes.
The first refinement Page developed was a means of digitizing books
without cutting off their spines — a gesture of tender-hearted
sentimentality towards print. The great disbinding was to be
metaphorical rather than literal. A team of Page-supervised engineers
developed an infrared camera that took into account the curvature of
pages around the spine. They resurrected a long dormant piece of Optical
Character Recognition software from Hewlett-Packard and released it to
the open-source community for improvements. They then crowd-sourced
textual correction at a minimal cost through a brilliant program called
reCAPTCHA, which employs an anti-bot service to get users to read and
type in words the Optical Character Recognition software can’t
recognize. (A miracle of cleverness: everyone who has entered a security
identification has also, without knowing it, aided the perfection of the
world’s texts.) Soon after, the world’s five largest libraries signed on
as partners. And, more or less just like that, literature became data.
..continued..