Monday, August 18, 2008

ReCAPTCHA Helps Decipher OCR Text

From the "Nice to Know" department:

You may have heard of "reCAPTCHA," a variant of CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). A CAPTCHA asks you to type the characters of a distorted word before it grants access to a Website or accepts an order. This confirms that the data was entered by a person and not a bot, which stops scammers and spammers from exploiting Websites to send out illegal e-mails or harvest addresses.

It is estimated that CAPTCHA schemes are used about 100 million times every day.

reCAPTCHA is used to decipher words that OCR (optical character recognition) programs could not read correctly. The BBC reports that reCAPTCHA, created by Luis von Ahn at Carnegie Mellon University in Pittsburgh, farms out work to about 40,000 sites and now collects about four million responses every day. In the last year it has helped resolve more than 440 million words and has just helped to complete the conversion of the entire archive of the New York Times from 1908 into digital form.

So you may not be participating in SETI, the global computer network in the Search for ExtraTerrestrial Intelligence, or in any other globally distributed computing effort, but at some point you have almost surely helped complete the work of digitizing the New York Times!

