PROJECTS :: OCR Correction of Digital California Newspapers, 1850-1930
Contact: Campaign president John Lumea
The California Digital Newspaper Collection is an extraordinary searchable resource that places page scans of the original editions of more than 50 California newspapers — mostly from the late 19th and early 20th centuries — next to complete HTML texts of these pages.
The Collection includes the Daily Alta California (1849-1891); the Pacific Appeal (1862-1880), where Emperor Norton published most of his proclamations; and the Sacramento Daily Union (1851-1899). These holdings make the Collection one of the best and most comprehensive troves of Emperor-related newspaper items available.
There is a caveat. The HTML texts are generated from scans of the original newspaper pages using Optical Character Recognition (OCR) technology, which is very good but far from perfect. The quality of the resulting text can vary widely, from mildly incorrect to inscrutably messy.
This matters because online searches of the Collection are run against the OCR texts; if a name or phrase is rendered incorrectly in the OCR text, a search for it may fail to find the item.
Thankfully, the Collection includes a crowd-sourcing function that allows anyone to correct the OCR texts, simply by signing up for a free account with an email address and a password.
The Emperor's Bridge Campaign seeks to aid Emperor Norton research by correcting all of the Collection's newspaper items that feature references to him.
To learn more, and to participate in this project, please contact Campaign president John Lumea.