Google is to include scanned documents in its search results for the first time.
"In the past, scanned documents were rarely included in search results as we could not be sure of their content. Today, that changes. We are now able to perform Optical Character Recognition (OCR) on any scanned documents that we find stored in Adobe's PDF format."
This Optical Character Recognition technology lets Google convert a picture of a document into the words contained in it.
Whilst Google has indexed documents saved as PDFs for some time, scanned documents are much harder for a computer to read.
Scanning is the reverse of printing. Printing turns digital words into text on paper, whilst scanning makes a digital picture of the physical paper (and text) so you can store and view it on a computer.
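To make the distinction concrete, here is a toy sketch in Python. It is not Google's method, which the article does not describe. The three-letter bitmap font and the `render` and `recognise` helpers are invented purely for illustration. `render` plays the part of the scanner, turning a word into a grid of pixels, and `recognise` plays the part of OCR, matching each letter-sized cell against known shapes.

```python
# Toy 3x5 bitmap font (hypothetical): five rows of three pixels, '#' = ink.
GLYPHS = {
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "C": ("###", "#..", "#..", "#..", "###"),
    "R": ("##.", "#.#", "##.", "#.#", "#.#"),
}
GLYPH_WIDTH = 3
GLYPH_HEIGHT = 5


def render(word):
    """'Scan' a word: return rows of pixels, one blank column between letters."""
    return [".".join(GLYPHS[ch][row] for ch in word) for row in range(GLYPH_HEIGHT)]


def recognise(image):
    """Naive OCR: cut the image into letter-sized cells, pick the closest glyph."""
    letters = []
    for left in range(0, len(image[0]), GLYPH_WIDTH + 1):
        cell = tuple(row[left:left + GLYPH_WIDTH] for row in image)

        def mismatches(ch):
            # Count the pixels where the template and the scanned cell differ.
            return sum(
                a != b
                for template_row, cell_row in zip(GLYPHS[ch], cell)
                for a, b in zip(template_row, cell_row)
            )

        letters.append(min(GLYPHS, key=mismatches))
    return "".join(letters)


image = render("OCR")
print("OCR" in "\n".join(image))  # False: the picture holds pixels, not letters
print(recognise(image))           # OCR
```

A text search finds nothing in the picture until `recognise` has turned it back into letters. Because the sketch picks the closest shape rather than demanding an exact match, it tolerates a stray pixel or two, but real scans vary in font, size, skew and noise, and need far more sophisticated recognition.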
The scanned picture of the text, however, is not quite the same as the original digital words, said Google. "Often you can see tell-tale signs: the ring of a coffee cup, ink smudges, or even fold creases in the pages.
"To people reading these documents, the distinction between words and pictures of words makes little difference, but for a computer the picture is almost unintelligible."