Google to include scanned documents in search results for first time



Antony Savvas

Google is to include scanned documents in its search results for the first time.

Google said: "In the past, scanned documents were rarely included in search results as we could not be sure of their content. Today, that changes. We are now able to perform Optical Character Recognition (OCR) on any scanned documents that we find stored in Adobe's PDF format."

This Optical Character Recognition technology lets Google convert a picture of a document into the words contained in it.

Whilst Google has indexed documents saved as PDFs for some time, scanned documents are a lot more difficult for a computer to read.

Scanning is the reverse of printing. Printing turns digital words into text on paper, whilst scanning makes a digital picture of the physical paper (and text) so you can store and view it on a computer.

The scanned picture of the text, however, is not quite the same as the original digital words, said Google. "Often you can see tell-tale signs: the ring of a coffee cup, ink smudges, or even fold creases in the pages.

"To people reading these documents, the distinction between words and pictures of words makes little difference, but for a computer the picture is almost unintelligible."
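The distinction Google describes can be illustrated with a toy sketch. It is not Google's method: the tiny 3x5 bitmap "font" and the functions `scan` and `toy_ocr` are hypothetical names invented for this illustration, and real OCR has to cope with smudges, creases and varied typefaces rather than a perfectly clean image. The sketch only shows that a text search fails on pixels and succeeds once the characters have been recognised.

```python
# Toy 3x5 bitmap "font" (an assumption for illustration only):
# each glyph is five rows of three pixels, where "1" means ink.
GLYPHS = {
    "H": ("101", "101", "111", "101", "101"),
    "I": ("111", "010", "010", "010", "111"),
}

def scan(text):
    """Render text as rows of pixels, with one blank column between glyphs."""
    return ["0".join(GLYPHS[ch][row] for ch in text) for row in range(5)]

def toy_ocr(rows):
    """Recover characters by matching each 3-pixel-wide cell against known glyphs."""
    lookup = {glyph: ch for ch, glyph in GLYPHS.items()}
    chars = []
    for start in range(0, len(rows[0]), 4):  # 3 pixel columns + 1 spacer column
        cell = tuple(row[start:start + 3] for row in rows)
        chars.append(lookup.get(cell, "?"))  # "?" marks an unrecognised cell
    return "".join(chars)

page = scan("HI")                 # the "scanned" page: pixels only
print("\n".join(page))            # what the computer actually stores
print("HI" in "".join(page))      # searching the pixels for the word fails
print(toy_ocr(page))              # after recognition the word is searchable
```

A person glancing at the printed grid can read the letters at once, whereas the search sees only a run of ones and zeros until the recognition step converts them back into characters.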
