Google to include scanned documents in search results for first time

Antony Savvas

Google is to include scanned documents in its search results for the first time.

Google said: "In the past, scanned documents were rarely included in search results as we could not be sure of their content. Today, that changes. We are now able to perform Optical Character Recognition (OCR) on any scanned documents that we find stored in Adobe's PDF format."

OCR technology lets Google convert a picture of a document into the words it contains.

Whilst Google has indexed documents saved as PDFs for some time, scanned documents are a lot more difficult for a computer to read.

Scanning is the reverse of printing. Printing turns digital words into text on paper, whilst scanning makes a digital picture of the physical paper (and text) so you can store and view it on a computer.

The scanned picture of the text, however, is not quite the same as the original digital words, said Google. "Often you can see tell-tale signs: the ring of a coffee cup, ink smudges, or even fold creases in the pages.

"To people reading these documents, the distinction between words and pictures of words makes little difference, but for a computer the picture is almost unintelligible."

