Google to include scanned documents in search results for first time

Google is to include scanned documents in its search results for the first time.

Google said: "In the past, scanned documents were rarely included in search results as we could not be sure of their content. Today, that changes. We are now able to perform Optical Character Recognition (OCR) on any scanned documents that we find stored in Adobe's PDF format."

OCR technology lets Google convert a picture of a document into the words it contains.

Whilst Google has indexed documents saved as PDFs for some time, scanned documents are a lot more difficult for a computer to read.

Scanning is the reverse of printing. Printing turns digital words into text on paper, whilst scanning makes a digital picture of the physical paper (and text) so you can store and view it on a computer.

The scanned picture of the text, however, is not quite the same as the original digital words, said Google. "Often you can see tell-tale signs: the ring of a coffee cup, ink smudges, or even fold creases in the pages.

"To people reading these documents, the distinction between words and pictures of words makes little difference, but for a computer the picture is almost unintelligible."
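The gap between words and pictures of words can be shown with a toy example. The sketch below is not Google's method, which the company has not detailed here. It is a minimal illustration that assumes a made-up three-letter bitmap font (`GLYPHS`) and hypothetical helper names (`render`, `distance`, `recognise`). It "prints" text into a grid of pixels, adds a smudge, and then recovers the words by matching each cell of the grid against the known letter shapes.

```python
# Hypothetical 3x5 bitmap font: '#' is ink, '.' is blank paper.
GLYPHS = {
    "C": ["###", "#..", "#..", "#..", "###"],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "R": ["##.", "#.#", "##.", "#.#", "#.#"],
}
GLYPH_WIDTH = 3
GLYPH_HEIGHT = 5


def render(text):
    """'Print' text: turn characters into rows of pixels, with one blank column between letters."""
    return [".".join(GLYPHS[ch][row] for ch in text) for row in range(GLYPH_HEIGHT)]


def distance(a, b):
    """Count the pixels that differ between two letter-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognise(image):
    """Toy 'OCR': cut the image into letter-sized cells and pick the closest known letter for each."""
    text = []
    for x in range(0, len(image[0]), GLYPH_WIDTH + 1):
        cell = [row[x:x + GLYPH_WIDTH] for row in image]
        text.append(min(GLYPHS, key=lambda ch: distance(cell, GLYPHS[ch])))
    return "".join(text)


scan = render("OCR")

# Simulate an ink smudge by filling in one blank pixel inside the "O".
smudged = list(scan)
smudged[2] = smudged[2][:1] + "#" + smudged[2][2:]

for row in smudged:
    print(row)              # what the computer sees: only pixels
print(recognise(smudged))   # the words recovered from those pixels
```

To the program, `smudged` is only a list of pixel rows until `recognise` maps it back to characters. Real scans add skew, noise and thousands of fonts, which is why production OCR is far harder than this nearest-match lookup.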
