Now Enjoy Searching PDF Files on Google!
Scanned documents rarely appeared in Google search results because the search engine could not read their content: a scan is an image of text, not text itself. That has now changed. Using optical character recognition (OCR) software, Google has made it possible for users to search scanned documents hosted on the Web in PDF, the file format developed by Adobe Systems.
Google product manager Evin Levey noted that the company is using the technology to convert scanned documents into equivalent text so that they become searchable. These files can then be indexed and returned as results for Google search queries, making life much simpler for Web surfers. Levey called this a small but important step forward in Google’s mission to make the world’s information accessible and useful.
Delight for Authors, Publishers and Readers:
The Mountain View, California-based firm also expects the OCR technology to aid Google Book Search, the contentious book-scanning project that the search giant unveiled at the 2004 Frankfurt Book Fair. Since then, Google has been scanning about 3,000 titles a day from the collections of the world’s major libraries.
Although the project initially raised copyright issues, Google has just concluded a deal with the Association of American Publishers and the Authors Guild that allows it to expand online access to a host of in-copyright books and other written materials in the US. The deal resolves claims that had challenged Google’s plan to digitize, search and display snippets of in-copyright books, and to share digital copies with libraries, without the explicit permission of the copyright holder.
Google’s Chief Legal Officer, David Drummond, called the agreement ‘groundbreaking’ because it will give readers online access to a number of in-copyright books for the first time. He explained that the move will also create a new market in which authors and publishers can sell their works. Finally, he said that it will support the efforts of Google’s library partners to preserve their collections while making books more available to readers, academic researchers and students.
Filling the Gap:
Given the exponential growth of multimedia on the Web, however, today’s text-based search technology is evidently inadequate. Search engines can find only multimedia that has been tagged with text, a time-consuming step that content producers often skip.
It is therefore clear why several researchers are racing to find methods that will let search providers scan multimedia content directly and then match it to users’ search queries and to advertisers’ placement requests. Interestingly, Adobe Systems has already taken steps toward the next generation of search technology.
In July this year, the firm announced that its Adobe Flash Player technology had been optimized to allow search engines to index multimedia content created in the Flash file format, content that was previously invisible to them.
Adobe’s Vice President, David Wadhwani, said that the company is initially working with Google and Yahoo to significantly improve search of this rich content on the Web. He also said that Adobe intends to make this capability more widely available for the benefit of developers, content publishers and end users.