
Infrared lights for OCR

As you know, Google has set out to scan, recognize, and index every paper book in the world, that is, all of humanity's knowledge. In reality, though, the process is moving more slowly than one would like. The hardest stage is plain text recognition. An OCR program needs a clear, undistorted image of the page, with every line of text perfectly straight. In practice, a book cannot be pressed flat against the scanner (in many cases the pages cannot be touched at all), so the image shows a characteristic curvature along the edges of the pages. This is currently corrected in software, with varying degrees of success.

However, Google has figured out how to tackle this problem in hardware. Last week the company received a patent for using infrared sensors when scanning books (US Patent No. 7508978). The idea is that an infrared grid makes it possible to build a three-dimensional model of the page's curvature, so the image-straightening program gets exact coordinates for how to transform the image.
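To make the idea concrete, here is a minimal Python sketch of how a known page shape could be used to straighten an image. It is not the method from the patent, only an illustration under strong assumptions: the camera looks straight down, the page curves in one direction only, and the height profile `zs` at horizontal positions `xs` has already been measured (for example, from the infrared grid). The names `arc_lengths` and `unwarp_row` are made up for this example.

```python
import math

def arc_lengths(xs, zs):
    """Cumulative distance along the page surface for one cross-section.

    xs: horizontal positions of the samples as seen by the camera
    zs: measured page height at each of those positions (assumed known)
    """
    s = [0.0]
    for i in range(1, len(xs)):
        s.append(s[-1] + math.hypot(xs[i] - xs[i - 1], zs[i] - zs[i - 1]))
    return s

def unwarp_row(row, xs, zs, n_out):
    """Resample one image row so that output pixels are evenly spaced
    along the page surface rather than along the camera's x axis.

    row: pixel intensities sampled at xs; n_out (>= 2): output width.
    """
    s = arc_lengths(xs, zs)
    total = s[-1]
    out = []
    j = 0
    for k in range(n_out):
        target = total * k / (n_out - 1)
        # Advance to the segment [s[j], s[j + 1]] that contains the target.
        while j < len(s) - 2 and s[j + 1] < target:
            j += 1
        span = s[j + 1] - s[j]
        t = 0.0 if span == 0 else (target - s[j]) / span
        # Linear interpolation between the two neighbouring pixels.
        out.append(row[j] + t * (row[j + 1] - row[j]))
    return out
```

On a flat page (`zs` all zero) the row comes back unchanged. Where the page slopes away from the camera, the surface distance between neighbouring samples grows, so the compressed text there is stretched back out.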


via New Scientist

Source: https://habr.com/ru/post/56563/

