Optical Character Recognition, more commonly known as OCR, is the translation of handwritten, typewritten or printed text into machine-readable, editable text by mechanical or electronic means. OCR is an offline character recognition technique: it works from an image of text that has already been written or printed, not from pen strokes captured as they are made.
OCR is a broad technology that draws on the fields of pattern recognition, artificial intelligence and machine vision. Early OCR relied on purely optical instruments such as mirrors and lenses, but it has since become a purely digital process built on scanners and computer algorithms.
OCR may seem like a relatively new technology, but an early form of it was developed as far back as 1929: a mechanical device that used a photodetector and a set of templates. A template was moved across the text to be recognized, and when the template and a character lined up exactly, no light reached the photodetector and that character was identified. However, this worked only with a predetermined font, and each font needed its own templates. Today's systems can recognize a wide variety of fonts across many languages.
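The exact-match idea behind that device can be sketched in software. The Python below is a minimal illustration, assuming characters are tiny binary bitmaps (1 = ink); the bitmaps, `TEMPLATES`, `mismatch` and `recognize` are hypothetical names chosen for this sketch and do not reproduce the optical mechanism of the 1929 machine. A pixel where glyph and template disagree plays the role of light leaking through to the photodetector.

```python
# Hypothetical 5x3 binary bitmaps (1 = ink) for one single, fixed "font".
TEMPLATES = {
    "I": ("010", "010", "010", "010", "010"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def mismatch(glyph, template):
    """Count pixels where glyph and template disagree (light 'leaking' through)."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph, templates=TEMPLATES):
    """Return the character whose template lines up exactly with the glyph, else None."""
    for char, template in templates.items():
        if mismatch(glyph, template) == 0:  # no light reaches the detector
            return char
    return None


# A "T" in the known font is identified; a serif "I" from another font is not.
print(recognize(("111", "010", "010", "010", "010")))
print(recognize(("111", "010", "010", "010", "111")))
```

The second call returns `None` because no stored template matches the serif glyph exactly, which mirrors the original limitation that every font needed its own templates.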
The latest OCR technology boasts accuracy exceeding 99%. Even so, a 1% error rate still leaves mistakes on a full page of text, so the output must still be reviewed manually for errors. The OCR technology used in scanners is still not very accurate, especially on handwritten documents. Accuracy of 80-90% can be achieved only on neat handwritten characters, and even that still leaves dozens of errors on each page, which makes the technology useful in only limited applications.
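The gap between an accuracy figure and the number of errors a reviewer must fix is simple arithmetic. The Python sketch below assumes accuracy is measured per character and uses a hypothetical page of 2,000 characters; both the function name `expected_errors` and the page size are illustrative assumptions, not figures from any particular OCR product.

```python
def expected_errors(accuracy: float, chars_per_page: int) -> int:
    """Expected number of misrecognized characters on one page.

    Assumes `accuracy` is a per-character rate between 0 and 1.
    """
    return round((1 - accuracy) * chars_per_page)


CHARS_PER_PAGE = 2_000  # hypothetical size of a page of text

for accuracy in (0.99, 0.90, 0.80):
    errors = expected_errors(accuracy, CHARS_PER_PAGE)
    print(f"{accuracy:.0%} accurate -> ~{errors} errors per page")
```

Under these assumptions, even 99% accuracy leaves about 20 errors per page. The real count depends on how much text a page holds and on whether accuracy is counted per character or per word, so the numbers are illustrative only.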