

Matrix matching is the technique that early physical photocell-based OCR implemented, rather directly.

Feature extraction decomposes glyphs into "features" such as lines, closed loops, line direction, and line intersections.
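As an illustration of one such feature, the sketch below counts the closed loops in a small binary glyph. It assumes a glyph is given as equal-length strings of `#` (ink) and `.` (background); that representation and the function name `count_closed_loops` are invented for this example and do not come from any particular OCR system.

```python
from typing import List


def count_closed_loops(glyph: List[str]) -> int:
    """Count background regions that are completely enclosed by ink ('#')."""
    width = len(glyph[0])
    # Pad with a background border so all "outside" background forms one region.
    grid = ["." * (width + 2)] + ["." + row + "." for row in glyph] + ["." * (width + 2)]
    height, padded_width = len(grid), len(grid[0])
    seen = [[False] * padded_width for _ in range(height)]

    def flood(start_row: int, start_col: int) -> None:
        # Iterative 4-connected flood fill over background pixels.
        stack = [(start_row, start_col)]
        seen[start_row][start_col] = True
        while stack:
            r, c = stack.pop()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (0 <= nr < height and 0 <= nc < padded_width
                        and not seen[nr][nc] and grid[nr][nc] == "."):
                    seen[nr][nc] = True
                    stack.append((nr, nc))

    flood(0, 0)  # mark the outside background first

    loops = 0
    for r in range(height):
        for c in range(padded_width):
            if grid[r][c] == "." and not seen[r][c]:
                loops += 1  # an unvisited background region is enclosed by ink
                flood(r, c)
    return loops
```

Under these assumptions, a glyph shaped like "O" has one loop, "8" has two, and "L" or "C" has none. A real feature extractor would combine several such measurements rather than rely on one.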

Matrix matching relies on the input glyph being correctly isolated from the rest of the image, and on the stored glyph being in a similar font and at the same scale. It therefore works best with typewritten text and does not work well when new fonts are encountered.

There are two basic types of core OCR algorithm, either of which may produce a ranked list of candidate characters. Matrix matching involves comparing an image to a stored glyph on a pixel-by-pixel basis; it is also known as "pattern matching" or "pattern recognition".
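A minimal sketch of pixel-by-pixel matching that returns a ranked candidate list might look like the following. The 3×5 templates, the `#`/`.` glyph encoding, and the names `match_score` and `rank_candidates` are all hypothetical, chosen only to illustrate the idea. The input glyph is assumed to be already isolated and scaled to the template size.

```python
from typing import Dict, List, Tuple

Glyph = List[str]  # rows of '#' (ink) and '.' (background)

# Hypothetical 3x5 stored glyphs, invented for this illustration.
TEMPLATES: Dict[str, Glyph] = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def match_score(glyph: Glyph, template: Glyph) -> float:
    """Return the fraction of pixels on which glyph and template agree."""
    if len(glyph) != len(template) or any(
        len(g_row) != len(t_row) for g_row, t_row in zip(glyph, template)
    ):
        raise ValueError("glyph and template must have the same dimensions")
    total = sum(len(row) for row in template)
    same = sum(
        a == b
        for g_row, t_row in zip(glyph, template)
        for a, b in zip(g_row, t_row)
    )
    return same / total


def rank_candidates(glyph: Glyph, templates: Dict[str, Glyph]) -> List[Tuple[str, float]]:
    """Score the glyph against every template, best match first."""
    scores = [(label, match_score(glyph, tpl)) for label, tpl in templates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# A "T" with one stray ink pixel in the fourth row.
noisy_t: Glyph = ["###", ".#.", ".#.", "##.", ".#."]
ranking = rank_candidates(noisy_t, TEMPLATES)
```

Here `ranking` lists "T" first, followed by "I" and then "L". The size check in `match_score` reflects the constraint that the stored glyph must be at the same scale as the input.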
