OCR engine

Specifies the optical character recognition (OCR) engine for extracting text from images. For information about additional OCR options, see OCR.

Enums

The following table shows the enums available for the OCR Engine parameter.

| enum | description |
|---|---|
| Amazon | Default engine for the OCR preprocessor. |
| Microsoft | Default engine for document types. Suited to typewritten documents and large documents up to 50 MB in size. |
| Lazarus | Faster than Microsoft and produces similar output. |
| Google | Faster than Microsoft and suited to handwriting and documents that are 5 pages or fewer. The Google engine doesn't merge words into lines automatically. Use the Merge Lines preprocessor in your configurations to do so. |

Note: When Sensible extracts from portfolios, it uses Microsoft OCR and ignores any OCR settings in the portfolio's document types.
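To make the selection guidance above concrete, here is a minimal sketch of a helper that picks an engine for a document type. The `choose_ocr_engine` function, the `DocProfile` and `EngineChoice` types, and the `prefer_speed` flag are hypothetical illustrations, not part of Sensible's API or configuration format. The sketch only encodes the documented guidance: Google for handwriting and documents of 5 pages or fewer (paired with the Merge Lines preprocessor), Lazarus when speed matters, Microsoft otherwise, and Microsoft always for portfolios. It never returns Amazon, because Amazon is listed as the default for the OCR preprocessor rather than for document types.

```python
from dataclasses import dataclass


# Hypothetical description of a document; these fields are illustrative
# assumptions, not Sensible configuration parameters.
@dataclass
class DocProfile:
    page_count: int
    handwritten: bool = False
    in_portfolio: bool = False


# Hypothetical result type: the engine enum plus whether the configuration
# should also include the Merge Lines preprocessor.
@dataclass
class EngineChoice:
    engine: str
    needs_merge_lines: bool


GOOGLE_MAX_PAGES = 5  # Google is suited to documents of 5 pages or fewer


def choose_ocr_engine(doc: DocProfile, prefer_speed: bool = False) -> EngineChoice:
    """Pick an OCR engine enum following the documented guidance (a sketch)."""
    # Portfolios always use Microsoft OCR; document-type OCR settings are ignored.
    if doc.in_portfolio:
        return EngineChoice("Microsoft", needs_merge_lines=False)
    # Short handwritten documents suit Google, which doesn't merge words
    # into lines, so pair it with the Merge Lines preprocessor.
    if doc.handwritten and doc.page_count <= GOOGLE_MAX_PAGES:
        return EngineChoice("Google", needs_merge_lines=True)
    # Lazarus is faster than Microsoft and produces similar output.
    if prefer_speed:
        return EngineChoice("Lazarus", needs_merge_lines=False)
    # Microsoft is the default engine for document types.
    return EngineChoice("Microsoft", needs_merge_lines=False)


print(choose_ocr_engine(DocProfile(page_count=3, handwritten=True)))
# EngineChoice(engine='Google', needs_merge_lines=True)
```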

Notes

As an alternative to OCR engines, you can use the Query Group method's Multimodal Engine parameter to extract from non-text images or from poor-quality text images, such as handwriting.