Use the following preprocessors to clean up your documents before extracting structured data. Preprocessors execute in the order you define them in an array.
|Corrects the alignment of documents that are skewed, for example as a result of being photographed at an angle instead of straight on.
|Intelligently replaces Unicode ligatures in a text extraction.
|Corrects oversplit lines.
|Configures advanced prompt settings for all the large language model (LLM)-based methods in a config.
|Selectively OCRs pages in documents containing a mix of digitally generated text and text images (such as scanned text). If the whole PDF is a scan, you don't need to configure this preprocessor.
|Filters out low-scoring pages given a bag of target terms and stop terms.
|Ignores pages outside the start page and end page.
|Removes repeating elements at the top of the page. Ignores header elements that overlap with the page's main body.
|Removes repeating elements at the bottom of the page. Ignores footer elements that overlap with the page's main body.
|In most cases, Sensible corrects page rotation automatically. If it doesn't, configure this preprocessor.
|Corrects the size of text in documents whose scale varies, for example as a result of being scanned or photographed at different scales.
|Corrects undersplit lines.
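Because preprocessors execute in array order, the same steps can give different results when reordered. The following is a minimal Python sketch of that idea only. It is not Sensible's implementation or config syntax: the `page_range`, `remove_header`, and `run_preprocessors` helpers, and the model of a document as a list of pages of text lines, are all hypothetical and exist purely for illustration.

```python
from typing import Callable, List

Page = List[str]        # toy model: a page is a list of text lines
Document = List[Page]   # toy model: a document is a list of pages
Preprocessor = Callable[[Document], Document]


def page_range(start: int, end: int) -> Preprocessor:
    """Hypothetical step: keep only pages whose index is in [start, end]."""
    def step(doc: Document) -> Document:
        return doc[start:end + 1]
    return step


def remove_header() -> Preprocessor:
    """Hypothetical step: drop a first line that repeats on two or more pages."""
    def step(doc: Document) -> Document:
        first_lines = [page[0] for page in doc if page]
        repeated = {line for line in first_lines if first_lines.count(line) >= 2}
        return [page[1:] if page and page[0] in repeated else page for page in doc]
    return step


def run_preprocessors(doc: Document, steps: List[Preprocessor]) -> Document:
    """Apply each step in the order it appears in the list."""
    for step in steps:
        doc = step(doc)
    return doc


doc = [["ACME Corp", "Invoice 1"], ["ACME Corp", "Total: 10"]]

# Header removal sees both pages, so the repeated line is detected and removed.
header_first = run_preprocessors(doc, [remove_header(), page_range(0, 0)])

# Only one page remains when header removal runs, so nothing repeats.
range_first = run_preprocessors(doc, [page_range(0, 0), remove_header()])

print(header_first)
print(range_first)
```

In this toy model, trimming the page range first leaves the header in place, because the header step no longer has a second page to compare against. When you order real preprocessors, consider which steps depend on seeing the whole document.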