Layout-based methods
Use the following methods to extract structured data from documents.
Layout-based methods
| Method | Image | Notes |
|---|---|---|
| Box | ![]() | Extracts contents from boxes with continuous borders. |
| Checkbox | ![]() | Extracts true/false for the selection status of checkboxes. |
| Column | ![]() | Extracts text aligned in a column, from an anchor down to the bottom of the page. |
| Document Range | ![]() | Extracts text in a range, or extracts image metadata (coordinates). Simpler alternative to the advanced Paragraph method. |
| Fixed Table | ![]() | Extracts tables where column headings never vary. |
| Intersection | ![]() | Extracts a target line at the intersection of a horizontal line defined by an anchor, and a vertical line defined by a second anchor. |
| Label | ![]() | Extracts a line of text that's proximate to another line. |
| Nearest Checkbox | ![]() | Extracts true/false for the selection status of the checkbox nearest to the anchor. |
| Paragraph | ![]() | Extracts paragraphs that partially span the page width, for example from columnar layouts. |
| Passthrough | ![]() | Extracts anchor text, optionally using RegEx. |
| Regex | | Extracts text matching RegEx. Use RegEx capturing groups in this method to clean up extracted data in combination with the Passthrough method. |
| Region | ![]() | Extracts data from a rectangular region defined by coordinates. Faster alternative to the Box method. |
| Row | ![]() | Extracts text aligned in a row. |
| Signature | | Extracts true/false for the signed status of a region. |
| Text Table | ![]() | Extracts tables using solely text-positioning data (fast but limited). |
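
The Regex row above mentions using capturing groups to clean up extracted data. As a rough illustration of that idea only, here is a minimal Python sketch using the standard `re` module. It does not show this product's configuration syntax; the pattern, the `extract_amount` helper, and the sample line are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern: match a dollar amount and capture only the numeric
# part, so the currency symbol and surrounding text are discarded.
AMOUNT_PATTERN = re.compile(r"\$\s*([\d,]+\.\d{2})")


def extract_amount(line: str) -> Optional[str]:
    """Return the captured amount without thousands separators, or None."""
    match = AMOUNT_PATTERN.search(line)
    if match is None:
        return None
    # group(1) is the first capturing group: the digits, commas, and decimals.
    return match.group(1).replace(",", "")


if __name__ == "__main__":
    # Sample anchor-like line of text (made up for this example).
    print(extract_amount("Total due: $ 1,234.50 USD"))
```

The capturing group is what does the cleanup here: the full match includes the `$` sign, but only the parenthesized portion is returned.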
Large language model (LLM)-based methods
See LLM-based methods.