Layout-based methods
Use the following methods to extract structured data from documents.
Layout-based methods
Method | Image | Notes |
---|---|---|
Box | ![]() | Extracts contents from boxes with continuous borders. |
Checkbox | ![]() | Extracts true/false for the selection status of checkboxes. |
Column | ![]() | Extracts text aligned in a column, from an anchor down to the bottom of the page. |
Document Range | ![]() | Extracts text in a range, or extracts image metadata (coordinates). Simpler alternative to the advanced Paragraph method. |
Fixed Table | ![]() | Extracts tables where column headings never vary. |
Intersection | ![]() | Extracts a target line at the intersection of a horizontal line defined by an anchor, and a vertical line defined by a second anchor. |
Label | ![]() | Extracts a line of text that's proximate to another line. |
Nearest Checkbox | ![]() | Extracts true/false for the selection status of the checkbox nearest to the anchor. |
Paragraph | ![]() | Extracts paragraphs that partially span the page width, for example from columnar layouts. |
Passthrough | ![]() | Extracts anchor text, optionally using RegEx. |
Regex | | Extracts text matching RegEx. Use RegEx capturing groups in this method to clean up extracted data in combination with the Passthrough method. |
Region | ![]() | Extracts data from a rectangular region defined by coordinates. Faster alternative to Box method. |
Row | ![]() | Extracts text aligned in a row. |
Signature | | Extracts true/false for the signed status of a region. |
Text Table | ![]() | Extracts tables using solely text-positioning data (fast but limited). |
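The Regex row above mentions using capturing groups to clean up extracted data. As a rough illustration of that general idea only, the following sketch uses Python's standard `re` module rather than this product's configuration syntax. The sample line, the pattern, and the helper name `extract_first_group` are hypothetical and chosen for demonstration.

```python
import re

# Hypothetical line of text, standing in for an anchor line that a
# Passthrough-style extraction might return unmodified.
line = "Policy number: ABC-123456 (renewed)"

# The parentheses form a capturing group: the label before it and the
# trailing note after it are matched or ignored, but not captured.
pattern = re.compile(r"Policy number:\s*([A-Z]{3}-\d{6})")


def extract_first_group(pattern, text):
    """Return the first capturing group of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


policy_number = extract_first_group(pattern, line)
print(policy_number)
```

The capturing group keeps only the identifier, so the surrounding label and note are dropped from the result. When the pattern does not match, the helper returns `None` instead of raising an error.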
Large language model (LLM)-based methods
See LLM-based methods.