Layout-based methods
Use the following methods to extract structured data from documents.
Method | Notes |
---|---|
Box | Extracts contents from boxes with continuous borders. |
Checkbox | Extracts true/false for the selection status of checkboxes. |
Column | Extracts text aligned in a column, from an anchor down to the bottom of the page. |
Document Range | Extracts text in a range, or extracts image metadata (coordinates). Simpler alternative to the advanced Paragraph method. |
Fixed Table | Extracts tables where column headings never vary. |
Intersection | Extracts a target line at the intersection of a horizontal line defined by one anchor and a vertical line defined by a second anchor. |
Label | Extracts a line of text that's close to another line. |
Nearest Checkbox | Extracts true/false for the selection status of the checkbox nearest to the anchor. |
Paragraph | Extracts paragraphs that partially span the page width, for example from columnar layouts. |
Passthrough | Extracts anchor text, optionally using a regular expression. |
Regex | Extracts text that matches a regular expression. Use capturing groups in this method, in combination with the Passthrough method, to clean up extracted data. |
Region | Extracts data from a rectangular region defined by coordinates. Faster alternative to the Box method. |
Row | Extracts text aligned in a row. |
Signature | Extracts true/false for the signed status of a region. |
Text Table | Extracts tables using solely text-positioning data (fast but limited). |
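
To make the geometry behind the Intersection method concrete, here is a minimal Python sketch. It is not the product's implementation or configuration syntax: the `Line` type, the `find_anchor` and `intersection` helpers, the exact-match anchor rule, and the sample coordinates are all hypothetical, chosen only to illustrate how a horizontal line through one anchor and a vertical line through a second anchor can pick out a target line.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    """Hypothetical line of text with a bounding box (y grows downward)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def find_anchor(lines: List[Line], text: str) -> Optional[Line]:
    """Return the first line whose text equals `text`, ignoring case and padding."""
    wanted = text.strip().lower()
    return next((ln for ln in lines if ln.text.strip().lower() == wanted), None)


def intersection(lines: List[Line], row_anchor: str, col_anchor: str) -> Optional[Line]:
    """Return the line crossed by both the row anchor's horizontal line
    and the column anchor's vertical line, or None if there is none."""
    row = find_anchor(lines, row_anchor)
    col = find_anchor(lines, col_anchor)
    if row is None or col is None:
        return None
    y = (row.top + row.bottom) / 2   # horizontal line through the first anchor
    x = (col.left + col.right) / 2   # vertical line through the second anchor
    for ln in lines:
        if ln is row or ln is col:
            continue  # the anchors themselves are never the target
        if ln.left <= x <= ln.right and ln.top <= y <= ln.bottom:
            return ln
    return None


# Made-up page content: a two-column summary block.
lines = [
    Line("Description", 0, 0, 100, 10),
    Line("Amount", 200, 0, 260, 10),
    Line("Subtotal", 0, 40, 80, 50),
    Line("$90.00", 205, 40, 255, 50),
    Line("Total", 0, 60, 50, 70),
    Line("$100.00", 200, 60, 260, 70),
]

target = intersection(lines, "Total", "Amount")
print(target.text if target else None)
```

In this sketch, the row anchor "Total" fixes the height of the horizontal line and the column anchor "Amount" fixes the position of the vertical line, so the target is the one line whose bounding box contains the point where they cross.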
Large language model (LLM)-based methods
See LLM-based methods.