Use the following methods to extract structured data from documents.

TODO add:

  • signature
  • paragraph

Layout-based methods

| Method | Notes |
| --- | --- |
| Box | Extracts contents from boxes with continuous borders. |
| Checkbox | Extracts true/false for the selection status of checkboxes. |
| Column | Extracts text aligned in a column, from an anchor down to the bottom of the page. |
| Document Range | Extracts text in a range, or extracts image metadata (coordinates). Simpler alternative to the advanced Paragraph method. |
| Fixed Table | Extracts tables where column headings never vary. |
| Intersection | Extracts a target line at the intersection of a horizontal line defined by one anchor and a vertical line defined by a second anchor. |
| Invoice | Extracts an invoice and metadata from a variety of invoice formats. |
| Label | Extracts a line of text that's close to another line. |
| Nearest Checkbox | Extracts true/false for the selection status of the checkbox nearest to the anchor. |
| Paragraph | Extracts paragraphs that partially span the page width, for example from columnar layouts. |
| Passthrough | Extracts anchor text, optionally using RegEx. |
| Regex | Extracts text matching a RegEx. Use RegEx capturing groups in this method, in combination with the Passthrough method, to clean up extracted data. |
| Region | Extracts data from a rectangular region defined by coordinates. Faster alternative to the Box method. |
| Row | Extracts text aligned in a row. |
| Text Table | Extracts tables using only text-positioning data (fast but limited). |
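
To make the geometry behind the Column, Row, Intersection, and Regex methods concrete, here is a minimal Python sketch. It is an illustration only, not the product's configuration syntax or its implementation: the `Line` class, the helper functions, the alignment tolerance `tol`, and the sample page are all hypothetical, and it assumes each line of text is reduced to the top-left corner of its bounding box.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """A line of text plus the top-left corner of its bounding box (hypothetical model)."""
    text: str
    x: float  # left edge; grows to the right
    y: float  # top edge; grows down the page


def find_anchor(lines, anchor):
    """Return the first line whose text contains the anchor, ignoring case."""
    return next(l for l in lines if anchor.lower() in l.text.lower())


def column(lines, anchor, tol=5.0):
    """Lines left-aligned with the anchor and below it, top to bottom."""
    a = find_anchor(lines, anchor)
    below = [l for l in lines if l.y > a.y and abs(l.x - a.x) <= tol]
    return sorted(below, key=lambda l: l.y)


def row(lines, anchor, tol=5.0):
    """Lines at the anchor's height and to its right, left to right."""
    a = find_anchor(lines, anchor)
    right = [l for l in lines if l.x > a.x and abs(l.y - a.y) <= tol]
    return sorted(right, key=lambda l: l.x)


def intersection(lines, row_anchor, col_anchor, tol=5.0):
    """Line at the row anchor's height and the column anchor's left edge."""
    r = find_anchor(lines, row_anchor)
    c = find_anchor(lines, col_anchor)
    for l in lines:
        if abs(l.y - r.y) <= tol and abs(l.x - c.x) <= tol:
            return l
    return None


def regex(lines, pattern):
    """First capturing group of the first line that matches the pattern."""
    for l in lines:
        m = re.search(pattern, l.text)
        if m:
            return m.group(1)
    return None


# Hypothetical page: one labeled value followed by a three-column table.
page = [
    Line("Policy number: AB-1234", 50, 40),
    Line("Coverage", 50, 100), Line("Limit", 200, 100), Line("Premium", 350, 100),
    Line("Liability", 50, 130), Line("$50,000", 200, 130), Line("$120", 350, 130),
    Line("Collision", 50, 160), Line("$25,000", 200, 160), Line("$95", 350, 160),
]

print([l.text for l in column(page, "Limit")])
print([l.text for l in row(page, "Collision")])
print(intersection(page, "Collision", "Premium").text)
print(regex(page, r"Policy number:\s*(\S+)"))
```

In this sketch the capturing group in the last call plays the cleanup role described for the Regex method: the whole line matches, but only the policy number inside the parentheses is returned.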

Large language model (LLM)-based methods

For LLM-based methods, see Natural-language methods.