Supported file types

File types

Sensible supports the following file types:

  • PDF
  • Microsoft Word (DOC and DOCX)
  • Spreadsheet formats (XLSX, XLS, XLSM, and CSV)
  • Single-page image formats (JPEG, PNG)
  • Multi-page image formats (TIFF)
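
As a rough illustration of the list above, the following sketch checks a filename against the supported formats before you submit it. The extension spellings (for example `.jpg` and `.tif`), the category labels, and the `file_category` helper are assumptions made for this example. They aren't part of Sensible's SDKs or API, and Sensible may identify file types by means other than the extension.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical mapping built from the list of supported file types.
# The extensions are assumed spellings, not rules published by Sensible.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".doc": "Microsoft Word",
    ".docx": "Microsoft Word",
    ".xlsx": "Spreadsheet",
    ".xls": "Spreadsheet",
    ".xlsm": "Spreadsheet",
    ".csv": "Spreadsheet",
    ".jpg": "Single-page image",
    ".jpeg": "Single-page image",
    ".png": "Single-page image",
    ".tif": "Multi-page image",
    ".tiff": "Multi-page image",
}


def file_category(filename: str) -> Optional[str]:
    """Return the assumed category for a filename, or None if it isn't listed."""
    suffix = PurePath(filename).suffix.lower()  # case-insensitive match
    return SUPPORTED_EXTENSIONS.get(suffix)
```

For example, `file_category("claim.PDF")` returns `"PDF"`, and `file_category("notes.txt")` returns `None`.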

For more information, see the following tables.

Operation context

File format | Sensible app's Extract tab | Single-file extraction with SDKs or API | Portfolio extraction | Email bodies | Email attachments | Classification by type with SDKs or API
PDF
Microsoft Word
Spreadsheet formats
JPEG, PNG [1]
TIFF: n/a

SenseML extraction method

File format | Methods that render non-text pixels [2] | NLP Table method, Fixed Table method [3] | Extraction of text that requires OCR
PDF
Microsoft Word
Spreadsheet formats
JPEG, PNG
TIFF

  1. Most JPEG or PNG files are single-document files, so if you make a portfolio extraction request, Sensible returns a single-document extraction. For the edge case in which a JPEG or PNG is a portfolio file, Sensible returns a single-document extraction from the first document it identifies in the portfolio.
  2. Methods that render non-text image pixels include pixel-based methods (such as the Box, Checkbox, Nearest Checkbox, and Signature methods), multimodal LLM-based methods, and image coordinates returned by the Document Range method.
  3. As alternatives to these Table methods, use the Fixed Table method or the List method.

File sizes

Sensible supports the following file sizes:

Operation | Size limit for /extract/{doc-type} API endpoint | Size limit for asynchronous calls
Single-document file extraction | under 4.5 MB, or under 30 seconds processing time | 6 GB
Portfolio extraction | n/a | 6 GB
Classification | 4.5 MB | 4.5 MB
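
The size limits above can be expressed as a small pre-flight check. This is a minimal sketch, not Sensible's implementation: the `allowed_call_modes` helper, the operation keys, and the `"sync"`/`"async"` labels are hypothetical names, and the sketch assumes decimal units (1 MB = 1,000,000 bytes) and treats every limit as exclusive, which is the conservative reading. It doesn't model the 30-second processing-time limit, which you can't know from file size alone.

```python
MB = 1_000_000        # assumption: decimal megabytes
GB = 1_000_000_000    # assumption: decimal gigabytes

# Limits copied from the file sizes table. "sync" stands for the
# /extract/{doc-type} endpoint; None means the operation isn't available there.
SIZE_LIMITS = {
    "single_document": {"sync": 4.5 * MB, "async": 6 * GB},
    "portfolio": {"sync": None, "async": 6 * GB},
    "classification": {"sync": 4.5 * MB, "async": 4.5 * MB},
}


def allowed_call_modes(operation: str, size_bytes: int) -> list:
    """Return the call modes whose size limit the file fits under."""
    limits = SIZE_LIMITS[operation]
    return [
        mode
        for mode, limit in limits.items()
        if limit is not None and size_bytes < limit  # exclusive limit (assumed)
    ]
```

Under these assumptions, a 10 MB single-document file fits only the asynchronous limit, and a 5 MB file exceeds both classification limits, so the function returns an empty list for it.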

File conversions

  • Word documents: Sensible converts the document to PDF before processing it.
  • Email bodies: Sensible converts the body to PDF before processing it.
  • Spreadsheet documents: OCR settings don't apply to this file type, because Sensible extracts text directly from the file without OCR. Sensible represents the text, both internally and in the Sensible app's editor, as follows:
    • Standardizes the formatting of all text in the file. Each cell contains exactly one line.
    • Standardizes cell height at 0.25 inches and cell width at 1 inch. Overflow text in a cell is still available for extraction, but you can't view it in the Sensible app editor unless you click a line in the rendered document to view its details.
    • Standardizes the maximum page height at 15 inches. Sensible splits longer sheets into consecutive pages.
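
The spreadsheet dimensions above imply a simple estimate of how many pages a sheet becomes. The sketch below assumes that every row occupies exactly one 0.25-inch cell height and that a page has no margins or other vertical overhead. The source doesn't confirm either assumption, so treat the result as an approximation. The `estimated_page_count` helper is a hypothetical name for this example.

```python
import math

MAX_PAGE_HEIGHT_IN = 15.0   # maximum page height, from the list above
CELL_HEIGHT_IN = 0.25       # standardized cell height, from the list above

# Assumption: rows stack with no margins, so 15 / 0.25 = 60 rows fit per page.
ROWS_PER_PAGE = int(MAX_PAGE_HEIGHT_IN / CELL_HEIGHT_IN)


def estimated_page_count(row_count: int) -> int:
    """Estimate how many consecutive pages a sheet with row_count rows spans."""
    if row_count < 1:
        raise ValueError("row_count must be at least 1")
    return math.ceil(row_count / ROWS_PER_PAGE)
```

Under these assumptions, a 60-row sheet fits on one page and a 61-row sheet spills onto a second page.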