Supported file types
File types
Sensible supports the following file types:
Operation | Microsoft Word (DOC and DOCX) | Microsoft Excel (XLSX) | image formats (JPEG, PNG, and TIFF) | |
---|---|---|---|---|
Sensible app's Extract tab | ✅ | ✅ | ✅ | ❌ |
Single-file extraction with SDKs or API | ✅ | ✅ | ✅ | ✅ |
Portfolio extraction with SDKs or API | ✅ | ✅ | ✅ | ❌ |
Classification with SDKs or API | ✅ | ✅ | ✅ | ✅ |
File sizes
Sensible supports the following file sizes:
Operation | Size limit for /extract/{doc-type} API endpoint | Size limit for asynchronous calls |
---|---|---|
Single-document file extraction | under 4.5 MB, or under 30 seconds of processing time | 6 GB |
Portfolio extraction | n/a | 6 GB |
Classification | 4.5 MB | 4.5 MB |
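As a rough illustration of the size limits in the table above, the following sketch picks a call mode from a file's size before upload. The function name `choose_call_mode`, the operation keys, and the use of decimal megabytes and gigabytes are assumptions made for this example and aren't part of Sensible's SDKs or API. The sketch checks size only, because the 30-second processing limit can't be known ahead of time.

```python
from typing import Optional

MB = 1_000_000       # assumption: decimal megabytes
GB = 1_000_000_000   # assumption: decimal gigabytes

# (synchronous limit, asynchronous limit) in bytes, taken from the table.
# None means the operation isn't available in that mode.
SIZE_LIMITS = {
    "single_file_extraction": (int(4.5 * MB), 6 * GB),
    "portfolio_extraction": (None, 6 * GB),
    "classification": (int(4.5 * MB), int(4.5 * MB)),
}


def choose_call_mode(operation: str, size_bytes: int) -> Optional[str]:
    """Return "sync", "async", or None if the file exceeds both limits."""
    sync_limit, async_limit = SIZE_LIMITS[operation]
    # "under 4.5 MB" is read here as a strict comparison (an assumption).
    if sync_limit is not None and size_bytes < sync_limit:
        return "sync"
    if size_bytes <= async_limit:
        return "async"
    return None
```

For example, under these assumptions a 10 MB file submitted for single-file extraction falls back to an asynchronous call, while a 5 MB file submitted for classification is rejected in either mode.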
Notes
- When extracting from image file formats, Sensible ignores OCR or OCR preprocessor settings you configure in the document type or SenseML configuration. For more information about OCR, see OCR level.
- For DOC and DOCX documents, Sensible converts the document to PDF before processing it.
- For XLSX documents, Sensible extracts text directly from the file. Sensible takes these steps:
  - Standardizes the formatting of all text in the file. Each cell contains exactly one line.
  - Standardizes cell height at 0.25 inches and cell width at 1 inch. Overflow text in a cell is still available for extraction but isn't viewable in the JSON editor unless you click on a line in the rendered document to view its details.
  - Standardizes the maximum page height at 15 inches. Sensible splits longer sheets into consecutive pages.
  - Sensible doesn't support the following methods for this file type:
    - Pixel-based methods, such as the Box, Checkbox, Nearest Checkbox, and Signature methods; images returned by the Multi Modal Engine parameter on the Query Group method; and image coordinates returned by the Document Range method.
    - OCR-based methods, such as the NLP Table and Fixed Table methods. Use the Text Table or List methods as alternatives. OCR settings don't apply to this file type.
- For TIFF documents, SenseML methods that attempt to render pages return an error, including:
  - Pixel- or image-based methods, such as the Box, Checkbox, and Signature methods, and image coordinates returned by the Document Range method.
  - The Fixed Table method with the Stop parameter specified. Use the Text Table method as an alternative.
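To make the XLSX standardization in the notes above concrete, here is a minimal arithmetic sketch of how rows might map to pages. It assumes that every row occupies exactly one standardized cell height and that pages have no margins or headers. Neither assumption is confirmed by this page, so treat the numbers as an estimate. The function names are hypothetical.

```python
CELL_HEIGHT_IN = 0.25     # standardized cell height, from the notes above
MAX_PAGE_HEIGHT_IN = 15   # standardized maximum page height, from the notes above

# Assumption: rows fill the page completely, with no margins.
ROWS_PER_PAGE = int(MAX_PAGE_HEIGHT_IN / CELL_HEIGHT_IN)


def pages_needed(row_count: int) -> int:
    """Estimate how many consecutive pages a sheet with row_count rows spans."""
    return (row_count + ROWS_PER_PAGE - 1) // ROWS_PER_PAGE


def page_of_row(row_number: int) -> int:
    """Estimate the 1-based page that a 1-based spreadsheet row lands on."""
    return (row_number - 1) // ROWS_PER_PAGE + 1
```

Under these assumptions, a page holds 60 rows, so a 150-row sheet would span three pages and row 61 would be the first row on page 2.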