Classifying documents by type
Sensible supports two levels of document classification:
- Classify a document by its similarity to each document type you define in your Sensible account.
- Classify a document by its subtype during the extraction workflow. By default, Sensible performs this step automatically. For more information, see Devops platform.
This topic covers classifying a document by its type.
For example, if you define a bank statements type and a 1040s type in your account, you can classify 1040 forms, 1099 forms, Bank of America statements, Chase statements, and other documents into those two types. In this scenario, for a 2023-1-1_bankofamerica_statement_jon_doe.pdf document, Sensible:
- Classifies this document into the bank_statements document type.
- Classifies the statement by its similarity to reference documents in the bank_statements document type. The highest score is for a Bank of America sample statement.
- Provides metadata for the classification, including similarity scores for this document compared to each document type in your Sensible account and to each reference document in the bank_statements type.
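As a rough illustration of how the results above relate to the similarity scores, the following Python sketch picks the highest-scoring document type and reference document. The dictionary layout, the field names (`type_scores`, `reference_scores`), the reference document names, and the score values are all hypothetical and invented for this example. They are not Sensible's actual response schema.

```python
from typing import Dict, Tuple


def best_match(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the (name, score) pair with the highest similarity score."""
    if not scores:
        raise ValueError("no scores to compare")
    name = max(scores, key=lambda key: scores[key])
    return name, scores[name]


# Hypothetical metadata for 2023-1-1_bankofamerica_statement_jon_doe.pdf.
# Field names and numbers are made up for illustration only.
example_metadata = {
    "type_scores": {"bank_statements": 0.91, "1040s": 0.12},
    "reference_scores": {
        "bank_of_america_sample": 0.95,
        "chase_sample": 0.78,
    },
}

# Closest document type in the account.
doc_type, type_score = best_match(example_metadata["type_scores"])

# Closest reference document within that type.
closest_reference, reference_score = best_match(
    example_metadata["reference_scores"]
)
```

With these made-up scores, the document lands in the bank_statements type and its closest reference document is the Bank of America sample.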
Use document type classification:
- Prior to an extraction workflow. For example, determine which documents to extract prior to calling a Sensible extraction endpoint.
- Independent from an extraction workflow. For example, determine where to route each document or to label each document in a system of record.
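Both uses above come down to a decision keyed on the classified document type. The following Python sketch shows one possible shape for that logic. The destination names, the set of types to extract, and the manual-review fallback are hypothetical choices for illustration, not part of Sensible.

```python
from typing import Dict, Set

# Hypothetical destinations keyed by document type name.
ROUTES: Dict[str, str] = {
    "bank_statements": "statements-folder",
    "1040s": "tax-forms-folder",
}

# Hypothetical set of document types worth sending on to extraction.
TYPES_TO_EXTRACT: Set[str] = {"bank_statements"}


def should_extract(doc_type: str) -> bool:
    """Decide, before calling an extraction endpoint, whether to extract."""
    return doc_type in TYPES_TO_EXTRACT


def route(doc_type: str) -> str:
    """Pick a destination for a document; unknown types go to manual review."""
    return ROUTES.get(doc_type, "manual-review")
```

For example, `should_extract("bank_statements")` gates the extraction call, while `route("1040s")` labels or files the document without extracting anything.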
To improve classification results, Sensible recommends that a document type includes a sample set of reference documents that represent the diversity you expect to see in the document type. To use a document type for classification, Sensible requires that the type contains at least one reference document.
To classify documents, use the Sensible API or SDKs.
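As a minimal sketch of what an API call might look like, the following Python code builds (but does not send) an HTTP request that posts a document's bytes for classification. The endpoint URL is a placeholder you must supply, and the bearer-token header and `application/pdf` content type are assumptions. Confirm the real endpoint, authentication scheme, and request body against the Sensible API reference before using anything like this.

```python
import urllib.request


def build_classify_request(
    url: str, api_key: str, pdf_bytes: bytes
) -> urllib.request.Request:
    """Build a POST request carrying the raw document bytes.

    Assumptions (verify against the API reference): the service accepts
    the document as the request body and uses bearer-token authentication.
    """
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/pdf",
        },
        method="POST",
    )


# Placeholder values for illustration; nothing is sent over the network.
request = build_classify_request(
    "https://api.example.com/classify",  # hypothetical URL, not a real endpoint
    "YOUR_API_KEY",
    b"%PDF-1.7 placeholder bytes",
)
```

To actually send the request, you would pass it to `urllib.request.urlopen` and parse the JSON body of the response.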