Getting started with AI-powered extractions
In this tutorial, you'll learn to extract data from a set of similar documents using an AI-powered visual authoring tool, Sensible Instruct. You'll use natural language to instruct Sensible about which data to extract from an example document. Sensible uses large language models (LLMs) such as GPT-4 to extract your target information.
You can then save your descriptions as an extraction configuration, or "config." Publish your config to automate extracting from similar documents.
Use this tutorial if you want a guided tour of configuring AI-powered document extractions in the Sensible app. Or see the following links:
- You can mix and match Sensible Instruct methods with SenseML methods for advanced config authoring. SenseML is a superset of Sensible Instruct. For more information about SenseML versus Sensible Instruct, see Choosing extraction approach. For authoring in SenseML, see Getting started with SenseML.
- If you want to explore without much explanation, sign up for an account and check out our interactive in-app example extractions. For links to the examples, see AI-powered resources.
Get structured data from a bank statement
Let's get started with Sensible Instruct! Sensible Instruct makes it easy to specify the data you want to extract from documents.
In this tutorial, you'll:
- Edit a collection of descriptions ("a config") about the data you want to extract from an example PDF
- Test the config against a second, similar PDF
- Download extracted document data as an Excel sheet
Get an account
Get an account at sensible.so. If you don't have an account, you can still read along to get a rough idea of how things work.
Log into the Sensible app.
Configure the extraction
To view an example bank statement PDF extraction, navigate to https://app.sensible.so/editor/instruct/?d=sensible_instruct_basics&c=bank_statement&g=bank_statement.
You'll see a "config", or list of instructions for extracting from the example document (in the left pane), and extracted data in the right pane.
Take the following steps to edit the config to extract more data from the document.
Extract a query
- To extract a single data point from the document, click Query.
- Edit the query as shown in the following screenshot by entering "checking account number (not savings)" in the query field. Give the query an ID, account_num_checking, then click the Send icon:
- You should see the extracted account number, 8347-3248, populate in the Extracted data section.
- Click Back to fields.
Extract a table
To extract a table, take the following steps:
- Configure the table extraction using the following screenshot and instructions:
|Field name||Method||Overall table description||Column IDs and descriptions|
|savings_transaction_history||Table||"savings transaction history"||date - "date"; description - "description without totals"; amount - "amount"|
- Click the Send icon for each column.
- To verify the extracted data, scroll down in the right pane and compare the Extracted data section to the document in the left pane:
- (Optional) To standardize the representation of the extracted dates and dollar amounts, configure date and currency types as shown in the following screenshots:
You should see that the formatting of the extracted data changes according to the types you specified. For example, Sensible reformats the date 04/11/23 to a standardized output format.
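To make the idea of type standardization concrete, here is a minimal Python sketch of normalizing a date and a dollar amount. It is not Sensible's implementation: the function names, the assumption that 04/11/23 is in month/day/year order, and the ISO 8601 and decimal output formats are all illustrative choices, and Sensible's actual output format may differ.

```python
from datetime import datetime
from decimal import Decimal


def normalize_date(raw: str) -> str:
    """Parse a date assumed to be MM/DD/YY and return it as an ISO 8601 string."""
    return datetime.strptime(raw.strip(), "%m/%d/%y").date().isoformat()


def normalize_currency(raw: str) -> Decimal:
    """Strip the dollar sign and thousands separators from an amount string."""
    cleaned = raw.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)


if __name__ == "__main__":
    # The date comes from the tutorial; the amount is a made-up example value.
    print(normalize_date("04/11/23"))
    print(normalize_currency("$1,234.50"))
```

The point of the sketch is only that a typed field yields one consistent representation regardless of how the value was printed on the statement.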
Publish the config
To publish your config, click Publish, click Production, then click Publish to production:
Test the config against another document
To test the config against a second example document, take the following steps:
- Navigate to https://app.sensible.so/editor/instruct/?d=sensible_instruct_basics&c=bank_statement&g=bank_statement_2. Notice that the document in the left pane changed to a statement for a different customer.
- In the right pane, scroll down to the fields you authored in previous steps. Verify that the extracted information automatically updated to reflect the second example document. For example, the extracted checking account number should no longer be the first statement's 8347-3248.
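The manual check in the steps above can also be pictured as a comparison between two extraction results. The following Python sketch is an illustration only: the dictionaries are hand-written stand-ins for extracted data, the second account number is a made-up placeholder, and `changed_fields` and `missing_fields` are hypothetical helpers, not part of Sensible.

```python
def changed_fields(first: dict, second: dict) -> dict:
    """Return {field_id: (first_value, second_value)} for shared fields whose values differ."""
    return {
        field_id: (first[field_id], second[field_id])
        for field_id in first.keys() & second.keys()
        if first[field_id] != second[field_id]
    }


def missing_fields(first: dict, second: dict) -> set:
    """Return the field IDs extracted from the first document but not from the second."""
    return first.keys() - second.keys()


if __name__ == "__main__":
    statement_1 = {"account_num_checking": "8347-3248"}
    # Placeholder value: the real second statement's number is not shown here.
    statement_2 = {"account_num_checking": "0000-0000"}
    print(changed_fields(statement_1, statement_2))
    print(missing_fields(statement_1, statement_2))
```

If the config generalizes well, you would expect every field ID to appear in both results, with customer-specific values such as the account number differing.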
(Optional) Extract more data
Try extracting other pieces of information using what you learned in previous steps, such as:
- The bank address or customer address
- The Spanish-speaking customer service phone number
- The time period for each account. Hint: To extract repeating data that isn't in table format, use the List method. For example, in this config, the accounts_list field uses the List method.
When you're done making changes, publish the config to save your changes.
(Optional) Export extracted data as spreadsheets
Now that you've tested and published your config, you can upload new bank statements, automatically extract data from them using the config, and download the extracted data as an Excel file.
Take the following steps:
- Download the following PDF document:
|Example PDF||Download link|
- Navigate to the Quick Extraction tab.
- In the dropdown in the right pane, select sensible_instruct_basics / Auto select. The document type, sensible_instruct_basics, contains configs for bank statements and for other kinds of documents, such as resumes and contracts. When you specify Auto select, Sensible automatically chooses the bank config when you upload a bank statement.
- Click Upload document and select the document you downloaded in a previous step.
- Click Run Extraction.
Sensible displays the extracted data as JSON in the right pane. Click Download excel to convert the extracted document data to Excel:
The following spreadsheet shows the example output. The first tab contains fields with single values, for example the start date field, and succeeding tabs contain fields with table output, for example, the account list table.
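The spreadsheet layout described above can be sketched in Python as a split between single-value fields and table fields. The input shape here (a dict whose values are either scalars or lists of row dicts) and the sample row are assumptions for illustration; Sensible's real JSON output is richer, and `split_into_tabs` is a hypothetical helper.

```python
def split_into_tabs(extracted: dict) -> dict:
    """Group single-value fields into one tab and give each table field its own tab."""
    tabs = {"single_values": {}}
    for field_id, value in extracted.items():
        if isinstance(value, list):
            # A list of row dicts stands in for table output: one tab per table.
            tabs[field_id] = value
        else:
            tabs["single_values"][field_id] = value
    return tabs


if __name__ == "__main__":
    extracted = {
        "account_num_checking": "8347-3248",
        # Hypothetical row; only the date value appears in the tutorial text.
        "savings_transaction_history": [
            {"date": "04/11/23", "description": "example", "amount": "0.00"},
        ],
    }
    for tab_name, contents in split_into_tabs(extracted).items():
        print(tab_name, contents)
```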
Note: Each downloaded Excel file contains the data from one document. To combine extracted documents into one Excel file, use the Sensible API.
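As a rough picture of what combining documents involves, the following Python sketch merges the single-value fields from several documents into one table, with one row per document. It does not call the Sensible API: it assumes you have already obtained each document's fields as a plain dict, it writes CSV rather than Excel to stay within the standard library, and the document names and the `combine_to_csv` helper are hypothetical.

```python
import csv
import io


def combine_to_csv(documents: dict) -> str:
    """Build one CSV from {document_name: {field_id: value}}, one row per document."""
    # Collect every field ID seen in any document, in first-seen order.
    field_ids = []
    for fields in documents.values():
        for field_id in fields:
            if field_id not in field_ids:
                field_ids.append(field_id)

    buffer = io.StringIO()
    # restval fills in a blank cell when a document lacks a field.
    writer = csv.DictWriter(buffer, fieldnames=["document", *field_ids], restval="")
    writer.writeheader()
    for name, fields in documents.items():
        writer.writerow({"document": name, **fields})
    return buffer.getvalue()


if __name__ == "__main__":
    documents = {
        "statement_1.pdf": {"account_num_checking": "8347-3248"},
        # Placeholder value for a second, hypothetical statement.
        "statement_2.pdf": {"account_num_checking": "0000-0000"},
    }
    print(combine_to_csv(documents))
```

Table fields would need their own sheet or file per field ID, mirroring the per-table tabs in the downloaded Excel file.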
Congratulations! You edited your first config and extracted your first document data.
If you want to process bank statements generated by a different company, you can import our pre-authored configs for several major banks and get started with out-of-the-box extractions.
Or, create a new config for your custom documents:
In the Document Types tab, click New document type to create a new document type and name it. Leave the defaults and click Create.
In the document type's Reference documents tab, upload an example of the type of PDF document you want to extract from.
In the document type's Configurations tab, create a new test configuration, and click the configuration you created to edit it.
Click Sensible Instruct and create fields to extract data using what you've learned in this guide.