SenseML to spreadsheet reference
This topic describes the rules Sensible uses to:
-
convert data extracted from one document, such as a PDF, into an Excel spreadsheet or CSV file.
-
combine data extracted from multiple documents into one spreadsheet
Sensible extracts the data as JSON using a SenseML configuration, then applies the rules to create spreadsheet data such as tables, labeled columns, and linked sheets.
CSV files
To generate CSV files, Sensible uses the same rules described in the following sections, except that Sensible uses blank rows instead of tabbed sheets to separate extraction content.
Multi-document spreadsheet
To combine document extractions into one spreadsheet file, Sensible uses the same rules described in the following single-document spreadsheet rules. Sensible uses the following additional rules:
- For all sheets other than the
<fields>
sheet, Sensible creates a new sheet for each document, and prefixes each sheet name with a zero-indexed document number, for exampledoc_0
ordoc_1
. - For the
<fields>
sheet, Sensible appends data as rows under the same column if it finds a matching field ID from another extraction in your call to the/generate_excel/{ids}
API endpoint, or under a new column if it doesn't find a matching field ID. For example, if document A outputs fieldscar_model
andcar_year
, and document B outputscar_model
andmanufacture_year
, Sensible creates a spreadsheet like the following:
Tips
- To avoid manually merging the similarly named columns in the preceding example, Sensible recommends using the same IDs for corresponding fields across different SenseML configs in a document type.
- To configure multi-document spreadsheet output using different rules than those described here, configure rules with Sensible's Zapier integration. You can map single-value field output with this integration.
Single-document spreadsheet
Overview
Sensible transforms JSON data extracted from a single document to a spreadsheet using the following rules:
- the
fields
sheet lists each piece of document data that can be represented as a single-cell value. For example, an extracted total monthly mortgage dollar amount. <field_id>
sheets hold more complex pieces of document data. For example, an extracted table.<field_id>.<index>
sheets hold complex repeating document data. For example, an extracted claims loss run.
For more information, see the following sections.
fields
sheet
fields
sheetThe fields
sheet lists fields and their values as key-value dictionaries, with the field ID as the column heading and value as the row. Sensible outputs a field to this sheet if the field:
-
outputs a single
value
. For example, the Box, Label, Row and Region methods each output a single value by default. -
outputs a predictably short array of values that can easily be stringified in a single cell. Typically this is the result of a Type configuration, for example, the Name type always outputs an array.
Example
Sensible converts the JSON extraction output from the auto_insurance_anyco example described in the Getting started with SenseML to the following spreadsheet:
Spreadsheet output
Example document
The preceding spreadsheet contains data from the following example document:
Example PDF | Download link |
---|
Example configuration
See the Getting started with SenseML for the SenseML configuration for this example.
JSON output
The following JSON document extraction output is the source for this spreadsheet:
{
"policy_period": {
"type": "string",
"value": "April 14, 2021 - Oct 14, 2021"
},
"comprehensive_premium": {
"source": "$150",
"value": 150,
"unit": "$",
"type": "currency"
},
"property_liability_premium": {
"source": "$10",
"value": 10,
"unit": "$",
"type": "currency"
},
"policy_number": {
"value": "123456789",
"type": "string"
}
}
<field_id>
sheets
<field_id>
sheetsEach <field_id>
sheet lists the output of a single field. Sensible outputs a field to this sheet if the field outputs multiple values. For example:
- the Table methods, invoices, and other methods that output nested JSON objects.
- methods that output arrays of unpredictable length, for example, fields with
"match":"all"
configured.
Example
Sensible converts the example JSON output from the example PDF described in the Table method to the following spreadsheet:
The preceding example shows that the fields
sheet lists the corresponding sheets for fields that have complex output. In this case, you must click on the agile_risks_table_updates_monthly
sheet to view the table output.
Example document
The preceding spreadsheet contains data from the following example document:
Example PDF | Download link |
---|
Example configuration
See the Table method for the SenseML configuration for this example.
JSON output
The following JSON extraction output is the source for this spreadsheet:
{
"agile_risks_table_updates_monthly": {
"columns": [
{
"id": "col1_risk_description",
"values": [
{
"value": "Poor task point estimation",
"type": "string"
},
{
"value": "Poor epic scope definition",
"type": "string"
},
{
"value": "Inadequate scrum master training",
"type": "string"
}
]
},
{
"id": "rank_this_month",
"values": [
{
"source": "3",
"value": 3,
"type": "number"
},
{
"source": "1",
"value": 1,
"type": "number"
},
{
"source": "2",
"value": 2,
"type": "number"
}
]
}
]
}
}
<field_id>.<index>
sheets
<field_id>.<index>
sheetsEach <field_id>.<index>
sheet lists the output of a single field that contains complex repeating data, for example, Sections output.
Example
Sensible converts the example JSON output from the example PDF described in the Advanced Sections nested table example topic to the following spreadsheet:
The preceding example shows that Sensible outputs nested sections in linked, indexed sheets.
Example document
The preceding spreadsheet contains data from the following example document:
Example PDF | Download link |
---|
Example configuration
See the Advanced sections nested table example topic for the SenseML configuration for this example.
JSON output
The following JSON extraction output is the source for this spreadsheet:
{
"table_columns": [
{
"employee_category": {
"type": "string",
"value": "Employees paid \u0000100k"
},
"employee_benefit": {
"value": "100% of salary, max $100k",
"type": "string"
},
"reduction_subtable": {
"columns": [
{
"id": "col1_age",
"values": [
{
"source": "65",
"value": 65,
"type": "number"
},
{
"source": "70",
"value": 70,
"type": "number"
},
{
"source": "75",
"value": 75,
"type": "number"
}
]
},
{
"id": "col2_reduction",
"values": [
{
"source": "35%",
"value": 35,
"type": "percentage"
},
{
"source": "60%",
"value": 60,
"type": "percentage"
},
{
"source": "75%",
"value": 75,
"type": "percentage"
}
]
}
]
},
"everything_in_this_vertical_section": {
"type": "string",
"value": "Employees paid \u0000100k Notes Employee benefit 100% of salary, max $100k After a 3 month waiting period Common carrier Not included Benefit reduction Age Reduction Not adjusted for 65 35% inflation 70 60% 75 75% For more details about coverage and benefits, see the following sections."
}
},
{
"employee_category": {
"type": "string",
"value": "All other employees"
},
"employee_benefit": {
"value": "50% of salary, max $50k",
"type": "string"
},
"reduction_subtable": {
"columns": [
{
"id": "col1_age",
"values": [
{
"source": "65",
"value": 65,
"type": "number"
},
{
"source": "70",
"value": 70,
"type": "number"
},
{
"source": "75",
"value": 75,
"type": "number"
}
]
},
{
"id": "col2_reduction",
"values": [
{
"source": "35%",
"value": 35,
"type": "percentage"
},
{
"source": "60%",
"value": 60,
"type": "percentage"
},
{
"source": "75%",
"value": 75,
"type": "percentage"
}
]
}
]
},
"everything_in_this_vertical_section": {
"type": "string",
"value": "All other employees Notes Employee benefit 50% of salary, max $50k After a 3 month waiting period Common carrier Not included Benefit reduction Age Reduction Not adjusted for 65 35% inflation 70 60% 75 75% For more details about coverage and benefits, see the following sections."
}
}
]
}
Updated about 9 hours ago