Sections

Extracts data from a document that contains complex or repeated elements ("sections"). In effect, a "section" defines a repeating document inside a document, with its own fields.

The following image shows an example of a document containing "claims" sections, where each section starts with a claim number and ends below the date of claim:


For the preceding example, you can configure Sensible to return a claims array, where each object in the array contains a claim_number, claim_date, claimant_last_name, and other fields. For more information, see Claims loss run example.
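
As a rough sketch of the configuration described above, a sections field for the claims example might look like the following. The Match object syntax (`type`, `text`) and the Label method are general SenseML features that this topic doesn't define, and the anchor text and label positions are assumptions about the example document's layout. The comments assume your editor accepts JSON5-style comments; remove them if it doesn't.

```json
{
  "fields": [
    {
      /* returns an array named "claims", with one object per section */
      "id": "claims",
      "type": "sections",
      "range": {
        "direction": "horizontal",
        "anchor": {
          /* assumed text: each section starts at a line that includes "claim number" */
          "match": { "type": "includes", "text": "claim number" }
        },
        /* assumed text: each section ends just below the "date of claim" line */
        "stop": { "type": "includes", "text": "date of claim" }
      },
      /* fields that repeat in every section */
      "fields": [
        {
          "id": "claim_number",
          "anchor": "claim number",
          /* assumed layout: the value sits to the right of its label */
          "method": { "id": "label", "position": "right" }
        },
        {
          "id": "claim_date",
          "anchor": "date of claim",
          "method": { "id": "label", "position": "right" }
        }
      ]
    }
  ]
}
```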

You can define "horizontal" sections ("direction": "horizontal"), as shown in the preceding image, or you can define columnar sections ("direction":"vertical").

Horizontal sections:

The following image shows horizontal sections. For more information, see the following Parameters section, and see Section nuances.


Vertical sections:

The following image shows columnar vertical sections. For more information, see the following Parameters section, and see Section nuances.
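
For comparison with horizontal sections, a minimal sketch of a vertical sections field follows. The field IDs, the anchor and stop text, and the assumption that each column repeats a "deductible" label with its value below are all hypothetical; only `direction`, `anchor.match`, and `stop` come from the Parameters tables in this topic. Comments assume JSON5-style comment support.

```json
{
  "fields": [
    {
      /* hypothetical field ID: one array element per detected column */
      "id": "plan_columns",
      "type": "sections",
      "range": {
        "direction": "vertical",
        "anchor": {
          /* hypothetical text: the line that marks the shared top boundary of the columns */
          "match": { "type": "startsWith", "text": "plan name" }
        },
        /* hypothetical text: the line that marks the shared bottom boundary of the columns */
        "stop": { "type": "startsWith", "text": "notes" }
      },
      "fields": [
        {
          /* hypothetical field: assumes each column repeats this label */
          "id": "deductible",
          "anchor": "deductible",
          "method": { "id": "label", "position": "below" }
        }
      ]
    }
  ]
}
```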


Parameters

key | value | description
id (required) | string | Specifies a field ID for an array of sections to extract in the document area defined by the Range parameter.
type | sections | Specifies that this field extracts sections.
range (required) | object | Specifies the document area from which to extract an array of sections.
horizontal sections: The sections can span pages and non-section text. For example, in the preceding claims image, sections can interleave with month headings.

vertical sections: The sections can span pages. By default, they can't span text that breaks the column format.
For the Range object's parameters, see the following table.
sections | | Specifies sections inside sections. Use this for complex sections that contain nested repeated elements, for example, a grid of tables. Each nested section searches for its range inside each parent section's range.
display | boolean. Default: true | Specifies to display the start and end of each section as brackets overlaid on the rendered document in the Sensible app. Use the brackets to visually troubleshoot sections.
requiredFields | object | Array of field IDs that must be non-null for Sensible to return a section.
In the Claims loss run example, add the following code to the sections to omit claims that lack a phone number:
"requiredFields": ["phone_number"],
With the preceding code, Sensible omits claim number 9876543211 from the example output.
fields | array of fields | Specifies fields to extract information that you expect to repeat in each section.
computed_fields | array of computed fields | Specifies to output computed fields to each section. The computed fields have access to each section's fields. To get access to and transform the output of fields that aren't in the sections' range, use the Copy To Section method.
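
To show where the top-level parameters sit relative to each other, here is a hedged sketch of a sections field that combines `display`, `requiredFields`, and `fields`. The anchor text, the `phone_number` field's anchor, and its label position are assumptions for illustration; the Match and Label syntax is general SenseML that this topic doesn't define. Comments assume JSON5-style comment support.

```json
{
  "fields": [
    {
      "id": "claims",
      "type": "sections",
      /* hide the section brackets in the Sensible app (default is true) */
      "display": false,
      "range": {
        "anchor": {
          /* assumed text for the repeated starting line */
          "match": { "type": "includes", "text": "claim number" }
        }
      },
      /* Sensible returns a section only if phone_number is non-null */
      "requiredFields": ["phone_number"],
      "fields": [
        {
          "id": "phone_number",
          /* assumed label text and position */
          "anchor": "phone",
          "method": { "id": "label", "position": "right" }
        }
      ]
    }
  ]
}
```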

Range parameters

See the following table for details about the Range object parameters:

key | value | description (given separately for horizontal sections and for vertical sections where they differ)
direction | horizontal, vertical. default: horizontal
horizontal sections: If set to horizontal, Sensible searches for horizontal sections. For an illustration of this behavior, see Section nuances.
vertical sections: If set to vertical, Sensible detects columnar sections in a single document area defined by the Range parameter.
In detail, Sensible searches left-to-right for columns that contain repeated text in the first-found document area defined by the Range parameter.
For example, use vertical sections to extract tables nested in tables, tables with row labels, or other complex text layouts.
For an illustration of vertical section behavior, see Section nuances.
anchor (required) | Anchor object, or array of Match objects
horizontal sections: Contains the following parameters:

start: Ignores anything in the document before this line. If undefined, Sensible starts searching for the Match parameter at the beginning of the document.

match (required): Specifies the repeated starting line of each section. In the Claims loss run example, specify "Claim number". Each section starts at the top boundary of this starting line, and excludes text to the right of the line. If sections lack an easy-to-match starting line, use the Require Stop and Offset Y parameters to start each section above or below the line matched by this parameter.

end: Ignores anchor matches in the document after this line. For example, to extract solely September claims in the preceding image, specify "October".

vertical sections: Same behavior as for horizontal sections, with the following exception:
The Match parameter specifies the horizontal line that defines the shared top boundary of all the columnar sections. For more information about column recognition, see Section nuances.
stop | Anchor object, or array of Match objects
horizontal sections: Specifies the bottom boundary of each section. In the Claims loss run example, if you specify "Date of claim", then each section stops at a horizontal line below the bottom boundary of the stop line "Date of claim".
Sensible defines the Stop horizontal line by finding the top boundary of the stop line if found, or the top boundary of the next section's starting line, then applies a default offset of 0.08" down or up the page, respectively.
If you don't specify this parameter, each section stops at the top boundary of the next section's starting line. In this case, the last section continues to the end of the document.
vertical sections: Specifies the horizontal line that defines the shared bottom boundary of all the columnar sections. If not specified, Sensible ends the columns at the first line that spans multiple columns. If specified, Sensible ignores lines that span multiple columns. If the spanning lines occur mid-column, you can configure the Line Filters parameter as an alternative to this parameter.
For more information, see Section nuances.
offsetY | number in inches. Positive values offset down the page, negative values offset up the page.
horizontal sections: Specifies the number of inches by which to offset each section's top boundary from the anchor's Match parameter.
By default each section starts at the top boundary of the anchor's Match parameter. If you specify Offset Y, each section starts at that top boundary plus the offset. For example, configure this when each section lacks an easy-to-match first line.
vertical sections: Specifies the number of inches by which to offset the columns' shared top boundary from the anchor's Match parameter.
For example, configure this when you want to exclude non-columnar text from columnar sections.
stopOffsetY | number in inches. Positive values offset down the page, negative values offset up the page.
horizontal sections: Specifies the number of inches by which to offset each section's bottom boundary from the horizontal line specified by the Stop parameter.
vertical sections: Specifies the number of inches by which to offset the columns' shared bottom boundary from the horizontal line specified by the Stop parameter.
tolerance | number in inches. default: 0.08
Configure this option for your Stop parameter if the stop line and the lines immediately before and after it have an unusual font size.
For example, if your font size is a tiny 1.44 pt (0.02 inches), set this parameter to 0.01.
In detail, Sensible defines the Stop horizontal line by finding the top boundary of the stop line, or, if you don't specify a stop line, the top boundary of the next section's starting line. It then applies a default offset of 0.08" down or up the page, respectively. This parameter configures the default offset.
Parameters specific to horizontal sections
externalRange | object
horizontal sections: (Advanced) Enables anchoring on text that's external to the sections in the sections' field anchors. For example, use an external range with the Intersection method when sections lack internal anchoring candidates.
The external range defines a vertical range anywhere in the document. You can configure the external range to be static, or to repeat relative to each section.
Contains the following parameters:

anchor (required): An Anchor object. The external range starts at the top boundary of this starting line, and the range's scope includes text to the left of this line. If the range lacks an easy-to-match first line, you can use the Offset Y parameter to start the range above or below the line matched by this parameter.

anchorIsAbsolute: (default: false). If false, Sensible creates dynamic external ranges, each relative to a section start. For example, configure dynamic external ranges if you want to anchor each section's fields on variably positioned page headings. For more information, see Dynamic external range example. Sensible starts searching for dynamic external ranges in the lines succeeding the start of each section. To search for dynamic external ranges that precede each section, use "reverse":"true" on the external range's anchor.
If true, Sensible creates one static external range in the document, searching from the start of the document. For an example of a static external range, see Static external range example.

stop (required): A Match object defining the end of the external range. Sensible defines the Stop horizontal line by finding the top boundary of the stop line, then applies a default offset of 0.08" down the page.

offsetY: Specifies the number of inches by which to offset the range's top boundary from the anchor's Match parameter.

stopOffsetY: Specifies the number of inches by which to offset the range's bottom boundary from the Stop parameter.

vertical sections: not supported
requireStop | Boolean. default: false
horizontal sections: If true, each section ends when it matches the Stop parameter, instead of the default behavior of ending at the next starting line specified in the anchor's Match parameter.
Note: Configure this parameter for horizontal sections when the starting line repeats in each section, to avoid prematurely ending each section.
You don't need to configure this parameter for matches that are on the same horizontal line as the anchor's Match parameter. For more information, see Multiple anchors in section.
vertical sections: not supported
Parameters specific to vertical (columnar) sections
ignoredColumns | integer array
horizontal sections: not supported
vertical sections: Removes unwanted columns from both the output and from the SenseML search scope. For example, this is useful if the columns contain text that interferes with anchoring on other columns.
columnSelection | array of index selections, where each "index selection" can be:
- a column index or comma-delimited indices
- an array with two comma-delimited indices, meaning all the columns in the indices range
default: capture all columns ([])
horizontal sections: not supported
vertical sections: Use to configure which columns to treat as sections. Sensible appends unselected columns to each section, for example so that they can be used as anchor candidates. For an illustration, see Section nuances.
Example syntax:
[[0,5]] selects the 1st through 6th columns as sections. Sensible adds the lines from any other columns to each section.
[1,3,-1] selects the 2nd, 4th, and last columns.
[[0, -2]] selects the 1st through 2nd-to-last columns.
[1,[3,7]] selects the 2nd column and the 4th through 8th columns.
minColumnGap | number in inches. default: 0
horizontal sections: not supported
vertical sections: Configures column recognition by specifying the smallest allowed width of the gutters separating the columns. For an example, see Table grid example. Use when text in a column contains large whitespace gaps that cause Sensible to mistakenly split one column into two. To avoid this split, set a minimum gap that's larger than the gaps inside the column. The default (0) specifies that zero-width vertical lines define the column boundaries.
lineFilters | Match object, or array of Match objects
horizontal sections: not supported
vertical sections: Use to ignore lines that span columns and break column recognition. For example, if the lines occur mid-column, use this parameter rather than a Stop parameter to exclude the lines. Sensible excludes the lines both from the output and from the SenseML search scope.
You don't need to configure this parameter if you specify a Stop parameter. For more information, see Section nuances.
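
The Range parameters in the preceding table can be combined roughly as in the following sketch, which defines one horizontal and one vertical sections field. All anchor text, field IDs, and numeric values are hypothetical and chosen only for illustration; the Match and Label syntax is general SenseML that this topic doesn't define. Comments assume JSON5-style comment support.

```json
{
  "fields": [
    {
      "id": "september_claims",
      "type": "sections",
      "range": {
        "anchor": {
          /* hypothetical text: ignore everything before this line */
          "start": { "type": "startsWith", "text": "september" },
          "match": { "type": "includes", "text": "claim number" },
          /* ignore anchor matches after this line */
          "end": { "type": "startsWith", "text": "october" }
        },
        "stop": { "type": "includes", "text": "date of claim" },
        /* end each section at the stop line, not at the next starting line */
        "requireStop": true,
        /* hypothetical value: start each section 0.5 inches above the matched line */
        "offsetY": -0.5,
        /* hypothetical value: end each section 0.2 inches farther down the page */
        "stopOffsetY": 0.2
      },
      "fields": [
        {
          "id": "claim_number",
          "anchor": "claim number",
          "method": { "id": "label", "position": "right" }
        }
      ]
    },
    {
      "id": "coverage_columns",
      "type": "sections",
      "range": {
        "direction": "vertical",
        "anchor": {
          "match": { "type": "startsWith", "text": "coverage" }
        },
        /* treat the 2nd through last columns as sections;
           Sensible appends the unselected 1st column to each section */
        "columnSelection": [[1, -1]],
        /* hypothetical value: gaps narrower than 0.2 inches don't split a column */
        "minColumnGap": 0.2,
        /* hypothetical text: ignore a mid-column line that spans columns */
        "lineFilters": [{ "type": "startsWith", "text": "subtotal" }]
      },
      "fields": [
        {
          "id": "limit",
          "anchor": "limit",
          "method": { "id": "label", "position": "below" }
        }
      ]
    }
  ]
}
```

In the vertical field, `lineFilters` is used instead of `stop` because, as described above, a Stop parameter makes Line Filters unnecessary; pick one or the other depending on whether the spanning lines occur at the end of the columns or mid-column.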

Examples

See the following topics:

Notes

  • For details about vertical sections, see Section nuances.
  • See the Copy To Section computed field method to add globally applicable document information to sections.
  • See the Zip computed field for information about zipping sections together.