Extracts data from a document that contains complex or repeated elements ("sections"). In effect, a "section" defines a repeating document inside a document, with its own fields.

The following image shows an example of a document containing a group of "claims" sections:

For the preceding example, you can configure Sensible to return an unprocessed_claims array, where each object in the array contains a claim_number, claim_date, claimant_last_name, etc.

Parameters

key | value | description
id (required) | string | Specifies an ID for a group of sections to extract in the document area defined by the Range parameter. You can define an array of section groups, and you can nest sections inside of other sections.
range (required) | object | Specifies the document area from which to extract a group of sections. The Range parameter specifies both:
- a group of repeated sections in an area of the document
- the start and end of each repeated section.

The sections group can span pages and nonrepeating text. For example, in the preceding image, an "unprocessed_claims" section group can span month headings.

For the Range object's parameters, see the following table.

sections |  | Specifies sections inside sections. Use this for complex sections that contain nested repeated elements, for example, a grid of tables. Each nested section searches for its range inside the parent section's range.
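
As a rough illustration of how these parameters fit together, the following sketch builds a section group for the claims example as a Python dict and prints it as JSON. The id, range, anchor, match, and sections keys come from this page. The fields array, the "label" method settings, the Match object shape ({"type": "includes", ...}), and all anchor text are assumptions made for illustration, so check them against the SenseML reference before relying on them.

```python
import json

# Hypothetical section group for the "claims" example on this page.
# Keys taken from this page: id, range, anchor (start/match), sections.
# Assumptions: the "fields" array, the "label" method, the Match object
# shape, and every piece of anchor text below.
unprocessed_claims = {
    "id": "unprocessed_claims",
    "range": {
        "anchor": {
            # Ignore everything before this line (assumed heading text).
            "start": {"type": "includes", "text": "unprocessed claims"},
            # Repeated first line of each section.
            "match": {"type": "includes", "text": "claim number"},
        }
    },
    # Fields extracted once per section (assumed shape).
    "fields": [
        {
            "id": "claim_number",
            "anchor": "claim number",
            "method": {"id": "label", "position": "right"},
        },
        {
            "id": "claimant_last_name",
            "anchor": "claimant last name",
            "method": {"id": "label", "position": "right"},
        },
    ],
    # Nested sections would go here; each searches inside the parent's range.
    "sections": [],
}

config = {"sections": [unprocessed_claims]}
print(json.dumps(config, indent=2))
```

With a config along these lines, each matched section would contribute one object to the unprocessed_claims array.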

Range parameters

See the following table for details about the Range object parameters:

key | value | description
anchor (required) | Anchor object, or array of Match objects | Anchor parameters have a special meaning in the context of sections, as follows:
- start: Ignores anything in the document before this line. If undefined, Sensible searches for the section group starting at the beginning of the document.
- match (required): Specifies both the start of the section group and the repeated starting line of each section. For example, in the preceding image, specify "Claim number". The section starts at the top boundary of this starting line, and the section's scope includes text to the left of this line. If the start of the section lacks an easy-to-match line, you can use the Require Stop and Offset Y parameters to start the section above or below the line matched by this parameter.
- end: Ignores any anchor matches in the document after this line. For example, to extract solely September claims in the preceding image, specify "October".
stop | Anchor object, or array of Match objects | Specifies the repeated end of the section after its anchor. For example, if you specify "Date of claim", then each section ends when it encounters the phrase "Date of claim". Sensible ignores any text after the claimant's last name in each section. The section ends at the top boundary of this stop line (plus any offset).
If you don't specify this parameter, each section ends at the top boundary of the next section's starting line (plus any offset). In this case, the last section in the group continues to the end of the document.
requireStop | Boolean. default: false | If true, the Stop parameter is required, and the section ends when it matches the Stop parameter, instead of the default behavior of ending at the next starting line specified in the anchor's Match parameter. For example, use this parameter when the starting line repeats within the section, to avoid ending the section before it completes.
Note: Configure this parameter if the anchor matches in the section follow each other vertically on the page. You don't need to configure this parameter if the matches lie on one horizontal line in the section. In such a case, Sensible ignores any zero-height sections generated by this horizontal line's matches. For more information, see Multiple anchors in section.
offsetY | number in inches | Specifies the number of inches to offset the section's top boundary from the anchor Match parameter. By default, a section starts at the top boundary of the matched line. If you specify Offset Y, the section starts at that top boundary plus the offset. For example, this is useful when the section lacks an easy-to-match first line, or when you want to exclude non-columnar text from a vertical section.
stopOffsetY | number in inches | Specifies the number of inches to offset the section's end from the top boundary of the anchor's Stop parameter.
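
The boundary rules in the preceding table can be modeled in a few lines. The following is a simplified sketch, not Sensible's implementation: it treats the document as a single vertical axis measured in inches, and the function name, its arguments, and the handling of edge cases (for example, dropping a section that never finds a required stop) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def section_bounds(
    match_tops: Sequence[float],
    doc_end: float,
    stop_tops: Sequence[float] = (),
    offset_y: float = 0.0,
    stop_offset_y: float = 0.0,
    require_stop: bool = False,
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs for each section, in inches from the top.

    match_tops: top boundaries of lines matched by the anchor's Match parameter.
    stop_tops:  top boundaries of lines matched by the Stop parameter.
    """
    tops = sorted(match_tops)
    stops = sorted(stop_tops)
    sections: List[Tuple[float, float]] = []
    open_until = float("-inf")  # end of the most recent section

    for i, top in enumerate(tops):
        # With requireStop, a starting line that repeats inside an open
        # section does not begin a new section.
        if require_stop and top < open_until:
            continue

        start = top + offset_y
        # First stop line below this starting line, shifted by stopOffsetY.
        stop = next((s + stop_offset_y for s in stops if s > top), None)

        if require_stop:
            if stop is None:
                continue  # assumption: no stop found means no section
            end = stop
        else:
            # Default: end at the next starting line (plus offset), or at
            # the end of the document for the last section.
            next_start = tops[i + 1] + offset_y if i + 1 < len(tops) else doc_end
            end = next_start if stop is None else min(stop, next_start)

        sections.append((start, end))
        open_until = end

    return sections
```

For example, with starting lines at 1.0, 1.5, and 3.0 inches and stop lines at 2.0 and 4.0 inches, the default behavior cuts the first section short at 1.5, while require_stop=True keeps it open until 2.0. This mirrors the case the table describes, where the starting line repeats within the section.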

Range parameters for columns

Use "direction":"vertical" in the Range object to define a "vertical section" in which to find text in column-like layouts. For example, use vertical sections to extract tables nested in tables, tables with row labels, or other complex text layouts.

The following table shows Range parameters specific to vertical sections.

key | value | description
direction | horizontal, vertical. default: horizontal | If set to vertical, Sensible searches for columns in a section group.
In detail, Sensible searches left-to-right for columns in the first-found document area defined by the Range parameter, rather than the default behavior of continuing to search for matches for the Range parameter. For an illustration of this behavior, see Section nuances.
columnSelection | array of index selections. default: capture all columns ([]) | Each "index selection" can be:
- a column index or comma-delimited indices
- an array with two comma-delimited indices, meaning all the columns in that index range

Use to:
- Select the columns you want to output, using zero-based column indices or index ranges.
- Treat unselected columns as row labels. Each selected column can use the text in unselected columns as anchors for its fields. For an illustration, see Section nuances.

Examples:
- [[0,5]] selects the 1st through 6th columns. Any other columns are treated as row labels.
- [1,3,-1] selects the 2nd, 4th, and last columns. Use negative indices to offset from the last column.
- [1,[3,7]] selects the 2nd column and the 4th through 8th columns.
- [[0, -2]] selects the 1st through 2nd-to-last columns.

For more information, see the Examples section.
ignoredColumns | integer array | Use to remove unwanted columns from both the output and the SenseML search scope. This is useful, for example, if the columns contain text that interferes with anchoring on other columns.
minColumnGap | number in inches. default: 0 | Configures column recognition by specifying the smallest allowed width of the gutters separating the columns. For an example, see Table grid example. Use when text in a column contains whitespace gaps wide enough that Sensible might split one column into two. To avoid this split, set a minimum gap that's larger than the gaps in the column. The default (0) means that zero-width vertical lines define the column boundaries.
lineFilters | Match object, or array of Match objects | Use to ignore lines that span columns and break column recognition. For example, if the lines occur mid-column, use this parameter rather than an offset parameter to exclude the lines. Sensible excludes the lines both from the output and from the SenseML search scope.
You don't need to configure this parameter if you specify a Stop parameter. For more information, see Section nuances.
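
To make the columnSelection index forms concrete, the following sketch expands a selection into a list of zero-based column indices. It illustrates the semantics described in the preceding table and is not Sensible's code: the function names and the column_count argument are hypothetical, and the handling of duplicate indices is an assumption.

```python
from typing import List, Sequence, Union

# An index selection is a single index or a two-item [first, last] range.
IndexSelection = Union[int, Sequence[int]]


def resolve_column_selection(
    selection: Sequence[IndexSelection], column_count: int
) -> List[int]:
    """Expand a columnSelection value into zero-based column indices."""

    def normalize(index: int) -> int:
        # Negative indices count back from the last column.
        return index + column_count if index < 0 else index

    # The default ([]) captures all columns.
    if not selection:
        return list(range(column_count))

    selected: List[int] = []
    for item in selection:
        if isinstance(item, int):
            indices = [normalize(item)]
        else:
            first, last = (normalize(i) for i in item)
            indices = list(range(first, last + 1))  # range is inclusive
        for index in indices:
            if index not in selected:  # assumption: ignore duplicates
                selected.append(index)
    return selected


def row_label_columns(selected: Sequence[int], column_count: int) -> List[int]:
    """Columns left unselected, which the table describes as row labels."""
    return [i for i in range(column_count) if i not in selected]


# The four examples from the table, for a 10-column vertical section:
print(resolve_column_selection([[0, 5]], 10))      # [0, 1, 2, 3, 4, 5]
print(resolve_column_selection([1, 3, -1], 10))    # [1, 3, 9]
print(resolve_column_selection([1, [3, 7]], 10))   # [1, 3, 4, 5, 6, 7]
print(resolve_column_selection([[0, -2]], 10))     # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

In the first example, row_label_columns([0, 1, 2, 3, 4, 5], 10) returns [6, 7, 8, 9], the columns that would be treated as row labels.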

Examples

See the following topics:

Notes

For details about vertical sections, see Section nuances.

