Match object

Matches are instructions for matching lines of text in a document. They're valid elements in anchors and other objects.

See the following sections for more information:

Match types

Match arrays

Match types

Global parameters

The following parameters are available to most types of Match objects:

key | values | description
minimumHeight | number | The minimum height of the matched line's boundaries, in inches. Not valid as a top-level parameter for a Boolean match, but valid for individual matches in its array.
maximumHeight | number | The maximum height of the matched line's boundaries, in inches. Not valid as a top-level parameter for a Boolean match, but valid for individual matches in its array.
reverse | boolean. Default: false | Use in match arrays. Don't set this to true for the first match in the array. If true, searches for a match in lines that precede the previous match in the array. For example, in an array with matches A and B, if B is a First match with "reverse": true, then Sensible matches the line immediately preceding the line matched by A. For an example, see Match arrays.
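The following Python sketch models the search order in a match array, including the reverse parameter. It assumes each match has already been reduced to a predicate on a line's text plus a reverse flag. The function `find_array_match`, the `Step` representation, and the sample lines are hypothetical illustrations of the documented behavior, not Sensible's implementation, and the sketch ignores pages and line heights.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical model: each step is (predicate on a line's text, reverse flag).
Step = Tuple[Callable[[str], bool], bool]


def find_array_match(lines: Sequence[str], steps: List[Step]) -> Optional[int]:
    """Return the index of the line matched by the last step, or None."""
    previous: Optional[int] = None
    for predicate, reverse in steps:
        if previous is None:
            # First step: scan from the top of the document (reverse is ignored).
            candidates = range(len(lines))
        elif reverse:
            # Walk backward, starting with the line just before the previous match.
            candidates = range(previous - 1, -1, -1)
        else:
            # Walk forward, starting with the line just after the previous match.
            candidates = range(previous + 1, len(lines))
        previous = next((i for i in candidates if predicate(lines[i])), None)
        if previous is None:
            return None
    return previous


lines = ["Invoice", "Policy number", "Total due", "$120.00"]
match_a = lambda line: "Total" in line  # A: includes "Total"
first = lambda line: True               # B: a First match accepts any line

print(find_array_match(lines, [(match_a, False), (first, True)]))   # 1
print(find_array_match(lines, [(match_a, False), (first, False)]))  # 3
```

With reverse set, the First match lands on "Policy number", the line immediately above the line matched by A; without it, the First match lands on the line immediately below.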

Simple match

Match using strings.

Parameters

key | values | description
text (required) | string | The string to match.
type (required) | equals, startsWith, endsWith, includes | equals: The matching line must equal the string. startsWith: Match at the beginning of the line. endsWith: Match at the end of the line. includes: Match anywhere in the line.
editDistance | integer. The number of allowed edits for a fuzzy match. | Configure this parameter to allow fuzzy, or approximate, string matching. This is useful for OCR text, such as poor-quality scans or handwriting. For example, if you configure 3, then Sensible matches kitten in the document for sitting in the text parameter. Sensible implements fuzzy matching using Levenshtein distance. Sensible recommends avoiding this parameter on short matches, like "A:" or "Sub", because an edit distance as low as 2 on a short match can result in a large number of line matches and impact performance. Generally, increase the edit distance as you increase the length of the text match. For an example, see Edit distance example.
isCaseSensitive | boolean. Default: false | If true, match the string taking into account upper- and lower-case characters.
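To illustrate the four type values and isCaseSensitive, here is a minimal Python model of a Simple match. The function `simple_match` is hypothetical; it assumes plain string comparison with no whitespace normalization and leaves out editDistance, so it approximates rather than reproduces Sensible's matching.

```python
def simple_match(line: str, match_type: str, text: str,
                 is_case_sensitive: bool = False) -> bool:
    """Hypothetical model of a Simple match without editDistance."""
    if not is_case_sensitive:
        # Default behavior: ignore upper- and lower-case differences.
        line, text = line.lower(), text.lower()
    if match_type == "equals":
        return line == text
    if match_type == "startsWith":
        return line.startswith(text)
    if match_type == "endsWith":
        return line.endswith(text)
    if match_type == "includes":
        return text in line
    raise ValueError(f"unknown simple match type: {match_type}")


print(simple_match("Total due: $120", "startsWith", "total due"))  # True
print(simple_match("Total due: $120", "startsWith", "total due",
                   is_case_sensitive=True))                         # False
print(simple_match("Amount: Total", "endsWith", "total"))           # True
```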

SYNTAX EXAMPLE

The following config uses a simple match:

{
  "fields": [
    {
      "id": "simple_anchor",
      "anchor": {
        "match": {
          "type": "startsWith",
          "text": "The line to match must start with this text"
        }
      },
      "method": {
        "id": "label",
        "position": "below"
      }
    }
  ]
}

For even simpler matching syntax in anchors, you can use "anchor": "some string to match". For more information, see Anchor.

EDIT DISTANCE EXAMPLE

The following example sets the editDistance parameter on a simple match for a poorly photographed document, so that the anchor text "6 City state and ZIP code" matches the incorrect OCR output "6 Chi state and ZIP code".

Config

{
  "fields": [
    {
      "id": "simple_anchor",
      "anchor": {
        "match": {
          "editDistance": 3,
          "isCaseSensitive": false,
          "type": "startsWith",
          "text": "6 city state and zip code"
        }
      },
      "method": {
        "id": "label",
        "position": "below"
      }
    }
  ]
}

Example document
The following image shows the example document used with this example config:

Example PDF: Download link

Output

{
  "simple_anchor": {
    "type": "string",
    "value": "SomeCity, NJ, $70101"
  }
}
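To see why an edit distance of 3 is enough here, the following Python sketch computes Levenshtein distance with the standard dynamic-programming recurrence. The `levenshtein` function is a hypothetical helper, and the sketch assumes the whole lowercased anchor text is compared with the lowercased OCR line; how Sensible combines the distance with startsWith isn't specified on this page.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


# isCaseSensitive is false in the config, so compare lowercased text.
anchor = "6 city state and zip code"
ocr_line = "6 Chi state and ZIP code"

print(levenshtein("kitten", "sitting"))       # 3
print(levenshtein(anchor, ocr_line.lower()))  # 3
```

Turning "city" into "chi" takes three edits (for example, substitute i with h, substitute t with i, delete y), which is why an editDistance of 2 would not be enough for this anchor.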

Regex match

Match using a regular expression.

Parameters

key | values | description
type (required) | regex
pattern (required) | valid JS regex | JavaScript-flavored regular expression. This parameter doesn't support capturing groups; to use capturing groups, see the Regex method instead. Double-escape special characters, since the regex is in a JSON object. For example, use \\s, not \s, to represent a whitespace character. Sensible throws an error if you specify a pattern that can match an empty string, for example, .*.
flags | JS-flavored regex flags | Flags to apply to the regex, for example, "i" for case-insensitive.
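The following Python sketch illustrates two rules from the table: why a pattern ends up double-escaped in JSON, and which patterns can match an empty string. The helpers `regex_match_json` and `matches_empty_string` are hypothetical. Python's `re` module stands in for JavaScript regex; the two dialects agree on simple patterns like these but differ elsewhere, and testing only the empty input approximates Sensible's check rather than copying it.

```python
import json
import re


def regex_match_json(pattern: str, flags: str = "") -> str:
    """Serialize a hypothetical Regex match object to JSON text."""
    match = {"type": "regex", "pattern": pattern}
    if flags:
        match["flags"] = flags
    # json.dumps escapes every backslash, so a pattern containing \s
    # is written as \\s in the JSON text: the double escape described above.
    return json.dumps(match)


def matches_empty_string(pattern: str) -> bool:
    """Approximate check: does the pattern match the empty input?"""
    # Python's re stands in for JavaScript regex in this sketch.
    return re.search(pattern, "") is not None


print(regex_match_json(r"Total\s+due", "i"))
# {"type": "regex", "pattern": "Total\\s+due", "flags": "i"}
print(matches_empty_string(".*"))    # True
print(matches_empty_string(r"\d+"))  # False
```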

Example

For an example, see the Passthrough method example.

First match

This is a convenience match to find the first line encountered.

Parameters

key | values | description
type (required) | first | Matches the first line encountered, either (1) on the first page of the document or (2) after the preceding matched line in a match array.

Example

This example matches the first line after a matched line in an array:

{
  "fields": [
    {
      "id": "first_line_after_match",
      "anchor": {
        "match": [
          {
            "type": "includes",
            "text": "match this line, then anchor on the first line after it"
          },
          {
            "type": "first"
          }
        ]
      },
      "method": {
        "id": "label",
        "position": "below"
      }
    }
  ]
}

Boolean matches

Use Boolean matches to write Boolean logic about your matches. For example, use the Any match to match on an array of synonymous terms if a document contains small wording variations across revisions.

Parameters

key | values | description
type (required) | any, all, not | any: Same behavior as the Boolean operator "or". Finds a line that meets any of the match conditions in the array. all: Same behavior as the Boolean operator "and". Finds a line that meets all of the match conditions in the array. not: Same behavior as the Boolean operator "not". Finds a line if it doesn't meet the match condition.
matches (required for any and all) | Array of Match objects. All match types except first are valid in the array. | Use with any and all. You can nest Boolean matches using this parameter.
match (required for not) | Match object. All match types except first are valid. | Use with not.

EXAMPLE

Config

{
  "fields": [
    {
      "id": "test_boolean_matches",
      /* to show matching behavior, output all matching
         anchor lines, not just the first match */
      "match": "all",
      "method": {
        /* to show matching behavior, use passthrough
           to output anchor text
           */
        "id": "passthrough"
      },
      "anchor": {
        "match": [
          {
            /* match a line if it meets the conditions
               of ANY of the following array of matches */
            "type": "any",
            "matches": [
              /* match a line that includes "special"  */
              {
                "type": "includes",
                "text": "special"
              },
              /* match a line that meets ALL of the conditions:
                 it includes "header" 
                 but NOT "should not" */
              {
                "type": "all",
                "matches": [
                  {
                    "type": "includes",
                    "text": "header"
                  },
                  /* note that "not" takes a single "match" object, not a "matches" array */
                  {
                    "type": "not",
                    "match": {
                      "type": "includes",
                      "text": "should not"
                    }
                  }
                ]
              }
            ]
          }
        ]
      }
    }
  ]
}

Example document
The following image shows the example document used with this example config:

Example PDF: Download link

Output

{
  "test_boolean_matches": [
    {
      "type": "string",
      "value": "This is a header."
    },
    {
      "type": "string",
      "value": "This is a special line."
    }
  ]
}
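The Boolean logic in this example can be traced with a small Python evaluator. The `evaluate` function is a hypothetical sketch that supports any, all, not, and case-insensitive includes matches only. Two of the sample lines are taken from the output above; the other two are made up, since the example document's full text isn't reproduced here.

```python
from typing import Any, Dict

# Hypothetical representation: a match is a dict shaped like the JSON config.
Match = Dict[str, Any]


def evaluate(match: Match, line: str) -> bool:
    """Return True if the line satisfies the (possibly nested) match."""
    kind = match["type"]
    if kind == "any":
        # Boolean "or": at least one nested match must succeed.
        return any(evaluate(m, line) for m in match["matches"])
    if kind == "all":
        # Boolean "and": every nested match must succeed.
        return all(evaluate(m, line) for m in match["matches"])
    if kind == "not":
        # "not" wraps a single match object under "match".
        return not evaluate(match["match"], line)
    if kind == "includes":
        # Simple match, case-insensitive by default.
        return match["text"].lower() in line.lower()
    raise ValueError(f"type not supported in this sketch: {kind}")


# The Boolean match from the example config, as a Python dict.
example = {
    "type": "any",
    "matches": [
        {"type": "includes", "text": "special"},
        {"type": "all", "matches": [
            {"type": "includes", "text": "header"},
            {"type": "not", "match": {"type": "includes", "text": "should not"}},
        ]},
    ],
}

lines = [
    "This is a header.",
    "This header should not match.",  # made up
    "This is a special line.",
    "An ordinary line.",              # made up
]
print([line for line in lines if evaluate(example, line)])
# ['This is a header.', 'This is a special line.']
```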

Match arrays

You can create complex matches by using any of the preceding match types in an array. For more information, see Match arrays.