Split lines

Splits lines distributed along a horizontal axis. This preprocessor is most useful for typewriter-style documents that use whitespace for formatting.

Parameters

Note: For the full list of parameters available for this method, see Global parameters for methods. The following table shows parameters most relevant to or specific to this method.

| key | value | description |
| --- | --- | --- |
| type (required) | splitLines | Splits lines distributed along a horizontal axis. |
| minSpaces (required) | number | The number of consecutive whitespace characters ( ) at or above which to split lines. |
| separator | string | Modifies the Min Spaces parameter to split on the specified character, for example "-", instead of the default whitespace character. For example, if you specify "-" for this parameter and 2 for the Min Spaces parameter, then Sensible splits lines when it finds --. |
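
To make the interaction between Min Spaces and the separator concrete, the following Python sketch mimics the rule described in the parameter descriptions. It is an illustration only, not Sensible's implementation: the function name `split_line` and its arguments are hypothetical, and it assumes that the default whitespace character is a plain space and that the run of separator characters is discarded when a line is split.

```python
import re


def split_line(text: str, min_spaces: int, separator: str = " ") -> list[str]:
    """Split text wherever `separator` repeats at least `min_spaces` times in a row.

    Hypothetical helper for illustration; it only approximates the documented rule.
    """
    if min_spaces < 1:
        raise ValueError("min_spaces must be at least 1")
    # Match a run of `min_spaces` or more consecutive separator characters.
    pattern = f"(?:{re.escape(separator)}){{{min_spaces},}}"
    # Drop empty pieces produced by leading or trailing separator runs.
    return [part for part in re.split(pattern, text) if part]


# Default separator: runs of 3 or more spaces split the line,
# while the single space inside "Policy number" does not.
print(split_line("Policy number   18-376-190      Effective date", 3))
# ['Policy number', '18-376-190', 'Effective date']

# Custom separator "-" with a minimum of 2: only "--" splits the line.
print(split_line("Name--Jane Doe--Agent", 2, separator="-"))
# ['Name', 'Jane Doe', 'Agent']
```

In this sketch, single hyphens such as those in 18-376-190 stay intact when the separator is "-" and the minimum is 2, which matches the documented behavior of splitting only when -- is found.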

Examples

The following example shows how to fix under-split lines in a typewriter-style PDF. The Split Lines preprocessor preserves columns and rows in this document.

PROBLEM

Without the Split Lines preprocessor, Sensible merges the lines too aggressively:


SOLUTION

Config

{
  "preprocessors": [
    {
      "type": "splitLines",
      "minSpaces": 3
    }
  ],
  "fields": [
    {
      "id": "policy_number",
      "method": {
        "id": "row"
      },
      "anchor": "policy number"
    }
  ]
}

Example document

The following image shows the example PDF used with this example config:


Example PDF: Download link

Output

{
  "policy_number": {
    "type": "string",
    "value": "18-376-190"
  }
}
