Spreadsheet method

For large spreadsheets with tens of thousands of rows, the Cell Rows field can extract rows under a specified column-headings row. Sensible extracts rows until the end of the document. This field is a speedier alternative to general-purpose SenseML methods, which you can use with smaller spreadsheets.

Notes:

  • This method doesn't work with PDFs. You must upload the spreadsheet to Sensible as one of the supported spreadsheet file types.

Parameters

keyvaluedescription
id (required)stringSpecifies an ID for a group of rows to extract in the spreadsheet. Sensible ignores empty rows and extracts data under the specified Header Row to the end of the worksheet.
type (required)cellRowsSpecifies that this field extracts spreadsheet rows.
headerRow (required)Anchor objectSpecifies the row containing column headers, by matching the specified line or lines in the row. Sensible ignores empty cells in the header row. Contains the following parameters:
-match: A Match object or array of Match objects.
fieldsarray of computed fields or spreadsheet-specific fieldsSpecifies either:


- fields that use a spreadsheet-specific method, cell. The cell method extracts a cell under the specified header for each extracted row. It contains the following parameters:
id: cell. Note: The method object's global parameters aren't available for this method.
header: A Match object that specifies the column heading under which you want to extract cells. For an example, see the following section.


- fields that use computed fields methods.

Examples

The following example shows using a Cell Rows field to extract rows from a spreadsheet.

Config

{
  "fields": [
    {
      "id": "bestselling_books",
      "type": "cellRows",
      /* Sensible extracts specified cells from all the rows under the row that contains the matching strings "book" and "author", until the end of the document */
      "headerRow": {
        "match": [
          {
            "type": "startsWith",
            "text": "book"
          },
          {
            "type": "startsWith",
            "text": "author"
          }
        ],
      },
      "fields": [
        {
          "id": "book",
          "method": {
            "id": "cell",
            /* for each row, extract the cell under the header that starts with 
               the text `book` (skips empty rows)  */
            "header": {
              "type": "startsWith",
              "text": "book",
            },
          }
        },
        {
          "id": "first_published",
          "method": {
            "id": "cell",
            /* for each row, extract the cell under the header containing `published` */
            "header": {
              "type": "includes",
              "text": "published",
            },
          }
        }
      ]
    },
  ]
}

Example document
The following image shows the example document used with this example config:

Click to enlarge

Example documentDownload link

Output

{
  "bestselling_books": [
    {
      "book": {
        "value": "A Tale of Two Cities",
        "type": "string"
      },
      "first_published": {
        "value": "1859",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "The Little Prince (Le Petit Prince)",
        "type": "string"
      },
      "first_published": {
        "value": "1943",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "The Alchemist (O Alquimista)",
        "type": "string"
      },
      "first_published": {
        "value": "1988",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "Harry Potter and the Philosopher's Stone",
        "type": "string"
      },
      "first_published": {
        "value": "1997",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "And Then There Were None",
        "type": "string"
      },
      "first_published": {
        "value": "1939",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "Dream of the Red Chamber (紅樓夢)",
        "type": "string"
      },
      "first_published": {
        "value": "1791",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "The Hobbit",
        "type": "string"
      },
      "first_published": {
        "value": "1937",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "Alice's Adventures in Wonderland",
        "type": "string"
      },
      "first_published": {
        "value": "1865",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "Between 50 million and 100 million copies",
        "type": "string"
      },
      "first_published": null
    },
    {
      "book": {
        "value": "The Lion, the Witch and the Wardrobe",
        "type": "string"
      },
      "first_published": {
        "value": "1950",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "She: A History of Adventure",
        "type": "string"
      },
      "first_published": {
        "value": "1887",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "The Da Vinci Code",
        "type": "string"
      },
      "first_published": {
        "value": "2003",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "Harry Potter and the Chamber of Secrets",
        "type": "string"
      },
      "first_published": {
        "value": "1998",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "The Catcher in the Rye",
        "type": "string"
      },
      "first_published": {
        "value": "1951",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "The Bridges of Madison County",
        "type": "string"
      },
      "first_published": {
        "value": "1992",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "One Hundred Years of Solitude (Cien años de soledad)",
        "type": "string"
      },
      "first_published": {
        "value": "1967",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "Lolita",
        "type": "string"
      },
      "first_published": {
        "value": "1955",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "Heidi",
        "type": "string"
      },
      "first_published": {
        "value": "1880",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "The Common Sense Book of Baby and Child Care",
        "type": "string"
      },
      "first_published": {
        "value": "1946",
        "type": "string"
      }
    },
    {
      "book": {
        "value": "attribution:",
        "type": "string"
      },
      "first_published": null
    }
  ]
}