Sensible uses the bag-of-words approach (counting how often each word occurs) for tasks such as identifying topics or tables in a document. To build the vocabulary for the bag of words, Sensible processes the strings in the Terms and Stop Terms parameters using:
- stemming - Sensible uses a Porter stemmer.
- tokenization - Sensible creates tokens by splitting on spaces.
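The following sketch shows roughly how these two steps combine into a bag of words. It is an illustration, not Sensible's implementation. The function names `porter_step_1a`, `tokenize`, `bag_of_words`, and `vocabulary` are hypothetical. To stay short and dependency-free, `porter_step_1a` implements only step 1a of the Porter algorithm (plural suffixes), not the full stemmer. The way `vocabulary` subtracts Stop Terms from Terms is also an assumption made for illustration.

```python
from collections import Counter


def porter_step_1a(word: str) -> str:
    """Step 1a of the Porter algorithm (plural suffixes) only; NOT a full Porter stemmer."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


def tokenize(text: str):
    # Split on the space character only; drop empty strings left by repeated spaces.
    return [tok for tok in text.split(" ") if tok]


def bag_of_words(text: str) -> Counter:
    # Tokenize, stem each token, then count how often each stem occurs.
    return Counter(porter_step_1a(tok) for tok in tokenize(text))


def vocabulary(terms, stop_terms) -> set:
    # Hypothetical: stems drawn from Terms, minus any stems drawn from Stop Terms.
    # Both lists go through the same tokenize + stem pipeline so that they match.
    stems = {porter_step_1a(tok) for term in terms for tok in tokenize(term)}
    stops = {porter_step_1a(tok) for term in stop_terms for tok in tokenize(term)}
    return stems - stops


print(bag_of_words("tables  list totals tables"))
print(vocabulary(["balance sheets", "total assets"], ["totals"]))
```

Because Terms, Stop Terms, and document text all go through the same tokenization and stemming, a term such as "tables" matches "table" in the document, and a stop term such as "totals" removes "total" from the vocabulary.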