feature
txttk.feature.lexical(token)
Extract lexical features from the given token.
There are 3 kinds of lexical features. Taking ‘Hello’ as an example:
- lowercase: ‘hello’
- first4: ‘hell’
- last4: ‘ello’
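
The three lexical features listed above can be sketched in a few lines of plain Python. This is only an illustration and not the txttk source: the function name `lexical_sketch` and the choice of a dict as the return type are assumptions, and lowercasing `first4` and `last4` is inferred from the ‘hell’ value in the example.

```python
def lexical_sketch(token):
    """Hypothetical re-implementation of the documented lexical features.

    Assumption: the real txttk.feature.lexical may return a different
    container type or handle short tokens differently.
    """
    lowered = token.lower()
    return {
        'lowercase': lowered,
        'first4': lowered[:4],   # first four characters, lowercased
        'last4': lowered[-4:],   # last four characters, lowercased
    }


features = lexical_sketch('Hello')
print(features)
```

For tokens shorter than four characters, slicing in this sketch returns the whole token for both `first4` and `last4`; whether txttk behaves the same way is not stated in this page.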
txttk.feature.orthographic(token)
Extract orthographic features from the given token.
There are 11 kinds of orthographic features. Taking ‘Windows10’ as an example:
- shape: ‘Aaaaaaa00’
- length: 9
- contains_a_letter: True
- contains_a_capital: True
- begins_with_capital: True
- all_capital: False
- contains_a_digit: True
- all_digit: False
- contains_a_punctuation: False
- consists_letters_n_digits: True
- consists_digits_n_punctuations: False
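
To make the eleven orthographic features above concrete, here is a minimal sketch that reproduces the documented values for ‘Windows10’. It is not the txttk implementation. The name `orthographic_sketch`, the dict return type, the use of `string.punctuation` as the punctuation set, and the exact meaning of the two `consists_*` features (read here as "contains both kinds and nothing else") are all assumptions.

```python
import string


def shape_of(token):
    """Map uppercase to 'A', lowercase to 'a', digits to '0'; keep the rest."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append('A')
        elif ch.islower():
            out.append('a')
        elif ch.isdigit():
            out.append('0')
        else:
            out.append(ch)
    return ''.join(out)


def orthographic_sketch(token):
    """Hypothetical re-implementation of the documented orthographic features."""
    is_letter = [ch.isalpha() for ch in token]
    is_digit = [ch.isdigit() for ch in token]
    is_punct = [ch in string.punctuation for ch in token]  # assumed punctuation set
    return {
        'shape': shape_of(token),
        'length': len(token),
        'contains_a_letter': any(is_letter),
        'contains_a_capital': any(ch.isupper() for ch in token),
        'begins_with_capital': token[:1].isupper(),
        'all_capital': bool(token) and all(ch.isupper() for ch in token),
        'contains_a_digit': any(is_digit),
        'all_digit': bool(token) and all(is_digit),
        'contains_a_punctuation': any(is_punct),
        # Assumed meaning: both kinds present, and no other characters.
        'consists_letters_n_digits': any(is_letter) and any(is_digit)
            and all(l or d for l, d in zip(is_letter, is_digit)),
        'consists_digits_n_punctuations': any(is_digit) and any(is_punct)
            and all(d or p for d, p in zip(is_digit, is_punct)),
    }


features = orthographic_sketch('Windows10')
print(features['shape'], features['length'])
```

Under these assumptions a token such as ‘3.14’ would have `consists_digits_n_punctuations` set to True and `consists_letters_n_digits` set to False.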