How to define tokenizing rules
I want to tokenize strings like:
'my name.is(johnny ,knoxville):'
into:
['my', 'name', '.', 'is', '(johnny ,knoxville)', ':']
As you can see, whitespace separates tokens, non-alphanumeric chars are
not grouped with alphanumeric chars, and there is one exception:
everything enclosed in parentheses is taken as a single token.
I'm not sure whether I should use Python's `re` module, some other Python
module I don't know about, or an external library like pyparsing.
Any ideas?
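One workable approach, sketched here under the assumption that parentheses are never nested, is a single regex with `re.findall`: try the parenthesized-group pattern first, then runs of alphanumerics, then single punctuation characters. Whitespace between tokens is skipped automatically because it matches none of the alternatives.

```python
import re

def tokenize(s):
    # Alternatives are tried left to right, so order encodes priority:
    #   \([^)]*\)  -> a whole parenthesized group (assumes no nesting)
    #   \w+        -> a run of alphanumeric/underscore characters
    #   [^\w\s]    -> any single non-alphanumeric, non-whitespace char
    pattern = r'\([^)]*\)|\w+|[^\w\s]'
    return re.findall(pattern, s)

print(tokenize('my name.is(johnny ,knoxville):'))
# ['my', 'name', '.', 'is', '(johnny ,knoxville)', ':']
```

If you do need nested parentheses, a plain regex won't suffice and a real parser (e.g. pyparsing's `nestedExpr`) becomes the better fit.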