Just a minor issue. The `agressive_dash_splits` parameter is misspelled. It should be `aggressive_dash_splits`. Or maybe use "hyphen" instead of "dash" to be consistent with both the `AGGRESSIVE_HYPHEN_SPLIT` class member and with `tokenizer.perl`.
http://www.nltk.org/api/nltk.tokenize.html#nltk.tokenize.moses.MosesTokenizer.tokenize
Also this functionality does not appear to be tested.
Thanks @somnathrakshit for the quick PR. Note that altering the parameter name breaks the API, so it might be better to first provide it as an option with a DeprecationWarning when the old parameter name is used, then it can be fully removed at the next major version. Maybe a regular NLTK dev can comment on the procedure here, as I didn't see it mentioned explicitly in the developer guidelines or CONTRIBUTING.md doc. @alvations, are there any guidelines or precedents for changing function/parameter names?
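To make the suggestion concrete, here is a minimal sketch of a deprecation shim. It is not NLTK's actual code: the `tokenize` function below is a hypothetical stand-in, and its splitting logic is a toy that only loosely imitates the `@-@` hyphen marking, so that the example runs on its own. Only the two parameter spellings come from this issue.

```python
import warnings


def tokenize(text, aggressive_dash_splits=False, **kwargs):
    """Hypothetical stand-in for a tokenizer method, not NLTK's implementation."""
    # Accept the old, misspelled keyword for one release cycle, but warn.
    if "agressive_dash_splits" in kwargs:
        warnings.warn(
            "'agressive_dash_splits' is deprecated; "
            "use 'aggressive_dash_splits' instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this function
        )
        aggressive_dash_splits = kwargs.pop("agressive_dash_splits")

    # Anything else left in kwargs is a genuine mistake by the caller.
    if kwargs:
        raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))

    # Toy behaviour (an assumption for illustration): mark every hyphen.
    if aggressive_dash_splits:
        text = text.replace("-", " @-@ ")
    return text.split()


# New spelling: no warning.
print(tokenize("well-known fact", aggressive_dash_splits=True))

# Old spelling: still works, but emits a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(tokenize("well-known fact", agressive_dash_splits=True))
    print(caught[0].category.__name__)
```

With something like this, existing callers keep working and get a notice, and the old keyword can be dropped at the next major version.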
@goodmami @somnathrakshit no worries about breaking the API in this case. Most people would be more stymied by the misspelled argument than by the correct one =)
Regarding deprecation and breaking user space, in this case it's our fault and it's easier for users to update to the new NLTK version.
But in other cases, especially for more major changes that aren't just a typo, we can use warnings, like what we did when deprecating the Stanford tools: https://github.com/nltk/nltk/blob/develop/nltk/tag/stanford.py#L51
Thanks @alvations for letting us know. As a beginner in open source, nltk has been nice to tinker with. Are you taking part in GSoC 2018?
Resolved in #1956
@somnathrakshit Thanks for the contribution! Unfortunately, we're not taking part in GSoC 2018. Perhaps another year when we have more volunteers =)