NLTK: Is the Moses Tokenizer in violation of its license?

Created on 10 Apr 2018  ·  9 Comments  ·  Source: nltk/nltk

I was looking through the tokenizers and spotted the Moses tokenizer:
https://github.com/nltk/nltk/blob/cdaa7dd4e60251390a82e42456edb359191ea6e8/nltk/tokenize/moses.py#L23

It is ported from the Perl script:
https://github.com/moses-smt/mosesdecoder/blob/ae7aa6a9d25be49ab4c15ec68515e74490af399b/scripts/tokenizer/tokenizer.perl#L3-L4

That script currently states that it is licensed under LGPL 2.1+.

As I understand it, one cannot incorporate LGPL 2 or 3 code into an Apache 2 project,
but I am not 100% sure. (If it were full GPL, I know you couldn't incorporate it.)

All 9 comments

Yes, it is in violation, sadly.

And it's not possible to get permission from the Moses maintainers. https://www.mail-archive.com/[email protected]/msg15864.html

The LGPL has a linking exception (section 5 of version 2.1):

  5. A program that contains no derivative of any portion of the
    Library, but is designed to work with the Library by being compiled or
    linked with it, is called a "work that uses the Library". Such a
    work, in isolation, is not a derivative work of the Library, and
    therefore falls outside the scope of this License.

Why not separate the Moses-derived work into a new package (e.g. nltk.moses) licensed under the LGPL, so that everyone, including Marian, can use it without propagating the LGPL?

@noe: nice idea - I'd be happy to consider a PR

@stevenbird I'd be glad to contribute. From the Moses mailing list, I understand that @alvations may contact the authors to request their permission to license the derived code under Apache, in which case nothing in the NLTK code would need to change. @alvations, is that correct?

If that attempt does not succeed: since a brand-new repo cannot be the target of a PR, either NLTK creates a new empty repo to which I would submit a PR, or I create a new repo with the LGPL code and transfer its ownership to NLTK.

Thanks @noe. Yes, let's wait to hear if @alvations has any luck.

Oh, looks like the answer is already no.

Sorry for being away for a while. The short answer is no, until we get everyone on the Moses side to agree (which is hard unless we're at WMT and MTM and can ask almost everyone for permission while they're physically there).

The best solution is to have some sort of LGPL repo for nltk_contrib. All LGPL code would then go there, and we would pull it in with a git submodule add.

Additionally, on the Moses side, let's see how far we can push them toward creating a wholly independent module for their Python code, so that we can import it as a dependency from PyPI. (This will take some work, though.)

I've repackaged MosesTokenizer as a separate library, and I think we can either wrap it in NLTK or add a deprecation message and ask users to switch to the new package: https://github.com/alvations/sacremoses

Should I transfer the ownership to the NLTK organization on GitHub? I'm not sure how the Moses community feels about this, so let me ask them first.

Note: SacreMoses is just a continuation of the SacreBLEU chain of tools coming out of the exodus of Moses scripts.
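
For anyone wondering what the "deprecation message" option could look like, here is a minimal sketch. It is not the code that was merged into NLTK: the function name load_moses_tokenizer and the message wording are made up for illustration. The only real API it relies on is the MosesTokenizer class exported by the separate sacremoses package, which is imported lazily so that the Apache-licensed side never bundles the LGPL code.

```python
import warnings


def load_moses_tokenizer(lang="en"):
    """Hypothetical shim: warn about the move, then defer to sacremoses.

    The name and the message text are illustrative assumptions,
    not NLTK's actual implementation.
    """
    warnings.warn(
        "The Moses tokenizer has moved to the separate 'sacremoses' "
        "package (pip install sacremoses).",
        DeprecationWarning,
        stacklevel=2,
    )
    try:
        # Imported lazily: the LGPL-derived code lives in its own package
        # and is only a runtime dependency of this shim.
        from sacremoses import MosesTokenizer
    except ImportError as exc:
        raise ImportError(
            "The Moses tokenizer now lives in 'sacremoses'; "
            "install it with: pip install sacremoses"
        ) from exc
    return MosesTokenizer(lang=lang)
```

With sacremoses installed, load_moses_tokenizer("en").tokenize("Hello, world!") would go through the repackaged tokenizer. Without it, the caller gets an ImportError that points to the new package.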

Resolved.
