I was looking through the tokenizers and spotted the Moses Tokenizer:
https://github.com/nltk/nltk/blob/cdaa7dd4e60251390a82e42456edb359191ea6e8/nltk/tokenize/moses.py#L23
which is ported from the Perl script:
https://github.com/moses-smt/mosesdecoder/blob/ae7aa6a9d25be49ab4c15ec68515e74490af399b/scripts/tokenizer/tokenizer.perl#L3-L4
That script currently states that it is licensed under LGPL 2.1+.
As I understand it, one cannot incorporate LGPL 2 or 3 code into an Apache 2.0 project,
but I am not 100% sure (if it were full GPL, I know you couldn't incorporate it).
Yes, it is in violation, sadly.
And it's not possible to get permission from the Moses maintainers: https://www.mail-archive.com/[email protected]/msg15864.html
LGPL has a linking exception:
- A program that contains no derivative of any portion of the
Library, but is designed to work with the Library by being compiled or
linked with it, is called a "work that uses the Library". Such a
work, in isolation, is not a derivative work of the Library, and
therefore falls outside the scope of this License.
Why not separate the Moses-derived works into a new package (e.g. nltk.moses) licensed under LGPL, and let everyone, including Marian, use it without propagating the LGPL?
@noe: nice idea. I'd be happy to consider a PR.
@stevenbird I'd be glad to contribute. From the Moses mailing list, I understand that @alvations may contact the authors to request permission to license the derived code as Apache, in which case nothing in the NLTK code would need to change. @alvations, is that correct?
If that attempt does not succeed: since a repo that doesn't exist yet cannot receive PRs, either NLTK creates a new empty repo that I would open a PR against, or I create a new repo with the LGPL code and transfer its ownership to NLTK.
Thanks @noe. Yes, let's wait to hear if @alvations has any luck.
Oh, looks like the answer is already no.
Sorry for being away for a while. The short answer is no, not until we get everyone on the Moses side to agree (which is hard unless we're at WMT or MTM, where we can ask almost everyone for permission in person).
The best solution is to have some sort of LGPL repo for nltk_contrib. All LGPL code would go there, and we would then pull it into NLTK with git submodule add.
Additionally, on the Moses side, let's see how far we can push them toward creating a wholly independent module for their Python code, so that we can import it as a dependency from PyPI. (This will take some work, though.)
I've repackaged MosesTokenizer as a separate library, and I think we can either wrap it in NLTK or add a deprecation message asking users to switch to the new package: https://github.com/alvations/sacremoses
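The "deprecation message" option could look roughly like the sketch below. It assumes NLTK keeps a thin, hypothetical `MosesTokenizer` shim whose only job is to warn and then delegate; the class and the warning text are illustrative, not NLTK's actual code. The calls it delegates to, `sacremoses.MosesTokenizer(lang=...)` and its `tokenize(text, return_str=...)` method, are part of the sacremoses package's public API. Because sacremoses is imported at runtime from a separately installed package, no Moses-derived code lives in NLTK's own source tree.

```python
import warnings


class MosesTokenizer:
    """Hypothetical NLTK-side shim: warn, then delegate to sacremoses."""

    def __init__(self, lang="en"):
        # Tell callers where the tokenizer now lives; stacklevel=2 points the
        # warning at the caller's line rather than at this shim.
        warnings.warn(
            "nltk.tokenize.moses is deprecated; use the separate sacremoses "
            "package instead (pip install sacremoses).",
            DeprecationWarning,
            stacklevel=2,
        )
        # Import lazily so NLTK itself does not depend on sacremoses.
        try:
            from sacremoses import MosesTokenizer as _SacreMosesTokenizer
        except ImportError as err:
            raise ImportError(
                "MosesTokenizer now lives in the separate sacremoses package; "
                "run `pip install sacremoses`."
            ) from err
        self._impl = _SacreMosesTokenizer(lang=lang)

    def tokenize(self, text, return_str=False):
        # Pass straight through to the sacremoses implementation.
        return self._impl.tokenize(text, return_str=return_str)
```

Raising an informative `ImportError` instead of bundling a fallback keeps the LGPL code out of NLTK entirely, at the cost of users needing an extra `pip install sacremoses`.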
Should I transfer ownership to the NLTK organization on GitHub? I'm not sure how the Moses community feels about this, so let me ask them first.
Note: SacreMoses is just a continuation of the SacreBLEU chain of tools that came out of the exodus of the Moses scripts.
Resolved.