Nltk: ArabicStemmer AttributeError

Created on 2017-10-11 · 7 comments · Source: nltk/nltk

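I am unable to stem certain Arabic terms with SnowballStemmer. Many terms are stemmed successfully, but some cause an AttributeError to be raised. See the minimal example below, which fails on the term من.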

(anaconda2-4.4.0) richard-balmer-macbook:~ richardbalmer$ pip freeze | grep nltk
nltk==3.2.5
(anaconda2-4.4.0) richard-balmer-macbook:~ richardbalmer$ ipython
Python 2.7.13 |Anaconda custom (x86_64)| (default, Dec 20 2016, 23:05:08)
Type "copyright", "credits" or "license" for more information.

IPython 5.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from nltk.stem.snowball import SnowballStemmer

In [2]: stemmer = SnowballStemmer('arabic')

In [3]: stemmer.stem(u'تسدد')
Out[3]: u'\u062a\u0633\u062f\u062f'

In [4]: stemmer.stem(u'من')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-ffa733106049> in <module>()
----> 1 stemmer.stem(u'من')

/Users/richardbalmer/.pyenv/versions/anaconda2-4.4.0/lib/python2.7/site-packages/nltk/stem/snowball.pyc in stem(self, word)
    762                 modified_word = self.__Suffix_Verb_Step2b(modified_word)
    763                 if not self.suffix_verb_step2b_success:
--> 764                     modified_word = self.__Suffix_Verb_Step2a(modified_word)
    765         if self.is_noun:
    766             modified_word = self.__Suffix_Noun_Step2c2(modified_word)

/Users/richardbalmer/.pyenv/versions/anaconda2-4.4.0/lib/python2.7/site-packages/nltk/stem/snowball.pyc in __Suffix_Verb_Step2a(self, token)
    533                     break
    534
--> 535                 if suffix in self.__conjugation_suffix_verb_present and len(token) > 5:
    536                     token = token[:-2]  # present
    537                     self.suffix_verb_step2a_success = True

AttributeError: 'ArabicStemmer' object has no attribute '_ArabicStemmer__conjugation_suffix_verb_present'
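A note on the shape of this error message: the `_ArabicStemmer` prefix comes from Python's private name mangling, which rewrites any `self.__name` used inside a class body to `self._ClassName__name`. The sketch below reproduces the same kind of message with a hypothetical `ToyStemmer` class whose method reads a double-underscore attribute that was never defined. It only illustrates the mechanism; it is an assumption, not a confirmed diagnosis, that the NLTK failure has this cause.

```python
class ToyStemmer(object):
    # Hypothetical class, not NLTK code.
    # Two leading underscores: stored on the class as _ToyStemmer__suffixes_past.
    __suffixes_past = (u"a", u"b")

    def strip_present(self, token):
        # Compiled as self._ToyStemmer__suffixes_present, which was never
        # defined anywhere, so this attribute lookup raises AttributeError.
        return self.__suffixes_present


try:
    ToyStemmer().strip_present(u"token")
    message = None
except AttributeError as exc:
    message = str(exc)

# The message names the mangled attribute, as in the traceback above.
print(message)
```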

bug pleaseverify resolved tests

Source

richbalmer

👍2

All 7 comments

@richbalmer Thanks for reporting this issue.

@LBenzahia Could you help look into this? Thanks in advance!

alvations on 2017-10-13

👍2

Hi @richbalmer,
In any case, I have fixed the issue in PR #1856.
Thanks again!

LBenzahia on 2017-10-13

@LBenzahia Thanks for looking into this so quickly! I'm getting:

  File "/Users/richardbalmer/src/nltk/nltk/stem/util.py", line 24
    arabic_stopwords = ['إذ',
                             ^
SyntaxError: Non-ASCII character '\xd8' in file /Users/richardbalmer/src/nltk/nltk/stem/util.py on line 24, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

This also seems to be causing the tests to fail on Jenkins (https://nltk.ci.cloudbees.com/job/pull_request_tests/454/TOXENV=py27-jenkins,jdk=jdk8latestOnlineInstall/testReport/nose.failure/Failure/runTest/). I think all you need to do is put # -*- coding: utf-8 -*- at the top of stem/util.py.
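As a minimal sketch of that suggestion, a Python 2 source file containing Arabic literals would start with a PEP 263 declaration, as below. The one-entry list is a hypothetical excerpt (only the first entry appears in the traceback above), and the rest of the real file is not reproduced here.

```python
# -*- coding: utf-8 -*-
# PEP 263: on Python 2 this declaration has to sit on the first or second
# line of the file; without it, non-ASCII bytes in the source are rejected
# with the SyntaxError shown above. Python 3 assumes UTF-8 by default.

# Hypothetical one-entry excerpt of the stopword list.
arabic_stopwords = ['إذ']
```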

Also, after fixing that error locally, I get a UnicodeWarning:

/Users/richardbalmer/src/nltk/nltk/stem/snowball.py:748: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if word in arabic_stopwords:

It might be worth making those stopwords unicode strings.
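To illustrate why the warning appears: on Python 2 a plain '...' literal is a byte string, and comparing it with a unicode word fails to decode and is treated as unequal, so the membership test silently misses. The sketch below uses hypothetical one-entry lists (whether من is actually in the NLTK list is an assumption) and emulates the Python 2 byte literal with an explicit encode so that it behaves the same way on Python 3.

```python
# -*- coding: utf-8 -*-
word = u'من'

# Hypothetical lists for illustration only.
byte_stopwords = [u'من'.encode('utf-8')]  # what a plain '...' literal is on Python 2
text_stopwords = [u'من']                  # unicode literal, as suggested

# Text never compares equal to non-ASCII bytes, so the first lookup misses
# (Python 2 additionally emits the UnicodeWarning quoted above).
found_in_bytes = word in byte_stopwords
found_in_text = word in text_stopwords

print(found_in_bytes, found_in_text)
```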

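Other than that, your fix works great for me. Thanks again!

P.S. Another suggestion: testing membership in a set is much faster than in a list, so it might be worth making the stopword list a set.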
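A minimal sketch of that suggestion, assuming the stopwords are unicode strings: a list is scanned element by element, while a set or frozenset uses a hash lookup that takes roughly constant time on average. The names `ARABIC_STOPWORDS` and `is_stopword` and the two-entry list are hypothetical, not taken from NLTK.

```python
# -*- coding: utf-8 -*-
# Hypothetical two-entry excerpt; the real list is much longer.
arabic_stopwords_list = [u'إذ', u'من']

# Build the set once at import time; frozenset also guards against
# accidental mutation of the shared collection.
ARABIC_STOPWORDS = frozenset(arabic_stopwords_list)


def is_stopword(word):
    """Return True if word is in the (hypothetical) stopword set."""
    return word in ARABIC_STOPWORDS


print(is_stopword(u'من'), is_stopword(u'تسدد'))
```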

richbalmer on 2017-10-13

👍1

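@richbalmer Are you using Python 2.7?

> It might be worth making those stopwords unicode strings.

Done for Python 2.7. Please test again and let me know; it works fine for me. I have updated the PR.

LBenzahia on 2017-10-13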

Yes, I'm using 2.7. Looks good @LBenzahia, thanks again!

richbalmer on 2017-10-16

👍1

I still get the error:
AttributeError: 'ArabicStemmer' object has no attribute '_ArabicStemmer__conjugation_suffix_verb_present'

I'm using Python 3.

NouraAls on 2018-02-18

@NouraAls This is resolved in the PR.

LBenzahia on 2018-02-18
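For readers still on a release without the fix, one possible stopgap (a sketch, not something proposed in this thread) is to wrap the call and fall back to the unstemmed word when the AttributeError is raised. The helper name `stem_or_passthrough` and the stand-in class are hypothetical; the stand-in only simulates the failure so the example runs without NLTK installed.

```python
# -*- coding: utf-8 -*-
def stem_or_passthrough(stemmer, word):
    """Return stemmer.stem(word), or word unchanged if stemming raises AttributeError."""
    try:
        return stemmer.stem(word)
    except AttributeError:
        # Broad on purpose: this also hides unrelated AttributeErrors,
        # so it should be removed once a fixed NLTK release is installed.
        return word


class _FailingStemmer(object):
    """Hypothetical stand-in that simulates the failure reported above."""

    def stem(self, word):
        raise AttributeError("simulated missing attribute")


result = stem_or_passthrough(_FailingStemmer(), u'من')
print(result == u'من')
```

With NLTK installed, the same helper would be called as stem_or_passthrough(SnowballStemmer('arabic'), word).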
