Nltk: find_concordance() returns an empty list for left_context

Created on 2018-08-20 · 4 comments · Source: nltk/nltk

if offsets:
    for i in offsets:
        query_word = self._tokens[i]
        # Find the context of query word.
        left_context = self._tokens[i-context:i]

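When the first occurrence of the search word is near the beginning of the text (for example, at offset 7), and assuming the context size is 20, [i-context:i] evaluates to [-13:7].
In that case, if the text contains more than 20 words, the variable left_context is an empty list rather than a list of the first 7 words of the text, because the negative start index counts from the end of the token list and lands past index 7.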
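
A minimal sketch of the slicing behaviour described above, using a plain list of 30 dummy tokens in place of self._tokens. The list contents and the values of i and context are illustrative assumptions, not taken from NLTK:

```python
# Hypothetical stand-in for self._tokens: 30 dummy tokens.
tokens = ["w%d" % n for n in range(30)]
i, context = 7, 20  # offset of the query word and the assumed context size

# Buggy slice: the start index is -13, which Python resolves to 30 - 13 = 17.
# Because 17 is not below the stop index 7, the slice is empty.
buggy_left = tokens[i - context:i]

# What the reporter expects: the 7 tokens that precede the query word.
expected_left = tokens[:i]

print(buggy_left)     # []
print(expected_left)  # ['w0', 'w1', 'w2', 'w3', 'w4', 'w5', 'w6']
```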

A simple fix would be:

if offsets:
    for i in offsets:
        query_word = self._tokens[i]
        # Find the context of query word.
        if i - context < 0:
            left_context = self._tokens[:i]
        else:
            left_context = self._tokens[i-context:i]

bug corpus goodfirstbug

Source

BLKSerene

All 4 comments

Could you provide a sample input and the desired output so that we can add it to the regression tests?

alvations on 2018-08-20

Input:

import nltk

# Note: nltk.word_tokenize needs NLTK's Punkt tokenizer data to be installed.
jane_eyre = 'Chapter 1\nTHERE was no possibility of taking a walk that day. We had been wandering, indeed, in the leafless shrubbery an hour in the morning; but since dinner (Mrs. Reed, when there was no company, dined early) the cold winter wind had brought with it clouds so sombre, and a rain so penetrating, that further outdoor exercise was now out of the question.'
text = nltk.Text(nltk.word_tokenize(jane_eyre))
text.concordance('taking')
text.concordance_list('taking')[0]

Output (NLTK 3.3):

Displaying 1 of 1 matches:
    taking a walk that day . We had been wander
ConcordanceLine(left=[],
                query='taking',
                right=['a', 'walk', 'that', 'day', '.', 'We', 'had', 'been', 'wandering', ',', 'indeed', ',', 'in', 'the', 'leafless', 'shrubbery', 'an', 'hour'],
                offset=7,
                left_print='',
                right_print='a walk that day . We had been wande',
                line=' taking a walk that day . We had been wande')

Desired output:

Displaying 1 of 1 matches:
    Chapter 1 THERE was no possibility of taking a walk that day . We had been wander
ConcordanceLine(left=['Chapter', '1', 'THERE', 'was', 'no', 'possibility', 'of'],
                query='taking',
                right=['a', 'walk', 'that', 'day', '.', 'We', 'had', 'been', 'wandering', ',', 'indeed', ',', 'in', 'the', 'leafless', 'shrubbery', 'an', 'hour'],
                offset=7,
                left_print='Chapter 1 THERE was no possibility of',
                right_print='a walk that day . We had been wande',
                line='Chapter 1 THERE was no possibility of taking a walk that day . We had been wande')

BLKSerene on 2018-08-21


Thanks @BLKSerene for reporting the bug!

Ah, there's a neater fix here. Instead of the if-else, we can clamp the start index with max(), e.g.

left_context = self._tokens[max(0, i-context):i]
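
As a quick sanity check that the max() clamp behaves the same as the if-else version, here is a small sketch. The two helper functions and the dummy token list are hypothetical and exist only for illustration; they are not part of NLTK:

```python
def left_context_clamped(tokens, i, context):
    """Hypothetical helper: left context with the start index clamped at 0."""
    return tokens[max(0, i - context):i]


def left_context_if_else(tokens, i, context):
    """Hypothetical helper: the if-else variant proposed in the report."""
    if i - context < 0:
        return tokens[:i]
    return tokens[i - context:i]


# Dummy tokens standing in for self._tokens.
tokens = ["w%d" % n for n in range(30)]
context = 20

# Both variants agree at every offset, including those near the start.
agree = all(
    left_context_clamped(tokens, i, context) == left_context_if_else(tokens, i, context)
    for i in range(len(tokens))
)
print(agree)                                     # True
print(left_context_clamped(tokens, 7, context))  # the first 7 tokens
```

The clamp keeps the slice on a single line and avoids the branch, which is presumably why it was preferred here.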

Adding a doctest to https://github.com/nltk/nltk/blob/develop/nltk/test/concordance.doctest for continuous integration / regression testing would be very helpful =)

Patching https://github.com/nltk/nltk/issues/2088
The left slice of the left context should be clipped to 0 if `i-context` < 0.

>>> from nltk import Text, word_tokenize
>>> jane_eyre = 'Chapter 1\nTHERE was no possibility of taking a walk that day. We had been wandering, indeed, in the leafless shrubbery an hour in the morning; but since dinner (Mrs. Reed, when there was no company, dined early) the cold winter wind had brought with it clouds so sombre, and a rain so penetrating, that further outdoor exercise was now out of the question.'
>>> text = Text(word_tokenize(jane_eyre))
>>> text.concordance_list('taking')[0].left
['Chapter', '1', 'THERE', 'was', 'no', 'possibility', 'of']

alvations on 2018-08-23


Patched in #2103. Thanks @BLKSerene and @dnc1994!

alvations on 2018-09-19
