Nltk: find_concordance() returns an empty list for left_context

Created on August 20, 2018  ·  4 Comments  ·  Source: nltk/nltk

if offsets:
    for i in offsets:
        query_word = self._tokens[i]
        # Find the context of query word.
        left_context = self._tokens[i-context:i]

When the first occurrence of the search term is near the beginning of the text (for example at offset 7) and the width parameter is set to 20, the slice [i-context:i] evaluates to [-13:7].
In that case, if the text contains more than 20 words, the negative start index is counted from the end of the token list, so left_context ends up as an empty list rather than a list containing the first 7 words of the text.
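To make the slicing behaviour concrete, here is a minimal sketch in plain Python. The `tokens` list is a made-up stand-in for `self._tokens`, and the values 7 and 20 are taken from the example above; nothing here calls NLTK itself.

```python
# Hypothetical stand-in for self._tokens: 30 dummy tokens (more than 20).
tokens = [f"w{k}" for k in range(30)]
i, context = 7, 20  # offset of the query word and context size

# i - context is -13, which Python counts from the end of the list,
# so this is tokens[17:7]: start is past stop, and the slice is empty.
left_context = tokens[i - context:i]
print(left_context)  # prints []

# What was intended: the 7 tokens that precede the query word.
expected = tokens[:i]
print(len(expected))  # prints 7
```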

A simple fix would be:

if offsets:
    for i in offsets:
        query_word = self._tokens[i]
        # Find the context of query word.
        if i - context < 0:
            # Fewer than `context` tokens precede the query word.
            left_context = self._tokens[:i]
        else:
            left_context = self._tokens[i-context:i]
bug corpus goodfirstbug

All 4 comments

Could you provide a sample input and the desired output so that we can add it to the regression tests?


import nltk
jane_eyre = 'Chapter 1\nTHERE was no possibility of taking a walk that day. We had been wandering, indeed, in the leafless shrubbery an hour in the morning; but since dinner (Mrs. Reed, when there was no company, dined early) the cold winter wind had brought with it clouds so sombre, and a rain so penetrating, that further outdoor exercise was now out of the question.'
text = nltk.Text(nltk.word_tokenize(jane_eyre))
text.concordance('taking')
text.concordance_list('taking')[0]

Output (NLTK 3.3):

Displaying 1 of 1 matches:
    taking a walk that day . We had been wander
ConcordanceLine(left=[],
                right=['a', 'walk', 'that', 'day', '.', 'We', 'had', 'been', 'wandering', ',', 'indeed', ',', 'in', 'the', 'leafless', 'shrubbery', 'an', 'hour'],
                left_print='',
                right_print='a walk that day . We had been wande',
                line=' taking a walk that day . We had been wande')

Desired output:

Displaying 1 of 1 matches:
    Chapter 1 THERE was no possibility of taking a walk that day . We had been wander
ConcordanceLine(left=['Chapter', '1', 'THERE', 'was', 'no', 'possibility', 'of'],
                right=['a', 'walk', 'that', 'day', '.', 'We', 'had', 'been', 'wandering', ',', 'indeed', ',', 'in', 'the', 'leafless', 'shrubbery', 'an', 'hour'],
                left_print='Chapter 1 THERE was no possibility of',
                right_print='a walk that day . We had been wande',
                line='Chapter 1 THERE was no possibility of taking a walk that day . We had been wande')

Thanks @BLKSerene for reporting the bug!

Ah, there's a neat solution here. Instead of the if-else, we can clip the lower bound of the slice with max(), e.g.

left_context = self._tokens[max(0, i-context):i]
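As a rough illustration of why the max() clamp is enough, the sketch below wraps it in a standalone function. The name `clipped_left_context` and the dummy token list are assumptions made for this example; they are not part of NLTK.

```python
def clipped_left_context(tokens, i, context):
    """Return up to `context` tokens immediately before position `i`."""
    # Hypothetical helper: clamp the slice start at 0 so it never goes negative.
    start = max(0, i - context)
    return tokens[start:i]

tokens = [f"w{k}" for k in range(30)]  # dummy stand-in for self._tokens

near_start = clipped_left_context(tokens, 7, 20)  # only 7 tokens available
mid_text = clipped_left_context(tokens, 25, 20)   # a full window of 20 tokens
print(len(near_start), len(mid_text))  # prints: 7 20
```

When `i - context` is already non-negative, max() leaves it unchanged, so offsets further into the text behave exactly as before.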

Adding a doctest for continuous integration / regression testing would be very useful =)

The lower bound of the left-context slice should be clipped to 0 if `i - context` < 0.

>>> from nltk import Text, word_tokenize
>>> jane_eyre = 'Chapter 1\nTHERE was no possibility of taking a walk that day. We had been wandering, indeed, in the leafless shrubbery an hour in the morning; but since dinner (Mrs. Reed, when there was no company, dined early) the cold winter wind had brought with it clouds so sombre, and a rain so penetrating, that further outdoor exercise was now out of the question.'
>>> text = Text(word_tokenize(jane_eyre))
>>> text.concordance_list('taking')[0].left
['Chapter', '1', 'THERE', 'was', 'no', 'possibility', 'of']

Fixed in #2103. Thanks @BLKSerene and @dnc1994!
