Nltk: find_concordance() returns an empty list for left_context

Created on 20 Aug 2018  ·  4 comments  ·  Source: nltk/nltk

if offsets:
    for i in offsets:
        query_word = self._tokens[i]
        # Find the context of query word.
        left_context = self._tokens[i-context:i]

If the first occurrence of the search word is near the beginning of the text (at offset 7, for example), and the width parameter is set to 20, then [i-context:i] evaluates to [-13:7].
In that case, if the text consists of 20 or more words, the left_context variable is an empty list rather than a list containing the first 7 words of the text.
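The slicing behaviour described above can be reproduced without NLTK. The following is a minimal sketch: `tokens` is a made-up stand-in for `self._tokens`, and the values of `i` and `context` are simply the ones assumed in the report.

```python
# Stand-in token list (an assumption for illustration); the real code
# slices self._tokens inside NLTK.
tokens = [f"w{n}" for n in range(30)]  # 30 tokens, so more than 20

i = 7         # offset of the query word
context = 20  # number of context tokens requested on the left

# i - context is -13, and a negative start index counts from the end:
# tokens[-13:7] is the same as tokens[17:7], which is empty.
buggy_left = tokens[i - context:i]

# What the caller actually wants: everything before the query word.
expected_left = tokens[:i]

print(buggy_left)     # []
print(expected_left)  # ['w0', 'w1', 'w2', 'w3', 'w4', 'w5', 'w6']
```

Python raises no error for the negative start index, which is why the bug shows up only as a silently empty left context.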

A simple fix would be the following:

if offsets:
    for i in offsets:
        query_word = self._tokens[i]
        # Find the context of query word.
        if i - context < 0:
            left_context = self._tokens[:i]
        else:
            left_context = self._tokens[i-context:i]

Labels: bug, corpus, goodfirstbug

All 4 comments

Could you provide a sample input and the desired output so that we can add them to the regression tests?

Input:

import nltk
jane_eyre = 'Chapter 1\nTHERE was no possibility of taking a walk that day. We had been wandering, indeed, in the leafless shrubbery an hour in the morning; but since dinner (Mrs. Reed, when there was no company, dined early) the cold winter wind had brought with it clouds so sombre, and a rain so penetrating, that further outdoor exercise was now out of the question.'
text = nltk.Text(nltk.word_tokenize(jane_eyre))
text.concordance('taking')
text.concordance_list('taking')[0]

Output (NLTK 3.3):

Displaying 1 of 1 matches:
    taking a walk that day . We had been wander
ConcordanceLine(left=[],
                query='taking',
                right=['a', 'walk', 'that', 'day', '.', 'We', 'had', 'been', 'wandering', ',', 'indeed', ',', 'in', 'the', 'leafless', 'shrubbery', 'an', 'hour'],
                offset=7,
                left_print='',
                right_print='a walk that day . We had been wande',
                line=' taking a walk that day . We had been wande')

์›ํ•˜๋Š” ์ถœ๋ ฅ :

Displaying 1 of 1 matches:
    Chapter 1 THERE was no possibility of taking a walk that day . We had been wander
ConcordanceLine(left=['Chapter', '1', 'THERE', 'was', 'no', 'possibility', 'of'],
                query='taking',
                right=['a', 'walk', 'that', 'day', '.', 'We', 'had', 'been', 'wandering', ',', 'indeed', ',', 'in', 'the', 'leafless', 'shrubbery', 'an', 'hour'],
                offset=7,
                left_print='Chapter 1 THERE was no possibility of',
                right_print='a walk that day . We had been wande',
                line='Chapter 1 THERE was no possibility of taking a walk that day . We had been wande')

Thanks @BLKSerene for reporting the bug!

Ah, there is a neat solution here. Instead of the if-else, the lower bound can be clipped with max(), e.g.:

left_context = self._tokens[max(0, i-context):i]
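To see the clipping in isolation, here is a small sketch outside NLTK. The helper name `left_context` and its signature are hypothetical, chosen only for illustration; in the library the slice is taken directly on `self._tokens`.

```python
def left_context(tokens, i, context):
    """Return up to `context` tokens preceding index `i`.

    Hypothetical helper, not part of the NLTK API.
    """
    # Clip the start index at 0 so it can never wrap around to the end.
    start = max(0, i - context)
    return tokens[start:i]


# A whitespace-split sample (an assumption; NLTK would use word_tokenize).
tokens = "Chapter 1 THERE was no possibility of taking a walk that day .".split()
i = tokens.index("taking")  # 7

# Query word near the start: fewer than `context` tokens are available.
near_start = left_context(tokens, i, 20)

# Query word far enough from the start: exactly `context` tokens come back.
far_from_start = left_context(tokens, len(tokens) - 1, 3)

print(near_start)      # ['Chapter', '1', 'THERE', 'was', 'no', 'possibility', 'of']
print(far_from_start)  # ['walk', 'that', 'day']
```

The right-hand slice needs no such guard, because an end index past the end of a list is already truncated by Python.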

It would be very helpful to add a doctest to https://github.com/nltk/nltk/blob/develop/nltk/test/concordance.doctest for continuous integration / regression testing =)

Patching https://github.com/nltk/nltk/issues/2088
The start index of the left-context slice should be clipped to 0 if `i-context` < 0.

>>> from nltk import Text, word_tokenize
>>> jane_eyre = 'Chapter 1\nTHERE was no possibility of taking a walk that day. We had been wandering, indeed, in the leafless shrubbery an hour in the morning; but since dinner (Mrs. Reed, when there was no company, dined early) the cold winter wind had brought with it clouds so sombre, and a rain so penetrating, that further outdoor exercise was now out of the question.'
>>> text = Text(word_tokenize(jane_eyre))
>>> text.concordance_list('taking')[0].left
['Chapter', '1', 'THERE', 'was', 'no', 'possibility', 'of']

Patched in #2103. Thanks @BLKSerene and @dnc1994!
