NLTK: CoreNLPParser tag() should allow properties overloading

Created on 10 Sep 2018 · 3 comments · Source: nltk/nltk

Currently, calling CoreNLPParser.tag() lets Stanford CoreNLP unexpectedly "re-tokenize" the input. Here the three separate '1111' tokens come back as a single token joined by non-breaking spaces (\xa0):

>>> from nltk.parse.corenlp import CoreNLPParser
>>> ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> sent = ['my', 'phone', 'number', 'is', '1111', '1111', '1111']
>>> ner_tagger.tag(sent)
[('my', 'O'),
 ('phone', 'O'),
 ('number', 'O'),
 ('is', 'O'),
 ('1111\xa01111\xa01111', 'NUMBER')]

The expected behavior should be:

>>> from nltk.parse.corenlp import CoreNLPParser
>>> ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> sent = ['my', 'phone', 'number', 'is', '1111', '1111', '1111']
>>> ner_tagger.tag(sent)
[('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'), ('1111', 'DATE'), ('1111', 'DATE'), ('1111', 'DATE')]
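Put precisely, the words returned by tag() should be exactly the tokens that were passed in. The helper tokens_preserved() below is hypothetical (it is not part of NLTK). It checks that property on the two outputs quoted above, so it runs without a CoreNLP server.

```python
def tokens_preserved(input_tokens, tagged):
    """Return True if the tagger kept the caller's tokenization unchanged.

    `tagged` is a list of (token, tag) pairs such as CoreNLPParser.tag() returns.
    """
    return [word for word, _tag in tagged] == list(input_tokens)


sent = ['my', 'phone', 'number', 'is', '1111', '1111', '1111']

# Output reported in this issue: the three numbers were merged into one token.
current = [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'),
           ('1111\xa01111\xa01111', 'NUMBER')]

# Output the issue expects: exactly one tag per input token.
expected = [('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'),
            ('1111', 'DATE'), ('1111', 'DATE'), ('1111', 'DATE')]

print(tokens_preserved(sent, current))   # False
print(tokens_preserved(sent, expected))  # True
```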

The proposed solution is to allow overloading the properties argument of .tag() and .tag_sents() (cf. https://github.com/nltk/nltk/blob/develop/nltk/parse/corenlp.py#L348). By default they should use properties = {'tokenize.whitespace':'true'}, because tag_sents() joins the tokens with whitespace before sending them to CoreNLP.


    def tag_sents(self, sentences, properties=None):
        """
        Tag multiple sentences.

        Takes multiple sentences as a list where each sentence is a list of
        tokens.

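        :param sentences: Input sentences to tag
        :type sentences: list(list(str))
        :param properties: CoreNLP properties; defaults to whitespace tokenization
        :type properties: dict
        :rtype: list(list(tuple(str, str)))
        """
        # Converting list(list(str)) -> list(str)
        sentences = (' '.join(words) for words in sentences)
        # The tokens were joined with spaces above, so by default ask CoreNLP
        # to split on whitespace only and keep the caller's tokens intact.
        if properties is None:
            properties = {'tokenize.whitespace': 'true'}
        return [tagged[0] for tagged in self.raw_tag_sents(sentences, properties)]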

    def tag(self, sentence, properties=None):
        """
        Tag a list of tokens.

        :rtype: list(tuple(str, str))

        >>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
        >>> tokens = 'Rami Eid is studying at Stony Brook University in NY'.split()
        >>> parser.tag(tokens)
        [('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'),
        ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'O')]

        >>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
        >>> tokens = "What is the airspeed of an unladen swallow ?".split()
        >>> parser.tag(tokens)
        [('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'),
        ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'),
        ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]
        """
        return self.tag_sents([sentence], properties)[0]

    def raw_tag_sents(self, sentences, properties=None):
        """
        Tag multiple sentences.

        Takes multiple sentences as a list where each sentence is a string.

        :param sentences: Input sentences to tag
        :type sentences: list(str)
        :rtype: list(list(list(tuple(str, str))))
        """
        default_properties = {'ssplit.isOneSentence': 'true',
                              'annotators': 'tokenize,ssplit,'}

        # Caller-supplied properties override the defaults.
        default_properties.update(properties or {})

        # Supports only 'pos' or 'ner' tags.
        assert self.tagtype in ['pos', 'ner'], "CoreNLP tagger supports only 'pos' or 'ner' tags."
        default_properties['annotators'] += self.tagtype
        for sentence in sentences:
            tagged_data = self.api_call(sentence, properties=default_properties)
            yield [[(token['word'], token[self.tagtype]) for token in tagged_sentence['tokens']]
                    for tagged_sentence in tagged_data['sentences']]
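The proposed methods can be exercised without a running server by swapping in a fake api_call(). The class below is a hypothetical offline stand-in, not NLTK's CoreNLPParser. Its tag(), tag_sents() and raw_tag_sents() follow the proposal above, while its api_call() only imitates the number-merging tokenization reported in this issue and tags every token 'O'.

```python
class StubCoreNLPTagger:
    """Hypothetical offline stand-in for CoreNLPParser (illustration only).

    api_call() does not contact CoreNLP; it returns the JSON shape that
    raw_tag_sents() reads: {'sentences': [{'tokens': [{'word': ..., <tagtype>: ...}]}]}.
    """

    def __init__(self, tagtype='ner'):
        self.tagtype = tagtype

    def api_call(self, data, properties=None):
        words = data.split(' ')
        # Crude imitation of the reported behavior: without whitespace
        # tokenization, consecutive all-digit words are merged into one token
        # joined by non-breaking spaces ('\xa0').
        if (properties or {}).get('tokenize.whitespace') != 'true':
            merged = []
            for word in words:
                if merged and word.isdigit() and merged[-1].replace('\xa0', '').isdigit():
                    merged[-1] += '\xa0' + word
                else:
                    merged.append(word)
            words = merged
        tokens = [{'word': w, self.tagtype: 'O'} for w in words]
        return {'sentences': [{'tokens': tokens}]}

    def raw_tag_sents(self, sentences, properties=None):
        default_properties = {'ssplit.isOneSentence': 'true',
                              'annotators': 'tokenize,ssplit,'}
        default_properties.update(properties or {})
        assert self.tagtype in ['pos', 'ner'], "CoreNLP tagger supports only 'pos' or 'ner' tags."
        default_properties['annotators'] += self.tagtype
        for sentence in sentences:
            tagged_data = self.api_call(sentence, properties=default_properties)
            yield [[(token['word'], token[self.tagtype]) for token in tagged_sentence['tokens']]
                   for tagged_sentence in tagged_data['sentences']]

    def tag_sents(self, sentences, properties=None):
        sentences = (' '.join(words) for words in sentences)
        if properties is None:
            properties = {'tokenize.whitespace': 'true'}
        return [tagged[0] for tagged in self.raw_tag_sents(sentences, properties)]

    def tag(self, sentence, properties=None):
        return self.tag_sents([sentence], properties)[0]


tagger = StubCoreNLPTagger('ner')
sent = ['my', 'phone', 'number', 'is', '1111', '1111', '1111']
print(tagger.tag(sent))                  # default: whitespace tokenization
print(tagger.tag(sent, properties={}))   # explicit dict without the whitespace setting
```

With the default properties=None, the seven input tokens come back unchanged, which is the behavior the issue asks for. The second call shows the stub's imitation of the merged '1111\xa01111\xa01111' token.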

The list of string tokens the user passes in should be respected.

More details: https://stackoverflow.com/questions/52250268/why-do-corenlp-ner-tagger-and-ner-tagger-join-the-separated-numbers-together

Allowing .tag() to overload the properties before they reach raw_tag_sents would also let users easily handle cases like #1876.
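The precedence this gives can be sketched without CoreNLP, assuming the tag_sents()/raw_tag_sents() logic proposed above. build_properties() is a hypothetical helper (not an NLTK function) that reproduces only the property handling, so the effect of a per-call override can be checked directly.

```python
def build_properties(tagtype, properties=None):
    """Mirror the property handling of the proposed tag_sents()/raw_tag_sents()."""
    # Proposed tag_sents(): fall back to whitespace tokenization only when
    # the caller passes no properties at all.
    if properties is None:
        properties = {'tokenize.whitespace': 'true'}

    # raw_tag_sents(): start from its defaults, let caller values override
    # them, then append the tagger type to the annotator list.
    assert tagtype in ['pos', 'ner'], "CoreNLP tagger supports only 'pos' or 'ner' tags."
    merged = {'ssplit.isOneSentence': 'true', 'annotators': 'tokenize,ssplit,'}
    merged.update(properties)
    merged['annotators'] += tagtype
    return merged


# Default: whitespace tokenization is switched on.
print(build_properties('ner'))
# A caller override wins over the default value.
print(build_properties('ner', {'tokenize.whitespace': 'false'})['tokenize.whitespace'])  # false
# Any explicit dict, even an empty one, drops the whitespace default.
print('tokenize.whitespace' in build_properties('pos', {}))  # False
```

One design consequence is visible in the last line: an explicit properties dict replaces the whitespace default rather than extending it. Merging caller properties on top of {'tokenize.whitespace': 'true'} would be an alternative if that default should always apply.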

bug goodfirstbug stanford api


All 3 comments

Looks good.

A few minor comments: if properties == None should be if properties is None, and assert self.tagtype in ['pos', 'ner'] should be assert self.tagtype in ['pos', 'ner'], "CoreNLP tagger supports only 'pos' or 'ner' tags."

I really don't like the idea of joining and splitting strings. There might be a way to pass CoreNLP a sentence as a list of words instead of a plain string.
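The concern about joining and splitting can be illustrated with a minimal sketch that approximates the round trip in plain Python. The helper join_then_split() and the 'New York' input are hypothetical, and str.split() only stands in for CoreNLP's whitespace tokenization (CoreNLP is not called here).

```python
def join_then_split(tokens):
    """Approximate the round trip in the proposed tag_sents().

    Tokens are joined with single spaces, then split again on whitespace,
    standing in for CoreNLP with tokenize.whitespace=true.
    """
    return ' '.join(tokens).split()


# Hypothetical input: the caller treats "New York" as a single token.
tokens = ['I', 'live', 'in', 'New York']
print(join_then_split(tokens))  # ['I', 'live', 'in', 'New', 'York']
```

Whenever a caller's token itself contains whitespace, the token count changes. In that case, joining alone cannot guarantee one tag per input token.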

Hi, I'd like to take this on as my first issue.

Great that you're interested in this issue! If you have any questions, ask them here.
