NLTK: CoreNLPParser.tag() should allow properties overloading

Created on 10 Sep 2018 · 3 comments · Source: nltk/nltk

With the current CoreNLPParser.tag(), the "re-tokenization" performed by Stanford CoreNLP is unexpected:

>>> from nltk.parse.corenlp import CoreNLPParser
>>> ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> sent = ['my', 'phone', 'number', 'is', '1111', '1111', '1111']
>>> ner_tagger.tag(sent)
[('my', 'O'),
 ('phone', 'O'),
 ('number', 'O'),
 ('is', 'O'),
 ('1111\xa01111\xa01111', 'NUMBER')]

The expected behavior is:

>>> from nltk.parse.corenlp import CoreNLPParser
>>> ner_tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> sent = ['my', 'phone', 'number', 'is', '1111', '1111', '1111']
>>> ner_tagger.tag(sent)
[('my', 'O'), ('phone', 'O'), ('number', 'O'), ('is', 'O'), ('1111', 'DATE'), ('1111', 'DATE'), ('1111', 'DATE')]

The proposed solution is to allow the properties argument to be overridden in .tag() and .tag_sents(), i.e. at https://github.com/nltk/nltk/blob/develop/nltk/parse/corenlp.py#L348, and to default to properties = {'tokenize.whitespace':'true'}, since tag_sents() already joins the tokens with spaces.


    def tag_sents(self, sentences, properties=None):
        """
        Tag multiple sentences.

        Takes multiple sentences as a list where each sentence is a list of
        tokens.

        :param sentences: Input sentences to tag
        :type sentences: list(list(str))
        :rtype: list(list(tuple(str, str)))
        """
        # Converting list(list(str)) -> list(str)
        sentences = (' '.join(words) for words in sentences)
        if properties == None:
            properties = {'tokenize.whitespace':'true'}
        return [sentences[0] for sentences in self.raw_tag_sents(sentences, properties)]

    def tag(self, sentence, properties=None):
        """
        Tag a list of tokens.

        :rtype: list(tuple(str, str))

        >>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
        >>> tokens = 'Rami Eid is studying at Stony Brook University in NY'.split()
        >>> parser.tag(tokens)
        [('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'),
        ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'O')]

        >>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
        >>> tokens = "What is the airspeed of an unladen swallow ?".split()
        >>> parser.tag(tokens)
        [('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'),
        ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'),
        ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]
        """
        return self.tag_sents([sentence], properties)[0]

    def raw_tag_sents(self, sentences, properties=None):
        """
        Tag multiple sentences.

        Takes multiple sentences as a list where each sentence is a string.

        :param sentences: Input sentences to tag
        :type sentences: list(str)
        :rtype: list(list(list(tuple(str, str))))
        """
        default_properties = {'ssplit.isOneSentence': 'true',
                              'annotators': 'tokenize,ssplit,' }

        default_properties.update(properties or {})

        # Supports only 'pos' or 'ner' tags.
        assert self.tagtype in ['pos', 'ner']
        default_properties['annotators'] += self.tagtype
        for sentence in sentences:
            tagged_data = self.api_call(sentence, properties=default_properties)
            yield [[(token['word'], token[self.tagtype]) for token in tagged_sentence['tokens']]
                    for tagged_sentence in tagged_data['sentences']]

This should make the tagger respect the list of string tokens supplied by the user.
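To illustrate why whitespace tokenization keeps the user's tokens intact, here is a minimal sketch that needs no CoreNLP server. It uses Python's str.split() as a stand-in for a whitespace tokenizer, which is an assumption about how 'tokenize.whitespace' behaves, and whitespace_round_trip is a hypothetical helper, not part of NLTK.

```python
def whitespace_round_trip(tokens):
    """Join tokens with single spaces, then split on whitespace again.

    str.split() is only a stand-in for a whitespace tokenizer here;
    it is not a call into CoreNLP.
    """
    return ' '.join(tokens).split()


sent = ['my', 'phone', 'number', 'is', '1111', '1111', '1111']
# Tokens without internal whitespace survive the join/split unchanged.
print(whitespace_round_trip(sent) == sent)

# A token that itself contains a space does not survive the round trip.
tricky = ['New York', 'is', 'big']
print(whitespace_round_trip(tricky))
```

The second case shows the limit of the approach: joining and splitting is only lossless when no token contains whitespace.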

Details at https://stackoverflow.com/questions/52250268/why-do-corenlp-ner-tagger-and-ner-tagger-join-the-separated-numbers-together

If we allow .tag() to override the properties before they reach raw_tag_sents, users will also be able to handle cases like #1876 easily.
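The order in which the proposed code combines defaults and user-supplied properties can be sketched in isolation. build_properties below is a hypothetical helper that only mirrors the merging logic of the proposed tag_sents() and raw_tag_sents(); it is not an NLTK function and makes no server call.

```python
def build_properties(tagtype, properties=None):
    """Mirror the property merging of the proposed tag_sents/raw_tag_sents."""
    # tag_sents(): fall back to whitespace tokenization only when the
    # caller passes nothing at all.
    if properties is None:
        properties = {'tokenize.whitespace': 'true'}

    # raw_tag_sents(): start from the built-in defaults, then let the
    # caller's properties override them.
    merged = {'ssplit.isOneSentence': 'true',
              'annotators': 'tokenize,ssplit,'}
    merged.update(properties)

    assert tagtype in ('pos', 'ner'), "Only 'pos' or 'ner' tags are supported."
    merged['annotators'] += tagtype
    return merged


# Default call: whitespace tokenization is switched on.
print(build_properties('ner'))

# Explicit override: the caller turns whitespace tokenization off.
print(build_properties('ner', {'tokenize.whitespace': 'false'}))

# An empty dict is not None, so the whitespace default is not applied.
print(build_properties('pos', {}))
```

One consequence of this design is that a caller who passes any dict of their own must include 'tokenize.whitespace' again if they still want it.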

bug goodfirstbug stanford api

alvations

Most helpful comment

Looks good.

Just a couple of small comments. It should be if properties is None, not if properties == None. assert self.tagtype in ['pos', 'ner'] should be assert self.tagtype in ['pos', 'ner'], "CoreNLP tagger supports only 'pos' or 'ner' tags."

I don't really like the idea of joining and splitting strings. Maybe there is a way to pass CoreNLP a list of words as a sentence instead of a plain string.

dimazest 10 Sep 2018

👍2

All 3 comments

Hi, I would like to take this on as my first issue.

rishabhkumar296 6 Jan 2019

It's great that you are interested in this issue. If you have any questions, ask them here.

dimazest 7 Jan 2019
