PyGithub: Downloading large files

Created on 20 Nov 2017 · 5 comments · Source: PyGithub/PyGithub

Using the .get_contents() method to try to download a large file raises this error:

{'errors': [{'code': 'too_large', 'field': 'data',
     'resource': 'Blob'}],
     'message': 'This API returns blobs up to 1 MB in size. The requested blob is too large to fetch via the API, but you can use the Git Data API to request blobs up to 100 MB in size.',
     'documentation_url': 'https://developer.github.com/v3/repos/contents/#get-contents'}

Is there a way to detect this and hand it off to another handler that can download the file?

For example, if something like this fails:

contents = repository.get_dir_contents(urllib.parse.quote(server_path), ref=sha)

for content in contents:
    if content.type != 'dir':
        file_content = repository.get_contents(urllib.parse.quote(content.path), ref=sha)

optionally fall back to:

file_content = repository.get_git_blob(content.sha)
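
One possible way to detect this case is to inspect the error payload for the 'too_large' code shown above. The sketch below is a minimal illustration, not part of PyGithub: is_too_large_error is a hypothetical helper name, and it assumes the payload has the same shape as the error quoted in the question.

```python
def is_too_large_error(payload):
    """Return True if a GitHub error payload reports a 'too_large' blob."""
    if not isinstance(payload, dict):
        return False
    errors = payload.get("errors") or []
    return any(
        isinstance(err, dict) and err.get("code") == "too_large"
        for err in errors
    )


# Payload shape taken from the error quoted in the question.
sample = {
    "errors": [{"code": "too_large", "field": "data", "resource": "Blob"}],
    "message": "This API returns blobs up to 1 MB in size.",
}
print(is_too_large_error(sample))
print(is_too_large_error({"message": "Not Found"}))
```

In PyGithub the payload should be available as the data attribute of a caught GithubException, so the check could run inside an except GithubException as exc: branch before deciding whether to fall back to get_git_blob.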

question


psychemedia

Most helpful comment

I have the same problem and ended up doing something along these lines.

If we are dumping all the files from a directory and some of them are larger than 1 MB,

file_contents = repo.get_contents(dir_name, ref=branch)

then the sha is available for each file_content, and the following can be used to get the blob for each file:

from github import GithubException

for file_content in file_contents:
    try:
        if file_content.encoding != 'base64':
            # some error ...
            pass  # placeholder: handle the unexpected encoding here
        # ok...
    except GithubException:
        # if file_content DOES NOT HAVE encoding, it is a large file
        blob = repo.get_git_blob(file_content.sha)
        # do something with blob
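
As an alternative to relying on the exception, the entries of a directory listing also carry a size attribute, so large files could be routed to get_git_blob up front. This is only a sketch under assumptions: split_by_size and ASSUMED_LIMIT are hypothetical names, the 1 MB threshold is read as 1024 * 1024 bytes, and the exact boundary the API enforces is not confirmed here.

```python
from types import SimpleNamespace

# Assumption: "1 MB" in the API error message means 1024 * 1024 bytes.
ASSUMED_LIMIT = 1024 * 1024


def split_by_size(entries, limit=ASSUMED_LIMIT):
    """Split directory entries into (small, large) files, skipping non-files."""
    small, large = [], []
    for entry in entries:
        if entry.type != "file":
            continue
        (large if entry.size > limit else small).append(entry)
    return small, large


# Stand-ins for ContentFile objects; real entries come from repo.get_contents().
listing = [
    SimpleNamespace(path="docs", type="dir", size=0, sha="a1"),
    SimpleNamespace(path="README.md", type="file", size=2_000, sha="b2"),
    SimpleNamespace(path="data.bin", type="file", size=5_000_000, sha="c3"),
]
small, large = split_by_size(listing)
print([e.path for e in small], [e.path for e in large])
```

The entries in large could then be fetched with repo.get_git_blob(entry.sha), keeping the try/except as a safety net for files close to the limit.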

If path_name points to a single file larger than 1 MB, then a try/except block along these lines is needed:

        try:
            res = repo.get_contents(path_name, ref=branch)
            # ok, we have the content
        except GithubException:
            return get_blob_content(repo, branch, path_name)

where get_blob_content is something like

def get_blob_content(repo, branch, path_name):
    # first get the branch reference
    ref = repo.get_git_ref(f'heads/{branch}')
    # then get the tree
    tree = repo.get_git_tree(ref.object.sha, recursive='/' in path_name).tree
    # look for path in tree
    sha = [x.sha for x in tree if x.path == path_name]
    if not sha:
        # well, not found..
        return None
    # we have sha
    return repo.get_git_blob(sha[0])

The real code with error checking is longer, but the idea is here.

BoPeng on 11 May 2020

👍3 🎉1

All 5 comments

I've run into this problem before as well. In my case, since I always had the SHA of the blob, I used get_git_blob instead.

However, get_git_blob doesn't work with any type of object other than a blob (hence the name). You need to know the type of the object before trying to call it.

To do a fallback, you need to know two pieces of information:

The type of the object.
The SHA of the object.

If get_contents fails, it doesn't tell you either of these things. As far as I can tell, there really isn't any good way to do the fallback.
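
For what it's worth, a tree listing from the Git Data API (in PyGithub, the .tree list returned by repo.get_git_tree) reports both the type and the SHA of each entry, which is what a fallback needs. The sketch below only illustrates the lookup step; find_blob_sha is a hypothetical helper and the entries are stand-ins, not real API objects.

```python
from types import SimpleNamespace


def find_blob_sha(tree_entries, path):
    """Return the SHA of the blob at `path`, or None if absent or not a blob."""
    for entry in tree_entries:
        if entry.path == path:
            return entry.sha if entry.type == "blob" else None
    return None


# Stand-ins for GitTreeElement objects (attributes used: path, type, sha).
entries = [
    SimpleNamespace(path="src", type="tree", sha="1111"),
    SimpleNamespace(path="src/big.csv", type="blob", sha="2222"),
]
print(find_blob_sha(entries, "src/big.csv"))
print(find_blob_sha(entries, "src"))
```

The cost is an extra request for the tree, and very large repositories may need more care than this sketch takes.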

jasonwhite on 27 Nov 2017

Closed as wontfix. If anyone has a good idea of how to solve this, I'm happy to reopen the discussion. As far as I can tell, it doesn't seem possible to do this in a clean way.

jasonwhite on 8 Dec 2017

Once you have the blob, the following code is helpful.

    import base64

    blob = repo.get_git_blob(sha[0])
    # assumes blob.encoding == 'base64'; decode to raw bytes first
    raw = base64.b64decode(blob.content)
    return raw.decode("utf8")
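
The snippet above assumes a base64-encoded text file. A slightly more defensive variant, sketched here under the assumption that the blob reports its encoding as either 'base64' or 'utf-8', returns raw bytes so that binary files are not forced through a UTF-8 decode. blob_to_bytes is a hypothetical helper name.

```python
import base64


def blob_to_bytes(content, encoding):
    """Decode a blob payload to raw bytes based on its declared encoding."""
    if encoding == "base64":
        # With default settings, b64decode skips characters outside the
        # base64 alphabet, so embedded newlines are harmless.
        return base64.b64decode(content)
    if encoding == "utf-8":
        return content.encode("utf-8")
    raise ValueError(f"unexpected blob encoding: {encoding!r}")


# Simulated payload: base64 text with a line break in the middle.
encoded = base64.b64encode(b"hello, large file").decode("ascii")
wrapped = encoded[:8] + "\n" + encoded[8:]
data = blob_to_bytes(wrapped, "base64")
print(data.decode("utf-8"))
```

With a PyGithub blob this would be called as blob_to_bytes(blob.content, blob.encoding); text files can then be decoded by the caller, while binary files can be written to disk as they are.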

Also, updating a file runs into the same problem.

eeechoo on 22 Jul 2020

👍2

raise self.__createException(status, responseHeaders, output)

github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/reference/repos#get-repository-content"}

I get this error when trying to download the files of a Git repository from the main branch.