PyGithub: Downloading large files

Created on Nov 20, 2017 · 5 comments · Source: PyGithub/PyGithub

Trying to download a large file with the .get_contents() method raises an error:

{'errors': [{'code': 'too_large', 'field': 'data', 'resource': 'Blob'}],
 'message': 'This API returns blobs up to 1 MB in size. The requested blob is too large to fetch via the API, but you can use the Git Data API to request blobs up to 100 MB in size.',
 'documentation_url': 'https://developer.github.com/v3/repos/contents/#get-contents'}

Is there a way to detect this and hand the file off to another handler that can download it?

For example, if something like the following fails:

contents = repository.get_dir_contents(urllib.parse.quote(server_path), ref=sha)

for content in contents:
    if content.type != 'dir':
        file_content = repository.get_contents(urllib.parse.quote(content.path), ref=sha)

then optionally fall back to:

file_content = repository.get_git_blob(content.sha)
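The fallback described in the question could be sketched roughly as follows. This is a minimal sketch, not something PyGithub does on its own: `read_file_bytes` and its `fallback_on` parameter are hypothetical names, `repo` is assumed to behave like a PyGithub `Repository`, and `entry` like one item of the directory listing (exposing `path` and `sha`).

```python
import base64


def read_file_bytes(repo, entry, ref, fallback_on):
    """Return the bytes of one listed file, retrying via the Git Data API.

    fallback_on is the exception type (or tuple of types) that should trigger
    the retry; with PyGithub that would be github.GithubException.
    """
    try:
        # Contents API: works for files the API is willing to return inline.
        return repo.get_contents(entry.path, ref=ref).decoded_content
    except fallback_on:
        # Git Data API: the directory listing already carries the blob SHA.
        blob = repo.get_git_blob(entry.sha)
        if blob.encoding != "base64":
            raise ValueError(f"unexpected blob encoding: {blob.encoding!r}")
        return base64.b64decode(blob.content)
```

Catching every `GithubException` would also retry on unrelated failures such as a 404, so real code would probably inspect the exception's `status` or `data` before falling back.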

Most helpful comment

I had exactly the same problem and ended up doing something like this.

  1. When dumping all the files in a directory, some of which are larger than 1 MB:
 file_contents = repo.get_contents(dir_name, ref=branch)

sha is available on each file_content, so you can fetch each file's blob like this:

from github import GithubException

for file_content in file_contents:
    try:
        if file_content.encoding != 'base64':
            pass  # some error ...
        # ok...
    except GithubException:
        # if file_content DOES NOT HAVE encoding, it is a large file
        blob = repo.get_git_blob(file_content.sha)
        # do something with blob
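If the directory listing already reports each file's size, the large files could be routed to the blob call up front instead of waiting for an exception. The sketch below assumes each entry exposes `size` in bytes and `type`, as PyGithub's `ContentFile` does. `split_by_size` and its `limit` default are hypothetical, and the exact cutoff the API applies is an assumption taken from the "1 MB" wording of the error message.

```python
def split_by_size(entries, limit=1_000_000):
    """Partition listed files into (small, large) by their reported size."""
    small, large = [], []
    for entry in entries:
        if entry.type == "dir":
            continue  # directories have no content to download
        (large if entry.size > limit else small).append(entry)
    return small, large
```

The `small` entries can then go through `get_contents` and the `large` ones through `get_git_blob(entry.sha)`. Keeping the try/except as a safety net still seems sensible, since the cutoff here is only a guess.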

If path_name refers to a single file larger than 1 MB, you need a try/except block along these lines:

        try:
            res = repo.get_contents(path_name, ref=branch)
            # ok, we have the content
        except GithubException:
            return get_blob_content(repo, branch, path_name)

get_blob_content looks like this:

def get_blob_content(repo, branch, path_name):
    # first get the branch reference
    ref = repo.get_git_ref(f'heads/{branch}')
    # then get the tree
    tree = repo.get_git_tree(ref.object.sha, recursive='/' in path_name).tree
    # look for path in tree
    sha = [x.sha for x in tree if x.path == path_name]
    if not sha:
        # well, not found..
        return None
    # we have sha
    return repo.get_git_blob(sha[0])

The real code, with error checking, is longer, but the idea is here.

All 5 comments

I have run into this problem before too. In my case I always had the blob's SHA, so I used get_git_blob instead.

However, get_git_blob does not work for object types other than blobs (hence the name). You need to know the object's type before attempting the call.

To perform a fallback, you need to know two things:

  1. The type of the object.
  2. The SHA of the object.

When get_contents fails, it tells you neither of these. As far as I can tell, there is no good way to perform the fallback.
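The tree lookup used in the most helpful comment is one way to recover both pieces of information, since each tree entry carries a path, a type, and a SHA. A minimal sketch over such entries, using the attribute names found on PyGithub's `GitTreeElement` (`find_blob_sha` itself is a hypothetical helper):

```python
def find_blob_sha(tree_entries, path):
    """Return the SHA for path if it names a blob, else None."""
    for entry in tree_entries:
        if entry.path == path:
            # Trees and submodule commits cannot be read with get_git_blob.
            return entry.sha if entry.type == "blob" else None
    return None
```

With PyGithub this might be called as `find_blob_sha(repo.get_git_tree(sha, recursive=True).tree, path_name)`, at the cost of one extra request for the tree.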

Closing as wontfix. If anyone has a good idea for how to solve this, I am happy to reopen it. As far as I can tell, it does not look like something that can be done cleanly.

When fetching the blob, the following code is useful:

    # requires: import base64
    blob = repo.get_git_blob(sha[0])
    raw = base64.b64decode(blob.content)
    return raw.decode("utf8")
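A slightly more defensive version of that decoding step, written against the two raw fields of a blob so it can be tried without a network call. `blob_to_text` is a hypothetical helper, and the "utf-8" branch assumes that a blob reported with that encoding carries its text as-is.

```python
import base64


def blob_to_text(content, encoding, charset="utf-8"):
    """Decode the content/encoding pair of a Git blob into a str."""
    if encoding == "base64":
        # b64decode skips the line breaks that wrap the base64 payload.
        return base64.b64decode(content).decode(charset)
    if encoding == "utf-8":
        return content
    raise ValueError(f"unsupported blob encoding: {encoding!r}")
```

With a PyGithub blob this would be called as `blob_to_text(blob.content, blob.encoding)`. Binary files should skip the final `.decode` and keep the bytes.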

I also run into this problem when updating a file.

raise self.__createException(status, responseHeaders, output)

github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/reference/repos#get-repository-content"}

I get this error when trying to download a repository file from the master branch.