PyGithub: github.PaginatedList.PaginatedList.totalCount returns None

Created on 2 Jul 2016  ·  14 comments  ·  Source: PyGithub/PyGithub

My question is: why does accessing totalCount directly return None, while iterating gives me the number?

    # Counting by iteration works:
    repo_commits = repo.get_commits()
    repo_total = 0
    for _ in repo_commits:
        repo_total = repo_total + 1
    # ...but reading the attribute directly gives None:
    repo_total = repo.get_commits().totalCount


All 14 comments

Here is my full code:

import json
import os

import github
from github import Github

# First create a Github instance.
# GitHub's REST API caps per_page at 100, so a larger value buys nothing.
g = Github("grapebaba", "heatonn1", per_page=100)


def main():
    '''
    Use small data for this application
    :return:
    '''
    with open(os.path.join(os.path.expanduser("~"), 'recruitbot_data.txt'), 'w') as f:
        for user in g.search_users("type:user")[0:10000]:
            user_dict = {}
            user_dict['username'] = user.login
            user_dict['id'] = user.id
            user_dict['profile_url'] = user.html_url
            user_dict['location'] = user.location
            user_dict['followers'] = user.followers
            user_dict['private_gists'] = user.private_gists
            user_dict['public_gists'] = user.public_gists
            user_dict['name'] = user.name
            user_dict['company'] = user.company
            user_dict['blog_url'] = user.blog
            user_dict['email'] = user.email
            user_dict['contributions'] = {}
            for repo in user.get_watched():
                try:
                    # Count commits by iterating, since totalCount is None here
                    repo_total = 0
                    for _ in repo.get_commits():
                        repo_total += 1
                    # Fetch the stats once; GitHub may return None while it computes them
                    contributors = repo.get_stats_contributors()
                    if contributors is not None:
                        for contributor in contributors:
                            # A contributor's author may be None, so guard before reading .id
                            author = contributor.author if contributor is not None else None
                            if author is not None and author.id == user_dict['id']:
                                user_dict['contributions'][repo.name] = {
                                    'contributor_commits': contributor.total,
                                    'repo_commits': repo_total,
                                    'language': repo.language,
                                    'stars': repo.stargazers_count,
                                }
                                print(user_dict)
                                break
                except github.GithubException as e:
                    print(e)

            f.write(json.dumps(user_dict) + "\n")


if __name__ == '__main__':
    main()

I have another issue: sometimes I get an SSL error:

Traceback (most recent call last):
  File "/tmp/collector.py", line 54, in <module>
    main()
  File "/tmp/collector.py", line 36, in main
    for _ in repo.get_commits():
  File "/usr/local/lib/python2.7/dist-packages/github/PaginatedList.py", line 48, in __iter__
    newElements = self._grow()
  File "/usr/local/lib/python2.7/dist-packages/github/PaginatedList.py", line 60, in _grow
    newElements = self._fetchNextPage()
  File "/usr/local/lib/python2.7/dist-packages/github/PaginatedList.py", line 161, in _fetchNextPage
    headers=self.__headers
  File "/usr/local/lib/python2.7/dist-packages/github/Requester.py", line 171, in requestJsonAndCheck
    return self.__check(*self.requestJson(verb, url, parameters, headers, input, cnx))
  File "/usr/local/lib/python2.7/dist-packages/github/Requester.py", line 212, in requestJson
    return self.__requestEncode(cnx, verb, url, parameters, headers, input, encode)
  File "/usr/local/lib/python2.7/dist-packages/github/Requester.py", line 251, in __requestEncode
    status, responseHeaders, output = self.__requestRaw(cnx, verb, url, requestHeaders, encoded_input)
  File "/usr/local/lib/python2.7/dist-packages/github/Requester.py", line 281, in __requestRaw
    output = response.read()
  File "/usr/lib/python2.7/httplib.py", line 557, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python2.7/httplib.py", line 664, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib/python2.7/ssl.py", line 341, in recv
    return self.read(buflen)
  File "/usr/lib/python2.7/ssl.py", line 260, in read
    return self._sslobj.read(len)

Hi,
About your totalCount being None: I see the same with repo.get_pulls().totalCount.
I think the issue is that the returned JSON doesn't contain data['total_count'].
Maybe PaginatedList should implement __len__ like this:

def __len__(self):
    if self.__totalCount:
        return self.__totalCount
    else:
        return len(self.__elements)
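To make the diagnosis concrete, here is a minimal sketch using a hypothetical `ToyPaginatedList` class (not PyGithub's actual implementation). It assumes that only some endpoints, such as search, include a `total_count` key in the response body, and it models plain list responses as `{"items": [...]}` for simplicity. It shows why totalCount stays None for list endpoints and what the proposed `__len__` fallback would report.

```python
class ToyPaginatedList:
    """Hypothetical stand-in for a paginated result; NOT PyGithub's class.

    Each page is a dict shaped like a decoded JSON response. Only pages
    that carry a 'total_count' key (assumed: search-style endpoints) ever
    set the total; list-style pages only contribute their items.
    """

    def __init__(self, pages):
        self._pages = pages          # canned responses, one per page
        self._elements = []          # items fetched so far
        self._total_count = None     # stays None unless a page reports it
        self._next_page = 0

    def _fetch_next_page(self):
        page = self._pages[self._next_page]
        self._next_page += 1
        if "total_count" in page:
            self._total_count = page["total_count"]
        self._elements.extend(page["items"])

    @property
    def total_count(self):
        return self._total_count

    def __iter__(self):
        i = 0
        while True:
            # Yield what is already cached, then fetch the next page lazily
            while i < len(self._elements):
                yield self._elements[i]
                i += 1
            if self._next_page >= len(self._pages):
                return
            self._fetch_next_page()

    def __len__(self):
        # The fallback proposed in this comment: the reported total if the
        # API gave one, otherwise only the number of elements fetched so far.
        if self._total_count is not None:
            return self._total_count
        return len(self._elements)


search = ToyPaginatedList([{"total_count": 3, "items": ["a", "b"]},
                           {"total_count": 3, "items": ["c"]}])
listing = ToyPaginatedList([{"items": ["a", "b"]}, {"items": ["c"]}])
```

Before anything is fetched, `len(listing)` is 0 and `listing.total_count` is None. After a full iteration, `len(listing)` is 3, but the total still comes only from iterating, which is exactly the limitation raised in the next comment.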

@pgmillon that sounds fair.
If you want to draft a tiny PR that does that, I'd be willing to merge it.

I noticed the same thing: PaginatedList never sets totalCount. Returning the length of the fetched elements is not what I'm looking for; I want to know how many items could actually be fetched.

Agreed, but AFAIK the API doesn't provide any way to know that, so the only option is this workaround:

opened_pulls = repository.get_pulls()
pulls_count = 0
# Workaround: the pulls list exposes no count, so count by iterating
for _ in opened_pulls:
    pulls_count += 1
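The same workaround can be written as a small helper. This is a sketch, not a PyGithub API; `count_items` and `fake_paginated` are hypothetical names, and the generator only stands in for a PaginatedList so the example runs offline. Like the loop above, it still fetches every page, but unlike `len(list(...))` it does not keep every fetched object in memory.

```python
def count_items(iterable):
    """Count items by consuming an iterable once, without storing them."""
    return sum(1 for _ in iterable)


def fake_paginated(pages):
    """Hypothetical stand-in for a lazily paginated result: yields items page by page."""
    for page in pages:
        for item in page:
            yield item


# With PyGithub this would be e.g. count_items(repository.get_pulls())
pulls_count = count_items(fake_paginated([[101, 102, 103], [104, 105]]))
```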

For gists:

    from github import Github
    gh = Github()
    gists = gh.get_user('gil9red').get_gists()
    print(gists.totalCount)  # None
    print(len(list(gists)))  # 7

Still the same problem:

from github import Github
g = Github()
repos = g.get_repos()
print(repos.totalCount) # None

@gil9red @Tigralt Are you trying to get the total number of items returned by iterating over the PaginatedList? If so, there's no way to do that without iterating over the PaginatedList and incrementing a count. See the docs here (even though they're old, they're correct):

GitHub provide no way to know the number of items a paginated request will return, so PaginatedList has no length:
...
If you really mean to take the length of a PaginatedList, you have to explicitelly [sic] construct a list and then use its length:

If you're trying to get the number of items on one page of the PaginatedList, it looks like this PR is still open and being discussed.

Looking at a few related issues: #433, #487, #596.

I guess total_count comes from an older GitHub API response? Shall we remove the totalCount attribute from PaginatedList, since its implementation is broken and always returns None? Instead, we could add something like this to the docs:

# To get the total number of available elements in PaginatedList
repos = g.get_user().get_repos()
print(len(list(repos)))  # there's no way to avoid iterating through the whole set to get the total count

And maybe we could implement __len__ to return the number of elements fetched so far?

I think the GitHub API returns the total number of pages for a query, so if you set per_page=1 you should be able to get the total number of items from a single request. For large result sets, this can be much more efficient than iterating.

@Tommos0 I see, this is probably a good idea. We can even do a HEAD instead of GET just to retrieve the Link header.
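The per_page=1 idea can be sketched as follows. This assumes GitHub's documented `Link` response header format (`<url>; rel="next", <url>; rel="last"`), and the helper names and the repository URL in the sample header are made up for illustration. With one item per page, the `page` number in the `rel="last"` URL equals the total number of items. The sketch only parses the header, so it runs without network access.

```python
import re
from urllib.parse import parse_qs, urlparse

# Matches one '<url>; rel="name"' entry of a Link header.
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value):
    """Map each rel name (next, last, ...) to its URL; empty dict if no header."""
    return {rel: url for url, rel in _LINK_RE.findall(value or "")}


def total_count_from_first_page(link_header, items_on_page):
    """Total item count from the first page of a per_page=1 request.

    If there is no rel="last" link, everything fit on a single page, so the
    count is just the number of items that page returned (0 or 1).
    """
    last_url = parse_link_header(link_header).get("last")
    if last_url is None:
        return items_on_page
    return int(parse_qs(urlparse(last_url).query)["page"][0])


# Hypothetical header, shaped like GitHub's pagination links
header = ('<https://api.github.com/repositories/1/issues?per_page=1&page=2>; rel="next", '
          '<https://api.github.com/repositories/1/issues?per_page=1&page=57>; rel="last"')
```

Here `total_count_from_first_page(header, 1)` gives 57. With a HEAD request, the body is not available, so when there is no `Link` header you cannot tell 0 items from 1 without falling back to a GET.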

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Fixed in #820
