My question is: why does accessing totalCount directly return None, while iterating gives me the number?
repo_commits = repo.get_commits()
repo_total = 0
for _ in repo_commits:
    repo_total = repo_total + 1  # iterating gives the real number
repo_total = repo.get_commits().totalCount  # but this is None
Here is my full code:
import json
import os

import github
from github import Github

# First create a Github instance (the GitHub API caps per_page at 100):
g = Github("grapebaba", "heatonn1", per_page=100)


def main():
    '''
    Use small data for this application
    :return:
    '''
    with open(os.path.join(os.path.expanduser("~"), 'recruitbot_data.txt'), 'w') as f:
        for user in g.search_users("type:user")[0:10000]:
            user_dict = {}
            user_dict['username'] = user.login
            user_dict['id'] = user.id
            user_dict['profile_url'] = user.html_url
            user_dict['location'] = user.location
            user_dict['followers'] = user.followers
            user_dict['private_gists'] = user.private_gists
            user_dict['public_gists'] = user.public_gists
            user_dict['name'] = user.name
            user_dict['company'] = user.company
            user_dict['blog_url'] = user.blog
            user_dict['email'] = user.email
            user_dict['contributions'] = {}
            for repo in user.get_watched():
                try:
                    # totalCount is None, so count the commits by iterating
                    repo_total = 0
                    for _ in repo.get_commits():
                        repo_total = repo_total + 1
                    # Fetch the contributor stats once instead of twice
                    stats = repo.get_stats_contributors()
                    if stats is not None:
                        for contributor in stats:
                            # author may be None for some contributors
                            if (contributor is not None and contributor.author is not None
                                    and contributor.author.id == user_dict['id']):
                                user_dict['contributions'][repo.name] = {
                                    'contributor_commits': contributor.total,
                                    'repo_commits': repo_total,
                                    'language': repo.language,
                                    'stars': repo.stargazers_count,
                                }
                                print(user_dict)
                                break
                except github.GithubException as e:
                    print(e)
            f.write(json.dumps(user_dict) + "\n")


if __name__ == '__main__':
    main()
I have another issue: sometimes I get an SSL error:
Traceback (most recent call last):
  File "/tmp/collector.py", line 54, in <module>
    main()
  File "/tmp/collector.py", line 36, in main
    for _ in repo.get_commits():
  File "/usr/local/lib/python2.7/dist-packages/github/PaginatedList.py", line 48, in __iter__
    newElements = self._grow()
  File "/usr/local/lib/python2.7/dist-packages/github/PaginatedList.py", line 60, in _grow
    newElements = self._fetchNextPage()
  File "/usr/local/lib/python2.7/dist-packages/github/PaginatedList.py", line 161, in _fetchNextPage
    headers=self.__headers
  File "/usr/local/lib/python2.7/dist-packages/github/Requester.py", line 171, in requestJsonAndCheck
    return self.__check(*self.requestJson(verb, url, parameters, headers, input, cnx))
  File "/usr/local/lib/python2.7/dist-packages/github/Requester.py", line 212, in requestJson
    return self.__requestEncode(cnx, verb, url, parameters, headers, input, encode)
  File "/usr/local/lib/python2.7/dist-packages/github/Requester.py", line 251, in __requestEncode
    status, responseHeaders, output = self.__requestRaw(cnx, verb, url, requestHeaders, encoded_input)
  File "/usr/local/lib/python2.7/dist-packages/github/Requester.py", line 281, in __requestRaw
    output = response.read()
  File "/usr/lib/python2.7/httplib.py", line 557, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python2.7/httplib.py", line 664, in _safe_read
    chunk = self.fp.read(min(amt, MAXAMOUNT))
  File "/usr/lib/python2.7/socket.py", line 380, in read
    data = self._sock.recv(left)
  File "/usr/lib/python2.7/ssl.py", line 341, in recv
    return self.read(buflen)
  File "/usr/lib/python2.7/ssl.py", line 260, in read
    return self._sslobj.read(len)
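This looks like a transient network failure during a page fetch rather than a logic bug. A minimal sketch of one way to cope, assuming the failure really is transient: wrap the whole count in a retry loop so the iteration starts over after an error. `call_with_retries` and its parameters are hypothetical, not part of PyGithub; the sketch targets Python 3, where `ssl.SSLError` is a subclass of `OSError`.

```python
import ssl
import time


def call_with_retries(func, attempts=3, delay=1.0, retry_on=(ssl.SSLError, OSError)):
    """Call func(); if it raises one of retry_on, wait and try again.

    Hypothetical helper: re-raises the last error once all attempts have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            # Wait a little longer after each failed attempt.
            time.sleep(delay * attempt)


# Usage with PyGithub (assumes `repo` is a Repository object):
# repo_total = call_with_retries(lambda: sum(1 for _ in repo.get_commits()))
```

Restarting the whole count is wasteful for very long histories, but it keeps the result consistent: a partially consumed iterator is never reused.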
Hi,
About your totalCount being None: I see the same thing with repo.get_pulls().totalCount.
I think the issue is that the returned JSON doesn't contain data['total_count'].
Maybe PaginatedList should implement __len__ with:
if self.__totalCount:
    return self.__totalCount
else:
    return len(self.__elements)
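To make the proposal concrete, here is a toy stand-in for PaginatedList. The class `ToyPaginatedList` and its attributes are hypothetical, not PyGithub's real internals. It shows the suggested `__len__` fallback and also its catch: when the API gives no total, the length only reflects the pages loaded so far.

```python
class ToyPaginatedList:
    """Hypothetical stand-in for PaginatedList, fed by a page-fetching function."""

    def __init__(self, fetch_page, total_count=None):
        self._fetch_page = fetch_page    # fetch_page(n) -> list of items, [] when exhausted
        self._total_count = total_count  # None when the response has no total_count
        self._elements = []
        self._next_page = 1

    def load_next_page(self):
        page = self._fetch_page(self._next_page)
        self._next_page += 1
        self._elements.extend(page)
        return page

    def __iter__(self):
        # Yield the items already loaded, then fetch pages until one comes back empty.
        for item in self._elements:
            yield item
        while True:
            page = self.load_next_page()
            if not page:
                return
            for item in page:
                yield item

    def __len__(self):
        # The proposed fallback: the API's total if present, else the loaded count.
        if self._total_count:
            return self._total_count
        return len(self._elements)
```

With five items spread over three pages and only the first page loaded, `len()` reports 2, not 5. That gap is exactly the objection raised next in the thread.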
@pgmillon that sounds fair.
If you want to draft a tiny PR that does that, I'd be willing to merge it.
I noticed the same thing: PaginatedList never sets totalCount. Returning the length of the loaded elements is not what I am looking for; I want to know how many items could actually be fetched.
Agreed, but AFAIK the API doesn't give any way to know that, so the only option is the current workaround:
opened_pulls = repository.get_pulls()
pulls_count = 0
# Fix no count available on pulls list
for _ in opened_pulls:
    pulls_count += 1
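The same workaround can be wrapped in a small helper that counts without building a full list in memory, unlike `len(list(...))`. `count_items` is a hypothetical name. It works on any iterable, a PaginatedList included, but it still fetches every page, so it costs one request per page.

```python
def count_items(iterable):
    """Count items by iterating once; the items themselves are not kept."""
    return sum(1 for _ in iterable)


# Usage with PyGithub (assumes `repository` is a Repository object):
# pulls_count = count_items(repository.get_pulls())
```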
For gists:
from github import Github
gh = Github()
gists = gh.get_user('gil9red').get_gists()
print(gists.totalCount) # None
print(len(list(gists))) # 7
Still the same problem:
from github import Github
g = Github()
repos = g.get_repos()
print(repos.totalCount) # None
@gil9red @Tigralt Are you trying to get the total number of items returned by iterating over the PaginatedList? If so, there's no way to do that without iterating over the PaginatedList and incrementing a count. See the docs here (even though they're old, they are correct):
GitHub provide no way to know the number of items a paginated request will return, so PaginatedList has no length:
...
If you really mean to take the length of a PaginatedList, you have to explicitelly [sic] construct a list and then use its length:
If you're trying to get the number of items on one page of the PaginatedList, it looks like this PR is still open and being discussed.
Looking at a few related issues (#433, #487, #596):
I guess total_count is from the old GitHub API response? Shall we remove the totalCount attribute on PaginatedList, since its implementation is broken and always returns None? Instead, we could add something in the docs like:
# To get the total number of available elements in a PaginatedList
repos = g.get_user().get_repos()
print(len(list(repos)))  # there's no way to avoid iterating through the whole set to get the total count
And maybe we can implement __len__ to return the current element count?
I think the GitHub API returns the total number of pages for a query, so if you set per_page=1 you should be able to get the total number of items from a single request. This can be a lot more efficient than iterating (for large result sets).
@Tommos0 I see, this is probably a good idea. We can even do a HEAD instead of GET just to retrieve the Link header.
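A sketch of the per_page=1 plus HEAD idea, assuming GitHub's Link header format, in which the `rel="last"` URL carries a `page` query parameter; with one item per page, that page number is the item count. `total_from_link_header` and `count_via_head` are hypothetical helpers, not PyGithub API, and the request uses only the standard library. The sketch also assumes that a result fitting on a single page comes back without a `rel="last"` link, in which case it returns None and the caller has to fall back to a normal GET.

```python
import re
import urllib.request
from urllib.parse import parse_qs, urlparse


def total_from_link_header(link_header):
    """Return the page number of the rel="last" link, or None if there is none.

    With per_page=1 this page number equals the total number of items.
    """
    if not link_header:
        return None
    for url, rel in re.findall(r'<([^>]+)>;\s*rel="([^"]+)"', link_header):
        if rel == "last":
            pages = parse_qs(urlparse(url).query).get("page")
            return int(pages[0]) if pages else None
    return None


def count_via_head(api_url, token=None):
    """Send one HEAD request with per_page=1 and read the total from the Link header.

    api_url is assumed to have no query string yet, e.g. a placeholder like
    "https://api.github.com/repos/OWNER/REPO/pulls".
    """
    request = urllib.request.Request(api_url + "?per_page=1", method="HEAD")
    if token:
        request.add_header("Authorization", "token " + token)
    with urllib.request.urlopen(request) as response:
        return total_from_link_header(response.headers.get("Link"))
```

This is one request instead of one per page, though the number can drift if items are added or removed between this call and a later iteration.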
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Fixed in #820