Pygithub: There is a limit of 1000 results per search.

Created on 21 Jun 2018  ·  5 Comments  ·  Source: PyGithub/PyGithub

The GitHub API limits searches to 1000 results. This limit affects searches performed via PyGithub, such as Github.search_issues.

It seems there is no indication that your search has hit this limit - there is no exception or error that I am aware of. Perhaps there should be an exception raised when this happens (if it can be detected).
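One way such an exception could look, as a rough sketch rather than anything PyGithub provides: wrap the result iterable and raise once the number of items seen reaches the cap. The names `SearchLimitReached` and `iter_with_cap_check` are hypothetical, and the check is only a heuristic, since a search that genuinely matches exactly 1000 items would also trigger it.

```python
SEARCH_RESULT_CAP = 1000  # cap on results per search, as described in this issue


class SearchLimitReached(Exception):
    """Hypothetical exception: the search probably hit the result cap."""


def iter_with_cap_check(results, cap=SEARCH_RESULT_CAP):
    """Yield every item from results, then raise if the cap was reached.

    results can be any iterable, for example the object returned by a
    search call. All items are yielded before the exception is raised,
    so the caller still receives everything that was available.
    """
    count = 0
    for item in results:
        count += 1
        yield item
    if count >= cap:
        raise SearchLimitReached(
            "search returned %d items; results may be truncated" % count
        )
```

A caller would wrap the search result, consume it in a `try` block, and treat `SearchLimitReached` as a signal to narrow the query and search again.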

It is possible to work around this limit by issuing multiple search queries, but such queries must be tailored to suit the particular goals of the query - for example iterating over search_issues by progressive date ranges - and I cannot think of a way to generalise this.

Any thoughts on how to address this? Is there a general solution?

Note that this issue has nothing to do with rate limiting or pagination of results.

stale

All 5 comments

Here is a workaround demonstrating how to retrieve all pull requests in a range of dates, even if there are more than 1000 results:

EDIT: I will rewrite this as a generator method rather than a class, which will be simpler.

class PullRequestQuery:
    """Iterate over closed pull requests in a date range, re-querying past the 1000-result cap."""

    def __init__(self, git, repo, since, until):
        self.git = git
        self.repo = repo
        self.until = until
        self.issues = self._query(since, until)

    def __iter__(self):
        skip = False
        while True:
            results = False
            for issue in self.issues:
                if not skip:
                    results = True
                    yield issue.as_pull_request()
                skip = False

            # If the last query yielded nothing new, stop iterating.
            if not results:
                break

            # Start a new query where the last one left off. The previous issue
            # is expected to come back first, so skip it.
            # NOTE: results are sorted by "updated" but the restart point uses
            # closed_at; where those orders differ, items may be skipped or repeated.
            self.issues = self._query(issue.closed_at.strftime('%Y-%m-%dT%H:%M:%SZ'), self.until)
            skip = True

    def _query(self, since, until):
        # Search closed pull requests in this repository by closing date.
        querystring = 'type:pr is:closed repo:%s/%s closed:"%s..%s"' % (
            self.repo.organization.login, self.repo.name, since, until)
        return self.git.search_issues(query=querystring, sort="updated", order="asc")

With this class, you can now do this sort of thing:

from github import Github

git = Github(user, passwd)
org = git.get_organization(orgname)
repo = org.get_repo(reponame)
for pull in PullRequestQuery(git, repo, "2017-01-01", "2017-12-31"):
    print("%s: %s" % (pull.number, pull.title))
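For reference, here is one possible shape for the generator version mentioned in the EDIT above. It is a sketch, not the author's actual rewrite. It assumes a hypothetical `search(since, until)` callable that returns items sorted ascending by the same key used as the restart point, with at most some fixed number of items per call. The names `iter_all_results`, `key`, `ident`, and `fake_search` are all made up for illustration. Duplicates at the window boundary are filtered by identifier rather than by skipping the first result.

```python
def iter_all_results(search, since, until, key, ident):
    """Yield every item in [since, until], re-querying from the last key seen.

    search(since, until) -- hypothetical callable; must return items sorted
                            ascending by key(item), capped per call.
    key(item)            -- value used as the restart point (e.g. a date).
    ident(item)          -- unique identifier used to drop duplicates.
    """
    seen = set()  # identifiers already yielded
    while True:
        progressed = False
        last_key = None
        for item in search(since, until):
            last_key = key(item)
            if ident(item) in seen:
                continue  # already yielded by an earlier query
            seen.add(ident(item))
            progressed = True
            yield item
        if not progressed:
            return  # the query produced nothing new, so we are done
        since = last_key  # restart the window at the last key seen


def fake_search(since, until, cap=10):
    """Stand-in for a capped search over the integers 0..24 (demo only)."""
    return [n for n in range(25) if since <= n <= until][:cap]
```

With the fake search, `list(iter_all_results(fake_search, 0, 24, key=lambda n: n, ident=lambda n: n))` walks all 25 items in four queries. One limitation: if more items share a single key value than the cap allows per call, the loop stops early instead of finding the rest.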

Reading the GitHub API docs about search, I also notice that incomplete_results is missing from the search-results processing in PyGithub. Including that value might already help with detecting whether search results are incomplete.
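As a rough illustration of what could be checked, the sketch below inspects a raw search response that has already been decoded into a dict. The `total_count` and `incomplete_results` fields are part of the GitHub search response; the helper name `describe_search_payload` and the returned keys are hypothetical. As far as I understand the docs, `incomplete_results` signals a query timeout rather than the 1000-result cap, so comparing `total_count` against the cap is probably the more direct check for this issue.

```python
SEARCH_RESULT_CAP = 1000  # cap on results per search, as described in this issue


def describe_search_payload(payload, cap=SEARCH_RESULT_CAP):
    """Summarise a decoded search response (a dict parsed from JSON).

    Returns a small dict saying whether the reported total exceeds the
    cap and whether the API flagged the results as incomplete.
    """
    total = payload.get("total_count", 0)
    return {
        "total_count": total,
        "exceeds_cap": total > cap,
        "incomplete_results": bool(payload.get("incomplete_results", False)),
    }
```

For example, a payload of `{"total_count": 2500, "incomplete_results": False, "items": []}` would be reported as exceeding the cap even though the API did not flag it as incomplete.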

Now that I have PyGithub forked and running locally from source (I'm looking at #606) perhaps I can investigate this further.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

re-open this issue ?
and for a general solution for other searches as well
