PyGithub: Support search rate limit

Created on 10 Apr 2017 · 13 Comments · Source: PyGithub/PyGithub

It seems the get_rate_limit function returns what GitHub considers the 'core' rate limit. However, there is a separate, lower rate limit for searching code; see GitHub's rate limit documentation (https://developer.github.com/v3/rate_limit/).

Right now there isn't a way to get the code search rate limit, as far as I can tell.

feature request


All 13 comments

I see the same issue. Here is a small script that exemplifies the problem.

import os
from datetime import datetime
from github import Github

# Login
TOKEN = os.getenv("GITHUB_ACCESS_TOKEN")
github = Github(TOKEN)

# Get initial rate limit and reset time
rl1 = github.get_rate_limit().rate
print("RL1 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl1.limit, rl1.remaining, rl1.reset))
# RL1 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# Perform a search
results = github.search_code("Hello World")

# Rate limit of Github instance is unchanged after a search
rl2 = github.get_rate_limit().rate
print("RL2 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl2.limit, rl2.remaining, rl2.reset))
# RL2 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# The PaginatedList instance has a Requester with the same info
# (rate_limiting is a (remaining, limit) tuple)
rl3 = results._PaginatedList__requester.rate_limiting
rl3_reset = datetime.utcfromtimestamp(int(
        results._PaginatedList__requester.rate_limiting_resettime))
print("RL3 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl3[1], rl3[0], rl3_reset))
# RL3 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# However, the actual ContentFile results show a different limit
# The Requester of each individual result ...
result = results[0]
rl4 = result._requester.rate_limiting
rl4_reset = datetime.utcfromtimestamp(int(
        result._requester.rate_limiting_resettime))
print("RL4 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl4[1], rl4[0], rl4_reset))
# RL4 | Limit: 30, Remaining: 29, Reset: 2017-09-22 16:27:36.

# ... and headers stored in the content file directly show a different rate limit.
rl5_limit = result._headers['x-ratelimit-limit']
rl5_remaining = result._headers['x-ratelimit-remaining']
rl5_reset = datetime.utcfromtimestamp(int(
        result._headers['x-ratelimit-reset']))
print("RL5 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl5_limit, rl5_remaining, rl5_reset))
# RL5 | Limit: 30, Remaining: 29, Reset: 2017-09-22 16:27:36.

# In the end, the main Github instance still shows the original full rate limit
rl6 = github.get_rate_limit().rate
print("RL6 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl6.limit, rl6.remaining, rl6.reset))
# RL6 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

+1. This feature is necessary for an application I'm trying to build.

@brentshermana for your application, consider inspecting the rate limit headers of the last response (see my example above) or polling the /rate_limit endpoint yourself. That endpoint reports all kinds of rate limits and does not count against any of them.

Eventually, it would be nice if PyGithub would not only parse rate but also resources from what /rate_limit returns. The information is there; unfortunately, it is just not made available to consumers of the library.

Also, when a paginated list holds code search results, it should return the code search rate limit, i.e. whatever is stored in _headers['x-ratelimit-*'].

By the way, I just noticed that the rate field in the JSON returned by /rate_limit is deprecated, and the information in resources is the recommended alternative: https://developer.github.com/v3/rate_limit/#deprecation-notice
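
For reference, the body of GET /rate_limit looks roughly like this (abridged, with illustrative numbers; note the deprecated top-level rate block simply mirrors resources['core']):

{
    "resources": {
        "core":    {"limit": 5000, "remaining": 4999, "reset": 1506097595},
        "search":  {"limit": 30,   "remaining": 29,   "reset": 1506094056},
        "graphql": {"limit": 5000, "remaining": 5000, "reset": 1506097595}
    },
    "rate": {"limit": 5000, "remaining": 4999, "reset": 1506097595}
}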

I'm doing exactly that: polling /rate_limit myself. If anyone wants to adapt this into a pull request, you have my blessing:

import json
import time
from urllib.request import urlopen

def wait(seconds):
    print("Waiting for {} seconds ...".format(seconds))
    time.sleep(seconds)
    print("Done waiting - resume!")

def api_wait():
    url = 'https://api.github.com/rate_limit'
    # note: an unauthenticated request reports the limits for your IP;
    # add an Authorization header to see your token's limits
    response = urlopen(url).read()
    data = json.loads(response.decode())
    if data['resources']['core']['remaining'] <= 10:  # extra margin of safety
        reset_time = data['resources']['core']['reset']
        wait(reset_time - time.time() + 10)
    elif data['resources']['search']['remaining'] <= 2:
        reset_time = data['resources']['search']['reset']
        wait(reset_time - time.time() + 10)
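
For instance, one might call api_wait() before each search (a sketch; github is the authenticated client from the earlier example, and the query is a placeholder):

api_wait()  # make sure there is headroom before searching
results = github.search_code("Hello World")
for result in results:  # iterating fetches further pages, which also count
    print(result.path)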

I'm experiencing a problem where my iteration over the results from search_issues stops after 1020 results when there should be 1869. My script stops at the same point every time. Could this be a rate-limiting issue?

I don't get an error; the results just run out. If I put my query string directly into the GitHub web interface, I see all 1869 results as expected. 1020 is a multiple of 30, which makes me wonder if it's a pagination problem?

Code is as follows:

querystring = "type:pr is:closed repo:xxxx closed:2017-07-01..2018-06-30"
issues = git.search_issues(query=querystring, sort="updated", order="asc")
for issue in issues:
    pull = issue.as_pull_request()
    print("%s: %s" % (pull.number, pull.title))

Many thanks for any tips you can share as to what might be going wrong here.

I also tried iterating through issues.reversed to see if it would start at the end of my expected 1869 results. However, in this case I only get 30 issues, from the first page of results.

On further investigation, it appears that I'm running into the Search API's limit of 1,000 results per query.
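
For anyone else who hits this cap: the usual workaround is to split the query into narrower slices that each return fewer than 1,000 results, for example by date window. A sketch, reusing the query above (the repo placeholder and the date windows are illustrative):

def search_in_windows(git, base_query, windows):
    # one query per window, each expected to return < 1000 results
    for start, end in windows:
        query = "%s closed:%s..%s" % (base_query, start, end)
        for issue in git.search_issues(query=query, sort="updated", order="asc"):
            yield issue

windows = [("2017-07-01", "2017-12-31"), ("2018-01-01", "2018-06-30")]
for issue in search_in_windows(git, "type:pr is:closed repo:xxxx", windows):
    pull = issue.as_pull_request()
    print("%s: %s" % (pull.number, pull.title))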

How about providing one more method, get_search_rate_limit(), for the search rate limit, while the existing get_rate_limit() parses the latest "core" rate limit as suggested by GitHub: https://developer.github.com/v3/rate_limit/

The Search API rate limit and the GraphQL rate limit are available now. One method for all.

By default it will show you the "core" rate limit. You can also get the search/GraphQL rate limits by accessing the respective attributes.

>>> r = g.get_rate_limit()
>>> r
RateLimit(core=Rate(remaining=4923, limit=5000))
>>> r.search
Rate(remaining=30, limit=30)
>>> r.graphql
Rate(remaining=5000, limit=5000)

Looks great, thanks @sfdye!

To emulate @brentshermana's waiting function and avoid problems with search rate limiting, you can now do something like this:

import time
from datetime import datetime

def api_wait_search(git):
    limits = git.get_rate_limit()
    if limits.search.remaining <= 2:
        # reset is a naive UTC datetime, so compare against utcnow()
        seconds = (limits.search.reset - datetime.utcnow()).total_seconds()
        print("Waiting for %d seconds ..." % seconds)
        time.sleep(seconds)
        print("Done waiting - resume!")

Note that calling get_rate_limit() makes an extra API request and so introduces a small delay; you may want to minimize how often you call it.
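
If that extra request is a concern, PyGithub also caches the X-RateLimit-* headers from the most recent response on the client, so you can check them without another API call. A sketch; note the cached values reflect whichever limit governed the last request (search or core):

import time

remaining, limit = git.rate_limiting  # (remaining, limit) from the last response headers
reset_epoch = git.rate_limiting_resettime  # unix timestamp of the next reset
if remaining <= 2:
    time.sleep(max(0, reset_epoch - time.time()))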

For people who land here from a search engine, I modified @bbi-yggy's function a bit:

import time
from datetime import datetime, timezone

from github import RateLimitExceededException

def rate_limited_retry(github):
    def decorator(func):
        def ret(*args, **kwargs):
            for _ in range(3):
                try:
                    return func(*args, **kwargs)
                except RateLimitExceededException:
                    limits = github.get_rate_limit()
                    reset = limits.search.reset.replace(tzinfo=timezone.utc)
                    now = datetime.now(timezone.utc)
                    seconds = (reset - now).total_seconds()
                    print("Rate limit exceeded")
                    print(f"Reset is in {seconds:.3g} seconds.")
                    if seconds > 0.0:
                        print(f"Waiting for {seconds:.3g} seconds...")
                        time.sleep(seconds)
                        print("Done waiting - resume!")
            raise Exception("Failed too many times")
        return ret
    return decorator

This function can be used as follows:

@rate_limited_retry(github)
def run_query(import_string):
    query_string = f"language:Python \"{import_string}\""
    return list(github.search_code(query_string))

results = run_query(import_string)

Here is a modified version of pokey's decorator above that takes core/search/GraphQL into account. I also added a 30-second buffer because GitHub doesn't reset the rate limit exactly at the time it reports.

import time
from datetime import datetime, timezone

from github import RateLimitExceededException

def rate_limited_retry():
    def decorator(func):
        def ret(*args, **kwargs):
            for _ in range(3):
                try:
                    return func(*args, **kwargs)
                except RateLimitExceededException:
                    limits = gh.get_rate_limit()  # gh: a global Github instance
                    print("Rate limit exceeded")
                    print("Search:", limits.search, "Core:", limits.core, "GraphQL:", limits.graphql)

                    # wait on whichever limit is actually exhausted
                    if limits.search.remaining == 0:
                        limited = limits.search
                    elif limits.graphql.remaining == 0:
                        limited = limits.graphql
                    else:
                        limited = limits.core
                    reset = limited.reset.replace(tzinfo=timezone.utc)
                    now = datetime.now(timezone.utc)
                    seconds = (reset - now).total_seconds() + 30
                    print(f"Reset is in {seconds} seconds.")
                    if seconds > 0.0:
                        print(f"Waiting for {seconds} seconds...")
                        time.sleep(seconds)
                        print("Done waiting - resume!")
            raise Exception("Failed too many times")
        return ret
    return decorator
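
Usage mirrors the earlier version. A sketch, assuming gh is the authenticated Github instance the decorator closes over, with a placeholder query:

import os
from github import Github

gh = Github(os.getenv("GITHUB_ACCESS_TOKEN"))

@rate_limited_retry()
def run_query(query_string):
    return list(gh.search_code(query_string))

results = run_query('language:Python "import requests"')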
