I see the same issue. Here is a small script that exemplifies the problem.
import os
from datetime import datetime
from github import Github

# Login
TOKEN = os.getenv("GITHUB_ACCESS_TOKEN")
github = Github(TOKEN)

# Get initial rate limit and reset time
rl1 = github.get_rate_limit().rate
print("RL1 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl1.limit, rl1.remaining, rl1.reset))
# RL1 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# Perform a search
results = github.search_code("Hello World")

# Rate limit of Github instance is unchanged after a search
rl2 = github.get_rate_limit().rate
print("RL2 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl2.limit, rl2.remaining, rl2.reset))
# RL2 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# The PaginatedList instance has a Requester with the same info.
# Note: rate_limiting is a (remaining, limit) tuple.
rl3 = results._PaginatedList__requester.rate_limiting
rl3_reset = datetime.utcfromtimestamp(int(
    results._PaginatedList__requester.rate_limiting_resettime))
print("RL3 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl3[1], rl3[0], rl3_reset))
# RL3 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.

# However, the actual ContentFile results show a different limit.
# The Requester of each individual result ...
result = results[0]
rl4 = result._requester.rate_limiting
rl4_reset = datetime.utcfromtimestamp(int(
    result._requester.rate_limiting_resettime))
print("RL4 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl4[1], rl4[0], rl4_reset))
# RL4 | Limit: 30, Remaining: 29, Reset: 2017-09-22 16:27:36.

# ... and the headers stored in the content file directly show a different rate limit.
rl5_limit = result._headers['x-ratelimit-limit']
rl5_remaining = result._headers['x-ratelimit-remaining']
rl5_reset = datetime.utcfromtimestamp(int(
    result._headers['x-ratelimit-reset']))
print("RL5 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl5_limit, rl5_remaining, rl5_reset))
# RL5 | Limit: 30, Remaining: 29, Reset: 2017-09-22 16:27:36.

# In the end, the main Github instance still shows the original full rate limit
rl6 = github.get_rate_limit().rate
print("RL6 | Limit: {}, Remaining: {}, Reset: {}.".format(
    rl6.limit, rl6.remaining, rl6.reset))
# RL6 | Limit: 5000, Remaining: 5000, Reset: 2017-09-22 17:26:35.
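The `x-ratelimit-*` headers shown above can be turned into a wait time directly. The helper below is a hypothetical sketch (not part of PyGithub); the header names and values follow the output above, and the timestamps are illustrative:

```python
import time

def seconds_until_reset(headers, now=None):
    """Parse x-ratelimit-* response headers into (limit, remaining, wait).

    wait is the number of seconds until the limit resets, clamped to 0
    when the reset time is already in the past.
    """
    limit = int(headers['x-ratelimit-limit'])
    remaining = int(headers['x-ratelimit-remaining'])
    reset = int(headers['x-ratelimit-reset'])  # Unix timestamp (UTC)
    if now is None:
        now = time.time()
    return limit, remaining, max(0.0, reset - now)

# Example with the search-limit values from the output above:
headers = {
    'x-ratelimit-limit': '30',
    'x-ratelimit-remaining': '29',
    'x-ratelimit-reset': '1506097656',
}
print(seconds_until_reset(headers, now=1506097600.0))  # (30, 29, 56.0)
```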
+1 This feature is necessary for an application I'm trying to build
@brentshermana For your application, consider inspecting the rate limit headers of the last response (see my example above), or polling the /rate_limit endpoint yourself. It contains information about all kinds of rate limits and does not count towards any rate limit.
Eventually, it would be nice if PyGithub would not only parse rate but also resources from what /rate_limit returns. The information is there; unfortunately, it is not made available to consumers of the library.
Also, the paginated list should return the rate limit for code search if it returns the results of such a search, i.e. whatever is stored in _headers['x-ratelimit-*'].
btw: I just noticed that the rate field in the JSON returned by /rate_limit is deprecated, and the information in resources is the recommended alternative: https://developer.github.com/v3/rate_limit/#deprecation-notice
I'm doing exactly that. If anyone wants to adapt this and try and make a pull request, you have my blessing:
import json
import time
from urllib.request import urlopen

def wait(seconds):
    print("Waiting for {} seconds ...".format(seconds))
    time.sleep(seconds)
    print("Done waiting - resume!")

def api_wait():
    url = 'https://api.github.com/rate_limit'
    response = urlopen(url).read()
    data = json.loads(response.decode())
    if data['resources']['core']['remaining'] <= 10:  # extra margin of safety
        reset_time = data['resources']['core']['reset']
        wait(reset_time - time.time() + 10)
    elif data['resources']['search']['remaining'] <= 2:
        reset_time = data['resources']['search']['reset']
        wait(reset_time - time.time() + 10)
I'm experiencing a problem where my iteration over the results from search_issues stops after 1020 results when there should be 1869 results. My script stops at the same point every time. Could this be a rate-limiting issue?
I do not get an error; the results just run out. If I put my query string directly into the GitHub web interface, then I see all 1869 results, as expected. 1020 is a multiple of 30, which makes me wonder if it's a pagination problem?
Code is as follows:
querystring = "type:pr is:closed repo:xxxx closed:2017-07-01..2018-06-30"
issues = git.search_issues(query=querystring, sort="updated", order="asc")
for issue in issues:
    pull = issue.as_pull_request()
    print("%s: %s" % (pull.number, pull.title))
Many thanks for any tips you can share as to what might be going wrong here.
I also tried iterating through issues.reversed to see if it would start at the end of my expected 1869 results. However, in this case I only get 30 issues, from the first page of results.
On further investigation, it appears that I'm running into the 1000 results per search limit.
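A common workaround for the 1,000-result search cap is to split the `closed:` date range into smaller windows and issue one search per window. The helper below is a hypothetical sketch, not a PyGithub API:

```python
from datetime import date, timedelta

def split_date_range(start, end, days=30):
    """Split [start, end] into consecutive, non-overlapping windows of at
    most `days` days each, suitable for building `closed:A..B` qualifiers
    so that each individual search stays under the 1,000-result cap."""
    windows = []
    cur = start
    while cur <= end:
        stop = min(cur + timedelta(days=days - 1), end)
        windows.append((cur, stop))
        cur = stop + timedelta(days=1)
    return windows

# Build one query string per window:
for a, b in split_date_range(date(2017, 7, 1), date(2018, 6, 30), days=90):
    print("type:pr is:closed repo:xxxx closed:{}..{}".format(a, b))
```

If a single window still exceeds 1,000 results, shrink `days` until every window fits.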
How about we provide one more method, get_search_rate_limit(), for the search rate limit, while the existing get_rate_limit() parses the latest "core" rate limit as suggested by GitHub: https://developer.github.com/v3/rate_limit/
The Search API rate limit and the GraphQL rate limit are available now. One method for all.
By default it will show you the "core" rate limit. You can also get the search/graphql rate limits by accessing the respective attributes.
>>> r = g.get_rate_limit()
>>> r
RateLimit(core=Rate(remaining=4923, limit=5000))
>>> r.search
Rate(remaining=30, limit=30)
>>> r.graphql
Rate(remaining=5000, limit=5000)
Looks great, thanks @sfdye!
To emulate @brentshermana's waiting function to avoid problems with search rate limiting, you can now do something like this:
import time
from datetime import datetime

def api_wait_search(git):
    limits = git.get_rate_limit()
    if limits.search.remaining <= 2:
        seconds = (limits.search.reset - datetime.now()).total_seconds()
        print("Waiting for %d seconds ..." % seconds)
        time.sleep(seconds)
        print("Done waiting - resume!")
Note that calling get_rate_limit() will introduce a small delay, so you may want to minimize how often you call it.
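One way to minimize that overhead is to run the check only every N-th operation. The gate class below is a hypothetical sketch, not a PyGithub API:

```python
class EveryN:
    """Fires on the first call and then on every n-th call after that,
    so an expensive check (like get_rate_limit()) runs only occasionally."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def should_check(self):
        fire = self.count % self.n == 0
        self.count += 1
        return fire

gate = EveryN(100)
# Inside a loop over search results:
# if gate.should_check():
#     api_wait_search(git)
```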
For people that land here from search engine, I modified @bbi-yggy's function a bit:
import time
from datetime import datetime, timezone
from github import RateLimitExceededException

def rate_limited_retry(github):
    def decorator(func):
        def ret(*args, **kwargs):
            for _ in range(3):
                try:
                    return func(*args, **kwargs)
                except RateLimitExceededException:
                    limits = github.get_rate_limit()
                    reset = limits.search.reset.replace(tzinfo=timezone.utc)
                    now = datetime.now(timezone.utc)
                    seconds = (reset - now).total_seconds()
                    print("Rate limit exceeded")
                    print(f"Reset is in {seconds:.3g} seconds.")
                    if seconds > 0.0:
                        print(f"Waiting for {seconds:.3g} seconds...")
                        time.sleep(seconds)
                        print("Done waiting - resume!")
            raise Exception("Failed too many times")
        return ret
    return decorator
This function can be used as follows:
@rate_limited_retry(github)
def run_query(import_string):
    query_string = f"language:Python \"{import_string}\""
    return list(github.search_code(query_string))

results = run_query(import_string)
Modified version of pokey's decorator above to take into account core/search/graphql.
Also added a 30 second delay because Github doesn't reset the rate limit exactly at the time it says.
import time
from datetime import datetime, timezone
from github import RateLimitExceededException

def rate_limited_retry():
    def decorator(func):
        def ret(*args, **kwargs):
            for _ in range(3):
                try:
                    return func(*args, **kwargs)
                except RateLimitExceededException:
                    limits = gh.get_rate_limit()
                    print("Rate limit exceeded")
                    print("Search:", limits.search, "Core:", limits.core, "GraphQL:", limits.graphql)
                    if limits.search.remaining == 0:
                        limited = limits.search
                    elif limits.graphql.remaining == 0:
                        limited = limits.graphql
                    else:
                        limited = limits.core
                    reset = limited.reset.replace(tzinfo=timezone.utc)
                    now = datetime.now(timezone.utc)
                    seconds = (reset - now).total_seconds() + 30
                    print(f"Reset is in {seconds} seconds.")
                    if seconds > 0.0:
                        print(f"Waiting for {seconds} seconds...")
                        time.sleep(seconds)
                        print("Done waiting - resume!")
            raise Exception("Failed too many times")
        return ret
    return decorator