Python’s requests library simplifies making HTTP requests, but a request can occasionally fail to retrieve the data you need because of an error. This article explores the most common errors that come up with Python requests and offers advice on how to build retry mechanisms. The two most common approaches are to use a Session with HTTPAdapter or to write your own retry logic wrapper. We present both options along with some handy code snippets to help you with your tasks.
Core Concepts of Retrying Python Requests
The Python requests retry mechanism is a technique where your code automatically resends a request if an HTTP error happens. The retry logic is usually decision-based and depends on why and when the failure occurs.
Not all Python request errors should trigger a retry. It is important to know the cause of the failure and decide whether a retry is even necessary. Retries are best applied in specific scenarios such as transient server problems, while retrying after client-side or permanent errors typically will not help. You can retry a Python request immediately after a failure occurs, but that strategy could overload the server and trigger an IP ban.
Instead of retrying instantly, you should implement a delay between retries. However, a fixed delay could create a bot-like pattern and make it easier for a website to detect and ban you. To avoid this, it is recommended to use an exponential backoff strategy, in which the delay grows after every failed attempt, or to use a rotating mobile proxy so that your IP address is constantly changing, which lowers the chances of detection.
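To make the backoff idea concrete, here is a minimal sketch of how such delays could be computed. The function name backoff_with_jitter, its parameters, and the 60-second cap are assumptions chosen for illustration and are not part of requests or urllib3. The ceiling doubles on each attempt, and a random value below that ceiling is picked so that retries do not arrive at perfectly regular intervals.

```python
import random


def backoff_with_jitter(attempt, base=1.0, cap=60.0, rng=random):
    """Return a randomized delay in seconds for a 1-based retry attempt.

    Hypothetical helper: the ceiling doubles with each attempt
    (base, 2 * base, 4 * base, ...) up to `cap`, and the actual delay
    is drawn uniformly between 0 and that ceiling.
    """
    if attempt < 1:
        raise ValueError("attempt must be 1 or greater")
    ceiling = min(cap, base * (2 ** (attempt - 1)))
    return rng.uniform(0, ceiling)


if __name__ == "__main__":
    for attempt in range(1, 6):
        delay = backoff_with_jitter(attempt)
        print(f"attempt {attempt}: wait {delay:.2f}s")
```

The randomization is optional. Dropping it and returning the ceiling directly gives a plain exponential backoff.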
Types of Failed Python Requests
Understanding why your Python request failed will allow you to develop mitigation strategies for each case. There are two main categories of failed requests: requests that timed out and requests that returned an HTTP error.
When a request times out, the client did not receive a reply within a specific time frame. This could happen because of server overload, server response issues, or slow network connections. If you face a timeout, check your internet connection first. If the connection is stable, the issue is probably server-related.
For requests that return an HTTP error, the server is reachable but the request could not be processed successfully. This failure comes with a specific status code and error message that tell you what went wrong. Here are some of the most common errors:
- 403 Forbidden: The server understood the request but refuses to fulfil it because you are not allowed to access the document or even the entire server.
- 429 Too Many Requests: This is one of the most frequent HTTP errors when web scraping. It comes up when you send too many requests to the same endpoint in a short period. It can be solved by switching your proxy or by retrying the failed request after a delay.
- 500 Internal Server Error: This error code comes up when something has failed on the server’s end. Trying again in a few minutes should solve this issue.
- 502 Bad Gateway: Similar to the 500 error, the 502 error means something went wrong on the server side: a server acting as a gateway or proxy received an invalid response from the upstream server. Trying again shortly should get rid of the issue.
- 503 Service Unavailable: The server is temporarily unable to handle the request, usually because it is overloaded or down for maintenance. The issue has to be fixed by the website administrator, so all you can do is retry later.
- 504 Gateway Timeout: A server acting as a gateway or proxy did not receive a response from the upstream server in time. Retrying with increased delays should fix the issue.
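The list above can be turned into a small decision helper. This is only a sketch: the names RETRYABLE_STATUS_CODES and should_retry are hypothetical, and the policy (retry 429 and the 5xx codes listed, never retry 403) is an assumption that you may want to adjust, for example if you switch proxies between attempts.

```python
# Assumed policy for illustration: retry transient failures only.
RETRYABLE_STATUS_CODES = frozenset({429, 500, 502, 503, 504})


def should_retry(status_code):
    """Return True if the status code suggests a transient problem."""
    return status_code in RETRYABLE_STATUS_CODES


if __name__ == "__main__":
    for code in (200, 403, 404, 429, 500, 503):
        action = "retry" if should_retry(code) else "do not retry"
        print(code, action)
```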
Python Request Retry Strategies
If you find yourself in a position where you need to implement a Python request retry, there are two options available to you. You can use an existing retry mechanism or build your own. The existing one, HTTPAdapter, lets you specify a retry strategy and change the request behavior, while building your own allows you to implement custom error handlers, logs, and more.
Built-in Python Request Retry Mechanism
The requests library is built on top of the urllib3 HTTP client. You can set up retries with the HTTPAdapter class from requests and the Retry utility class from the urllib3 package. The HTTPAdapter class lets you specify a retry strategy and change the request behavior. To use a simple retry strategy with HTTPAdapter, import the required libraries and define your options. As an example, we will set the maximum number of retries to 4 and retry if the response has a status code of 403, 429, 500, or 502.
# pip3 install requests
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry

# define the retry strategy
retry_strategy = Retry(
    total=4,  # maximum number of retries
    status_forcelist=[403, 429, 500, 502],  # the HTTP status codes to retry on
)
After installing the necessary libraries and defining the strategy, pass it to HTTPAdapter to create a new adapter object, and then mount the adapter to a Session object so that it is used for all requests.
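adapter = HTTPAdapter(max_retries=retry_strategy)

# create a new session object
session = requests.Session()
session.mount("http://", adapter)
session.mount("https://", adapter)

# make a request using the session object (the URL must include the scheme)
try:
    response = session.get("https://www.websiteexample.com")
except requests.exceptions.RetryError:
    # raised when every retry still returned a status from status_forcelist
    print("FAILED: retries exhausted")
else:
    if response.status_code == 200:
        print(f"SUCCESS: {response.text}")
    else:
        print(f"FAILED with status {response.status_code}")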
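The Retry class also accepts a backoff_factor argument, so urllib3 can handle the growing pause between attempts itself. The sketch below wraps the setup in a helper. The name build_retry_session and the chosen default values are assumptions for illustration. With raise_on_status=False, the last response is returned once retries are exhausted rather than an exception being raised.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util import Retry


def build_retry_session(
    total=4,
    backoff_factor=1,
    status_forcelist=(429, 500, 502, 503, 504),
):
    """Return a Session that retries the listed status codes with backoff.

    Hypothetical helper; the defaults are illustrative, not recommendations.
    """
    retry_strategy = Retry(
        total=total,
        status_forcelist=status_forcelist,
        backoff_factor=backoff_factor,  # urllib3 grows the pause between attempts
        raise_on_status=False,  # hand back the last response instead of raising
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session


# Usage (performs a real network request, so it is left commented out):
# session = build_retry_session()
# response = session.get("https://www.scrapingcourse.com/ecommerce", timeout=10)
# print(response.status_code)
```

Note that, by default, urllib3 applies status-based retries only to idempotent methods such as GET, so POST requests are not retried on these status codes unless you configure that explicitly.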
Custom Retry Strategies
While using the available built-in libraries is always an option, you might want to try building your own custom wrapper for the retry logic. Doing so gives you the ability to implement more custom options. Let us create a Python function that mimics the retry logic of the first method. The function accepts the target URL as its first argument, followed by the maximum number of retries (total) and status_forcelist, which specifies the status codes that should trigger a retry.
import requests


def retry_request(
    url,
    total=4,
    status_forcelist=(403, 429, 500, 502),
    **kwargs,
):
    # keep the last response so it can be returned if every attempt fails
    last_response = None
    # implement retry
    for _ in range(total):
        try:
            response = requests.get(url, **kwargs)
            if response.status_code in status_forcelist:
                # track the last response
                last_response = response
                # retry request
                continue
            return response
        except requests.exceptions.ConnectionError:
            # no response was received; try again
            pass
    # every attempt failed: return the last response (None if none was received)
    return last_response


# the URL must include the scheme
response = retry_request("https://www.websiteexample.com")

if response is None:
    print("FAILED: no response received")
elif response.status_code == 200:
    print(f"SUCCESS: {response.text}")
else:
    print(f"FAILED with status {response.status_code}")
This script can be improved with an exponential backoff. To retry a Python request with a custom backoff, take the code we provided as a base and then create a separate function named backoff_delay to calculate the delay.
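def backoff_delay(backoff_factor, attempts):
    # exponential backoff: the delay doubles with every attempt
    # (backoff_factor=2 gives 2, 4, 8, 16 seconds for attempts 1 to 4)
    delay = backoff_factor * (2 ** (attempts - 1))
    return delay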
After you have done that, add the backoff factor and use the exponential delay with the time module:
import requests
from time import sleep


def retry_request(
    url,
    backoff_factor=2,
    total=4,
    status_forcelist=(403, 429, 500, 502),
    **kwargs,
):
    # keep the last response so it can be returned if every attempt fails
    last_response = None
    # implement retry; attempts count from 1 so the first delay equals backoff_factor
    for attempt in range(1, total + 1):
        try:
            response = requests.get(url, **kwargs)
            if response.status_code not in status_forcelist:
                return response
            # track the last response
            last_response = response
        except requests.exceptions.ConnectionError:
            # no response was received; fall through to the backoff below
            pass
        # implement backoff, skipping the wait after the final attempt
        if attempt < total:
            delay = backoff_delay(backoff_factor, attempt)
            print(f"retrying in {delay} seconds")
            sleep(delay)
    # every attempt failed: return the last response (None if none was received)
    return last_response


response = retry_request("https://www.scrapingcourse.com/ecommerce")

if response is None:
    print("FAILED: no response received")
elif response.status_code == 200:
    print(f"SUCCESS: {response.text}")
else:
    print(f"FAILED with status {response.status_code}")
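The same backoff logic can also be packaged as a reusable wrapper that is not tied to a single request function. The following is a minimal sketch under stated assumptions: the decorator name retry, its parameters, and the injectable sleep argument are hypothetical and chosen for illustration. It retries when the wrapped function raises one of the given exceptions, which also covers failures such as timeouts where no response is returned at all.

```python
import functools
import time


def retry(exceptions, total=4, backoff_factor=2, sleep=time.sleep):
    """Retry the decorated function when it raises one of `exceptions`.

    Hypothetical decorator: waits backoff_factor * 2 ** (attempt - 1)
    seconds between attempts and re-raises the last exception once
    all attempts are used up.
    """
    if total < 1:
        raise ValueError("total must be 1 or greater")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, total + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == total:
                        raise  # out of attempts: surface the error
                    sleep(backoff_factor * (2 ** (attempt - 1)))

        return wrapper

    return decorator


# Usage with requests (assumed to be installed; left commented out
# because it performs a real network request):
#
# @retry((requests.exceptions.ConnectionError, requests.exceptions.Timeout))
# def fetch(url):
#     return requests.get(url, timeout=10)
```

Passing the sleep function as a parameter is a design choice that keeps the wrapper easy to test, since a test can record the requested delays without actually waiting.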
Conclusion
Handling Python request retries properly makes your applications more robust and error-tolerant. By using the strategies we have discussed, such as exponential backoff, careful handling of HTTP errors, and proxies when necessary, you can enhance the reliability of your code.
Key Takeaways:
- Use exponential backoff to implement delays between retries. This will avoid a server overload and reduce the risk of detection.
- Prioritize retry-appropriate errors such as 429 and 500-504 while avoiding unnecessary retries for client-side errors like 403, unless you change something between attempts, such as the proxy.
- Use proxies to enhance scalability. Rotating proxies can help you manage retries when scraping and will assist with avoiding IP bans caused by rate limits.
- Utilize built-in tools like HTTPAdapter and Retry from urllib3 for easier and more reliable retry strategies. Building your own custom retry logic is a great choice but only necessary for more advanced error handling or for tailored delay mechanisms.
Whether you decide to use the built-in retry tools or build your own solution, optimizing retry mechanisms allows you to tackle transient errors, handle rate limits, and maintain efficient interactions with servers.