Add conservative_retrying #71

Merged
merged 24 commits on May 29, 2024
Changes from 3 commits
Commits (24)
e91d3a4
Add conservative_retrying
Gallaecio Apr 26, 2024
413dd4b
Keep mypy happy
Gallaecio Apr 26, 2024
48acc00
Conservative → Aggresive
Gallaecio Apr 30, 2024
985827e
is_maybe_temporary_error → _maybe_temporary_error
Gallaecio Apr 30, 2024
aca5fc6
Fix the description of the default retry policy
Gallaecio Apr 30, 2024
2a7d19e
Add tests for the attempt-based limits of the default retry policy
Gallaecio Apr 30, 2024
b7ff419
Lower aggresive retrying of temporary download errors from 16 to 8 at…
Gallaecio May 7, 2024
41a8d69
Ignore rate-limiting errors when counting max temporary download errors
Gallaecio May 7, 2024
439cac9
Only stop retries for network errors after 15 uninterrupted minutes o…
Gallaecio May 8, 2024
9878e10
Implement a custom download error handling for the aggressive retry p…
Gallaecio May 8, 2024
4d4688d
Retry undocumented 5xx errors up to 4 times, not counting rate-limiti…
Gallaecio May 9, 2024
b71edb0
Ignore mypy complaints about RetryCallState custom attributes added a…
Gallaecio May 9, 2024
bba313d
Clean up APIs for easier subclassing, and update retry docs
Gallaecio May 9, 2024
7cf8423
Improve the wording around custom retry policy usage
Gallaecio May 10, 2024
d5d1740
Raise ValueError if retrying does not get an instance of AsyncRetrying
Gallaecio May 10, 2024
adf4f10
Fix typo: uninterrumpted → uninterrupted
Gallaecio May 10, 2024
f51c561
Avoid using id(self)
Gallaecio May 10, 2024
ba71081
Merge remote-tracking branch 'zytedata/main' into conservative-retrying
Gallaecio May 10, 2024
2f533a7
Add missing parameter
Gallaecio May 10, 2024
f9a8c26
Concentrate retry tests and complete coverage
Gallaecio May 10, 2024
b5b55b1
undocumented_error_stop: stop_on_uninterrupted_status → stop_on_count
Gallaecio May 15, 2024
1e791d5
Update the docs
Gallaecio May 15, 2024
5638ce9
Update test expectations
Gallaecio May 15, 2024
d87c0e6
Remove leftovers
Gallaecio May 28, 2024
30 changes: 24 additions & 6 deletions docs/use/api.rst
@@ -155,23 +155,41 @@ following:
- Retries :ref:`temporary download errors
<zyte-api-temporary-download-errors>` up to 3 times.

- Retries network errors for up to 15 minutes.
- Retries network errors until they have happened for 15 minutes straight.

All retries are done with an exponential backoff algorithm.

.. _aggressive-retry-policy:

If some :ref:`unsuccessful responses <zyte-api-unsuccessful-responses>` exceed
maximum retries with the default retry policy, try using
:data:`~zyte_api.aggressive_retrying` instead. Alternatively, the reference
documentation of :class:`~zyte_api.RetryFactory` and
:class:`~zyte_api.ConvervativeRetryFactory` features some examples of custom
:data:`~zyte_api.aggressive_retrying` instead, which modifies the default retry
policy as follows:

- Temporary download errors are retried 7 times. :ref:`Permanent download
errors <zyte-api-permanent-download-errors>` also count towards this retry
limit.

- Retries permanent download errors up to 3 times.

- Retries error responses with an HTTP status code in the 500-599 range (503,
520 and 521 excluded) until they have happened 4 times in a row, not
counting rate-limiting responses or network errors that may happen in
between.

Alternatively, the reference documentation of :class:`~zyte_api.RetryFactory`
and :class:`~zyte_api.AggressiveRetryFactory` features some examples of custom
retry policies, and you can always build your own
:class:`~tenacity.AsyncRetrying` object from scratch.
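
For illustration, a minimal from-scratch policy (a sketch only; the bundled
policies are more selective about which errors they retry and for how long)
could retry every :class:`~zyte_api.RequestError` a few times with exponential
backoff:

.. code-block:: python

    from tenacity import (
        AsyncRetrying,
        retry_if_exception_type,
        stop_after_attempt,
        wait_random_exponential,
    )
    from zyte_api import RequestError

    custom_retrying = AsyncRetrying(
        retry=retry_if_exception_type(RequestError),
        wait=wait_random_exponential(multiplier=1, max=60),
        stop=stop_after_attempt(5),
        reraise=True,
    )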

To use a custom retry policy, pass it when creating your client object:
To use :data:`~zyte_api.aggressive_retrying` or a custom retry policy, pass it
when creating your client object:

.. code-block:: python

client = ZyteAPI(retrying=custom_retrying)
from zyte_api import ZyteAPI, aggressive_retrying

client = ZyteAPI(retrying=aggressive_retrying)
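
A policy built from one of the factory classes is passed the same way. For
example, assuming a ``CustomRetryFactory`` subclass like the ones shown in the
reference documentation:

.. code-block:: python

    from zyte_api import AggressiveRetryFactory, ZyteAPI


    class CustomRetryFactory(AggressiveRetryFactory):
        pass  # override the stop/wait class attributes here as needed


    client = ZyteAPI(retrying=CustomRetryFactory().build())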

When retries are exceeded for a given request, an exception is raised. Except
for the :meth:`~ZyteAPI.iter` method of the :ref:`sync API <sync>`, which
57 changes: 57 additions & 0 deletions tests/test_retry.py
@@ -271,6 +271,63 @@ def __init__(self, time):
(*(mock_request_error(status=521),) * 4,),
True,
),
# Undocumented 5xx errors are retried until they have happened
# 4 times in a row, not counting rate-limiting responses or
# network errors.
*(
scenario
for status in (
500,
502,
504,
)
for scenario in (
(
(*(mock_request_error(status=status),) * 3,),
False,
),
(
(*(mock_request_error(status=status),) * 4,),
True,
),
(
(
*(mock_request_error(status=status),) * 2,
mock_request_error(status=429),
mock_request_error(status=503),
ServerConnectionError(),
mock_request_error(status=status),
),
False,
),
(
(
*(mock_request_error(status=status),) * 3,
mock_request_error(status=429),
mock_request_error(status=503),
ServerConnectionError(),
mock_request_error(status=status),
),
True,
),
(
(
mock_request_error(status=status),
mock_request_error(status=555),
*(mock_request_error(status=status),) * 3,
),
False,
),
(
(
mock_request_error(status=status),
mock_request_error(status=555),
*(mock_request_error(status=status),) * 4,
),
True,
),
)
),
)
),
),
6 changes: 6 additions & 0 deletions zyte_api/__init__.py
@@ -6,6 +6,12 @@
from ._errors import RequestError
from ._retry import AggressiveRetryFactory, RetryFactory
from ._retry import aggressive_retrying as _aggressive_retrying
from ._retry import (
stop_after_uninterrumpted_delay,
stop_on_count,
stop_on_download_error,
stop_on_uninterrupted_status,
)
from ._retry import zyte_api_retrying as _zyte_api_retrying
from ._sync import ZyteAPI
from .errors import ParsedError
150 changes: 77 additions & 73 deletions zyte_api/_retry.py
@@ -63,15 +63,15 @@
attempts for which a different stop callable was used.
"""

def __init__(self, max_count: int, counter_name: str) -> None:
def __init__(self, max_count: int) -> None:
self._max_count = max_count
self._counter_name = counter_name
self._counter_name = id(self)

def __call__(self, retry_state: "RetryCallState") -> bool:
if not hasattr(retry_state, "counter"):
retry_state.counter = Counter()
retry_state.counter[self._counter_name] += 1
if retry_state.counter[self._counter_name] >= self._max_count:
retry_state.counter = Counter() # type: ignore
retry_state.counter[self._counter_name] += 1 # type: ignore
if retry_state.counter[self._counter_name] >= self._max_count: # type: ignore
return True
return False
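
# Usage sketch (assumes tenacity's RetryCallState(retry_object, fn, args,
# kwargs) constructor): stop_on_count keeps its counter on the RetryCallState
# rather than on the stop object, so attempts handled by other stop callables
# do not count against its limit.
#
#     state = RetryCallState(None, None, (), {})
#     stop = stop_on_count(4)
#     [stop(state) for _ in range(4)]  # -> [False, False, False, True]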

@@ -93,34 +93,34 @@
for which a different stop callable was used.
"""

def __init__(self, max_delay: time_unit_type, timer_name: str) -> None:
def __init__(self, max_delay: time_unit_type) -> None:
self._max_delay = to_seconds(max_delay)
self._timer_name = timer_name
self._timer_name = id(self)

def __call__(self, retry_state: "RetryCallState") -> bool:
if not hasattr(retry_state, "uninterrupted_start_times"):
retry_state.uninterrupted_start_times = {}
if self._timer_name not in retry_state.uninterrupted_start_times:
retry_state.uninterrupted_start_times = {} # type: ignore
if self._timer_name not in retry_state.uninterrupted_start_times: # type: ignore
# First time.
retry_state.uninterrupted_start_times[self._timer_name] = [
retry_state.uninterrupted_start_times[self._timer_name] = [ # type: ignore
retry_state.attempt_number,
retry_state.outcome_timestamp,
]
return False
attempt_number, start_time = retry_state.uninterrupted_start_times[
attempt_number, start_time = retry_state.uninterrupted_start_times[ # type: ignore
self._timer_name
]
if retry_state.attempt_number - attempt_number > 1:
# There was a different stop reason since the last attempt,
# resetting the timer.
retry_state.uninterrupted_start_times[self._timer_name] = [
retry_state.uninterrupted_start_times[self._timer_name] = [ # type: ignore
retry_state.attempt_number,
retry_state.outcome_timestamp,
]
return False
if retry_state.outcome_timestamp - start_time < self._max_delay:
# Within time, do not stop, only increase the attempt count.
retry_state.uninterrupted_start_times[self._timer_name][0] += 1
retry_state.uninterrupted_start_times[self._timer_name][0] += 1 # type: ignore
return False
return True
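
# Behaviour note: with stop_after_uninterrumpted_delay(15 * 60), the
# 15-minute window only keeps running while this stop callable handles
# consecutive attempts. If a different stop reason handled an attempt in
# between, attempt_number advances by more than 1 relative to the recorded
# start attempt, so the timer resets and the countdown restarts from the
# current attempt.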

@@ -133,51 +133,22 @@
modify it as needed, and then call :meth:`build` on your subclass to get
the corresponding :class:`tenacity.AsyncRetrying` object.

For example, to increase the maximum number of attempts for :ref:`temporary
download errors <zyte-api-temporary-download-errors>` from 4 (i.e. 3
retries) to 10 (i.e. 9 retries):
For example, to double the number of attempts for :ref:`temporary
download errors <zyte-api-temporary-download-errors>` and the time network
errors are retried:

.. code-block:: python

from tenacity import stop_after_attempt
from zyte_api import RetryFactory


class CustomRetryFactory(RetryFactory):
temporary_download_error_stop = stop_after_attempt(10)


CUSTOM_RETRY_POLICY = CustomRetryFactory().build()

To retry :ref:`permanent download errors
<zyte-api-permanent-download-errors>`, treating them the same as
:ref:`temporary download errors <zyte-api-temporary-download-errors>`:

.. code-block:: python

from tenacity import RetryCallState, retry_if_exception, stop_after_attempt
from zyte_api import RequestError, RetryFactory


def is_permanent_download_error(exc: BaseException) -> bool:
return isinstance(exc, RequestError) and exc.status == 521
from zyte_api import (
RetryFactory,
stop_after_uninterrumpted_delay,
stop_on_count,
)


class CustomRetryFactory(RetryFactory):

retry_condition = RetryFactory.retry_condition | retry_if_exception(
is_permanent_download_error
)

def wait(self, retry_state: RetryCallState) -> float:
if is_permanent_download_error(retry_state.outcome.exception()):
return self.temporary_download_error_wait(retry_state=retry_state)
return super().wait(retry_state)

def stop(self, retry_state: RetryCallState) -> bool:
if is_permanent_download_error(retry_state.outcome.exception()):
return self.temporary_download_error_stop(retry_state)
return super().stop(retry_state)
network_error_stop = stop_after_uninterrumpted_delay(30 * 60)
temporary_download_error_stop = stop_on_count(8)


CUSTOM_RETRY_POLICY = CustomRetryFactory().build()
@@ -207,8 +178,8 @@
)
temporary_download_error_wait = network_error_wait
throttling_stop = stop_never
network_error_stop = stop_after_uninterrumpted_delay(15 * 60, "network_error")
temporary_download_error_stop = stop_on_count(4, "temporary_download_error")
network_error_stop = stop_after_uninterrumpted_delay(15 * 60)
temporary_download_error_stop = stop_on_count(4)

def wait(self, retry_state: RetryCallState) -> float:
assert retry_state.outcome, "Unexpected empty outcome"
@@ -272,40 +243,67 @@

def __call__(self, retry_state: "RetryCallState") -> bool:
if not hasattr(retry_state, "counter"):
retry_state.counter = Counter()
retry_state.counter = Counter() # type: ignore
assert retry_state.outcome, "Unexpected empty outcome"
exc = retry_state.outcome.exception()
assert exc, "Unexpected empty exception"
if exc.status == 521:
retry_state.counter["permanent_download_error"] += 1
if retry_state.counter["permanent_download_error"] >= self._max_permanent:
if exc.status == 521: # type: ignore
retry_state.counter["permanent_download_error"] += 1 # type: ignore
if retry_state.counter["permanent_download_error"] >= self._max_permanent: # type: ignore
return True
retry_state.counter["download_error"] += 1
if retry_state.counter["download_error"] >= self._max_total:
retry_state.counter["download_error"] += 1 # type: ignore
if retry_state.counter["download_error"] >= self._max_total: # type: ignore
return True
return False
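
# Example: stop_on_download_error(max_total=8, max_permanent=4), as used by
# AggressiveRetryFactory below, stops after the 8th download error of any
# kind (7 retries) or after the 4th permanent (521) download error
# (3 retries), whichever comes first; permanent errors count toward both
# limits.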


class stop_on_uninterrupted_status(stop_base):
"""Stop after the specified max number of error responses with the same
status code in a row."""

def __init__(self, _max: int) -> None:
self._max = _max

def __call__(self, retry_state: "RetryCallState") -> bool:
assert retry_state.outcome, "Unexpected empty outcome"
exc = retry_state.outcome.exception()
assert exc, "Unexpected empty exception"
count = 0
for status in reversed(retry_state.status_history): # type: ignore
if status == exc.status: # type: ignore
count += 1
if count >= self._max:
return True
elif status not in {-1, 429, 503}:
return False
return False
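
# Walk-through sketch: with stop_on_uninterrupted_status(4) and a current
# exception whose status is 500, a status_history of
# [500, 429, 500, -1, 500, 503, 500] stops retrying, because scanning from
# the end finds four 500s in a row once rate limiting (429, 503) and network
# errors (-1) are skipped. Any other status in between (for example 555)
# ends the scan and retrying continues.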


class AggressiveRetryFactory(RetryFactory):
"""Alternative factory class that builds :data:`aggressive_retrying`.
"""Factory class that builds the :class:`tenacity.AsyncRetrying` object
that defines the :ref:`aggressive retry policy <aggressive-retry-policy>`.

To create a custom retry policy based on :data:`aggressive_retrying`, you
can subclass this factory class, modify it as needed, and then call
:meth:`build` on your subclass to get the corresponding
:class:`tenacity.AsyncRetrying` object.
To create a custom retry policy, you can subclass this factory class,
modify it as needed, and then call :meth:`build` on your subclass to get
the corresponding :class:`tenacity.AsyncRetrying` object.

For example, to increase the maximum number of attempts for errors treated
as temporary download errors by :data:`aggressive_retrying` from 16 (i.e.
15 retries) to 32 (i.e. 31 retries):
For example, to double the maximum number of attempts for all error
responses and double the time network errors are retried:

.. code-block:: python

from tenacity import stop_after_attempt
from zyte_api import AggressiveRetryFactory
from zyte_api import (
AggressiveRetryFactory,
stop_after_uninterrumpted_delay,
stop_on_download_error,
stop_on_uninterrupted_status,
)


class CustomRetryFactory(AggressiveRetryFactory):
temporary_download_error_stop = stop_after_attempt(32)
download_error_stop = stop_on_download_error(max_total=16, max_permanent=8)
network_error_stop = stop_after_uninterrumpted_delay(30 * 60)
undocumented_error_stop = stop_on_uninterrupted_status(8)


CUSTOM_RETRY_POLICY = CustomRetryFactory().build()
@@ -320,25 +318,31 @@
download_error_stop = stop_on_download_error(max_total=8, max_permanent=4)
download_error_wait = RetryFactory.temporary_download_error_wait

undocumented_error_stop = stop_on_uninterrupted_status(4)
undocumented_error_wait = RetryFactory.temporary_download_error_wait
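
# Dispatch overview: stop() records each outcome's status code (or -1 for
# exceptions without a status, such as network errors) in status_history,
# then routes download errors to download_error_stop, other undocumented
# error responses to undocumented_error_stop, and everything else to the
# RetryFactory logic; wait() applies the same routing for backoff.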

def stop(self, retry_state: RetryCallState) -> bool:
assert retry_state.outcome, "Unexpected empty outcome"
exc = retry_state.outcome.exception()
assert exc, "Unexpected empty exception"
if not hasattr(retry_state, "status_history"):
retry_state.status_history = [] # type: ignore
retry_state.status_history.append(getattr(exc, "status", -1)) # type: ignore
if _download_error(exc):
return self.download_error_stop(retry_state)
if _undocumented_error(exc):
return self.temporary_download_error_stop(retry_state)
return self.undocumented_error_stop(retry_state)
return super().stop(retry_state)

def wait(self, retry_state: RetryCallState) -> float:
assert retry_state.outcome, "Unexpected empty outcome"
exc = retry_state.outcome.exception()
assert exc, "Unexpected empty exception"

if _download_error(exc):
return self.download_error_wait(retry_state)

if _undocumented_error(exc):
return self.temporary_download_error_wait(retry_state=retry_state)
return self.undocumented_error_wait(retry_state=retry_state)
return super().wait(retry_state)



aggressive_retrying = AggressiveRetryFactory().build()