
Handle 429 Too Many Requests gracefully #307

Open
lemon24 opened this issue Jun 4, 2023 · 5 comments · Fixed by #352

Comments

lemon24 commented Jun 4, 2023

https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429

Somewhat related to #246.

Implementation mechanism suggested in #332 overview.
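
For reference, Retry-After can be either a number of seconds or an HTTP-date (both examples from the MDN page above):

Retry-After: 120
Retry-After: Wed, 21 Oct 2015 07:28:00 GMT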

lemon24 commented Jul 3, 2024

Pseudocode for what a plugin would look like, to illustrate current API gaps.

Update: Changed to bump the interval based on retry-after.

from datetime import timedelta

import werkzeug.http

# candidate update intervals, in minutes
INTERVALS = [60 * hours for hours in [1, 3, 6, 12, 24, 2 * 24, 7 * 24]]

def after_feed_update_hook(reader, url, response):
    # response does not exist as of reader 3.13;
    # it would be nice to unify feed and response somehow

    if response.status not in {429, 503}:
        return

    # everything below needs to catch FeedNotFoundError;
    # ...but would not, if we passed FeedUpdateIntent
    feed = reader.get_feed(url)

    if retry_after_str := response.headers.get('retry-after'):
        try:
            seconds = int(retry_after_str)
        except ValueError:
            retry_after = werkzeug.http.parse_date(retry_after_str)
        else:
            assert seconds >= 0
            retry_after = feed.last_retrieved + timedelta(seconds=seconds)
    else:
        retry_after = None

    if retry_after:
        if retry_after > feed.update_after:
            reader._storage.set_feed_update_after(feed, retry_after)
            
    if response.status != 429:
        return

    full_config = get_config(reader, feed, resolve=True)
    old_interval = full_config['interval']
    if retry_after:
        retry_after_interval = (retry_after - feed.last_retrieved).total_seconds() // 60
    else:
        retry_after_interval = 0
    
    # pick the smallest interval not less than both the current one
    # and the one implied by retry-after
    for new_interval in INTERVALS:
        if retry_after_interval > new_interval:
            continue
        if old_interval > new_interval:
            continue
        break

    if old_interval == new_interval:
        return

    feed_config = get_config(reader, feed)
    feed_config['interval'] = new_interval
    set_config(reader, feed, feed_config)

    if not retry_after:
        # set_config does not refresh update_after, so we have to do it;
        # if we do it automatically, we wouldn't need to do this
        full_config['interval'] = new_interval
        update_after = next_update_after(feed.last_retrieved, **full_config)
        reader._storage.set_feed_update_after(feed, update_after)
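
Assuming the hook list keeps its current name, registering this would look something like:

reader.after_feed_update_hooks.append(after_feed_update_hook)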

lemon24 commented Jul 8, 2024

API changes:

  • StorageType.set_feed_update_after() (straightforward)
  • Reader.after_feed_update_hooks (reader, feed, response) form (lots of questions)
    • should we set status/headers on FeedUpdateIntent and use that instead? not yet, it would make the hooks unstable
      • should we do something similar with after_entry_update_hooks? yes, but also not yet
    • where does response come from? parser RetrieveResult is only returned for 200, we likely need to make it a tagged union
    • what should response.status be? http.HTTPStatus seems appropriate, but int is OK too
    • what should response.headers be?
      • http.client.HTTPMessage is clunky
      • case-insensitive mapping (e.g. from Requests; see the sketch after this list), or normalize all keys to lowercase?
        • what about a multimap? Requests doesn't seem to support this
      • do we expose all headers, or only a selection?
        • if the latter, why not make it a data class? (and e.g. also parse retry-after etc.)
      • should we somehow include http_etag/http_last_updated in this?
  • we may want to expose get_config() / set_config() / next_update_after() for convenience
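
A minimal sketch of the case-insensitive option, reusing the mapping Requests already ships (note it is not a multimap; repeated headers would have to be joined into a single value):

from requests.structures import CaseInsensitiveDict

headers = CaseInsensitiveDict({'Retry-After': '120', 'ETag': '"abc"'})
assert headers['retry-after'] == '120'
assert headers.get('etag') == '"abc"'
# iteration yields the keys with their original casing
assert list(headers) == ['Retry-After', 'ETag']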

lemon24 commented Jul 14, 2024

Idea: We could also bump the update interval for servers that don't send caching headers (ETag / Last-Modified), or that don't honor the matching conditional requests. (original discussion)
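
A minimal sketch of that check, assuming headers is the case-insensitive mapping discussed above (the helper name is illustration only):

def supports_conditional_requests(headers):
    # servers that support conditional requests send at least one of these
    return 'etag' in headers or 'last-modified' in headers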

lemon24 commented Aug 3, 2024

So, per the previous comments, the parser API will need to change.

Here's what it looks like as of 3.14, stripped to illustrate the data flow:

class Parser:

    def parallel(
        feeds: Iterable[FeedArgument], ... 
    ) -> Iterable[tuple[FeedArgument, ParsedFeed | None | ParseError]]:
        """Retrieve and parse many feeds, possibly in parallel."""

        def retrieve(
            feed: FeedArgument,
        ) -> tuple[FeedArgument, ContextManager[RetrieveResult[T] | None] | Exception]:
            """Single-argument retrieve() wrapper used with map()."""

    def __call__(url, http_etag, http_last_modified) -> ParsedFeed | None:
        """Convenience wrapper over parallel()."""

    def retrieve(
        url, http_etag, http_last_modified, ... 
    ) -> ContextManager[RetrieveResult[T] | None]:

    def parse(url, result: RetrieveResult[T]) -> ParsedFeed: 

def retriever_type( 
    url, http_etag, http_last_modified, http_accept
) -> ContextManager[RetrieveResult[T] | None]: 

def parser_type(
    url, resource: T, headers 
) -> tuple[FeedData, Collection[EntryData]]:

class FeedArgument:
    url: str
    http_etag: str
    http_last_modified: str

class RetrieveResult:
    resource: T
    mime_type: str
    http_etag: str
    http_last_modified: str
    headers: dict[str, str] 

class ParsedFeed:
    feed: FeedData
    entries: Iterable[EntryData]
    http_etag: str 
    http_last_modified: str 
    mime_type: str 
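
A hypothetical consumer of this parallel(), to show where status and headers get lost (RetrieveResult exists only for 200 responses, so a 429 surfaces as a bare ParseError):

for feed_arg, value in parser.parallel(feeds):
    if isinstance(value, ParseError):
        ...  # no way to tell a 429 from any other failure
    elif value is None:
        ...  # not modified; no headers here either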

lemon24 commented Aug 10, 2024

Here's what an ~ideal parser API would look like:

class Parser:

    def parallel(
        feeds: Iterable[FeedArgument], ... 
    ) -> Iterable[ParseResult]:
        """Retrieve and parse many feeds, possibly in parallel."""

    def __call__(url, caching_info: JSON) -> ParsedFeed | None:
        """Convenience wrapper over parallel()."""
        
    # Leaving retrieve() and parse() for convenience wrappers.

    def retrieve_fn(feed: FeedArgument, ...) -> RetrieveResult[T]:
        # For use with map(): single argument, does not raise.
        # Unhandled exceptions get wrapped in ParseError (as they do now).
        # As now, it enters the retriever context early to catch errors.

    def parse_fn(result: RetrieveResult[T]) -> ParseResult:
        # For use with map(): single argument, does not raise.
        # Unhandled exceptions get wrapped in ParseError (as they do now).
        # Pass-through exceptions, status, and headers from retrieve_fn().
        
class FeedArgument(Protocol):
    url: str
    # 'etag' and 'last-modified' used as well-known keys
    caching_info: JSON
    # (optional mode) from FeedForUpdate, 
    # may remove the need for decider.process_feed_for_update
    stale: bool

def retriever_type( 
    feed: FeedArgument, accept
) -> ContextManager[RetrievedFeed[T] | T]:
    # FeedArgument replaces multiple arguments for extensibility.
    # Return types other than RetrievedFeed[T] are for convenience.
    # Can pass additional info by raising RetrieveError | NotModified;
    # this is reverting to an older retriever API, because 
    # "exceptions are for exceptional cases", see RetrieveResult.value.

class RetrieveResult:
    feed: FeedArgument
    
    # After multiple attempts, this seems like the right place
    # for the context manager. Worse alternatives:
    #
    # * `Parser.retrieve_fn() -> ContextManager[RetrieveResult[T]]`
    #   means parallel() can't access .feed before entering the context, 
    #   which means unexpected errors do not have access to a .feed
    #
    # * `RetrievedFeed.resource: ContextManager[T]` and
    #   `retriever_type() -> RetrieveResult | RetrievedFeed | None`
    #   means retrievers that use context managers (e.g. the session)
    #   have to conditionally exit that context for non-200 responses
    #
    value: ContextManager[RetrievedFeed[T]] | ParseError
    
    # propagated from either RetrievedFeed or RetrieveError
    status: int
    headers: dict[str, str] 

class RetrievedFeed:
    """Formerly known as RetrieveResult."""
    resource: T
    mime_type: str
    caching_info: JSON
    status: int
    headers: dict[str, str] 
    # (optional move) from RetrieverType, a bit more flexible
    slow_to_read: bool

class RetrieveError(ParseError):
    status: int
    headers: dict[str, str]
    
class NotModified(RetrieveError):
    pass

def parser_type(
    url, resource: T, headers 
) -> tuple[FeedData, Collection[EntryData]]:

class ParseResult:
    feed: FeedArgument
    value: ParsedFeed | None | ParseError
    # from RetrieveResult
    status: int
    headers: dict[str, str]
    
class ParsedFeed:
    feed: FeedData
    entries: Collection[EntryData]
    # from RetrievedFeed
    mime_type: str
    caching_info: JSON
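
A sketch (not the actual implementation) of how parallel() could compose the two map()-friendly functions, with map_ standing in for either the builtin map() or an executor's map():

def parallel(self, feeds, map_=map):
    retrieve_results = map_(self.retrieve_fn, feeds)
    # neither function raises, so results stay paired with their feeds
    # regardless of the number of workers
    yield from map_(self.parse_fn, retrieve_results)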

lemon24 added commits that referenced this issue between Aug 17 and Sep 2, 2024.
lemon24 reopened this Sep 2, 2024