
Add pagination support to list_models() API for systematic model discovery #2741

Open

darwich6 opened this issue Jan 8, 2025 · 2 comments
Labels: enhancement (New feature or request)
darwich6 commented Jan 8, 2025

Is your feature request related to a problem? Please describe.
The current list_models() API only supports a limit parameter without true pagination support. This makes it impossible to systematically discover models beyond the initial limit. For example, when fetching most downloaded models:

```python
models = hf.list_models(
    filter="text-generation",
    sort="downloads",
    direction=-1,
    limit=100,
)
```

This always returns the same top 100 models, unless a new model overtakes one of them in downloads. There is no way to request models 101-200 short of raising the limit to 200 and re-fetching all 200 records. This is problematic for services that need to:

  • Discover new models systematically
  • Process models in smaller batches
  • Index or monitor the full model ecosystem
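The over-fetch-and-slice workaround mentioned above can be sketched as follows (a toy illustration only: `fetch_batch` is a hypothetical helper, and a plain list stands in for the Hub API; note that every earlier batch is re-downloaded on each call):

```python
from itertools import islice


def fetch_batch(all_models, batch_index, batch_size=100):
    """Simulate the over-fetch workaround: pull (batch_index + 1) * batch_size
    records, then keep only the last batch_size of them. Against the real API
    this means re-fetching every earlier batch on each call."""
    start = batch_index * batch_size
    # In practice this would be hf.list_models(..., limit=start + batch_size)
    overfetched = islice(all_models, start + batch_size)
    return list(overfetched)[start:]


# Toy stand-in for the Hub's sorted model listing
models = [f"model-{i}" for i in range(250)]
print(fetch_batch(iter(models), 1, 100)[0])  # model-100
```

The cost grows linearly with how deep into the listing you go, which is exactly what real pagination avoids.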

Describe the solution you'd like

Add proper pagination support to the API by either:

  1. Adding an offset parameter:

```python
models = hf.list_models(
    filter="text-generation",
    sort="downloads",
    limit=100,
    offset=100,  # get the next 100 models
)
```

  2. Or exposing the internal cursor-based pagination that's already used by paginate():

```python
response = hf.list_models(
    filter="text-generation",
    limit=100,
    cursor="next_page_token",  # from the previous response
)
next_cursor = response.next_cursor
```
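For comparison, here is how a cursor-based client loop typically looks. This is a pure sketch: `fetch_page` and `iter_all_models` are hypothetical names, and the paginated endpoint is simulated in-process rather than calling the Hub:

```python
from typing import Iterator, List, Optional, Tuple


def fetch_page(cursor: Optional[str], page_size: int) -> Tuple[List[str], Optional[str]]:
    """Stand-in for one paginated API call: returns a page of items plus the
    cursor for the next page (None when the listing is exhausted)."""
    data = [f"model-{i}" for i in range(230)]  # simulated server-side listing
    start = int(cursor) if cursor else 0
    page = data[start:start + page_size]
    next_cursor = str(start + page_size) if start + page_size < len(data) else None
    return page, next_cursor


def iter_all_models(page_size: int = 100) -> Iterator[str]:
    """Walk every page by threading the cursor through successive calls."""
    cursor: Optional[str] = None
    while True:
        page, cursor = fetch_page(cursor, page_size)
        yield from page
        if cursor is None:
            break
```

Each call transfers only one page, and resuming from any point needs nothing but the last cursor, which is what makes this shape preferable to an offset for large listings.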

Describe alternatives you've considered

Current workarounds we've tried:

  1. Fetching very large batches (1000+ models) and filtering locally
  2. Using different sort criteria to try to get different models
  3. Using the search parameter with different queries

None of these provide a reliable way to systematically discover all models and ensure we are getting different models with each call.

Additional context
Looking at the source code, the API already uses internal pagination via paginate():

```python
items = paginate(path, params=params, headers=headers)
if limit is not None:
    items = islice(items, limit)
```
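The Hub's paginated responses advertise the next page via an HTTP Link header, which `requests` exposes as `response.links["next"]["url"]`. The cursor can be recovered from that URL with the standard library alone (sketch; `cursor_from_next_url` is a hypothetical helper, not part of huggingface_hub):

```python
from urllib.parse import parse_qs, urlparse


def cursor_from_next_url(next_url):
    """Extract the opaque cursor query parameter from a rel="next" URL,
    e.g. the one found in response.links["next"]["url"]."""
    query = parse_qs(urlparse(next_url).query)
    return query.get("cursor", [None])[0]


url = "https://huggingface.co/api/models?filter=text-generation&limit=100&cursor=abc123"
print(cursor_from_next_url(url))  # abc123
```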

Exposing this functionality would align with common API practices and enable better tooling around the Hub's model ecosystem.

hanouticelina (Contributor)
Hi @darwich6,
sorry for the late answer! thanks for this feature request. cursor-based pagination is already implemented server-side in the /api/models endpoint, but we haven't exposed it yet in HfApi.list_models(). we'll prioritize adding this feature.

in the meantime, here's a small script that leverages the cursor-based pagination:

```python
from typing import Literal, Optional
from urllib.parse import parse_qs, urlparse

from huggingface_hub.utils import get_session, hf_raise_for_status


class ModelIterator:
    def __init__(self, items, next_cursor: Optional[str] = None):
        self.items = items
        self.next_cursor = next_cursor

    def __iter__(self):
        yield from self.items


def list_models_with_cursor(
    *,
    filter: Optional[str] = None,
    limit: int = 100,
    direction: Optional[Literal[-1]] = None,
    cursor: Optional[str] = None,
) -> ModelIterator:
    """
    List models with cursor-based pagination.
    """
    url = "https://huggingface.co/api/models"
    params = {
        "filter": filter,
        "limit": limit,
        "direction": direction,
        "cursor": cursor,
    }

    response = get_session().get(url, params=params)
    hf_raise_for_status(response)
    next_url = response.links.get("next", {}).get("url")
    next_cursor = None
    if next_url:
        next_cursor = parse_qs(urlparse(next_url).query)["cursor"][0]

    return ModelIterator(response.json(), next_cursor)


# Fetch the first page, then use its cursor to fetch the next one
response = list_models_with_cursor(filter="text-generation", limit=100)
if response.next_cursor:
    next_response = list_models_with_cursor(filter="text-generation", limit=100, cursor=response.next_cursor)
```

@hanouticelina hanouticelina added the enhancement New feature or request label Jan 10, 2025
darwich6 (Author)
Thank you for your prioritization and response! Looking forward to the feature!
