[Q] How to download a lot of histories? #7778

mbacvanski · 2024-06-09T17:11:54Z

I have several thousand runs in a project, and I'd like to download all their histories together. Manually looping over all the runs and querying their history with run.history(...) takes a very long time (hours), and it looks like the implementation of runs.histories(...) does the same thing.

If I query the runs in parallel, I get a requests.exceptions.HTTPError: 429 Client Error: Too Many Requests for url: https://api.wandb.ai/graphql. Any suggestions on what to do?


exalate-issue-sync · 2024-06-10T23:25:20Z

Jason Davenport commented:
When dealing with a large number of runs, running into rate limits (HTTP 429 errors) is a common issue. Here are some strategies to handle this more efficiently:

Batch Requests: Instead of querying histories sequentially, use batching to minimize the number of API calls.
Retry Logic with Exponential Backoff: Implement a retry mechanism that waits for progressively longer periods before retrying a request.
Throttle Requests: Implement a throttle mechanism to ensure you stay within the API rate limits.

Here’s an example implementation:

```python
import time

import pandas as pd
from requests.exceptions import HTTPError
from wandb.apis.public import Api

# Initialize the W&B public API
api = Api()


# Fetch the history of a single run, retrying with exponential backoff on HTTP 429.
# Note: run.history() returns a sampled subset of rows by default;
# run.scan_history() iterates over every logged row.
def fetch_run_history(run, max_retries=5, backoff_factor=1):
    for attempt in range(max_retries):
        try:
            return run.history()
        except HTTPError as e:
            # Depending on the wandb version, a 429 may surface wrapped in a
            # different exception type; adjust this except clause if so.
            if e.response is not None and e.response.status_code == 429:
                # Too many requests: wait progressively longer before retrying
                wait = backoff_factor * (2 ** attempt)
                print(f"Rate limit exceeded. Retrying in {wait} seconds…")
                time.sleep(wait)
            else:
                raise
    raise Exception("Max retries exceeded")


# Fetch the histories of all runs in a project, walking the runs in batches
def fetch_all_histories(project_name, max_retries=5, backoff_factor=1, batch_size=100):
    runs = api.runs(project_name)
    histories = []
    for i in range(0, len(runs), batch_size):
        batch = runs[i:i + batch_size]
        for run in batch:
            try:
                history = fetch_run_history(run, max_retries, backoff_factor)
                histories.append((run.name, history))
            except Exception as e:
                print(f"Failed to fetch history for run {run.name}: {e}")
    return histories


# Fetch all histories for the project
project_name = "your_project_name"
histories = fetch_all_histories(project_name)

# Combine the histories into a single DataFrame, tagging each row with its run
combined_histories = []
for run_name, history in histories:
    history["run_name"] = run_name
    combined_histories.append(history)

# Save to CSV or handle as needed (pd.concat fails on an empty list)
if combined_histories:
    df_combined = pd.concat(combined_histories, ignore_index=True)
    df_combined.to_csv("combined_histories.csv", index=False)
```

The fetch_run_history function includes a retry mechanism with exponential backoff: if a rate-limit error (HTTP 429) occurs, it waits progressively longer (1 s, 2 s, 4 s, … with the default backoff_factor) before retrying. The fetch_all_histories function walks the runs in batches of batch_size; because the runs are still fetched one at a time, this groups the work but does not reduce the number of API calls, since each run needs its own history request. After fetching, the histories are combined into a single DataFrame.
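The backoff logic in fetch_run_history can also be factored into a reusable helper. The sketch below is a hypothetical helper (retry_with_backoff is not part of the wandb SDK); it adds optional random jitter so that parallel workers do not all retry at the same moment, and it re-raises the last error instead of sleeping once more after the final attempt. The sleep function is injectable so the schedule can be checked without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_retries: int = 5,
    backoff_factor: float = 1.0,
    jitter: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable errors with exponential backoff (hypothetical helper)."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            # Give up immediately on non-retryable errors or on the last attempt
            if not is_retryable(exc) or attempt == max_retries - 1:
                raise
            # Base wait doubles each attempt: factor, 2*factor, 4*factor, ...
            wait = backoff_factor * (2 ** attempt)
            # Optional random jitter (up to jitter * wait) spreads out retries
            wait += random.uniform(0, jitter * wait)
            sleep(wait)
    raise RuntimeError("max_retries must be at least 1")
```

With the SDK, a call might look like `retry_with_backoff(lambda: run.history(), lambda e: isinstance(e, HTTPError) and e.response is not None and e.response.status_code == 429, jitter=0.5)`, assuming the 429 reaches your code as a requests HTTPError.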

This approach should help you download run histories more efficiently without excessively hitting API rate limits.
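The throttling strategy listed above is not part of that example. One way to combine it with the parallel querying from the original question is to cap the number of worker threads and enforce a global request rate shared by all threads. This is a minimal sketch under assumptions: RateLimiter and fetch_all_throttled are hypothetical helpers, and the actual W&B rate limits are not stated in this thread, so max_workers and calls_per_second are placeholders to tune.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Allow at most `rate` calls per second across all threads (hypothetical helper)."""

    def __init__(self, rate: float):
        self.interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_allowed = time.monotonic()

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it
        with self.lock:
            now = time.monotonic()
            slot = max(now, self.next_allowed)
            self.next_allowed = slot + self.interval
        delay = slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)


def fetch_all_throttled(items, fetch_fn, max_workers=4, calls_per_second=2.0):
    """Run fetch_fn over items with bounded concurrency and a global request rate."""
    limiter = RateLimiter(calls_per_second)

    def throttled(item):
        limiter.wait()  # block until this call's reserved slot arrives
        return fetch_fn(item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(throttled, items))
```

For example, `fetch_all_throttled(list(api.runs("your_project_name")), fetch_run_history, max_workers=4, calls_per_second=2.0)` would combine the throttle with the per-run retries. Because pool.map re-raises the first exception it encounters, wrap fetch_fn if you would rather collect per-run failures than stop on the first one.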

exalate-issue-sync · 2024-06-12T18:02:20Z

Jason Davenport commented:
Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

exalate-issue-sync · 2024-06-14T15:41:02Z

Jason Davenport commented:
Hi Internal, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

DavidEnriqueNieves · 2024-11-04T15:13:44Z

On a related note, is there a way to access multiple histories through the GraphQL API? Is that API even working at this time?

-David

kptkin added a:sdk Area: sdk related issues c:sdk:public-api Component: All the issues that relate to wandb.Api with the exception of the public api of Artifacts labels Jun 27, 2024