[Data] When writing to BigQuery, Google's "TooManyRequests" exception is not retried #53997
Open
@Ouadada

Description


What happened + What you expected to happen

1. The bug
When writing a Ray dataset to a BigQuery table with the dataset's write_bigquery() method, my job fails with an exception from the GCP API, google.api_core.exceptions.TooManyRequests, indicating that I hit GCP quotas on the number of operations on a table. It seems that no retries are attempted before the exception is raised.

2. Expected behavior
I would expect this kind of quota-related exception to be retried.

The trace:

RayTaskError(TooManyRequests): ray::Write() (pid=354836, ip=172.19.29.23)
    for b_out in map_transformer.apply_transform(iter(blocks), ctx):
  File "/home/project/lib/python3.10/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 253, in __call__
    yield from self._block_fn(input, ctx)
  File "/home/project/lib/python3.10/site-packages/ray/data/_internal/planner/plan_write_op.py", line 26, in fn
    write_result = datasink_or_legacy_datasource.write(blocks, ctx)
  File "/home/project/lib/python3.10/site-packages/ray/data/_internal/datasource/bigquery_datasink.py", line 125, in write
    ray.get(
ray.exceptions.RayTaskError(TooManyRequests): ray::_write_single_block() (pid=314835, ip=172.19.29.23)
  File "/home/project/lib/python3.10/site-packages/ray/data/_internal/datasource/bigquery_datasink.py", line 96, in _write_single_block
    logger.info(job.result())
  File "/home/project/lib/python3.10/site-packages/google/cloud/bigquery/job/base.py", line 1003, in result
    return super(_AsyncJob, self).result(timeout=timeout, **kwargs)
  File "/home/project/lib/python3.10/site-packages/google/api_core/future/polling.py", line 261, in result
    raise self._exception
google.api_core.exceptions.TooManyRequests: 429 Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas; reason: rateLimitExceeded, location: table.write, message: Exceeded rate limits: too many table update operations for this table. For more information, see https://cloud.google.com/bigquery/docs/troubleshoot-quotas

3. Useful information
This behavior appeared after I updated google-cloud-bigquery.
I believe it comes from this google-cloud-bigquery PR, released in version 3.26.0.
From what I understood, when the user hits a rateLimitExceeded condition, Google now raises google.api_core.exceptions.TooManyRequests (HTTP 429) instead of google.api_core.exceptions.Forbidden (HTTP 403).
The issue is that Ray retries the writing of single blocks only on Forbidden exceptions. In Ray's bigquery_datasink.py:

except exceptions.Forbidden as e:
    retry_cnt += 1
    if retry_cnt > self.max_retry_cnt:
        break
    logger.info(
        "A block write encountered a rate limit exceeded error"
        + f" {retry_cnt} time(s). Sleeping to try again."
    )
    logging.debug(e)
    time.sleep(RATE_LIMIT_EXCEEDED_SLEEP_TIME)
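
Since TooManyRequests (429) and Forbidden (403) are sibling subclasses of ClientError in google.api_core.exceptions, neither deriving from the other, the except clause above silently misses the new exception. A quick check (assuming only that google-api-core is installed):

from google.api_core import exceptions

# TooManyRequests (429) is not a subclass of Forbidden (403); both derive
# from ClientError, so `except exceptions.Forbidden` misses the 429.
print(issubclass(exceptions.TooManyRequests, exceptions.Forbidden))   # False
print(issubclass(exceptions.TooManyRequests, exceptions.ClientError)) # True
print(exceptions.Forbidden.code, exceptions.TooManyRequests.code)     # 403 429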

A fix could be:

except (exceptions.Forbidden, exceptions.TooManyRequests) as e:
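
Until such a fix lands, a possible application-level workaround is to catch the error around the whole write and retry it. This is only a sketch (the helper name is mine, untested); it relies on Ray re-raising task failures as a dynamic RayTaskError subclass of the original exception, which is what the RayTaskError(TooManyRequests) in the trace above suggests:

import time

from google.api_core import exceptions

# Sketch: retry the whole write_bigquery call when the rate-limit error
# surfaces. With overwrite_table=True each attempt restarts the write
# from scratch, so this is crude but simple.
def write_bigquery_with_retry(ds, max_retries=8, sleep_s=30, **write_kwargs):
    for _ in range(max_retries):
        try:
            return ds.write_bigquery(**write_kwargs)
        except exceptions.TooManyRequests:
            time.sleep(sleep_s)
    raise RuntimeError("write_bigquery still rate-limited after all retries")

Alternatively, pinning google-cloud-bigquery below 3.26.0 keeps the old Forbidden behavior that Ray already retries.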

Versions / Dependencies

python=3.10.13
ray=2.34.0
google-cloud-bigquery=3.25.0
os: Debian GNU/Linux 12 (bookworm)

Reproduction script

import pandas as pd
import random
import string
import ray

PROJECT_ID = "project_id"
DATASET = "dataset_name.table_name"  # write_bigquery expects "<dataset>.<table>"

def random_string(length=5):
    return ''.join(random.choices(string.ascii_letters, k=length))

# Large enough to be written as many blocks; each block issues its own
# load job against the same table, which trips the table-update quota.
data = {
    "col1": [random_string() for _ in range(100_000_000)],
    "col2": [random_string() for _ in range(100_000_000)],
    "col3": [random_string() for _ in range(100_000_000)],
}

df = pd.DataFrame(data)

ray.init()

ray_dataset = ray.data.from_pandas(df)

ray_dataset.write_bigquery(
    project_id=PROJECT_ID,
    dataset=DATASET,
    overwrite_table=True
)

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Labels

bug (Something that is supposed to be working, but isn't), data (Ray Data-related issues), stability, triage (Needs triage, e.g. priority, bug/not-bug, and owning component)
