BigQuery Python library: AppendRows bidi-streaming RPC sends empty x-goog-request-params routing header, causing intermittent `InvalidArgument: Cannot route on empty project id ''` #16650

@erezinman

Determine this is the right repository

  • I determined this is the correct repository in which to report this bug.

Summary of the issue

Context

I am using BigQueryWriteClient.append_rows() to write data through the Storage Write API default stream (_default), making multiple sequential calls through the same client. The code runs in a multiprocessing setup where each worker creates its own BigQueryWriteClient() and makes many append_rows calls.

Expected Behavior:

append_rows should populate the x-goog-request-params gRPC routing header with the write_stream resource name (which contains the project ID), similar to how other methods in the same client do it:

# create_write_stream — sets routing correctly
metadata = ... + (gapic_v1.routing_header.to_grpc_metadata((("parent", request.parent),)),)

Actual Behavior:

append_rows sends an empty routing header because it's a bidi-streaming RPC and the request iterator hasn't been consumed when metadata is set:

# client.py, append_rows() — all versions 2.26.0 through 2.37.0
metadata = tuple(metadata) + (gapic_v1.routing_header.to_grpc_metadata(()),)

This produces x-goog-request-params: ''. The first several calls may succeed via the gateway's fallback routing, but after a gRPC reconnection (idle timeout, load balancing), the gateway cannot determine the target project and returns:

google.api_core.exceptions.InvalidArgument: 400 Cannot route on empty project id ''

Reproduction:

from google.cloud.bigquery_storage_v1 import BigQueryWriteClient
from google.api_core.gapic_v1 import routing_header
import inspect

# Verify empty routing in source
source = inspect.getsource(BigQueryWriteClient.append_rows)
assert "to_grpc_metadata(())" in source

# Verify it produces an empty header
assert routing_header.to_grpc_metadata(()) == ("x-goog-request-params", "")
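
To check for the same condition at runtime rather than in the library source, the metadata sequence that is about to be sent can be scanned for an empty routing header. This is a minimal sketch; the helper name `empty_routing_entries` is hypothetical, and where it gets called (a unit test, a logging hook) is left to the caller.

```python
ROUTING_KEY = "x-goog-request-params"  # header name quoted in this report


def empty_routing_entries(metadata):
    """Return the routing-header entries in `metadata` whose value is empty.

    `metadata` is a sequence of (key, value) string pairs, the shape that
    the `metadata=` argument of the client methods accepts.
    """
    return [(k, v) for k, v in metadata if k.lower() == ROUTING_KEY and not v]
```

A non-empty result means at least one routing header carries no resource name, which matches the symptom described above.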

Workaround:

Pass routing metadata explicitly:

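# stream_name is the write stream's resource name; request is an AppendRowsRequest for that stream.
metadata = (routing_header.to_grpc_metadata((("write_stream", stream_name),)),)
response = client.append_rows(requests=iter([request]), metadata=metadata)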
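
The workaround can be wrapped so that callers do not have to build the metadata by hand on every call. The sketch below is not part of the library: `append_rows_with_routing` is a hypothetical helper that peeks at the first request, reads its `write_stream` field, and passes a routing header alongside any caller-supplied metadata. To stay dependency-free it builds the header tuple directly, assuming the `write_stream=<resource name>` form with `/` left unescaped; `routing_header.to_grpc_metadata` from the workaround above could be used in its place.

```python
import itertools
from urllib.parse import quote

ROUTING_KEY = "x-goog-request-params"  # header name quoted in this report


def append_rows_with_routing(client, requests, metadata=()):
    """Call client.append_rows with a routing header from the first request.

    Assumes the first request carries a non-empty `write_stream`.
    """
    iterator = iter(requests)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("requests is empty; no write_stream to route on") from None

    stream_name = first.write_stream
    if not stream_name:
        raise ValueError("the first request must set write_stream")

    # Assumed header format: key=value, URL-encoded with "/" kept literal.
    routing = (ROUTING_KEY, "write_stream=" + quote(stream_name, safe="/_.-~"))

    # Put the peeked request back in front of the remaining ones.
    return client.append_rows(
        requests=itertools.chain([first], iterator),
        metadata=tuple(metadata) + (routing,),
    )
```

Usage would look like `response = append_rows_with_routing(client, iter([request]))`. Note that the client still appends its own empty routing header, exactly as it does with the manual workaround.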

Environment:

  • google-cloud-bigquery-storage: 2.26.0 through 2.37.0 (all affected)
  • Python 3.9

