Determine this is the right repository
Summary of the issue
Context
I am using BigQueryWriteClient.append_rows() to write data through the Storage Write API default stream (_default), making multiple sequential calls through the same client. This runs in a multiprocessing setup where each worker creates its own BigQueryWriteClient() and makes many append_rows calls.
Expected Behavior:
append_rows should populate the x-goog-request-params gRPC routing header with the write_stream resource name (which contains the project ID), similar to how other methods in the same client do it:
# create_write_stream — sets routing correctly
metadata = ... + (gapic_v1.routing_header.to_grpc_metadata((("parent", request.parent),)),)
Actual Behavior:
append_rows sends an empty routing header because it's a bidi-streaming RPC and the request iterator hasn't been consumed when metadata is set:
# client.py, append_rows() — all versions 2.26.0 through 2.37.0
metadata = tuple(metadata) + (gapic_v1.routing_header.to_grpc_metadata(()),)
This produces x-goog-request-params: ''. The first several calls may succeed via the gateway's fallback routing, but after a gRPC reconnection (idle timeout, load balancing), the gateway cannot determine the target project and returns:
google.api_core.exceptions.InvalidArgument: 400 Cannot route on empty project id ''
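To make the difference between the two headers concrete, here is a minimal stdlib-only sketch. It only approximates what routing_header.to_grpc_metadata does (URL-encoding the key/value pairs under the x-goog-request-params key); sketch_routing_metadata and the stream name are hypothetical and not part of the library.

```python
from urllib.parse import urlencode

ROUTING_KEY = "x-goog-request-params"


def sketch_routing_metadata(params):
    """Approximate a routing-header metadata pair from (key, value) tuples.

    Assumption: this mirrors the library helper only loosely; it is a
    stand-in for illustration, not the real implementation.
    """
    return (ROUTING_KEY, urlencode(list(params), safe="/"))


# Hypothetical stream name, used only for illustration.
stream = "projects/my-project/datasets/my_dataset/tables/my_table/streams/_default"

# What append_rows effectively sends today: no routing parameters at all.
empty = sketch_routing_metadata(())

# What it would send if write_stream were included.
populated = sketch_routing_metadata((("write_stream", stream),))

print(empty)
print(populated)
```

With no parameters the header value is an empty string, so the gateway has no project ID to route on.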
Reproduction:
from google.cloud.bigquery_storage_v1 import BigQueryWriteClient
from google.api_core.gapic_v1 import routing_header
import inspect
# Verify empty routing in source
source = inspect.getsource(BigQueryWriteClient.append_rows)
assert "to_grpc_metadata(())" in source
# Verify it produces an empty header
assert routing_header.to_grpc_metadata(()) == ("x-goog-request-params", "")
Workaround:
Pass routing metadata explicitly:
metadata = (routing_header.to_grpc_metadata((("write_stream", stream_name),)),)
response = client.append_rows(requests=iter([request]), metadata=metadata)
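Since the root cause is that the request iterator has not been consumed when the metadata is built, one way to generalize the workaround is to peek at the first request and rebuild the iterator. The sketch below is stdlib-only and uses hypothetical names (FakeAppendRowsRequest, with_routing_metadata); in real code the header would presumably come from routing_header.to_grpc_metadata as in the workaround above, and the requests would be real AppendRowsRequest objects.

```python
import itertools
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class FakeAppendRowsRequest:
    """Hypothetical stand-in for AppendRowsRequest; only write_stream matters here."""

    write_stream: str


def with_routing_metadata(requests):
    """Peek at the first request and return (requests, metadata).

    The returned iterator still yields every request, including the first.
    """
    iterator = iter(requests)
    try:
        first = next(iterator)
    except StopIteration:
        raise ValueError("at least one request is required to derive routing")
    # Approximation of the routing header value (assumption, see lead-in).
    header = urlencode([("write_stream", first.write_stream)], safe="/")
    metadata = (("x-goog-request-params", header),)
    # Put the peeked request back in front of the remaining ones.
    return itertools.chain([first], iterator), metadata


# Hypothetical stream name, used only for illustration.
stream = "projects/my-project/datasets/my_dataset/tables/my_table/streams/_default"
reqs, metadata = with_routing_metadata(
    [FakeAppendRowsRequest(stream), FakeAppendRowsRequest(stream)]
)
replayed = list(reqs)
```

The pair could then be passed as client.append_rows(requests=reqs, metadata=metadata), so no request is lost by peeking.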
Environment:
- google-cloud-bigquery-storage: 2.26.0 through 2.37.0 (all affected)
- Python 3.9
API client name and version
No response
Reproduction steps: code
file: main.py
def reproduce():
# complete code here
Reproduction steps: supporting files
file: mydata.csv
Reproduction steps: actual results
file: output.txt
Reproduction steps: expected results
file: output.txt
OS & version + platform
No response
Python environment
No response
Python dependencies
No response
Additional context
No response