
bigquery: set PreferredMinStreamCount while creating CreateReadSessionRequest #8432

Closed
k-anshul opened this issue Aug 17, 2023 · 2 comments · Fixed by #8476
Labels:
- api: bigquery (Issues related to the BigQuery API.)
- priority: p3 (Desirable enhancement or fix. May not be included in next release.)
- type: feature request (‘Nice-to-have’ improvement, new feature or different behavior or design.)

Comments

k-anshul commented Aug 17, 2023

Is your feature request related to a problem? Please describe.
When the client creates a `CreateReadSessionRequest`, it sets `MaxStreamCount` to 0, which, as I understand it, leaves the choice of stream count to the server.
In my testing against various public and private datasets, the server always returned exactly one stream.

Describe the solution you'd like
I would like the client to set `PreferredMinStreamCount` to `readClientSettings.maxWorkerCount`, as a hint to the server about the desired minimum number of streams.
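A minimal Go sketch of this proposal follows. It uses hypothetical stand-in types (`readSessionRequest`, `readSettings`) that mirror only the two stream-count fields of `CreateReadSessionRequest` and the two client settings discussed in this thread; it is not the client's actual code, and the `maxWorkerCount` value of 8 is purely illustrative.

```go
package main

import "fmt"

// readSessionRequest is a hypothetical, simplified stand-in for the two
// stream-count fields of the Storage Read API's CreateReadSessionRequest.
type readSessionRequest struct {
	MaxStreamCount          int32 // 0 lets the server choose the stream count
	PreferredMinStreamCount int32 // 0 means no lower-bound hint
}

// readSettings hypothetically mirrors the client settings named in the
// issue thread: maxStreamCount and maxWorkerCount.
type readSettings struct {
	maxStreamCount int
	maxWorkerCount int
}

// buildRequest sketches the proposal: keep MaxStreamCount as configured
// and forward maxWorkerCount as a minimum-stream hint.
func buildRequest(s readSettings) readSessionRequest {
	return readSessionRequest{
		MaxStreamCount:          int32(s.maxStreamCount),
		PreferredMinStreamCount: int32(s.maxWorkerCount),
	}
}

func main() {
	req := buildRequest(readSettings{maxStreamCount: 0, maxWorkerCount: 8})
	fmt.Println(req.MaxStreamCount, req.PreferredMinStreamCount) // 0 8
}
```

As written, this forwards the hint unconditionally; it does not account for a nonzero `maxStreamCount` that is smaller than `maxWorkerCount`.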

Describe alternatives you've considered
None: the SDK exposes no option that lets users configure this.

Additional context
https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#createreadsessionrequest

k-anshul added the `triage me` label Aug 17, 2023
product-auto-label bot added the `api: bigquery` label Aug 17, 2023
noahdietz added the `type: feature request` label and removed the `triage me` label Aug 17, 2023
alvarowolfx added the `priority: p3` label Aug 18, 2023
alvarowolfx (Contributor):

@k-anshul I think the setting you want to tweak here is `maxStreamCount` rather than `maxWorkerCount`, because that is the value we send to the Storage Read API, as can be checked here. Its default is already 0, which signals the backend to produce as many streams as needed for reasonable throughput. In some scenarios, however, such as ordered queries, our client and the backend can produce only one stream, so that ordering can be guaranteed.

In light of that, can you give more details on your use case? I'm assuming you're using the Storage API through the `EnableStorageReadClient` method. Perhaps your queries are ordered, which would cause only one stream to be generated.
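The ordering point can be sketched as follows. This assumes a hypothetical `ordered` flag that the client has already computed (how the real client detects ordered queries is not shown here): an ordered read is pinned to a single stream, and otherwise the configured `maxStreamCount` (default 0) is sent unchanged.

```go
package main

import "fmt"

// effectiveMaxStreamCount is a hypothetical sketch of choosing the
// MaxStreamCount to send: an ordered result must come from one stream to
// preserve its order; otherwise the configured value is passed through,
// where 0 means "let the server decide".
func effectiveMaxStreamCount(configured int, ordered bool) int32 {
	if ordered {
		return 1
	}
	return int32(configured)
}

func main() {
	fmt.Println(effectiveMaxStreamCount(0, false)) // 0: server chooses
	fmt.Println(effectiveMaxStreamCount(0, true))  // 1: single ordered stream
}
```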

k-anshul (Author):

@alvarowolfx Thanks for your reply.
I am running a `select * from table limit <num>` query on both public and private sources, so ordering is not required in my case. I also confirmed that `maxStreamCount` is set to zero, and that I never get more than one stream for similar queries.
What I am proposing is to additionally set `PreferredMinStreamCount` to `readClientSettings.maxWorkerCount` (or some multiple of it) as a hint to the server to return more than one stream. I tested this change and verified that the server now returns more than one stream. I can also raise a PR for this change.
A sample query I ran on a public dataset:

```sql
SELECT * FROM `bigquery-public-data.covid19_open_data.compatibility_view` LIMIT 10000000
```

gcf-merge-on-green bot pushed a commit that referenced this issue Aug 30, 2023 (#8476):

`PreferredMinStreamCount` must be less than or equal to `MaxStreamCount`, so we only set it when `MaxStreamCount` is 0, which effectively means no limit.

Resolves #8432
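A minimal sketch of the merged rule, using a hypothetical stand-in type rather than the real `storagepb` message. `validate` encodes the constraint from the commit message, treating a `MaxStreamCount` of 0 as "no limit", and `main` contrasts an unconditional hint with the conditional one for an illustrative `maxWorkerCount` of 8.

```go
package main

import (
	"errors"
	"fmt"
)

// readSessionRequest is a hypothetical, simplified stand-in for the two
// stream-count fields of the Storage Read API's CreateReadSessionRequest.
type readSessionRequest struct {
	MaxStreamCount          int32 // 0 means no explicit upper bound
	PreferredMinStreamCount int32 // 0 means no lower-bound hint
}

// validate checks that a nonzero MaxStreamCount is >= PreferredMinStreamCount.
func (r readSessionRequest) validate() error {
	if r.MaxStreamCount != 0 && r.PreferredMinStreamCount > r.MaxStreamCount {
		return errors.New("PreferredMinStreamCount exceeds MaxStreamCount")
	}
	return nil
}

// buildRequest sends the maxWorkerCount hint only when MaxStreamCount is 0.
func buildRequest(maxStreamCount, maxWorkerCount int) readSessionRequest {
	req := readSessionRequest{MaxStreamCount: int32(maxStreamCount)}
	if maxStreamCount == 0 {
		req.PreferredMinStreamCount = int32(maxWorkerCount)
	}
	return req
}

func main() {
	// An unconditional hint breaks the constraint when a small explicit
	// maximum is configured.
	naive := readSessionRequest{MaxStreamCount: 4, PreferredMinStreamCount: 8}
	fmt.Println("naive valid:", naive.validate() == nil) // naive valid: false

	for _, maxStreams := range []int{0, 4, 16} {
		req := buildRequest(maxStreams, 8)
		fmt.Printf("max=%d min=%d valid=%v\n",
			req.MaxStreamCount, req.PreferredMinStreamCount, req.validate() == nil)
	}
}
```

One trade-off of this rule: an explicit `MaxStreamCount` larger than `maxWorkerCount` (the 16 case) also gets no hint, even though sending one would satisfy the constraint.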
5 participants
@shollyman @alvarowolfx @noahdietz @k-anshul and others