I've got some highly partitionable data that I'd like to store in BigQuery, where each partition would get its own table. My question is if BQ will support the number of tables I'll need.
With my data set, I'd be creating approximately 2,000 new tables daily. All tables would have a 390 day (13 month) expiration, so eventually there'd be a constant count of ~ 2,000 tables * 390 days = ~780,000 tables in this particular project.
I'd test this myself, but BQ only supports a max of 10,000 load jobs per project per day.
Does anyone have experience with this sort of table count? Is there any official table limit provided by Google?
Best How To :
There are projects with that number of distinct tables today. There is not currently a hard cap on the number of distinct tables.
Some related considerations that come to mind when you're contemplating representations that use that many tables:
A query (including referenced views) can currently only reference 1000 tables.
Datasets with large numbers of tables may exhibit problematic behavior when using table wildcard functions.
You may be oversharding. Rather than lots of individual tables, you may simply want to use a wider schema and fewer tables.
If you're heavily dependent on time intervals as a sharding consideration, you may also want to look at table decorators as a way of limiting the scope of data scans.
You may also want to collapse data over time into fewer, larger tables as they age and are less frequently accessed. For example, copy jobs can append multiple source tables into a single destination table.