Since Google BigQuery is pay-as-you-go and relatively inexpensive, I had been using it without much concern about the fees.
Big data analysis was a breakthrough in that matters that until now could only be guessed at could actually be expressed numerically.
As its usefulness became well known, BigQuery was used in an increasing number of situations. The amount of data has also increased, as people have tried to capture all the data they can.
Then, the BigQuery usage fee gradually rose and became a rather existential amount, and I began to worry about the flat fee, which I had thought was unrelated to the BigQuery fee, so I decided to estimate the BigQuery flat fee.
Fees (Summary)
BigQuery charges primarily for analysis and storage.
Storage fees are proportional to the amount of database space retained. Analysis fees are charged for query execution and can be either pay-as-you-go or flat-rate.
Analysis fees are charged on a pay-as-you-go basis and are proportional to the amount of data read from the database when the query is executed; as of August 2022, the fee is $5 per TB.
Estimation method (pay-as-you-go)
The log of the executed queries can be viewed in the INFORMATION_SCHEMA
job view.
The amount of data read by a query is recorded in total_bytes_billed
, and can be multiplied by the unit cost to determine how much each query is costing in analysis fees.
See below for details.
Analysis fee (fixed amount of money)
A flat fee for analysis is paid for the number of slots secured in a month.
Minimum 100 slots, available in increments of 100 slots for $2,000/month per 100 slots. (Not $200. It's $2,000.)
What is a slot?
BigQuery processes large amounts of data in a short amount of time by executing analytical processes in parallel.
The number of concurrent processes is called the number of slots. With a fixed fee, you purchase that number of slots.
The slots are like CPUs, similar to the image of pre-purchasing CPUs to be used for analysis.
Estimation method (flat-rate billing)
So how many slots should I buy?
In the case of pay-as-you-go billing, the number of slots is not fixed; BigQuery calculates the optimal number of slots for each query and automatically allocates them to execute the query.
In estimating the number of slots for the flat-rate billing, we will examine how many slots were actually used in the metered query execution.
As with the pay-as-you-go method, the INFORMATION_SCHEMA
job view is used to calculate the data.
The actual execution time of the query is job_ms
. The actual execution time of the query is job_ms
, and the time it would take if the query were executed in only one slot is total_slot_ms
.
So, as a rough approximation, divide total_slot_ms
by job_ms
to get the number of slots for the query.
Actual number of slots
The following figure plots the number of slots and amount of data read for each query with the data at hand.
The horizontal axis is the amount of data read and the vertical axis is the number of slots.
In the case of pay-as-you-go, the system is supposed to use up to 2,000 slots to run (it may exceed this in bursts).
- https://cloud.google.com/bigquery/quotas?hl=ja#query_jobs
On-demand rates allow up to 2,000 concurrent slots in a project; BigQuery slots are shared by all queries in a single project; BigQuery may burst beyond this limit to speed up queries. The graph shows that queries are executed using the full 2,000 slots when they are actually available.
Problems with estimates
As can be seen in the graph, the amount of data read is not proportional to the number of slots.
Just because the pay-as-you-go analysis fee is higher does not mean that you should just go ahead and purchase a slot for the same amount.
If you start small, buying only the minimum 100 slots, a query that used to run in 2,000 slots will now run in 100 slots, and in the worst case, the query execution time will increase 20-fold.
In other words, pay-as-you-go and flat-rate billing have different billing axes, so simple comparisons and estimates are not possible.
impressions
Data analysis is a trial-and-error process, and it is hard when query execution time increases 10 or 20 times.
Although the minimum number of slots is 100, the price was higher than we had expected because we would need 2,000 slots if we wanted to guarantee the same execution time as before.
Pay-as-you-go may be an option for very large databases, 24-hour queries with streams, or grueling machine learning with BigQuery ML.