BigQuery Content
BigQuery offers scalable, flexible pricing options to meet your technical needs and your budget.
Mainly you are charged for storage, i.e. the amount of data you store in tables, and for the queries you run.
Other than that, mostly all operations like loading data, copying data, and exporting data are free of charge.
There are 2 pricing models that you can opt for – On-demand pricing or flat-rate
pricing.
On-demand pricing, as the name suggests, charges you only when you run a query.
There is no lump sum or monthly cost; you just pay for the queries you run.
Charges for the queries are decided using one metric, which is the number of bytes processed.
You are charged for the number of bytes processed no matter where the data is stored; it can be in BigQuery storage or in an external source such as Cloud Storage.
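Because BigQuery stores data in a columnar format, the bytes processed depend on which columns a query actually reads, not just on how many rows match. A quick sketch of the difference (the table name here is just made up); you can also do a dry run of a query first to see the estimated bytes processed before paying for it:

-- reads every column, so every column's bytes count towards the bill
select *
from `my-project.sales.orders`;

-- reads only the two referenced columns, so far fewer bytes are processed
select order_id, order_total
from `my-project.sales.orders`;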
But you can change this billing model to flat-rate billing, or you can even have a mix and match of the two models across different projects.
This pricing option is best for customers who desire a stable cost for queries.
Flat-rate customers purchase dedicated resources for query processing and are not
charged on
demand for individual queries.
When you enroll in flat-rate pricing you basically purchase slot commitments, or you can say a dedicated amount of query processing capacity.
You can fire any number of queries with any data size within the allotted processing capacity, and your bill stays the same.
This model is pretty flexible as well, because once you get your capacity allocated, you can distribute it across your organization by reserving pools of capacity for individual projects, folders, or teams.
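As a rough sketch, these pools are created with BigQuery's reservation DDL; the admin project, region, and names below are made up and the exact options can vary, so treat this as an illustration rather than a recipe:

-- carve a 300-slot pool out of the committed capacity for production workloads
create reservation `admin-project.region-us.prod`
options (slot_capacity = 300);

-- point a project's query jobs at that pool
create assignment `admin-project.region-us.prod.prod-assignment`
options (assignee = 'projects/my-analytics-project', job_type = 'QUERY');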
And in cases where your capacity demands exceed your committed capacity, you will still not be charged additional fees; no additional slots are given to you. Instead, BigQuery simply queues up the extra tasks until the running tasks finish and free up some slots.
Moving next, we have Flex Slots, also known as short-term commitments, where the commitment duration is only 60 seconds.
After 60 seconds you can keep the Flex slots with you for as long as you want or
cancel
them any time and you will be charged only for the seconds your commitment was
deployed.
Now you might be wondering why someone would need a commitment of only 60 seconds.
Actually, Flex Slots are a good way to test how your workloads are going to perform on flat-rate billing before you buy a longer commitment.
They are also useful for handling seasonal demand, such as a big sale event on an e-commerce site.
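A sketch of buying and later releasing a Flex commitment with the reservation DDL (the names are made up, and the exact plan values may differ in newer BigQuery editions):

-- purchase 100 Flex slots; the commitment can be cancelled after 60 seconds
create capacity `admin-project.region-us.flex-commitment`
options (slot_count = 100, plan = 'FLEX');

-- release the slots once the test or the seasonal spike is over
drop capacity `admin-project.region-us.flex-commitment`;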
------------------------------------------------------------------------------
One practice that can control communication between slots is to reduce the amount of data that is processed before a JOIN clause.
Since a join operation lets a query jump from one table to another and comes with a lot of shuffling, it is a good practice to trim the data in the query as early as possible, before the JOIN clause.
Less data going into the join means less shuffling, which in turn means better performance.
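For example, something like this, where one side is filtered and aggregated before the join (the table and column names are just made up):

-- reduce the orders table first, then join the much smaller result
with recent_orders as (
  select customer_id, sum(order_total) as total_spend
  from `my-project.sales.orders`
  where order_date >= date '2023-01-01'
  group by customer_id
)
select c.customer_name, o.total_spend
from `my-project.sales.customers` as c
join recent_orders as o
  on o.customer_id = c.customer_id;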
To avoid shuffling, BigQuery can broadcast the smaller tables in a join query to every processing node.
To allow proper broadcasting, always write the join query with the tables in decreasing order of size.
So yeah, I guess that’s all you can do from a data scanning and shuffling perspective to optimize your queries.
-----------------------------------------------------------------------------
It is true that whatever functions, aggregations, and transformations you apply within a query add to its processing work.
The more transformations there are, the more computation there will be and the more time the query will take to produce output.
It is a common use case to use SQL to perform ETL, where you have to write a number of functions, but as a best practice you can write the transformed data into another table first.
For example, if your query has TRIM statements, regular expressions, or even some UDFs, it is more performant to write the transformed results into a new table and then do the aggregations or other operations on that new table, because the aggregations on the new table run much more efficiently; this time there is no transformation overhead repeated on every query.
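A small sketch of this staging pattern (all the names here are made up):

-- stage the expensive transformations once
create or replace table `my-project.staging.clean_events` as
select
 trim(user_id) as user_id
 ,regexp_extract(page_url, r'https?://([^/]+)') as domain
from `my-project.raw.events`;

-- aggregate on the already-transformed table, with no transformation overhead
select domain, count(*) as visits
from `my-project.staging.clean_events`
group by domain;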
Use ORDER BY only in the outermost query or within window clauses, because in the outermost query the final data on which the ordering is performed has already been filtered and reduced, so you are sorting a subset of the data and not rows that would be filtered out anyway.
Actually, not only ORDER BY: whatever complex operations you have, such as regular expressions, should also be pushed as late in the query as possible.
And yes, it is also a good practice to use LIMIT whenever you are using an ORDER BY clause.
Since ORDER BY means sorting the whole result, it must be done on a single slot, and if you are attempting to order a very large result set, the final sort can overwhelm the slot that is processing the data, sometimes throwing a "resources exceeded" error. And FYI,
"resources exceeded" is returned when your query uses too many resources.
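Putting both points together, a query could look like this, with the filtering done inside and the sort plus LIMIT kept in the outermost query (the names are made up):

select event_id, event_ts
from (
  select event_id, event_ts
  from `my-project.analytics.events`
  where event_ts >= timestamp '2023-01-01'
)
order by event_ts desc
limit 100;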
Going next: in what order should we place the tables in a join query?
Even though BigQuery’s optimizer can determine which table should be on which side of the join while creating its execution plan, it is still recommended to order your joined tables appropriately.
The best practice is to place the largest table first, followed by the remaining tables in decreasing order of size.
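For example, it would look something like this (the tables and their relative sizes are made up):

select f.transaction_id, s.store_name, r.region_name
from `my-project.sales.transactions` as f   -- largest table first
join `my-project.sales.stores` as s         -- smaller table next
  on s.store_id = f.store_id
join `my-project.sales.regions` as r        -- smallest table last
  on r.region_id = s.region_id;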
In a broadcast join, the whole data of the small table on the right side can be broadcast to each slot that processes the larger table, which avoids a big shuffle and results in fewer I/O requests.
When evaluating the output data, you should consider how many bytes are written for your result set.
It should not happen that you want to see just a few rows of output but you are not including a LIMIT clause.
A LIMIT clause might not restrict the data being read, but it can definitely restrict the amount of data to be written.
Also, if you are writing results to a permanent (destination) table, the amount of data written is stored there and adds to your storage costs.
------------------------------------------------------------------------------
partitioning:
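-- base query with no partition filter, so BigQuery scans the full table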
select
X733SPECIFICPROB
,IDENTIFIER
,NODE
,SEVERITY
,FIRSTOCCURRENCE
,LASTOCCURRENCE
,SITEID
,CONTROLNE
,NODETYPE
,CLEARTIME
,EMS_NAME
from `bmas-eu-mbnl-data-prod.ONEFM_SEMANTIC.F_ALARM`
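A sketch of how a partition filter would cut down the scan, assuming F_ALARM is partitioned on FIRSTOCCURRENCE as a timestamp (the real partition column may be different):

select IDENTIFIER, NODE, SEVERITY, FIRSTOCCURRENCE
from `bmas-eu-mbnl-data-prod.ONEFM_SEMANTIC.F_ALARM`
-- the filter on the assumed partition column lets BigQuery prune partitions
-- and scan only the requested days instead of the whole table
where FIRSTOCCURRENCE >= timestamp '2023-01-01'
  and FIRSTOCCURRENCE < timestamp '2023-01-08';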
------------------------------------------------------------------------------
cache