Loading and Exporting Data

Copyright © All Rights Reserved

Loading and exporting data

Loading and exporting can be done through the web UI, the bq command-line tool or the API.

Supported data formats for loading:

▪ CSV

▪ JSON (newline-delimited only)

▪ Avro

▪ Parquet

▪ ORC

Export from BigQuery goes to Cloud Storage (CSV, newline-delimited JSON, Avro). A single export file is limited to 1 GB; use a wildcard in the destination URI to split larger exports into multiple files. CLI command: “bq extract”.
BigQuery Data Transfer Service – imports data from other marketing apps: AdWords, DoubleClick and YouTube reports.
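As a rough illustration of the CLI route, the sketch below builds `bq load` and `bq extract` argument lists in Python. The extension-to-format mapping, the helper names and the dataset, table and bucket names are assumptions for this example; the flags used (`--source_format`, `--autodetect`, `--destination_format`) are standard `bq` flags, but check them against your installed version. The export puts a wildcard (`*`) in the destination URI so that tables larger than 1 GB can be written as multiple files.

```python
import shlex

# File extension -> value for `bq load --source_format` (formats from the list above).
SOURCE_FORMATS = {
    ".csv": "CSV",
    ".json": "NEWLINE_DELIMITED_JSON",
    ".avro": "AVRO",
    ".parquet": "PARQUET",
    ".orc": "ORC",
}

# Export formats from the notes -> file extension used in the destination URI.
EXTRACT_EXTENSIONS = {"CSV": "csv", "NEWLINE_DELIMITED_JSON": "json", "AVRO": "avro"}


def build_load_command(table: str, uri: str) -> list[str]:
    """Return argv for `bq load`, inferring --source_format from the URI's extension."""
    dot = uri.rfind(".")
    ext = uri[dot:].lower() if dot != -1 else ""
    if ext not in SOURCE_FORMATS:
        raise ValueError(f"unsupported load format: {uri}")
    fmt = SOURCE_FORMATS[ext]
    args = ["bq", "load", f"--source_format={fmt}"]
    # Avro, Parquet and ORC carry their own schema; auto-detection is only needed for CSV/JSON.
    if fmt in ("CSV", "NEWLINE_DELIMITED_JSON"):
        args.append("--autodetect")
    return args + [table, uri]


def build_extract_command(table: str, bucket: str, prefix: str, fmt: str = "CSV") -> list[str]:
    """Return argv for `bq extract` with a wildcard URI so large exports are split across files."""
    dest = f"gs://{bucket}/{prefix}-*.{EXTRACT_EXTENSIONS[fmt]}"
    return ["bq", "extract", f"--destination_format={fmt}", table, dest]


if __name__ == "__main__":
    # Hypothetical dataset, table and bucket names.
    print(shlex.join(build_load_command("mydataset.sales", "gs://my-bucket/sales.csv")))
    print(shlex.join(build_extract_command("mydataset.sales", "my-bucket", "sales")))
```

`shlex.join` quotes the wildcard URI, so a shell passes the `*` through to `bq` instead of expanding it.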

External tables

Supported for Cloud Storage, Bigtable and Google Drive.

Create a table definition file containing the schema; the schema can also be auto-detected.

Use either temporary or permanent tables. No data is stored in these tables, only the schema; the data stays in the external source.
Permanent tables are placed in a dataset, so they can be shared and access-controlled through that dataset.
Temporary tables are for one-off queries.

Query results on external tables are not cached.
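A table definition file is plain JSON. The sketch below, with hypothetical helper, bucket and column names, builds one either with an explicit schema or with schema auto-detection requested. The keys used (`sourceFormat`, `sourceUris`, `autodetect`, `schema.fields`) mirror BigQuery's external data configuration; format-specific options such as CSV settings are left out, so treat it as an illustration rather than a complete definition.

```python
import json


def make_table_definition(source_uris, source_format="CSV", schema=None):
    """Build an external table definition as a dict ready for json.dump.

    `schema` is a list of (column name, BigQuery type) pairs; when it is
    omitted, schema auto-detection is requested instead.
    """
    definition = {"sourceFormat": source_format, "sourceUris": list(source_uris)}
    if schema is None:
        definition["autodetect"] = True
    else:
        definition["schema"] = {
            "fields": [{"name": name, "type": col_type} for name, col_type in schema]
        }
    return definition


if __name__ == "__main__":
    # Hypothetical bucket and columns.
    auto = make_table_definition(["gs://my-bucket/logs/*.csv"])
    explicit = make_table_definition(
        ["gs://my-bucket/logs/*.csv"],
        schema=[("user_id", "STRING"), ("event_time", "TIMESTAMP")],
    )
    print(json.dumps(auto, indent=2))
    print(json.dumps(explicit, indent=2))
```

Saved to a file (say `def.json`), such a definition could back a permanent table with `bq mk --table --external_table_definition=def.json mydataset.logs_ext`, or a temporary table for a single query with `bq query --external_table_definition=logs_ext::def.json 'SELECT COUNT(*) FROM logs_ext'`. `bq mkdef` can also generate a definition file.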

Partitioning

Two options – partition by ingestion time, or by an existing column. In the first scheme, a
_PARTITIONTIME pseudocolumn is added.

_PARTITIONTIME and _PARTITIONDATE are available only in ingestion-time partitioned tables.

For column-based partitioning, any DATE or TIMESTAMP column can be used.
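To show why partitioning saves cost, here is a toy in-memory model (not how BigQuery actually stores data): rows are grouped into daily partitions, and a date-range filter reads only the matching partitions. Function and column names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime


def partition_by_day(rows, column):
    """Group rows into daily partitions keyed by the date of `column`.

    For ingestion-time partitioning, `column` would hold the load time
    (what BigQuery exposes as _PARTITIONTIME); otherwise it is an existing
    DATE or TIMESTAMP column.
    """
    partitions = defaultdict(list)
    for row in rows:
        value = row[column]
        # datetime is a subclass of date, so check for it first.
        day = value.date() if isinstance(value, datetime) else value
        partitions[day].append(row)
    return dict(partitions)


def scan(partitions, start, end):
    """Return (rows, partitions_read) for start <= day <= end.

    Partitions outside the range are never read, which is what cuts the
    bytes scanned (and billed) for a filtered query.
    """
    days = sorted(d for d in partitions if start <= d <= end)
    rows = [row for d in days for row in partitions[d]]
    return rows, len(days)


if __name__ == "__main__":
    # Hypothetical event data.
    events = [
        {"event_ts": datetime(2024, 1, 1, 9), "amount": 5},
        {"event_ts": datetime(2024, 1, 2, 14), "amount": 7},
        {"event_ts": datetime(2024, 1, 3, 8), "amount": 2},
    ]
    parts = partition_by_day(events, "event_ts")
    # Roughly a filter on a single day of a table partitioned by event_ts.
    rows, read = scan(parts, date(2024, 1, 2), date(2024, 1, 2))
    print(f"read {read} of {len(parts)} partitions: {rows}")
```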

An alternative to partitioning is sharding: splitting the data into separate tables. This has more
overhead, because with multiple tables the access control and schema must be maintained for each
table separately.
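The sharding alternative can be sketched the same way. This toy example, with hypothetical table names, generates one table name per day and emulates how a wildcard query (`FROM dataset.events_*` filtered on BigQuery's `_TABLE_SUFFIX` pseudocolumn) selects shards. It also makes the overhead visible: a year of daily shards is 365 or 366 separate tables, each with its own schema and access control.

```python
from datetime import date, timedelta


def daily_shards(base: str, start: date, days: int) -> list[str]:
    """Names of date-sharded tables, one per day, e.g. events_20240101."""
    names = []
    for i in range(days):
        day = start + timedelta(days=i)
        names.append(f"{base}_{day:%Y%m%d}")
    return names


def wildcard_select(tables: list[str], prefix: str, suffix_min: str, suffix_max: str) -> list[str]:
    """Emulate FROM `dataset.<prefix>*` WHERE _TABLE_SUFFIX BETWEEN suffix_min AND suffix_max."""
    return [
        t for t in tables
        if t.startswith(prefix) and suffix_min <= t[len(prefix):] <= suffix_max
    ]


if __name__ == "__main__":
    shards = daily_shards("events", date(2024, 1, 1), 366)  # hypothetical table prefix
    print(len(shards), "tables to manage")
    print(wildcard_select(shards, "events_", "20240301", "20240303"))
```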

Advanced queries

Analytic (window) functions:

▪ Aggregate – SUM, COUNT

▪ Navigation – LEAD, LAG

▪ Numbering (ranking) – RANK, CUME_DIST

“PARTITION BY” in a window clause is similar to “GROUP BY”, but it does not collapse rows into one per group. It is unrelated to BigQuery table partitioning, which controls how
data is stored.
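The difference between PARTITION BY and GROUP BY is easiest to see by emulating a window function. This plain-Python sketch (hypothetical data and helper name) reproduces `RANK()` and `LAG(amount)` over `PARTITION BY region ORDER BY amount`: every input row comes back, each annotated with its rank and the previous amount in its region.

```python
from itertools import groupby


def rank_and_lag(rows, partition_key, order_key):
    """Emulate RANK() and LAG(order_key) OVER (PARTITION BY partition_key ORDER BY order_key).

    Unlike GROUP BY, every input row is kept; each just gains "rank" and "lag".
    """
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    out = []
    for _, group in groupby(ordered, key=lambda r: r[partition_key]):
        group = list(group)
        for i, row in enumerate(group):
            if i > 0 and row[order_key] == group[i - 1][order_key]:
                rank = out[-1]["rank"]  # ties share the same rank
            else:
                rank = i + 1  # after a tie the rank jumps, as RANK() does
            lag = group[i - 1][order_key] if i > 0 else None  # previous row in the partition
            out.append({**row, "rank": rank, "lag": lag})
    return out


if __name__ == "__main__":
    sales = [
        {"region": "EU", "amount": 30},
        {"region": "US", "amount": 5},
        {"region": "EU", "amount": 10},
        {"region": "EU", "amount": 30},
    ]
    for row in rank_and_lag(sales, "region", "amount"):
        print(row)
```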
Types – STRUCT, ARRAY, TIMESTAMP, INT64, FLOAT64, STRING
Named subqueries (common table expressions) can be defined using WITH.
ARRAY_AGG – builds an array from a set of values. UNNEST – takes an array and returns a table with one row per element.
STRUCT – creates a struct (a record with named fields).
User-defined functions – both SQL UDFs and JavaScript UDFs are supported.
UDFs have constraints – the size of the UDF output is limited, and native-code JavaScript functions are not supported.
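A similar emulation for arrays: `array_agg` collapses rows into one list per key, and `unnest` expands each list back into rows, as `CROSS JOIN UNNEST(...)` does in BigQuery. This is a plain-Python sketch with hypothetical names, not BigQuery's implementation.

```python
from collections import defaultdict


def array_agg(rows, key, value):
    """Emulate SELECT key, ARRAY_AGG(value) AS value FROM rows GROUP BY key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row[value])
    # In BigQuery, element order is not guaranteed without ARRAY_AGG(... ORDER BY ...);
    # here it simply follows input order.
    return [{key: k, value: values} for k, values in grouped.items()]


def unnest(rows, array_field):
    """Emulate SELECT ... FROM rows CROSS JOIN UNNEST(array_field): one row per element.

    As with CROSS JOIN, a row whose array is empty produces no output.
    """
    return [{**row, array_field: element} for row in rows for element in row[array_field]]


if __name__ == "__main__":
    purchases = [
        {"user": "ana", "item": "book"},
        {"user": "ben", "item": "pen"},
        {"user": "ana", "item": "lamp"},
    ]
    per_user = array_agg(purchases, "user", "item")
    print(per_user)
    print(unnest(per_user, "item"))
```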

Streaming

Data can be queried while it is being streamed, before it is written to disk. Insertion rate is up to
100,000 rows/second; rows are inserted through the REST API (tabledata.insertAll).
Streamed data is available for querying within seconds.
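For the REST route, each `tabledata.insertAll` call takes a JSON body with a `rows` array, where each entry holds the row under `json` and an optional `insertId`. The sketch below only builds those bodies (it sends nothing); the helper name and the batch size of 500 rows per request are assumptions for this example.

```python
import json
import uuid


def insert_all_bodies(rows, batch_size=500):
    """Split rows into request bodies for the tabledata.insertAll REST method.

    Each row gets a random insertId, which BigQuery uses for best-effort
    de-duplication if a request is retried. batch_size=500 is an assumption
    for this sketch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    bodies = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        bodies.append(
            {"rows": [{"insertId": str(uuid.uuid4()), "json": row} for row in batch]}
        )
    return bodies


if __name__ == "__main__":
    # Hypothetical sensor readings, split into small batches for display.
    readings = [{"sensor": "s1", "value": i} for i in range(3)]
    for body in insert_all_bodies(readings, batch_size=2):
        print(json.dumps(body))
```

Each body would be POSTed to `https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT/datasets/DATASET/tables/TABLE/insertAll` with OAuth credentials; in practice a client library usually handles this.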

Costing

Storage cost is similar to Cloud Storage.


Older data (not modified for 90 consecutive days) is charged at a lower long-term storage rate.
Query cost is based on the amount of data processed. Streaming ingestion is charged based on the amount of data streamed.
Pricing is pay-per-use (on-demand); there is also a flat-rate plan, though it is less commonly used.
Cost is reduced by selecting only the columns a query needs, since billing counts the bytes in the columns read.
Free – loading, exporting, cached queries, queries on metadata, and queries that return an error.
Cached queries – results are cached per user to save cost. The typical cache lifetime is 24 hours, but caching
is best-effort and results may be invalidated sooner.
Billing – charged to the project where the job runs, regardless of which project the dataset belongs to.
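The column-based pricing above can be made concrete with a little arithmetic. In this sketch the table layout and the price per TiB are made-up inputs, not current BigQuery prices, and billing details such as per-query minimums and the free tier are ignored. It compares SELECT * with a query that reads two small columns, and applies the 90-day rule for long-term storage.

```python
TIB = 2**40


def bytes_scanned(column_bytes: dict[str, int], selected: list[str]) -> int:
    """Bytes read by a query: only the selected columns count (columnar storage)."""
    return sum(column_bytes[col] for col in selected)


def on_demand_query_cost(nbytes: int, price_per_tib: float) -> float:
    """On-demand cost for scanning nbytes at an assumed per-TiB price."""
    return nbytes / TIB * price_per_tib


def storage_rate(days_since_modified: int, active_rate: float, long_term_rate: float) -> float:
    """The cheaper long-term rate applies once data is unmodified for 90 consecutive days."""
    return long_term_rate if days_since_modified >= 90 else active_rate


if __name__ == "__main__":
    # Hypothetical table layout; the price is an assumed input, check current pricing.
    columns = {"id": TIB // 8, "country": TIB // 8, "payload": 6 * TIB}
    price = 6.25  # assumed USD per TiB
    everything = bytes_scanned(columns, list(columns))  # SELECT *
    narrow = bytes_scanned(columns, ["id", "country"])  # SELECT id, country
    print(f"SELECT *: ${on_demand_query_cost(everything, price):.2f}")
    print(f"SELECT id, country: ${on_demand_query_cost(narrow, price):.2f}")
```

Adding LIMIT to a query does not reduce the bytes billed; reading fewer columns (or fewer partitions) does.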

IAM

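Access can be controlled at the project, dataset and (through authorized views) view level. Views are virtual tables. There are no IAM roles
that apply directly to a view. Instead, the view is put in a separate dataset, users are granted access to that dataset, and the view
itself is authorized to read the source dataset. This gives view-level access control without exposing the underlying tables and is called an authorized view.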
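For an authorized view, two access-list changes are involved: users get READER on the dataset that holds the view, and the view itself is added to the source dataset's access list. The sketch below builds those entries in the JSON shape used for dataset access lists (`role`/`userByEmail` entries and `view` entries); all project, dataset, view and user names are hypothetical.

```python
def authorize_view(source_access: list[dict], project: str, dataset: str, view: str) -> list[dict]:
    """Return the source dataset's access list with the view authorized to read it."""
    entry = {"view": {"projectId": project, "datasetId": dataset, "tableId": view}}
    return list(source_access) if entry in source_access else source_access + [entry]


def grant_reader(access: list[dict], email: str) -> list[dict]:
    """Grant a user READER on a dataset (here: the dataset that holds the view)."""
    entry = {"role": "READER", "userByEmail": email}
    return list(access) if entry in access else access + [entry]


if __name__ == "__main__":
    # Hypothetical project, datasets, view and users.
    source_access = [{"role": "OWNER", "userByEmail": "admin@example.com"}]
    shared_access: list[dict] = []

    # 1. Users read the dataset holding the view, not the source dataset.
    shared_access = grant_reader(shared_access, "analyst@example.com")
    # 2. The view itself is authorized on the source dataset.
    source_access = authorize_view(source_access, "my-project", "shared_views", "sales_summary")
    print(source_access)
    print(shared_access)
```

One way to apply such a list is to dump the dataset with `bq show --format=prettyjson`, edit its `access` array, and load it back with `bq update --source`.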

Views can also restrict rows to particular users: a column holding the allowed user is added to the table,
and the view filters on SESSION_USER() so each user sees only their own rows.
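The row-filtering view can be emulated the same way. In this sketch the `allowed_user` column name and the data are hypothetical; the function keeps only rows whose allowed user equals the session user and drops the helper column, like a view defined with `WHERE allowed_user = SESSION_USER()`.

```python
def rows_for_user(rows: list[dict], session_user: str, user_column: str = "allowed_user") -> list[dict]:
    """Emulate a view defined as:

        SELECT * EXCEPT (allowed_user) FROM t WHERE allowed_user = SESSION_USER()
    """
    return [
        {k: v for k, v in row.items() if k != user_column}
        for row in rows
        if row[user_column] == session_user
    ]


if __name__ == "__main__":
    # Hypothetical orders table with one allowed user per row.
    orders = [
        {"order_id": 1, "amount": 20, "allowed_user": "ana@example.com"},
        {"order_id": 2, "amount": 35, "allowed_user": "ben@example.com"},
    ]
    print(rows_for_user(orders, "ana@example.com"))
```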

Roles – Admin, Data Owner, Data Editor, Data Viewer, User (can run queries; more permissions than Job User) and Job
User. Of the primitive roles, project Owner, Editor and Viewer are available. For datasets, OWNER, WRITER and
READER are available.

Public datasets – accessible to all authenticated users.
