Loading and Exporting Data
▪ CSV
▪ Avro
▪ Parquet
▪ ORC
Export from BigQuery – to Cloud Storage (CSV, Avro, JSON). Each exported file is limited to 1 GB; use a
wildcard in the destination URI to split the export into multiple files. Command: "bq extract".
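As a minimal sketch of how the wildcard splits an export: BigQuery replaces the `*` in the destination URI with a zero-padded 12-digit shard number, one per output file. The bucket path below is hypothetical.

```python
def expand_wildcard_uri(uri_template, num_shards):
    """Illustrate how a wildcard destination URI expands into one
    object name per exported shard (zero-padded 12-digit suffix)."""
    return [uri_template.replace("*", f"{i:012d}") for i in range(num_shards)]

# Hypothetical destination from a `bq extract` call with a wildcard:
for uri in expand_wildcard_uri("gs://my-bucket/export-*.csv", 3):
    print(uri)
```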
BigQuery Data Transfer Service – imports data from Google marketing apps: Google Ads (AdWords),
DoubleClick, YouTube reports.
External tables
Create a table definition file with the schema; the schema can also be auto-detected.
Use either temporary or permanent external tables. The data is not stored in these tables; only the
table definition/schema is, and queries read the data in place.
Permanent tables are placed in a dataset and can be used for access control and sharing, since the
table itself is access-controlled. Temporary tables are for one-off use.
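A table definition file is a small JSON document naming the source format, source URIs, and schema. This sketch builds one by hand to show the shape (the bucket path and field names are hypothetical; in practice `bq mkdef` can generate the file, and its exact contents may include more options than shown here).

```python
import json

def make_table_definition(source_uris, schema_fields, source_format="CSV"):
    """Build a minimal external-table definition: the table stores only
    this metadata, while the data itself stays in Cloud Storage."""
    return {
        "sourceFormat": source_format,
        "sourceUris": source_uris,
        "schema": {"fields": schema_fields},
    }

definition = make_table_definition(
    ["gs://my-bucket/sales/*.csv"],              # hypothetical path
    [{"name": "sale_id", "type": "INTEGER"},
     {"name": "amount", "type": "FLOAT"}],
)
print(json.dumps(definition, indent=2))
```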
Partitioning
2 options – partition by ingestion time, or by a particular date/timestamp column that already exists.
In the first scheme, a _PARTITIONTIME pseudo-column is added to the table.
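A pure-Python illustration of ingestion-time partitioning, assuming day granularity: each row is tagged with its ingestion day, standing in for the _PARTITIONTIME pseudo-column, and filtering on it is what lets the engine prune partitions. Row contents here are made up.

```python
from datetime import datetime, timezone

def ingest(table, rows, now):
    """Tag each row with its ingestion day, mimicking the
    _PARTITIONTIME pseudo-column (day granularity)."""
    day = now.date().isoformat()
    for row in rows:
        table.append({**row, "_PARTITIONTIME": day})

table = []
ingest(table, [{"event": "click"}], datetime(2024, 1, 1, tzinfo=timezone.utc))
ingest(table, [{"event": "view"}], datetime(2024, 1, 2, tzinfo=timezone.utc))

# WHERE _PARTITIONTIME = '2024-01-02' touches only that day's partition
hits = [r for r in table if r["_PARTITIONTIME"] == "2024-01-02"]
print(hits)
```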
Another approach to partitioning is to shard the data into separate tables (e.g., one table per day).
This has more overhead: with multiple tables, access control and schema must be maintained for each
table separately.
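To make the overhead concrete, this sketch routes rows into per-day tables (the `events_YYYYMMDD` naming is the common convention, assumed here); note that every shard carries its own schema and ACL entry, which is exactly what must be maintained table by table.

```python
from collections import defaultdict

def route_to_shard(dataset, row, day):
    """Sharding: each day gets its own table, so schema and access
    control exist once per table rather than once for the whole set."""
    name = f"events_{day.replace('-', '')}"   # e.g. events_20240101
    dataset[name]["rows"].append(row)
    return name

# Every new shard duplicates the schema/ACL bookkeeping:
dataset = defaultdict(lambda: {"schema": ["event"], "acl": ["analyst"], "rows": []})
route_to_shard(dataset, {"event": "click"}, "2024-01-01")
route_to_shard(dataset, {"event": "view"}, "2024-01-02")
print(sorted(dataset))
```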
Advanced queries
Streaming
Data can be queried while it is being streamed, before it is written to disk. Insertion rate is on the
order of 100,000 rows/second; rows are inserted via the REST API.
Streamed data is available for query within seconds.
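A client streaming over the REST API typically sends rows in request-sized batches. This is a sketch of that batching only; the 500-rows-per-request cap is an assumed illustrative figure, not the API's actual quota.

```python
def batch_rows(rows, max_per_request=500):
    """Split rows into request-sized batches, as a streaming client
    would before posting each batch to the insert endpoint."""
    return [rows[i:i + max_per_request] for i in range(0, len(rows), max_per_request)]

batches = batch_rows([{"n": i} for i in range(1200)], max_per_request=500)
print([len(b) for b in batches])  # [500, 500, 200]
```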
Costing
IAM
Control is at the project, dataset, and view level. Views are virtual tables; there are no direct IAM
roles for controlling view access. Instead, the view is placed in a separate dataset and IAM control is
applied to that dataset, which effectively gives access control on the view. This is called an
authorized view.
Views can also be used to share only particular rows with particular users: the allowed user is stored
in a column, and the view matches it against SESSION_USER() so each user sees only their own rows.
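The row-filtering a view like that performs can be sketched in pure Python; the column name `allowed_user` and the user emails are hypothetical stand-ins.

```python
def view_rows(table, session_user):
    """Mimic a view defined roughly as:
       SELECT * FROM t WHERE allowed_user = SESSION_USER()
    Each caller sees only the rows tagged with their own identity."""
    return [r for r in table if r["allowed_user"] == session_user]

table = [
    {"allowed_user": "alice@example.com", "salary": 100},
    {"allowed_user": "bob@example.com", "salary": 200},
]
print(view_rows(table, "alice@example.com"))  # only Alice's row
```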
Roles – admin, data owner, data editor, data viewer, user (can run queries; more permissions than job
user), and job user. From the primitive roles, project owner, editor, and viewer are available. For
datasets, owner, writer, and reader are available.