BigQuery Introduction
Source - https://cloud.google.com/docs
Google Cloud Platform : Cloud Training
➤ What is BigQuery?
➤ BigQuery is Google’s serverless data warehouse designed for large data
sets. It is a column-oriented (non-RDBMS) database that runs on Google Cloud
infrastructure.
➤ Google BigQuery is an enterprise data warehouse on Google Cloud Platform.
Its query engine is based on Dremel; it is a separate product from Bigtable.
➤ BigQuery works with data of any size, from a 100-row Excel spreadsheet
to several petabytes.
➤ BigQuery is Google’s fully managed, cloud-based interactive query service
for massive datasets.
➤ BigQuery is fast: it can scan millions of rows in seconds.
➤ BigQuery is a good alternative to Apache Hive for analytics workloads.
➤ BigQuery is not a solution for transactional data operations. It is best
suited to big data analytics.
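To make the column-oriented point concrete, here is a minimal Python sketch that contrasts a row layout with a column layout. It is only a toy model, not BigQuery's actual storage format; the sample records and the `values_read` counter are made up for illustration.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# Illustration only: this is not how BigQuery stores data internally.

rows = [
    {"user": "a", "country": "IN", "amount": 10},
    {"user": "b", "country": "US", "amount": 25},
    {"user": "c", "country": "IN", "amount": 5},
]

# Column layout: one list per field instead of one record per row.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def sum_amount_row_store(records):
    """Sum `amount`, touching every field of every record."""
    total, values_read = 0, 0
    for record in records:
        values_read += len(record)  # a row store reads the whole record
        total += record["amount"]
    return total, values_read


def sum_amount_column_store(cols):
    """Sum `amount`, touching only the `amount` column."""
    amounts = cols["amount"]
    return sum(amounts), len(amounts)


print(sum_amount_row_store(rows))        # (40, 9)
print(sum_amount_column_store(columns))  # (40, 3)
```

Both functions return the same total, but the column layout reads a third of the values. That is the basic reason analytic queries over a few columns suit a columnar store, while row-by-row transactional updates do not.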
➤ Why BigQuery?
➤ Service for interactive analysis of massive datasets (TBs)
➤ Query billions of rows: seconds to write, seconds to return
➤ Uses a SQL-style query syntax
➤ It is a service that can be accessed through an API
➤ Reliable and secure
  ➤ Replicated across multiple sites
  ➤ Secured through access control lists
➤ Scalable
  ➤ Store hundreds of terabytes
  ➤ Pay only for what you use
➤ Fast
  ➤ Run ad hoc queries on multi-terabyte data sets in seconds
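As a rough sketch of what "SQL-style syntax, accessed by an API" can look like in practice, the Python below builds a query string and submits it with the `google-cloud-bigquery` client library. It assumes that package is installed and that default credentials are configured; the table ID and column name are hypothetical placeholders, and `build_count_query` is a helper invented for this example.

```python
# Sketch: run an ad hoc SQL query through the BigQuery API from Python.
# Assumes `pip install google-cloud-bigquery` and default credentials.


def build_count_query(table_id: str, column: str, limit: int = 10) -> str:
    """Build a SQL query that counts rows per value of `column`."""
    return (
        f"SELECT {column}, COUNT(*) AS n\n"
        f"FROM `{table_id}`\n"
        f"GROUP BY {column}\n"
        f"ORDER BY n DESC\n"
        f"LIMIT {limit}"
    )


def run_query(sql: str):
    """Submit the query and return the result rows as a list."""
    from google.cloud import bigquery  # third-party package, imported lazily

    client = bigquery.Client()       # uses the default project and credentials
    query_job = client.query(sql)    # starts a query job
    return list(query_job.result())  # blocks until the job finishes


if __name__ == "__main__":
    # Hypothetical table ID, used only as a placeholder.
    sql = build_count_query("my-project.my_dataset.my_table", "country")
    print(sql)
    # rows = run_query(sql)  # uncomment with real credentials and a real table
```

The string formatting here is only safe for identifiers you control. For user-supplied values, query parameters are the better choice.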
➤ BigQuery Organization
➤ BigQuery is structured as a hierarchy with 4 levels:
➤ Projects: Top-level containers in the Google Cloud Platform
that store the data
➤ Datasets: Within projects, datasets hold one or more tables
of data
➤ Tables: Within datasets, tables are row-column structures
that hold actual data
➤ Jobs: The tasks you are performing on the data, such as
running queries, loading data, and exporting data
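The project, dataset, and table levels show up directly in how a table is addressed, usually as `project.dataset.table`. The following is a small sketch of that naming scheme; the `TableRef` class is invented for this example and is not part of any Google library, and it assumes the project ID itself contains no dots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """One table, addressed by the three container levels above it."""

    project: str
    dataset: str
    table: str

    @classmethod
    def parse(cls, table_id: str) -> "TableRef":
        """Split a `project.dataset.table` string into its three parts."""
        parts = table_id.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected project.dataset.table, got {table_id!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.project}.{self.dataset}.{self.table}"


# Hypothetical identifiers, for illustration only.
ref = TableRef.parse("my-project.sales.orders")
print(ref.project, ref.dataset, ref.table)
```

Jobs do not appear in this identifier: they are the operations that read or write such tables.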
➤ Projects
➤ Projects are the top-level containers that store the data
➤ Within the project, you can configure settings, permissions, and other
metadata that describe your applications
➤ Each project has a name, ID, and number that you’ll use as identifiers
➤ When billing is enabled, each project is associated with one billing account
but multiple projects can be billed to the same account
➤ Datasets
➤ Datasets allow you to organize and control access to your tables
➤ All tables must belong to a dataset. You must create a dataset before loading
data into BigQuery
➤ You can configure permissions at the organization, project, and dataset level
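Because a dataset must exist before any data is loaded, creating one is typically the first step. Here is a hedged sketch using the `google-cloud-bigquery` client library, assuming it is installed and credentials are configured. The project and dataset names are placeholders, the `dataset_id` helper is invented for this example, and the naming rule it checks (letters, digits, and underscores, up to 1,024 characters) is stated as an assumption that should be verified against the current documentation.

```python
import re

# Assumed naming rule: letters, digits, underscores, at most 1024 characters.
_DATASET_NAME = re.compile(r"[A-Za-z0-9_]{1,1024}")


def dataset_id(project: str, name: str) -> str:
    """Return `project.name` after checking the dataset name."""
    if not _DATASET_NAME.fullmatch(name):
        raise ValueError(f"invalid dataset name: {name!r}")
    return f"{project}.{name}"


def create_dataset(project: str, name: str, location: str = "US"):
    """Create the dataset if it does not exist yet and return it."""
    from google.cloud import bigquery  # third-party package, imported lazily

    client = bigquery.Client(project=project)
    dataset = bigquery.Dataset(dataset_id(project, name))
    dataset.location = location  # where the dataset's data is stored
    return client.create_dataset(dataset, exists_ok=True)


# Hypothetical names; only the ID is built here, nothing is created.
print(dataset_id("my-project", "sales_2024"))
```

Calling `create_dataset("my-project", "sales_2024")` with real credentials would create the container that tables are then loaded into.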
➤ Tables
➤ Tables contain your data in BigQuery
➤ Each table has a schema that describes the data contained
in the table, including field names, types, and descriptions
➤ BigQuery supports the following table types:
➤ Native tables: tables backed by native BigQuery storage
➤ External tables: tables backed by storage external to
BigQuery
➤ Views: virtual tables defined by a SQL query
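A schema of field names, types, and descriptions can be written down as a list of field definitions, similar in shape to the JSON schema files BigQuery accepts. The sketch below uses plain Python with no BigQuery dependency; the `orders` fields and the `check_row` helper are invented for illustration, and the type check is deliberately strict and covers only four types.

```python
# A table schema as a list of field definitions (name, type, mode, description).
SCHEMA = [
    {"name": "order_id", "type": "INTEGER", "mode": "REQUIRED",
     "description": "Unique order number"},
    {"name": "customer", "type": "STRING", "mode": "REQUIRED",
     "description": "Customer name"},
    {"name": "amount", "type": "FLOAT", "mode": "NULLABLE",
     "description": "Order total"},
]

# Simplified mapping from schema types to Python types (toy subset).
PYTHON_TYPES = {"STRING": str, "INTEGER": int, "FLOAT": float, "BOOLEAN": bool}


def check_row(schema, row):
    """Return a list of problems found when comparing `row` to `schema`."""
    problems = []
    for field in schema:
        name = field["name"]
        value = row.get(name)
        if value is None:
            if field["mode"] == "REQUIRED":
                problems.append(f"{name}: missing required value")
            continue
        expected = PYTHON_TYPES[field["type"]]
        # bool is a subclass of int in Python, so reject it for other types.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"{name}: expected {field['type']}")
    return problems


print(check_row(SCHEMA, {"order_id": 1, "customer": "Asha", "amount": 9.5}))
print(check_row(SCHEMA, {"order_id": "1"}))
```

The first call prints an empty list; the second reports the wrong type for `order_id` and the missing `customer`, while the nullable `amount` is allowed to be absent.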
➤ Jobs
➤ Jobs are objects that manage asynchronous tasks such as running
queries, loading data, and exporting data
➤ You can run multiple jobs concurrently
➤ Completed jobs are listed in the Jobs collection
➤ There are four types of jobs:
➤ Load: load data into a table
➤ Query: run a query against BigQuery data
➤ Extract: export a BigQuery table to Google Cloud Storage
➤ Copy: copy an existing table into another new or existing table
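The four job types can be sketched as a small Python model. In the BigQuery REST API a job carries a `configuration` object with one of the keys `load`, `query`, `extract`, or `copy`, and reports its progress in `status.state`. The `JobType`, `job_body`, and `is_done` names below are invented for this example, and the request body is a simplified illustration rather than a complete job resource.

```python
from enum import Enum


class JobType(Enum):
    """The four job types, keyed by their `configuration` field name."""

    LOAD = "load"
    QUERY = "query"
    EXTRACT = "extract"
    COPY = "copy"


def job_body(job_type: JobType, settings: dict) -> dict:
    """Build a minimal job request body for the given job type."""
    return {"configuration": {job_type.value: dict(settings)}}


def is_done(job_resource: dict) -> bool:
    """Jobs are asynchronous: poll until the state is DONE."""
    return job_resource.get("status", {}).get("state") == "DONE"


body = job_body(JobType.QUERY, {"query": "SELECT 1", "useLegacySql": False})
print(body)
print(is_done({"status": {"state": "RUNNING"}}))  # False
print(is_done({"status": {"state": "DONE"}}))     # True
```

A state of `DONE` only means the job has finished, so a real client would also check the status for an error before using the result.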