BigQuery+Introduction

easy start for bq

Uploaded by

shubhmt9110
GCP: BigQuery Introduction

Source - https://cloud.google.com/docs
Google Cloud Platform : Cloud Training

➤ Why Do You Need a Data Warehouse?


➤ A data warehouse is the most valuable asset of your BI team
➤ How it works:
➤ Data are extracted on a periodic basis from source systems and moved to a dedicated
server that contains the data warehouse
➤ During this process, the data are cleaned, formatted, validated, reorganized,
summarized, and integrated with other sources
➤ A data warehouse delivers value to companies through:
➤ The generation of scheduled reports
➤ Packaged analytical solutions
➤ Ad hoc reporting and analysis
➤ Dynamic visualization
➤ Storage of historical data
➤ Data mining
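
The periodic extract, clean, validate, and summarize cycle described under "How it works" can be sketched in plain Python. This is only an illustration under stated assumptions: the record fields (`order_id`, `region`, `amount`) and the function names are hypothetical and are not tied to any particular warehouse product.

```python
from collections import defaultdict

# Hypothetical rows as they might arrive from a source system.
raw_rows = [
    {"order_id": "1", "region": " east ", "amount": "10.50"},
    {"order_id": "2", "region": "WEST", "amount": "4.25"},
    {"order_id": "3", "region": "east", "amount": "not-a-number"},  # fails validation
    {"order_id": "4", "region": "West", "amount": "5.75"},
]

def clean(row):
    """Format and validate one raw row; return None if it is invalid."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return {
        "order_id": int(row["order_id"]),
        "region": row["region"].strip().lower(),
        "amount": amount,
    }

def summarize(rows):
    """Total amount per region, the kind of aggregate a scheduled report reads."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

cleaned = [r for r in map(clean, raw_rows) if r is not None]
summary = summarize(cleaned)
print(summary)  # {'east': 10.5, 'west': 10.0}
```

In a real warehouse the cleaned rows would be loaded into tables and kept as history; here they only live in memory.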

➤ Choosing a Data Warehouse


➤ There are many factors to consider when choosing a data warehouse:
➤ Assets: generation of big data reports requires expensive servers
➤ People: skilled database administrators are needed to manage data
integrity
➤ Cost: interacting with big data can be expensive, slow, and
inefficient
➤ Scale: how much storage is needed and will storage needs change
over time?
➤ Security: how is data protected to ensure availability and
durability?

➤ What is BigQuery?
➤ BigQuery is Google’s serverless data warehouse designed for large datasets.
It is a column-oriented analytical database, not a traditional transactional
RDBMS, and runs on Google Cloud infrastructure.
➤ Google BigQuery is an enterprise data warehouse built on Google
infrastructure such as the Dremel query engine and Colossus storage, and
offered as part of Google Cloud Platform.
➤ BigQuery works well with all sizes of data, from a 100-row Excel spreadsheet
to several petabytes of data.
➤ BigQuery is Google’s fully managed, cloud-based interactive query service
for companies that need to analyze massive datasets.
➤ BigQuery is fast: it can search millions of rows in seconds.
➤ BigQuery is a good alternative to Apache Hive and is widely used for analytics.
➤ BigQuery is not a solution for transactional data operations. It is best
suited to big-data analytics.

➤ Why BigQuery?
➤ Service for interactive analysis of massive datasets (TBs)
➤ Query billions of rows: seconds to write, seconds to return
➤ Uses a SQL-style query syntax
➤ It is a service that can be accessed through an API
➤ Reliable and Secure
➤ Replicated across multiple sites
➤ Secured through Access Control Lists
➤ Scalable
➤ Store hundreds of terabytes
➤ Pay only for what you use
➤ Fast
➤ Run ad hoc queries on multi-terabyte data sets in seconds
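
To make "SQL-style query syntax" and "accessed through an API" concrete, here is a minimal sketch that builds, but does not send, a request for the REST API's `jobs.query` method using only the Python standard library. The project ID, table name, and token placeholder are hypothetical, and a real call needs valid OAuth credentials.

```python
import json
import urllib.request

# Hypothetical identifiers; replace with real values before sending anything.
PROJECT_ID = "my-project"
ACCESS_TOKEN = "REPLACE_WITH_OAUTH_TOKEN"

def build_query_request(project_id, sql, token):
    """Build (but do not send) a POST request for the jobs.query REST method."""
    url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/queries"
    body = {"query": sql, "useLegacySql": False}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The table `my-project.sales.orders` is a made-up example.
sql = """
SELECT region, SUM(amount) AS total
FROM `my-project.sales.orders`
GROUP BY region
ORDER BY total DESC
"""

request = build_query_request(PROJECT_ID, sql, ACCESS_TOKEN)
print(request.get_method(), request.full_url)
# Sending would be urllib.request.urlopen(request), which requires real credentials.
```

In practice the web UI, the command-line tool, and the client libraries wrap calls like this one, so most users never build the request by hand.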

➤ BigQuery Organization
➤ BigQuery is structured as a hierarchy with 4 levels:
➤ Projects: Top-level containers in the Google Cloud Platform
that store the data
➤ Datasets: Within projects, datasets hold one or more tables
of data
➤ Tables: Within datasets, tables are row-column structures
that hold actual data
➤ Jobs: The tasks you are performing on the data, such as
running queries, loading data, and exporting data
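
The four levels can be pictured with a small in-memory model. This is a sketch for orientation only: the class names and the example IDs (`my-project`, `sales`, `orders`) are hypothetical and are not part of any BigQuery library.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    table_id: str

@dataclass
class Dataset:
    dataset_id: str
    tables: dict = field(default_factory=dict)    # table_id -> Table

@dataclass
class Project:
    project_id: str
    datasets: dict = field(default_factory=dict)  # dataset_id -> Dataset
    jobs: list = field(default_factory=list)      # descriptions of tasks run in the project

def qualified_name(project, dataset, table):
    """Return the project.dataset.table form used to name a table in standard SQL."""
    return f"{project.project_id}.{dataset.dataset_id}.{table.table_id}"

# Build the hierarchy top-down: project, then dataset, then table.
project = Project("my-project")
dataset = project.datasets.setdefault("sales", Dataset("sales"))
table = dataset.tables.setdefault("orders", Table("orders"))
project.jobs.append("query: count rows in " + qualified_name(project, dataset, table))

print(qualified_name(project, dataset, table))  # my-project.sales.orders
```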

➤ Projects
➤ Projects are the top-level containers that store the data
➤ Within the project, you can configure settings, permissions, and other
metadata that describe your applications
➤ Each project has a name, ID, and number that you’ll use as identifiers
➤ When billing is enabled, each project is associated with one billing account
but multiple projects can be billed to the same account
➤ Datasets
➤ Datasets allow you to organize and control access to your tables
➤ All tables must belong to a dataset. You must create a dataset before loading
data into BigQuery
➤ You can configure permissions at the organization, project, and dataset level

➤ Tables
➤ Tables contain your data in BigQuery
➤ Each table has a schema that describes the data contained
in the table, including field names, types, and descriptions
➤ BigQuery supports the following table types:
➤ Native tables: tables backed by native BigQuery storage
➤ External tables: tables backed by storage external to
BigQuery
➤ Views: virtual tables defined by a SQL query
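
A table schema is commonly written as a list of field definitions with a name, type, mode, and description. The sketch below uses that shape and adds a small row checker. The checker, the field names, and the type mapping are hypothetical illustrations and not BigQuery's own validation logic.

```python
# A schema in the JSON-style shape BigQuery uses for field definitions.
schema = [
    {"name": "order_id", "type": "INTEGER", "mode": "REQUIRED",
     "description": "Unique order number"},
    {"name": "region", "type": "STRING", "mode": "NULLABLE",
     "description": "Sales region"},
    {"name": "amount", "type": "FLOAT", "mode": "NULLABLE",
     "description": "Order total"},
]

# Simplified mapping from schema types to Python types (an assumption of this sketch).
PYTHON_TYPES = {"INTEGER": int, "STRING": str, "FLOAT": float}

def check_row(row, schema):
    """Return a list of problems found in the row; an empty list means it fits."""
    problems = []
    for field_def in schema:
        name = field_def["name"]
        value = row.get(name)
        if value is None:
            if field_def["mode"] == "REQUIRED":
                problems.append(f"{name}: required field is missing")
            continue
        if not isinstance(value, PYTHON_TYPES[field_def["type"]]):
            problems.append(f"{name}: expected {field_def['type']}")
    return problems

print(check_row({"order_id": 1, "region": "east", "amount": 10.5}, schema))  # []
print(check_row({"region": 5}, schema))
```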

➤ Jobs
➤ Jobs are objects that manage asynchronous tasks such as running
queries, loading data, and exporting data
➤ You can run multiple jobs concurrently
➤ Completed jobs are listed in the Jobs collection
➤ There are four types of jobs:
➤ Load: load data into a table
➤ Query: run a query against BigQuery data
➤ Extract: export a BigQuery table to Google Cloud Storage
➤ Copy: copy an existing table into another new or existing table
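
In the REST API, each job type corresponds to one key in the job's `configuration` object. The sketch below writes out one configuration per type as plain dictionaries. The field names follow the REST API's job resource as I understand it, while the project, dataset, table, and bucket names are hypothetical, and nothing here is submitted to the service.

```python
def table_ref(project_id, dataset_id, table_id):
    """Table reference in the shape used inside job configurations."""
    return {"projectId": project_id, "datasetId": dataset_id, "tableId": table_id}

# Hypothetical tables and bucket paths.
orders = table_ref("my-project", "sales", "orders")
backup = table_ref("my-project", "sales", "orders_backup")

job_configs = [
    {"load": {"sourceUris": ["gs://my-bucket/orders.csv"],
              "sourceFormat": "CSV",
              "destinationTable": orders}},
    {"query": {"query": "SELECT COUNT(*) FROM `my-project.sales.orders`",
               "useLegacySql": False}},
    {"extract": {"sourceTable": orders,
                 "destinationUris": ["gs://my-bucket/export/orders-*.csv"]}},
    {"copy": {"sourceTable": orders, "destinationTable": backup}},
]

def job_type(configuration):
    """Name the job type from the single type key present in the configuration."""
    present = [k for k in ("load", "query", "extract", "copy") if k in configuration]
    if len(present) != 1:
        raise ValueError("a job configuration must contain exactly one job type")
    return present[0]

for config in job_configs:
    print(job_type(config))
```

Because jobs are asynchronous, a client typically submits a configuration like these, then polls the job until it reports that it is done.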

➤ BigQuery supports SQL-like queries, which makes it user-friendly.
➤ BigQuery is accessible via its web UI, its command-line tool, or
client libraries (available for C#, Go, Java, Node.js, PHP, Python,
and Ruby), all built on its REST API.
➤ BigQuery uses columnar storage.
➤ BigQuery achieves a very high compression ratio and scan
throughput. It can operate directly on compressed data without
decompressing it.
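
The columnar storage and compression points can be illustrated with a toy example. This sketch stores a table column by column, run-length encodes one column, and answers a count by reading the encoded runs directly. Run-length encoding is used here only as a simple stand-in and is not a description of BigQuery's actual storage format. The data and function names are hypothetical.

```python
from itertools import groupby

# Row-oriented view: one record per row.
rows = [
    {"region": "east", "amount": 10},
    {"region": "east", "amount": 20},
    {"region": "east", "amount": 5},
    {"region": "west", "amount": 7},
    {"region": "west", "amount": 3},
]

# Columnar layout: one list per field instead of one dict per record.
columns = {name: [row[name] for row in rows] for name in rows[0]}

def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def count_equal(encoded, target):
    """Count matching rows by summing run lengths, without expanding the runs."""
    return sum(length for value, length in encoded if value == target)

region_rle = rle_encode(columns["region"])
print(region_rle)                       # [('east', 3), ('west', 2)]
print(count_equal(region_rle, "east"))  # 3

# An aggregate over one field only needs to read that field's column.
print(sum(columns["amount"]))           # 45
```

Repeated values in a column compress well, and a query that touches one field scans only that column, which is the intuition behind the high compression ratio and scan throughput claimed above.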
See you in the next lecture…