
BDA04 GoogleCloud

This document discusses Google Cloud and BigQuery. It begins by providing an analogy that cloud computing works similarly to a public water utility, where resources can be accessed on demand without needing to maintain the infrastructure. It then discusses considerations for using cloud services for big data processing and the advantages of operating on a public cloud. BigQuery is introduced as Google's cloud data warehouse that allows users to run SQL queries on large datasets. Several examples are provided of how organizations are using BigQuery, including analyzing liquor sales data and flight records. The document concludes by discussing exporting query results to JSON and visualizing data with tools like Gephi.

Uploaded by

Gargi Jana

Google Cloud and BigQuery

Shankar Venkatagiri
UTILITY
➤ Vivek Kundra, first CIO of the US
➤ There was a time when every household, town, farm or village
had its own water well. Today, shared public utilities give us
access to clean water by simply turning on the tap.
➤ Cloud computing works in a similar fashion.
➤ Just like water from the tap in your kitchen, cloud computing
services can be turned on or off quickly as needed.
➤ Like at the water company, there is a team of dedicated
professionals making sure the service provided is safe,
secure and available on a 24/7 basis.
➤ When the tap isn’t on, not only are you saving water, but you
aren’t paying for resources you don’t currently need.

CHOICES

➤ For business-oriented big data processing, you will require
cloud services provisioned on-premise or remotely
➤ Q: What considerations must you make?
➤ Advantages of operating on a public cloud
➤ Non-ownership model, scalable, available, low IT support
➤ Provider exploits economies of scale to offer better service
➤ Three levels
➤ Hardware (IaaS) - Amazon/Microsoft/Google/IBM/…
➤ Systems software (PaaS) - Oracle RDBMS/SAP ERP/…
➤ Applications software (SaaS) - Salesforce/Databricks/…


GOOGLE CLOUD

➤ Q: What does a cloud “farm” look like?


➤ Video: Inside a Google Data Center (up to 3:30)
➤ GCP: One stop online shop for compute, storage, networking,
dev-ops, tools (e.g. container deployment), big data, AI
➤ Video: Welcome to GCP
➤ Business Case: Google Cloud at UPS

Compute cost estimate: USD 150/month (= INR 12,000), that’s all!

STORAGE + NETWORKING
BIG QUERY

➤ Video: What is It? (up to 2:13)


➤ Queries are launched using a SQL interface
➤ Slightly different version of SQL, with REGEXP support
➤ Homework: This video by Garth Schulte is a great intro

ADVANTAGES

➤ Column-based storage helps queries run much faster


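➤ Spreads each query across many machines on the back end using the Dremel
engine (not MapReduce itself, though the scale-out idea is similar to Hadoop/Hive).
Fault tolerance is a given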
➤ All data thrown into BigQuery is encrypted by default!
➤ ACID semantics - atomic/consistent/isolated/durable
➤ Business Case: BigQuery helps serve Ads at Scale at Teads

CASE: LIQUOR SALES

➤ “Wholesale purchase of liquor in the State of Iowa by retailers.”
➤ Start date: 1st January 2012
➤ Updated monthly
➤ Current size (7th January 2021)
➤ ~5.3 Gigabytes!
➤ BigQuery has made this dataset publicly available
➤ https://console.cloud.google.com/marketplace/product/iowa-department-of-commerce/iowa-liquor-sales

TRYOUT

➤ Open up BigQueries.docx - Load each query and run it


➤ For each item, let’s query the volume sold in liters
➤ BQ gives us an idea of how much data will be accessed
➤ Activity: Save results as a table, export results in JSON
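The size estimate mentioned above is what the BigQuery client libraries call a dry run. Here is a minimal Python sketch of that idea. It assumes the `google-cloud-bigquery` package is installed and credentials are configured, and it assumes the public table is `bigquery-public-data.iowa_liquor_sales.sales` with columns `item_description` and `volume_sold_liters`; check those names against the schema shown in the console. `human_size` and `estimate_bytes` are hypothetical helper names.

```python
# Assumed table and column names; verify them against the dataset's schema.
QUERY = """
SELECT item_description,
       SUM(volume_sold_liters) AS total_liters
FROM `bigquery-public-data.iowa_liquor_sales.sales`
GROUP BY item_description
ORDER BY total_liters DESC
LIMIT 20
"""


def human_size(num_bytes):
    """Format a byte count using binary units (1 KB = 1024 B)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024


def estimate_bytes(sql):
    """Dry-run the query: it is validated and sized, but not executed."""
    # Imported lazily so the rest of this sketch loads without the package.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default project and credentials
    config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=config)
    return job.total_bytes_processed
```

With credentials in place, `print(human_size(estimate_bytes(QUERY)))` should report roughly how much data the query would scan. Because the storage is column-based, selecting only the two columns needed keeps that figure well below the full table size.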

JSON

➤ JavaScript Object Notation
Images courtesy: https://www.json.org/

➤ Lightweight data interchange format, which is more
expressive than CSV
➤ Contents can have name-value pairs

➤ Or we can have arrays


➤ {"display_name": "shankar", "age": 99, "reputation": 0,
"interests": ["healthcare", "bigdata", "opensource"]}
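To see name-value pairs and arrays in practice, the record above can be read with Python's standard `json` module. This is a small sketch. The helper names `to_ndjson` and `from_ndjson` are made up for illustration; they mimic the newline-delimited layout (one object per line) that BigQuery table exports to Cloud Storage use.

```python
import json

# The record from the slide, written with straight double quotes as JSON requires.
text = (
    '{"display_name": "shankar", "age": 99, "reputation": 0, '
    '"interests": ["healthcare", "bigdata", "opensource"]}'
)

record = json.loads(text)                # JSON object -> dict, JSON array -> list
name = record["display_name"]            # a name-value pair
first_interest = record["interests"][0]  # an element of an array


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row) for row in rows)


def from_ndjson(blob):
    """Parse newline-delimited JSON back into a list of dicts."""
    return [json.loads(line) for line in blob.splitlines() if line.strip()]


roundtrip = from_ndjson(to_ndjson([record, record]))
```

Unlike a CSV row, the `interests` field keeps its list structure through the round trip.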

GIS

Filename: BigQueries.docx

Unclean data - needs cleaning!

➤ BigQuery has a variety of functions (e.g. ST_GEOGFROMTEXT)
that support GIS processing
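ST_GEOGFROMTEXT expects well-known text (WKT) such as `POINT (longitude latitude)`, so unclean location strings need filtering first. The sketch below shows one possible pre-check in Python. `parse_point` is a hypothetical helper, not part of BigQuery, and the sample coordinates in the usage note are illustrative. Inside BigQuery itself, the `SAFE.` function prefix may be an alternative way to turn bad inputs into NULLs.

```python
import re

# Matches WKT such as "POINT (-93.62 41.58)": longitude first, then latitude.
_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_point(wkt):
    """Return (lon, lat) for a well-formed WKT point, or None if it is unclean."""
    if not wkt:
        return None
    match = _POINT.match(wkt)
    if match is None:
        return None
    lon, lat = float(match.group(1)), float(match.group(2))
    # Reject coordinates that fall outside the valid geographic range.
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        return None
    return lon, lat
```

For example, `parse_point("POINT (-93.62 41.58)")` yields a coordinate pair, while an empty string or a comma-separated pair such as `"POINT(-93.62, 41.58)"` is rejected.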

VISUALISE SALES

➤ Aggregate sales in ‘000 USD


Liquor stores in Downtown LA

Liquor stores in Beverly Hills, CA


CASE: FLIGHTS

➤ 10 years of flights: https://bigquery.cloud.google.com/table/bigquery-samples:airline_ontime_data.flights
➤ The schema is straightforward

DETAILS
➤ Over 70 million flights recorded
FLIGHTS

➤ Q: Departing from which airports?

SUMMARY

➤ Over 2002-2012, which airlines operated how many flights?
➤ Modify the query to obtain this report year by year by airline
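The year-by-year report comes down to grouping on the year as well as the airline. The sketch below illustrates this offline, with SQLite standing in for BigQuery. The toy rows are made up, and the column names `date` (text in YYYY-MM-DD form) and `airline` are assumptions to check against the real schema. In BigQuery the same GROUP BY idea applies, though the expression for extracting the year may differ.

```python
import sqlite3

# Toy rows standing in for the flights table; values are made up for illustration.
rows = [
    ("2011-03-01", "WN"),
    ("2011-07-15", "WN"),
    ("2012-01-09", "WN"),
    ("2012-05-20", "AA"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (date TEXT, airline TEXT)")
conn.executemany("INSERT INTO flights VALUES (?, ?)", rows)

# Grouping on year AND airline gives one row per airline per year.
report = conn.execute(
    """
    SELECT SUBSTR(date, 1, 4) AS year, airline, COUNT(*) AS num_flights
    FROM flights
    GROUP BY year, airline
    ORDER BY year, airline
    """
).fetchall()
conn.close()
```

Dropping `year` from both the SELECT and the GROUP BY collapses this back to the overall count per airline.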


NETWORK

➤ Consider the largest airline: Southwest (WN, Code: 19393).


➤ Q: In what network of cities did WN operate during 2012?
➤ Save the results as a CSV file
➤ Import the spreadsheet into Gephi - visualise and analyse
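Gephi's spreadsheet importer can read an edge list whose columns are named `Source`, `Target` and, optionally, `Weight`. The sketch below shows how query results could be shaped into that layout. The airport pairs are made-up placeholders rather than actual query output, and `edge_list_csv` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Made-up (origin, destination) pairs standing in for WN's 2012 query results.
flights = [("DAL", "HOU"), ("DAL", "HOU"), ("HOU", "DAL"), ("MDW", "LAS")]


def edge_list_csv(pairs):
    """Count flights per directed route and emit Source,Target,Weight rows."""
    counts = Counter(pairs)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (origin, dest), n in sorted(counts.items()):
        writer.writerow([origin, dest, n])
    return buffer.getvalue()


csv_text = edge_list_csv(flights)
```

Writing `csv_text` to a file gives something Gephi can import as an edges table, with the flight count available as the edge weight.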

CASE: STACKOVERFLOW

➤ Focus on the users table


➤ Check the schema
➤ 12.5 million rows!

REPUTATION

Add fields with clicks on the table schema!

➤ Q: Have a techie position open at your company?


➤ Recruiters go hunting@GitHub versus farming@LinkedIn

WHOIS…
# 27
➤ Bengaluru, we have a winner!
PYTHON
➤ Want to run this later? Create Google credentials!
SHAKESPEARE

➤ Open work/BigQuery/BigQuery.ipynb
➤ Invoke queries from within Python
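Invoking a query from Python might look roughly like the sketch below. This is not the notebook's actual contents. It assumes the `google-cloud-bigquery` package and the public sample table `bigquery-public-data.samples.shakespeare` with columns `word` and `word_count` (names to verify in the console). `top_words` and `make_client` are hypothetical helper names.

```python
# Assumed public sample table and column names; verify before relying on them.
SQL = """
SELECT word, SUM(word_count) AS total
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word
ORDER BY total DESC
LIMIT 10
"""


def top_words(client, sql=SQL):
    """Run the query and return a list of (word, total) tuples."""
    rows = client.query(sql).result()  # blocks until the query job finishes
    return [(row["word"], row["total"]) for row in rows]


def make_client(project_id):
    """Create a real client; needs google-cloud-bigquery and configured credentials."""
    # Imported lazily so the sketch can be loaded without the package installed.
    from google.cloud import bigquery

    return bigquery.Client(project=project_id)
```

Once Google credentials are set up, `top_words(make_client("your-project-id"))` would run the query, where `"your-project-id"` is a placeholder for your own project.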

Image: Christian Tiller Photography
➤ Data science, analytics, geographic mapping, and business intelligence are
essential to how Audubon does its work across business teams including
conservation, science, advocacy, marketing, fundraising, and finance
➤ Significant experience in leadership/management role leading data engineers,
architects, and/or analysts building and working with data lakes, warehouses,
marts, pipelines, and providing reporting or analytical services
➤ Experience with Python, Java, Scala, .NET, R or other programming languages
proven to be robust and widely used for ETL/ELT and data analytics
➤ Deep SQL experience and familiarity with cloud data warehouse solutions like
Redshift, Snowflake, Azure Data Warehouse, and/or BigQuery
➤ Experience with big data architectures and data modeling to efficiently process
large volumes of data, including solutions like Spark, Hadoop, EMR, etc.
➤ Demonstrated ability to discuss technical information to non-technical audiences.
Strong public speaking and writing skills are critical for success


➤ We can also programmatically access BigQuery with R


➤ Open BigQueriesWithR.R and modify Project ID

BUSINESS CASE

➤ Google’s enterprise data warehousing platform


➤ Video: Evolution of data warehousing (up to 6:30)

BUSINESS CASE

➤ Video: The Vodafone Business Case (up to 19:52)


➤ Simon Harris, Head of Big Data and Cloud Analytics
