Activity Overview - Course 3 Module 3 Google Data Analytics

Activity overview

Recently, you’ve been thinking about identifying good data sources that would be useful for analysis. You also spent some time in a previous activity exploring a public dataset in BigQuery and writing some basic SQL queries. In addition to using public data in BigQuery, your future data career will involve importing data from other sources. In this activity, you will create a new dataset, load your own data into a custom table, and query it.

By the time you complete this activity, you will be able to load your own data into BigQuery for analysis. Importing your own data sources is a skill that will help you analyze data from different sources more effectively.

What you will need

To get started, download the baby names data zip file. This file contains about 7 MB of data about
popular baby names from the U.S. Social Security Administration website.

Select the link to the baby names data zip file to download it.

Link to baby names data: names.zip

Create a custom table

Once you have downloaded the zip file, import its data into BigQuery to query and analyze. To do that, create a new dataset and a custom table.

Step 1: Unzip the file

Unzip the downloaded file on your computer so you can upload its contents to BigQuery. Inside the unzipped folder, you will find a NationalReadMe.pdf file that contains more information about the dataset. This dataset tracks the popularity of baby names for each year; the text files are labeled by the year they contain. Open yob2014.txt to preview the data. You will notice that it’s a comma-separated values (CSV) file with three columns. Remember where you saved this folder so you can reference it later.

Step 2: Create a dataset

Before uploading your .txt file and creating a table to query, you will need to create a dataset to upload
your data into and store your tables.

1. From the BigQuery console, go to the Explorer pane in your workspace and select the three dots to
open a menu. From here, select Create dataset.
2. This will open the Create dataset menu. This is where you will fill out information about the dataset.
Input the Dataset ID as babynames and set the Data location to Multi-region (US). Once you have
finished filling out this information, select the blue CREATE DATASET button at the bottom of the
menu.
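If you prefer to work in the query editor, you can also create a dataset with a SQL DDL statement. This is just an optional alternative to the console menu steps above; the result is the same.

CREATE SCHEMA IF NOT EXISTS babynames
OPTIONS (
  location = 'US'  -- corresponds to the Multi-region (US) data location selected in the console
);
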
Step 3: Create a table

Now that you have a custom dataset stored in your project space, this is where you will add your table.

1. Select the newly created babynames dataset. Check the tabs in your Dataset info window and select
the first blue + CREATE TABLE button. This will open another menu in your console.
2. In the Source section, select the Upload option under Create table from. Then select Browse to open your files. Find and open the yob2014.txt file. Set the file format to CSV. In the Destination section, in the Table text box, name your table names_2014. For Schema, select Edit as text and input the following code:

name:string,
gender:string,
count:integer

This will establish the data types of the three columns in the table. Leave the other parameters as they
are, and select Create table.
3. Once you have created your table titled names_2014, it will appear in your explorer pane under the
dataset babynames that you created earlier.
Select the table to open it in your workspace. Here, you can check the table schema. Then, go to the
Preview tab to explore your data. The table should have three columns: name, gender, and count.
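
To spot-check the upload, you could also run a quick query against the new table. Replace your project name with your own BigQuery project name, just as described in the next section.

SELECT
  *
FROM
  `your project name.babynames.names_2014`
LIMIT
  10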

Query your custom table

Now that your table is set up, you’re ready to start writing queries and answering questions about this
data. For example, let’s say you were interested in the top five baby names for boys in the United States
in 2014.

Select COMPOSE NEW QUERY to start a new query for this table. Then, copy and paste this code:

SELECT
  name,
  count
FROM
  `your project name.babynames.names_2014`
WHERE
  gender = 'M'
ORDER BY
  count DESC
LIMIT
  5

NOTE: Making sure that your FROM statement is correct is key to making this query run! The database
needs the query to tell it the location of the table you just uploaded so that it can fetch the data. It’s like
giving the query a map to your table. That map will include your unique BigQuery project name, the
dataset name (“babynames”), and the table name (“names_2014”). The location names for each of these
elements are separated by periods. Then, wrap the whole location in backticks. The final result will be
something like this:

`loyal-glass-371423.babynames.names_2014`

Note that loyal-glass-371423 is just an example of a project name; your FROM statement will use your own project’s name instead.

This query selects the name and count columns from the names_2014 table. Using the WHERE clause,
you are filtering for a specific gender for your results. Then, you’re sorting how you want your results to
appear with ORDER BY. Because you are ordering by the count in descending order, you will get names
and the corresponding count from largest to smallest. Finally, LIMIT tells SQL to only return the top five
most popular names and their counts.

Once you have input this in your console, select RUN to get your query results.
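
As a quick variation, you could change the WHERE filter to return the top five girls’ names instead; the project name is again a placeholder for your own:

SELECT
  name,
  count
FROM
  `your project name.babynames.names_2014`
WHERE
  gender = 'F'
ORDER BY
  count DESC
LIMIT
  5
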
Set up your data

1. Log in to BigQuery Sandbox. On the BigQuery page, click the Go to BigQuery button.

If you have a free-of-charge trial version of BigQuery, you can use that instead.

Note: BigQuery Sandbox frequently updates its user interface. The latest changes may not be reflected
in the screenshots presented in this activity, but the principles remain the same. Adapting to changes in
software updates is an essential skill for data analysts, and it’s helpful for you to practice
troubleshooting. You can also reach out to your community of learners on the discussion forum for help.

2. If you haven’t done so already, create a BigQuery project. (If you have a project, select it in the
Explorer pane.)

a. In the BigQuery console, select the dropdown list to the right of the Google Cloud logo to open the
Select a project dialog box.

b. In the Select a project dialog box, select the CREATE PROJECT button.

c. Give your project a name that will help you identify it later. You can enter a unique project ID or use the auto-generated one. You do not need to select an organization.

d. Select the CREATE button to create the project.

3. The three main sections of BigQuery are now onscreen: the BigQuery navigation menu; the Explorer
pane, which you can use to search for public datasets and open projects; and the Details pane, which
shows details of the database or dataset you’ve selected in the Explorer pane and displays windows for
you to enter queries.

Notice that you can use the <| symbol in the BigQuery navigation menu section to collapse it. There is a
similar symbol to collapse the Explorer pane.
Choose a dataset

Follow these steps to find and select the NYC Trees dataset for this activity:

1. In the Explorer pane, select the + ADD button.

2. In the Add box that pops up, scroll down the Additional sources list. Select Public Datasets.

3. A new box opens where you can search public datasets that are available through Google Cloud. In the
Search Marketplace text box, search for New York City Trees.
4. Select the search result NYC Street Trees, then select the View Dataset button.

Screenshot: The NYC Street Trees dataset page, with a link to the City of New York, the subheading New York City Street Tree Census data, and a View dataset button.
5. Google Cloud opens a new browser tab displaying BigQuery with the bigquery-public-data collection
open in the Explorer pane. To ensure the bigquery-public-data database remains in your project’s
Explorer pane, select the star next to the dataset.
6. The BigQuery Details pane contains information about the new_york_trees dataset. This information
includes the date the dataset was created, when it was last modified, and the Dataset ID.

Screenshot: The Details pane displays the new_york_trees dataset description, including the dataset ID, creation time, default table expiration, last modified time, data location, description, default collation, default rounding mode, case insensitivity, labels, and tags.
Choose a table

1. In the Explorer pane, select the arrow next to the new_york_trees dataset to display the tables it
contains.

Note: If the new_york_trees dataset is not in the Explorer pane, type new_york_trees into the Search
text box in the Explorer pane. (This will work if you have pinned bigquery-public-data in the Explorer
pane.) If search doesn’t return the needed results, follow the steps above to search for the
new_york_trees dataset.

2. Notice that the new_york_trees dataset contains three tree census tables from 1995, 2005, and 2015. It
also contains a table that lists the tree species.
Screenshot: In the Explorer pane, bigquery-public-data is open and the new_york_trees dataset is expanded to show its tables: tree_census_1995, tree_census_2005, tree_census_2015, and tree_species.
These are all tables contained in the dataset. Now, examine the data for all trees cataloged in New York
City for three specific years.

3. Select the tree_census_2005 table. BigQuery displays the table’s structure in the Details pane.

4. In the Details pane, select Query > In new tab to open a new query window.
5. Notice that BigQuery populates the Query Window with a SELECT query. This query is incomplete
because it doesn’t contain anything in between SELECT and FROM.

Query the data

This SELECT statement in the query window is incomplete because the columns to display have not been specified. So, either list the columns separated by commas or use an asterisk to have BigQuery return all of the columns in the table.

1. Type an asterisk * after the SELECT command in line one of the Query Editor. Your query should
now read SELECT * FROM followed by your table location. This command tells BigQuery to return all
of the columns in the tree_census_2005 table.
2. In the Query Editor, select the Run button to run the query. The results will be displayed as a table in
the Query Results pane below the Query Editor.

Screenshot: Query results showing populated columns, including row, object ID, cen_year, tree_dbh, tree_loc, pit_type, soil_lvl, status, spc_latin, and spc_common.
This query returns all columns for the first 1,000 rows from the table. BigQuery returns only the first 1,000
rows because the SELECT query includes a LIMIT 1000 clause. This limits the rows returned to reduce
the processing time required.
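
The completed query should look something like this:

SELECT
  *
FROM
  `bigquery-public-data.new_york_trees.tree_census_2005`
LIMIT
  1000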

3. Next, write a query to find out the average diameter of all NYC trees in 2005. On line 1, replace the *
after the SELECT command with AVG(tree_dbh). Select the Run button to execute the query.

This returns your answer, 12.833 (which means the average diameter of NYC trees in 2005 was 12.833
inches).
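
After that change, the query looks something like this (the generated LIMIT 1000 clause can stay or be removed, since an aggregate query returns a single row):

SELECT
  AVG(tree_dbh)
FROM
  `bigquery-public-data.new_york_trees.tree_census_2005`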

Write your own queries

Now, come up with some questions and answer them with your own SQL queries. For example, query
the 1995 and the 2015 tables to find the average diameter of trees. You can then compare the average
diameter of the trees in all three datasets to determine whether the trees in NYC have grown on average.
Note that the field name for tree diameter in the tree_census_1995 table is diameter.
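
As a sketch of that comparison, you might run queries like the following. The tree_dbh column name for the 2015 table is an assumption based on the 2005 table, so confirm it in the table’s schema first.

-- Average tree diameter in 1995 (this table names the column diameter)
SELECT
  AVG(diameter)
FROM
  `bigquery-public-data.new_york_trees.tree_census_1995`

-- Average tree diameter in 2015 (assumes the column is tree_dbh, as in the 2005 table)
SELECT
  AVG(tree_dbh)
FROM
  `bigquery-public-data.new_york_trees.tree_census_2015`
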
Terms and definitions for Course 3, Module 3
Administrative metadata: Metadata that indicates the technical source of a digital asset

CSV (comma-separated values) file: A delimited text file that uses a comma to separate values

Data governance: A process for ensuring the formal management of a company’s data assets

Descriptive metadata: Metadata that describes a piece of data and can be used to identify it at a later point
in time

Foreign key: A field within a database table that is a primary key in another table (Refer to primary key)

FROM: The section of a query that indicates where the selected data comes from

Geolocation: The geographical location of a person or device by means of digital information

Metadata: Data about data

Metadata repository: A database created to store metadata

Naming conventions: Consistent guidelines that describe the content, creation date, and version of a file in
its name

Normalized database: A database in which only related data is stored in each table

Notebook: An interactive, editable programming environment for creating data reports and showcasing data
skills

Primary key: An identifier in a database that references a column in which each value is unique (Refer to
foreign key)

Redundancy: When the same piece of data is stored in two or more places

Schema: A way of describing how something, such as data, is organized

SELECT: The section of a query that indicates the subset of a dataset

Structural metadata: Metadata that indicates how a piece of data is organized and whether it is part of one
or more than one data collection

WHERE: The section of a query that specifies criteria that the requested data must meet

World Health Organization: An organization whose primary role is to direct and coordinate international
health within the United Nations system
