
Note GG Data Analytics Course

Uploaded by

Khánh Giai
Copyright
© All Rights Reserved

Google Data Analytics Coursera

(8 courses)
Data Analysis process:
1. Ask: business challenge, objective, or question
2. Prepare: data generation, collection, storage, and data management
3. Process: data cleaning and data integrity
4. Analyze: data exploration, visualization, and analysis
5. Share: communicating and interpreting results
6. Act: putting insights to work to solve the problem

COURSE 1: FOUNDATIONS: DATA, DATA, EVERYWHERE


- What you will learn:
• Real-life roles and responsibilities of a junior data analyst
• How businesses transform data into actionable insights
• Spreadsheet basics
• Database and query basics
• Data visualization basics
- Skill sets you will build:
• Using data in everyday life
• Thinking analytically
• Applying tools from the data analytics toolkit
• Showing trends and patterns with data visualizations
• Ensuring your data analysis is fair

Module 1: Introducing Data Analytics and Analytical Thinking


Glossary Terms (terms & definitions for Course 1, module 1):
Analytical skills: Qualities and characteristics associated with using facts to solve
problems
Analytical thinking: The process of identifying and defining a problem, then solving it by
using data in an organized, step-by-step manner
Context: The condition in which something exists or happens
Data: A collection of facts
Data analysis: The collection, transformation, and organization of data in order to draw
conclusions, make predictions, and drive informed decision-making
Data analyst: Someone who collects, transforms, and organizes data in order to draw
conclusions, make predictions, and drive informed decision-making
Data analytics: The science of data
Data design: How information is organized
Data-driven decision-making: Using facts to guide business strategy
Data ecosystem: The various elements that interact with one another in order to produce,
manage, store, organize, analyze, and share data
Data science: A field of study that uses raw data to create new ways of modeling and
understanding the unknown
Data strategy: The management of the people, processes, and tools used in data analysis
Data visualization: The graphical representation of data
Dataset: A collection of data that can be manipulated or analyzed as one unit
Gap analysis: A method for examining and evaluating the current state of a process in
order to identify opportunities for improvement in the future
Root cause: The reason why a problem occurs
Technical mindset: The ability to break things down into smaller steps or pieces and work
with them in an orderly and logical way
Visualization: (Refer to data visualization)

Module 2: The Wonderful World of Data

Spreadsheets
Data analysts rely on spreadsheets to collect and organize data. Two popular spreadsheet
applications you will probably use a lot in your future role as a data analyst are Microsoft
Excel and Google Sheets.
Spreadsheets structure data in a meaningful way by letting you
• Collect, store, organize, and sort information
• Identify patterns and piece the data together in a way that works for each specific data project
• Create excellent data visualizations, like graphs and charts
Databases and query languages
A database is a collection of structured data stored in a computer system. Some popular
Structured Query Language (SQL) programs include MySQL, Microsoft SQL Server, and
BigQuery.
Query languages
• Allow analysts to isolate specific information from one or more databases
• Make it easier for you to learn and understand the requests made to databases
• Allow analysts to select, create, add, or download data from a database for analysis
Visualization tools
Data analysts use a number of visualization tools, like graphs, maps, tables, charts, and
more. Two popular visualization tools are Tableau and Looker.
These tools
• Turn complex numbers into a story that people can understand
• Help stakeholders come up with conclusions that lead to informed decisions and effective business strategies
• Have multiple features
- Tableau's simple drag-and-drop feature lets users create interactive graphs in dashboards and worksheets
- Looker communicates directly with a database, allowing you to connect your data right to the visual tool you choose
Module 3: Set Up Your Data Analytics Toolbox

Example of SQL query (with multiple columns and multiple fields):
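A minimal sketch of such a query, selecting several columns and filtering on more than one condition. It runs against an in-memory SQLite database through Python's sqlite3 module so that it is self-contained; the customers table, its columns, and its rows are made up for illustration, and the syntax may differ slightly in BigQuery or MySQL.

```python
import sqlite3

# In-memory database with a small, made-up customers table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers "
    "(customer_id INTEGER, first_name TEXT, country TEXT, total_spent REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", "Brazil", 120.0),
        (2, "Binh", "Vietnam", 80.0),
        (3, "Chloe", "Brazil", 45.5),
        (4, "Dat", "Vietnam", 300.0),
    ],
)

# SELECT lists the columns to return, FROM names the table,
# and WHERE keeps only the rows that satisfy both conditions.
query = """
SELECT customer_id, first_name, total_spent
FROM customers
WHERE country = 'Vietnam'
  AND total_spent > 100
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(4, 'Dat', 300.0)]
```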

Resources to learn more:


- SQL Tutorial: https://www.w3schools.com/sql/default.asp
- SQL Cheat Sheet: https://www.sqltutorial.org/sql-cheat-sheet/
- Tableau Tutorial: https://public.tableau.com/app/learn/how-to-videos
- RStudio Learning: https://posit.co/
- RStudio Cheat Sheets: https://posit.co/resources/cheatsheets/
- RStudio: https://posit.cloud/learn/recipes
- Excel Video Training: https://support.microsoft.com/en-us/office/excel-video-training-9bc05390-e94c-46af-a5b3-d7c22f6990bb
Module 4: Become a Fair and Impactful Data Professional
Decoding the job description
The data analyst role is one of many job titles that contain the word “analyst.”

To name a few others that sound similar but may not be the same role:
• Business analyst: analyzes data to help businesses improve processes, products, or services
• Data analytics consultant: analyzes the systems and models for using data
• Data engineer: prepares and integrates data from different sources for analytical use
• Data scientist: uses expert skills in technology and social science to find trends through data analysis
• Data specialist: organizes or converts data for use in databases or software systems
• Operations analyst: analyzes data to assess the performance of business operations and workflows

Data analysts, data scientists, and data specialists sound very similar but focus on different
tasks. As you start to browse job listings online, you might notice that companies’ job
descriptions seem to combine these roles or look for candidates who may have overlapping
skills. The fact that companies often blur the lines between them means that you should
take special care when reading the job descriptions and the skills required.

The table below illustrates some of the overlap and distinctions between them:

Job specializations by industry


We learned that the data specialist role concentrates on in-depth knowledge of databases.
In similar fashion, other specialist roles for data analysts can focus on in-depth knowledge
of specific industries. For example, in a job as a business analyst you might wear some
different hats than in a more general position as a data analyst. As a business analyst, you
would likely collaborate with managers, share your data findings, and maybe explain how a
small change in the company’s project management system could save the company 3%
each quarter. Although you would still be working with data all the time, you would focus on
using the data to improve business operations, efficiencies, or the bottom line.

Other industry-specific specialist positions that you might come across in your data analyst
job search include:
• Marketing analyst: analyzes market conditions to assess the potential sales of products and services
• HR/payroll analyst: analyzes payroll data for inefficiencies and errors
• Financial analyst: analyzes financial status by collecting, monitoring, and reviewing data
• Risk analyst: analyzes financial documents, economic conditions, and client data to help companies determine the level of risk involved in making a particular business decision
• Healthcare analyst: analyzes medical data to improve the business aspect of hospitals and medical facilities

COURSE 2: ASK QUESTIONS TO MAKE DATA-DRIVEN DECISIONS


- What you will learn:
• How data analysts solve problems with data
• The use of analytics for making data-driven decisions
• Spreadsheet formulas and functions
• Dashboard basics, including an introduction to Tableau
• Data reporting basics
- Skill sets you will build:
• Asking SMART and effective questions
• Structuring how you think
• Summarizing data
• Putting things into context
• Managing team and stakeholder expectations
• Problem-solving and conflict-resolution

Module 1: Ask Effective Questions

Glossary terms (terms and definitions for Course 2, Module 1)

Action-oriented question: A question whose answers lead to change


Cloud: A place to keep data online, rather than on a computer hard drive
Data analysis process: The six phases of ask, prepare, process, analyze, share, and act
whose purpose is to gain insights that drive informed decision-making
Data life cycle: The sequence of stages that data experiences, which include plan,
capture, manage, analyze, archive, and destroy
Leading question: A question that steers people toward a certain response
Measurable question: A question whose answers can be quantified and assessed
Problem types: The various problems that data analysts encounter, including categorizing
things, discovering connections, finding patterns, identifying themes, making predictions,
and spotting something unusual
Relevant question: A question that has significance to the problem to be solved
SMART methodology: A tool for determining a question’s effectiveness based on whether
it is specific, measurable, action-oriented, relevant, and time-bound
Specific question: A question that is simple, significant, and focused on a single topic or a
few closely related ideas
Structured thinking: The process of recognizing the current problem or situation,
organizing available information, revealing gaps and opportunities, and identifying options
Time-bound question: A question that specifies a timeframe to be studied
Unfair question: A question that makes assumptions or is difficult to answer honestly
Module 2: Make Data-driven Decisions

Module 3: Spreadsheet Magic


Glossary terms (terms and definitions for Course 2, Module 3)

AVERAGE: A spreadsheet function that returns an average of the values from a selected
range
Borders: Lines that can be added around two or more cells on a spreadsheet
Cell reference: A cell or a range of cells in a worksheet typically used in formulas and
functions
COUNT: A spreadsheet function that counts the number of cells in a range that contain
numeric values (COUNTIF counts the cells that meet a specific criterion)
Equation: A calculation that involves addition, subtraction, multiplication, or division (also
called a math expression)
Fill handle: A box in the lower-right-hand corner of a selected spreadsheet cell that can be
dragged through neighboring cells in order to continue an instruction
Filtering: The process of showing only the data that meets specified criteria while hiding
the rest
Header: The first row in a spreadsheet that labels the type of data in each column
Math expression: A calculation that involves addition, subtraction, multiplication, or
division (also called an equation)
Math function: A function that is used as part of a mathematical formula
MAX: A spreadsheet function that returns the largest numeric value from a range of cells
MIN: A spreadsheet function that returns the smallest numeric value from a range of cells
Open data: Data that is available to the public
Operator: A symbol that names the operation or calculation to be performed
Order of operations: Using parentheses to group together spreadsheet values in order to
clarify the order in which operations should be performed
Problem domain: The area of analysis that encompasses every activity affecting or
affected by a problem
Range: A collection of two or more cells in a spreadsheet
Report: A static collection of data periodically given to stakeholders
Return on investment (ROI): A formula that uses the metrics of investment and profit to
evaluate the success of an investment
Revenue: The total amount of income generated by the sale of goods or services
Scope of work (SOW): An agreed-upon outline of the tasks to be performed during a
project
Sorting: The process of arranging data into a meaningful order to make it easier to
understand, analyze, and visualize
SUM: A spreadsheet function that adds the values of a selected range of cells
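Several of the spreadsheet functions defined in this glossary have direct counterparts in most programming languages. The following is a rough sketch in Python, assuming a made-up list of revenue figures standing in for a range such as B2:B6. The roi helper is hypothetical and uses the common formulation of ROI as net profit divided by the cost of the investment.

```python
from statistics import mean

# Made-up monthly revenue figures standing in for a spreadsheet range such as B2:B6.
revenue = [1200.0, 950.0, 1430.0, 1100.0, 1320.0]

total = sum(revenue)       # like =SUM(B2:B6)
average = mean(revenue)    # like =AVERAGE(B2:B6)
largest = max(revenue)     # like =MAX(B2:B6)
smallest = min(revenue)    # like =MIN(B2:B6)
count = len(revenue)       # like =COUNT(B2:B6) when every cell is numeric

# Sorting arranges the values in a meaningful order (largest first here);
# filtering keeps only the values that meet a condition.
sorted_desc = sorted(revenue, reverse=True)
above_1150 = [value for value in revenue if value > 1150]


def roi(net_profit: float, investment: float) -> float:
    """Return on investment as a fraction: net profit / cost of investment."""
    return net_profit / investment


print(total, average, largest, smallest, count)
print(roi(net_profit=1500.0, investment=6000.0))  # 0.25, i.e. a 25% return
```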
Module 4: Always Remember the Stakeholders
COURSE 3: PREPARE DATA FOR EXPLORATION
- What you will learn:
• How data is generated
• Features of different data types, fields, and values
• Database structures
• The function of metadata in data analytics
• Structured Query Language (SQL) functions
- Skill sets you will build:
• Ensuring ethical data analysis practices
• Addressing issues of bias and credibility
• Accessing databases and importing data
• Writing simple queries
• Organizing and protecting data
• Connecting with the data community (optional)

Module 1: Data Types and Structures

Module 2: Data Responsibility


Module 3: Database Essentials

Module 4: Organize and Protect Data

Module 5: Engage in the Data Community


COURSE 4: PROCESS DATA FROM DIRTY TO CLEAN
- What you will learn:
• Data integrity and the importance of clean data
• The tools and processes used by data analysts to clean data
• Data-cleaning verification and reports
• Statistics, hypothesis testing, and margin of error
• Resume building and interpretation of job postings (optional)
- Skill sets you will build:
• Connecting business objectives to data analysis
• Identifying clean and dirty data
• Cleaning small datasets using spreadsheet tools
• Cleaning large datasets by writing SQL queries
• Documenting data-cleaning processes
COURSE 5: ANALYZE DATA TO ANSWER QUESTIONS
- What you will learn:
• Steps data analysts take to organize data
• How to combine data from multiple sources
• Spreadsheet calculations and pivot tables
• SQL calculations
• Temporary tables
• Data validation
- Skill sets you will build:
• Sorting data in spreadsheets and by writing SQL queries
• Filtering data in spreadsheets and by writing SQL queries
• Converting data
• Formatting data
• Substantiating data analysis processes
• Seeking feedback and support from others during data analysis

Objective

The objective of this query is to aggregate the data into a table containing each
warehouse's ID, state and alias, and number of orders; as well as the grand total of orders
for all warehouses combined; and finally a column that classifies each warehouse by the
percentage of grand total orders that it fulfilled: 0–20%, 21–60%, or more than 60%.

Note: This activity breaks out the steps into manageable chunks. The final query is only
intended to be run at the end. If you try to run the query before reaching the end of this
guide you will likely get an error.

Example: Combine and alias the tables

As a refresher, aliasing is when you temporarily name a table or column in your query to
make it easier to read and write. To alias the warehouse and orders tables and join the
tables, follow these steps. Remember, these statements require that you enter your unique
individual project name or else they won't run. Be sure to substitute your project name in
the code wherever you encounter your-project written. If you haven't explicitly assigned a
project name, BigQuery generates one for you automatically. It typically looks like two
words and a number, each separated by a hyphen, for example august-west-100777.

Begin with the FROM statement a few rows down. Later, you'll return to the top of the
query to fill it in.

1. In row 3, enter FROM your-project.warehouse_orders.warehouse AS Warehouse
2. In row 4, enter LEFT JOIN your-project.warehouse_orders.orders AS Orders
3. In row 5, enter ON Orders.warehouse_id = Warehouse.warehouse_id
These statements will combine the two tables (warehouse and orders) using
warehouse_id as the common key (the column shared by both tables).
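To see what the aliased LEFT JOIN returns, here is a minimal sketch that runs the same join shape against an in-memory SQLite database from Python. The table contents are made up, and the your-project.warehouse_orders prefix is dropped because SQLite has no projects or datasets; everything else mirrors the three statements above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data: warehouse 3 exists but has no orders yet.
conn.executescript("""
CREATE TABLE warehouse (warehouse_id INTEGER, state TEXT, warehouse_alias TEXT);
CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER);
INSERT INTO warehouse VALUES (1, 'CA', 'North'), (2, 'TX', 'South'), (3, 'NY', 'East');
INSERT INTO orders VALUES (101, 1), (102, 1), (103, 2);
""")

# The aliases Warehouse and Orders are temporary names that exist only in this query.
join_query = """
SELECT Warehouse.warehouse_id, Warehouse.state, Orders.order_id
FROM warehouse AS Warehouse
LEFT JOIN orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
ORDER BY Warehouse.warehouse_id, Orders.order_id
"""
joined = conn.execute(join_query).fetchall()
for row in joined:
    print(row)
```

Because this is a LEFT JOIN, warehouse 3 still appears in the result, with None (SQL NULL) in the order_id column. An INNER JOIN would drop it.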

Example: Organize your new table


Use the GROUP BY clause in SQL to group rows that have the same values in the specified
columns, so that aggregate functions such as SUM, COUNT, AVG, MAX, or MIN return one
result per group instead of one result for the whole table. This is particularly useful
when you need to analyze data by category.

1. In row 6, enter GROUP BY
2. In row 7, enter Warehouse.warehouse_id,
3. In row 8, enter warehouse_name
Here, the combined table is grouped first by the warehouse ID and then by its name.
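The effect of the grouping can be sketched on made-up data with an in-memory SQLite database. Two assumptions differ from the BigQuery version: the project and dataset prefix is dropped, and the || operator replaces CONCAT, which older SQLite versions do not provide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data: warehouse 3 exists but has no orders yet.
conn.executescript("""
CREATE TABLE warehouse (warehouse_id INTEGER, state TEXT, warehouse_alias TEXT);
CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER);
INSERT INTO warehouse VALUES (1, 'CA', 'North'), (2, 'TX', 'South'), (3, 'NY', 'East');
INSERT INTO orders VALUES (101, 1), (102, 1), (103, 2);
""")

# One output row per (warehouse_id, warehouse_name) group;
# COUNT(Orders.order_id) counts only non-NULL order IDs within each group.
group_query = """
SELECT
  Warehouse.warehouse_id,
  Warehouse.state || ': ' || Warehouse.warehouse_alias AS warehouse_name,
  COUNT(Orders.order_id) AS number_of_orders
FROM warehouse AS Warehouse
LEFT JOIN orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY
  Warehouse.warehouse_id,
  warehouse_name
ORDER BY Warehouse.warehouse_id
"""
grouped = conn.execute(group_query).fetchall()
print(grouped)
```

The joined rows collapse to one row per warehouse, and the warehouse without orders gets a count of 0 rather than disappearing.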

Example: Build subquery logic

Now that you have the FROM statement and JOIN, go back up to the first lines and define
the columns to select and the operations to perform on them. From the objective, you know
you want to return five columns: each warehouse's ID (warehouse_id, column 1), state and
alias (combined into a single column: warehouse_name, column 2), and number of orders
(number_of_orders, column 3); as well as the grand total of orders for all warehouses
combined (total_orders, column 4); and finally a column that classifies each warehouse by
the percentage of grand total orders that it fulfilled: 0–20%, 21–60%, or more than 60%
(fulfillment_summary, column 5).

Above everything you've written so far, write:

1. In row 1, enter SELECT
2. In row 2, enter Warehouse.warehouse_id, # (This is the first column.)
3. In row 3, enter CONCAT(Warehouse.state, ': ', Warehouse.warehouse_alias) AS warehouse_name, # (This is the second column. Notice you're concatenating two existing columns into a new one.)
4. In row 4, enter COUNT(Orders.order_id) AS number_of_orders, # (This is the third column.)
5. In row 5, enter (SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) AS total_orders, # (This is the fourth column.)
To create the final column, you'll need to use a special keyword.
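The subquery in the fourth column is a scalar subquery: it returns a single value (the grand total) that is repeated on every output row. A minimal sketch on made-up data, using an in-memory SQLite database and omitting the BigQuery project and dataset prefix:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data: three orders in total, none for warehouse 3.
conn.executescript("""
CREATE TABLE warehouse (warehouse_id INTEGER, state TEXT, warehouse_alias TEXT);
CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER);
INSERT INTO warehouse VALUES (1, 'CA', 'North'), (2, 'TX', 'South'), (3, 'NY', 'East');
INSERT INTO orders VALUES (101, 1), (102, 1), (103, 2);
""")

# The inner SELECT is evaluated against the whole orders table,
# independently of the outer GROUP BY, so it yields the same total on each row.
subquery_sql = """
SELECT
  Warehouse.warehouse_id,
  COUNT(Orders.order_id) AS number_of_orders,
  (SELECT COUNT(*) FROM orders) AS total_orders
FROM warehouse AS Warehouse
LEFT JOIN orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY Warehouse.warehouse_id
ORDER BY Warehouse.warehouse_id
"""
totals = conn.execute(subquery_sql).fetchall()
print(totals)  # [(1, 2, 3), (2, 1, 3), (3, 0, 3)]
```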

Example: Create categories using CASE

Use the CASE keyword in SQL to create categories or group data based on specific
conditions. This is valuable when dealing with numerical or textual data that needs to be
segmented into different groups or categories for analysis, reporting, or visualization
purposes.

For the final column, you'll use CASE to define which label to apply to each warehouse's
fulfillment percentage (the percentage of the grand total of orders that it fulfilled). There
will be three conditions, and thus three possible labels: "Fulfilled 0–20% of Orders",
"Fulfilled 21–60% of Orders", or "Fulfilled more than 60% of Orders".

1. In row 6, enter CASE
2. In row 7, enter WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.20 # (This defines the first possible condition.)
3. In row 8, enter THEN 'Fulfilled 0-20% of Orders' # (THEN defines the label to apply when the first condition is true.)
4. In row 9, enter WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) > 0.20 # (This is the first part of the second condition.)
5. In row 10, enter AND COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.60 # (This is the second part of the second condition.)
6. In row 11, enter THEN 'Fulfilled 21-60% of Orders' # (This defines the label to apply when the second condition is true.)
7. In row 12, enter ELSE 'Fulfilled more than 60% of Orders' # (This defines the label to apply when neither of the first two conditions is true.)
8. In row 13, enter END AS fulfillment_summary # (The END keyword terminates the CASE declaration. Then the AS keyword indicates what the resulting column should be named.)
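The CASE logic can be tried out on made-up data with an in-memory SQLite database. Two assumptions to note: the counts below are invented so that each label appears once, and the count is multiplied by 1.0 because SQLite truncates integer division, whereas the / operator in BigQuery returns a floating-point result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER)")

# Made-up data: 2, 5, and 13 orders, i.e. 10%, 25%, and 65% of 20 orders in total.
order_counts = {1: 2, 2: 5, 3: 13}
order_rows = []
for warehouse_id, n_orders in order_counts.items():
    for _ in range(n_orders):
        order_rows.append((len(order_rows) + 1, warehouse_id))
conn.executemany("INSERT INTO orders VALUES (?, ?)", order_rows)

# CASE checks its WHEN branches from top to bottom and stops at the first match.
case_query = """
SELECT
  warehouse_id,
  CASE
    WHEN COUNT(order_id) * 1.0 / (SELECT COUNT(*) FROM orders) <= 0.20
      THEN 'Fulfilled 0-20% of Orders'
    WHEN COUNT(order_id) * 1.0 / (SELECT COUNT(*) FROM orders) <= 0.60
      THEN 'Fulfilled 21-60% of Orders'
    ELSE 'Fulfilled more than 60% of Orders'
  END AS fulfillment_summary
FROM orders
GROUP BY warehouse_id
ORDER BY warehouse_id
"""
labels = conn.execute(case_query).fetchall()
for row in labels:
    print(row)
```

Because the branches are evaluated in order, the second WHEN in this sketch omits the explicit "> 0.20" test: any row that reaches it has already failed the first condition. Spelling out both bounds, as the steps above do, gives the same result and can be easier to read.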
Example: Filter using HAVING
Use the HAVING clause in SQL in combination with the GROUP BY clause to filter the
results of aggregate functions in a query. While the WHERE clause filters individual rows
before they are grouped, the HAVING clause filters groups of rows after they have been
grouped. To filter out the warehouses that are currently being built (and therefore have no
orders), enter the following lines below everything you've written so far:

1. In row 20, enter HAVING
2. In row 21, enter COUNT(Orders.order_id) > 0
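A small sketch of the difference HAVING makes, again on made-up data in an in-memory SQLite database and without the BigQuery project prefix. The same grouped query is run with and without the HAVING clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data: warehouse 3 is still being built and has no orders.
conn.executescript("""
CREATE TABLE warehouse (warehouse_id INTEGER, state TEXT, warehouse_alias TEXT);
CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER);
INSERT INTO warehouse VALUES (1, 'CA', 'North'), (2, 'TX', 'South'), (3, 'NY', 'East');
INSERT INTO orders VALUES (101, 1), (102, 1), (103, 2);
""")

base_query = """
SELECT Warehouse.warehouse_id, COUNT(Orders.order_id) AS number_of_orders
FROM warehouse AS Warehouse
LEFT JOIN orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY Warehouse.warehouse_id
"""
order_clause = "ORDER BY Warehouse.warehouse_id"

# Without HAVING: every group is returned, including the empty one.
all_groups = conn.execute(base_query + order_clause).fetchall()

# With HAVING: the filter is applied to each group after aggregation.
having_clause = "HAVING COUNT(Orders.order_id) > 0\n"
with_orders = conn.execute(base_query + having_clause + order_clause).fetchall()

print(all_groups)   # [(1, 2), (2, 1), (3, 0)]
print(with_orders)  # [(1, 2), (2, 1)]
```

A WHERE clause could not express this filter, since the order count does not exist until the rows have been grouped.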
Here is the final query:

SELECT
  Warehouse.warehouse_id,
  CONCAT(Warehouse.state, ': ', Warehouse.warehouse_alias) AS warehouse_name,
  COUNT(Orders.order_id) AS number_of_orders,
  (SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) AS total_orders,
  CASE
    WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.20
      THEN 'Fulfilled 0-20% of Orders'
    WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) > 0.20
      AND COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.60
      THEN 'Fulfilled 21-60% of Orders'
    ELSE 'Fulfilled more than 60% of Orders'
  END AS fulfillment_summary
FROM your-project.warehouse_orders.warehouse AS Warehouse
LEFT JOIN your-project.warehouse_orders.orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY
  Warehouse.warehouse_id,
  warehouse_name
HAVING
  COUNT(Orders.order_id) > 0
COURSE 6: SHARE DATA THROUGH THE ART OF VISUALIZATION
- What you will learn:
• Design thinking
• How data analysts use visualizations to communicate about data
• The benefits of Tableau for presenting data analysis findings
• Data-driven storytelling
• Dashboards and dashboard filters
• Strategies for creating an effective data presentation
- Skill sets you will build:
• Creating visualizations and dashboards in Tableau
• Addressing accessibility issues when communicating about data
• Understanding the purpose of different business communication tools
• Telling a data-driven story
• Presenting to others about data
• Answering questions about data
COURSE 7: DATA ANALYSIS WITH R PROGRAMMING
- What you will learn:
• Programming languages and environments
• R packages
• R functions, variables, data types, pipes, and vectors
• R data frames
• Bias and credibility in R
• R visualization tools
• R Markdown for documentation, creating structure, and emphasis
- Skill sets you will build:
• Coding in R
• Writing functions in R
• Accessing data in R
• Cleaning data in R
• Generating data visualizations in R
• Reporting on data analysis to stakeholders
COURSE 8: DATA ANALYTICS CAPSTONE
- What you will learn:
• How a data analytics portfolio distinguishes you from other candidates
• Practical, real-world problem-solving
• Strategies for extracting insights from data
• Clear presentation of data findings
• Motivation and ability to take initiative
- Skill sets you will build:
• Building a portfolio
• Increasing your employability
• Showcasing your data analytics knowledge, skill, and technical expertise
• Sharing your work during an interview
• Communicating your unique value proposition to a potential employer
