Note GG Data Analytics Course
(8 courses)
Data Analysis process:
1. Ask: business challenge, objective, or question
2. Prepare: data generation, collection, storage, and data management
3. Process: data cleaning and data integrity
4. Analyze: data exploration, visualization, and analysis
5. Share: communicating and interpreting results
6. Act: putting insights to work to solve the problem
Spreadsheets
Data analysts rely on spreadsheets to collect and organize data. Two popular spreadsheet
applications you will probably use a lot in your future role as a data analyst are Microsoft
Excel and Google Sheets.
Spreadsheets structure data in a meaningful way by letting you:
- Collect, store, organize, and sort information
- Identify patterns and piece the data together in a way that works for each specific data project
- Create excellent data visualizations, like graphs and charts
Databases and query languages
A database is a collection of structured data stored in a computer system. Some popular
Structured Query Language (SQL) programs include MySQL, Microsoft SQL Server, and
BigQuery.
Query languages:
- Allow analysts to isolate specific information from one or more databases
- Make it easier for you to learn and understand the requests made to databases
- Allow analysts to select, create, add, or download data from a database for analysis
Visualization tools
Data analysts use a number of visualization tools, like graphs, maps, tables, charts, and
more. Two popular visualization tools are Tableau and Looker.
These tools:
- Turn complex numbers into a story that people can understand
- Help stakeholders come up with conclusions that lead to informed decisions and effective business strategies
- Have multiple features
  - Tableau's simple drag-and-drop feature lets users create interactive graphs in dashboards and worksheets
  - Looker communicates directly with a database, allowing you to connect your data right to the visual tool you choose
Module 3: Set Up Your Data Analytics Toolbox
A few other job titles sound similar to data analyst but may not be the same role:
Business analyst—analyzes data to help businesses improve processes, products, or
services
Data analytics consultant—analyzes the systems and models for using data
Data engineer—prepares and integrates data from different sources for analytical use
Data scientist—uses expert skills in technology and social science to find trends
through data analysis
Data specialist—organizes or converts data for use in databases or software systems
Operations analyst—analyzes data to assess the performance of business operations
and workflows
Data analysts, data scientists, and data specialists sound very similar but focus on different
tasks. As you start to browse job listings online, you might notice that companies’ job
descriptions seem to combine these roles or look for candidates who may have overlapping
skills. The fact that companies often blur the lines between them means that you should
take special care when reading the job descriptions and the skills required.
The table below illustrates some of the overlap and distinctions between them:
Other industry-specific specialist positions that you might come across in your data analyst
job search include:
Marketing analyst—analyzes market conditions to assess the potential sales of
products and services
HR/payroll analyst—analyzes payroll data for inefficiencies and errors
Financial analyst—analyzes financial status by collecting, monitoring, and reviewing
data
Risk analyst—analyzes financial documents, economic conditions, and client data to
help companies determine the level of risk involved in making a particular business
decision
Healthcare analyst—analyzes medical data to improve the business aspect of
hospitals and medical facilities
AVERAGE: A spreadsheet function that returns an average of the values from a selected
range
Borders: Lines that can be added around two or more cells on a spreadsheet
Cell reference: A cell or a range of cells in a worksheet typically used in formulas and
functions
COUNT: A spreadsheet function that counts the number of cells in a range that contain
numeric values
Equation: A calculation that involves addition, subtraction, multiplication, or division (also
called a math expression)
Fill handle: A box in the lower-right-hand corner of a selected spreadsheet cell that can be
dragged through neighboring cells in order to continue an instruction
Filtering: The process of showing only the data that meets specified criteria while hiding
the rest
Header: The first row in a spreadsheet that labels the type of data in each column
Math expression: A calculation that involves addition, subtraction, multiplication, or
division (also called an equation)
Math function: A function that is used as part of a mathematical formula
MAX: A spreadsheet function that returns the largest numeric value from a range of cells
MIN: A spreadsheet function that returns the smallest numeric value from a range of cells
Open data: Data that is available to the public
Operator: A symbol that names the operation or calculation to be performed
Order of operations: Using parentheses to group together spreadsheet values in order to
clarify the order in which operations should be performed
Problem domain: The area of analysis that encompasses every activity affecting or
affected by a problem
Range: A collection of two or more cells in a spreadsheet
Report: A static collection of data periodically given to stakeholders
Return on investment (ROI): A formula that uses the metrics of investment and profit to
evaluate the success of an investment
Revenue: The total amount of income generated by the sale of goods or services
Scope of work (SOW): An agreed-upon outline of the tasks to be performed during a
project
Sorting: The process of arranging data into a meaningful order to make it easier to
understand, analyze, and visualize
SUM: A spreadsheet function that adds the values of a selected range of cells
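The aggregate functions in this glossary (SUM, AVERAGE, COUNT, MAX, MIN) can be mirrored in a few lines of Python, which may help when checking a spreadsheet result by hand. This is only a sketch: the cell values are made up for illustration, and COUNT is treated as counting numeric cells, which is how it behaves in Excel and Google Sheets.

```python
from statistics import mean

# Hypothetical cell values, as if read from a range such as A2:A7.
# One cell is blank (None) and one holds text, as often happens in real sheets.
cells = [12, 7.5, None, "n/a", 20, 3]

# Spreadsheet aggregate functions skip blanks and text in a range,
# so keep only the numeric values before aggregating.
numbers = [
    c for c in cells
    if isinstance(c, (int, float)) and not isinstance(c, bool)
]

summary = {
    "SUM": sum(numbers),        # adds the values in the range
    "AVERAGE": mean(numbers),   # arithmetic mean of the numeric values
    "COUNT": len(numbers),      # number of numeric cells
    "MAX": max(numbers),        # largest numeric value
    "MIN": min(numbers),        # smallest numeric value
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Filtering out non-numeric cells first is the design choice that keeps the Python results in line with what a spreadsheet would report for the same range.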
Module 4: Always Remember the Stakeholders
Course 3: Prepare Data for Exploration
- What you will learn:
How data is generated
Features of different data types, fields, and values
Database structures
The function of metadata in data analytics
Structured Query Language (SQL) functions
- Skill sets you will build:
Ensuring ethical data analysis practices
Addressing issues of bias and credibility
Accessing databases and importing data
Writing simple queries
Organizing and protecting data
Connecting with the data community (optional)
Objective
The objective of this query is to aggregate the data into a table containing each
warehouse's ID, state and alias, and number of orders; the grand total of orders for all
warehouses combined; and a column that classifies each warehouse by the percentage of
grand total orders that it fulfilled: 0–20%, 21–60%, or more than 60%.
Note: This activity breaks out the steps into manageable chunks. The final query is only
intended to be run at the end. If you try to run the query before reaching the end of this
guide you will likely get an error.
As a refresher, aliasing is when you temporarily name a table or column in your query to
make it easier to read and write. To alias the warehouse and orders tables and join the
tables, follow these steps. Remember, these statements require that you enter your unique
individual project name or else they won't run. Be sure to substitute your project name in
the code wherever you encounter your-project written. If you haven't explicitly assigned a
project name, BigQuery generates one for you automatically. It typically looks like two
words and a number, each separated by a hyphen, for example august-west-100777.
Begin with the FROM statement a few rows down. Later, you'll return to the top of the
query to fill it in.
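The FROM and JOIN step described above can be tried locally before moving to BigQuery. The sketch below uses Python's built-in sqlite3 module with two tiny in-memory tables; the rows are invented for illustration, and the table names are unqualified because SQLite has no project or dataset prefix. In BigQuery you would write your-project.warehouse_orders.warehouse and your-project.warehouse_orders.orders instead.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not the course dataset).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse (warehouse_id INTEGER, state TEXT, warehouse_alias TEXT);
    CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER);
    INSERT INTO warehouse VALUES
        (1, 'CA', 'West Hub'), (2, 'NY', 'East Hub'), (3, 'TX', 'Idle Depot');
    INSERT INTO orders VALUES (101, 1), (102, 1), (103, 2);
""")

# Alias both tables, then LEFT JOIN so every warehouse is kept,
# even one that has no matching orders.
joined_rows = conn.execute("""
    SELECT Warehouse.warehouse_id, Warehouse.state, Orders.order_id
    FROM warehouse AS Warehouse
    LEFT JOIN orders AS Orders
        ON Orders.warehouse_id = Warehouse.warehouse_id
    ORDER BY Warehouse.warehouse_id, Orders.order_id
""").fetchall()

for row in joined_rows:
    print(row)
```

The warehouse with no orders still appears once, with an empty order_id, which is the behavior that distinguishes a LEFT JOIN from an INNER JOIN.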
Now that you have the FROM statement and JOIN, go back up to the first lines and define
the columns to select and the operations to perform on them. From the objective, you know
you want to return five columns:
1. warehouse_id: each warehouse's ID
2. warehouse_name: the state and alias, combined into a single column
3. number_of_orders: the number of orders for that warehouse
4. total_orders: the grand total of orders for all warehouses combined
5. fulfillment_summary: a label that classifies each warehouse by the percentage of grand
total orders that it fulfilled: 0–20%, 21–60%, or more than 60%
Use the CASE keyword in SQL to create categories or group data based on specific
conditions. This is valuable when dealing with numerical or textual data that needs to be
segmented into different groups or categories for analysis, reporting, or visualization
purposes.
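As a small illustration of how CASE assigns labels, the sketch below runs a CASE expression against an in-memory SQLite table from Python. The shipments table, its rows, and the size thresholds are all hypothetical; only the CASE ... WHEN ... THEN ... ELSE ... END structure carries over to BigQuery.

```python
import sqlite3

# Hypothetical table used only to demonstrate CASE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipments (shipment_id INTEGER, quantity INTEGER);
    INSERT INTO shipments VALUES (1, 4), (2, 25), (3, 180);
""")

# WHEN branches are checked from top to bottom and the first match wins,
# so the second branch only sees quantities of 10 or more.
labeled = conn.execute("""
    SELECT
        shipment_id,
        CASE
            WHEN quantity < 10 THEN 'small'
            WHEN quantity < 100 THEN 'medium'
            ELSE 'large'
        END AS size_label
    FROM shipments
    ORDER BY shipment_id
""").fetchall()

for row in labeled:
    print(row)
```

Because the branches are evaluated in order, each later condition can leave out the lower bound that an earlier branch has already ruled out.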
For the final column, you'll use CASE to define which label to apply to each warehouse's
fulfillment percentage (the percentage of the grand total of orders that it fulfilled). There
will be three conditions, and thus three possible labels: "Fulfilled 0–20% of Orders",
"Fulfilled 21–60% of Orders", or "Fulfilled more than 60% of Orders".
SELECT
  Warehouse.warehouse_id,
  CONCAT(Warehouse.state, ': ', Warehouse.warehouse_alias) AS warehouse_name,
  COUNT(Orders.order_id) AS number_of_orders,
  -- Grand total of orders across all warehouses
  (SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) AS total_orders,
  -- Label each warehouse by its share of the grand total
  CASE
    WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.20
      THEN 'Fulfilled 0-20% of Orders'
    WHEN COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) > 0.20
      AND COUNT(Orders.order_id)/(SELECT COUNT(*) FROM your-project.warehouse_orders.orders AS Orders) <= 0.60
      THEN 'Fulfilled 21-60% of Orders'
    ELSE 'Fulfilled more than 60% of Orders'
  END AS fulfillment_summary
FROM your-project.warehouse_orders.warehouse AS Warehouse
LEFT JOIN your-project.warehouse_orders.orders AS Orders
  ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY
  Warehouse.warehouse_id,
  warehouse_name
-- Keep only warehouses that fulfilled at least one order
HAVING
  COUNT(Orders.order_id) > 0
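To check the logic of the final query without a BigQuery project, it can be adapted to SQLite and run from Python. This is a sketch under several assumptions, not the course's own code: the sample rows are invented, table names are unqualified, the || operator stands in for CONCAT, and the count is multiplied by 1.0 because SQLite truncates integer division (BigQuery's / operator returns a floating-point result, so the original does not need this). The second WHEN also omits the "> 0.20" test, since CASE only reaches it when the first branch did not match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse (warehouse_id INTEGER, state TEXT, warehouse_alias TEXT);
    CREATE TABLE orders (order_id INTEGER, warehouse_id INTEGER);
    INSERT INTO warehouse VALUES
        (1, 'CA', 'West Hub'), (2, 'NY', 'East Hub'), (3, 'TX', 'Idle Depot');
""")

# Hypothetical data: 10 orders in total, 7 for warehouse 1, 3 for warehouse 2,
# and none for warehouse 3.
sample_orders = [(i, 1) for i in range(1, 8)] + [(i, 2) for i in range(8, 11)]
conn.executemany("INSERT INTO orders VALUES (?, ?)", sample_orders)

QUERY = """
SELECT
    Warehouse.warehouse_id,
    Warehouse.state || ': ' || Warehouse.warehouse_alias AS warehouse_name,
    COUNT(Orders.order_id) AS number_of_orders,
    (SELECT COUNT(*) FROM orders) AS total_orders,
    CASE
        WHEN COUNT(Orders.order_id) * 1.0 / (SELECT COUNT(*) FROM orders) <= 0.20
            THEN 'Fulfilled 0-20% of Orders'
        WHEN COUNT(Orders.order_id) * 1.0 / (SELECT COUNT(*) FROM orders) <= 0.60
            THEN 'Fulfilled 21-60% of Orders'
        ELSE 'Fulfilled more than 60% of Orders'
    END AS fulfillment_summary
FROM warehouse AS Warehouse
LEFT JOIN orders AS Orders
    ON Orders.warehouse_id = Warehouse.warehouse_id
GROUP BY Warehouse.warehouse_id, warehouse_name
HAVING COUNT(Orders.order_id) > 0
ORDER BY Warehouse.warehouse_id
"""

results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
```

With this sample data, the warehouse that has no orders is removed by the HAVING clause, and the remaining two land in different fulfillment categories, which exercises the grouping, the grand-total subquery, and the CASE labels together.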
Course 6: Share Data Through the Art of Visualization
- What you will learn:
Design thinking
How data analysts use visualizations to communicate about data
The benefits of Tableau for presenting data analysis findings
Data-driven storytelling
Dashboards and dashboard filters
Strategies for creating an effective data presentation
- Skill sets you will build:
Creating visualizations and dashboards in Tableau
Addressing accessibility issues when communicating about data
Understanding the purpose of different business communication tools
Telling a data-driven story
Presenting to others about data
Answering questions about data
Course 7: Data Analysis with R Programming
- What you will learn:
Steps data analysts take to organize data
How to combine data from multiple sources
Spreadsheet calculations and pivot tables
SQL calculations
Temporary tables
Data validation
- Skill sets you will build:
Sorting data in spreadsheets and by writing SQL queries
Filtering data in spreadsheets and by writing SQL queries
Converting data
Formatting data
Substantiating data analysis processes
Seeking feedback and support from others during data analysis
Course 8: Data Analytics Capstone
- What you will learn:
How a data analytics portfolio distinguishes you from other candidates
Practical, real-world problem-solving
Strategies for extracting insights from data
Clear presentation of data findings
Motivation and ability to take initiative
- Skill sets you will build:
Building a portfolio
Increasing your employability
Showcasing your data analytics knowledge, skill, and technical expertise
Sharing your work during an interview
Communicating your unique value proposition to a potential employer