
Da 001

The Google Data Analytics Certificate outlines a six-phase data analysis process: Ask, Prepare, Process, Analyze, Share, and Act, which is essential for addressing business challenges. It emphasizes the importance of analytical skills, structured thinking, and the integration of data-driven decision-making with gut instinct. Understanding the data life cycle and employing appropriate statistical methods are crucial for effective data analysis and achieving business objectives.

Uploaded by

olly
Copyright
© All Rights Reserved

The process presented as part of the Google Data Analytics Certificate is one that will be valuable to you as you keep moving forward in your career:

• Ask: business challenge, objective, or question
• Prepare: data generation, collection, storage, and data management
• Process: data cleaning and data integrity
• Analyze: data exploration, visualization, and analysis
• Share: communicating and interpreting results
• Act: putting insights to work to solve the problem

The six phases of the data analysis process help answer business challenges, such as understanding how to
improve a retirement program. Additionally, iterating on and reviewing your work throughout the data analysis
process is critical for obtaining quality results.

Now, data science, the discipline of making data useful, is an umbrella term that encompasses three disciplines:
machine learning, statistics, and analytics.

It’s important to include insights from subject-matter experts because they are familiar with the business problem
and can review analysis results and help identify inconsistencies. Plus, their experience and human intuition are
valuable to data-driven decision-making.

Gut instinct is an intuitive understanding of something with little or no explanation. This isn’t always something
conscious; we often pick up on signals without even realizing. You just have a “feeling” it’s right.

In addition, try asking yourself these questions about a project to help find the perfect balance:
• What kind of results are needed?
• Who will be informed?
• Am I answering the question being asked?
• How quickly does a decision need to be made?

Data analysts and detectives share a similar approach to problem-solving, both relying on evidence and facts to
make decisions. Data-driven decision-making is essential for analysts, but gut instinct can also play a role in
identifying patterns and connections. Balancing data and gut instinct is crucial for making informed decisions, and
the right mix depends on the project's goals and time constraints.

Analytical skills are qualities and characteristics associated with solving problems using facts. There are a lot of
aspects to analytical skills, but we'll focus on five essential points: curiosity, understanding context, having a
technical mindset, data design, and data strategy.

Understanding context
The analytical skill that has to do with how you group things into categories

A technical mindset is the analytical skill that involves breaking processes down into smaller steps or pieces and
working with them in an orderly, logical way.

Data design is the analytical skill that involves how you organize information. For a data analyst, design typically
has to do with an actual database.

Data strategy is the management of the people, processes, and tools used in data analysis.

Exploratory data analysis (EDA):

• Identify key factors that contribute to a movie's opening weekend success.
• Understand the relationship between a movie's budget and its revenue.
• Determine which genres are most successful.
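
The EDA goals above can be sketched in a few lines of Python. This is only an illustration: the movie rows, the column order, and the helper name `pearson` are hypothetical, and a real analysis would use a much larger dataset. It measures how strongly budget and opening revenue move together and compares average opening revenue by genre.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sample: (title, genre, budget in $M, opening-weekend revenue in $M).
movies = [
    ("Movie A", "Action", 150, 90),
    ("Movie B", "Action", 100, 55),
    ("Movie C", "Comedy", 40, 25),
    ("Movie D", "Comedy", 30, 12),
    ("Movie E", "Drama", 20, 8),
    ("Movie F", "Drama", 25, 15),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Relationship between budget and revenue.
budgets = [m[2] for m in movies]
revenues = [m[3] for m in movies]
budget_revenue_r = pearson(budgets, revenues)

# Average opening revenue per genre, and the best-performing genre in this sample.
revenue_by_genre = defaultdict(list)
for _, genre, _, revenue in movies:
    revenue_by_genre[genre].append(revenue)
mean_revenue_by_genre = {g: sum(v) / len(v) for g, v in revenue_by_genre.items()}
top_genre = max(mean_revenue_by_genre, key=mean_revenue_by_genre.get)

print(f"budget vs. revenue correlation: {budget_revenue_r:.2f}")
print(f"mean opening revenue by genre: {mean_revenue_by_genre}")
print(f"top genre in this sample: {top_genre}")
```

A high correlation here only says that budget and revenue rise together in this sample; it does not show that a bigger budget causes a bigger opening.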

Analysts determine context by looking for patterns or anomalies in a dataset.


Having a technical mindset means approaching problems (and datasets) in a systematic and logical manner.

Your inherent analytical skills are essential for conducting data analysis and will be even more critical when you
combine them with the tools and techniques from this program. Understanding how to use these skills in
business scenarios is the first step toward developing them further and using them effectively in your career.

Analytical skills
The qualities and characteristics associated with solving problems using facts

Analytical thinking involves identifying and defining a problem and then solving it by using data in an organized,
step-by-step manner.

The five key aspects of analytical thinking are visualization, strategy, problem-orientation, correlation, and
big-picture and detail-oriented thinking.

To execute a plan using detail-oriented thinking, a data analyst considers the specifics.

The five whys is a powerful tool for root cause analysis. It’s simple, effective, and a great way to collaborate with
colleagues and learn about other areas of the business. Plus, the five whys can be used to analyze problems in
any industry, helping organizations of all kinds identify and fix business problems. As a data professional, you can
turn to the five whys whenever you feel stumped by a problem and need to approach it from a different
perspective.

Understanding the importance of the data life cycle will set you up for success as a data analyst. Individual
stages in the data life cycle will vary from company to company or by industry or sector. Historical data is
important to both the U.S. Fish and Wildlife Service and the USGS, so their data life cycle focuses on archiving
and backing up data. Harvard's interests are in research and teaching, so its data life cycle includes visualization
and interpretation even though these are more often associated with a data analysis life cycle. The Harvard
Business School (HBS) data life cycle also doesn't call out a stage for purging or destroying data. In contrast, the data life cycle for finance clearly
identifies archive and purge stages. To sum it up, although data life cycles vary, one data management principle
is universal: Govern how data is handled so that it is accurate, secure, and available to meet your organization's
needs.

While the data analysis process will drive your projects and help you reach your business goals, you must
understand the life cycle of your data in order to use that process. To analyze your data well, you need to have a
thorough understanding of it. Similarly, you can collect all the data you want, but the data is only useful to you if
you have a plan for analyzing it.

The Plan and Ask phases both involve planning and asking questions, but they tackle different subjects. The Ask
phase in the data analysis process focuses on big-picture strategic thinking about business goals. The Plan phase
in the data life cycle, by contrast, focuses on the fundamentals of the project, such as what data you have access
to, what data you need, and where you’re going to get it.

A formula is a set of instructions that performs a specific calculation, whereas a function is a preset command
that automatically performs a process or task, which makes the work more efficient.

In a table, an attribute is a characteristic or quality of data used to label a column.

SQL

• Use SELECT to choose the columns you want to return. SELECT specifies the columns from which to retrieve data.
• Use FROM to choose the tables where the columns you want are located. FROM specifies the table from which to retrieve data.
• Use WHERE to filter for certain information. WHERE specifies criteria that the data must meet.

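
The three clauses above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not a prescribed workflow: the `customers` table, its columns, and its rows are hypothetical, and the `ORDER BY` clause is added only so the output order is predictable.

```python
import sqlite3

# In-memory database with a hypothetical "customers" table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, total_spent REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ana", "Austin", 120.0), ("Ben", "Boston", 45.5), ("Cy", "Austin", 310.0)],
)

query = """
SELECT name, total_spent      -- columns to return
FROM customers                -- table that holds those columns
WHERE city = 'Austin'         -- criteria each row must meet
ORDER BY name                 -- not one of the three clauses; keeps output stable
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Only the two Austin customers are returned, and only the two selected columns appear in each row.
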
• An issue is a topic or subject to investigate.
• A question is designed to discover information.
• A problem is an obstacle or complication that needs to be worked out.

A business task is the question or problem data analysis answers for business.

002

Structured thinking is the process of recognizing the current problem or situation, organizing available
information, revealing gaps and opportunities, and identifying the options.

(1) Making predictions


A company that wants to know the best advertising method to bring in new customers is an example of a problem
requiring analysts to make predictions. Analysts with data on location, type of media, and number of new
customers acquired as a result of past ads can't guarantee future results, but they can help predict the best
placement of advertising to reach the target audience.

(2) Categorizing things


An example of a problem requiring analysts to categorize things is a company's goal to improve customer
satisfaction. Analysts might classify customer service calls based on certain keywords or scores. This could help
identify top-performing customer service representatives or help correlate certain actions taken with higher
customer satisfaction scores.

(3) Spotting something unusual


A company that sells smart watches that help people monitor their health would be interested in designing their
software to spot something unusual. Analysts who have analyzed aggregated health data can help product
developers determine the right algorithms to spot and set off alarms when certain data doesn't trend normally.
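
One simple way to spot a reading that doesn't trend normally is a z-score check. The sketch below is an assumption about what such an algorithm might look like, not what any smart watch actually runs: the readings, the function name `flag_unusual`, and the threshold of 2 standard deviations are all hypothetical.

```python
from statistics import mean, stdev


def flag_unusual(readings, threshold=2.0):
    """Return (index, value) pairs more than `threshold` standard deviations from the mean."""
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:  # all readings identical, nothing stands out
        return []
    return [(i, x) for i, x in enumerate(readings) if abs(x - mu) / sigma > threshold]


# Hypothetical resting heart-rate readings (beats per minute) with one spike.
resting_heart_rate = [62, 64, 61, 63, 65, 62, 118, 63, 64, 62]
unusual = flag_unusual(resting_heart_rate)
print(unusual)
```

A production system would likely compare each reading with the wearer's own history and time of day rather than with a single global mean.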

(4) Identifying themes


User experience (UX) designers might rely on analysts to analyze user interaction data. Similar to problems that
require analysts to categorize things, usability improvement projects might require analysts to identify themes to
help prioritize the right product features for improvement. Themes are most often used to help researchers
explore certain aspects of data. In a user study, user beliefs, practices, and needs are examples of themes.
By now you might be wondering if there is a difference between categorizing things and identifying themes. The
best way to think about it is: categorizing things involves assigning items to categories; identifying themes takes
those categories a step further by grouping them into broader themes.

(5) Discovering connections


A third-party logistics company working with another company to get shipments delivered to customers on time is
a problem requiring analysts to discover connections. By analyzing the wait times at shipping hubs, analysts can
determine the appropriate schedule changes to increase the number of on-time deliveries.

(6) Finding patterns


Minimizing downtime caused by machine failure is an example of a problem requiring analysts to find patterns in
data. For example, by analyzing maintenance data, they might discover that most failures happen if regular
maintenance is delayed by more than a 15-day window.
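
The maintenance example can be checked by comparing failure rates on either side of the 15-day window. The log below is made up for illustration, and `failure_rate` is a hypothetical helper; the point is only to show the shape of the comparison.

```python
# Hypothetical maintenance log: (days maintenance was delayed, machine failed afterwards?).
maintenance_log = [
    (2, False), (5, False), (9, False), (12, True), (14, False),
    (16, True), (18, True), (21, False), (25, True), (30, True),
]


def failure_rate(records):
    """Share of records in which the machine failed."""
    if not records:
        return 0.0
    return sum(failed for _, failed in records) / len(records)


WINDOW_DAYS = 15  # the 15-day window mentioned in the text
within = [r for r in maintenance_log if r[0] <= WINDOW_DAYS]
beyond = [r for r in maintenance_log if r[0] > WINDOW_DAYS]

rate_within = failure_rate(within)
rate_beyond = failure_rate(beyond)
print(f"failure rate within {WINDOW_DAYS} days: {rate_within:.0%}")
print(f"failure rate beyond {WINDOW_DAYS} days: {rate_beyond:.0%}")
```

With real data, an analyst would also try other cut-off values to confirm that 15 days is where the pattern actually changes.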

SMART questions:
Specific: Is the question specific? Does it address the problem? Does it have context? Will it uncover a lot of the
information you need?
Measurable: Will the question give you answers that you can measure?
Action-oriented: Will the answers provide information that helps you devise some type of plan?
Relevant: Is the question about the particular problem you are trying to solve?
Time-bound: Are the answers relevant to the specific time being studied?

Here are a few questions you might want to ask:

• When is the project due?
• Are there any specific challenges to keep in mind?
• Who are the major stakeholders for this project, and what do they expect this project to do for them?
• Who am I presenting the results to?

Here are some examples of questions you might ask based on the suggested topics:
• Objectives: What are the goals of the deep dive? What, if any, questions are expected to be answered by this deep dive?
• Audience: Who are the stakeholders? Who is interested or concerned about the results of this deep dive? Who is the audience for the presentation?
• Time: What is the time frame for completion? By what date does this need to be done?
• Resources: What resources are available to accomplish the deep dive's goals?
• Security: Who should have access to the information?

Data constraint | Definition | Examples
Data type | Values must be of a certain type: date, number, percentage, Boolean, etc. | If the data type is a date, a single number like 30 would fail the constraint and be invalid
Data range | Values must fall between predefined maximum and minimum values | If the data range is 10-20, a value of 30 would fail the constraint and be invalid
Mandatory | Values can’t be left blank or empty | If age is mandatory, that value must be filled in
Unique | Values can’t have a duplicate | Two people can’t have the same mobile phone number within the same service area
Regular expression (regex) patterns | Values must match a prescribed pattern | A phone number must match ###-###-#### (no other characters allowed)
Cross-field validation | Certain conditions for multiple fields must be satisfied | Values are percentages and values from multiple fields must add up to 100%
Primary-key | (Databases only) value must be unique per column | A database table can’t have two rows with the same primary key value. A primary key is an identifier in a database that references a column in which each value is unique. More information about primary and foreign keys is provided later in the program.
Set-membership | (Databases only) values for a column must come from a set of discrete values | Value for a column must be set to Yes, No, or Not Applicable
Foreign-key | (Databases only) values for a column must be unique values coming from a column in another table | In a U.S. taxpayer database, the State column must be a valid state or territory with the set of acceptable values defined in a separate States table
Accuracy | The degree to which the data conforms to the actual entity being measured or described | If values for zip codes are validated by street location, the accuracy of the data goes up.
Completeness | The degree to which the data contains all desired components or measures | If data for personal profiles required hair and eye color, and both are collected, the data is complete.
Consistency | The degree to which the data is repeatable from different points of entry or collection | If a customer has the same address in the sales and repair databases, the data is consistent.
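
Several of the constraints in the table can be expressed as simple checks in Python. The sketch below is illustrative only: the record fields (`signup_date`, `age`, `score`, `phone`, `consent`, `shares`) and the function name `validate` are hypothetical, and the 10-20 range and ###-###-#### pattern are taken from the table's examples.

```python
import re
from datetime import date

PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")   # the ###-###-#### pattern
ALLOWED_ANSWERS = {"Yes", "No", "Not Applicable"}  # set-membership values


def validate(record, seen_phones):
    """Return a list of constraint violations for one hypothetical record."""
    errors = []
    if not isinstance(record.get("signup_date"), date):   # data type
        errors.append("signup_date must be a date")
    if record.get("age") is None:                         # mandatory
        errors.append("age is mandatory")
    score = record.get("score")
    if score is None or not 10 <= score <= 20:            # data range
        errors.append("score must be between 10 and 20")
    phone = record.get("phone", "")
    if not PHONE_PATTERN.fullmatch(phone):                # regex pattern
        errors.append("phone must match ###-###-####")
    elif phone in seen_phones:                            # unique
        errors.append("phone must be unique")
    if record.get("consent") not in ALLOWED_ANSWERS:      # set membership
        errors.append("consent must be Yes, No, or Not Applicable")
    if sum(record.get("shares", [])) != 100:              # cross-field validation
        errors.append("shares must add up to 100%")
    return errors


record = {
    "signup_date": date(2024, 1, 5), "age": 30, "score": 15,
    "phone": "555-123-4567", "consent": "Yes", "shares": [60, 40],
}
print(validate(record, seen_phones=set()))               # no violations
print(validate(record, seen_phones={"555-123-4567"}))    # duplicate phone number
```

In a database, the primary-key, foreign-key, and set-membership constraints would normally be enforced by the database itself rather than by application code like this.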

If you have insufficient data, you can identify trends with the data that is available and qualify your findings
accordingly. Unless explicitly requested as part of the business objective (such as conducting a survey), data
analysts do not create their own datasets.
Terminology | Definitions
Population | The entire group that you are interested in for your study. For example, if you are surveying people in your company, the population would be all the employees in your company.
Sample | A subset of your population. Just like a food sample, it is called a sample because it is only a taste. So if your company is too large to survey every individual, you can survey a representative sample of your population.
Margin of error | Since a sample is used to represent a population, the sample’s results are expected to differ from what the result would have been if you had surveyed the entire population. This difference is called the margin of error. The smaller the margin of error, the closer the results of the sample are to what the result would have been if you had surveyed the entire population.
Confidence level | How confident you are in the survey results. For example, a 95% confidence level means that if you were to run the same survey 100 times, you would get similar results 95 of those 100 times. Confidence level is targeted before you start your study because it will affect how big your margin of error is at the end of your study.
Confidence interval | The range of possible values that the population’s result would be at the confidence level of the study. This range is the sample result +/- the margin of error.
Statistical significance | The determination of whether your result could be due to random chance or not. The greater the significance, the less due to chance.
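
To make the margin of error and confidence interval concrete, here is a small sketch for a yes/no survey question. It assumes the common normal approximation for a sample proportion; the survey numbers and the function name `margin_of_error` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion."""
    # z-score that leaves (1 - confidence) split across both tails.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical survey: 240 of 400 sampled employees answered "yes".
n = 400
p_hat = 240 / n
moe = margin_of_error(p_hat, n)
interval = (p_hat - moe, p_hat + moe)  # sample result +/- margin of error

print(f"sample result: {p_hat:.1%}, margin of error: +/-{moe:.1%}")
print(f"95% confidence interval: {interval[0]:.1%} to {interval[1]:.1%}")
```

Calling `margin_of_error` with a larger `n` gives a smaller margin, which is why a bigger sample narrows the confidence interval.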

Increase the sample size to meet specific needs of your project:

• For a higher confidence level, use a larger sample size
• To decrease the margin of error, use a larger sample size
• For greater statistical significance, use a larger sample size
Note: Sample size calculators use statistical formulas to determine a sample size. More about these are coming up in the course! Stay tuned.
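
As a rough idea of what a sample size calculator might compute, the sketch below uses one widely taught formula for proportions with a finite population correction. This is an assumption: actual calculators may use different formulas or rounding, and the function name `sample_size` and its defaults are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Estimated sample size for a proportion, adjusted for a finite population."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Size needed for a very large population; p = 0.5 is the most cautious choice.
    n0 = z**2 * p * (1 - p) / margin**2
    # Finite population correction shrinks the requirement for small populations.
    return ceil(n0 / (1 + (n0 - 1) / population))


print(sample_size(500))                    # baseline: 95% confidence, 5% margin
print(sample_size(500, confidence=0.99))   # higher confidence level
print(sample_size(500, margin=0.03))       # smaller margin of error
```

Both the higher confidence level and the smaller margin of error call for a larger sample than the baseline, matching the list above.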

You should complete the following tasks before analyzing data:

1. Determine data integrity by assessing the overall accuracy, consistency, and completeness of the data.
2. Connect objectives to data by understanding how your business objectives can be served by an investigation into the data.
3. Know when to stop collecting data.

A statistical power of at least 0.8 (80%) is the conventional minimum target for a study: it means the test has an 80% chance of detecting an effect that really exists.

Statistical power can be calculated and reported for a completed experiment to comment on the confidence one
might have in the conclusions drawn from the results of the study. It can also be used as a tool to estimate the
number of observations or sample size required in order to detect an effect in an experiment.
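
Both uses of statistical power described above can be sketched with the standard library. This example assumes a two-sided comparison of two group means and uses a normal approximation, so the numbers are slightly optimistic compared with an exact t-test calculation; the function names and the effect size of 0.5 (a standardized difference) are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def required_n_per_group(effect_size, power=0.8, alpha=0.05):
    """Approximate observations per group needed to reach the target power."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def achieved_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of the same test for a completed experiment."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)


n_needed = required_n_per_group(effect_size=0.5)
print(f"observations needed per group: {n_needed}")
print(f"power with 20 per group: {achieved_power(0.5, 20):.2f}")
print(f"power with {n_needed} per group: {achieved_power(0.5, n_needed):.2f}")
```

The first function estimates a sample size before an experiment; the second reports how much power a finished experiment of a given size had.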

A Gentle Introduction to Statistical Power and Power Analysis in Python

Sample size:
https://www.surveymonkey.com/mp/sample-size-calculator/

Presentation:
The steps of the McCandless method include:
1. Introduce the graphic by name
2. Answer obvious questions before they’re asked
3. State the insight of your graphic
4. Call out data to support that insight
5. Tell your audience why it matters

Present the possible business impact of the solution and clear actions
stakeholders can take.

Does this data point or chart support the point I want people to walk away
with?

Key question | Spreadsheets | SQL | R
What is it? | A program that uses rows and columns to organize data and allows for analysis and manipulation through formulas, functions, and built-in features | A database programming language used to communicate with databases to conduct an analysis of data | A general purpose programming language used for statistical analysis, visualization, and other data analysis
What is a primary advantage? | Includes a variety of visualization tools and features | Allows users to manipulate and reorganize data as needed to aid analysis | Provides an accessible language to organize, modify, and clean data frames, and create insightful data visualizations
Which datasets does it work best with? | Smaller datasets | Larger datasets | Larger datasets
What is the source of the data? | Entered manually or imported from an external source | Accessed from an external database | Loaded with R when installed, imported from your computer, or loaded from external sources
Where is the data from my analysis usually stored? | In a spreadsheet file on your computer | Inside tables in the accessed database | In an R file on your computer
Do I use formulas and functions? | Yes | Yes | Yes
Can I create visualizations? | Yes | Yes, by using an additional tool like a database management system (DBMS) or a business intelligence (BI) tool | Yes
