Da 001
The six phases of the data analysis process help answer business challenges, such as understanding how to
improve a retirement program. Additionally, iterating on and reviewing your work throughout the data analysis
process is critical for obtaining quality results.
Data science, the discipline of making data useful, is an umbrella term that encompasses three disciplines:
machine learning, statistics, and analytics.
It’s important to include insights from subject-matter experts because they are familiar with the business problem
and can review analysis results and help identify inconsistencies. Plus, their experience and human intuition are
valuable to data-driven decision-making.
Gut instinct is an intuitive understanding of something with little or no explanation. This isn’t always something
conscious; we often pick up on signals without even realizing it. You just have a “feeling” it’s right.
In addition, try asking yourself these questions about a project to help find the right balance between data and gut instinct:
What kind of results are needed?
Who will be informed?
Am I answering the question being asked?
How quickly does a decision need to be made?
Data analysts and detectives share a similar approach to problem-solving, both relying on evidence and facts to
make decisions. Data-driven decision-making is essential for analysts, but gut instinct can also play a role in
identifying patterns and connections. Balancing data and gut instinct is crucial for making informed decisions, and
the right mix depends on the project's goals and time constraints.
Analytical skills are qualities and characteristics associated with solving problems using facts. There are a lot of
aspects to analytical skills, but we'll focus on five essential points.
They are curiosity, understanding context, having a technical mindset, data design, and data strategy.
Understanding context
The analytical skill that has to do with how you group things into categories
A technical mindset involves the ability to break things down into smaller steps or pieces and work with them in
an orderly and logical way.
Data design is how you organize information. For a data analyst, design typically has to do with an actual
database.
Data strategy is the management of the people, processes, and tools used in data analysis.
Your inherent analytical skills are essential for conducting data analysis and will be even more critical when you
combine them with the tools and techniques from this program. Understanding how to use these skills in
business scenarios is the first step toward developing them further and using them effectively in your career.
Analytical skills
The qualities and characteristics associated with solving problems using facts
Analytical thinking involves identifying and defining a problem and then solving it by using data in an organized,
step-by-step manner.
The five key aspects of analytical thinking are visualization, strategy, problem-orientation, correlation, and using
big-picture and detail-oriented thinking.
To execute a plan using detail-oriented thinking, a data analyst considers the specifics.
The five whys is a powerful tool for root cause analysis. It’s simple, effective, and a great way to collaborate with
colleagues and learn about other areas of the business. Plus, the five whys can be used to analyze problems in
any industry, helping organizations of all kinds identify and fix business problems. As a data professional, you can
turn to the five whys whenever you feel stumped by a problem and need to approach it from a different
perspective.
Understanding the importance of the data life cycle will set you up for success as a data analyst. Individual
stages in the data life cycle will vary from company to company or by industry or sector. Historical data is
important to both the U.S. Fish and Wildlife Service and the USGS, so their data life cycle focuses on archiving
and backing up data. Harvard's interests are in research and teaching, so its data life cycle includes visualization
and interpretation even though these are more often associated with a data analysis life cycle. The HBS data life
cycle also doesn't call out a stage for purging or destroying data. In contrast, the data life cycle for finance clearly
identifies archive and purge stages. To sum it up, although data life cycles vary, one data management principle
is universal: Govern how data is handled so that it is accurate, secure, and available to meet your organization's
needs.
While the data analysis process will drive your projects and help you reach your business goals, you must
understand the life cycle of your data in order to use that process. To analyze your data well, you need to have a
thorough understanding of it. Similarly, you can collect all the data you want, but the data is only useful to you if
you have a plan for analyzing it.
The Plan and Ask phases both involve planning and asking questions, but they tackle different subjects. The Ask
phase of the data analysis process focuses on big-picture strategic thinking about business goals. In contrast, the
Plan phase of the data life cycle focuses on the fundamentals of the project, such as what data you have access to,
what data you need, and where you’re going to get it.
A formula is a set of instructions that performs a specific calculation, whereas a function is a preset command that
automatically performs a process or task, making the work more efficient. For example, in a spreadsheet, =A2+A3+A4
is a formula, while =SUM(A2:A4) uses the SUM function to do the same calculation.
SQL
FROM specifies the table from which to retrieve data. Use FROM to choose the tables where the columns you want are
located.
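A minimal sketch of FROM in action, using Python's built-in sqlite3 module with an in-memory database. The employees table, its columns, and its rows are hypothetical, made up only for illustration.

```python
import sqlite3

# In-memory database holding a hypothetical "employees" table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO employees (name, department) VALUES (?, ?)",
    [("Ana", "Finance"), ("Ben", "Sales"), ("Cho", "Finance")],
)

# SELECT names the columns you want; FROM names the table that holds them;
# WHERE (optional) filters the rows.
rows = conn.execute(
    "SELECT name FROM employees WHERE department = ? ORDER BY name",
    ("Finance",),
).fetchall()
print(rows)  # [('Ana',), ('Cho',)]
conn.close()
```

Swapping the table name after FROM is all it takes to point the same SELECT at a different table, as long as that table has the columns listed.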
A business task is the question or problem that data analysis answers for a business.
002
Structured thinking is the process of recognizing the current problem or situation, organizing available
information, revealing gaps and opportunities, and identifying the options.
SMART questions:
Specific: Is the question specific? Does it address the problem? Does it have context? Will it uncover a lot of the
information you need?
Measurable: Will the question give you answers that you can measure?
Action-oriented: Will the answers provide information that helps you devise some type of plan?
Relevant: Is the question about the particular problem you are trying to solve?
Time-bound: Are the answers relevant to the specific time being studied?
Here are some examples of questions you might ask based on these topics:
Objectives: What are the goals of the deep dive? What, if any, questions are expected to be answered
by this deep dive?
Audience: Who are the stakeholders? Who is interested or concerned about the results of this deep
dive? Who is the audience for the presentation?
Time: What is the time frame for completion? By what date does this need to be done?
Resources: What resources are available to accomplish the deep dive's goals?
Security: Who should have access to the information?
If you have insufficient data, you can identify trends with the data that is available and qualify your findings
accordingly. Unless explicitly requested as part of the business objective (such as conducting a survey), data
analysts do not create their own datasets.
Terminology and definitions
Population
The entire group that you are interested in for your study. For example, if you are surveying people in your company, the population would be all the employees in your company.
Sample
A subset of your population. Just like a food sample, it is called a sample because it is only a taste. So if your company is too large to survey every individual, you can survey a representative sample of your population.
Margin of error
Since a sample is used to represent a population, the sample’s results are expected to differ from what the result would have been if you had surveyed the entire population. The margin of error is the maximum expected size of this difference. The smaller the margin of error, the closer the results of the sample are to what the result would have been if you had surveyed the entire population.
Confidence level
How confident you are in the survey results. For example, a 95% confidence level means that if you were to run the same survey 100 times, about 95 of the resulting confidence intervals would contain the true population result. The confidence level is chosen before you start your study because it affects how big your margin of error is at the end of your study.
Confidence interval
The range of possible values that the population’s result would fall within at the study’s confidence level. This range is the sample result +/- the margin of error.
Statistical significance
The determination of whether your result could be due to random chance or not. The greater the significance, the less likely it is that the result is due to chance.
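The margin of error and confidence interval defined above can be sketched for the common case of a survey proportion. This assumes a simple random sample and the normal-approximation (z-score) method, which is one standard approach rather than the only one; the function names and survey numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_score(confidence_level: float) -> float:
    """Two-sided critical value, e.g. about 1.96 for a 95% confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)


def margin_of_error(p_hat: float, n: int, confidence_level: float = 0.95) -> float:
    """Margin of error for a sample proportion p_hat from n responses (normal approximation)."""
    return z_score(confidence_level) * sqrt(p_hat * (1 - p_hat) / n)


def confidence_interval(p_hat: float, n: int, confidence_level: float = 0.95):
    """Confidence interval: the sample result +/- the margin of error."""
    moe = margin_of_error(p_hat, n, confidence_level)
    return p_hat - moe, p_hat + moe


# Hypothetical survey: 240 of 400 sampled employees answer "yes".
p_hat = 240 / 400
low, high = confidence_interval(p_hat, 400)
print(f"{p_hat:.0%} +/- {margin_of_error(p_hat, 400):.1%} -> ({low:.3f}, {high:.3f})")
# 60% +/- 4.8% -> (0.552, 0.648)
```

Raising the confidence level (say, to 0.99) or shrinking the sample widens the interval, which is why the confidence level is chosen before the study starts.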
Statistical power is the probability that a study detects an effect when one actually exists. A power of 0.8 (80%) is typically considered the minimum acceptable level when designing a study.
Statistical power can be calculated and reported for a completed experiment to comment on the confidence one
might have in the conclusions drawn from the results of the study. It can also be used as a tool to estimate the
number of observations or sample size required in order to detect an effect in an experiment.
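As a rough illustration of using power to plan a sample size, the sketch below estimates how many observations each group needs for a two-sided comparison of two means. It assumes the normal approximation, a significance level (alpha) of 0.05, and an effect size expressed as Cohen's d (difference in means divided by the standard deviation); the function name n_per_group is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate observations needed per group for a two-sided, two-sample test of means.

    effect_size is Cohen's d (difference in means / standard deviation).
    Normal approximation: n = 2 * (z_alpha + z_power) ** 2 / d ** 2, rounded up.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 when power = 0.8
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Conventional small, medium, and large effect sizes.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: {n_per_group(d)} per group")  # 393, 63, and 25 per group
```

Smaller effects need far larger samples to reach the same 80% power. Exact t-test calculations give slightly larger numbers because they do not rely on the normal approximation.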
Sample size:
https://www.surveymonkey.com/mp/sample-size-calculator/
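Sample size calculators like this one typically combine the confidence level, margin of error, and population size. The sketch below uses a standard textbook formula with a finite population correction; it is an assumption that the linked calculator uses exactly this form. It also assumes p = 0.5, the most conservative guess for an unknown proportion, and the function name survey_sample_size is hypothetical.

```python
from math import ceil
from statistics import NormalDist
from typing import Optional


def survey_sample_size(population: Optional[int], margin_of_error: float = 0.05,
                       confidence_level: float = 0.95, p: float = 0.5) -> int:
    """Responses needed to reach the target margin of error at the given confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2  # size for a very large population
    if population is None:
        return ceil(n0)
    return ceil(n0 / (1 + n0 / population))  # finite population correction


print(survey_sample_size(1000))  # 278
print(survey_sample_size(None))  # 385
```

The correction matters most for small populations: surveying a company of 1,000 employees needs noticeably fewer responses than the roughly 385 required when the population is effectively unlimited.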
Presentation:
The steps of the McCandless method include:
1. Introduce the graphic by name
2. Answer obvious questions before they’re asked
3. State the insight of your graphic
4. Call out data to support that insight
5. Tell your audience why it matters
Present the possible business impact of the solution and clear actions
stakeholders can take.
Does this data point or chart support the point I want people to walk away
with?