Data Analysis Project
with
ADVANCED EXCEL
About the Course
Data Analysis with Excel is an in-depth course designed to provide comprehensive
knowledge of the latest and advanced features in Microsoft Excel. This course covers a wide
range of data analysis functions and demonstrates how to effectively utilize Excel's powerful
tools for analysing data.
The course is structured with clear, step-by-step instructions and includes numerous
screenshots to guide learners through each feature, ensuring a practical and easy-to-follow
learning experience.
Audience
This course is tailored for individuals who rely extensively on Microsoft Excel for creating
charts, tables, and professional reports involving complex data. It is ideal for anyone who
regularly uses Excel for data analysis and seeks to enhance their proficiency in utilizing
Excel’s advanced features.
Prerequisites
Participants of this course are expected to have a solid understanding of the fundamental
features of Microsoft Excel. Familiarity with basic Excel functions and tools will help
maximize the learning experience.
Data Analysis − Objectives
DAY 1
1) Descriptive statistics in Excel
2) Exploratory Data Analysis (EDA) with conditional formatting
3) Sales by Country report with formulas
4) Sales by Country report with Pivots
5) Top 5 products with $ per unit
6) Anomaly detection in your data
7) Best in category analysis
8) Profit analysis (combining two tables)
9) Dynamic country level sales report
10) Which products to discontinue (Open ended questions)
DAY 2
1) Using Tables
2) Working with Power Query
3) Formulas
4) Pivot Tables
5) Conditional formatting
6) Charts
7) Data Validation
8) Keyboard Shortcuts & tricks
9) Dashboard Design
Data Analysis − Overview
Data analysis is the process of systematically applying statistical and logical techniques to
describe, condense, and evaluate data. In today's data-driven world, it has become an
indispensable tool for decision-making and problem-solving across various industries. It
involves gathering raw data, processing it, and transforming it into valuable insights that
inform business strategies, uncover patterns, and guide future actions.
In the context of Microsoft Excel, data analysis empowers users to not only organize data
efficiently but also to perform complex calculations, visualize trends, and generate
meaningful reports with a high degree of accuracy. Excel’s powerful functions, combined
with its intuitive user interface, make it one of the most accessible yet potent tools for both
beginners and seasoned data professionals.
Key features of Excel, such as Pivot Tables, Power Query, Advanced Formulas (like
`SUMIFS`, `COUNTIFS`, `XLOOKUP`), and Data Visualizations, allow users to filter
through vast datasets, combine multiple data sources, and visualize key trends in real-time.
Mastering data analysis in Excel is crucial for anyone looking to make data-driven decisions
in a competitive environment. Through precise, step-by-step instruction, this course will
unlock the full potential of Excel’s data analysis capabilities, enabling you to transform raw
data into actionable intelligence with ease and efficiency.
Data Analysis Process
The data analysis process is a systematic approach to transforming raw data into meaningful
insights and informed decisions. It involves several key stages, each critical to ensuring that
the data is accurate, reliable, and actionable. Below is an overview of the essential steps in
the data analysis process:
1. Define Objectives
The first step in data analysis is to clearly define the objectives or the problem you want to
solve. Understanding the purpose of the analysis helps in determining the type of data to
collect and the methods to use. Whether it’s identifying trends, making predictions, or
understanding patterns, this step sets the foundation for the entire process.
2. Data Collection
Once the objectives are clear, the next step is to gather the required data. Data can be
collected from a variety of sources, such as databases, surveys, online platforms, or manual
inputs. Ensuring that the data collected is relevant, comprehensive, and accurate is essential
to avoid bias or gaps in the analysis.
3. Data Cleaning
Raw data is often filled with inconsistencies, missing values, and errors. Data cleaning
involves refining the dataset by removing or correcting inaccurate entries, filling in missing
data, and ensuring that it is properly formatted. This step is crucial to prevent misleading
results or faulty conclusions in later stages.
4. Data Exploration
In this phase, data is explored through summary statistics and visualizations to understand its
structure and identify any initial patterns, trends, or outliers. Techniques like descriptive
statistics, charts, and graphs are used to get a preliminary sense of the data’s distribution and
relationships among variables.
5. Data Analysis/Modelling
The core of the data analysis process involves applying statistical techniques and
mathematical models to draw insights from the data. Depending on the objective, various
methods can be used:
- Descriptive analysis to summarize data
- Predictive analysis for forecasting future trends
- Inferential analysis for making generalizations or predictions from sample data
- Prescriptive analysis to recommend actions based on the data
6. Interpretation of Results
Once the data has been analysed, the next step is interpreting the results in the context of the
original objectives. This involves understanding what the patterns, correlations, and insights
mean and how they relate to the problem at hand. The goal is to translate complex data into
actionable recommendations that can inform decision-making.
7. Communication of Results
Findings are presented in a form that is easy for stakeholders to understand and act upon.
Tools like Excel’s PivotTables, Power Query, and data visualization functions are commonly
used to create compelling reports.
8. Decision-Making
The insights gained from the analysis inform strategic decisions. Whether optimizing
processes, identifying opportunities, or solving problems, the data analysis process ultimately
helps organizations and individuals make informed, data-driven decisions that align with their
goals.
Data Source and a List of Questions we are going to answer in this course:
1. Descriptive statistics in Excel
Excel provides a powerful and user-friendly platform for performing quick statistical
analysis. Whether you are looking to summarize large datasets or extract key insights, Excel
offers various built-in functions that make it easy to generate quick statistics with minimal
effort. Here are the primary tools and features used to conduct quick statistical analysis in
Excel:
The Descriptive Statistics tool in Excel’s Data Analysis ToolPak allows users to quickly
summarize large amounts of data. It provides key summary measures such as the mean,
median, mode, and standard deviation.
These measures are quick and straightforward to produce, giving you immediate statistical
insight into your data.
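As a worksheet-formula counterpart to the ToolPak output, the same kind of summary can be sketched with built-in functions. The range B2:B100 is a hypothetical column of numeric values, not taken from the course dataset; each line goes in its own cell.

```excel
=AVERAGE(B2:B100)
=MEDIAN(B2:B100)
=MODE.SNGL(B2:B100)
=STDEV.S(B2:B100)
=MIN(B2:B100)
=MAX(B2:B100)
=COUNT(B2:B100)
```
`STDEV.S` treats the data as a sample; `STDEV.P` would be the choice if the range holds the whole population. `MODE.SNGL` returns `#N/A` when no value repeats.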
2. Exploratory Data Analysis (EDA) with Conditional Formatting (CF) in Excel
Exploratory Data Analysis (EDA) is the process of investigating and summarizing datasets
to uncover underlying patterns, relationships, and anomalies. In Excel, Conditional
Formatting (CF) serves as a powerful tool for performing EDA visually, helping you
quickly identify trends, patterns, and outliers.
Here’s how you can use Conditional Formatting (CF) for EDA in Excel:
1. Highlight Cells Based on Values
Conditional Formatting allows you to highlight cells based on their values. This can help you
quickly identify important patterns or outliers in your dataset.
• Steps:
o Select the data range you want to analyse.
o Go to the Home tab and click on Conditional Formatting.
o Choose a rule such as:
▪ Highlight Cells Rules (Greater Than, Less Than, Equal To, etc.): Useful for
spotting values above or below certain thresholds.
▪ Top/Bottom Rules: To highlight the top 10% or bottom 10% of values in a
dataset.
▪ Data Bars: Visually represent the magnitude of each value with a horizontal
bar inside the cell.
Use case: Highlight values that exceed a specific target or fall below a benchmark, e.g.,
identifying products with sales figures greater than 1000 or regions with low performance.
2. Colour Scales
Colour Scales allow you to quickly see how data values compare by applying a gradient of
colours across a range of values. Higher values might be shaded darker, while lower values
are shaded lighter, or vice versa.
• Steps:
o Select your data range.
o Go to Conditional Formatting and choose Colour Scales.
o Excel automatically applies a gradient, e.g., green for high values, yellow for mid-
range, and red for low values.
Use case: For financial data, you can apply a colour scale to sales figures to instantly
visualize the best and worst-performing months or regions.
3. Icon Sets
Icon sets in Conditional Formatting allow you to mark your data with symbols such as
arrows, stars, or check marks to indicate performance or data trends.
• Steps:
o Select the data you want to format.
o Go to Conditional Formatting and select Icon Sets.
o Choose from sets like directional arrows, shapes, or traffic lights.
Use case: Track sales growth or decline over time using arrows, where an upward arrow
means increased sales and a downward arrow means a drop.
4. Data Bars
Data bars provide a quick visualization of the relative size of values in your dataset by
placing horizontal bars inside each cell. This creates an in-cell chart that compares values
across a range.
• Steps:
o Highlight the range of cells you want to apply data bars to.
o Go to Conditional Formatting, select Data Bars, and choose a colour scheme.
Use case: Apply data bars to monthly revenue data to easily compare the magnitude of
revenues across different months.
5. Highlight Duplicates
Identifying duplicate values is an essential part of EDA, especially when cleaning data.
Conditional Formatting helps you quickly find these duplicates for further analysis.
• Steps:
o Select the range of data.
o Go to Conditional Formatting and choose Highlight Cells Rules, then select
Duplicate Values.
Use case: Find and analyse duplicate entries in customer databases or product inventories.
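The Duplicate Values rule can also be reproduced with a helper column, which is handy when you want to filter or count the repeats afterwards. A minimal sketch, assuming the values to check sit in the hypothetical range A2:A100 and the formula is entered in row 2 of a spare column and filled down:

```excel
=IF(COUNTIF($A$2:$A$100, A2)>1, "Duplicate", "Unique")
```
`COUNTIF` counts how many times the value in `A2` appears in the whole range, so any count above 1 marks a repeated entry.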
6. Detecting Outliers with Conditional Formatting
Outliers can heavily influence data trends and need to be addressed. Conditional Formatting
can help highlight data points that significantly deviate from the rest of the dataset.
• Steps:
o Apply Conditional Formatting rules such as Greater Than, Less Than, or use the
Top/Bottom Rules to mark outliers.
o Alternatively, use a custom formula to detect values that are more than a certain
number of standard deviations away from the mean.
Use case: Spotting outliers in stock prices, sales data, or financial metrics.
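The custom-formula approach mentioned above could look like the following sketch. It assumes the data is in the hypothetical range A2:A100, that the rule is created through Conditional Formatting > New Rule > Use a formula to determine which cells to format with that range selected, and that 2 standard deviations is an acceptable cut-off (the threshold is a judgment call, not a fixed rule).

```excel
=ABS(A2-AVERAGE($A$2:$A$100))>2*STDEV.S($A$2:$A$100)
```
The relative reference `A2` shifts for each cell in the selection, while the absolute range keeps the mean and standard deviation fixed.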
7. Visualizing Relationships in EDA
Conditional Formatting can also be used to visualize relationships between different columns
of data, such as comparing sales performance to marketing spend.
• Steps:
o Select the two columns of interest.
o Apply Colour Scales or Icon Sets to visualize correlations or differences between
them.
Use case: Compare the relationship between advertising spend and sales figures to uncover
patterns in the data.
3. Sales by Country report with formulas
2. Amount: SUMIFS( )
4. Sales by Country report with Pivots
5. Top 5 products by $ per Unit: Using Pivot Tables
6. Using Charts to show some Anomalies in our Data:
7. Best Sales Person by Country: using Pivot Tables
8. Profit analysis (combining two tables)
9. Dynamic country level sales report
10. Which products to discontinue (Open ended questions)
DAY 2
How to approach a data analysis project.
Here’s a summarized approach to tackling a data analysis project:
1. Define the Problem and Objectives: Clarify the purpose and objectives of the analysis.
Understand stakeholder needs and set measurable goals.
2. Understand Data Requirements: Identify necessary data sources, understand the data
context, and ensure data availability.
3. Data Collection and Preparation: Gather, clean, and transform the data for analysis.
Conduct exploratory data analysis (EDA) to understand data characteristics.
4. Choose Analytical Methods and Tools: Select appropriate analytical techniques and tools
based on the project’s objectives.
5. Perform the Analysis: Apply chosen methods, interpret results, and refine the analysis
iteratively.
6. Validate and Test the Analysis: Ensure robustness and reliability through cross-
validation, sensitivity analysis, and peer review.
7. Communicate Findings: Develop a clear narrative with visuals to present key insights and
prepare a report or presentation tailored to the audience.
9. Implement Changes and Monitor Results: Support implementation, track impact using
KPIs, and refine approaches based on feedback.
10. Document and Reflect: Document the analysis process and lessons learned for future
projects.
2. How to systematically clean data
Systematically cleaning data in Excel involves several steps to ensure the dataset is
accurate, complete, and ready for analysis.
Here is a structured approach, along with some commonly used Excel functions to assist in
the process:
1. Remove Duplicates
- Purpose: Eliminate any duplicate entries in your data to avoid skewed results.
- How:
- Select the data range.
- Go to Data > Remove Duplicates.
- Select the columns to check for duplicates.
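Where the original rows should stay untouched, a formula-based alternative is the `UNIQUE` function available in Excel 365. This is a sketch that assumes the data sits in the hypothetical range A2:C100; the result spills into the cells below and to the right of the formula.

```excel
=UNIQUE(A2:C100)
```
Unlike Data > Remove Duplicates, this leaves the source range unchanged and compares entire rows across all three columns.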
3. Trim Extra Spaces
- Purpose: Remove unnecessary leading, trailing, or extra spaces within the text.
- Function:
- `=TRIM(cell)`: Removes all extra spaces except for single spaces between words.
- How:
- Create a new column next to the data.
- Apply the `TRIM` function.
- Copy and paste the results back into the original column as values.
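As an illustration of the helper-column step, assuming the raw text is in the hypothetical cell A2:

```excel
=TRIM(A2)
=TRIM(SUBSTITUTE(A2, CHAR(160), " "))
```
The second variant is for text pasted from web pages, which often contains non-breaking spaces (character code 160) that `TRIM` alone does not remove; `SUBSTITUTE` first converts them into regular spaces.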
9. Normalize Data Formats
- Purpose: Standardize data formats (e.g., dates, numbers).
- Functions:
- `=TEXT(cell, "format")`: Converts numbers and dates into text in a consistent format.
- Example: `=TEXT(A1, "mm/dd/yyyy")` standardizes date formats.
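Because `TEXT` returns a text string, the formatted result no longer behaves as a date or number in calculations. A short sketch of the conversion in both directions, assuming the hypothetical cell A2 holds a real date and B2 holds a date typed as text in a form your regional settings recognise:

```excel
=TEXT(A2, "yyyy-mm-dd")
=DATEVALUE(B2)
```
`DATEVALUE` returns a date serial number, so the cell usually needs a date number format applied before it displays as a date.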
By following these systematic steps and utilizing the provided Excel functions, you can
effectively clean and prepare your data for analysis, ensuring accuracy and consistency.
Power Query in Excel or Power BI
Using Power Query in Excel or Power BI allows you to combine and clean data from
multiple sources in a streamlined and efficient way. Power Query provides a powerful, user-
friendly interface for transforming and loading data, making it ideal for preparing data for
analysis. Here’s how you can use Power Query to combine and clean data in one go:
3. Combine Data
- Append Queries: If your datasets have the same structure (same columns), you can
append them to stack the datasets on top of each other.
- Go to Home > Append Queries.
- Choose whether to append two tables or more.
- Merge Queries: If your datasets need to be joined (e.g., by a common key like
`CustomerID`), use the merge function.
- Go to Home > Merge Queries.
- Select the common key from each table and choose the type of join (e.g., Left Join,
Right Join, Inner Join, etc.).
Statistical Analysis of Data Using the Data Analysis Tool in Excel
Excel’s Data Analysis ToolPak provides a powerful suite of tools to perform various
statistical analyses directly within Excel. This tool is ideal for users who need to conduct
basic to intermediate-level statistical analysis without requiring advanced statistical
software.
1. Descriptive Statistics
Descriptive statistics summarize and provide information about your data, such as the
mean, median, mode, standard deviation, and more.
- How to Use:
1. Go to the Data tab and click on Data Analysis. If the button is not shown, enable the
Analysis ToolPak add-in under File > Options > Add-ins.
2. Select Descriptive Statistics and click OK.
3. Choose the input range for your data.
4. Check the box for Summary statistics and select where you want the output.
5. Click OK to generate the results.
Functions Useful for Data Cleaning in Excel
Conclusion
The Data Analysis ToolPak in Excel provides a comprehensive set of tools for performing
basic statistical analyses, making it an accessible option for beginners and intermediate users.
By following the steps outlined above, you can conduct a variety of statistical analyses
directly in Excel to gain insights from your data.
Excel Data Analysis Most Effective Functions:
Using Excel formulas like COUNTIFS, SUMIFS, and XLOOKUP can significantly
enhance data analysis by allowing you to perform complex calculations and data
retrievals with ease. These functions are powerful tools for filtering, aggregating, and
looking up data within your datasets.
1. COUNTIFS Function
The COUNTIFS function counts the number of cells that meet multiple criteria across
different ranges. It is useful for data analysis when you need to find the frequency of
data points that satisfy several conditions.
Syntax:
```excel
COUNTIFS(range1, criteria1, [range2, criteria2], ...)
```
- range1, range2, ...: The ranges in which to evaluate the associated criteria.
- criteria1, criteria2, ...: The conditions that must be met in each range.
Example Usage:
Suppose you have a dataset of sales data, and you want to count how many sales were
made by a specific sales representative for a particular product.
```excel
=COUNTIFS(A2:A100, "John Doe", B2:B100, "Product A")
```
This formula will count the number of sales made by "John Doe" (in column A) for "Product
A" (in column B).
2. SUMIFS Function
The SUMIFS function sums the values in a range that meet multiple criteria. This
function is particularly useful for aggregating data based on multiple conditions.
Syntax:
```excel
SUMIFS(sum_range, criteria_range1, criteria1, [criteria_range2, criteria2], ...)
```
- sum_range: The range of cells to sum.
- criteria_range1, criteria_range2, ...: The ranges in which to evaluate the associated criteria.
- criteria1, criteria2, ...: The conditions that must be met in each range.
Example Usage:
To sum the total sales amount for "John Doe" for "Product A":
```excel
=SUMIFS(C2:C100, A2:A100, "John Doe", B2:B100, "Product A")
```
Here, C2:C100 is the range containing the sales amounts, A2:A100 is the range for the
sales representative, and B2:B100 is the range for the product name. The formula sums
up the sales amounts that meet both criteria.
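Criteria do not have to be typed into the formula. A sketch using the same column layout, where the hypothetical cells J1 and J2 hold the representative's name and a minimum sale amount:

```excel
=SUMIFS(C2:C100, A2:A100, J1, C2:C100, ">=" & J2)
=COUNTIFS(A2:A100, J1, C2:C100, ">=" & J2)
```
The comparison operator is written as text and joined to the cell reference with `&`, so changing J1 or J2 updates both results without editing the formulas.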
3. XLOOKUP Function
The XLOOKUP function is a versatile lookup function that can return a value or an array
based on a match found in a range. It is more flexible than the traditional VLOOKUP and
HLOOKUP functions because it allows searching in both directions and can return results
from a range to the left or right of the lookup range.
Syntax:
```excel
XLOOKUP(lookup_value, lookup_array, return_array, [if_not_found], [match_mode], [search_mode])
```
- lookup_value: The value to search for.
- lookup_array: The array or range to search.
- return_array: The array or range to return.
- if_not_found: The value to return if no valid match is found.
- match_mode: Specifies the type of match (exact, exact or next smaller/larger, wildcard).
- search_mode: Specifies the search mode (first to last, last to first, etc.).
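A usage sketch in the style of the earlier examples, assuming sales representatives are in A2:A100, product names in B2:B100, and sales amounts in C2:C100 as before:

```excel
=XLOOKUP("Product A", B2:B100, C2:C100, "Not found")
=XLOOKUP("Product A", B2:B100, A2:A100, "Not found", 0, -1)
```
The first formula returns the amount from the first row whose product is "Product A", or the text "Not found" when there is no match. The second returns a value from column A, to the left of the lookup column, and searches from the last row upwards (`search_mode` of -1), so it picks the last matching row.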
4. Combining Functions for Advanced Data Analysis
You can combine these functions to create more advanced formulas that perform multiple
operations. For example, you can use XLOOKUP in conjunction with SUMIFS to
dynamically calculate totals based on lookup values.
Example:
Suppose you want to find the total sales for a specific product category and sales
representative dynamically based on a user selection.
```excel
=SUMIFS(C2:C100, A2:A100, XLOOKUP(E1, G2:G100, G2:G100), B2:B100, XLOOKUP(F1, H2:H100, H2:H100))
```
Here:
- C2:C100: Sales amount range.
- A2:A100: Sales representative range.
- B2:B100: Product range.
- E1: User input for sales representative.
- F1: User input for product.
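- G2:G100: Range containing valid sales representatives.
- H2:H100: Range containing valid products.
Each XLOOKUP returns the matching entry from its list, so for valid selections the formula
behaves like a SUMIFS on E1 and F1 directly; a selection that is not in the list makes the
XLOOKUP return #N/A instead of a name.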
Conclusion
Using COUNTIFS, SUMIFS, and XLOOKUP effectively allows for robust data analysis in
Excel, enabling you to filter, aggregate, and dynamically look up data. These functions are
particularly powerful when combined, providing a flexible and comprehensive approach to
handling complex datasets.
4. Pivot Tables
5. Conditional Formatting
6. Charts
7. Data Validation in Excel
Data Validation in Excel is a feature that allows you to control the type of data or the values
that users can enter into a cell or range of cells. It ensures data consistency and prevents
invalid data entries, helping maintain the integrity of your data.
a. Settings Tab
This is where you define the criteria that control what data is allowed in the selected
cells.
1. Allow: Choose the type of validation you want to apply from the drop-down list. The
common options include:
- Whole Number: Only allows whole numbers within a specified range.
- Decimal: Allows decimal numbers within a specified range.
- List: Creates a drop-down list of predefined items for the user to select from.
- Date: Only allows dates within a specific range.
- Time: Only allows times within a specific range.
- Text Length: Restricts the number of characters allowed in a text string.
- Custom: Allows you to set up a custom validation formula.
2. Data: Depending on the validation type you select, this field provides options such as:
- Between: Ensures data falls between two values.
- Not Between: Ensures data does not fall between two values.
- Equal to: Ensures data equals a specific value.
- Less than, Greater than: Ensures data is either below or above a specific value.
3. Minimum/Maximum or Specific Values: Define the range or value that will be allowed.
For example, if you chose “Between” for a whole number, you would set the minimum and
maximum values here.
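For the Custom option, the rule is a formula that must evaluate to TRUE for the entry to be accepted. Two sketches, assuming the validated cells start at the hypothetical cell A2 and the rule is applied to A2:A100:

```excel
=AND(ISNUMBER(A2), A2>=1, A2<=100)
=COUNTIF($A$2:$A$100, A2)=1
```
The first accepts only numbers from 1 to 100; the second rejects an entry that already exists elsewhere in the range. Enter one formula per rule in the Formula box.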
- Title: Enter a title for the error message (e.g., "Invalid Entry").
- Error Message: Enter the message that will be displayed when invalid data is entered (e.g.,
"Please enter a value between 1 and 100").
4. Apply and Test the Validation
Once you have set the validation rules, click OK to apply them.
- Test the validation by attempting to enter values that meet and don’t meet the criteria.
Conclusion
Data Validation in Excel is a useful tool for ensuring the accuracy and consistency of data.
By setting specific rules and restrictions, you can prevent incorrect data from being entered
and help maintain the integrity of your analysis.
8. Keyboard Shortcuts & Tricks for Data Analysis in Excel 365
Here are some of the most useful keyboard shortcuts and tricks for data analysis in
Excel 365, designed to improve efficiency while analysing data:
2. Formula Shortcuts
- Alt + =: Automatically insert the SUM function.
- Ctrl + Shift + Enter: Enter a formula as a legacy array formula (rarely needed in Excel 365,
where array results spill automatically).
- Ctrl + ` (Grave Accent): Show all formulas in the worksheet instead of values.
- F4: Toggle between absolute and relative references in a formula (e.g., from A1 to $A$1).
- Ctrl + Shift + L: Turn filters on/off for the selected range.
6. Formatting Data
- Ctrl + 1: Open the Format Cells dialog box to apply custom formatting to numbers, text,
borders, and more.
- Ctrl + Shift + $: Apply currency formatting.
- Ctrl + Shift + %: Apply percentage formatting.
- Ctrl + Shift + !: Apply number formatting with two decimal places.
- Ctrl + Shift + #: Apply date formatting.
Summary
Excel 365 offers powerful keyboard shortcuts and tricks that save time and improve data
analysis workflows. Familiarize yourself with these shortcuts to boost your productivity and
master the art of data analysis in Excel.
9. Dashboard Design for Data Analysis
Dashboard Design in Excel 365 for Data Analysis is an essential skill that allows you to
create visually appealing and interactive reports. Here is a detailed guide on how to design
effective dashboards in Excel 365, including tips, tricks, and key elements you should focus
on.
4. Use Tables for Data Organization
In Excel, use Structured Tables:
- Insert > Table to convert your data into a structured format.
- Tables automatically expand when new data is added and are easy to reference in formulas
and charts.
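A sketch of what referencing a table in a formula looks like, assuming a table named `Sales` with columns `Country` and `Amount` (hypothetical names, not taken from the course workbook):

```excel
=SUMIFS(Sales[Amount], Sales[Country], "India")
```
Because the structured references cover the whole column of the table, the total picks up new rows as soon as they are added, with no need to edit the range.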
8. Conditional Formatting
Use Conditional Formatting to highlight important metrics or changes:
- Go to Home > Conditional Formatting and set rules for cells to change color based on value
ranges (e.g., highlight negative numbers or top performers).
- Conditional formatting is useful for heat maps, traffic lights (red, yellow, green), and other
visual indicators.
- IFERROR: To handle errors in your formulas and prevent them from showing in the
dashboard.
- XLOOKUP: To find values across your dataset dynamically.
- TEXTJOIN, CONCATENATE: To combine text fields dynamically (e.g., combining
names, dates, or comments).
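A few sketches of how these functions might appear in a dashboard, using hypothetical cell references:

```excel
=IFERROR(C2/D2, 0)
=XLOOKUP(E1, A2:A100, C2:C100, "No data")
=TEXTJOIN(", ", TRUE, A2:A5)
```
The first shows 0 instead of `#DIV/0!` when D2 is empty or zero, the second returns "No data" when the value selected in E1 is not found, and the third joins the entries in A2:A5 into one comma-separated label while skipping blank cells.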
Summary:
Building a dashboard in Excel 365 for data analysis involves planning, organizing your data,
designing visualizations, and adding interactive features like slicers, Pivot Tables, and
dynamic charts. Excel 365 provides powerful tools and functionalities that allow you to
create engaging, actionable, and insightful dashboards for business or personal use.