Data Analytics (PRACTICAL)
example
Cleaning and preparing a messy dataset for analysis in Excel involves several steps that can be done using
Excel's built-in data cleaning tools. Below is a step-by-step example, illustrating how to clean data efficiently.
Let's assume we have a dataset of sales data that contains issues such as missing values, duplicates, and
inconsistent formatting.
1. Remove Duplicates:
Problem: Missing Customer Name, Product, Sales Amount, and Date in some rows.
How to Fix:
o For "Customer Name": You can manually fill missing values (e.g., use the name from other
similar rows or mark it as 'Unknown'). Alternatively, you can use Excel's Find & Replace (press
Ctrl + H) to replace NULL with an appropriate value like "Unknown."
o For "Product": Similarly, manually fill or use a value like "Unknown Product" where missing.
o For "Sales Amount": If sales data is missing, you can fill it with an average or estimate based
on other entries.
o For "Date": Use a placeholder like "01/01/2025" or an average date, or ask the data source for
clarification.
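The filling rules above can also be sketched outside Excel. The short Python example below is only an illustration: the rows, the column names, and the choice of the mean as the fill value for amounts are assumptions, not part of the original dataset.

```python
from statistics import mean

# Hypothetical rows; None marks a missing cell.
rows = [
    {"customer": "John Smith", "product": "Laptop", "amount": 1200.0},
    {"customer": None, "product": "Phone", "amount": 800.0},
    {"customer": "Jane Doe", "product": None, "amount": None},
]

def fill_missing(rows):
    """Return new rows with text gaps labelled and amount gaps set to the mean."""
    known = [r["amount"] for r in rows if r["amount"] is not None]
    average = mean(known)  # average of the amounts that are present
    filled = []
    for r in rows:
        filled.append({
            "customer": r["customer"] or "Unknown",
            "product": r["product"] or "Unknown Product",
            "amount": r["amount"] if r["amount"] is not None else average,
        })
    return filled

cleaned = fill_missing(rows)
```

Filling with an average keeps totals plausible but hides the fact that a value was estimated, so it may be worth keeping a note of which cells were filled.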
Example after filling missing values:
Problem: Sales Amount column has a dollar sign, and some numbers might be formatted as text.
How to Fix:
o Select the Sales Amount column.
o Go to the Data tab and click on Text to Columns.
o Choose Delimited, click Next, then click Next again, choose General as the column data format,
and click Finish. General converts numeric text to numbers.
o Ensure that the values are numbers and not text by using Format Cells (Ctrl + 1) and choosing
Currency.
Result: All sales amounts are properly formatted as numbers with dollar signs, and calculations (like
sum, average) can now work correctly.
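To see why text-formatted amounts break calculations, here is a minimal Python sketch of the same conversion. The sample strings and the function name parse_amount are hypothetical.

```python
def parse_amount(text):
    """Convert a currency string such as "$1,200.50" to a float."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return float(cleaned)

# Hypothetical raw cells: currency text, padded text, and a plain number as text.
raw = ["$1,200.50", " $300 ", "45"]
amounts = [parse_amount(value) for value in raw]
total = sum(amounts)  # summing works only after the conversion
```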
Problem: Some sales amounts are unusually high or low (e.g., $0 for "John Smith" on Order 130).
How to Fix:
o Review the Sales Amount column.
o Apply conditional formatting (highlight cells that are below a certain value or unusually high) to
visually inspect.
o Manually review or adjust based on business logic (e.g., setting a minimum value of $100 for
sales).
How to Apply Conditional Formatting:
o Select the Sales Amount column.
o Go to Home > Conditional Formatting > New Rule.
o Choose a rule, like Cell Value greater than or less than certain values, to identify outliers.
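The highlighting rule amounts to a simple threshold test. The sketch below assumes hypothetical order numbers and amounts (only the $0 value on Order 130 and the $100 minimum come from the text) and an assumed upper limit of 10000.

```python
def flag_outliers(amounts_by_order, low, high):
    """Return the order numbers whose amount is below low or above high."""
    return [order for order, amount in amounts_by_order.items()
            if amount < low or amount > high]

# Hypothetical sales amounts keyed by order number.
amounts_by_order = {128: 450.0, 129: 520.0, 130: 0.0, 131: 98000.0}

flagged = flag_outliers(amounts_by_order, low=100, high=10000)
```

The flagged orders still need a manual review, as described above; the rule only points at candidates.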
Conclusion:
By using Excel's Remove Duplicates, Text to Columns, Format Cells, Conditional Formatting, and other
built-in tools, you can clean and prepare your dataset for analysis effectively.
2) Use Excel's pivot tables and charts to explore and visualize data from a large dataset, with
example
Using Excel’s Pivot Tables and Pivot Charts is an excellent way to explore and visualize data, especially
when working with large datasets. Here’s a step-by-step guide on how to use Pivot Tables and Pivot Charts to
summarize, analyze, and visualize data.
Let’s assume we have a large dataset with sales information across different regions, products, and sales
representatives.
After inserting the Pivot Table, a PivotTable Field List will appear on the right side.
Rows: Drag the Region field to the Rows area. This will group the data by region.
Columns: Drag the Product field to the Columns area to separate sales data by product type.
Values: Drag the Sales Amount field to the Values area. By default, it will sum the sales amount for
each product in each region.
Filter (optional): You can add Sales Rep to the Filters area to filter by specific sales reps if needed.
The Pivot Table will now show a summary of total sales by product and region:
Region        Laptop   Phone   Tablet   Grand Total
East          $2300    $750    $500     $3550
North         $3650    -       $600     $4250
South         $2200    $800    $550     $3550
Grand Total   $8150    $1550   $1650    $11350
The Column Chart will show the sales of each product type (Laptop, Phone, Tablet) in each region
(East, North, South).
This will give you a clear visualization of the total sales in each region by product type.
6. Further Exploration:
You can explore more by filtering data by specific sales reps, dates, or any other field.
Slicers can also be added to your PivotTable for an interactive way to filter data:
o Select your PivotTable, go to the Insert tab, and click Slicer.
o Choose a field (e.g., Region or Sales Rep) to create an interactive slicer to filter data
dynamically.
1. Total Sales by Region and Product: From the Pivot Table, you can easily see that North region has
the highest sales for Laptop.
2. Trend by Product Type: The Pivot Chart visually shows that Laptops are the most sold product across
all regions.
3. Quick Summary: The Grand Total row in the Pivot Table and Pivot Chart provides an overview of
total sales, making it easier to compare overall performance.
Conclusion:
Using Pivot Tables and Pivot Charts in Excel allows you to efficiently analyze and visualize large datasets. By
summarizing data in a pivot table and representing it with a pivot chart, you can uncover trends, compare
categories, and gain valuable insights into your data quickly and interactively.
3) Use conditional formatting to highlight important data trends and outliers, with example
Conditional formatting in Excel (or similar tools like Google Sheets) allows you to format cells based on their
values or other criteria, making it easier to visually spot important trends, patterns, and outliers in your data.
Here’s how to use conditional formatting to highlight important data trends and outliers, with examples:
Example Data:
You can highlight the top performers or the lowest values in the dataset.
The top 2 sales values (450 and 700) will be highlighted with the chosen formatting.
Now, the product with the lowest profit (Product B) will be highlighted.
You can highlight outliers that are above or below a specific threshold.
Any sales value above 500 (like Product E's 700 sales) will now be highlighted.
Color scales are useful to show gradients in data, helping you see trends across a range of values.
Now, the cells with higher profits (like Product E with 200) will be in green, while lower profits (like Product B
with 30) will be in red.
Data bars create a bar chart-like effect within the cells, showing the relative value of each cell in comparison
to others.
Now, each sales value will have a data bar inside the cell, giving a quick visual indication of how large or small
the sales values are relative to each other.
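The rules in this section reduce to simple comparisons. The Python sketch below assumes the five sales values used in the later AVERAGEIF example of this document (A to E) and draws data bars as text, scaled roughly in proportion to the largest value. Excel's own bar scaling may differ.

```python
# Sales per product, assumed to match the example data of this section.
sales = {"A": 200, "B": 150, "C": 450, "D": 350, "E": 700}

def top_n(values, n):
    """Keys of the n largest values (like a Top N Items rule)."""
    return sorted(values, key=values.get, reverse=True)[:n]

def above(values, threshold):
    """Keys whose value exceeds the threshold (like a Greater Than rule)."""
    return [key for key, value in values.items() if value > threshold]

def data_bars(values, width=20):
    """Text bars whose length is proportional to each value."""
    largest = max(values.values())
    return {key: "#" * round(value / largest * width)
            for key, value in values.items()}

top_two = top_n(sales, 2)
over_500 = above(sales, 500)
bars = data_bars(sales)
```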
4] Use Excel charting tools to create a scatter plot and identify the correlation between two
variables, explain with example
Creating a scatter plot in Excel is a great way to visually identify the correlation between two variables. A
scatter plot helps you see if there is any relationship or pattern between the data points.
Let’s use an example where we want to examine the relationship between Sales and Profit for different
products. The data is as follows:
1. Highlight the Sales and Profit columns (excluding the product names, as these are just identifiers).
1. You can add chart elements like Chart Title, Axis Titles, and Data Labels by clicking on the Chart
Elements button (the plus icon next to the chart).
2. Add Axis Titles:
o For the X-Axis (horizontal), label it as "Sales".
o For the Y-Axis (vertical), label it as "Profit".
In the scatter plot, each point represents a product, with Sales on the x-axis and Profit on the y-axis. You will
be able to visually identify if there’s any pattern or relationship.
Example:
The products are scattered across the graph based on their sales and profit.
Product E (with sales of 700 and profit of 200) will be in the upper-right corner, representing high sales
and high profit.
Product B (with sales of 150 and profit of 30) will be toward the bottom-left, representing low sales and
low profit.
Step 5: Identifying the Correlation
Positive Correlation: If the points trend upward from left to right, it suggests that as one variable
increases, the other does as well. This means Sales and Profit might have a positive correlation.
o In our case, as Sales increase, Profit also seems to increase, so we might expect a positive
correlation.
In our case, the scatter plot shows an upward trend. To measure its strength, add a linear trendline and
display its R-squared value: a value close to 1 together with an upward-sloping line indicates a strong
positive linear relationship between Sales and Profit. R-squared alone does not show the direction.
If the scatter plot shows a clear upward trend and the R-squared value is high (e.g., 0.95), Sales and
Profit are strongly positively correlated: products with higher sales tend to have higher profits.
Correlation alone does not prove that higher sales cause higher profits.
If the points are scattered with no clear direction, the correlation is weak or nonexistent.
This method is useful for quickly assessing and understanding relationships between two sets of data and
confirming the strength of the correlation.
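The strength of the relationship can be checked numerically with the Pearson correlation coefficient r, whose square equals the R-squared of a straight-line fit. In the sketch below only Product B (150, 30) and Product E (700, 200) come from the text; the other profit values are hypothetical placeholders.

```python
from math import sqrt

# Products A to E. Profits for A, C and D are hypothetical.
sales = [200, 150, 450, 350, 700]
profit = [50, 30, 120, 90, 200]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)

r = pearson_r(sales, profit)
r_squared = r ** 2  # matches R-squared for a linear fit of y on x
```

A positive r close to 1 means the points lie near an upward-sloping line; a value near 0 means no linear pattern.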
5] Use Excel data filtering and sorting tools to explore a large dataset, with example
Suppose we have a large dataset containing sales data for a retail store. The dataset includes columns such as Date,
Product Category, Product Name, Quantity Sold, and Revenue. We'll use Excel's data filtering and sorting tools
to explore this dataset.
Here's how you can explore the dataset using Excel's data filtering and sorting tools:
Filter Data:
Select any cell within your dataset, then go to the Data tab and click Filter.
Dropdown arrows appear in the header row of each column.
Click on the dropdown arrow in the 'Product Category' column, for example.
You can then select specific product categories you want to view. This filters the dataset to show only rows
that match the selected product categories.
Sort Data:
Select any cell within your dataset, then go to the Data tab and click Sort (or use the Sort A to Z and Sort Z to A buttons).
For example, you can sort the data by 'Date' in ascending or descending order to see sales trends over time.
You can also sort the data by 'Revenue' in descending order to identify top-selling products.
Advanced Filtering:
Excel also offers advanced filtering options: on the Data tab, click Advanced in the Sort & Filter group.
In the Advanced Filter dialog box, you can specify complex criteria to filter the dataset based on specific
conditions.
For example, you can filter the dataset to show only products with revenue greater than a certain value or
products sold in a specific date range.
Data Subtotals:
Excel allows you to add subtotals to your filtered data.
You can choose which column to subtotal and which function to use (e.g., sum, average, count).
This can be useful for getting a summary of sales data within specific categories or time periods.
Remove Filters:
To remove filters, simply click on the dropdown arrow in the header row of the filtered column and select
'Clear Filter From [Column Name]'.
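Sorting cannot be cleared the way a filter can. To restore the original order, press Ctrl + Z immediately after
sorting, or add a numbered index column before sorting and sort by that column afterwards.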
By using these filtering and sorting tools in Excel, you can efficiently explore and analyze large datasets, such
as the sales data for a retail store, and gain valuable insights into your data.
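The filter and sort operations described in this section can be sketched in a few lines of Python. The records below are hypothetical; only the column names follow the dataset described above.

```python
# Hypothetical retail records with the columns named in this section.
records = [
    {"date": "2025-01-03", "category": "Electronics", "product": "Phone", "quantity": 2, "revenue": 1200.0},
    {"date": "2025-01-01", "category": "Clothing", "product": "Jacket", "quantity": 5, "revenue": 400.0},
    {"date": "2025-01-02", "category": "Electronics", "product": "Laptop", "quantity": 1, "revenue": 1500.0},
    {"date": "2025-01-04", "category": "Grocery", "product": "Coffee", "quantity": 10, "revenue": 90.0},
]

def filter_rows(rows, column, allowed):
    """Keep rows whose value in column is one of the allowed values."""
    return [row for row in rows if row[column] in allowed]

def sort_rows(rows, column, descending=False):
    """Return a sorted copy; the original list keeps its order."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)

electronics = filter_rows(records, "category", {"Electronics"})
by_revenue = sort_rows(records, "revenue", descending=True)
by_date = sort_rows(records, "date")  # ISO dates sort correctly as text
high_revenue = [row for row in records if row["revenue"] > 1000]
```

Because sort_rows returns a copy, the original order is never lost, which is the same idea as keeping an index column in Excel.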
6] Use Excel Remove Duplicates feature to identify and remove duplicate entries in a dataset,
explain with example
The Remove Duplicates feature in Excel is a powerful tool that allows you to quickly identify and remove
duplicate entries in your dataset. This is especially helpful when you have data with repeated information that
could affect analysis or reporting.
Example Scenario:
Let’s say you have a dataset with a list of customer names and their purchases, but some customers are listed
multiple times. We want to remove duplicate entries based on the customer name and purchase date.
Here is the dataset:
Customer Name   Purchase Date   Amount
John Smith      01/01/2025      100
Jane Doe        01/02/2025      150
John Smith      01/01/2025      100
Emily Clark     01/03/2025      200
Jane Doe        01/02/2025      150
John Smith      01/05/2025      200
Goal: Remove duplicate entries (in this case, customers listed multiple times) from the dataset.
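As an illustration of what the feature does, the Python sketch below removes rows that repeat the same customer name and purchase date, keeping the first occurrence of each. The function name and the tuple layout are assumptions made for the example.

```python
# The dataset above as (customer name, purchase date, amount) rows.
rows = [
    ("John Smith", "01/01/2025", 100),
    ("Jane Doe", "01/02/2025", 150),
    ("John Smith", "01/01/2025", 100),
    ("Emily Clark", "01/03/2025", 200),
    ("Jane Doe", "01/02/2025", 150),
    ("John Smith", "01/05/2025", 200),
]

def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of key columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Compare on customer name (column 0) and purchase date (column 1).
unique_rows = remove_duplicates(rows, key_columns=(0, 1))
```

John Smith's purchase on 01/05/2025 stays, because its date differs from his earlier entry; comparing on the name alone would drop it.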
7] Use Excel SUMIFS function to sum data based on multiple criteria, explain with example
The SUMIF function in Excel sums values based on a single criterion. To sum values based on multiple
criteria, use the SUMIFS function, which extends SUMIF to accept several conditions.
Example Scenario:
Let's assume you have a sales dataset with the following columns:
Date         Region   Salesperson   Sales Amount
01/01/2025   North    John          100
02/01/2025   South    Jane          150
03/01/2025   North    John          200
04/01/2025   East     Mike          300
05/01/2025   South    Jane          250
06/01/2025   North    Mike          400
07/01/2025   East     John          150
08/01/2025   South    Mike          100
Goal: Sum sales for the North region and John the salesperson.
Example Breakdown:
Here's the dataset again with the specific criteria:
Date         Region   Salesperson   Sales Amount
01/01/2025   North    John          100
02/01/2025   South    Jane          150
03/01/2025   North    John          200
04/01/2025   East     Mike          300
05/01/2025   South    Jane          250
06/01/2025   North    Mike          400
07/01/2025   East     John          150
08/01/2025   South    Mike          100
For "North" and "John", the Sales Amounts are:
100 (01/01/2025) and 200 (03/01/2025).
Total = 300.
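The same total can be reproduced with a short Python sketch that applies both conditions to every row. In Excel, assuming the data sits in A2:D9 with Region in column B, Salesperson in column C, and Sales Amount in column D, a formula of the form =SUMIFS(D2:D9, B2:B9, "North", C2:C9, "John") would be expected to return the same value; the cell references are an assumption.

```python
# The dataset above as (date, region, salesperson, sales amount) rows.
rows = [
    ("01/01/2025", "North", "John", 100),
    ("02/01/2025", "South", "Jane", 150),
    ("03/01/2025", "North", "John", 200),
    ("04/01/2025", "East", "Mike", 300),
    ("05/01/2025", "South", "Jane", 250),
    ("06/01/2025", "North", "Mike", 400),
    ("07/01/2025", "East", "John", 150),
    ("08/01/2025", "South", "Mike", 100),
]

def sum_where(rows, region, salesperson):
    """Sum the amounts of rows that match both the region and the salesperson."""
    return sum(amount for _, row_region, row_person, amount in rows
               if row_region == region and row_person == salesperson)

north_john_total = sum_where(rows, "North", "John")
```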
Additional Considerations:
You can use as many criteria as needed by adding additional pairs of criteria ranges and criteria.
The criteria can be numbers, text, or even expressions (e.g., ">200" to sum values greater than 200).
The SUMIFS function applies AND logic: a row is included only if it meets every criterion. For OR logic,
add the results of separate SUMIFS formulas, one per alternative.
Conclusion:
The SUMIFS function is a great tool for performing sums based on multiple conditions. By using this
function, you can easily aggregate data based on more than one criterion, making it invaluable for reporting and
analysis.
8] Use Excel COUNTIF function to count data based on a specific condition, explain with
example
The COUNTIF function in Excel is used to count the number of cells that meet a specific condition or criteria.
This is useful when you want to count items that match a certain condition, such as counting how many times a
particular value appears or how many values meet a specific threshold.
Syntax of the COUNTIF Function:
COUNTIF(range, criteria)
range: The group of cells you want to apply the condition to.
criteria: The condition or value that you want to count in the specified range.
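A minimal Python sketch of the same counting logic is shown below. The sample values are assumptions; in Excel, a formula such as =COUNTIF(B2:B6, ">300") would apply the first condition to an assumed range B2:B6.

```python
def count_if(values, condition):
    """Count the values for which condition(value) is true."""
    return sum(1 for value in values if condition(value))

# Hypothetical sample data.
sales = [200, 150, 450, 350, 700]
regions = ["North", "South", "North", "East"]

count_above_300 = count_if(sales, lambda value: value > 300)
count_north = count_if(regions, lambda value: value == "North")
```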
9] Use Excel AVERAGEIF function to calculate the average of data based on a specific condition, explain with
example
The AVERAGEIF function in Excel calculates the average of the cells in a specified range that meet a certain
condition or criteria. It is especially useful when you want to compute averages based on specific criteria (e.g.,
the average of sales greater than a certain number, or the average sales for a specific product category).
Syntax of the AVERAGEIF Function:
AVERAGEIF(range, criteria, [average_range])
range: The group of cells that you want to evaluate based on the criteria.
criteria: The condition that you want to apply. This can be a number, text, expression, or even a cell
reference.
average_range (optional): The actual cells to average. If omitted, Excel will average the values in the
range itself.
Example 1: Calculating the Average Sales for Products with Sales Greater Than 300
Product Sales
A 200
B 150
C 450
D 350
E 700
Goal: Calculate the average sales for products where the sales are greater than 300.
Step-by-Step Process:
1. Select the cell where you want the result to appear (e.g., cell B7).
2. Enter the following formula:
=AVERAGEIF(B2:B6, ">300")
Explanation:
Range (B2:B6): The sales data for which we will apply the condition.
Criteria (">300"): We want to average the sales values that are greater than 300.
Result:
The formula averages the sales values that are greater than 300 and returns 500, that is,
(450 + 350 + 700) / 3. The products that meet this condition are:
Product C (450)
Product D (350)
Product E (700)
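The result can be verified with a minimal Python sketch of the same calculation. The function name is hypothetical; the data is the table from Example 1.

```python
# Sales for products A to E from Example 1.
sales = [200, 150, 450, 350, 700]

def average_if(values, condition):
    """Average the values for which condition(value) is true."""
    selected = [value for value in values if condition(value)]
    if not selected:
        # Excel reports a #DIV/0! error when no cell matches the criteria.
        raise ValueError("no values match the condition")
    return sum(selected) / len(selected)

average_above_300 = average_if(sales, lambda value: value > 300)
```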
Now, let’s assume you have sales data with product categories, and you want to calculate the average sales for a
specific product.
Goal: Calculate the average sales for products in the "Electronics" category.
Step-by-Step Process:
1. Select the cell where you want the result to appear (e.g., cell C7).
2. Enter the following formula:
Explanation:
Product A (200)
Product C (450)
Product E (700)
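This case uses the optional average_range argument: the condition is tested on one column and the average is taken from another. The Python sketch below assumes that Products A, C, and E are the Electronics items and invents a second category name for B and D. With categories in B2:B6 and sales in C2:C6 (an assumed layout), the Excel formula would take the form =AVERAGEIF(B2:B6, "Electronics", C2:C6).

```python
# Products A to E. "Furniture" is a hypothetical label for B and D.
categories = ["Electronics", "Furniture", "Electronics", "Furniture", "Electronics"]
sales = [200, 150, 450, 350, 700]

def average_for(criteria_values, wanted, average_values):
    """Average average_values on the rows where criteria_values equals wanted."""
    # Note: this comparison is case-sensitive, unlike Excel's text matching.
    picked = [amount for label, amount in zip(criteria_values, average_values)
              if label == wanted]
    return sum(picked) / len(picked)

electronics_average = average_for(categories, "Electronics", sales)
```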
You can also use logical conditions with the AVERAGEIF function to perform more advanced calculations.
Goal: Calculate the average sales for products where the sales are between 200 and 500.
1. Select the cell where you want the result (e.g., cell B7).
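2. Enter the following formula:
=AVERAGEIFS(B2:B6, B2:B6, ">=200", B2:B6, "<=500")
o AVERAGEIF accepts only one condition, so a range with both a lower and an upper bound needs
AVERAGEIFS, which averages only the cells that meet every condition.
o The first condition keeps sales greater than or equal to 200, and the second keeps sales less than or
equal to 500. For the data in Example 1 this averages 200, 450, and 350, giving about 333.33.
o Subtracting one AVERAGEIF result from another does not produce this value, because averages of
overlapping groups cannot be subtracted to isolate a subgroup.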
The average_range is optional. If it’s not provided, Excel averages the values in the range itself.
Logical operators like >, <, >=, and <= can be used as criteria.
The AVERAGEIF function works with both numbers and text (such as categories or specific product
names).
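For the goal of averaging sales between 200 and 500, both bounds have to be tested on every value at the same time. A minimal Python sketch, using the Example 1 data and a hypothetical function name:

```python
# Sales for products A to E from Example 1.
sales = [200, 150, 450, 350, 700]

def average_between(values, low, high):
    """Average the values that satisfy low <= value <= high."""
    picked = [value for value in values if low <= value <= high]
    return sum(picked) / len(picked)

between_average = average_between(sales, 200, 500)  # averages 200, 450 and 350
```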
Conclusion:
The AVERAGEIF function in Excel is powerful for calculating averages based on specific criteria. Whether
you are calculating averages based on numerical conditions, specific text, or even complex logical conditions,
this function makes it easy to quickly get insights from your data.
10] Use Excel pivot table to calculate total sales by region and product category
A Pivot Table in Excel is a powerful tool that helps summarize and analyze data quickly by grouping,
aggregating, and filtering data. In this example, we’ll use a Pivot Table to calculate the total sales by region
and product category.
1. First, make sure your data is in tabular format (with headers at the top of each column).
2. Select the entire data range (for example, A1:D10 in this case).
Once the Pivot Table field list appears on the right side, you will see the Product, Sales, Region, and Category
fields available to use.
1. By default, the Sales field will be summed, but you can check this by clicking on the drop-down arrow
next to Sum of Sales in the Values area.
2. Ensure that Sum is selected (this should be the default). If it's showing something else like Count or
Average, you can change it to Sum.
Your Pivot Table will now display the total sales grouped by both Region and Category.
The "Region" field is used to group the data by region (North, South, West, East).
The "Category" field is used to further break down the sales by product category within each region.
The "Sum of Sales" shows the total sales for each category in each region.
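What the Pivot Table computes can be sketched as a grouped sum. The records below are hypothetical; only the field names Region, Category, and Sales follow the example.

```python
from collections import defaultdict

# Hypothetical (region, category, sales) records.
records = [
    ("North", "Electronics", 500.0),
    ("North", "Furniture", 300.0),
    ("South", "Electronics", 200.0),
    ("North", "Electronics", 250.0),
    ("South", "Furniture", 400.0),
]

def pivot_sum(records):
    """Sum sales per (region, category) cell and per region."""
    cells = defaultdict(float)
    row_totals = defaultdict(float)
    for region, category, sales in records:
        cells[(region, category)] += sales
        row_totals[region] += sales
    return dict(cells), dict(row_totals)

cells, row_totals = pivot_sum(records)
grand_total = sum(row_totals.values())
```

Each key in cells corresponds to one Sum of Sales value in the Pivot Table, and row_totals corresponds to the subtotal shown for each region.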
Changing the number format: Right-click the Sum of Sales values, choose Number Format, and
select Currency or Number with appropriate decimal places.
Sorting: You can sort the regions and categories by sales in descending or ascending order. Right-click
on a value in the Pivot Table and choose Sort.
You can filter the data by dragging fields to the Filters area.
Expand/Collapse the grouping: Click on the + or - symbols to expand or collapse the data by region or
category.
Multiple values: You can add more than one field to the Values area to get additional calculations, such
as Average Sales or Count of Sales.
11] Use Excel line chart to plot the trends of sales over time
A line chart in Excel is ideal for showing trends over time. It allows you to visualize how a data series
changes, such as sales, over a period of time.
Example Scenario:
Let’s say you have the following data representing sales for the first six months of the year:
Month Sales
January 200
February 250
March 300
April 350
May 400
June 450
Goal: Plot the trends of sales over time using a line chart.
Step-by-Step Process:
Step 1: Organize Your Data
Make sure your data is organized in two columns: one for the time periods (months) and one for the sales data.
Month Sales
January 200
February 250
March 300
April 350
May 400
June 450
Step 2: Select Your Data
1. Select the data range that you want to plot on the line chart. In this case, select the Month and Sales
columns (from A1:B7).
Step 3: Insert the Line Chart
1. Go to the Insert tab in Excel’s ribbon.
2. In the Charts group, click on the Line Chart icon.
3. Select the Line with Markers option (this shows a line with a point for each data value).
Step 4: Customize the Line Chart (Optional)
1. Chart Title: You can edit the chart title to something more descriptive, such as "Sales Trend Over
Time".
o Click on the chart title and type your desired title.
2. Axis Titles: To add axis titles:
o Click the Chart Elements button (the plus sign next to the chart).
o Check the Axis Titles box.
o Edit the axis titles:
For the X-Axis (horizontal), label it as "Month".
For the Y-Axis (vertical), label it as "Sales".
Step 5: Review the Results
The line chart will display the sales trend over the six months. The months will be plotted on the X-axis, and
the sales values will be plotted on the Y-axis.
Example of What the Line Chart Will Look Like:
The line chart will show a line starting at 200 for January, gradually rising as the sales increase month by
month, ending at 450 in June. The markers on the line will represent the actual sales data points for each month.
Additional Customizations (Optional):
Changing Line Style: You can change the style or color of the line by clicking on it and using the
Format options in the ribbon.
Data Labels: To display the sales values on the chart directly, you can add data labels:
o Click the Chart Elements button again, and check the Data Labels box.
12] Use Excel fill handle to quickly fill in missing data in a dataset
In Excel, you can use a fill handle to quickly fill in missing data in a dataset. The fill handle is a small square at
the bottom-right corner of a selected cell, and it can be dragged to automatically fill adjacent cells with values
based on patterns in your data.
Let’s go through the steps to use the fill handle effectively to fill in missing data in a dataset.
Example Scenario:
Suppose you have a dataset with missing sales data for certain months, and you want to fill in those missing
values using the data you already have.
Month Sales
January 200
February 250
March
April 350
May
June 450
You want to fill in the missing sales data in March and May based on the existing data.
Step-by-Step Process:
In this case, the sales values are increasing every month. So, we can use the fill handle to fill the
missing values based on a pattern.
1. First, select the two cells that have the data you want to extend. In this case, select the Sales values for
January (200) and February (250). (Cells B2 and B3).
1. After selecting the two cells (B2 and B3), you will notice a small square at the bottom-right corner of the
selected cells. This is the fill handle.
2. Hover your mouse pointer over the fill handle. The cursor will change to a small black cross.
3. Click and drag the fill handle down through B4 (March) and B5 (April) to B6 (May). Dragging fills
every cell it passes over, so April's cell is overwritten with the series value, which here matches the
existing 350.
4. When you release the mouse button, Excel will automatically fill in the cells with data that follows the
pattern you started. For this example, it will calculate the missing sales values, increasing by 50 units
per month.
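When two numeric cells are selected, the fill handle extends them as a linear series using the difference between them as the step. A minimal Python sketch of that rule, with a hypothetical function name:

```python
def fill_series(first, second, count):
    """Extend a linear series of count values from its first two values."""
    step = second - first
    return [first + step * i for i in range(count)]

# January (200) and February (250) extended through May.
filled = fill_series(200, 250, 5)
```

The series depends only on the two starting cells, so it is worth checking the filled values against any existing figures further down the column.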
Step 4: Review the Filled Data
After filling in the data, your dataset will look like this:
Month Sales
January 200
February 250
March 300
April 350
May 400
June 450
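Excel has filled in the missing sales values for March and May by extending the step of 50 between the
two selected cells (January and February). The filled values agree with the existing April and June
figures, which confirms that the pattern fits.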
Conclusion:
The fill handle is a powerful and quick tool in Excel for filling in missing data in your dataset. By identifying
patterns in your data, Excel can automatically extend those patterns to fill in gaps, saving you time and reducing
errors. This feature is especially useful for datasets with consistent increments or sequences.