
BA-Unit 2

This lesson focuses on data preparation, summarization, and visualization using spreadsheets, outlining key processes such as data cleaning, sorting, filtering, and validation. It emphasizes the importance of data quality and provides techniques for summarizing and visualizing data effectively. Students will learn to create pivot tables, charts, and interactive dashboards to enhance data analysis and decision-making.


LESSON 2
Data Preparation, Summarisation and Visualisation Using Spreadsheet
Ms. Asha Yadav
Assistant Professor
Department of Computer Science
School of Open Learning
University of Delhi

STRUCTURE
2.1 Learning Objectives
2.2 Data Preparation
2.3 Data Cleaning
2.4 Data Summarization
2.5 Data Sorting
2.6 Filtering Data
2.7 Conditional Formatting
2.8 Text to Column
2.9 Find and Remove Duplicates
2.10 Removing Duplicate Values
2.11 Data Validation
2.12 Identifying Outliers in Data
2.13 Covariance
2.14 Correlation Matrix
2.15 Moving Average
2.16 Finding Missing Values
2.17 Data Summarization
2.18 Data Visualization
PAGE 19
Department of Distance & Continuing Education, Campus of Open Learning,
School of Open Learning, University of Delhi

Business Analytics.indd 19 10-Jan-25 3:51:28 PM


BUSINESS ANALYTICS

2.19 Types of Data Visualizations in Excel


2.20 Pivot Tables
2.21 Pivot Chart
2.22 Interactive Dashboard
2.23 Summary
2.24 Answers to In-Text Questions
2.25 Self-Assessment Questions
2.26 References
2.27 Suggested Readings

2.1 Learning Objectives


After reading this chapter, students will be able to:
Perform data preparation, cleaning, summarization, sorting, filtering, validation and visualization.
Find and remove duplicates.
Calculate covariance and moving average.
Apply conditional formatting and correlation analysis.
Create pivot charts, pivot tables and interactive dashboards.

2.2 Data Preparation


Data preparation is one of the major processes in the pipeline of data
analysis and machine learning. This involves tasks such as data cleaning,
transforming, and arranging raw data in a format that enables effective
analysis or model training. Thus, the aim of data preparation is to ensure
good quality and consistency of data for specific tasks.
Some important steps in data preparation include:
Data Collection: This involves collecting raw data from a large number
of sources, such as databases, spreadsheets, APIs, or even sensors.
Data Cleaning: This involves detecting and correcting errors, inconsistencies, and missing values in a dataset. Treating outliers, duplicate entries, and irrelevant data points is essential at this stage.
Data Transformation: This refers to converting data into a format or structure fit for analysis. It typically involves normalizing or scaling numeric values, encoding categorical variables, and aggregating or disaggregating data.
Data Integration: This combines data from different sources into one dataset. It may involve merging tables, joining datasets, or resolving conflicts between data from different sources.
Data Reduction: This reduces the size or complexity of the dataset through techniques such as feature selection, dimensionality reduction, and sampling.
Data Formatting: This ensures consistency in format, including standardized date formats and variable naming conventions.
Data Splitting: This divides the data into subsets, usually training, validation, and test sets. These sets are used to build a model, tune its hyperparameters, and finally estimate its performance.
Good data preparation is essential for generating valid and accurate insights; if the data quality is low, meaningful conclusions cannot be drawn.
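As a rough illustration of the data splitting step, the following Python sketch shuffles a small list of records and divides it into training, validation and test sets. The 70/15/15 proportions, the fixed seed and the function name `split_dataset` are assumptions made for this example; they are not prescribed by the lesson.

```python
import random

def split_dataset(rows, train=0.70, validation=0.15, seed=42):
    """Shuffle a copy of rows and cut it into train/validation/test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])   # whatever remains is the test set

records = list(range(1, 21))  # 20 hypothetical record IDs
train_set, validation_set, test_set = split_dataset(records)
print(len(train_set), len(validation_set), len(test_set))
```

Shuffling before cutting avoids a split that simply follows the original row order, which may itself carry a pattern (for example, records sorted by date).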

2.3 Data Cleaning


Data cleaning, also known as data cleansing or data scrubbing, is the
process of identifying, correcting, or removing inaccurate, incomplete,
or irrelevant data from a dataset. This step is crucial to ensure that the
data is of high quality, which is essential for accurate analysis, reporting,
and decision-making.

Key Steps in Data Cleaning:


Handling Missing Data: Detect missing values in the dataset, which
can appear as blanks, NA, null, or other placeholders. This data can be
handled by using the following strategies.
Removal: Delete rows or columns with missing values if they are not
critical.
Imputation: Replace missing values with estimates, such as the mean,
median, mode, or by using more advanced methods like predictive modeling.
Placeholder: Leave the missing values as they are, but flag them for
attention in future analysis.
Removing Duplicate Data: Identify duplicate records that may arise from repeated data entry or from merging datasets, and delete them to prevent skewed results in analysis.
Correcting Data Errors: To correct commonly occurring data errors, identify and rectify the following issues:
Inconsistent Data: Fix inconsistencies in formatting (e.g., date formats, text case) or values (e.g., “NY” vs. “New York”).
Data Entry Errors: Identify and correct typographical errors or mis-entered data, such as incorrect numerical values or misspelled words.
Standardizing Data: Data is standardized by normalization or
transformation.
Normalization: Ensure that data is consistent in format, especially for
categorical data (e.g., “Male” vs. “M” or “1/1/2024” vs. “01-Jan-2024”).
Transformation: Convert data into a common scale or unit, such as
converting all weights to kilograms or all prices to a single currency.
Outlier Detection and Treatment: Detect outliers, that is, values that fall outside the expected range. These could be due to errors or may require special attention. Then decide whether to remove, correct, or keep the outliers in the dataset. Sometimes outliers are valid and should be kept; in other cases they need correction or exclusion.
Validating Data Accuracy: Check the data against reliable sources or business rules to confirm its accuracy. Also ensure that the data is logically consistent, for example that all transactions have corresponding dates.
Removing Irrelevant Data: Filter out data that is not relevant to the analysis or that does not contribute useful information. This can include unnecessary columns, outdated records, or noise in the data.
Formatting and Structuring Data: Ensure that data is in the correct format, such as consistent date formats or proper text casing. Also restructure the data to meet the needs of the analysis, for example by pivoting tables or separating combined fields into distinct columns.
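The three strategies for handling missing data described above (removal, imputation and placeholder) can be sketched in a few lines of Python. The score values are hypothetical, and `None` is assumed to mark a missing entry.

```python
from statistics import mean, median

# Hypothetical exam scores; None marks a missing entry.
scores = [72, None, 85, 90, None, 64]

# Removal: keep only the entries that are present.
observed = [s for s in scores if s is not None]

# Imputation: replace each missing entry with an estimate.
fill_mean = mean(observed)
fill_median = median(observed)
mean_imputed = [s if s is not None else fill_mean for s in scores]
median_imputed = [s if s is not None else fill_median for s in scores]

# Placeholder: leave the values alone but flag them for later attention.
flags = [s is None for s in scores]

print(observed)
print(mean_imputed)
print(flags)
```

Which strategy is appropriate depends on how many values are missing and how critical the column is; the sketch only shows the mechanics.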
IN-TEXT QUESTIONS
1. What is the primary goal of data cleaning in a spreadsheet?
(a) To improve the appearance of the spreadsheet
(b) To remove inconsistencies and errors in the data
(c) To format data for printing
(d) To reduce the size of the spreadsheet
2. In data cleaning, what does “imputation” refer to?
(a) Removing unnecessary columns
(b) Filling in missing data with estimated values
(c) Filtering out irrelevant data
(d) Detecting outliers

2.4 Data Summarization


Data summarization is the process of condensing a large dataset into a smaller, presentable form for reporting, analysis, and further examination. It involves extracting central insights and patterns from the data without losing vital information. It gives a quick overview of the structure and general features of the dataset, and hence facilitates further analysis and inference.

Key Techniques in Data Summarization:


Data summarization techniques can be divided into the categories given below:
Descriptive Statistics: Measures of Central Tendency: Summarize data using the mean, median, and mode, which describe the middle of the distribution.
Measures of Dispersion: Describe the variability in the data using the range, variance, and standard deviation.
Percentiles and Quartiles: Provide insight into the distribution by indicating the relative standing of data points.
Data Aggregation: Combine many data points into summary values, for example, total sales by month or the average score across different categories.
Data Grouping: Group data into categories or segments and summarize each group separately. This can be done using techniques like pivot tables, which summarize data along different dimensions.
Visualization: Charts and Graphs: Show trends and distributions of data with bar charts, histograms, pie charts, and line graphs. Box plots can be used to visualize the distribution, central value, and variability of the data, as well as possible outliers.
Dimensionality Reduction: Techniques such as PCA or t-SNE reduce the number of variables in a dataset while keeping as much of the variability of the data as possible.
Text Summarization: Methods for summarizing large documents or text datasets, such as keyword extraction, topic modeling, or abstract generation.
Data Profiling: Reports information about the structure of a dataset, such as the count of missing values, the data types, and the distribution of categorical variables.
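To make aggregation and grouping concrete, the short Python sketch below totals sales by month and averages them by region, much as a pivot table would. The sales records and field layout are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (month, region, amount)
sales = [
    ("Jan", "North", 120), ("Jan", "South", 80),
    ("Feb", "North", 150), ("Feb", "South", 100),
    ("Mar", "North", 90),
]

total_by_month = defaultdict(int)        # aggregation: one total per month
amounts_by_region = defaultdict(list)    # grouping: amounts collected per region
for month, region, amount in sales:
    total_by_month[month] += amount
    amounts_by_region[region].append(amount)

# Summarize each group separately with its average.
average_by_region = {region: sum(values) / len(values)
                     for region, values in amounts_by_region.items()}

print(dict(total_by_month))
print(average_by_region)
```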

Why Summarize Data?


Simplifies Analysis: Summarization makes large datasets easier to analyze and interpret. It also helps to quickly identify patterns and trends.
Facilitates Decision-Making: Summaries present data in a way that helps stakeholders make decisions.
Improves Reporting: Summaries of data are used in most reports, dashboards, and presentations for effective communication.
In other words, data summarization reduces complex datasets to their essential parts so that analysts, stakeholders, and decision-makers can understand the information and act on it.

2.5 Data Sorting


Sorting helps users to organize data in a specific order. A text column can be sorted in alphabetical order (A-Z or Z-A), a numerical column from largest to smallest or smallest to largest, and a date and time column from oldest to newest or newest to oldest. In this section, we will see how data can be sorted in MS Excel.
Example: Sorting a column in descending order.
Step 1: Select the data and use the shortcut key Ctrl + Shift + L, which adds drop-down arrows to the column headers.
Step 2: Click on the down arrow on the column. Select Largest to Smallest or Z to A.
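The same descending sort can be expressed outside Excel. The Python sketch below sorts a small table by a numeric column (Largest to Smallest) and by a text column (Z to A); the product rows are hypothetical.

```python
# Hypothetical (product, units sold) rows.
rows = [("Pen", 340), ("Notebook", 125), ("Stapler", 60), ("Eraser", 590)]

# Largest to Smallest on the numeric column.
largest_to_smallest = sorted(rows, key=lambda row: row[1], reverse=True)

# Z to A on the text column.
z_to_a = sorted(rows, key=lambda row: row[0], reverse=True)

print(largest_to_smallest)
print(z_to_a)
```

As in Excel, whole rows move together, so each product stays attached to its own value.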

2.6 Filtering Data


Filters are used to temporarily hide some of the data in a table. This helps users to focus on the data that is important for the task at hand.
Example: Filter a range of data
Step 1: Select the column on which to apply the filter.
Step 2: Select Data > Filter.
Step 3: Select the column header arrow.
Step 4: In case of text data, uncheck the values that you do not want to see (as shown in figure).


For filtering data on numeric values, you can even select a comparison,
like Between to see only the values that lie in a given range.

Step 5: Click on OK.
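A filter simply keeps the rows that satisfy a condition and hides the rest. The following Python sketch mimics a text filter (checked values) and a numeric Between filter; the rows and the limits 100 and 400 are assumptions chosen for the example, and the Between range is treated as inclusive at both ends.

```python
# Hypothetical (product, units sold) rows.
rows = [("Pen", 340), ("Notebook", 125), ("Stapler", 60), ("Eraser", 590)]

# Text filter: only the checked values remain visible.
checked = {"Pen", "Eraser"}
text_filtered = [row for row in rows if row[0] in checked]

# Number filter "Between": keep values inside an inclusive range.
low, high = 100, 400
between = [row for row in rows if low <= row[1] <= high]

print(text_filtered)
print(between)
```

The original list `rows` is left unchanged, just as filtering in Excel hides rows without deleting them.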


2.7 Conditional Formatting
Conditional Formatting allows users to fill cells with a certain color depending on a condition. This enhances data visualization and interpretation. It also helps in identifying patterns in data. Let us see how conditional formatting can be done in MS Excel.
Example: Highlight cells that have a value greater than 350.
Step 1: Select the range of cells on which conditional formatting has to be applied.
Step 2: On the Home tab, in the Styles group, click Conditional Formatting.
Step 3: Click Highlight Cells Rules > Greater Than....
Step 4: Enter the desired value and select the formatting style.
Step 5: Click OK.
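The logic behind the Greater Than rule is a simple test applied to every cell in the selected range. A minimal Python sketch of that test, using invented sales figures and the threshold 350 from the example, is given below.

```python
THRESHOLD = 350  # the value used in the example above

# Hypothetical cell values in the selected range.
sales = [120, 410, 350, 980, 275]

# True marks a cell that the rule would highlight.
highlight = [value > THRESHOLD for value in sales]

for value, flagged in zip(sales, highlight):
    marker = "<-- highlight" if flagged else ""
    print(f"{value:>5} {marker}")
```

Note that 350 itself is not highlighted, because the rule is strictly greater than the threshold.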

2.8 Text to Column
The Text to Columns feature is used to split the data in a single column into multiple columns. This enhances readability of the data. For example, if a single column contains first name, last name and profession, this information can be separated into different columns, so that each column holds atomic values. Note that this separation is possible only if the values in each cell are separated by the same delimiter. The delimiter can be a comma, semicolon, space, or another character. Let us see how we can split data in MS Excel.
Step 1: Select the cell or column that contains the text to be split.
Step 2: Select Data > Text to Columns.
Step 3: In the Convert Text to Columns Wizard displayed on the screen,
select Delimited > Next.
Step 4: Select the Delimiters for your data.
Step 5: Select Next.
Step 6: Preview the split and select Finish.
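Splitting on a delimiter can be sketched in Python as follows. The function name `text_to_columns` and the sample names are hypothetical; the sketch assumes every cell uses the same delimiter, as the feature itself requires.

```python
def text_to_columns(cells, delimiter=","):
    """Split each cell on the delimiter and trim spaces around the parts."""
    return [[part.strip() for part in cell.split(delimiter)] for cell in cells]

# Hypothetical cells holding first name, last name and profession together.
combined = ["Meera,Nair,Teacher", "Ravi, Kumar, Engineer"]
columns = text_to_columns(combined)

for first_name, last_name, profession in columns:
    print(first_name, "|", last_name, "|", profession)
```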

2.9 Find and Remove Duplicates


Duplicate data is sometimes useful, but it often just makes the data harder to understand. It is better to find, highlight, and review the duplicates before removal than to remove all of them straightaway.

2.10 Removing Duplicate Values


To remove duplicate values in MS Excel, follow the steps given below:
Step 1: Select the range of cells from which duplicate values have to be removed.
Step 2: Select Data > Remove Duplicates.
Step 3: In the Remove Duplicates dialog box, keep checked the columns that should be compared to identify duplicate records, and uncheck the others.
Step 4: Click OK.
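The effect of choosing columns when removing duplicates can be illustrated with a short Python sketch. It keeps the first occurrence of each record and compares only the selected columns; the function name `remove_duplicates` and the customer rows are assumptions made for the example.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each distinct combination of the key columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical (customer id, name, city) records.
rows = [
    ("C001", "Meera", "Delhi"),
    ("C002", "Ravi", "Pune"),
    ("C001", "Meera", "Delhi"),   # exact repeat of the first row
    ("C003", "Ravi", "Pune"),     # same name and city, different id
]

all_columns = remove_duplicates(rows, key_columns=[0, 1, 2])
name_and_city = remove_duplicates(rows, key_columns=[1, 2])
print(all_columns)
print(name_and_city)
```

Comparing fewer columns removes more rows, which is why reviewing the duplicates before removal is advisable.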


IN-TEXT QUESTIONS
3. What does Conditional Formatting allow you to do in a spreadsheet?
(a) Apply formulas automatically
(b) Highlight cells based on certain criteria
(c) Change data values based on formatting
(d) Sort data based on custom rules
4. To highlight only duplicate values in a range of data using
Conditional Formatting, which rule would you apply?
(a) Text that contains
(b) Top/Bottom Rules
(c) Highlight Cell Rules > Duplicate Values
(d) New Rule > Use a Formula

2.11 Data Validation


Excel is a powerful tool for data analysis, reporting, and decision-making. However, the reliability of these activities depends on the accuracy and integrity of the data. Data validation helps users control the input to ensure accuracy and consistency.
While validating data, specific criteria for accepting data in cell(s) are set. This restricts users from entering invalid data. Thus, data validation not only enhances the accuracy, reliability and integrity of data, but also cuts the time spent on manually checking and correcting data entries. In Excel, this can be done using the steps given below:
Step 1: Select the cells for data validation.
Step 2: In the Data tab, click on Data Validation to open the Data Validation dialog box.
Step 3: In the Data Validation dialog box, under the Settings tab, define the validation criteria:
Allow: Select the type of data. This data can be Whole Number, Decimal,
List (only values from a predefined list are allowed), Date, Time, Text Length
(only text of a certain length is allowed). The last option is Custom which
is used for more complex criteria and can be specified using a formula.
Data: Specify the condition (e.g., between, not between, equal to, not
equal to, etc.).
Minimum/Maximum: Enter the acceptable range or limits based on the
above selection. For example, to allow values between 100 and 1000,
select “Whole Number,” “between,” and then set the minimum to 100
and the maximum to 1000.
Optionally, you can configure an Input Message that will appear when the cell is selected. For this, click on the Input Message tab in the dialog box. Give a brief title for the input message box and enter the guidance text that will appear when someone selects the cell. The guidance text tells the user what type of data to enter.
Another optional feature in MS Excel is a customized Error Alert. To set it up, under the Error Alert tab, specify what should happen if the user enters invalid data:
Show error alert after invalid data is entered: Check this to enable error alerts.
Style: Choose from Stop, Warning, or Information to indicate the severity of the alert.
Title: Enter a title for the error message box.
Error Message: Type the message to be displayed. It should explain the error and suggest ways to correct it.
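The rule from the example above (Whole Number, between 100 and 1000) together with an error message can be sketched as a small Python function. The function name `validate_whole_number` and the message wording are hypothetical; the sketch only mirrors the idea of accepting or rejecting an entry.

```python
def validate_whole_number(value, minimum=100, maximum=1000):
    """Return (is_valid, message) for a 'Whole Number, between' rule."""
    # bool is excluded explicitly because Python treats True/False as integers.
    if isinstance(value, bool) or not isinstance(value, int):
        return False, "Enter a whole number."
    if not minimum <= value <= maximum:
        return False, f"Enter a value between {minimum} and {maximum}."
    return True, ""

for entry in [250, 99, 1000, 12.5, "abc"]:
    print(repr(entry), validate_whole_number(entry))
```

Returning a message alongside the result plays the role of the Error Alert: it explains the problem and suggests how to correct it.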

2.12 Identifying Outliers in Data


Outliers, if present, affect the accuracy, reliability and usability of data during analysis, visualization and interpretation. Therefore, it is very important to identify outliers and minimize their effect in order to avoid the discrepancies they might cause.
An outlier is a data point or a set of values that is significantly different from the average or expected range. The presence of outliers can distort results when forecasting crucial values. Thus, to ensure the accuracy of data reports, we need to identify the outliers, assess their impact and minimize them. To handle outliers in MS Excel, follow the steps given below:
Review the Data: Errors can creep in while entering or transferring data. So, review the data to ensure there are no typos or other errors that create inaccuracies. This can be done manually or by using automated tools.
Sort the Data Values: Sort the data so that unusually small or large values appear at the ends of the list. We have already seen how data can be sorted in MS Excel.
Analyze Data Values: After sorting the values, identify large discrepancies and outliers. Such values can be deleted straightaway, but a better option is to remove only statistical anomalies.
Identify Data Quartiles: To find the outliers in the data, first calculate the quartiles using Excel’s QUARTILE function. In an empty cell, type “=QUARTILE(” followed by the first and last cells of the data range separated by a colon, then a comma, the quartile you want, and a closing parenthesis. For example, =QUARTILE(A5:A50, 1) returns the first quartile of the values in cells A5 to A50 (the 25th percentile, or the value below which 25% of data points fall when arranged in increasing order), and =QUARTILE(B2:B200, 3) returns the third quartile of the values in cells B2 to B200.
Define the Interquartile Range (IQR): The IQR represents the spread of the middle 50% of the data, which is not affected by outlier values. It is calculated by subtracting the first quartile from the third quartile.
Calculate the Upper and Lower Bounds: The upper and lower bounds allow identification of values that are higher than expected (above the upper bound) or lower than expected (below the lower bound).
Calculate the upper bound by multiplying the IQR by 1.5 and adding it to the third quartile. The formula can be given as “= Q3 + (1.5 * IQR)”. Similarly, to find the lower bound, multiply the IQR by 1.5 and subtract it from the first quartile. The formula can be given as “= Q1 - (1.5 * IQR)”.
Remove the Outliers: After defining the upper and lower bounds, review the data to identify values that are higher than the upper bound or lower than the lower bound. These values are statistical outliers and can be deleted for more accurate analysis or visualization reports.
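The quartile, IQR and bound calculations can be checked with a short Python sketch. The data values are hypothetical. The sketch uses `statistics.quantiles` with `method="inclusive"`, which is assumed here to interpolate in the same way as Excel's QUARTILE function; results should be compared with Excel before relying on that assumption.

```python
from statistics import quantiles

# Hypothetical data containing one suspiciously large value.
data = [12, 15, 14, 10, 18, 16, 13, 95, 11, 17]

# Quartiles Q1, Q2 (median) and Q3.
q1, _, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                   # interquartile range
lower_bound = q1 - 1.5 * iqr    # values below this are outliers
upper_bound = q3 + 1.5 * iqr    # values above this are outliers

outliers = [v for v in data if v < lower_bound or v > upper_bound]
cleaned = [v for v in data if lower_bound <= v <= upper_bound]

print(q1, q3, iqr)
print(lower_bound, upper_bound)
print(outliers)
```

Here the lower bound is obtained by subtracting 1.5 times the IQR from the first quartile, and the upper bound by adding the same amount to the third quartile.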

2.13 Covariance
Covariance is a statistical measure of the joint variability of two random variables, given two sets of data. To calculate covariance in Excel, use the COVARIANCE.P function. The syntax is =COVARIANCE.P(array1, array2), where
Array1 is a range or array of numeric values.
Array2 is a second range or array of numeric values.

Note the following points:
If the given arrays contain text or logical values, they are ignored by the COVARIANCE.P function.
The data should contain numbers, or names, arrays, or references that contain numbers. Cells that do not contain numeric data are ignored.
The data sets should be of the same size, with the same number of data points; otherwise a #N/A error is returned.
The data sets should not be empty; otherwise a #DIV/0! error is returned.
To find covariance in Excel and determine if there is any relation between
the two columns C and D, we can write =COVARIANCE.P(C1:C10,D1:D10).
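Mathematically, covariance is calculated as:
$$\mathrm{COV}(X, Y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$$
where $\bar{x}$ and $\bar{y}$ are the means of the two data sets and $n$ is the number of data points.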
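The population covariance formula can be worked through in Python as a check. The function name `covariance_p` and the advertising and sales figures are invented for illustration; the function divides by n, which corresponds to the population form used by COVARIANCE.P.

```python
def covariance_p(xs, ys):
    """Population covariance: mean of the products of deviations from the means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("both data sets need the same non-zero length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical advertising spend and sales figures.
advertising = [2, 4, 6, 8]
sales = [10, 20, 30, 40]

print(covariance_p(advertising, sales))
```

With means 5 and 25, the products of deviations are 45, 5, 5 and 45, which sum to 100; dividing by n = 4 gives 25. The positive sign indicates that the two columns tend to move in the same direction.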

2.14 Correlation Matrix


A correlation matrix is a table that displays the correlation coefficients for different variables. The matrix shows the correlation between all possible pairs of variables in a table. Such a table is very useful for summarizing a large dataset and for identifying and visualizing patterns in the given data.
A correlation matrix consists of rows and columns that show the correlation coefficient between the variables. It is helpful in the analysis of multiple linear regression models, where several independent variables are present.
Correlation is a statistical measure that describes the extent to which
two or more variables are related to each other. It indicates the strength
and direction of a relationship between variables. When two variables
are correlated, a change in one variable is associated with changes in
another—either positively or negatively.
Positive Correlation: When the values of two variables increase or decrease together, they are said to be positively correlated. For example, height and weight are positively correlated; as height increases, weight tends to increase as well.
Negative Correlation: When two variables are negatively correlated, an increase in one variable is associated with a decline in the other. For example, speed and travel time are negatively correlated: when speed increases, it takes less time to reach the destination.
Figure shows this concept graphically.

In Excel, the CORREL function returns the Pearson correlation coefficient for two sets of values. Its syntax is CORREL(array1, array2), where Array1 is the first range of values and Array2 is the second range of values. The two arrays should have equal length.
Assuming we have a set of independent variables (x) in B2:B13 and dependent variables (y) in C2:C13, our correlation coefficient formula is as follows:
=CORREL(B2:B13, C2:C13)
However, remember that:
If cells in an array contain text, logical values or blanks, they are ignored.
If the arrays are of different lengths, a #N/A error is returned.
If either of the arrays is empty or if the standard deviation of their values equals zero, a #DIV/0! error is returned.
The PEARSON function in Excel does the same thing. It calculates the Pearson Product Moment Correlation coefficient. The syntax of this function is PEARSON(array1, array2), where Array1 is a range of independent values and Array2 is a range of dependent values. Continuing with the same data, we can write =PEARSON(B2:B13, C2:C13) as shown in the figure.
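To show how a correlation coefficient and a correlation matrix are obtained, the Python sketch below computes the Pearson coefficient for every pair of variables. The function name `pearson` and the three variables are hypothetical, and the values are deliberately perfectly linear so that the coefficients are easy to verify by hand.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("both lists need the same non-zero length")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)  # ZeroDivisionError if either list is constant

# Hypothetical measurements for three variables.
variables = {
    "height": [150, 160, 170, 180],
    "weight": [50, 58, 66, 74],
    "absences": [8, 6, 4, 2],
}

# Correlation matrix: one coefficient for every pair of variables.
matrix = {a: {b: round(pearson(va, vb), 2) for b, vb in variables.items()}
          for a, va in variables.items()}

for name, row in matrix.items():
    print(name, row)
```

A list with zero standard deviation makes the denominator zero, which is the situation in which Excel reports a division error.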

Interpreting Correlation Analysis Results


In the correlation matrix, the coefficients are shown at the intersection of rows and columns. Where the row and the column refer to the same variable, the coefficient is 1. A negative coefficient indicates an inverse relationship between the dependent and the independent variable, which is stronger the closer the value is to -1. Correspondingly, a positive coefficient indicates a direct relationship, which is stronger the closer the value is to +1.

2.15 Moving Average


Moving average, also known as rolling average, running average or moving mean, is defined as a series of averages for different subsets of the same data set. This measure is frequently used in statistics, seasonally-adjusted economic analysis and weather forecasting to get insights into underlying trends. For example, in stock trading, moving average gives the average value of a security over a given period of time. Similarly, in business, the moving average of sales for the last 3 months is calculated to understand market trends. To forecast weather, the moving average of three-month temperatures is calculated.
We can compute different types of moving average: simple (or arithmetic), exponential, variable, triangular, and weighted. In this section, let us see how to calculate the simple moving average. In Excel, a simple moving average can be calculated using formulas or shown on a chart using the trendline option.
A simple moving average can be calculated using the AVERAGE function. Given a list of average monthly temperatures in column B, the moving average for the first 3 months can be calculated as =AVERAGE(B2:B4) or =SUM(B2:B4)/3. To find subsequent averages, the formula can be copied down to the other rows.
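Copying the AVERAGE formula down the column amounts to sliding a fixed-size window over the data. A minimal Python sketch of a 3-month simple moving average follows; the function name `simple_moving_average` and the temperature values are assumptions for illustration.

```python
def simple_moving_average(values, window=3):
    """Average of each run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical average monthly temperatures (as in column B).
temperatures = [15, 18, 21, 27, 33, 36]

print(simple_moving_average(temperatures))
```

The first result averages months 1 to 3, the second averages months 2 to 4, and so on, so six values produce four 3-month averages.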

To visualize the moving average on a chart by drawing a trendline, follow the steps given below:
Step 1: Click anywhere in the chart.
Step 2: On the Layout tab, in the Analysis group, select the trendline option.
Step 3: Click the desired option.


IN-TEXT QUESTIONS
5. What is the main purpose of data validation in spreadsheets?
(a) To perform mathematical calculations on data
(b) To ensure that data entered meets specific criteria
(c) To visualize data using charts
(d) To automatically sort data
6. What is an outlier in a dataset?
(a) A value that is similar to other values
(b) A value that falls within the Interquartile Range (IQR)
(c) A value significantly different from other values in the
dataset
(d) A missing or blank value
7. Which statistical method can be used to detect outliers using
quartiles?
(a) Standard deviation
(b) Z-score
(c) Interquartile Range (IQR)
(d) Median

2.16 Finding Missing Values
Excel does not have a dedicated function to list missing values. However, identifying them is important for the following reasons:
Data Integrity: Ensures that the dataset is complete.
Data Reconciliation: Facilitates the reconciliation process (mostly used in finance).
Quality Assurance: Helps identify anomalies or data entry errors.
Efficient Analysis: Supports accurate data analysis by spotting and addressing gaps.
List Missing Values in Excel
To identify and list missing values in Excel, you can use the following functions:
IF, ISNUMBER and MATCH Functions:
IF: Returns one value if a condition is true and another if it is false.
ISNUMBER: Checks if a value is a number.
MATCH: Searches for a value in a range and returns its relative position.
Example: If column A is expected to contain the values 1 to 100, then the missing values can be identified by entering the following formula in cell B1:
=IF(ISNUMBER(MATCH(ROW(A1), A:A, 0)), "", ROW(A1))
Note that the syntax of the MATCH function is
MATCH(lookup_value, lookup_array, [match_type])
where,
lookup_value is the value to be matched in the lookup_array.
lookup_array is the range of cells being searched.
match_type is optional. It can have the value -1, 0, or 1. The default value is 1. The argument specifies how Excel matches lookup_value with values in lookup_array.
Now, drag and apply the formula from B1 to B100. Column B will then display each missing value, leaving the cell blank for values that are present.
The missing values can also be isolated by using the Filter feature on column B to exclude blank cells, so that only the missing numbers are displayed.
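The idea behind the IF, ISNUMBER and MATCH formula is to test every expected value against the list and report those that are not found. The Python sketch below does the same for the range 1 to 100; the function name `missing_values` and the three removed numbers are hypothetical.

```python
def missing_values(values, start=1, end=100):
    """Return the expected numbers from start to end that do not appear in values."""
    present = set(values)
    return [n for n in range(start, end + 1) if n not in present]

# Hypothetical column A: the numbers 1 to 100 with three entries left out.
column_a = [n for n in range(1, 101) if n not in (7, 42, 99)]

print(missing_values(column_a))
```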

2.17 Data Summarization


Data summarization in Excel can be done in multiple ways, such as:
Using Descriptive Statistics: For example, given a list of values in column A, we can use Excel functions to summarize the values.
SUM, AVERAGE, MEDIAN: Calculate the total, mean, and median of a dataset.
Example: =SUM(A2:A100) sums all values in the range A2 to A100.
Example: =AVERAGE(A2:A100) calculates the average.
COUNT, COUNTA: Count the number of cells containing numbers (COUNT) or any non-empty value (COUNTA).
Example: =COUNT(A2:A100) counts numeric entries.
STDEV.P, VAR.P: Calculate the standard deviation and variance of a dataset, treating it as the entire population (STDEV.S and VAR.S are the sample equivalents).
Example: =STDEV.P(A2:A100) for standard deviation.
MIN, MAX: Find the smallest and largest values.
Example: =MIN(A2:A100) and =MAX(A2:A100).
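For readers who want to cross-check these functions, the following Python sketch computes the same summary measures with the standard `statistics` module. The sample column is made up for illustration, and the mapping to Excel functions in the comments is an approximation (for instance, COUNTA is imitated here by counting entries that are not `None`).

```python
import statistics

# Hypothetical column with numbers, a text entry and an empty cell (None).
column = [12, 15, "n/a", 15, None, 18, 20]

# Keep only the numeric entries, as SUM, AVERAGE, COUNT etc. would.
numbers = [v for v in column if isinstance(v, (int, float)) and not isinstance(v, bool)]

summary = {
    "sum": sum(numbers),                               # SUM
    "average": statistics.mean(numbers),               # AVERAGE
    "median": statistics.median(numbers),              # MEDIAN
    "count": len(numbers),                             # COUNT
    "counta": sum(v is not None for v in column),      # COUNTA (non-empty cells)
    "stdev_p": statistics.pstdev(numbers),             # STDEV.P (population)
    "var_p": statistics.pvariance(numbers),            # VAR.P (population)
    "min": min(numbers),                               # MIN
    "max": max(numbers),                               # MAX
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The population versions (`pstdev`, `pvariance`) are used to match STDEV.P and VAR.P; `statistics.stdev` and `statistics.variance` would correspond to the sample versions.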

2.18 Data Visualization


Data visualization helps users transform raw data into meaningful visual stories that enable them to spot trends in data and communicate complex information effectively.

Microsoft Excel provides different types of charts to visualize data in the spreadsheet. To draw a chart, follow the steps given below:
Step 1: Organize the data in rows and columns within the Excel sheet. Every row and column should be labelled clearly to identify the data to be visualized.
Step 2: Select the data by clicking and dragging the mouse to highlight the data to be visualized. In this selection, include the row and column headers (as shown in the figure).

Step 3: Choose a chart type by clicking on the “Insert” tab. In the “Charts”
section, select the required chart option (Column, Line, Pie, Bar, Area,
Scatter, etc.) by clicking on the dropdown arrow below the chart type.

Step 4: Insert the chart. Once the desired chart is selected, it is automatically created and inserted in the worksheet. It can then be clicked and dragged to change its position, or resized using the sizing handles at the corners.
Step 5: Customize the chart. Click on the chart to select it. Two additional tabs now appear: “Design” and “Format”. Use these tabs to customize the chart’s appearance, style and layout. Important information such as the chart title, axis labels, legend and data labels can be added to enhance visualization and data interpretation.

Step 6: Edit the data (optional). In case you wish to make changes to
the data, simply edit it in the worksheet. Excel will automatically update
the chart to reflect the changes.

2.19 Types of Data Visualizations in Excel


Excel offers a variety of charts to visualize data. Some commonly used charts are:
Column Chart: It displays data using vertical bars, with each bar representing a category. A column chart is preferred when values have to be compared across categories or when trends over time have to be visualized.
Bar Chart: It is similar to a column chart, but it uses horizontal bars instead of vertical bars. Bar charts are usually used to compare values across categories when the category names are long or there are several categories.

Line Chart: The line chart plots data points and then connects these
points by lines. These lines show trends or change in values over time.
Line charts are widely used for continuous data like stock prices or
temperature measurements.

Pie Chart: A pie chart plots data as slices of a circle. The size of each slice is proportional to the value it represents. That is, it represents the proportion of each category within a whole.

Scatter Plot: A scatter plot displays data points on a Cartesian coordinate system, with each axis representing a variable. These charts depict the relationship between two variables and help identify patterns or correlations.

Thus, these charts in Excel help users understand the composition, distribution and overlap of data. After effectively visualizing data, users can present meaningful and engaging stories to decision makers.
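The proportionality rule described for pie charts can be made concrete with a small calculation. The Python sketch below is an illustration only; the function name `pie_slices` and the regional sales figures are assumptions. It converts each category value into its share of the whole, both as a percentage and as the angle of the slice in degrees.

```python
def pie_slices(values):
    """Map each category to (percentage of total, slice angle in degrees)."""
    total = sum(values.values())
    return {
        label: (value * 100 / total, value * 360 / total)
        for label, value in values.items()
    }


# Hypothetical sales per region.
sales = {"North": 50, "South": 30, "East": 20}

slices = pie_slices(sales)
for label, (percent, angle) in slices.items():
    print(f"{label}: {percent:.1f}% of the pie, {angle:.1f} degrees")
```

Because the shares always add up to 100% (360 degrees), a pie chart suits part-to-whole comparisons rather than trends over time.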

2.20 Pivot Tables


Pivot tables are an important part of MS Excel that allow users to quickly summarize large amounts of data, analyze numerical data in detail, and answer unanticipated questions about the data. Such a table is specifically designed to query data in a user-friendly and interactive way.
For example, consider the dataset given in the figure. There are 6 columns and 213 rows.

To insert a pivot table, follow the steps given below:


Step 1: Click any single cell inside the data set.
Step 2: Click on the Insert tab and locate the Tables group.
Step 3: Click on PivotTable.
Step 4: In the dialog box that appears, Excel automatically selects the data, and the default location for a new pivot table is New Worksheet.
Step 5: Click on OK.
Step 6: Now, drag the fields. For example, to get the total amount exported of each product, drag the Product field to the Rows area, the Amount field to the Values area and the Country field to the Filters area.
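What the pivot table computes in Step 6 can be sketched in a few lines of Python. This is a simplified illustration, not how Excel implements pivot tables: the function `pivot_sum` and the sample export records are hypothetical, with field names borrowed from the example above (Product, Country, Amount).

```python
from collections import defaultdict

# Hypothetical export records.
rows = [
    {"Product": "Apple", "Country": "France", "Amount": 100},
    {"Product": "Banana", "Country": "France", "Amount": 40},
    {"Product": "Apple", "Country": "Canada", "Amount": 60},
    {"Product": "Banana", "Country": "Canada", "Amount": 25},
    {"Product": "Apple", "Country": "France", "Amount": 30},
]


def pivot_sum(rows, row_field, value_field, filter_field=None, filter_value=None):
    """Sum value_field for each distinct row_field, optionally filtered."""
    totals = defaultdict(int)
    for record in rows:
        # The Filters area: skip records that do not match the chosen value.
        if filter_field is not None and record[filter_field] != filter_value:
            continue
        # The Rows area groups the records; the Values area is summed.
        totals[record[row_field]] += record[value_field]
    return dict(totals)


all_countries = pivot_sum(rows, "Product", "Amount")
france_only = pivot_sum(rows, "Product", "Amount", "Country", "France")
print(all_countries)  # {'Apple': 190, 'Banana': 65}
print(france_only)    # {'Apple': 130, 'Banana': 40}
```

Changing the filter value corresponds to picking a different country in the Filters drop-down of the pivot table.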

2.21 Pivot Chart


A pivot chart is a dynamic visualization tool that helps users summarize and analyze large datasets. Trends and patterns can be easily identified with pivot charts.
Pivot charts are used to present complex data in a clear, concise, interactive and flexible manner. For example, with a pivot chart, users can easily visualize how sales vary across different geographical regions, product categories, or specific time periods.
To insert a pivot chart using data from the pivot table, follow the steps
given below:
Step 1: Click any cell inside the pivot table.
Step 2: On the PivotTable Analyze tab, click on PivotChart in the Tools group.

Step 3: Click OK on the Insert Chart dialog box.


The pivot chart will appear on the screen. Any changes made in the pivot
chart are immediately reflected in the pivot table and vice versa.

2.22 Interactive Dashboard


In Microsoft Excel, an interactive dashboard is usually a one-page report that allows business users to track and measure crucial business KPIs and metrics in one place. It combines charts, figures, and tables to help users visualize complicated data in an easy-to-understand format. An interactive dashboard can be created by following the steps given below:
Step 1: Define the Purpose of the Dashboard. For this, you must have clear answers to two questions: why is the dashboard being created, and for whom? Different stakeholders or departments within the organization want to analyze different facts and figures. For example, the Chief Financial Officer (CFO) focuses on key financial metrics, while an investor is interested in a summarized dashboard of all the departments. So, it is important to understand the purpose of the dashboard and then collect data around it for accurate and effective decision-making.
Step 2: Gather Data in the form of a table and then summarize this table with a pivot table. This is done as follows:
(i) Select the table.
(ii) In the Insert tab, click on PivotTable.
(iii) Click on OK. The pivot table will be inserted in a new sheet.

If you want three pivot charts on the interactive dashboard, you need three pivot tables. You can simply duplicate the pivot table sheet in the Excel workbook.
Step 3: Create Charts using the Pivot Table. For example:
The first chart would represent every product’s monthly sales. This chart needs three fields: Sales, Product, and Month. In the pivot table sheet, drag and drop the Month field into the Rows area, Product into the Columns area, and Sales into the Values area.

Step 4: On the PivotTable Analyze tab, click on PivotChart and select a suitable chart from the chart drop-down.
Step 5: Click on OK. The pivot chart will be created.
Once the chart is created, you can style it using formatting options. Click on the + sign beside the chart and then tick the following items:
Chart Title to change the title of the chart.
Legend to enable, disable, or edit the legend.
Axes to edit the horizontal axis and vertical axis of the chart.

Data Table to insert a table showing all the values plotted in the chart.

After formatting the chart, it can be moved to the Interactive dashboard sheet. Repeat the same steps to create other pivot charts and place them on the interactive dashboard.
Step 6: Add Interactive Features to the dashboard design. For this, select any chart and click on the PivotChart Analyze tab.
Step 7: Click on Insert Timeline. To insert a timeline for a pivot chart, there must be a Date column in the data. Make sure that the Date checkbox is ticked before you press OK.
Step 8: Like a timeline, a slicer can also be added to the interactive dashboard. A slicer is essentially a filter presented as a set of clickable buttons. To add a slicer, perform the following steps:
(i) Click on one of the pivot charts to activate it.
(ii) In the Insert tab, click on the Slicer option.
(iii) From the list of all the fields, tick the ones you want to slice and dice the data by.
Before the slicer can filter every chart on the dashboard, it must be connected to the charts. To connect the slicer:
(a) Click on the slicer to activate it.
(b) In the Slicer tab, click on the Report Connections button.
(c) From the list of pivot tables, check all the boxes.
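The dashboard logic above (several pivot summaries driven by one shared slicer) can be sketched in Python as follows. Everything in this snippet is hypothetical and chosen only for illustration: the record layout, the Region field used as the slicer, and the function names. It does not reproduce Excel's behaviour in detail; it shows why connecting one slicer to all pivot tables keeps every chart consistent.

```python
from collections import defaultdict

# Hypothetical records: (date as "YYYY-MM-DD", region, product, sales)
RECORDS = [
    ("2024-01-15", "North", "Pen", 120),
    ("2024-01-20", "South", "Book", 300),
    ("2024-02-03", "North", "Book", 150),
    ("2024-02-10", "South", "Pen", 80),
    ("2024-02-25", "North", "Pen", 30),
]


def apply_slicer(records, region=None):
    """Keep only records for the chosen region; None means no filter."""
    return [r for r in records if region is None or r[1] == region]


def monthly_sales_by_product(records):
    """Month in Rows, Product in Columns, sum of Sales in Values."""
    table = defaultdict(lambda: defaultdict(int))
    for date, _region, product, sales in records:
        table[date[:7]][product] += sales  # date[:7] is the "YYYY-MM" month
    return {month: dict(columns) for month, columns in table.items()}


def total_sales_by_product(records):
    """A second summary: Product in Rows, sum of Sales in Values."""
    totals = defaultdict(int)
    for _date, _region, product, sales in records:
        totals[product] += sales
    return dict(totals)


def dashboard(records, region=None):
    """Feed the same slicer selection into every summary."""
    selected = apply_slicer(records, region)
    return {
        "monthly": monthly_sales_by_product(selected),
        "totals": total_sales_by_product(selected),
    }


print(dashboard(RECORDS))
print(dashboard(RECORDS, region="North"))
```

Because both summaries are computed from the same filtered records, selecting a region changes them together, which is the effect of ticking all the boxes under Report Connections.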


IN-TEXT QUESTIONS
8. Which of the following is the most suitable chart type for
displaying the proportion of different categories in a dataset?
(a) Line Chart
(b) Scatter Plot
(c) Pie Chart
(d) Histogram
9. Which of the following operations can you perform using a
pivot table?
(a) Filter data based on specific criteria
(b) Create complex formulas
(c) Sort data in a specific column
(d) All of the above
10. Which type of chart is commonly used in pivot charts to show
data changes over time?
(a) Bar Chart
(b) Pie Chart
(c) Line Chart
(d) Scatter Plot
11. What is a common benefit of using a dashboard for data analysis?
(a) It provides detailed data without summarization
(b) It allows for real-time monitoring of key metrics
(c) It removes the need for data visualization
(d) It only displays raw data without analysis

2.23 Summary
The aim of data preparation is to ensure good quality and consistency of data for specific tasks. During data preparation, we need to detect outliers, that is, values that fall outside the expected range. These unexpected values could be due to errors or may require special attention. We need to decide whether to remove, correct, or leave outliers in the dataset. Sometimes outliers are valid and should be kept, but in other cases, they may need correction or exclusion.
Moreover, to validate the accuracy of the data, check it against reliable sources or business rules. Also ensure that the data is logically consistent, for example by checking that all transactions have corresponding dates.
Data summarization transforms a large dataset into a smaller, usually more presentable form for reporting, analysis, and further examination. It involves extracting central insights and patterns from data without losing vital information. Pivot tables are an important part of MS Excel that allow users to quickly summarize large amounts of data, analyze numerical data in detail, and answer unanticipated questions about the data. Correspondingly, a pivot chart is a dynamic visualization tool that helps users summarize and analyze large datasets. Trends and patterns can be easily identified with pivot charts.

2.24 Answers to In-Text Questions


1. (b) To remove inconsistencies and errors in the data
2. (b) Filling in missing data with estimated values
3. (b) Highlight cells based on certain criteria
4. (c) Highlight Cell Rules > Duplicate Values
5. (b) To ensure that data entered meets specific criteria
6. (c) A value significantly different from other values in the dataset
7. (c) Interquartile Range (IQR)
8. (c) Pie Chart
9. (d) All of the above
10. (c) Line Chart
11. (b) It allows for real-time monitoring of key metrics