Business Analytics Notes
Unit - II
Visualisation Using Spreadsheet
Data Preparation
Data preparation is the process of
gathering, combining, structuring,
and organising data using various
analytical or business intelligence
tools and techniques. It acts as a
bridge between raw data and
insights and enhances data quality
and usability.
Applications of Data Preparation
1. Enhanced Result Reliability
2. Identification and Resolution of Data Issues
3. Informed Decision-Making
4. Cost Reduction
5. Time and Resource Savings
6. Financial Growth and Return on Investment (ROI)
Data Preparation Process
1) Data Collection
Gathering relevant data from various internal and external sources. Here, the data professionals who
collect the data must ensure that the data aligns with the objectives of the planned analytics
applications.
3) Data Cleansing
In this step, anomalies are corrected to create a complete dataset. It involves handling missing values,
removing duplicates, resolving inconsistencies, and addressing any data quality problems.
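The cleansing operations described above can also be sketched outside the spreadsheet. The following Python example is only an illustration, assuming a hypothetical list of records with name, region and amount fields (these names are not from the notes). It resolves inconsistent text, removes duplicate rows, and fills a missing amount with the mean of the known amounts.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw = [
    {"name": "Asha ", "region": "north", "amount": 120},
    {"name": "Asha", "region": "North", "amount": 120},   # duplicate once text is normalised
    {"name": "Ravi", "region": "South", "amount": None},  # missing amount
    {"name": "Meena", "region": "south", "amount": 80},
]

def cleanse(records):
    # 1. Resolve inconsistencies: strip stray spaces, use one capitalisation style.
    normalised = [
        {"name": r["name"].strip(),
         "region": r["region"].strip().title(),
         "amount": r["amount"]}
        for r in records
    ]
    # 2. Remove duplicates, keeping the first occurrence of each row.
    seen, unique = set(), []
    for r in normalised:
        key = (r["name"], r["region"], r["amount"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 3. Handle missing values: fill with the mean of the known amounts.
    fill = mean(r["amount"] for r in unique if r["amount"] is not None)
    for r in unique:
        if r["amount"] is None:
            r["amount"] = fill
    return unique

clean = cleanse(raw)
```

Mean imputation is only one possible choice here; deleting the row or using the median would be equally valid depending on the analysis.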
4) Data Structuring
In this step, the data is organised or modelled to meet the requirements of the analytics applications,
for example by converting formats (e.g., CSV to tables).
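As a rough sketch of the CSV-to-table conversion mentioned above, the Python example below reads CSV text into rows keyed by column name. The CSV content and the column names (product, month, amount) are assumed purely for illustration.

```python
import csv
import io

# Hypothetical CSV text, as it might arrive from an export.
csv_text = "product,month,amount\nTea,Jan,250\nCoffee,Jan,400\nTea,Feb,300\n"

# DictReader turns each data line into a row keyed by the header names.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# CSV values are read as text, so numeric columns are converted explicitly.
for row in rows:
    row["amount"] = int(row["amount"])

header = list(rows[0].keys())
```

Each row now behaves like one record of a table, and the header list plays the role of the column titles.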
The Welcome page is the first page that users see when they open MS-Excel. This page provides
quick access to various options and templates, allowing users to begin their work efficiently.
Ribbon
A Ribbon is known as a toolbox of MS-Excel. It is located at the top of the screen. A Ribbon has
three main components. These are: Tabs, Groups, and Commands.
a. Tabs : A tab is an activity area of MS-Excel. For example, the 'File' tab has the tools used to store or save the data;
the 'Page layout' tab enables the user to set the layout of the Excel worksheet.
b. Groups : Groups show related tools together. For example, the 'Font' group shows all the tools related to text
formatting, and the 'Alignment' group shows all the tools related to alignment settings.
c. Commands : A command is a tool that is used to carry out operations. A command can be in the form of a button,
a box to enter information, or an expandable menu to choose an option from.
Quick Access Toolbar
Quick Access Toolbar is a customizable
toolbar. It is located at the top of the MS-Office
window. It contains a set of commands that are
frequently used.
• Filtering enables users to display only the data that meets certain criteria, such as
showing records for specific regions or dates. This functionality is crucial for narrowing
down large datasets to focus on relevant information.
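The idea behind filtering can be sketched in Python as keeping only the rows that satisfy the chosen criteria. The sales records, field names and the helper filter_rows below are hypothetical and used only for illustration.

```python
# Hypothetical sales records.
sales = [
    {"region": "North", "date": "2024-01-15", "amount": 120},
    {"region": "South", "date": "2024-01-20", "amount": 90},
    {"region": "North", "date": "2024-02-03", "amount": 200},
    {"region": "East",  "date": "2024-02-10", "amount": 75},
]

def filter_rows(rows, **criteria):
    """Keep only the rows whose fields equal every given criterion."""
    return [
        r for r in rows
        if all(r[field] == value for field, value in criteria.items())
    ]

# Filter by a specific region.
north_only = filter_rows(sales, region="North")

# Filter by date: keep only the records from January 2024.
january = [r for r in sales if r["date"].startswith("2024-01")]
```

The original data is left unchanged; filtering only controls which rows are displayed.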
Sorting can be classified into three levels, viz., single level, multi-level, and custom
sorting:
A. Single level sorting
• In single level sorting, the data is sorted based on a single criterion or column in ascending or
descending order.
Step 1: Select the columns from the table whose data is to be sorted.
Step 2: Click on the Data Tab.
Step 3: Under the Data Tab, click SORT (a dialogue box will appear).
Step 4: Select all the three fields (Column, Sort on, Order).
Step 5: Click OK.
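The effect of single level sorting can be illustrated with a short Python sketch. The product rows below are made-up sample data; the key argument plays the role of the Column field and reverse plays the role of the Order field.

```python
# Hypothetical rows to be sorted on a single column ("amount").
rows = [
    {"product": "Tea", "amount": 300},
    {"product": "Coffee", "amount": 400},
    {"product": "Milk", "amount": 150},
]

# Ascending order (smallest to largest).
ascending = sorted(rows, key=lambda r: r["amount"])

# Descending order (largest to smallest).
descending = sorted(rows, key=lambda r: r["amount"], reverse=True)
```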
• Custom sorting enables the user to specify the order in which data should be arranged based on their
specific requirements.
Step 1: Select the columns from table whose data is to be sorted.
Step 2: Click on the Data Tab.
Step 3: Under the Data Tab, click SORT (a dialogue box will appear).
Step 4: Select two fields (Column, Sort on).
Step 5: For the third field (Order), select the Custom List option.
Step 6: Either select one of the given custom lists or create a NEW LIST as
per needs and select it.
Step 7: Click OK.
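A custom list simply fixes the position of each value in the desired order. The Python sketch below assumes a hypothetical custom list (High, Medium, Low) and made-up task rows, and sorts by the position of each value in that list rather than alphabetically.

```python
# Hypothetical custom list: the order the user wants, not alphabetical order.
custom_list = ["High", "Medium", "Low"]

# Map each value to its position in the custom list.
rank = {value: position for position, value in enumerate(custom_list)}

tasks = [
    {"task": "Backup", "priority": "Low"},
    {"task": "Audit", "priority": "High"},
    {"task": "Report", "priority": "Medium"},
    {"task": "Payroll", "priority": "High"},
]

# Sort by position in the custom list; rows with equal priority keep their
# original relative order because Python's sort is stable.
custom_sorted = sorted(tasks, key=lambda t: rank[t["priority"]])
```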
CONDITIONAL FORMATTING
Conditional formatting is a special feature of Excel that applies specific formatting
styles (like colour fills, font styles, or icons) to cells based on predefined criteria or
conditions. A common use is to find unique and duplicate values by formatting the cells.
It is a powerful tool for visually analysing and interpreting data since it highlights
important information, trends or outliers, making patterns easier to identify.
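The logic behind a conditional formatting rule can be sketched as a test applied to every cell. The Python example below is only an illustration with made-up values and a hypothetical threshold of 80; it decides which cells a "duplicate values" rule and a "greater than" rule would highlight.

```python
from collections import Counter

# Hypothetical cell values in one column.
values = [45, 82, 45, 91, 67, 82, 30]

# Rule 1: mark each cell as duplicate or unique.
counts = Counter(values)
duplicate_flags = ["duplicate" if counts[v] > 1 else "unique" for v in values]

# Rule 2: a "greater than" rule with an assumed threshold of 80.
threshold = 80
highlight = [v > threshold for v in values]
```

In the spreadsheet the result of each test is shown as a colour fill or icon rather than as a label.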
Step 3: In the data validation box, click on the 'Settings' tab, then under the 'Allow' option, select
an option:
(a) Whole Number - to restrict the cell to accept only whole numbers.
(b) Decimal - to restrict the cell to accept only decimal numbers.
(c) List - to pick data from the drop-down list.
(d) Date - to restrict the cell to accept only date.
(e) Time - to restrict the cell to accept only time.
(f) Text Length - to restrict the length of the text.
(g) Custom - for custom formula.
Step 4: In the data validation box, under the 'Settings' tab, click on the 'Data' option and select a
condition:
❖ If Custom is chosen, then enter the formula according to which data needs to be
validated.
❖ Click the Input Message tab and write a custom message that will appear when the
cell is selected.
❖ Select the Show input message when cell is selected checkbox to display the
message when the user selects the cell.
❖ Select the Error Alert tab to customize the error message that will appear on entering
the wrong data.
Step 5: Click OK.
Now, if a user enters a wrong or invalid value in the worksheet, an error message will pop
up on the screen.
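The behaviour of a validation rule can be sketched in Python as a check that either accepts a value or returns an error message. The functions validate and enter, the rule names and the default error text below are hypothetical; only three of the 'Allow' options are imitated.

```python
def validate(value, rule, **params):
    """Return True if the value satisfies the rule (a few 'Allow' options only)."""
    if rule == "whole_number":
        is_whole = isinstance(value, int) and not isinstance(value, bool)
        return is_whole and params["minimum"] <= value <= params["maximum"]
    if rule == "list":
        return value in params["choices"]
    if rule == "text_length":
        return isinstance(value, str) and len(value) <= params["max_length"]
    raise ValueError(f"unknown rule: {rule}")

def enter(value, rule, error_message="The value entered is not valid.", **params):
    """Accept the value, or reject it with the error alert text."""
    if validate(value, rule, **params):
        return (True, "")
    return (False, error_message)

accepted = enter(25, "whole_number", minimum=1, maximum=100)
rejected = enter(250, "whole_number", minimum=1, maximum=100)
from_list = enter("Pune", "list", choices=["Delhi", "Mumbai"],
                  error_message="Pick a city from the list.")
```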
IDENTIFYING OUTLIERS IN THE DATA
An outlier can be defined as a data value that distinctively varies from the average value
of the distribution or pattern of the dataset. It is a data point that lies far away from most
of the observations and is significantly different in value or nature. It is essential to
address outliers to prevent skewness, bias, and misinterpretation of data insights.
➢ The First quartile (Q1) is the value below which the lowest 25% of the data lies.
➢ The Second quartile (Q2) is the middle value (the median).
➢ The Third quartile (Q3) is the value below which the lowest 75% of the data lies.
The range of values from the lower quartile (Q1) to the upper quartile (Q3) is the inter-quartile
range (IQR), i.e., Q3 - Q1. Outliers are values that lie more than 1.5 times the IQR below Q1 or
above Q3.
Steps to calculate outliers using IQR:
Step 1: Determine the range of data (array) for which the quartile is to be calculated.
Step 2: Use the formula QUARTILE (array, 1) to calculate the first quartile. This will give
the value that separates the lowest 25% of the data from the rest.
Step 3: Use the formula QUARTILE (array, 3) to calculate the third quartile. This will give
the value that separates the lowest 75% of the data from the rest.
Step 4: Subtract the first quartile (Q1) from the third quartile (Q3) to calculate the inter-
quartile range. The IQR represents the range of the middle 50% of the data.
Step 5: Multiply the inter quartile range (IQR) by 1.5 and add it to the third quartile (Q3).
This will give the upper bound value, which is used to identify potential outliers.
Step 6: Multiply the inter-quartile range (IQR) by 1.5 and subtract it from the first quartile
(Q1). This will give the lower bound value, which is used to identify potential outliers.
Values lying outside the upper and lower bounds are the outliers.
By following these steps, quartiles can be calculated and used to analyse the
distribution and identify potential outliers in the dataset.
1. Find the first quartile (Q1) using = QUARTILE(range, 1).
2. Find the third quartile (Q3) using = QUARTILE(range, 3).
3. Calculate IQR as = Q3 – Q1.
4. Define the lower boundary as = Q1 – 1.5*IQR and the upper boundary
as = Q3 + 1.5*IQR.
Any data points outside this range are considered outliers.
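The same calculation can be checked with a short Python sketch. The data values below are made up for illustration. The "inclusive" method of statistics.quantiles is used on the assumption that it corresponds to the interpolation performed by QUARTILE; other quartile conventions can give slightly different bounds.

```python
from statistics import quantiles

# Hypothetical data with one suspiciously large value.
data = [10, 12, 13, 15, 16, 18, 19, 21, 95]

# n=4 splits the data into quartiles and returns the cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                      # inter-quartile range
lower_bound = q1 - 1.5 * iqr       # lower boundary
upper_bound = q3 + 1.5 * iqr       # upper boundary

# Values outside the two boundaries are treated as potential outliers.
outliers = [x for x in data if x < lower_bound or x > upper_bound]
```

For this sample, Q1 = 13 and Q3 = 19, so IQR = 6, the boundaries are 4 and 28, and only the value 95 is flagged.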
TEXT TO COLUMNS
The Text to Columns feature is used to split a single column of text data into
multiple columns based on a specified delimiter, such as commas, spaces or tabs.
This allows columns to have atomic values and facilitates easier data analysis
and manipulation by allowing users to work with individual data elements more
effectively.
For example, if a single column contains first name, last name and profession, then
this information can be separated into different columns.
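The splitting performed by Text to Columns can be sketched in Python as follows. The sample cells and the helper text_to_columns are hypothetical, and a comma is assumed as the delimiter.

```python
# Hypothetical single column holding first name, last name and profession.
column = [
    "Asha,Verma,Analyst",
    "Ravi, Kumar, Engineer",
]

def text_to_columns(cells, delimiter=","):
    """Split every cell on the delimiter and trim spaces around each part."""
    return [[part.strip() for part in cell.split(delimiter)] for cell in cells]

table = text_to_columns(column)
first_names = [row[0] for row in table]
```

Each inner list corresponds to one row, and each element of it to one of the new columns.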
Moving averages are statistical calculations used to analyse data points by creating
averages of different subsets of the dataset over a specified period.
The calculation involves taking
the average of a set number of consecutive data points, then “moving” one data
point forward to calculate the next average.
Moving averages are widely used in time series analysis, finance and forecasting to
identify trends, cyclical patterns and potential reversals in data.
A moving average can be calculated using the AVERAGE function or SUM function.
Its syntax is AVERAGE(number1, number2,…) or SUM(number1, number2,…)/
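The "moving" of the average can be illustrated with a small Python sketch. The sales figures, the window of 3 periods and the function name moving_average are assumed for illustration.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values, moving one point at a time."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the number of values")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical sales for five periods.
sales = [10, 20, 30, 40, 50]
ma3 = moving_average(sales, 3)
```

Here the first average covers periods 1 to 3, the second covers periods 2 to 4, and the third covers periods 3 to 5, giving 20, 30 and 40.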
FINDING MISSING VALUES
Finding and addressing missing values is crucial for maintaining data integrity, quality
assurance and efficient analysis.
Excel does not have any particular function to list missing values, so analysts commonly
use functions or tools that highlight blank cells or count the number of missing entries.
Functions like IF and ISNUMBER and commands like Filter can be used to identify and
list missing values in Excel. Once identified, analysts can address missing values
through various methods: imputation (filling in missing values with mean, median or
mode), deleting rows or columns with excessive missing data, or utilizing predictive
modelling techniques to estimate missing values based on available data. Choosing an
appropriate method depends on the context and significance of the missing data.
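A minimal Python sketch of locating missing values and imputing them with the mean is given below. The scores are made-up data, and a cell is assumed to be missing when it is None or an empty string.

```python
from statistics import mean

# Hypothetical column of scores; None and "" stand for blank cells.
scores = [72, None, 85, "", 90, 61]

def is_missing(value):
    return value is None or value == ""

# Identify and count the missing entries.
missing_positions = [i for i, v in enumerate(scores) if is_missing(v)]
missing_count = len(missing_positions)

# Imputation: replace each missing entry with the mean of the known values.
known = [v for v in scores if not is_missing(v)]
fill_value = mean(known)
imputed = [fill_value if is_missing(v) else v for v in scores]
```

Replacing mean with median from the same module gives median imputation instead.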
Data Summarization
Data summarization is the process of transforming a given large dataset into a
smaller form, usually presentable, for reporting, analysis, and further examination.
It involves condensing a dataset into meaningful insights, making
it easier to understand and analyse, extracting central insights and patterns from
data without losing vital information. It gives a quick overview of
the structure and general features of the dataset, and hence facilitates further analysis
and inference.
Key Techniques in Data Summarization
1. Descriptive Statistics
• Measures of Central Tendency: Summarize data using mean, median, and mode,
which describe the central point of dataset.
• Measures of Dispersion: These include the range, variance and standard deviation,
which describe dispersion or variability in data.
3. Percentiles and Quartiles: These provide insight into the distribution by indicating
the relative standing of the data points.
4. Frequency Distributions: Summarizing data by counting how often each value or
category appears.
5. Data Aggregation: This involves combining many data points into summary
values, for example, the total of the sales data by month or the average score
across different categories.
6. Data Grouping: The grouping of data into categories or segments and summarizing
each group in isolation. This can be done using techniques like pivot tables which
summarize data based on different dimensions.
7. Visualization: Charts and graphs such as bar charts, histograms, pie charts, and
line graphs show trends and distributions of data. Box plots can be used to visualize
distribution, central value, variability of the data, and possible outliers.
8. Data Profiling: It reports information about the structure of a dataset, such as the count of
missing values, the data types, or the distribution of categorical variables.
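Two of the techniques listed above, frequency distribution and aggregation by category, can be sketched in Python as follows. The ratings and subject scores are made-up sample data.

```python
from collections import Counter, defaultdict
from statistics import mean

# Frequency distribution: count how often each category appears.
ratings = ["Good", "Poor", "Good", "Average", "Good", "Poor"]
frequency = Counter(ratings)

# Aggregation: average score for each category (hypothetical subjects).
scores = [("Maths", 80), ("Maths", 90), ("Science", 70),
          ("Science", 60), ("Science", 80)]

grouped = defaultdict(list)
for subject, score in scores:
    grouped[subject].append(score)

average_by_subject = {subject: mean(values) for subject, values in grouped.items()}
```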
Importance of Data Summarization:
Simplifies large datasets: It makes large, complex datasets easier to understand and
interpret.
Identifies patterns: It helps identify key trends, outliers, and relationships between
variables.
Facilitates decision-making: Summarized data is useful for making quick decisions and
drawing meaningful conclusions.
Mean
Mean is used to calculate the numerical average
of a dataset. Arithmetic mean is calculated by
adding all the values of the given dataset and
dividing the sum by the number of items therein.
Median
Median refers to the middle value of the series
when arranged in ascending or descending
order.
Mode
Mode refers to the most recurring value in the
sample. In other words, it refers to the most
frequent number of the given dataset.
Variance
Variance, like standard deviation, measures how
tightly or loosely values are scattered around the
mean. It is the average of the squared deviations
from the mean; standard deviation is its square root.
Range
Range refers to the difference between the
largest and the smallest value of the dataset. This
measure indicates the distance between the
maximum and the minimum values of the sample.
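The five measures defined above can be computed together in a short Python sketch. The dataset is made up for illustration, and the population forms of variance and standard deviation are assumed (dividing by the number of items); the sample forms, which divide by one less, would give slightly larger values.

```python
from statistics import mean, median, mode, pvariance, pstdev

# Hypothetical dataset.
data = [2, 4, 4, 4, 5, 5, 7, 9]

average = mean(data)                 # sum of values / number of items
middle = median(data)                # middle value of the sorted data
most_frequent = mode(data)           # most recurring value
variance = pvariance(data)           # mean squared deviation from the mean
std_dev = pstdev(data)               # square root of the variance
value_range = max(data) - min(data)  # largest value minus smallest value
```

For this dataset the mean is 5, the median is 4.5, the mode is 4, the variance is 4, the standard deviation is 2 and the range is 7.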
Steps to draw charts in Excel
[Figure: sample data, Sale (Kg)]
Pivot Tables
Pivot tables are an important part of MS Excel
that allows users to quickly summarize large
amounts of data, analyze numerical data in detail
and answer unanticipated questions about the
data. Such a table is specifically designed to
query data in a user-friendly and interactive way.
For example, consider the dataset given in figure.
There are 7 columns and 91 rows.
For example, to get the total sale of each product, drag the Product
field to the Rows area and the Amount field to the Values area.
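What the pivot table computes in this example can be sketched in Python: the Rows field defines the groups and the Values field is summed within each group. The rows below are made-up sample data (not the dataset from the figure), and pivot_sum is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical sales rows with a Product and an Amount column.
rows = [
    {"Product": "Apple", "Amount": 200},
    {"Product": "Banana", "Amount": 150},
    {"Product": "Apple", "Amount": 300},
    {"Product": "Mango", "Amount": 400},
    {"Product": "Banana", "Amount": 50},
]

def pivot_sum(records, row_field, value_field):
    """Sum value_field for every distinct value of row_field."""
    totals = defaultdict(int)
    for record in records:
        totals[record[row_field]] += record[value_field]
    return dict(totals)

total_sale = pivot_sum(rows, "Product", "Amount")
grand_total = sum(total_sale.values())
```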
Pivot Chart
Pivot Chart is a dynamic visualization tool that helps users summarize and analyze large
datasets. Trends and patterns can be easily identified by pivot charts. It is a visual
representation of a pivot table, automatically synchronizing with it to reflect any changes.
It allows interactive visualization, enabling users to drill down into specific data points and filter
data directly from the chart. Pivot charts can be created in various formats such as bar, line,
pie, and area charts.
They are widely used in sales performance analysis, financial reporting, inventory
management, and survey analysis to visualize summarized data and make informed decisions.
Steps to insert a pivot chart using data from the pivot table:
Step 2: On the PivotTable Analyze tab, click on PivotChart in the Tools group.
Step 3: Click OK on the Insert Chart dialog box.
In Microsoft Excel, an interactive dashboard is usually a one-pager report that allows business
users to track and measure crucial business KPIs and metrics under one roof. It combines
charts, figures, and tables to help users visualize complicated data in an easy-to-understand
format.
Step 1: Define the Purpose of the Dashboard. For this, you must be clear about the answers to two
questions - “Why is the dashboard being created?” and “For whom is it created?”.
Different stakeholders or departments within the organization want to analyze different facts
and figures. So, it is important to understand the purpose of the dashboard and then collect
data around it for accurate and effective decision-making.
Step 2: Gather Data in the form of a table and then convert this table into a pivot table.
If you want 3 pivot charts on the interactive dashboard then you must have 3 pivot tables. So,
you can simply duplicate the pivot table sheet in the Excel workbook.
After formatting the chart, it can be moved to the Interactive dashboard sheet. Repeat the
same steps to create other pivot charts and place them on the interactive dashboard.
Step 6: Add Interactive Features to the dashboard design. For this, select any chart and click
on PivotChart Analyze.
Step 7: Click on Insert Timeline. However, to insert a timeline to any pivot chart, there must be
a Date column in the data. Make sure that the Date checkbox is ticked before you press OK.
Step 8: Like timeline, slicer can also be added on the interactive dashboard. A slicer is just a
fancy name for a filter. To add a slicer, perform the following steps:
(i) Click on one of the pivot charts to activate it.
(ii) In the Insert tab, click on the Slicer option.
(iii) From the list of all the variables, select the ones on which to perform slice and dice operations.
But before performing these operations, you need to connect the Slicer to the Charts. To
connect the slicer:
(a) Click on the slicer to activate it.
(b) In the Slicer tab, click on the Report Connections button.
(c) From the list of pivot tables, check all the boxes.
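The effect of connecting one slicer to several pivot tables can be sketched in Python: a single selection filters the data, and every connected summary is rebuilt from the filtered rows. The sample rows, field names and the functions summarise and dashboard are hypothetical.

```python
from collections import defaultdict

# Hypothetical dashboard data.
rows = [
    {"Region": "North", "Product": "Tea", "Month": "Jan", "Amount": 100},
    {"Region": "South", "Product": "Tea", "Month": "Jan", "Amount": 200},
    {"Region": "North", "Product": "Coffee", "Month": "Feb", "Amount": 300},
    {"Region": "North", "Product": "Tea", "Month": "Feb", "Amount": 50},
]

def summarise(records, field):
    """Total Amount for each distinct value of the given field."""
    totals = defaultdict(int)
    for record in records:
        totals[record[field]] += record["Amount"]
    return dict(totals)

def dashboard(records, slicer_field=None, slicer_value=None):
    """Apply one slicer selection, then rebuild every connected summary."""
    if slicer_field is not None:
        records = [r for r in records if r[slicer_field] == slicer_value]
    return {
        "by_product": summarise(records, "Product"),
        "by_month": summarise(records, "Month"),
    }

full_view = dashboard(rows)
north_view = dashboard(rows, "Region", "North")
```

Because both summaries are built from the same filtered rows, they always stay consistent with each other, which is the purpose of checking all the boxes under Report Connections.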