Unit - II Business Analytics
■ Data Collection
● Gathering raw data from various sources (databases,
spreadsheets, APIs, sensors).
■ Data Cleaning
● Detecting and correcting errors, inconsistencies, and
missing values.
● Handling outliers, duplicate entries, and irrelevant data
points.
■ Data Transformation
● Converting data into a suitable format.
● Normalization, scaling numerical values, encoding
categorical variables, aggregation/disaggregation.
■ Data Integration
● Merging data from multiple sources.
● Resolving conflicts that arise when joining datasets.
■ Data Reduction
● Reducing dataset size or complexity.
● Feature selection, dimensionality reduction, and
sampling.
■ Data Formatting
● Standardizing formats (e.g., date formats, variable
naming conventions).
■ Data Splitting
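The cleaning and transformation steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed workflow: the `raw_rows` records, the `clean_and_scale` helper, and the choice of mean imputation and min-max scaling are all assumptions made for the example.

```python
# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"product": "A", "sales": 120.0},
    {"product": "B", "sales": None},
    {"product": "A", "sales": 120.0},  # exact duplicate of the first row
    {"product": "C", "sales": 80.0},
]


def clean_and_scale(rows):
    """Drop duplicate rows, fill missing sales with the mean, then min-max scale."""
    # Data cleaning: remove exact duplicates while keeping the original order.
    seen, unique = set(), []
    for row in rows:
        key = (row["product"], row["sales"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))

    # Data cleaning: replace missing values with the mean of the known values.
    known = [r["sales"] for r in unique if r["sales"] is not None]
    mean = sum(known) / len(known)
    for r in unique:
        if r["sales"] is None:
            r["sales"] = mean

    # Data transformation: min-max normalization to the range [0, 1].
    low = min(r["sales"] for r in unique)
    high = max(r["sales"] for r in unique)
    span = (high - low) or 1.0  # avoid division by zero when all values are equal
    for r in unique:
        r["sales_scaled"] = (r["sales"] - low) / span
    return unique


cleaned = clean_and_scale(raw_rows)
```

Mean imputation is only one way to handle missing values; dropping the row or using the median may suit other datasets better.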
❖Data Validation
➢ Ensures accuracy, consistency, and integrity of data in Excel.
➢ Helps restrict invalid data entry, reducing manual corrections.
➢Steps to Apply Data Validation in Excel
■ Select the cells where data validation is needed.
■ Go to Data Tab > Data Validation to open the dialog box.
■ Under the Settings tab, define validation criteria:
● Allow: Choose data type (Whole Number, Decimal,
List, Date, Time, Text Length, or Custom Formula).
● Data: Set conditions (e.g., between, equal to, not equal
to).
➢Optional Features
■ Input Message:
● Appears when the cell is selected.
● Provides instructions for valid data entry.
■ Error Alert:
● Triggers when invalid data is entered.
● Styles: Stop (blocks entry), Warning (allows but alerts),
Information (suggests correction).
● Title & Message: Explains the error and suggests
correction.
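The logic of a validation rule and its error alert styles can be mimicked in Python to make the behaviour concrete. The sketch below assumes a "Whole Number, between" rule; the function name `validate_whole_number` and its return shape are hypothetical and are not part of Excel.

```python
def validate_whole_number(value, minimum, maximum, style="stop"):
    """Apply a 'Whole Number, between' rule and react like an error alert style.

    Returns (accepted, message). The message is empty when the value is valid.
    """
    is_whole = isinstance(value, int) and not isinstance(value, bool)
    if is_whole and minimum <= value <= maximum:
        return True, ""

    message = f"Enter a whole number between {minimum} and {maximum}."
    if style == "stop":
        # Stop: the invalid entry is blocked.
        return False, message
    # Warning / Information: the entry is allowed, but the user is alerted.
    return True, message
```

For example, `validate_whole_number(11, 1, 10)` rejects the entry, while the same call with `style="warning"` accepts it and still returns the alert message.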
❖Identifying Outliers in Data
➢ Outliers affect accuracy, reliability, and usability of data.
➢ An outlier is a data point significantly different from the expected
range.
➢ Identifying and minimizing outliers ensures accurate data analysis
and forecasting.
➢Steps to Handle Outliers in MS Excel
■ Review the Data
● Check for errors (typos, data entry mistakes) manually
or using automated tools.
■ Sort the Data Values
● Arrange data in ascending/descending order for easier
analysis.
■ Analyze Data Values
● Identify large discrepancies.
● Remove statistical anomalies instead of deleting all
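The review, sort, and analyze steps can be approximated in Python as follows. The notes do not name a statistical rule for spotting anomalies, so this sketch assumes the common 1.5 × IQR rule; the `daily_sales` figures and the `find_outliers_iqr` helper are made up for illustration.

```python
import statistics


def find_outliers_iqr(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical data with one suspicious entry.
daily_sales = [42, 45, 44, 47, 43, 46, 44, 120]

# Sorting first makes large discrepancies easy to see at either end.
sorted_sales = sorted(daily_sales)

outliers = find_outliers_iqr(daily_sales)
```

A flagged value is a candidate for review, not automatically an error; it should be checked against the source before it is removed.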
❖Data Sorting
➢ Organizes data in a specific order for better analysis.
➢ Types of Sorting:
■ Text: Alphabetical (A-Z or Z-A).
■ Numbers: Smallest to largest or vice versa.
■ Dates & Time: Oldest to newest or newest to oldest.
➢Example: Sorting a Column in Descending Order
■ Select the data and press Ctrl + Shift + L (the shortcut that turns on the filter arrows).
■ Click the down arrow on the column.
■ Select Largest to Smallest (numbers) or Z to A (text).
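The same descending sorts can be expressed in Python, which may help when the data lives outside a worksheet. The `product_rows` list below is a hypothetical stand-in for a two-column range.

```python
# Hypothetical (product, profit) rows standing in for a worksheet range.
product_rows = [("Pen", 120), ("Notebook", 340), ("Stapler", 75)]

# Numbers: largest to smallest, sorted on the profit column.
by_profit_desc = sorted(product_rows, key=lambda row: row[1], reverse=True)

# Text: Z to A, sorted on the product column.
by_name_desc = sorted(product_rows, key=lambda row: row[0], reverse=True)
```

`sorted` returns a new list, so the original row order is preserved.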
❖Filtering Data
➢ Temporarily hides unwanted data to focus on relevant information.
➢Filtering a Range of Data
■ Select the column to apply the filter.
■ Go to Data > Filter.
■ Click the column header arrow.
■ Select the values or conditions to display.
■ Click OK.
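Filtering can be sketched in Python as selecting the rows that match a condition while leaving the full dataset untouched, which mirrors how a filter hides rows rather than deleting them. The `sales_rows` data and the `filter_rows` helper are hypothetical.

```python
# Hypothetical rows standing in for a filtered range.
sales_rows = [
    {"region": "North", "sales": 250},
    {"region": "South", "sales": 410},
    {"region": "North", "sales": 180},
    {"region": "East", "sales": 320},
]


def filter_rows(rows, column, allowed):
    """Return only the rows whose value in `column` is in `allowed`."""
    return [row for row in rows if row[column] in allowed]


north_only = filter_rows(sales_rows, "region", {"North"})
```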
❖Text to Column
➢ Splits data from a single column into multiple columns for better
readability.
➢ Example: separating a first name, last name, and profession that are stored together in one column.
➢ Data must have a consistent delimiter (Comma, Semicolon, Space,
etc.).
➢Steps to Split Data in MS Excel
■ Select the cell or column containing the text to split.
■ Go to Data > Text to Columns.
■ In the Convert Text to Columns Wizard, select Delimited >
Next.
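The effect of a delimited split can be illustrated in Python. This is a rough sketch that assumes a comma delimiter; the `text_to_columns` helper and the sample names are invented for the example.

```python
def text_to_columns(cells, delimiter=","):
    """Split each cell on the delimiter and trim stray spaces from each part."""
    return [[part.strip() for part in cell.split(delimiter)] for cell in cells]


# Hypothetical cells holding first name, last name, and profession together.
combined = ["Asha, Rao, Engineer", "John, Smith, Teacher"]

columns = text_to_columns(combined)
```

Each inner list corresponds to one row after the split, with one item per new column. A cell that uses a different delimiter would stay in a single piece, which is why the delimiter must be consistent.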
Data Summarization
★Covariance
○ Definition: Measures the joint variability between two random
variables.
○ Use Case: Helps determine relationships between two datasets.
○ Covariance Formula in Excel
○ Function:
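To make the definition concrete, covariance can be computed by hand in Python. The sketch below shows both the population form (divide by n) and the sample form (divide by n - 1), since the notes do not specify which is meant; the `covariance` helper and the sample figures are assumptions.

```python
def covariance(xs, ys, sample=False):
    """Average product of paired deviations from the two means."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)


# Hypothetical paired observations, e.g. ad spend and units sold.
ad_spend = [1, 2, 3, 4]
units_sold = [2, 4, 6, 8]

population_cov = covariance(ad_spend, units_sold)
sample_cov = covariance(ad_spend, units_sold, sample=True)
```

A positive result means the two series tend to rise together, a negative result means one tends to fall as the other rises, and a value near zero suggests no linear relationship.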
➢Bar Chart
■ Similar to a column chart but uses horizontal bars.
■ Useful when category names are long or there are many
categories.
■ Provides better readability for large datasets compared to
column charts.
■ Example:
● Comparing Profit and Discount Across Products
➢Line Chart
➢Pie Chart
■ Represents data as slices of a circle, showing proportions of a
whole.
■ Best for percentage-based comparisons among a few
categories.
■ Becomes less effective when dealing with too many
categories.
■ Example:
● Diet Recommendation
➢Scatter Plot
■ Plots individual data points to analyze relationships between
two variables.
❖Pivot Tables
➢ Quickly summarize large datasets.
➢ Analyse numerical data in detail.
➢ Answer unexpected questions about the data.
➢ They provide a user-friendly and interactive way to query data.
■ They can handle large datasets, like the one in the figure with 6 columns and 213 rows.
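The core idea behind a pivot table, grouping rows by one field and aggregating another, can be sketched in Python. The `orders` data and the `pivot_sum` helper are hypothetical and only cover a single row field with a sum.

```python
from collections import defaultdict


def pivot_sum(rows, row_field, value_field):
    """Group rows by `row_field` and sum `value_field` within each group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[row_field]] += row[value_field]
    return dict(totals)


# Hypothetical order records.
orders = [
    {"region": "North", "product": "Pen", "amount": 100.0},
    {"region": "South", "product": "Pen", "amount": 150.0},
    {"region": "North", "product": "Notebook", "amount": 200.0},
]

sales_by_region = pivot_sum(orders, "region", "amount")
sales_by_product = pivot_sum(orders, "product", "amount")
```

Changing the grouping field answers a different question from the same data, which is the interactive quality the notes describe.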
❖Interactive Dashboard
➢ An Interactive Dashboard in Excel is a one-page report that allows
businesses to track and measure crucial KPIs and metrics. It
provides a visual representation of data using charts, figures, and
tables, making it easier to analyze and interpret complex information.