Data Analytics Lab File - 040637
KOTA
SUBMITTED TO SUBMITTED BY
Mrs. Deepti Agarwal Abhinav Chauhan
Director MBA 1st Sem
ACKNOWLEDGEMENT
It gives me immense pleasure to thank all those who have helped me
during the course of my Data Analytics lab file.
Finally, I owe great thanks to my parents and my friends for
their support and encouragement. Once again, I thank those who
directly or indirectly helped me in completing my lab manual.
Abhinav Chauhan
Index
1. Introduction to Excel
2. Basic features of MS Excel
3. Paste Special (Values, Transpose)
4. Relative Referencing and Absolute Referencing
5. Data Analytics using Excel
6. Sort and Filter
7. Formulas and Functions in MS Excel
8. HLOOKUP and VLOOKUP
9. Generating Multiple Reports in MS Excel
10. Find and Replace
11. MIS Reporting
12. Conditional Formatting
13. Chart and Tables in MS Excel
14. Excel’s Goal Seek feature
15. Data Validation
16. Statistical Functions in Excel
17. PV, NPV, XNPV, EMI, IRR in Excel:
18. To construct contingency table, compute conditional, marginal probability
and use of Bayes theorem
Summary
DATA ANALYTICS LAB
Introduction to MS Excel:
MS Excel is a spreadsheet program where one can record data in the form of tables. It
is easy to analyse data in an Excel Spreadsheet.
Features of MS Excel: -
Various editing and formatting can be done on an Excel spreadsheet. Discussed below
are the various features of MS Excel.
• Home
Comprises options like font size, font styles, font colour, background colour,
alignment, formatting options and styles, insertion and deletion of cells and
editing options.
• Insert
Comprises options like table format and style, inserting images and figures,
adding graphs, charts and sparklines, header and footer option, equation and
symbols.
• Page Layout
Themes, orientation and page setup options are available under the page layout
option.
• Formulas
Since tables with a large amount of data can be created in MS excel, under this
feature, you can add formulas to your table and get quicker solutions.
• Data
Adding external data (from the web), filtering options and data tools are
available under this category.
• Review
Proof-reading can be done for an excel sheet (like spell check) in the review
category and a reader can add comments in this part.
• View
Different views in which we want the spreadsheet to be displayed can be edited
here. Options to zoom in and out and pane arrangement are available under this
category.
Paste Special:
To perform a "Paste Special" operation using the "Values" option along with
"Transpose" in Microsoft Excel, follow these steps:
1. Copy Data: Select the data that you want to copy.
2. Paste Special - Values, Transpose:
• Click on the cell where you want to paste the transposed values.
• Right-click and choose "Paste Special" from the context menu.
3. Choose Options:
• In the "Paste Special" dialog box, select the "Values" checkbox. This
will paste the content as values instead of formulas.
• Then, also select the "Transpose" checkbox. This will switch the rows
and columns of the copied data when pasting.
4. Click OK: After selecting "Values" and "Transpose", click the "OK" button.
This action will paste the copied data's values transposed—meaning the rows will
become columns and the columns will become rows.
Please note that transposing swaps the dimensions of the data, i.e., if you copy a
range of 3 rows and 4 columns, pasting with transpose produces 4 rows and 3
columns. Make sure the destination has enough room for the swapped range and does
not overlap the copied range; otherwise Excel may refuse the operation or
overwrite existing cells.
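The row and column swap described above can be sketched in Python. The 3-by-4 grid is a made-up stand-in for the copied range, and the code only illustrates the idea; it is not how Excel performs the paste.

```python
# Hypothetical 3-row by 4-column range, standing in for the copied cells.
copied = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# zip(*rows) groups the i-th item of every row, which turns rows into columns.
transposed = [list(column) for column in zip(*copied)]

for row in transposed:
    print(row)
```

The result has 4 rows and 3 columns, and it holds plain values only, which matches what the "Values" option does with formulas.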
Relative Referencing:
• Relative referencing is the default type of referencing in Excel.
• When you create a formula and refer to a cell, Excel uses relative referencing
by default. For instance, if cell B2 contains the formula =A1+B1 and you copy
it to cell B3, it automatically adjusts to =A2+B2.
• In relative referencing, when you copy a formula to another cell, the formula
adjusts the cell references relative to the new location. For example, if you
copy a formula from B2 to C3, a reference to A1 will become B2 in the new
cell (relative to its new position).
Absolute Referencing:
• Absolute referencing is used when you want a cell reference to stay constant,
regardless of where the formula is copied.
• To create an absolute reference, you can add a dollar sign ($) before the
column letter, row number, or both. For instance, in a formula, =$A$1, the
dollar signs in front of the column and row make it an absolute reference.
• Absolute references do not change when copied to other cells. For example, if
you have =$A$1+B1 in cell C1 and copy it to C2, the reference to cell A1 will
remain constant in both formulas.
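The way references adjust when a formula is copied can be sketched in Python. The helper names (col_to_num, num_to_col, copy_formula) are hypothetical, and the regular expression only handles plain references such as A1 or $A$1; it is an illustration of the rule, not Excel's own formula parser.

```python
import re

def col_to_num(letters):
    """Convert a column label such as 'A' or 'AB' to a 1-based number."""
    num = 0
    for ch in letters:
        num = num * 26 + (ord(ch) - ord("A") + 1)
    return num

def num_to_col(num):
    """Convert a 1-based column number back to a column label."""
    letters = ""
    while num > 0:
        num, rem = divmod(num - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def copy_formula(formula, row_offset, col_offset):
    """Shift every part of a reference that is not anchored with a $ sign."""
    def shift(match):
        col_lock, col, row_lock, row = match.groups()
        if not col_lock:
            col = num_to_col(col_to_num(col) + col_offset)
        if not row_lock:
            row = str(int(row) + row_offset)
        return f"{col_lock}{col}{row_lock}{row}"
    return re.sub(r"(\$?)([A-Z]+)(\$?)(\d+)", shift, formula)

print(copy_formula("=A1+B1", 1, 0))     # copied one row down
print(copy_formula("=A1", 1, 1))        # copied one row down, one column right
print(copy_formula("=$A$1+B1", 1, 0))   # the $A$1 part is left untouched
```

Copying one row down turns =A1+B1 into =A2+B2 and =$A$1+B1 into =$A$1+B2, which matches the examples in the text.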
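Mixed Referencing:
• A mixed reference anchors only the column or only the row, for example $A1
(column fixed) or A$1 (row fixed). The anchored part stays constant when the
formula is copied, while the other part adjusts.
Data Analytics using Excel: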
1. Data Importing:
• Import Data: Bring your data into Excel using the "Get & Transform Data"
feature (Power Query) available in newer versions of Excel. This feature allows
you to connect to various data sources, clean, transform, and load data into
Excel.
2. Data Cleaning and Preparation:
• Remove Duplicates: Use the "Remove Duplicates" function in the Data tab to
eliminate duplicate records.
• Filtering and Sorting: Arrange data using Excel's filtering and sorting
capabilities to focus on specific information.
• Data Validation: Set up data validation rules to ensure data quality and
consistency.
3. Data Analysis:
• Formulas and Functions: Utilize Excel's wide range of built-in functions
(SUM, AVERAGE, COUNT, etc.) to perform calculations on your data.
• Pivot Tables: Create pivot tables to summarize and analyse large datasets
easily. Pivot tables allow you to group, filter, and summarize data in various
ways.
• Charts and Graphs: Visualize your data using different chart types (bar, line,
pie, etc.) available in Excel. Charts can provide insights and make patterns
easier to understand.
4. Statistical Analysis:
• Statistical Functions: Excel offers various statistical functions (STDEV,
CORREL, etc.) to perform statistical analysis on your data.
• Data Analysis ToolPak: Enable the Data Analysis ToolPak add-in to access
advanced statistical tools and perform regression, t-tests, ANOVA, etc.
5. Data Visualization and Reporting:
• Dashboard Creation: Build interactive dashboards using charts, pivot tables,
and slicers to present key insights and trends.
• Conditional Formatting: Highlight important data trends using conditional
formatting to make your reports more visually appealing and easier to
understand.
6. What-If Analysis:
• Scenario Manager: Use the Scenario Manager to analyse different scenarios
based on changing variables.
• Goal Seek: Determine the input needed to achieve a specific goal using the
Goal Seek feature.
7. Automation and Macros:
• Macros: Create macros to automate repetitive tasks and streamline your data
analysis process.
8. Documentation and Collaboration:
• Comments and Notes: Use comments and notes to document your analysis
and assumptions.
• Sharing and Collaboration: Share your Excel files securely and collaborate
with others using features like OneDrive or SharePoint.
1. The example below shows how we have used the multiplication formula
manually with the ‘*’ operator.
Sample Formula: "=A2*B2"
2. The example below shows how we have used the function - ‘PRODUCT’ to
perform multiplication.
3. Sum: The SUM() function, as the name suggests, gives the total of the selected
range of cell values, i.e., it adds them together.
SUM =SUM(C2:C4)
4. Average: The AVERAGE() function calculates the average of the selected
range of cell values. As seen from the below example, to find the average of
the total sales, you simply type in:
AVERAGE =AVERAGE(C2, C3, C4)
It automatically calculates the average, and you can store the result in your desired
location.
5. Count: The COUNT() function counts the cells in a range that contain a
number. It ignores blank cells and cells that hold anything other than numeric
data.
COUNT =COUNT(C1:C4)
MODULUS =MOD(A2,3)
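For readers who want to check these formulas outside Excel, here is a rough Python equivalent. All cell values are made up for illustration (the lab's screenshots are not reproduced here), and the variable names are hypothetical.

```python
# Hypothetical cells A2 and B2.
a2, b2 = 10, 5
# Hypothetical column C, rows 1 to 4: a text header followed by three sales figures.
column_c = ["Sales", 120, 80, 100]

def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)

numbers = [v for v in column_c if is_number(v)]

product = a2 * b2                 # like =A2*B2 or =PRODUCT(A2,B2)
total = sum(numbers)              # like =SUM(C2:C4)
average = total / len(numbers)    # like =AVERAGE(C2, C3, C4)
count = len(numbers)              # like =COUNT(C1:C4): the text header is not counted
remainder = a2 % 3                # like =MOD(A2,3)

print(product, total, average, count, remainder)
```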
8. Left, Right, Mid: The LEFT() function returns a given number of characters from
the start of a text string. Meanwhile, the MID() function returns the characters from
the middle of a text string, given a starting position and length. Finally, the RIGHT()
function returns a given number of characters from the end of a text string.
9. UPPER, LOWER, PROPER: The UPPER() function converts any text string to
uppercase. In contrast, the LOWER() function converts any text string to
lowercase. The PROPER() function converts any text string to proper case, i.e., the
first letter in each word will be in uppercase, and all the other will be in lowercase.
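The text functions in items 8 and 9 map closely onto Python string operations. The sample string is hypothetical, and str.title() is only an approximation of PROPER() that behaves the same for plain words like these.

```python
text = "data analytics lab"

left_4 = text[:4]            # like =LEFT(text, 4)
right_3 = text[-3:]          # like =RIGHT(text, 3)

# MID uses a 1-based start position, so subtract 1 for Python's 0-based slices.
start, length = 6, 9
mid = text[start - 1:start - 1 + length]   # like =MID(text, 6, 9)

upper = text.upper()         # like =UPPER(text)
lower = upper.lower()        # like =LOWER(text)
proper = text.title()        # close to =PROPER(text) for plain words

print(left_4, right_3, mid, upper, lower, proper, sep=" | ")
```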
10. IF Formula: The IF() function checks a given condition and returns a particular
value if it is TRUE. It will return another value if the condition is FALSE. In the
below example, we want to check if the value in cell A2 is greater than 5. If it’s
greater than 5, the function will return “Yes 4 is greater”, else it will return “No”.
12. SUMIF: The SUMIF() function adds the cells specified by a given condition or
criteria.
13. Nested IF: The goal is to assign a grade to each score in column C according to the
rules in the table in the range F4:G9. One way to do this in Excel is to use a series
of nested IF functions. Generally, nested IF formulas are used to test more than
one condition and return a different result for each condition.
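A minimal Python sketch of the SUMIF and nested IF ideas follows. The region names, sales figures and grade thresholds are all assumptions made for illustration; the lab's own table in F4:G9 is not reproduced here.

```python
# Hypothetical rows of (region, sales).
rows = [("North", 120), ("South", 80), ("North", 100), ("East", 60)]

def sumif(pairs, criterion):
    """Add the values whose key equals the criterion, in the spirit of SUMIF.

    Note: this comparison is case-sensitive, unlike Excel's text matching.
    """
    return sum(value for key, value in pairs if key == criterion)

def grade(score):
    """Return a letter grade; each elif plays the role of one nested IF."""
    # Thresholds are hypothetical.
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    elif score >= 60:
        return "D"
    else:
        return "F"

north_total = sumif(rows, "North")
grades = [grade(s) for s in [95, 83, 71, 64, 40]]
print(north_total, grades)
```

Ordering the conditions from the highest threshold down matters: each test is only reached when all earlier ones have failed, which is how a chain of nested IF functions is evaluated as well.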
Fig: Hlookup function in Excel
Here, H23 has the lookup value, i.e., Jenson, G1:M5 is the table array, 4 is the row
index number, and 0 requests an exact match.
Once you hit enter, it will return “New York”.
VLOOKUP stands for vertical lookup. It looks for a particular value in the leftmost
column of a table and then returns a value in the same row from a column you
specify.
We will use the below table to learn how the VLOOKUP function works.
If you wanted to find the department to which Stuart belongs, you could use the
VLOOKUP function as shown below:
If you hit enter, it will return “Marketing”, indicating that Stuart is from the marketing
department.
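The lookup logic can be sketched in Python as follows. Only the Stuart and Marketing pairing comes from the text; the other rows are invented, and the vlookup and hlookup helpers are hypothetical functions that cover exact matches only.

```python
# Hypothetical table; the first row is the header.
table = [
    ["Name", "Department"],
    ["Alice", "Finance"],
    ["Stuart", "Marketing"],
    ["Maya", "Operations"],
]

def vlookup(lookup_value, table_rows, col_index):
    """Exact-match search down the first column; col_index is 1-based as in Excel."""
    for row in table_rows:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel would show #N/A here

def hlookup(lookup_value, table_rows, row_index):
    """Same idea across the first row: transpose the table, then reuse vlookup."""
    columns = [list(col) for col in zip(*table_rows)]
    return vlookup(lookup_value, columns, row_index)

print(vlookup("Stuart", table, 2))
print(hlookup("Department", table, 3))
```

Both calls return "Marketing": the first finds Stuart's row and reads its second column, the second finds the Department column and reads its third row.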
• Apply Conditional Formatting: Apply conditional formatting to highlight
specific data points meeting certain criteria.
Choose the method that suits your data structure and reporting needs. Depending on
the complexity of your data and reports, a combination of these methods might be the
most efficient way to generate multiple reports in Excel. Experiment with these
techniques to find the most suitable approach for your specific scenario.
• Press Ctrl + F or go to the "Home" tab ➜ "Editing" group ➜ Click on
"Find & Select" ➜ Select "Find..." or "Replace..."
2. Find:
• Enter the content you want to find in the "Find what" field.
• Click "Find Next" to locate the first instance of the content. You can
continue clicking this button to find subsequent occurrences.
3. Replace:
• To replace found content, go to the "Replace" tab in the Find and
Replace dialog box.
• Enter the content you want to replace it with in the "Replace with" field.
• Click "Replace" to replace the current instance or "Replace All" to
replace all occurrences at once.
4. Options:
• Use additional options like "Match entire cell contents" or "Match case"
for specific search criteria.
• "Find All" can list all occurrences of the search term in a separate
window.
Advanced Find and Replace Options:
• Find and Replace Formatting:
• Click the "Options" button in the Find and Replace dialog box to access
additional options.
• You can search based on formatting attributes like font color, cell color,
etc., and replace only specific formatting.
• Using Wildcards:
• Type the wildcard characters * (any run of characters) and ? (any single
character) directly in the "Find what" field for more flexible search
patterns. Put a tilde (~) before * or ? to search for the character itself.
• Find and Replace within Specific Areas:
• Use the "Within" dropdown to search the current sheet or the whole
workbook. To search only part of a sheet, select that range before
opening the dialog.
Precautions:
• Be cautious: Before using "Replace All," review the changes carefully, as it
affects all occurrences.
• Undo: Excel has an undo feature (Ctrl + Z) if unintended changes are made.
The Find and Replace function in Excel is a powerful tool that can help you quickly
locate specific content, make changes efficiently, and perform bulk editing tasks
within your worksheet or workbook.
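The idea behind "Replace All" with the * and ? wildcards and the "Match case" option can be sketched in Python. The function names and sample cells are hypothetical, and the translation to regular expressions is an illustrative assumption, not Excel's actual search engine.

```python
import re

def wildcard_to_regex(pattern):
    """Translate * (any run of characters) and ? (any one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def replace_all(cells, find_what, replace_with, match_case=False):
    """Replace every match in every cell, ignoring case unless match_case is set."""
    flags = 0 if match_case else re.IGNORECASE
    regex = wildcard_to_regex(find_what)
    # A function replacement keeps replace_with literal (no backslash escapes).
    return [re.sub(regex, lambda m: replace_with, c, flags=flags) for c in cells]

cells = ["Q1 Sales", "q2 sales", "Q3 Costs"]
print(replace_all(cells, "q? sales", "Quarterly Sales"))
print(replace_all(cells, "q? sales", "Quarterly Sales", match_case=True))
```

With case ignored, the first two cells are replaced; with match_case=True only the lower-case "q2 sales" cell changes.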
MIS Reporting:
An Excel MIS report is a Management Information System use case in which Excel
is employed as the data storage and management system. Data/Business Analysts and
Business Heads/Managers coordinate with each other and generate interactive reports.
These reports are sent to the higher authorities or decision-making board so that they
can act on the findings, rectify the issues faced, and achieve
improvement.
Automatic row – wise Subtotal:
To calculate row-wise subtotals automatically in Excel, you can use the SUM function,
either in plain formulas or together with features such as Excel Tables. Here's
how you can do it:
1. Manual Input:
• If you don't want to convert your data to a table, you can manually input
formulas to calculate subtotals.
• For each row where you want a subtotal, use the SUM function to add
up values in desired columns.
• For instance, if your values are in columns B, C, and D, in row 2 you
could use a formula like =SUM(B2:D2) to calculate the subtotal.
2. Auto-Fill:
• Once you have the formula in one row, you can use the AutoFill handle
(bottom right corner of the cell) to drag the formula down to calculate
subtotals for other rows.
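The fill-down pattern above amounts to applying the same sum to every row. A small Python sketch, with made-up values standing in for columns B, C and D:

```python
# Hypothetical rows; each inner list stands for columns B, C and D of one row.
rows = [
    [10, 20, 30],
    [5, 15, 25],
    [7, 8, 9],
]

# One subtotal per row, like =SUM(B2:D2) dragged down with the AutoFill handle.
subtotals = [sum(row) for row in rows]

for row, subtotal in zip(rows, subtotals):
    print(row, "->", subtotal)
```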
Using PivotTables:
1. Create a PivotTable:
• Select your data range.
• Go to the "Insert" tab ➜ "PivotTable."
• Drag the desired fields to the Rows area to create a row-wise
breakdown.
• Drag the numeric field to the Values area and change the summary
function to "Sum" to display subtotals.
Choose the method that best fits your data structure and preferences. Using Excel
Tables is often preferred as it dynamically expands when new rows are added and
automatically includes subtotals in the Total Row. However, you can use formulas or
PivotTables to achieve similar results based on your specific requirements.
Conditional Formatting is a feature that allows us to format only the cells that
meet a condition that we provide.
It is mostly used to highlight or emphasise certain data and to visualize the data using
bars, scales, etc.
Select Home >> Conditional Formatting tool located in the Styles group, which has
further options. Now click on the arrow.
the users, Excel has an option that analyses your data and makes a recommendation of
the chart type that you should use.
1. Pivot Table: A pivot table can slice and summarize data to give meaningful results.
Usually, after summarizing the data in Excel, we apply graphs or charts
to present the data graphically and tell the story visually. A pivot table does not
require special charting techniques; it can build a chart directly from its own data.
2. Bar Chart: A bar chart is one of Excel's primary chart types and a good choice
for categorical data. Bar charts plot data using horizontal bars, so they are very
easy to read because the human eye can easily compare bars. Also, because of
the horizontal layout, bar charts have room to accommodate longer category
names. Bar charts are also versatile.
3. Pie Chart: Pie charts can convert one column or row of spreadsheet data into a
pie chart. Each slice of pie (data point) shows the size or percentage of that
slice relative to the whole pie. Use a pie chart when you have only one data
series and none of the data values is zero or less than zero.
4. Line Chart: A line graph (aka line chart) is a visual that displays a series of
data points connected by a straight line. It is commonly used to visually
represent quantitative data over a certain time period.
5. Box and Whisker Chart: A box and whisker chart shows distribution of data
into quartiles, highlighting the mean and outliers. The boxes may have lines
extending vertically called “whiskers”. These lines indicate variability outside
the upper and lower quartiles, and any point outside those lines or whiskers is
considered an outlier.
6. Stock: The stock chart in Excel, also known as a high-low-close chart,
represents the conditions of data in markets such as the stock market. The data
reflects the changes in the prices of the stocks. Users can insert it from the
“Insert” tab of the Excel application, and choose from the four types of stock
chart options.
8. Scatter plot chart: A scatter plot, sometimes referred to as a scatter chart or
XY chart, compares the relationship between two different data sets. This
makes it easier to visualize two sets of values in your Excel spreadsheet.
• In this example, we aim to find the rate of interest at which a person
who pays $5,000 per month settles the loan amount. The PMT function is
used to calculate the periodic payment needed to
settle a loan. Let’s go through this problem in steps to see how we
can calculate the interest rate that will settle a loan of $400,000 with a $5,000
monthly payment. Enter the PMT formula in the cell adjacent to the
Payment label. Because there is currently no value in the rate of interest cell,
Excel gives a payment of $3,333.33: it assumes the rate of
interest to be 0%. Ignore it for now.
• Set the monthly payment to -5,000. The negative sign signifies that the payment
is an outflow. Set the rate of interest as the changing cell.
• Click OK. You will see the goal seek function automatically gives the
interest rate that is required to pay the loan amount.
Outcome
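What Goal Seek does here can be approximated in Python with a simple bisection search. Two assumptions are made: the loan runs for 120 monthly payments (inferred from the $3,333.33 payment at 0% interest, since 400,000 / 120 = 3,333.33), and bisection is used in place of Excel's own iterative method, which is not documented in this file. The function names are hypothetical.

```python
def pmt(monthly_rate, n_months, principal):
    """Monthly payment on an amortising loan, returned as a positive number."""
    if monthly_rate == 0:
        return principal / n_months
    growth = (1 + monthly_rate) ** n_months
    return principal * monthly_rate * growth / (growth - 1)

def goal_seek_rate(target_payment, n_months, principal, low=0.0, high=1.0, tol=1e-10):
    """Bisect on the monthly rate until pmt() matches the target payment."""
    for _ in range(200):
        mid = (low + high) / 2
        if pmt(mid, n_months, principal) < target_payment:
            low = mid    # payment too small, so the rate must be higher
        else:
            high = mid   # payment too large (or equal), so the rate must be lower
        if high - low < tol:
            break
    return (low + high) / 2

# Assumed loan term: 120 months.
monthly_rate = goal_seek_rate(5000, 120, 400_000)
annual_rate = monthly_rate * 12

print(f"monthly rate: {monthly_rate:.6f}, annual rate: {annual_rate:.4%}")
print(f"check payment: {pmt(monthly_rate, 120, 400_000):.2f}")
```

The search works because the payment grows steadily as the rate increases, so there is exactly one rate between the two bounds that gives a $5,000 payment.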
Data Validation:
Excel Data Validation is a feature that restricts (validates) user input to a worksheet.
Technically, you create a validation rule that controls what kind of data can be entered
into a certain cell. Here are just a few examples of what Excel's data validation can do:
Allow only numeric or text values in a cell.
Data Validation Rules:
Use data validation rules to control the type of data or the values
that users enter into a cell. One example of validation is a drop-down list (also called a
drop-down box or drop-down menu).
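A validation rule is essentially a test that an entry must pass before it is accepted. The sketch below shows two such rules in Python; the rule builders and the sample limits and list values are hypothetical and only mirror the idea of a whole-number range and a drop-down list.

```python
def make_whole_number_rule(minimum, maximum):
    """Return a checker that accepts only whole numbers between minimum and maximum."""
    def is_valid(value):
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (
            isinstance(value, int)
            and not isinstance(value, bool)
            and minimum <= value <= maximum
        )
    return is_valid

def make_list_rule(allowed):
    """Return a checker that accepts only values from a drop-down style list."""
    allowed_set = set(allowed)
    return lambda value: value in allowed_set

age_rule = make_whole_number_rule(18, 60)
dept_rule = make_list_rule(["Finance", "Marketing", "HR"])

for entry in (25, 17, "25"):
    print(repr(entry), "accepted" if age_rule(entry) else "rejected")
for entry in ("HR", "Sales"):
    print(repr(entry), "accepted" if dept_rule(entry) else "rejected")
```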
Statistical Functions in Excel:
Excel provides an extensive range of statistical functions that perform calculations
from the basic mean, median and mode to more complex statistical distribution and
probability tests. Excel’s statistical functions are powerful tools for data analysis. With
these functions, you can perform calculations, derive insights, and draw conclusions
from your data. Whether you need to calculate descriptive statistics, test hypotheses,
analyze relationships, or generate random data, Excel has a wide range of functions to
support your needs. By utilizing these functions effectively, you can unlock the full
potential of your data and make data-driven decisions with confidence.
Statistical functions in Excel are built-in mathematical formulas that enable users to
perform various statistical calculations and analyses on their data. These functions allow
you to process and analyze numerical data to derive valuable insights and make
informed decisions. Excel provides a comprehensive set of statistical functions that
cover a wide range of statistical techniques. Here are some common types of statistical
functions available in Excel:
• Descriptive Statistics: These functions help in summarizing and describing
data. Examples include AVERAGE, COUNT, SUM, MIN, MAX, MEDIAN,
MODE, STDEV, and VAR.
• Correlation and Regression: Excel offers functions like CORREL,
COVAR, RSQ, and TREND for analyzing relationships between variables,
determining correlation coefficients, and performing regression analysis.
• Hypothesis Testing: Functions such as T.TEST, Z.TEST, and F.TEST allow
users to conduct various hypothesis tests, comparing sample means,
proportions, or variances.
• Probability Distributions: Excel provides functions for common
probability distributions, such as NORM.DIST (normal distribution),
BINOM.DIST (binomial distribution), POISSON.DIST (Poisson
distribution), and more.
• Sampling: Functions like RAND and RANDBETWEEN generate random
numbers or random selections, facilitating simulations and sampling
techniques.
These are just a few examples of the extensive range of statistical functions available in
Excel. They provide users with powerful tools for data analysis, modeling, and decision-
making.
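Several of the descriptive and correlation functions listed above have direct counterparts in Python's standard statistics module. The data below are made up, and the correl helper is a hand-written sketch of the Pearson formula rather than a library call.

```python
import statistics

# Hypothetical paired observations.
x = [2, 4, 4, 6, 9]
y = [1, 3, 4, 6, 8]

mean_x = statistics.mean(x)        # like =AVERAGE(range)
median_x = statistics.median(x)    # like =MEDIAN(range)
mode_x = statistics.mode(x)        # like =MODE(range)
stdev_x = statistics.stdev(x)      # sample standard deviation, like =STDEV(range)
var_x = statistics.variance(x)     # sample variance, like =VAR(range)

def correl(a, b):
    """Pearson correlation coefficient, in the spirit of =CORREL(range1, range2)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    ss_a = sum((p - mean_a) ** 2 for p in a)
    ss_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (ss_a * ss_b) ** 0.5

r = correl(x, y)
print(mean_x, median_x, mode_x, round(stdev_x, 4), var_x, round(r, 4))
```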
1. Gather your data: Ensure you have a list of cash flows associated with the
investment.
2. Open Excel: Open a new or existing spreadsheet where you want to perform
the NPV calculation.
3. Organize your data: List the cash flows in a column, typically starting from
cell A1 or another cell of your choice.
4. Determine the discount rate: Have a specific discount rate in mind. This
could be the cost of capital or the rate of return you expect from the investment.
5. Use the NPV function: In a cell where you want the NPV result to appear,
type the following formula:
=NPV(discount_rate, cash_flow1, cash_flow2, ...)
For example:
=NPV(0.1, B1:B5)
Here, 0.1 is the discount rate (10% in this case), and B1:B5 represents the
range of cells where your cash flows are listed.
6. Press Enter: After entering the formula, press Enter to compute the NPV.
Remember a few important points:
Excel's NPV function calculates the present value of a series of cash flows based on a
discount rate. It treats the first value as occurring at the end of the first period, so if
the initial investment is made at the start (time 0), leave it out of the NPV range and
add it separately to the NPV result.
Always review your inputs and understand the implications of the discount rate you're
using, as it significantly impacts the NPV outcome.
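The calculation behind the NPV function can be sketched in Python. The project figures are hypothetical, and the npv helper follows the convention that every listed cash flow, including the first, is discounted by at least one period, with the initial investment added separately.

```python
def npv(rate, cash_flows):
    """Present value where the i-th cash flow (starting at 1) is discounted i periods."""
    return sum(cf / (1 + rate) ** (i + 1) for i, cf in enumerate(cash_flows))

# Hypothetical project: 1,000 paid today, then 500 at the end of each of 3 years.
initial_investment = -1000
inflows = [500, 500, 500]

project_npv = initial_investment + npv(0.10, inflows)
print(round(project_npv, 2))
```

A positive result means the discounted inflows exceed the initial outlay at the chosen discount rate.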
• XNPV: The XNPV function in Excel is an extension of the NPV function that
allows you to calculate the net present value of cash flows that are not
necessarily periodic. XNPV considers specific dates associated with each cash
flow, allowing for more accurate NPV calculations when cash flows occur at
irregular intervals.
1. Organize your data: Create a table that includes the dates of each cash flow
and their respective amounts.
2. Open Excel: Open a new or existing spreadsheet where you want to perform
the XNPV calculation.
3. Input your data: List the dates of each cash flow in one column and the
corresponding cash flow amounts in another column.
4. Determine the discount rate: Have a specific discount rate in mind. This
could be the cost of capital or the rate of return you expect from the investment.
5. Use the XNPV function: In a cell where you want the XNPV result to appear,
type the following formula:
=XNPV(discount_rate, cash_flow_range, date_range)
Ensure that the dates are in chronological order and are in a format that Excel
recognizes as dates (e.g., mm/dd/yyyy or dd/mm/yyyy). Also, remember that XNPV
uses specific dates, unlike the NPV function, which assumes equal time intervals
between cash flows.
Using XNPV over NPV is advisable when dealing with cash flows occurring on
specific dates rather than regular intervals, as it provides a more accurate calculation
of the net present value in such scenarios.
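A minimal Python sketch of a date-based NPV follows. It discounts each cash flow by the number of days since the first date, divided by 365; the cash flows and dates are hypothetical, and the xnpv helper is an illustration rather than Excel's own code.

```python
from datetime import date

def xnpv(rate, cash_flows, dates):
    """Discount each cash flow by its day count from the first date over a 365-day year."""
    first = dates[0]
    return sum(
        cf / (1 + rate) ** ((d - first).days / 365)
        for cf, d in zip(cash_flows, dates)
    )

# Hypothetical irregular cash flows.
flows = [-1000, 400, 700]
when = [date(2023, 1, 1), date(2023, 7, 1), date(2024, 1, 1)]

result = xnpv(0.10, flows, when)
print(round(result, 2))
```

Because the first cash flow falls on the first date, it is not discounted at all, unlike in the regular NPV function.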
• EMI: EMI (Equated Monthly Installment) is a fixed payment amount made by
a borrower to a lender at a specified date each calendar month. It is commonly
used in loans, such as home loans, car loans, personal loans, etc., where the
borrower repays the loan amount along with interest in regular installments
over a defined period.
In Excel, you can calculate the EMI using the PMT function. Here's how you can
do it:
Excel will return the EMI value based on the provided loan details.
Remember to adjust the formula based on your specific loan amount, interest rate, and
tenure. Additionally, consider any additional fees or charges that might affect the loan
calculation.
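The standard EMI formula, EMI = P * r * (1 + r)^n / ((1 + r)^n - 1) with monthly rate r and n monthly payments, can be written in Python as a sketch. The loan figures are hypothetical and the emi helper is an assumption for illustration; in Excel the comparable call would be PMT with the monthly rate, the number of months and the loan amount.

```python
def emi(principal, annual_rate, years):
    """Equated monthly installment for a fixed-rate loan repaid monthly."""
    r = annual_rate / 12      # monthly interest rate
    n = years * 12            # number of monthly payments
    if r == 0:
        return principal / n  # no interest: spread the principal evenly
    growth = (1 + r) ** n
    return principal * r * growth / (growth - 1)

# Hypothetical loan: 100,000 at 12% a year for 1 year.
payment = emi(100_000, 0.12, 1)
print(round(payment, 2))
```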
• IRR: The IRR (Internal Rate of Return) function in Excel is used to calculate
the internal rate of return for a series of cash flows occurring at regular
intervals. The IRR is the discount rate that makes the net present value (NPV)
of these cash flows equal to zero. It helps in assessing the potential profitability
of an investment.
1. Organize your cash flow data: List the cash flows in chronological order. The
initial investment should be negative (outflow), and subsequent cash inflows
should be positive.
2. Open Excel: Open a new or existing spreadsheet.
3. Input your data: Enter your cash flow data in a column, starting from a cell,
say A1.
4. Use the IRR function: In a cell where you want the IRR result to appear, type
the following formula:
=IRR(range_of_cash_flows)
For example:
=IRR(A1:A10)
Here, A1:A10 represents the range of cells where your cash flows are listed.
5. Press Enter: After entering the formula, press Enter to compute the IRR.
Excel will calculate and display the internal rate of return for the given cash flow
series.
• The cash flows should occur at regular intervals (if not, consider using the
XIRR function for irregular intervals).
• Ensure the initial investment is represented as a negative value.
• Excel's IRR function needs at least one positive and one negative cash flow.
The IRR function is a useful tool to evaluate the potential return of an investment.
However, it's important to interpret the IRR result in context and consider other
factors such as risk, cost of capital, and the nature of the investment before making
decisions based solely on the IRR value.
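Since the IRR is the rate at which the NPV becomes zero, it can be found numerically. The sketch below uses bisection, which is a stand-in chosen for simplicity and not necessarily the method Excel uses; the function names and cash flows are hypothetical.

```python
def npv_at(rate, cash_flows):
    """Present value with the first cash flow at time 0 (not discounted)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, low=-0.99, high=10.0, tol=1e-9):
    """Find the rate where npv_at() crosses zero by bisection."""
    f_low = npv_at(low, cash_flows)
    if f_low * npv_at(high, cash_flows) > 0:
        raise ValueError("no sign change between low and high")
    for _ in range(200):
        mid = (low + high) / 2
        f_mid = npv_at(mid, cash_flows)
        if f_low * f_mid <= 0:
            high = mid                  # the root lies in the lower half
        else:
            low, f_low = mid, f_mid     # the root lies in the upper half
        if high - low < tol:
            break
    return (low + high) / 2

# Hypothetical investment: 1,000 out now, 500 back in each of the next 3 periods.
flows = [-1000, 500, 500, 500]
rate = irr(flows)
print(f"{rate:.4%}")
```

The sign-change check mirrors the requirement that the series contains at least one negative and one positive cash flow.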
To construct contingency table, compute conditional, marginal probability and
use of Bayes theorem:
Constructing a contingency table, computing conditional and marginal probabilities,
and applying Bayes' theorem in Excel involves a few steps:
Let's assume you have categorical data representing two variables: Variable A
and Variable B.
• P(B∣A) is the conditional probability of B given A.
• P(B) is the marginal probability of B.
Finally, interpret the results by understanding how variables A and B are related based
on the conditional and marginal probabilities. The conditional probabilities calculated
using Bayes' Theorem can provide insights into the relationship between the variables
given certain conditions.
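The whole procedure (contingency table, marginal and conditional probabilities, Bayes' theorem) can be sketched in Python. The observations and category labels are invented for illustration, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical observations of (A, B) category pairs.
observations = [
    ("yes", "high"), ("yes", "high"), ("yes", "low"),
    ("no", "high"), ("no", "low"), ("no", "low"),
    ("no", "low"), ("yes", "high"),
]

counts = Counter(observations)   # cell counts of the contingency table
total = len(observations)

def p_a(a):
    """Marginal probability of A = a (row total / grand total)."""
    return sum(n for (x, _), n in counts.items() if x == a) / total

def p_b(b):
    """Marginal probability of B = b (column total / grand total)."""
    return sum(n for (_, y), n in counts.items() if y == b) / total

def p_b_given_a(b, a):
    """Conditional probability P(B=b | A=a) (cell count / row total)."""
    row_total = sum(n for (x, _), n in counts.items() if x == a)
    return counts[(a, b)] / row_total

def bayes_a_given_b(a, b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a(b, a) * p_a(a) / p_b(b)

print(p_a("yes"), p_b("high"), p_b_given_a("high", "yes"))
print(bayes_a_given_b("yes", "high"))
```

As a check, the Bayes result for P(A = yes | B = high) equals the value read directly from the table: the (yes, high) cell count divided by the "high" column total.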
SUMMARY
1. Spreadsheet Software:
• Excel is a widely used spreadsheet software developed by Microsoft,
part of the Microsoft Office suite.
• It offers a grid interface composed of rows and columns, where users
can organize, manipulate, and analyze data efficiently.
2. Data Organization and Manipulation:
• Excel is renowned for its ability to organize data into spreadsheets,
facilitating tasks like data entry, sorting, filtering, and data manipulation.
• Users can perform mathematical calculations, create formulas, and
utilize built-in functions to analyze and manipulate data easily.
3. Formulas and Functions:
• Excel provides an extensive library of formulas and functions, ranging
from basic arithmetic operations (SUM, AVERAGE, MAX, MIN) to
complex calculations (IF, VLOOKUP, INDEX-MATCH, etc.).
• Formulas and functions enable users to perform calculations, make
decisions, and perform data lookups efficiently.
4. Charts and Graphs:
• Excel offers various chart types (bar, line, pie, etc.) to visually represent
data, making it easier to understand trends, patterns, and relationships
within the data.
• Users can create and customize charts using Excel's charting tools.
5. Data Analysis Tools:
• Excel includes tools for data analysis such as sorting, filtering, pivot
tables, and conditional formatting.
• These tools aid in summarizing and visualizing data, identifying trends,
and extracting meaningful insights.
6. Data Import and Export:
• Excel supports importing data from various sources like text files,
databases, web queries, and other Excel workbooks.
• Users can also export Excel data to different file formats for sharing or
integration with other software.
7. Data Visualization and Reporting:
• Excel's features enable users to create professional-looking reports,
dashboards, and presentations using data visualization tools and
formatting options.