
A Comprehensive Guide On Microsoft Excel For Data Analysis

This document provides an overview of using Microsoft Excel for data analysis. It discusses essential Excel functions for data analysis such as SUMIFS, AVERAGEIFS, COUNTIFS, VLOOKUP, and IF. It also covers methods for data analysis in Excel including using ranges and tables, data cleaning, conditional formatting, and sorting and filtering data. The document is intended as a guide for beginners to perform basic data analysis tasks in Excel.

Uploaded by

Khushi Budhiraja
Copyright
© © All Rights Reserved

A Comprehensive Guide on Microsoft Excel for Data Analysis

BEGINNER EXCEL

This article was published as a part of the Data Science Blogathon.

Table of Contents

Introduction
15 Essential Excel Data Analysis Functions
Methods for Data Analysis in Excel
Data Analysis with Microsoft Excel
Simple Linear Regression Model in Microsoft Excel
Dataset

Introduction to Excel for Data Analysis

Data analysis is the process of cleansing, transforming, and analyzing raw data to obtain usable, relevant
information that can help businesses make informed decisions. By providing relevant insights and data,
commonly presented in charts, images, tables, and graphs, it helps to reduce the risks associated with
decision-making.

Data analytics encompasses not just data analysis, but also data collecting, organization, storage, and the
tools and techniques used to delve deeper into data, as well as those used to present the findings, such as
data visualization tools. On the other hand, data analysis is concerned with the process of transforming
raw data into meaningful statistics, information, and explanations.

Data visualization is an interdisciplinary field concerned with the graphical depiction of data. When the
data is large, such as in a time series, it is a very effective way of communicating information.

A visualization maps data values to graphical components. The mapping establishes how these components’
characteristics change in response to the data. A bar chart, in this sense, is a mapping of a variable’s
magnitude to the length of a bar. Mapping is a basic component of data visualization, since a poorly
designed mapping can make a chart harder to read.

The iterative Data Analysis Process is comprised of the following phases:

• Specification of Data Requirements

• Data Gathering

• Data Processing

• Data Cleaning

• Data Analysis

• Data Communication

Data analysis is a valuable skill that can help you make better judgments. Microsoft Excel is one of the
most used data analysis programs, with the built-in pivot tables being the most popular analytic tool.

Microsoft Excel allows you to examine and interpret data in a variety of ways. The data can come from
several different sources, and it can be converted and formatted in a variety of ways. Conditional
Formatting, Ranges, Tables, Text functions, Date functions, Time functions, Financial functions, Subtotals,
Quick Analysis, Formula Auditing, the Inquire tool, What-if Analysis, Solver, the Data Model, Power Pivot,
Power View, Power Map, and other Excel commands, functions, and tools can all be used to analyse it.

15 Essential Excel Data Analysis Functions

Excel has hundreds of functions, and trying to match the proper formula with the right kind of data analysis
can be overwhelming. The most valuable functions do not have to be difficult. These fifteen easy functions
will increase your ability to interpret data, and you’ll wonder how you ever lived without them.

1. Concatenate

When conducting data analysis, the formula =CONCATENATE is one of the simplest to understand but
most powerful. It combines text, numbers, dates, and other data from multiple cells into a single cell.

SYNTAX = CONCATENATE (text1, text2, [text3], …)


2. Len()

In data analysis, LEN is used to show the number of characters in each cell. It’s frequently utilised when
working with text that has a character limit or when attempting to distinguish between product numbers.

SYNTAX = LEN (text)

3. Days()

The =DAYS function calculates the number of calendar days between two dates.

SYNTAX = DAYS (end_date, start_date)
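
As a rough illustration of what DAYS computes, here is a minimal Python sketch. The function name `days` and the sample dates are assumptions made for this example; they are not part of Excel.

```python
from datetime import date


def days(end_date: date, start_date: date) -> int:
    """Rough analogue of Excel's =DAYS(end_date, start_date)."""
    # Subtracting two dates gives a timedelta; .days is the calendar-day gap.
    return (end_date - start_date).days


# Sample dates chosen only for illustration.
print(days(date(2021, 12, 31), date(2021, 1, 1)))  # 364
```

Note the argument order: the end date comes first, as in the Excel syntax, and swapping the two dates gives a negative result.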

4. Networkdays

NETWORKDAYS returns the number of working days between two dates. Weekends are excluded automatically,
and holidays can optionally be excluded as well. It’s classified as a Date/Time function in Excel. The
NETWORKDAYS function is used in finance and accounting for determining employee benefits based on days
worked, the number of working days available throughout a project, or the number of business days required
to resolve a customer problem, among other things.

SYNTAX = NETWORKDAYS (start_date, end_date, [holidays])
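
The counting logic can be sketched in Python as follows. This is a simplified analogue, not Excel's implementation: the name `networkdays` and the sample dates are assumptions, the weekend is fixed to Saturday and Sunday, and a start date after the end date simply gives 0 here (Excel returns a negative count in that case).

```python
from datetime import date, timedelta


def networkdays(start_date, end_date, holidays=()):
    """Count working days from start_date to end_date, both inclusive."""
    holiday_set = set(holidays)
    count = 0
    day = start_date
    while day <= end_date:
        # weekday() is 0..4 for Monday..Friday, 5 and 6 for the weekend.
        if day.weekday() < 5 and day not in holiday_set:
            count += 1
        day += timedelta(days=1)
    return count


# November 2021 starts on a Monday and has 22 weekdays.
print(networkdays(date(2021, 11, 1), date(2021, 11, 30)))  # 22
# Treating Thursday 25 November as a holiday removes one working day.
print(networkdays(date(2021, 11, 1), date(2021, 11, 30), [date(2021, 11, 25)]))  # 21
```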


5. Sumifs()

One of the “must-know” formulas for a data analyst is =SUMIFS. =SUM is a familiar formula, but what if you
need to sum data based on numerous criteria? It’s SUMIFS.

SYNTAX = SUMIFS (sum_range, range1, criteria1, [range2], [criteria2], …)
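
To make the "sum based on several criteria" idea concrete, here is a minimal Python sketch. It assumes exact-match criteria only (Excel also accepts operators such as ">100" and wildcards, which are not covered), and the function name `sumifs` and the sample columns are made up for illustration.

```python
def sumifs(sum_range, *pairs):
    """Sum sum_range[i] for every row i where all criteria match exactly.

    pairs alternates criteria ranges and criteria:
    range1, criteria1, range2, criteria2, ...
    """
    ranges = pairs[0::2]
    criteria = pairs[1::2]
    total = 0
    for i, value in enumerate(sum_range):
        if all(rng[i] == crit for rng, crit in zip(ranges, criteria)):
            total += value
    return total


# Made-up sample columns, one entry per row.
origin = ["US", "Japan", "US", "Europe", "US"]
cylinders = [8, 4, 4, 4, 8]
horsepower = [150, 90, 88, 95, 165]

# Total horsepower of US cars with 8 cylinders.
print(sumifs(horsepower, origin, "US", cylinders, 8))  # 315
```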

6. Averageifs()

AVERAGEIFS, like SUMIFS, lets you take an average based on one or more parameters.

SYNTAX = AVERAGEIFS (avg_rng, range1, criteria1, [range2], [criteria2], …)


7. Countifs()

The COUNTIFS function is yet another powerful Excel data analysis tool. It’s a lot like the SUMIFS function,
but instead of summing values it counts the number of rows that satisfy a set of conditions. As a result, it
doesn’t need a sum range like SUMIFS.

SYNTAX = COUNTIFS (range1, criteria1, [range2], [criteria2], …)
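
A minimal Python sketch of the same counting logic follows. As an assumption for brevity it handles exact-match criteria only, and the name `countifs` and the sample columns are hypothetical.

```python
def countifs(*pairs):
    """Count the rows where every criteria range matches its criterion.

    pairs alternates criteria ranges and criteria:
    range1, criteria1, range2, criteria2, ...
    """
    ranges = pairs[0::2]
    criteria = pairs[1::2]
    return sum(
        1
        for i in range(len(ranges[0]))
        if all(rng[i] == crit for rng, crit in zip(ranges, criteria))
    )


# Made-up sample columns, one entry per row.
origin = ["US", "Japan", "US", "Europe", "US"]
cylinders = [8, 4, 4, 4, 8]

print(countifs(origin, "US"))                # 3
print(countifs(origin, "US", cylinders, 4))  # 1
```

Unlike the SUMIFS syntax, there is no separate range to add up: each matching row simply contributes 1.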

8. Counta()

COUNTA counts the cells in a range that are not empty. You’ll come across incomplete data sets daily as a
data analyst. Without needing to restructure the data, COUNTA will allow you to examine any gaps in the
dataset.

SYNTAX = COUNTA (value1, [value2], …)


9. Vlookup()

The acronym VLOOKUP stands for ‘Vertical Lookup.’ It’s a function that tells Excel to look for a specific
value in the first column of a range (the so-called ‘table array’) and return a value from another column in
the same row.

SYNTAX = VLOOKUP (lookup_value, table_array, column_index_num, [range_lookup])
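
The lookup can be pictured with a short Python sketch. It assumes an exact match (the behaviour when range_lookup is FALSE); the function name `vlookup`, the product table, and the use of `None` in place of Excel's #N/A error are all choices made for this illustration.

```python
def vlookup(lookup_value, table_array, col_index_num):
    """Exact-match sketch of VLOOKUP over a list of rows."""
    for row in table_array:
        # Only the first column of each row is searched.
        if row[0] == lookup_value:
            # col_index_num is 1-based, as in Excel.
            return row[col_index_num - 1]
    return None  # Excel would show #N/A here


# Made-up product table: id, name, price.
products = [
    ("P-100", "Keyboard", 25.0),
    ("P-200", "Mouse", 12.5),
]

print(vlookup("P-200", products, 3))  # 12.5
print(vlookup("P-999", products, 2))  # None
```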

10. Hlookup()

“Horizontal” is represented by the letter H in HLOOKUP. It looks for a value in the top row of a table or an
array of values, then returns a value from a row you specify in the table or array in the same column. When
your comparison values are in a row across the top of a data table and you wish to look down a specific
number of rows, use HLOOKUP. When your comparison values are in a column to the left of the data you
wish to find, use VLOOKUP.

SYNTAX = HLOOKUP (lookup_value, table_array, row_index, [range_lookup])

11. If()

The IF function comes in handy a lot. We can use this function to automate decision-making in our
spreadsheets. We could use IF to make Excel conduct a different computation or show a different value
based on the results of a logical test (a decision). The IF function will ask you to run a logical test, as well
as what action to take if the test is true and what action to take if the test is false.

SYNTAX = IF (logical_test, [value_if_true], [value_if_false])


12. Iferror()

With IFERROR, we can display a more informative message than Excel’s default error, or even run an
alternative computation. IFERROR takes two arguments: the value to check for an error, and the value to
return instead if an error occurs.

SYNTAX = IFERROR (value, value_if_error)
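
In Python the closest everyday equivalent is a try/except block. The sketch below is only an analogy: the helper name `iferror` is hypothetical, the computation is passed in as a function so that it runs inside the try block, and the list of caught exceptions is an arbitrary choice for the example.

```python
def iferror(compute, value_if_error):
    """Return compute(), or value_if_error if the computation fails."""
    try:
        return compute()
    except (ZeroDivisionError, ValueError, KeyError, TypeError):
        return value_if_error


# Dividing by zero is the analogue of Excel's #DIV/0! error.
print(iferror(lambda: 10 / 0, "n/a"))  # n/a
print(iferror(lambda: 10 / 4, "n/a"))  # 2.5
```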

13. Find/Search

The FIND function in Excel returns the position of one text string within another (as a number). FIND
returns a #VALUE! error if the text cannot be located. FIND is case-sensitive, so a =FIND for “Bigger”
matches only “Bigger”.

However, a =SEARCH for “Bigger” will match both Bigger and bigger, because SEARCH is not case-sensitive,
broadening the scope of the query. This is very helpful when searching for anomalies or unique identifiers.

SYNTAX = FIND (find_text, within_text, [start_num])

SYNTAX = SEARCH (find_text, within_text, [start_num])
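
The difference between the two functions comes down to case sensitivity: FIND is case-sensitive and SEARCH is not. A small Python sketch, assuming hypothetical helper names `find` and `search`, 1-based positions as in Excel, and `None` standing in for the #VALUE! error (SEARCH's wildcard support is not covered):

```python
def find(find_text, within_text, start_num=1):
    """Case-sensitive position of find_text, 1-based; None if absent."""
    pos = within_text.find(find_text, start_num - 1)
    return pos + 1 if pos >= 0 else None


def search(find_text, within_text, start_num=1):
    """Case-insensitive position of find_text, 1-based; None if absent."""
    pos = within_text.lower().find(find_text.lower(), start_num - 1)
    return pos + 1 if pos >= 0 else None


print(find("Bigger", "a bigger box"))    # None
print(search("Bigger", "a bigger box"))  # 3
```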


14. Left/Right

=LEFT and =RIGHT are simple and efficient ways of retrieving fixed-position data from cells. =LEFT returns
the first “x” characters from the cell’s beginning, while =RIGHT returns the last “x” characters from the
cell’s end. For example, a customer’s area code can be extracted from their phone number using =LEFT, while
the last four digits can be extracted using =RIGHT.

SYNTAX = LEFT (text, [num_chars])

SYNTAX = RIGHT (text, [num_chars])
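
String slicing gives a close Python analogue. The helper names `left` and `right` and the phone number below are made up for illustration; num_chars defaults to 1, as it does in Excel.

```python
def left(text, num_chars=1):
    """First num_chars characters of text."""
    return text[:num_chars]


def right(text, num_chars=1):
    """Last num_chars characters of text."""
    # max(..., 0) returns the whole text when num_chars exceeds its length.
    return text[max(len(text) - num_chars, 0):]


phone = "212-555-0147"  # made-up number
print(left(phone, 3))   # 212
print(right(phone, 4))  # 0147
```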


15. Rank()

Even though =RANK is an old Excel function, it is nevertheless useful for data analysis. =RANK is a quick
way to show how values in a dataset rank in ascending or descending order. For example, RANK can be used to
determine which clients order the most.

SYNTAX = RANK (number, ref, [order])
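
The ranking rule can be sketched in a few lines of Python. With order omitted or 0 the largest value gets rank 1, and tied values share a rank, with the following rank skipped. The function name `rank` and the order counts are assumptions made for this example.

```python
def rank(number, ref, order=0):
    """Rank of number within ref; order=0 ranks the largest value first."""
    if order == 0:
        # Rank is one more than the count of strictly larger values.
        return 1 + sum(1 for v in ref if v > number)
    # Any non-zero order ranks the smallest value first.
    return 1 + sum(1 for v in ref if v < number)


orders = [12, 30, 30, 7]  # made-up order counts per client
print([rank(v, orders) for v in orders])  # [3, 1, 1, 4]
print(rank(7, orders, order=1))           # 1
```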


Some of the Methods for Data Analysis in Excel are:

1) Ranges and Tables

Your data can be in the form of a table or a range. Certain actions can be performed on it either way. Some
procedures, however, are more effective when the data is stored in a table rather than a range, and some
operations are only applicable to tables. Both ranges and tables can be given names, and those names can
then be used and managed.

2) Data Cleaning – Text Functions, Dates and Times

Before moving on to data analysis, you must clean and organize the data you’ve gathered from multiple
sources. The following approaches can be used to clean data in Excel.

• Cleaning data with text functions

• Cleaning data containing date values

• Cleaning data containing time values

3) Conditional Formatting

Conditional formatting in Excel allows you to colour cells or fonts, as well as place symbols next to
values in cells, based on predetermined criteria. This aids in visualizing the most important values.

It allows you to highlight cells with a different colour depending on the values they contain. Conditional
formatting covers rules, data bars, colour scales, icon sets, finding duplicates, shading alternate rows,
comparing two lists, conflicting rules, checklists, and heat maps.

4) Sorting and Filtering

You may need to sort and/or filter your data to prepare for data analysis and/or to display specific critical
data. You can do this in Excel using the simple sorting and filtering options. Sort and Filter are among the
most used Excel features. Within columns, sorting can be done in ascending or descending order. Lists can
also be sorted by colour, reversed, or randomized. Filters are used to display only the data that meets
given criteria. Number and Text Filters, Date Filters, Advanced Filter, Data Form, Remove Duplicates,
Outlining Data, and Subtotal are some of the options.

5) Subtotals with Ranges

PivotTables are commonly used to summarize data, as you are aware. However, Subtotals with Ranges is
another Excel function that allows you to group/ungroup data and summarize data in ranges in a few
simple steps.

6) Quick Analysis

You can quickly carry out numerous data analysis tasks and create quick visualizations of the results
with Excel’s Quick Analysis tool.

7) Understanding Lookup Functions

Excel Lookup Functions allow you to search through a large amount of data for data values that fit a set of
criteria. VLOOKUP and HLOOKUP are two different lookup functions. Analysts use VLOOKUP and HLOOKUP to
find a value in a dataset and retrieve other values that correspond to that value. Data analysts
frequently use them to integrate and consolidate useful data from several Excel sheets.

8) PivotTables

PivotTables allow you to summarise data and create dynamic reports by modifying the PivotTable’s
contents. You can use pivot tables to extract important data from a vast dataset. This is the most practical
method of data analysis. After inserting a Pivot Table, you can drag fields, sort, filter, or change the
summary calculation. Two-dimensional Pivot Tables are also possible. Group Pivot Table Items, Multi-level
Pivot Table, Frequency Distribution, Pivot Chart, Slicers, Update Pivot Table, Calculated Field/Item, and
GetPivotData are all important functions.

9) Data Visualization in Excel

Charts are simple to make and can display data in a variety of ways, which often makes them easier to
interpret than a sheet of raw numbers. You can create a chart, change its type, switch rows and columns, and
adjust the legend position and the data labels. Column, Line, Pie, Bar, Area, and Scatter charts are some of
the chart types provided in Microsoft Excel.

10) Data Validation


Only valid values should be entered into cells. Otherwise, they risk producing erroneous results. Using
data validation commands, you can quickly set up validation criteria for a cell, show an input message
prompting the user on what should be typed in the cell, validate the values entered against the given
criteria, and display an error message in the case of incorrect entries.

11) Financial Analysis

Excel has several financial functions. You can learn to combine these functions to solve common problems
that require financial analysis.

12) Working with Multiple Worksheets

It’s possible that you’ll need to run multiple identical calculations in different worksheets. Instead of
duplicating these calculations in each worksheet, you can complete them in one and have them display in
all of the others. You may also use a report worksheet to compile the data from the multiple worksheets.

13) Formula Auditing

When you use formulas, you should double-check that they are working correctly. Formula Auditing
commands in Excel help you trace precedents and dependents and check for errors.

14) What-if Analysis

What-if Analysis lets you change the values in cells to see how those changes affect the results of the
formulas on the worksheet. Excel provides three What-if Analysis tools: Scenario Manager, Goal Seek, and
Data Tables.

Data Analysis with Microsoft Excel

Step 1 – Data Cleaning Using Text to Columns

Select the first column, then go to the Data tab and select “Text to Columns”. Select Delimited in the
window that appears and press Next.

Then, to separate the data, select the delimiter/separator in accordance with the dataset requirements.
The required delimiter for the given dataset was “;”.

After choosing the delimiter, check the data preview and finish the process.

Finally, you will be able to get the cleaned data.
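
For readers who want to see what the delimiter split does to the raw text, here is a small Python sketch using the standard `csv` module. The three sample lines are made up for illustration and are not rows from the actual dataset; only the semicolon delimiter comes from the step above.

```python
import csv
import io

# Made-up semicolon-delimited text; each line is one unsplit "first column" cell.
raw = "Car;MPG;Cylinders\nCar A;18.0;8\nCar B;24.0;4\n"

# Splitting on ";" is what choosing that delimiter in Text to Columns does.
rows = list(csv.reader(io.StringIO(raw), delimiter=";"))

for row in rows:
    print(row)
# ['Car', 'MPG', 'Cylinders']
# ['Car A', '18.0', '8']
# ['Car B', '24.0', '4']
```

Every value comes out as text here; converting the numeric columns would be a separate step.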


Step 2 – Conditional Formatting

By using Rules, you can specify any number of formatting conditions. The built-in rule types include:

• Highlight Cells Rules

• Top/Bottom Rules

You can even make up your own set of rules. You can

• Add a rule

• Remove a rule that already exists

• Manage the defined rules

Select the column for conditional formatting and then select the “Conditional Formatting” option from the
Home tab. Many rules will be visible under Conditional Formatting, so select the rule you want to apply to
the column.

Enter the required value and choose the colour to be applied to the cells that satisfy the rule.

Click OK when you’ve completed all of the required details.


Step 3 – Sorting and Filtering

To add a filter to a column, select the column, then select the Filter option under the Data tab.

After adding the filter, the column will have a dropdown. Click on that dropdown menu to see all of the
available options. You can select the required filter for the column, and you can also sort the column.

For example, if you only want cars with eight cylinders, select “8” from the dropdown and click OK.

After selecting the filter condition, you will see only the cars with 8 cylinders.

Example: now we need to order the cars in ascending order based on their weight.

To do so, select “Sort Smallest to Largest” from the dropdown.

The cars are now ordered in ascending order based on their weight.

Step 4 – Pivot Tables

Press Ctrl+A, then go to Insert and click on the PivotTable option. A dialogue box will open in which you
must select “New Worksheet” as the location for the pivot table, and then click OK.

After completing the above step, your Excel file will include a new sheet. The fields from your data and the
pivot table areas (Filters, Rows, Values, and Columns) are on the right side of the sheet.

Drag and drop the required fields into these areas to make the pivot table.

For example, we would like to check the sum of cylinders for all the cars, differentiated by their origin.

For example, we would like to check the “Sum of Horsepower” for each cylinder count based on origin.

We can deduce the following from the above step:

• Cars with 3 cylinders originate only in “Japan.”

• Among cars with 4 cylinders, the highest total horsepower comes from the “US.”

• Cars with 5 cylinders originate only in “Europe.”

• Among cars with 6 cylinders, the highest total horsepower comes from the “US.”

• Cars with 8 cylinders originate only in the “US.”
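
The grouping that a pivot table performs can be sketched in plain Python. This is an illustration under stated assumptions, not Excel's mechanism: the helper `pivot_sum` is hypothetical, and the six rows below are made-up values, not the real dataset.

```python
from collections import defaultdict


def pivot_sum(rows, row_field, col_field, value_field):
    """Sum value_field for every (row_field, col_field) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_field]][r[col_field]] += r[value_field]
    # Convert to plain dicts so the result prints cleanly.
    return {key: dict(cols) for key, cols in table.items()}


# Made-up sample rows for illustration only.
cars = [
    {"origin": "US", "cylinders": 8, "horsepower": 150},
    {"origin": "US", "cylinders": 8, "horsepower": 165},
    {"origin": "US", "cylinders": 4, "horsepower": 88},
    {"origin": "Japan", "cylinders": 4, "horsepower": 90},
    {"origin": "Europe", "cylinders": 4, "horsepower": 95},
    {"origin": "Japan", "cylinders": 3, "horsepower": 97},
]

# Rows = cylinders, Columns = origin, Values = sum of horsepower.
summary = pivot_sum(cars, "cylinders", "origin", "horsepower")
print(summary[8])  # {'US': 315}
print(summary[4])  # {'US': 88, 'Japan': 90, 'Europe': 95}
```

A cylinder count that appears under only one origin, such as 8 in this sample, corresponds to the "originate only in" observations listed above.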

Simple Linear Regression Model in Microsoft Excel

1. From the ribbon, choose the “Data” tab.

2. Select “Data Analysis”. If it is not visible, enable the Analysis ToolPak add-in first. The Data
Analysis dialogue box appears with a list of analysis tools.

3. Select “Regression” from the list and click “OK.”

4. In the Regression dialogue box, select the dependent variable data in the “Input Y Range” box (the
cardio column).

5. Select the independent variable data in the “Input X Range” box.

6. Check the “Labels” box if the selected ranges include the column headers.

7. Select the output range by clicking in the Output Range box.

8. Check the “Residuals” box.

9. To complete the process, click OK.

10. Finally, you’ll obtain an Excel spreadsheet with a simple linear regression model. You can now
evaluate the results.

 
The R² value, also known as the coefficient of determination, indicates how well the regression model
fits the data by measuring the proportion of variance in the dependent variable explained by the
independent variable. It ranges from 0 to 1, with a greater number indicating a better fit. The p-value,
also known as the probability value, also ranges from 0 to 1 and indicates whether a result is
statistically significant. In contrast to the R² value, a smaller p-value is preferable because it
provides stronger evidence of a relationship between the dependent and independent variables.
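
To show where the slope, intercept, and R² come from, here is a minimal least-squares sketch in Python. It is an illustration rather than a reproduction of Excel's Regression tool: the function name is hypothetical, the sample points are made up, and the p-value is not computed.

```python
def simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x, plus R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Made-up points that lie exactly on y = 1 + 2x, so the fit is perfect.
slope, intercept, r_squared = simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r_squared)  # 2.0 1.0 1.0
```

With real, noisy data R² falls below 1, which is the "proportion of variance explained" described above.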

Dataset For Excel Data Analysis

Dataset used for Data Analysis in Microsoft Excel

It’s a dataset of roughly 400 cars with nine attributes: car name, mpg, cylinders,
displacement, horsepower, acceleration, weight, origin, and model.

https://perso.telecom-paristech.fr/eagan/class/igr204/datasets

Dataset used for Simple Linear Regression Model in Microsoft Excel

It’s a dataset of cardiovascular patients with eleven different independent variables, including gender, age,
height, weight etc.

https://www.kaggle.com/sulianova/cardiovascular-disease-dataset?select=cardio_train.csv

End Notes:

Thank you for following along with me all the way to the end. By now, we should have a good
understanding of Data Analysis and Data Visualization in Microsoft Excel.

I hope you found this article useful. Please feel free to distribute it to your peers.

Author

Hello, I’m Gunjan Agarwal from Gurugram, and I earned a master’s degree in Data Science from Amity
University in Gurgaon. I enthusiastically participate in Data Science hackathons, blogathons, and
workshops.

I’d like to connect with you on LinkedIn. Mail me here for any queries.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

Article Url - https://www.analyticsvidhya.com/blog/2021/11/a-comprehensive-guide-on-microsoft-excel-for-data-analysis/

Gunjan Agarwal
