Data Analytics Using Spreadsheets I-Reference Notes

The document provides an overview of data analytics using spreadsheets, emphasizing the importance of transforming raw data into actionable insights for business decision-making. It distinguishes between data analysis and data analytics, outlines the types of data analytics (descriptive, diagnostic, predictive, prescriptive), and details the phases of data analysis, including defining questions, data collection, cleaning, analysis, and visualization. Additionally, it discusses methods of data analysis in spreadsheets and categorizes data into quantitative and qualitative types.

Uploaded by

barekarakanksha
Copyright
© All Rights Reserved

Data Analytics using Spreadsheets I Theory

Introduction to Data Analytics and Spreadsheet Basics


Most companies collect loads of data all the time. This data is in its raw form. Data analytics
is the process of analyzing raw data in order to draw out meaningful, actionable insights,
which are then used to inform and drive smart business decisions.
A data analyst will extract raw data, organize it, and then analyze it, transforming it from
incomprehensible numbers into information. Having interpreted the data, the data analyst
will then pass on their findings in the form of suggestions or recommendations about what
the company’s next steps should be.
Data analytics is a form of business intelligence, used to solve specific problems and
challenges within an organization. Data analytics helps to make sense of the past and to
predict future trends and behaviors; rather than basing decisions and strategies on
guesswork, one makes informed choices based on what the data shows. Armed with the
insights drawn from the data, businesses and organizations are able to develop a much
deeper understanding of their audience, their industry, and their company as a whole—and,
as a result, are much better equipped to make decisions and plan ahead.
Data analytics is performed:
• To identify trends and patterns.
• To seek out new opportunities.
• To determine possible risks and benefits.
• To make a strategy of action.

Definition of Data Analysis and Data Analytics


Data Analysis: The process of cleaning, manipulating, modeling, and questioning data to
discover relevant information is known as data analysis. Data analysis is a vital part of data
analytics. It helps us identify solutions by providing information.
Data Analytics: Data analytics is the process of analyzing raw data in order to draw out
meaningful, actionable insights, which are then used to inform and drive smart business
decisions. Data analytics is the broad field of using data and tools to make business decisions.
Data analytics includes all the steps you take, both human- and machine-enabled, to discover,
interpret, visualize, and tell the story of patterns in your data in order to drive business
strategy and outcomes.
Difference between Data Analysis and Data Analytics

Data Analysis | Data Analytics
Data analysis is a specialized type of analytics used in businesses to evaluate data and gain insights. | Data analytics is a traditional or generic type of analytics used in enterprises to make data-driven decisions.
Data analysis is an important element in the data analytics life cycle. It is a subset of data analytics. | Data analytics is a more general term that includes data analysis. It is a superset of data analysis.
It cannot be used to find unknown relations. | With the use of this, one might discover new relationships.
Data analysis needs to be defined in the beginning as it involves cleaning and transforming the raw data. | Data analytics consists of multiple stages including collecting data and evaluating business data.
Example: A new trader in the stock market researches share-market and trend records to get a sense of what’s going on in the market. This technique involves data analysis. | Example: As a result of the stock trader’s newfound understanding of the stock pattern, he can now estimate the stock’s future market price and purchase some shares. This serves as an example of a data analytics process.

Types of data analytics


The four main types of data analytics are: descriptive, diagnostic, predictive, and prescriptive.

• Descriptive analytics is a simple, surface-level type of analysis that looks at what has
happened in the past. The two main techniques used in descriptive analytics are data
aggregation and data mining—so, the data analyst first gathers the data and presents
it in a summarized format (that’s the aggregation part) and then “mines” the data to
discover patterns. The data is then presented in a way that can be easily understood
by a wide audience (not just data experts). It’s important to note that descriptive
analytics doesn’t try to explain the historical data or establish cause-and-effect
relationships; at this stage, it’s simply a case of determining and describing the “what”.
Descriptive analytics draws on the concept of descriptive statistics.
• Diagnostic analytics: While descriptive analytics looks at the “what”, diagnostic
analytics explores the “why”. When running diagnostic analytics, data analysts will
first seek to identify anomalies within the data—that is, anything that cannot be
explained by the data in front of them. For example: If the data shows that there was
a sudden drop in sales for the month of March, the data analyst will need to
investigate the cause. To do this, they’ll embark on what’s known as the discovery
phase, identifying any additional data sources that might tell them more about why
such anomalies arose. Finally, the data analyst will try to uncover causal
relationships—for example, looking at any events that may correlate or correspond
with the decrease in sales. At this stage, data analysts may use probability theory,
regression analysis, filtering, and time-series data analytics.
• Predictive analytics: Just as the name suggests, predictive analytics tries to predict
what is likely to happen in the future. This is where data analysts start to come up with
actionable, data-driven insights that the company can use to inform their next steps.
Predictive analytics estimates the likelihood of a future outcome based on historical
data and probability theory, and while it can never be completely accurate, it does
eliminate much of the guesswork from key business decisions. Predictive analytics can
be used to forecast all sorts of outcomes—from what products will be most popular
at a certain time, to how much the company revenue is likely to increase or decrease
in a given period. Ultimately, predictive analytics is used to increase the business’s
chances of “hitting the mark” and taking the most appropriate action.
• Prescriptive analytics: Building on predictive analytics, prescriptive analytics advises
on the actions and decisions that should be taken. In other words, prescriptive
analytics shows you how you can take advantage of the outcomes that have been
predicted. When conducting prescriptive analysis, data analysts will consider a range
of possible scenarios and assess the different actions the company might take.
Prescriptive analytics is one of the more complex types of analysis, and may involve
working with algorithms, machine learning, and computational modeling procedures.
However, the effective use of prescriptive analytics can have a huge impact on the
company’s decision-making process and, ultimately, on the bottom line.
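The first three types of analytics can be illustrated with a minimal Python sketch. The monthly sales figures are hypothetical (they are not taken from these notes), and a straight-line trend is only one of many possible forecasting techniques, chosen here for simplicity. Prescriptive analytics is left out because it depends on the actions available to a particular business.

```python
from statistics import mean

# Hypothetical monthly sales (units sold); illustrative values only.
sales = {"Jan": 120, "Feb": 130, "Mar": 90, "Apr": 140, "May": 150}
months = list(sales)
values = list(sales.values())

# Descriptive: summarize what happened.
total_sales = sum(values)
average_sales = mean(values)

# Diagnostic (first step only): locate the anomaly worth investigating,
# here the largest month-over-month drop.
changes = {months[i]: values[i] - values[i - 1] for i in range(1, len(values))}
biggest_drop_month = min(changes, key=changes.get)

# Predictive: fit a straight line by least squares and extend it one month.
xs = list(range(len(values)))
x_bar, y_bar = mean(xs), mean(values)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar
forecast_next = intercept + slope * len(values)

print(total_sales, average_sales, biggest_drop_month, forecast_next)
```

The diagnostic step here only finds where to look; explaining why sales dropped would still require additional data sources, as described above.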

Phases of Data Analysis


The five main steps in data analysis are:
Step 1: Define the question(s) you want to answer: The first step is to identify why you are
conducting analysis and what question or challenge you hope to solve. At this stage, you’ll
take a clearly defined problem and come up with a relevant question or hypothesis you can
test. You’ll then need to identify what kinds of data you’ll need and where it will come from.
For example: A potential business problem might be that customers aren’t subscribing to a
paid membership after their free trial ends. Your research question could then be “What
strategies can we use to boost customer retention?”
Step 2: Collect the data: With a clear question in mind, you’re ready to start collecting your
data. Data analysts will usually gather structured data from primary or internal sources, such
as CRM software or email marketing tools. They may also turn to secondary or external
sources, such as open data sources. These include government portals, tools like Google
Trends, and data published by major organizations such as UNICEF and the World Health
Organization.
Step 3: Clean the data: Once you’ve collected your data, you need to get it ready for
analysis—and this means thoroughly cleaning your dataset. Your original dataset may contain
duplicates, anomalies, or missing data, which could distort how the data is interpreted, so
these all need to be removed or corrected. Data cleaning can be a time-consuming task, but it’s crucial for
obtaining accurate results.
Step 4: Analyze the data: Now for the actual analysis! How you analyze the data will depend
on the question you’re asking and the kind of data you’re working with, but some common
techniques include regression analysis, cluster analysis, and time-series analysis (to name just
a few).
Step 5: Visualize and share your findings: This final step in the process is where data is
transformed into valuable business insights. Depending on the type of analysis conducted,
you’ll present your findings in a way that others can understand—in the form of a chart or
graph, for example. At this stage, you’ll demonstrate what the data analysis tells you with
regard to your initial question or business challenge, and collaborate with key stakeholders
on how to move forward. This is also a good time to highlight any limitations of your data
analysis and to consider what further analysis might be conducted.

Methods of Data Analysis in Spreadsheets


Spreadsheets are popularly used for data analysis because they have features and functions
that can be used to clean, aggregate, pivot, and graph data. Some of the features and
functions for data analysis in Excel are as follows:

• Pivot tables and pivot charts: Pivot tables provide a simple approach to reformatting
columns and rows, transforming them into groupings, statistics, or summaries. Pivot
charts visualize the data expressed in a pivot table, giving us insight at a glance.
• Conditional formatting: Conditional formatting allows you to highlight or hide cells
based on a rule you specify. It is useful for highlighting outliers, duplicates, or patterns
in data.
• Charts: Charts let you illustrate workbook data graphically, which makes it easy to
visualize comparisons and trends.
• Remove duplicates: Data is often messy, so it is important that you know how to
remove duplicates. Using conditional formatting rules, you can highlight the duplicate
data to review it before deleting it. Spreadsheets also have a built-in Remove Duplicates feature.
• LOOKUP: LOOKUP is used to look up a value in a range. It essentially allows you to use a
selected range as a lookup table and return a “looked up” result to a cell. This is
essential for combining data from different data sources.
• IFERROR: The IFERROR function is used to create a custom error message when a
formula results in an error. This is useful in cleaning data.
• COUNTBLANK: The COUNTBLANK function is an important function for data cleaning
in analytics because many machine learning algorithms are sensitive to blank values.
By knowing how many values are blank, you have a better understanding of how to
approach them. For example, if most of the values in a column are blank, you might drop the column;
if only a few values are blank, you might assign a suitable value to fill each blank. COUNTBLANK counts
the number of empty cells in a range.
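To make the lookup, error-handling, and blank-counting ideas concrete, here is a minimal Python sketch. It is an analogy and not how a spreadsheet implements these functions; the price list, the orders, and the helper names lookup_or_default and count_blank are hypothetical.

```python
# Hypothetical data: a price list (the lookup table) and some order rows.
prices = {"P01": 250.0, "P02": 99.5}
orders = [
    {"order": 1, "product": "P01", "city": "Pune"},
    {"order": 2, "product": "P09", "city": ""},    # unknown product, blank city
    {"order": 3, "product": "P02", "city": None},  # blank city
]


def lookup_or_default(key, table, default="Not found"):
    """Return table[key], or a custom message when the lookup fails
    (similar in spirit to wrapping a lookup formula in IFERROR)."""
    return table.get(key, default)


def count_blank(rows, field):
    """Count rows whose field is empty (similar in spirit to COUNTBLANK)."""
    return sum(1 for row in rows if row.get(field) in (None, ""))


looked_up = [lookup_or_default(o["product"], prices) for o in orders]
blank_cities = count_blank(orders, "city")
blank_share = blank_cities / len(orders)

print(looked_up)     # prices, with a message for the failed lookup
print(blank_cities)  # how many city cells are blank
```

The share of blank values (blank_share) is the kind of figure that can inform the choice between dropping a column and filling its blanks.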

Understanding Data: Data and types of data


Why do we need data?
We need data as evidence in support of the statements we make. For example: if we say “This
college has more female students compared to male students” then we need to prove the
statement with relevant data giving the number of female students and male students.
For any study undertaken, there will always be a need to provide various types of evidence in
support of the statements and propositions made. Similarly, we need evidence for the
conclusions contained in essays, presentations, reports or dissertations. The basic constituent
of this evidence base is known as data. The main function of data analysis is to develop
methods of transforming raw data into usable information.

Certain terms related to data


Given a data set as follows:

Sr. No. | Name             | Roll No | Age | Gender
1       | Shalini Sharma   | 2303008 | 17  | Female
2       | Rahul Singh      | 2303035 | 18  | Male
3       | Sarita Fernandes | 2303067 | 19  | Female
4       | Anita Naik       | 2303105 | 18  | Female

Data Element: A data element is simply the recorded observation of a specific property
possessed by a member of a particular group of individuals or objects. For Example: Male is a
data element which corresponds to the gender of the second person Rahul Singh; 19 is a data
element which corresponds to the Age of the third person Sarita Fernandes.
Record: The data corresponding to a row or line is called a record. For Example: (2, Rahul
Singh, 2303035, 18, Male) is one record.
Field (variable): The header of a column is referred to as a field or attribute. For Example: Name
is a field, Roll No is a field, Age is a field, Gender is a field.
Data Set: A data set is a collection of data elements gathered consistently about a certain
kind of object. For example, details of students can form one data set, and details of cars
can form another. One should not combine details of a person and a car within one data set,
as that may not make sense. The elements of a data set are related to each other in some way;
for example, they may be attributes (such as name, roll no, age, gender) of a person.
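The terms above can be mapped onto a small Python structure. This is only one possible representation, sketched for illustration; the variable names are assumptions, while the values come from the student table above.

```python
# Fields (column headers) and records (rows) from the student table.
fields = ["Sr. No.", "Name", "Roll No", "Age", "Gender"]
records = [
    (1, "Shalini Sharma", 2303008, 17, "Female"),
    (2, "Rahul Singh", 2303035, 18, "Male"),
    (3, "Sarita Fernandes", 2303067, 19, "Female"),
    (4, "Anita Naik", 2303105, 18, "Female"),
]

# The data set: every record paired with the field names.
data_set = [dict(zip(fields, record)) for record in records]

# A record is one row; a data element is one value within a record.
second_record = data_set[1]
gender_of_second = second_record["Gender"]
age_of_third = data_set[2]["Age"]

# All data elements of a single field form a column.
ages = [row["Age"] for row in data_set]

print(gender_of_second, age_of_third, ages)
```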

Understanding Data: Types of data


Data on which data analysis is done is generally categorized as quantitative data and
qualitative data. This categorization is done based on what properties of the variable can be
measured and in what way.
The key differences between quantitative and qualitative data:

Quantitative data | Qualitative data
Quantitative data is numbers-based, countable, or measurable. | Qualitative data is interpretation-based, descriptive, and relating to language.
Quantitative data tells us how many, how much, or how often in calculations. | Qualitative data can help us to understand why, how, or what happened behind certain behaviors.
Quantitative data is fixed and universal. It is factual. | Qualitative data is subjective and dynamic. It is open to interpretation.
Quantitative research methods are measuring and counting. | Qualitative research methods are interviewing and observing.
Quantitative data is analyzed using statistical analysis. | Qualitative data is analyzed by grouping the data into categories and themes.

Examples of Quantitative Data (Numerical) | Examples of Qualitative Data (Categorical)
• Age                                     | • Gender
• Height                                  | • Religion
• Weight                                  | • Marital Status
• Income                                  | • Native Language
• College Size                            | • Social Class
• Group Size                              | • Qualifications
• Test Score                              | • Type of teaching approach
• Percentage of lectures attended         | • Method of treatment
• Number of errors                        | • Type of instruction

Quantitative data – discrete data, continuous data


Quantitative data refers to any information that can be quantified. If it can be counted or
measured, and given a numerical value, it’s quantitative data. Quantitative data can tell you
“how many,” “how much,” or “how often”—for example, how many people attended last
week’s induction programme? How much revenue did the company make in 2022? How often
does a certain customer group use online banking? To analyze and make sense of quantitative
data, we conduct statistical analyses.
Some everyday examples of quantitative data include:

• Measurements such as height, length, and weight.
• Counts, such as the number of website visitors, sales, or email sign-ups.
• Calculations, such as revenue.
• Projections, such as predicted sales or projected revenue increase expressed as a
percentage.
• Quantification of qualitative data, for example, asking customers to rate their
satisfaction on a scale of 1-5 and then coming up with an overall customer satisfaction
score.
Quantitative data is either discrete or continuous:
• Discrete quantitative data takes on fixed numerical values and cannot be broken
down further. Discrete data will consist of whole numbers (integers). An example of
discrete data is when you count something, such as the number of people in a room.
If you count 32 people, this is fixed and finite.
• Continuous quantitative data can be placed on a continuum and infinitely broken
down into smaller units. When a data variable is capable of being measured to a large
number of decimal places, then that data variable is said to be continuous. It can take
any value; for example, a piece of string can be 20.4cm in length, or the room
temperature can be 30.8 degrees. Variables measured in units of time, weight, or
temperature are usually taken to be continuous data.

Qualitative data - categorical data, ordinal data.


Unlike quantitative data, qualitative data cannot be measured or counted. It’s descriptive,
expressed in terms of language rather than numerical values. We use qualitative data to
answer “Why?” or “How?” questions. For example, if our quantitative data tells us that a
certain website visitor abandoned their shopping cart three times in one week, we’d
probably want to investigate why—and this might involve collecting some form of qualitative
data from the user. Perhaps we want to know how a user feels about a particular product;
again, qualitative data can provide such insights. In this case, we are not just looking at
numbers; we are asking the user to tell us, using language, why they did something or how
they feel. Qualitative data also refers to the words or labels used to describe certain
characteristics or traits—for example, describing the sky as blue or labeling a particular ice
cream flavor as vanilla.
Some examples of qualitative data include:
• Interview transcripts or audio recordings.
• The text included in an email or social media post.
• Product reviews and customer testimonials.
• Observations and descriptions; e.g. “I noticed that the teacher was wearing a red
jumper.”
• Labels and categories used in surveys and questionnaires, e.g. selecting whether you
are satisfied, dissatisfied, or indifferent to a particular product or service.
Qualitative data may be classified as categorical (nominal) or ordinal:

• Categorical (also called nominal) data is used to label or categorize certain variables
without giving them any type of quantitative value. For example, if you were collecting
data about your target audience, you might want to know where they live. Are they
based in the UK, the USA, Asia, or Australia? Each of these geographical classifications
counts as categorical data. Another simple example could be the use of labels like
“blue,” “brown,” and “green” to describe eye color.
• Ordinal data is when the categories used to classify your qualitative data fall into a
natural order or hierarchy. For example, if you wanted to explore customer
satisfaction, you might ask each customer to select whether their experience with
your product was “poor,” “satisfactory,” “good,” or “outstanding.” It’s clear that
“outstanding” is better than “poor,” but there’s no way of measuring or quantifying
the “distance” between the two categories.
Categorical and ordinal data tend to come up within the context of conducting
questionnaires and surveys. However, qualitative data is not just limited to labels and
categories; it also includes unstructured data such as what people say in an interview, what
they write in a product review, or what they post on social media.
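The practical difference between nominal and ordinal data can be sketched in Python. The survey responses below are hypothetical; the category labels are the ones used in the examples above.

```python
from collections import Counter

# Hypothetical survey responses.
eye_colors = ["brown", "blue", "brown", "green", "brown"]          # nominal
ratings = ["good", "poor", "outstanding", "satisfactory", "good"]  # ordinal

# Nominal data: counting per category is meaningful, ordering is not.
eye_color_counts = Counter(eye_colors)

# Ordinal data: the categories have a rank, so they can be sorted and a
# middle (median) category can be picked. The distance between categories
# is still undefined, so averaging the ranks would be questionable.
rating_order = ["poor", "satisfactory", "good", "outstanding"]
rank = {label: position for position, label in enumerate(rating_order)}
sorted_ratings = sorted(ratings, key=rank.get)
median_rating = sorted_ratings[len(sorted_ratings) // 2]

print(eye_color_counts.most_common(1))
print(sorted_ratings, median_rating)
```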

Understanding operators and functions essential for data analytics.


One of the most powerful features in spreadsheets is the ability to calculate numerical
information using formulas. Just like a calculator, a spreadsheet can add, subtract, multiply, and
divide. Spreadsheets use standard operators for formulas, such as a plus sign for addition (+),
a minus sign for subtraction (-), an asterisk for multiplication (*), a forward slash for division
(/), and a caret (^) for exponents. All formulas in spreadsheets must begin with an equals sign
(=). This is because the cell contains, or is equal to, the formula and the value it calculates.
While you can create simple formulas in spreadsheets manually (for example, =2+2 or =5*5),
most of the time you will use cell addresses to create a formula. This is known as making a
cell reference. Using cell references will ensure that your formulas are always accurate
because you can change the value of referenced cells without having to rewrite the formula.
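The benefit of cell references can be imitated in a few lines of Python. This is an analogy only: the dictionary stands in for a sheet, and the function formula_a3 (a hypothetical name) stands in for the formula =A1+A2 typed into cell A3.

```python
# A tiny stand-in for a sheet: cell addresses mapped to values.
cells = {"A1": 2, "A2": 3}


def formula_a3(sheet):
    """Stands in for the formula =A1+A2 typed into cell A3."""
    return sheet["A1"] + sheet["A2"]


before = formula_a3(cells)  # uses the current values of A1 and A2
cells["A1"] = 10            # edit the referenced cell, not the formula
after = formula_a3(cells)   # the same formula now gives a new result

print(before, after)
```

A formula typed with fixed numbers, such as =2+3, would have to be rewritten by hand whenever a value changed.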

Arithmetic operators and order of operations.


A simple formula is a mathematical expression with one operator, such as 7+9. A complex
formula has more than one mathematical operator, such as 5+2*8. When there is more than
one operation in a formula, the order of operations tells our spreadsheet which operation to
calculate first. In order to use complex formulas, we need to understand the order of
operations.
All spreadsheet programs calculate formulas based on the following order of operations:
1. Operations enclosed in parentheses
2. Exponential calculations (3^2, for example)
3. Multiplication and division, whichever comes first
4. Addition and subtraction, whichever comes first
A mnemonic that can help you remember the order is PEMDAS, or Please Excuse My Dear
Aunt Sally.
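Example illustrating order of operations (the label on each line names the operation that is
carried out next; the following line shows the result):
Solve: 10+(6-3)/2^2*4-1
P      10+(6-3)/2^2*4-1
E      10+3/2^2*4-1
M/D    10+3/4*4-1
M/D    10+0.75*4-1
A/S    10+3-1
A/S    13-1
Answer = 12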
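The worked example can be checked in Python, which applies the same precedence to these operators. Note that Python writes exponentiation as ** where a spreadsheet uses ^. Each entry in the list below is the expression after one more step has been carried out.

```python
# The full expression, with ** in place of the spreadsheet's ^ operator.
result = 10 + (6 - 3) / 2 ** 2 * 4 - 1

# The expression after each step of the worked example.
steps = [
    10 + 3 / 2 ** 2 * 4 - 1,  # parentheses done: (6-3) = 3
    10 + 3 / 4 * 4 - 1,       # exponent done: 2^2 = 4
    10 + 0.75 * 4 - 1,        # division done: 3/4 = 0.75
    10 + 3.0 - 1,             # multiplication done: 0.75*4 = 3
    13.0 - 1,                 # addition done: 10+3 = 13
]

print(result)
print(all(step == result for step in steps))
```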

Functions: Parts of a function, arguments to a function, function library and types of functions

A function is a predefined formula that performs calculations using specific values in a
particular order. All spreadsheet programs include common functions that can be used for
quickly finding the sum, average, count, maximum value, and minimum value for a range of
cells. In order to use functions correctly, one needs to understand the different parts of a
function and how to create arguments to calculate values and cell references.
The parts of a function
In order to work correctly, a function must be written a specific way, which is called the
syntax. The basic syntax for a function is an equals sign (=), the function name (SUM, for
example), and one or more arguments. Arguments contain the information you want to
calculate. The function =SUM(A1:A20) would add the values of the cell range A1:A20.
Arguments in a function
Arguments can refer to both individual cells and cell ranges and must be enclosed within
parentheses. You can include one argument or multiple arguments, depending on the syntax
required for the function. For example, the function =AVERAGE(B1:B9) would calculate the
average of the values in the cell range B1:B9. This function contains only one argument.
Multiple arguments must be separated by a comma. For example, the function =SUM(A1:A3,
C1:C2, E2) will add the values of all cells in the three arguments.
There are a variety of functions available in spreadsheets. Here are some of the most common
functions:
• SUM: This function adds all of the values of the cells in the argument.
• AVERAGE: This function determines the average of the values included in the
argument. It calculates the sum of the cells and then divides that value by the number
of cells in the argument that contain numbers.
• COUNT: This function counts the number of cells with numerical data in the argument.
This function is useful for quickly counting items in a cell range.
• MAX: This function determines the highest cell value included in the argument.
• MIN: This function determines the lowest cell value included in the argument.
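A rough Python equivalent of these five functions is sketched below. The cell contents are hypothetical, with None standing for an empty cell, and the sketch assumes that text and empty cells inside a range are skipped by these functions.

```python
# Hypothetical contents of cells A1:A6; None stands for an empty cell.
cell_range = [12, 7.5, "n/a", None, 20, 3]

# Keep only numeric cells; text and empty cells are skipped.
numbers = [
    value
    for value in cell_range
    if isinstance(value, (int, float)) and not isinstance(value, bool)
]

total = sum(numbers)     # like =SUM(A1:A6)
count = len(numbers)     # like =COUNT(A1:A6)
average = total / count  # like =AVERAGE(A1:A6)
highest = max(numbers)   # like =MAX(A1:A6)
lowest = min(numbers)    # like =MIN(A1:A6)

print(total, count, average, highest, lowest)
```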
Function Library and types of functions
While there are hundreds of functions in spreadsheets, the ones we use the most will depend
on the type of data our workbooks contain. There's no need to learn every single function,
but exploring some of the different types of functions will help us. We can even use the
Function Library on the Formulas tab to browse functions by category, such as Financial,
Logical, Text, and Date & Time. Functions are categorized by their functionality.
Some of the important types (categories) of functions in the function library are:
• Financial functions: These are used to calculate business equations such as interest,
depreciation, and valuation.

• Logical functions: These are used to compare data in different cells. Depending on
the logical function used, the spreadsheet populates the cell containing the formula with
TRUE or FALSE, based on the result of the comparison.
• Text functions: These work with text strings. They can convert numbers into text,
change the case of text, and extract or copy characters from other cells
into the current cell.
• Date & Time functions: These format numbers into dates. There are many options
available to return the date as desired from a variety of data sets. Date functions work
with dates and times. Each function performs a simple operation and by combining
several functions within one formula you can solve more complex and challenging
tasks.
• Lookup and Reference functions: These allow us to work with large sets of data, and are
especially useful when you need to reference between multiple data sets. They can
provide information about a range of data, find the location of a given address or
value, or look up certain values in a large set of data.
• Statistical Functions: These are responsible for statistical analysis calculating items like
mean, median, mode, etc.
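A few of these categories can be illustrated with short Python equivalents. The marks, name, and dates are hypothetical, and the spreadsheet formulas named in the comments are only approximate counterparts of the Python expressions.

```python
from datetime import date
from statistics import mean, median, mode

# Hypothetical marks for five students.
marks = [35, 62, 62, 80, 91]

# Logical: roughly like =IF(B2>=40,"Pass","Fail") filled down a column.
results = ["Pass" if mark >= 40 else "Fail" for mark in marks]

# Text: roughly like =UPPER(TRIM(A2)) and =LEFT(A2,3).
raw_name = "  rahul   singh "
clean_name = " ".join(raw_name.split()).upper()
first_three = clean_name[:3]

# Date & Time: number of days between two dates.
days_between = (date(2024, 3, 31) - date(2024, 3, 1)).days

# Statistical: mean, median, and mode of the marks.
stats = (mean(marks), median(marks), mode(marks))

print(results, clean_name, first_three, days_between, stats)
```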

Data Collection and Manipulation


Data Collection using online data collection tools
Surveys are powerful tools for businesses, researchers, and enthusiasts who are looking to
make data-driven decisions and understand how their audiences think. There are many tools
available online for creating surveys and collecting data. Some of the tools are as follows:
• Google Forms: Google Forms is a free, easy-to-use tool for online data collection. To
use Google Forms, we need an internet connection and a Google account. Google Forms
integrates seamlessly with Google Sheets. Google Forms is good for small-scale
surveys, where questions are simple and straightforward.
• SurveyMonkey: SurveyMonkey, like Google Forms, is another entry-level data
collection tool. It’s easy to use, and has robust features for data collection.
SurveyMonkey integrates with many third-party tools, including Salesforce, Mailchimp, and
HubSpot, which are used for customer relationship management and marketing.
SurveyMonkey is great for large surveys (especially marketing).
• SurveyCTO Collect: SurveyCTO Collect is the Android and iOS-compatible data
collection app used to collect data in over 165 countries. It’s especially useful to collect
data in large volumes in remote locations without wi-fi.
• KoboCollect: KoboCollect is the app companion to KoboToolbox, a data collection and
management tool by the nonprofit Kobo. It’s targeted at people working in “challenging
environments” such as humanitarian crises. KoboToolbox is an open-source platform,
and its development is funded by partners and the community.
• Magpi: Magpi is a for-profit mobile data collection app serving anyone, but focused
on international development and healthcare. Their customers include the CDC, WHO,
UNICEF, and various hospitals and government organizations. Magpi is best known
for its messaging-based survey capabilities, such as SMS and WhatsApp surveys.

Creating Spreadsheets online and collaboration


There are many spreadsheet applications, such as Microsoft Excel, Google Sheets,
LibreOffice Calc, WPS Spreadsheets and more. Google Sheets allows us to create
spreadsheets online in our Google Account. It offers most of the features found in other
spreadsheet applications. Google Sheets allows you to organize, edit, and analyze different
types of information using spreadsheets.
The benefits of using online spreadsheets are as follows:
• The data is stored on the cloud or Internet and hence can be accessed from anywhere
and anytime.
• We can share the same spreadsheet with multiple people by adding collaborators.
This allows multiple people of a team to edit the same sheet simultaneously.
Sharing and collaborating on files: Google Drive makes sharing files simple. It also allows
multiple people to edit the same file, allowing for real-time collaboration. Whenever we share
a file from our Google Drive, we can let others view and even edit that same file. While we
can share any file stored on our Google Drive, it's important to note that we can only use the
collaboration features for files created within our Drive.

For Example: Reema is an art teacher who uses her Google Drive to organize letters, lesson
plans, and more. Reema has many files. For each one, she decides whether to share it or keep
it private. Some examples of files and how she controls access are as
follows:
• She keeps her spreadsheet of classroom expenses private and does not share it.
• She shares the lesson-planning documents she creates with her co-teacher and lets her
edit them.
• She shares newsletters and announcements publicly with her students and their parents
but doesn't let others edit them.
Others also share files with Reema. These include ones she can edit, like her co-teacher's
supply inventory, and ones she can't, like a schedule sent to her by the principal.
As you can tell, no single sharing setting would be right for all of Reema's files. The settings
we choose for each of our shared files will probably depend on why we are sharing it in the
first place. When we share a file with a limited group of people, our collaborators must sign
in with a Google account to view or edit the file. However, when we share with a larger group
or make the file public, our collaborators will not need a Google account to access the file.
We can easily share a file with a larger group of people by providing a link to any file in our
Google Drive. A link is basically a URL or web address for any file we want to share. This can
be especially helpful for files that would be too large to send as an email attachment, like
music or video files. We can also share a file by posting the link to a public webpage. Anyone
who clicks the link will be redirected to the file.

Introduction to data cleansing and data modification


Data cleaning is a crucial step in data analytics. It involves identifying and then correcting
or removing missing, duplicate, or irrelevant data. The goal of data cleaning is to ensure that
the data is accurate, consistent, and free of errors, as incorrect or inconsistent data can
negatively impact the analysis. Data cleaning is also known as data cleansing or data preprocessing.
Why is data cleaning important?
Data cleaning is a crucial step in the data preparation process, playing an important role in
ensuring the accuracy, reliability, and overall quality of a dataset. For decision-making, the
integrity of the conclusions drawn heavily relies on the cleanliness of the underlying data.
Without proper data cleaning, inaccuracies, outliers, missing values, and inconsistencies can
compromise the validity of analytical results.
Data cleaning and data modification using functions
Most of the time, the data we want to analyze is not in a usable format; for example, it may
contain blank cells, duplicate values, or merged columns. Before using this data for analysis,
we need to clean it so that it does not produce misleading results. Cleaning ensures accuracy
and reliability in our analyses.

Spreadsheet applications such as Excel provide several techniques and functions to clean
data. The most widely used techniques and functions are:
• Removing Duplicates: Duplicate data refers to two or more entries that share the same
values in key fields. Identifying and handling duplicate data is essential for maintaining
data quality and ensuring accurate analyses.
• TRIM Function: The TRIM function is used to remove extra spaces from a text string,
leaving only a single space between words and no leading or trailing spaces.
• Convert Numbers Stored as Text into Numbers: It refers to the process of changing
numerical data that is stored as text in a digital format into actual numeric values.
Sometimes the numeric data is stored as text due to formatting issues or data
import/export processes. This can lead to issues when performing calculations or
analyses that require numeric data. The VALUE function is used to convert numbers
stored as text into numbers.
• Highlight Errors: We can highlight the errors in a spreadsheet to quickly identify and
correct them. Errors can include things like #DIV/0!, #VALUE!, #REF!, #NAME?, #NUM!,
#N/A, or #NULL!. These errors carry over into any calculation or analysis that refers
to the affected cells, so it is better to deal with them before proceeding with further
analysis. The errors can be highlighted using conditional formatting and choosing
“Highlight Cells With” and “Errors” under Rule Type.
• Change Text to Lower/Upper/Proper Case: We can easily change the case (lowercase,
uppercase, or proper case) of text using built-in functions or formulas. This improves
the readability of the data. Use the UPPER function to convert text to uppercase. Use
the LOWER function to convert it to lowercase. To convert text to proper case
(capitalizing the first letter of each word), use the PROPER function.
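
The cleaning steps listed above can also be sketched outside a spreadsheet. The following is a minimal Python sketch, not Excel's own implementation: the helper names trim, to_number and remove_duplicates and the sample rows are hypothetical and exist only for illustration. Python's str.title() is used here as a rough stand-in for PROPER.

```python
import re

def trim(text):
    """Mimic TRIM: drop leading/trailing spaces and collapse runs of spaces."""
    return re.sub(r" +", " ", text.strip(" "))

def to_number(text):
    """Mimic VALUE for simple cases: turn '1,250.50' stored as text into 1250.5."""
    return float(text.replace(",", "").strip())

def remove_duplicates(rows, key_fields):
    """Keep only the first row for each combination of values in the key fields."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical raw data: stray spaces, mixed case, numbers stored as text.
raw_rows = [
    {"name": "  asha   rao ", "amount": "1,250.50"},
    {"name": "ASHA RAO", "amount": "1250.5"},
    {"name": "vikram  shah", "amount": " 300 "},
]

# Clean each field first, then drop rows that became identical after cleaning.
cleaned = [
    {"name": trim(row["name"]).title(), "amount": to_number(row["amount"])}
    for row in raw_rows
]
cleaned = remove_duplicates(cleaned, ["name", "amount"])
print(cleaned)
```

The order matters in this sketch: the first two rows only become recognisable as duplicates after their spacing, case, and number format have been made consistent.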

Sorting criteria and types of sorting


Sorting data is an integral part of Data Analysis. You can arrange a list of names in alphabetical
order, compile a list of sales figures from highest to lowest, or order rows by colors or icons.
Sorting data helps you quickly visualize and understand your data better, organize and find
the data that you want, and ultimately make more effective decisions. You can sort by
columns or by rows. Most of the sorts that you use will be column sorts.
You can sort data in one or more columns by:
• Text (A to Z or Z to A)
• Numbers (smallest to largest or largest to smallest)
• Dates and times (oldest to newest or newest to oldest)
• A custom list (e.g. Large, Medium, and Small)
• Format, including cell color, font color, or icon set
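
The sort criteria above can be illustrated with a small Python sketch. The orders list and its field names are hypothetical sample data, and the custom list is modelled as a dictionary that maps each label to its position; this is only one way to express the idea, not how a spreadsheet does it internally.

```python
from datetime import date

# Hypothetical sample rows.
orders = [
    {"title": "Kite", "size": "Medium", "qty": 12, "ordered": date(2024, 3, 5)},
    {"title": "Board", "size": "Small", "qty": 40, "ordered": date(2024, 1, 20)},
    {"title": "Sail", "size": "Large", "qty": 7, "ordered": date(2024, 2, 11)},
    {"title": "Fin", "size": "Large", "qty": 25, "ordered": date(2024, 3, 1)},
]

# Text: A to Z.
by_title = sorted(orders, key=lambda row: row["title"])

# Numbers: largest to smallest.
by_qty_desc = sorted(orders, key=lambda row: row["qty"], reverse=True)

# Dates: oldest to newest.
by_date = sorted(orders, key=lambda row: row["ordered"])

# Custom list (Large, Medium, Small), then quantity within each size.
size_order = {"Large": 0, "Medium": 1, "Small": 2}
by_size_then_qty = sorted(
    orders, key=lambda row: (size_order[row["size"]], row["qty"])
)

print([row["title"] for row in by_size_then_qty])
```

Sorting on a tuple of keys corresponds to sorting on more than one column: the first element decides the order, and the second breaks ties.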

The following pictures show different types of sorting:

[Figure: Unsorted Data]
[Figure: Data Sorted by Text (Title)]
[Figure: How to Sort by Cell Colour]
[Figure: How to Sort by Font Colour]
[Figure: How to Sort by Cell Icon]
[Figure: How to Sort by Custom List]

Filters and types of filters


Filtering allows us to extract the data that meets defined criteria from a given range or table.
This is a quick way to display only the information that is needed. We can filter data in a
range, table, or PivotTable.
We can filter data by:
• Selected values
• Text filters if the column you selected contains text
• Date filters if the column you selected contains dates
• Number filters if the column you selected contains numbers
• Font color if the column you selected contains font with color
• Cell icon if the column you selected contains cell icons
• Advanced filter
• Using slicers (In pivot tables)
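
Several of the filter types above can be pictured as conditions applied to each row. The following Python sketch uses hypothetical sample rows and field names; it illustrates the idea of value, text, number, and date filters rather than any spreadsheet feature.

```python
from datetime import date

# Hypothetical sample rows.
rows = [
    {"region": "North", "rep": "Meera", "sales": 5200, "closed": date(2024, 1, 15)},
    {"region": "South", "rep": "Arjun", "sales": 1800, "closed": date(2024, 2, 3)},
    {"region": "North", "rep": "Kabir", "sales": 950, "closed": date(2024, 2, 20)},
    {"region": "West", "rep": "Mohan", "sales": 3100, "closed": date(2024, 3, 9)},
]

# Selected values: keep only the ticked regions.
selected = [r for r in rows if r["region"] in {"North", "West"}]

# Text filter: names that begin with "M".
begins_with_m = [r for r in rows if r["rep"].startswith("M")]

# Number filter: sales greater than 2000.
above_2000 = [r for r in rows if r["sales"] > 2000]

# Date filter: deals closed in February 2024.
in_february = [
    r for r in rows if r["closed"].year == 2024 and r["closed"].month == 2
]

# Filters on several columns combine with AND.
north_above_2000 = [r for r in rows if r["region"] == "North" and r["sales"] > 2000]

print([r["rep"] for r in north_above_2000])
```

As in a spreadsheet, filtering does not change the underlying data: the rows list is left untouched and each filter only produces a narrower view of it.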

Guidelines and examples for sorting and filtering data by colour


Overview of sorting and filtering data by colour and icon set: Data is often highlighted
with colours or icons using conditional formatting, which helps in data visualization.
For example, in the diagram below we can easily see that a green up arrow indicates good
performance, a yellow sideways arrow indicates medium performance, and a red down arrow
indicates bad performance. Sorting data that has colours or icons groups rows with the same
colour or icon together, which helps in categorizing the data further. This is shown in the
diagram below.

In addition, we could also filter the data based on colour or icons to narrow down and see
only the good performers or bad performers as shown below:

Using colour effectively when analyzing data


The goal of data visualization is to help viewers quickly digest information and remember it.
Colour plays an important role in data visualization and is one of the easiest elements to
apply. Using colour effectively helps viewers understand the meaning and impact of the
information presented and remember the most important details. If colour is not used
properly, it can distract from the story the visualization is trying to tell; rather than aiding
understanding, it will confuse people. When we use colours to visualize data, we should be
careful to choose colours that are appropriate for the data. Normally, green is used to
indicate good performance, yellow average performance, and red bad performance or danger.
Choosing the best colours in data visualization
There’s no one right way to use colour, but we can take what we know about how the brain
is influenced by colour and apply it to visualization design to get better results. Some tips to
keep in mind when choosing colours are:
• Use color to create associations: For example, use orange to represent safety
performance, deep green to represent profit, or light green to represent
environmental sustainability. Color palettes can also create associations in the
viewer’s mind, such as the colors of a country’s flag communicating data related to
that country.
• Use a single color to show continuous data: Using one color will help viewers to
quickly grasp that they’re viewing increases or decreases in a single metric. For
example, for data such as the unemployment rate or an infection rate over time we
can use a single colour.
• Use contrasting colors to show comparison/contrast: When you’re comparing or
contrasting two metrics, using contrasting colors will help viewers intuit that you’re
differentiating between the two. You might be showing the difference between the
conversion rates on Facebook ads vs. Instagram ads, for example. In this particular
case, you might use contrasting colors that are also associated with the two different
platforms — light blue and pink-purple.
• Use color to make important information stand out: When you’re trying to highlight
something important, such as data relevant to a particular county or zip code, a bright
or saturated color can help it stand out. For example, you may choose to use gray for
less-important variables and a deep red or orange for the most important variable.
You could also use muted colors for the less-important ones and a bright color for the
most important one.
• Don’t pick colors that aren’t easily distinguishable: If you can’t distinguish between
colours easily then the data can be confusing to understand.
• Don’t Use Too Many Colors: Because the brain struggles to process many different
things at once, using a limited color set in your visualizations will improve speed to
insight.

Data Visualization and Summarization


Visualizing data
Data can be visualized or highlighted in different ways in Excel. Visualization helps us in
understanding the important trends in data.
Different techniques used to visualize data are as follows:
• Charts
• Conditional Formatting
• Pivot tables and pivot charts

Principles of charting
A chart is a type of data visualization that turns a collection of numbers and facts into a
picture (such as a bar chart, line graph, or pie chart) in which patterns and trends are easier
to see. Raw data can be a jumble of numbers and facts; charts and graphs turn that jumble
into pictures that make sense. However, poorly prepared charts can be confusing and
counter-productive.
Types of basic charts
Some types of basic charts are as follows:
• Column Charts: Column charts use vertical bars to represent data. They can work with
many different types of data. They are mostly used for comparing information.

• Line Charts: Line graphs are used to display data over time or continuous intervals.
Line charts are ideal for showing trends. The data points are connected with lines,
making it easy to see whether values are increasing or decreasing over time.

• Pie Charts: Pie charts are circular graphs divided into sectors, where each sector
represents a proportion of the whole. The size of each sector corresponds to the
percentage or proportion of the total data it represents. Pie charts make it easy to
compare proportions: each value is shown as a slice of the pie, so it is easy to see
how much of the whole each value makes up.

• Bar Charts: Bar charts work just like column charts, but they use horizontal rather
than vertical bars.

• Area Charts: Area charts are similar to line graphs but with the area below the line
filled in with colour. They are used to represent cumulative totals or stacked data over
time. Area charts are effective for showing changes in composition over time and
comparing the contributions of different categories to the total.

Some practicalities in preparing charts


When we prepare a chart, we should select the data carefully, decide what we want to
visualize from it, and choose the appropriate chart type accordingly.
For example, consider the following table:

WindSport Inc. Sales

Month   In Store Sales   Mail Order Sales   Web Site Sales   Total Sales
May     ₹ 6,206.00       ₹ 3,275.00         ₹ 12,016.00      ₹ 21,497.00
Jun     ₹ 17,351.00      ₹ 5,328.00         ₹ 35,371.00      ₹ 58,050.00
Jul     ₹ 11,360.00      ₹ 1,555.00         ₹ 10,822.00      ₹ 23,737.00
Aug     ₹ 28,722.00      ₹ 5,913.00         ₹ 17,243.00      ₹ 51,878.00
Sep     ₹ 7,995.00       ₹ 1,913.00         ₹ 11,764.00      ₹ 21,672.00
Total   ₹ 71,634.00      ₹ 17,984.00        ₹ 87,216.00      ₹ 1,76,834.00
If we want to show the trend in total sales over a period of time, a line chart is the
appropriate choice.

[Figure: line chart "Total Sales Trend"; horizontal axis: month (May to Sep); vertical axis: total sales, ₹ 0.00 to ₹ 70,000.00]

If we want to compare the proportion of total sales, we choose a pie chart.

[Figure: pie chart "Total Sales Proportion"; one slice per month (May to Sep), each labelled with the month's total sales and its percentage share]
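
The share that each slice of such a pie chart represents is the month's total divided by the grand total. The following Python sketch works this out from the Total Sales column of the WindSport table; the variable names are illustrative only.

```python
# Total Sales per month, taken from the WindSport Inc. Sales table.
total_sales = {
    "May": 21497.00,
    "Jun": 58050.00,
    "Jul": 23737.00,
    "Aug": 51878.00,
    "Sep": 21672.00,
}

grand_total = sum(total_sales.values())

# Each month's share of the grand total, as a percentage.
shares = {month: value / grand_total * 100 for month, value in total_sales.items()}

for month, share in shares.items():
    print(f"{month}: {share:.1f}%")
```

Because every share uses the same grand total as its denominator, the unrounded shares always add up to 100%; labels rounded to whole percentages may not.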

If we want to compare the different types of sales we could use a column chart.

[Figure: column chart "Sales Comparison"; horizontal axis: month (May to Sep); one column per type of sales (In Store Sales, Mail Order Sales, Web Site Sales); vertical axis: ₹ 0.00 to ₹ 40,000.00]

Or

[Figure: column chart "Sales Comparison"; horizontal axis: type of sales (In Store Sales, Mail Order Sales, Web Site Sales); one column per month (May to Sep); vertical axis: ₹ 0.00 to ₹ 40,000.00]

Based on how we want to present the data, we have to decide what to show on the
horizontal axis, as shown above. In the first chart the horizontal axis has the months, so we
can compare the different types of sales within a month. In the second chart the horizontal
axis has the types of sales, so we can compare the months within a given type of sales. We
could choose a stacked column chart if we want to compare both the proportions across
months within a type of sale and the performance across different types of sales, as shown below.

Conditional Formatting and its types


Conditional formatting provides another way to visualize data and make worksheets easier to
understand. Conditional formatting allows us to automatically apply formatting such as
colours, icons, and data bars to one or more cells based on the cell value. To do this, we need
to create a conditional formatting rule. For example, a conditional formatting rule might be:
If the value is less than 2000, colour the cell red. By applying this rule, we will be able to
quickly see which cells contain values less than 2000. We can apply multiple conditional
formatting rules to a cell range or worksheet, allowing us to visualize different trends and
patterns in our data. Excel has several predefined styles or presets we can use to quickly apply
conditional formatting to our data. They are grouped into three categories:
• Data Bars are horizontal bars added to each cell, much like a bar graph.

• Colour Scales change the colour of each cell based on its value. Each colour scale uses
a two or three colour gradient. For example, in the Green-Yellow-Red colour scale, the
highest values are green, the average values are yellow, and the lowest values are red.

• Icon Sets add a specific icon to each cell based on its value.
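
The idea behind a conditional formatting rule can be sketched in a few lines of Python. This is only an illustration of the logic, not how Excel applies formatting: the function names highlight and data_bar and the sample values are hypothetical, and the data bar is drawn with "#" characters.

```python
# Hypothetical cell values.
sales = [1500, 2400, 800, 4000]

def highlight(value, threshold=2000):
    """Rule from the text: if the value is less than 2000, colour the cell red."""
    return "red" if value < threshold else "none"

def data_bar(value, max_value, width=10):
    """Text stand-in for a data bar: length is proportional to the value."""
    return "#" * round(value / max_value * width)

largest = max(sales)
for value in sales:
    print(f"{value:>6}  {highlight(value):<5} {data_bar(value, largest)}")
```

The rule is evaluated separately for every cell, which is why the formatting updates automatically when a value changes.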

Functions used for data summarization


There are some data analysis functions which help us to summarize data easily. These
functions are as follows:
• SUMIF: The SUMIF function in Excel calculates the sum of the values in a range that
meet a single condition.
The syntax of the function is =SUMIF(range, criteria, [sum_range])
The range is the set of cells that the condition is checked against. The condition is
referred to as criteria, which can check things like:
o If a number is greater than another number (>)
o If a number is smaller than another number (<)
o If a number or text is equal to something (=)
The [sum_range] is the range whose values are added up. Note: The [sum_range] is
optional. If it is not specified, the function sums the cells in range itself.

Example of SUMIF in Pictures:

To find sum where type is Grass we do

• SUMIFS: The SUMIFS function in Excel calculates the sum of a range based on one or
more conditions.
The syntax of the function is
=SUMIFS(sum_range, criteria_range1, criteria1, [criteria_range2, criteria2] ...)
The conditions are referred to as criteria1, criteria2, and so on, which can check things
like:
o If a number is greater than another number (>)
o If a number is smaller than another number (<)
o If a number or text is equal to something (=)
The criteria_range1, criteria_range2, and so on, are the ranges where the function
checks for the conditions. The sum_range is the range whose values are added up;
unlike in SUMIF, it is required and comes first.
Example of SUMIFS in Pictures:

To find sum where type is Water and Generation is 1 we do

• COUNTIFS: The COUNTIFS function in Excel counts the cells in a range that meet one
or more conditions.
The syntax of the function is
=COUNTIFS(criteria_range1, criteria1, [criteria_range2, criteria2], ...)
The conditions are referred to as criteria1, criteria2, and so on, which can check things
like:
o If a number is greater than another number (>)
o If a number is smaller than another number (<)
o If a number or text is equal to something (=)
The criteria_range1, criteria_range2, and so on, are the ranges where the function
checks for the conditions.
Example of COUNTIFS in Pictures:

To find number of items with type water and Generation is 1 we do

• AVERAGEIFS: The AVERAGEIFS function in Excel calculates the average of a range
based on one or more conditions.
The syntax of the function is
=AVERAGEIFS(average_range, criteria_range1, criteria1, ...)
The conditions are referred to as criteria1, criteria2, and so on, which can check things
like:
o If a number is greater than another number (>)
o If a number is smaller than another number (<)
o If a number or text is equal to something (=)
The criteria_range1, criteria_range2, and so on, are the ranges where the function
checks for the conditions. The average_range is the range whose values are averaged.
Example of AVERAGEIFS in Pictures:

To find Average Defense with type Grass and Generation is 1 we do
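
The logic of the four functions above can be sketched in Python as follows. The data set is hypothetical (the names and numbers are made up, with columns chosen to resemble the Type, Generation and Defense columns used in the examples), and the helpers sum_ifs, count_ifs and average_ifs are illustrative names, not Excel functions. For brevity the sketch supports only "equal to" criteria.

```python
# Hypothetical rows; names and numbers are made up for illustration.
rows = [
    {"name": "A", "type": "Grass", "generation": 1, "total": 318, "defense": 49},
    {"name": "B", "type": "Grass", "generation": 2, "total": 405, "defense": 65},
    {"name": "C", "type": "Water", "generation": 1, "total": 314, "defense": 65},
    {"name": "D", "type": "Water", "generation": 1, "total": 530, "defense": 100},
    {"name": "E", "type": "Fire", "generation": 1, "total": 309, "defense": 43},
    {"name": "F", "type": "Grass", "generation": 1, "total": 525, "defense": 83},
]

def matching(rows, **criteria):
    """Rows in which every named column equals the wanted value (AND logic)."""
    return [r for r in rows if all(r[col] == want for col, want in criteria.items())]

def sum_ifs(rows, sum_col, **criteria):
    """Like SUMIF/SUMIFS: add up sum_col over the matching rows."""
    return sum(r[sum_col] for r in matching(rows, **criteria))

def count_ifs(rows, **criteria):
    """Like COUNTIFS: count the matching rows."""
    return len(matching(rows, **criteria))

def average_ifs(rows, avg_col, **criteria):
    """Like AVERAGEIFS: average avg_col over the matching rows."""
    hits = matching(rows, **criteria)
    if not hits:
        # Excel reports #DIV/0! when no cells meet the criteria.
        raise ZeroDivisionError("no rows match the criteria")
    return sum(r[avg_col] for r in hits) / len(hits)

print(sum_ifs(rows, "total", type="Grass"))                     # one condition
print(sum_ifs(rows, "total", type="Water", generation=1))       # two conditions
print(count_ifs(rows, type="Water", generation=1))
print(average_ifs(rows, "defense", type="Grass", generation=1))
```

All conditions must hold at the same time for a row to be included, which is also how multiple criteria combine in SUMIFS, COUNTIFS and AVERAGEIFS.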

Pivot tables and its applications


When we have a lot of data, it can sometimes be difficult to analyze all of the information in
our worksheet. PivotTables can help make our worksheets more manageable by summarizing
our data and allowing us to manipulate it in different ways. A PivotTable can instantly
calculate and summarize the data in a way that will make it much easier to read. With a
PivotTable we can answer different questions by rearranging—or pivoting—the data. It is an
alternative to using data-summarizing functions such as SUMIFS and COUNTIFS.
PivotTable is a feature in Excel which helps us organize and analyze data. It lets us add
and remove values, perform calculations, and filter and sort data sets. PivotTables help us
structure and organize data so that large data sets are easier to understand.

The usefulness of pivot tables is easy to understand: if we have a large chunk of data, pivot
tables help us turn the data set into useful reports and summaries. Visualizations are
available too, in the form of Pivot Charts. Some use cases for pivot tables are as follows:
• Run automatic calculations on summed or counted values: Pivot tables are efficient
at performing calculations on large data sets. By summarizing and organizing data, we
can effortlessly compute sums, counts, averages, and more. This feature is perfect if
our work requires us to deal with sensitive financial data. For instance, we can benefit
from pivot tables if we are a financial analyst tallying expenses or a sales manager
assessing our business’ revenue. Pivot tables are also great if we need to generate
quick insights but don’t have the time to do manual calculations.
• Create percentages of totals: With pivot tables, you have a straightforward way of
generating percentages of totals. This lets us grasp the proportional contribution of
each data category in the data set. This is perfect if we want to get a holistic view of
the data. Marketers, for example, find pivot tables useful when evaluating their
campaign’s performance. They’re also ideal for project managers assessing resource
allocation. Using pivot tables, we can transform absolute values into insightful
percentages. This improves our data interpretation and decision-making quality.
• Segment data by date, user, or other variables and calculate totals: Pivot tables
allow us to segment data effortlessly. This enables in-depth analysis based on specific
criteria such as date, user name, or other customizable variables. With this feature, we
can generate time-sensitive reports that are invaluable for assessing trends, identifying
patterns, and making informed decisions. For example, if you’re a sales manager, you
can analyze revenue by quarter. If you’re an HR professional, you can analyze
employee performance by department. Project managers can track progress across
different phases.
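
The use cases above (totals per segment and percentages of totals) can be sketched in Python. This is a minimal illustration of what a pivot table computes, not of the PivotTable feature itself: the function name pivot_sum is hypothetical, and the records reuse the May and June figures from the WindSport Inc. Sales table earlier in these notes, written as one row per month and sales channel.

```python
from collections import defaultdict

# (month, channel, sales) records for May and June from the WindSport table.
records = [
    ("May", "In Store", 6206), ("May", "Mail Order", 3275), ("May", "Web Site", 12016),
    ("Jun", "In Store", 17351), ("Jun", "Mail Order", 5328), ("Jun", "Web Site", 35371),
]

def pivot_sum(records, key_index):
    """Group the records by one field and sum the sales within each group."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key_index]] += record[2]
    return dict(totals)

# "Pivoting" is just choosing a different field to group by.
by_month = pivot_sum(records, 0)
by_channel = pivot_sum(records, 1)

# Percentage of the grand total contributed by each channel.
grand_total = sum(by_channel.values())
pct_by_channel = {
    channel: total / grand_total * 100 for channel, total in by_channel.items()
}

print(by_month)
print(by_channel)
for channel, pct in pct_by_channel.items():
    print(f"{channel}: {pct:.1f}%")
```

Grouping the same records by month and then by channel shows why one pivot table can answer several questions without changing the underlying data.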
