Chapter 1: Excel Functions.

Statistical functions.

“Statistics is the grammar of Science.” – Karl Pearson

Let’s make that a bit more relevant for us – Statistics is the grammar of “Data”
Science. You’ll notice that almost every successful data science professional or
analytics professional has a solid understanding of statistics.

Just think of the most commonly used tool in most organizations for data
analysis. It’s Microsoft Excel! Honestly, Excel is the Swiss Army Knife for data
analytics professionals that helps you focus on what’s important (statistics in
our case) and handles the rest of the calculations and customizations itself.
Excel can help us answer questions like, ‘what is the largest value or minimum
value in a set of data,’ as well as plot graphs and even perform regression. We
will use 10 key statistical Excel functions to answer questions for a dummy
sports company, ‘Khelo’, while perusing their data.

We will be answering the following questions using various Excel functions:

1. How many items are at a discount?
2. How many items/pieces of equipment are sold by the store?
3. What is the number of products sold without a discount?
4. Are there any products that cost more than 2000 and also have a
discount rate greater than 50%?
5. What is the average number of products sold?
6. What is the median of the number of products sold?
7. What is the most frequent discount percentage?
8. What is the standard deviation of the number of products sold?
9. Is there any relationship between the number of products sold and the
discount percentage?

Basic Statistical Functions in Excel

MS Excel provides an array of useful statistical functions. Let us begin with
some of the basic yet extremely powerful functions. Honestly, you’ll find that
you’re using the basic statistical functions 90% of the time, and the remaining
10% of your time is taken by intermediate and advanced functions.

We will focus mainly on the different kinds of count functions here.
These work much like other basic functions, such as sum, max, min, and average.

1. Count Function

We use the COUNT function when we need to count the number of cells
containing a number. Remember, ONLY NUMBERS! Let’s see the function:

COUNT(value1, [value2], …)

So, let’s try to find the answer to our first question: How many items are at a
discount?

There are 11 products on discount.

2. Counta Function

While the COUNT function only counts numeric values, the COUNTA
function counts all the cells in a range that are not blank. The function is
useful for counting cells containing any type of information, including error
values and empty text.

COUNTA(value1, [value2], …)

We’ll answer the second question using the COUNTA function since it is able to
count all the non-empty values: How many items/pieces of equipment are sold
by the store?

The total number of items sold by the store is 13.

3. Countblank

The COUNTBLANK function counts the number of empty cells in a range of
cells. Cells with formulas that return empty text are also counted here, but cells
with zero values are not counted. This is a great function for summarizing
empty cells while analyzing any data.

COUNTBLANK(range)

Summarizing empty cells is the requirement for our third question: What is the
number of products sold without a discount? Let’s apply the function!

There are only 2 items not on discount.
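
To make the difference between the three count functions concrete, here is a rough Python sketch (not Excel itself) that mimics their rules. The item names and discount values are made up for illustration and are only chosen so the counts match the results above; `None` stands in for an empty cell.

```python
# Hypothetical stand-in for the store's sheet: 13 items, None = empty cell.
items = ["ball", "bat", "sneakers", "gloves", "helmet", "racket", "net",
         "jersey", "shorts", "socks", "cap", "bottle", "bag"]
discounts = [0.10, 0.20, None, 0.10, 0.55, 0.10, 0.30, None,
             0.25, 0.10, 0.15, 0.40, 0.05]


def count(cells):
    """Like COUNT: only cells that hold a number are counted."""
    return sum(isinstance(c, (int, float)) and not isinstance(c, bool)
               for c in cells)


def counta(cells):
    """Like COUNTA: every non-empty cell is counted, even empty text."""
    return sum(c is not None for c in cells)


def countblank(cells):
    """Like COUNTBLANK: empty cells and empty text are counted."""
    return sum(c is None or c == "" for c in cells)


items_on_discount = count(discounts)        # numeric discount entries
items_in_store = counta(items)              # non-empty item names
items_without_discount = countblank(discounts)
```

Note how text such as the item names is ignored by `count` but picked up by `counta`, which is exactly why the second question needs COUNTA.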

4. Countifs Function

COUNTIFS is one of the most used statistical functions in Excel. The COUNTIFS
function applies one or more conditions to the cells in the given ranges and
returns the number of cells that fulfill all of the conditions.

COUNTIFS(criteria_range1, criteria1, [criteria_range2, criteria2]…)

Note: Every new range must have the same number of rows and columns as
the criteria_range1 argument. The ranges do not have to be adjacent to each
other.

This function seems perfect to answer the fourth question – Are there any
products sold having cost more than 2000 along with a discount rate greater
than 50%?

The question seemed complex, but it was really easy to find the answer in
Excel. Only 1 product, i.e., sneakers, cost more than 2000 and sold at a
discount rate greater than 50%. Wonderful, isn’t it? We have gone through
some basic statistical functions in MS Excel so far. Next, let’s have a look at the
intermediate statistical functions.
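
The "all conditions must hold on the same row" logic of COUNTIFS can be sketched in Python as follows. This is only an illustration: the `countifs` helper and the five sample rows are hypothetical, and criteria are passed as small Python functions rather than Excel strings like ">2000".

```python
def countifs(*range_test_pairs):
    """Count the rows where every (range, test) pair is satisfied."""
    ranges = [cells for cells, _ in range_test_pairs]
    # Mirrors the note above: every range must have the same size.
    if len({len(cells) for cells in ranges}) != 1:
        raise ValueError("all criteria ranges must have the same size")
    tests = [test for _, test in range_test_pairs]
    return sum(all(test(value) for test, value in zip(tests, row))
               for row in zip(*ranges))


# Made-up sample data, one entry per product.
costs = [50, 2000, 2500, 800, 3000]
discounts = [0.10, 0.20, 0.55, 0.60, 0.30]

expensive_and_discounted = countifs(
    (costs, lambda cost: cost > 2000),
    (discounts, lambda rate: rate > 0.50),
)
```

Only the third product passes both tests here, so the result is 1. A product that passes just one of the two conditions is not counted.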

Intermediate Statistical Functions in Excel

Here we will discuss some of the intermediate statistical functions in MS Excel
related to central tendency and dispersion. These functions are very useful in
our day-to-day life as data analysts.

5. Average Function

The most common function we usually use in our daily lives is the average (or
mean). The AVERAGE function simply returns the arithmetic mean of all the
cells in a given range:

AVERAGE(number1, [number2], …)

But there’s one simple drawback to using averages: they are sensitive to outliers.
Therefore, they can paint a very unrealistic picture in our analysis. Let’s find
out the average number of goods sold:

The average comes out to be ~365.2.

We will be doing similar calculations for cost as well.

6. Median Function

The problem of outliers can be solved by using another function for the central
tendency: the median. The MEDIAN function returns the middle value of the
given range of cells. The syntax is quite simple:

MEDIAN(number1, [number2], …)

Let’s find the median of the number of goods sold in our sports store and see
how close this is to our average value:

We see that the median comes out to be ~320, which is pretty close to the
average value. It suggests that the data is not heavily skewed by outliers.

Let’s see if this is the case for the cost of goods: the median and the
average value for the cost of each item vary a lot. For example, the cost of a ball
is 50, but the cost of a bat is 2000, resulting in high dispersion.
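
A quick way to see why one expensive item pulls the average but not the median is to try it on a tiny list. The sketch below uses Python's standard `statistics` module; only the ball (50) and bat (2000) prices come from the text, the other three costs are invented for the example.

```python
from statistics import mean, median

# Ball = 50 and bat = 2000 are from the text; the rest are made up.
costs = [50, 120, 150, 200, 2000]

average_cost = mean(costs)    # pulled upward by the 2000 outlier
median_cost = median(costs)   # middle value of the sorted list

# Dropping the outlier changes the mean a lot, the median only a little.
costs_without_bat = [c for c in costs if c != 2000]
average_without_bat = mean(costs_without_bat)
median_without_bat = median(costs_without_bat)
```

Here the mean is 504 while the median is 150. Removing the bat drops the mean to 130 but moves the median only to 135.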

7. Mode Function

For numerical values, mean and median usually suffice, but what about
categorical values? Here, mode comes into the picture. Mode returns the most
frequently occurring value in the given range of values:

MODE.SNGL(number1,[number2],…)

Note: MODE.SNGL returns only a single value, whereas MODE.MULT returns
an array of the most commonly occurring values.

Well, this is a simple one. Let’s find the most frequent discount value given by
the sports store:

This discount value is 10%.
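
The single-value versus multi-value distinction in the note above has a close parallel in Python's `statistics` module, which may help if you want to check a result outside Excel. This is a sketch on made-up discount values, not a reproduction of Excel's tie-breaking rules.

```python
from statistics import mode, multimode

# Hypothetical discount percentages for a handful of products.
discount_rates = [0.10, 0.20, 0.10, 0.30, 0.20, 0.10]

most_common = mode(discount_rates)            # one value, like MODE.SNGL
all_most_common = multimode(discount_rates)   # a list, like MODE.MULT

# With a tie, the list form shows every winner.
tied_values = multimode([10, 10, 20, 20, 30])
```

When two values are tied for most frequent, the list form is the safer choice because a single-value mode hides the tie.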

8. Standard Deviation Function

Standard deviation is one of the ways to quantify dispersion. It is a measure of
how widely values are dispersed from the average value.

Here, we will be using the STDEV.P function, which is used to calculate
standard deviation based on the entire population given as arguments:

STDEV.P(number1,[number2],…)

Note: The STDEV.P function assumes that its arguments are the entire
population. If that’s not the case, you may use the STDEV.S() function.

For a large sample size, the population and sample standard deviations
return approximately equal values. Previously, we calculated the mean
and median to get a picture of the central tendency. Let’s find out the standard
deviation to see the level of dispersion:

As expected, the standard deviation of the quantity sold is low, meaning that the
dispersion is low, whereas the standard deviation for the cost of products is
high.
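
The population versus sample distinction comes down to dividing by n or by n - 1. A minimal Python sketch, assuming a small made-up dataset, shows both and why the gap shrinks as the data grows:

```python
from statistics import pstdev, stdev

# Made-up quantities; the mean is 5 and the squared deviations sum to 32.
quantities = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = pstdev(quantities)  # divides by n, like STDEV.P
sample_sd = stdev(quantities)       # divides by n - 1, like STDEV.S

# Repeating the data 100 times makes n large, so the two nearly agree.
large = quantities * 100
gap_small = sample_sd - population_sd
gap_large = stdev(large) - pstdev(large)
```

For these eight values the population figure is exactly 2, and the sample figure is slightly larger because it divides by 7 instead of 8.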

9. Quartile Function

This is yet another function with abundant applications in the industry. It helps
us divide the population into groups. QUARTILE.INC returns the quartile
of a dataset based on percentile values from 0 to 1, inclusive.

For example, you can use this function to find out the top 25% of your customer
base.

QUARTILE.INC(array, quart)
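
As a rough cross-check outside Excel, Python's `statistics.quantiles` with `method="inclusive"` uses the same kind of inclusive linear interpolation between data points. The customer spend figures below are invented for the example.

```python
from statistics import quantiles

# Hypothetical yearly spend per customer.
spend = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(spend, n=4, method="inclusive")

# Customers at or above the third quartile form roughly the top 25%.
top_customers = [value for value in spend if value >= q3]
```

Here the cut points are 3, 5 and 7, so the customers spending 7 or more are the top group.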

10. Correlation Function

The CORREL() function is my personal favorite. It provides really powerful
insights that are not obvious to the naked eye. The CORREL function returns
the correlation coefficient of two cell ranges. But what is that? It basically tells
us how strong the linear relationship is between the two variables.

Note: It does not portray any cause-and-effect relationship.

CORREL(array1, array2)

The range of correlation values is between -1 and 1.

Let’s head to our final and most interesting question – is there any relationship
between the number of goods sold and the percentage of discount?

Well, the correlation comes out to be ~0.8, which is pretty high. It seems
these are positively related – meaning the more the discount, the more the
quantity sold.
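
If you are curious what sits behind a correlation coefficient, the Pearson formula is short enough to write by hand. The following Python sketch is an illustration only; the `correl` helper and the sample numbers are hypothetical and are not the store's data.

```python
from math import sqrt


def correl(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, then the two deviation magnitudes.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return covariation / (spread_x * spread_y)


# Made-up data: higher discounts go with higher sales.
discount = [0.05, 0.10, 0.20, 0.30, 0.50]
sold = [210, 260, 300, 420, 480]
relationship = correl(discount, sold)
```

A result near 1 means the two columns rise together, a result near -1 means one falls as the other rises, and a result near 0 means there is no clear linear pattern.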

Mathematical Functions:

1) Sum function:

The SUM function adds values. You can add individual values, cell references,
or ranges, or a mix of all three.

For example:

• =SUM(A2:A10) Adds the values in cells A2:A10.
• =SUM(A2:A10, C2:C10) Adds the values in cells A2:A10, as well as
cells C2:C10.

Syntax:

SUM(number1,[number2],...)

2) Sumif Function:

You use the SUMIF function to sum the values in a range that meet the
criteria that you specify. For example, suppose that in a column that
contains numbers, you want to sum only the values that are larger than 5.
You can use the following formula: =SUMIF(B2:B25,">5")

Syntax

SUMIF(range, criteria, [sum_range])
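
The role of the optional sum_range argument is easier to see in a small sketch. The Python helper below is hypothetical and takes the criterion as a function instead of a string such as ">5", but the idea is the same: test one range, add up the matching positions of another.

```python
def sumif(cells, criterion, sum_cells=None):
    """Sum the cells that pass the test, or the matching sum_cells."""
    targets = cells if sum_cells is None else sum_cells
    return sum(target for cell, target in zip(cells, targets)
               if criterion(cell))


values = [3, 7, 5, 10]
total_over_five = sumif(values, lambda v: v > 5)

# With a separate sum range: test the quantities, add up the revenue.
quantities = [2, 8, 6, 1]
revenue = [100, 400, 300, 50]
revenue_for_big_orders = sumif(quantities, lambda q: q > 5, revenue)
```

Without a sum range, the tested cells themselves are added (7 + 10 = 17). With one, only the revenue of the rows whose quantity passes the test is added (400 + 300 = 700).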

3) Product function:

The PRODUCT function multiplies all the numbers given as arguments
and returns the product. For example, if cells A1 and A2 contain
numbers, you can use the formula =PRODUCT(A1, A2) to multiply
those two numbers together. You can also perform the same operation by
using the multiply (*) mathematical operator; for example, =A1 * A2.

The PRODUCT function is useful when you need to multiply many cells
together. For example, the formula =PRODUCT(A1:A3, C1:C3) is
equivalent to =A1 * A2 * A3 * C1 * C2 * C3.

Syntax

PRODUCT(number1, [number2], ...)

4) Power function:

Returns the result of a number raised to a power.

Syntax

POWER(number, power)

5) SQRT function:

Returns a positive square root.

Syntax

SQRT(number)

6) Round Function:

The ROUND function rounds a number to a specified number of digits.
For example, if cell A1 contains 23.7825, and you want to round that
value to two decimal places, you can use the following formula:

=ROUND(A1, 2)

The result of this function is 23.78.

Syntax

ROUND(number, num_digits)
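
One thing to watch if you reproduce ROUND results in another tool: Excel's ROUND sends ties away from zero, while Python's built-in `round` sends ties to the nearest even digit. The sketch below is an assumption-laden helper (`excel_round` is a made-up name) that uses the `decimal` module to get the half-away-from-zero behavior.

```python
from decimal import Decimal, ROUND_HALF_UP


def excel_round(value, num_digits):
    """Round half away from zero to num_digits decimal places."""
    # 10 ** -num_digits as an exact Decimal, e.g. 0.01 for two digits.
    step = Decimal(1).scaleb(-num_digits)
    rounded = Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP)
    return float(rounded)


two_places = excel_round(23.7825, 2)   # the example from the text
tie_up = excel_round(2.5, 0)           # tie goes away from zero
tie_builtin = round(2.5)               # built-in: tie goes to even
```

The text's example gives 23.78 either way. The difference only shows on exact ties such as 2.5, where this helper returns 3.0 and the built-in returns 2.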

7) Rand function:

RAND returns an evenly distributed random real number greater than or equal
to 0 and less than 1. A new random real number is returned every time the
worksheet is calculated.

Syntax

RAND()

The RAND function syntax has no arguments.


8) Mod function:

Returns the remainder after number is divided by divisor. The result has the
same sign as divisor.

Syntax

MOD(number, divisor)

The MOD function syntax has the following arguments:

• Number Required. The number for which you want to find the
remainder.
• Divisor Required. The number by which you want to divide
number.
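
The "same sign as divisor" rule is worth a quick check with negative numbers. Here is a small Python sketch of the idea, written as number minus divisor times the rounded-down quotient. The `excel_mod` name is made up for illustration.

```python
import math


def excel_mod(number, divisor):
    """Remainder that takes the sign of the divisor."""
    if divisor == 0:
        raise ZeroDivisionError("cannot take a remainder with divisor 0")
    return number - divisor * math.floor(number / divisor)


results = [
    excel_mod(3, 2),     # both positive
    excel_mod(-3, 2),    # positive divisor, so positive remainder
    excel_mod(3, -2),    # negative divisor, so negative remainder
    excel_mod(-3, -2),   # negative divisor, so negative remainder
]
```

The four results are 1, 1, -1 and -1: in every case the sign follows the divisor, not the number being divided.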

9) Quotient function:

Returns the integer portion of a division. Use this function when you want to
discard the remainder of a division.

Syntax

QUOTIENT(numerator, denominator)

The QUOTIENT function syntax has the following arguments:

• Numerator Required. The dividend.
• Denominator Required. The divisor.
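
"Discarding the remainder" means the result is cut off toward zero, which matters for negative numbers. A minimal Python sketch of that behavior (the `quotient` helper is a made-up name) looks like this:

```python
import math


def quotient(numerator, denominator):
    """Integer part of a division, with the remainder thrown away."""
    if denominator == 0:
        raise ZeroDivisionError("cannot divide by 0")
    return math.trunc(numerator / denominator)


whole_part = quotient(5, 2)           # 2.5 is cut to 2
decimal_inputs = quotient(4.5, 3.1)   # about 1.45 is cut to 1
negative_case = quotient(-10, 3)      # about -3.33 is cut to -3
```

Note that Python's own `//` operator rounds down instead, so `-10 // 3` gives -4. That is why the sketch uses `math.trunc`.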

Database Functions:

Function            Description
DAVERAGE function   Returns the average of selected database entries.
DCOUNT function     Counts the cells that contain numbers in a database based on criteria.
DCOUNTA function    Counts nonblank cells in a database based on criteria.
DGET function       Extracts from a database a single record that matches the specified criteria.
DMAX function       Returns the maximum value from selected database entries based on criteria.
DMIN function       Returns the minimum value from selected database entries.
DPRODUCT function   Multiplies the values in a particular field of records that match the criteria in a database.
DSTDEV function     Estimates the standard deviation based on a sample of selected database entries based on the given criteria.
DSTDEVP function    Calculates the standard deviation based on the entire population of selected database entries.
DSUM function       Adds the numbers in the field column of records in the database that match the criteria.
DVAR function       Estimates variance based on a sample from selected database entries.
DVARP function      Calculates variance based on the entire population of selected database entries.

Read more about database functions on:
https://support.microsoft.com/en-us/office/database-functions-reference-ad87e69b-fc20-4d3d-9d52-d7dc023f5c23

Financial Functions:

PMT | RATE | NPER | PV | FV


To illustrate Excel's most popular financial functions, we consider a loan with
monthly payments, an annual interest rate of 6%, a 20-year duration, a present
value of $150,000 (amount borrowed) and a future value of 0 (that's what you
hope to achieve when you pay off a loan).

We make monthly payments, so we use 6%/12 = 0.5% for Rate and 20*12 =
240 for Nper (total number of periods). If we make annual payments on the
same loan, we use 6% for Rate and 20 for Nper.

PMT

Select cell A2 and insert the PMT function.

Note: the last two arguments are optional. For loans, Fv can be omitted (the
future value of a loan equals 0, however, it's included here for clarification). If
Type is omitted, it is assumed that payments are due at the end of the period.

Result. The monthly payment equals $1,074.65.
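
To see where the $1,074.65 comes from, the standard annuity formula can be written out in a few lines of Python. This is a sketch under the same assumptions as the loan above (constant rate, constant payments); the `pmt` helper is our own and its `when` parameter plays the role of Type (0 = end of period, 1 = start).

```python
def pmt(rate, nper, pv, fv=0.0, when=0):
    """Constant payment per period for a loan or investment."""
    if rate == 0:
        return -(pv + fv) / nper
    growth = (1 + rate) ** nper
    return -(pv * growth + fv) * rate / ((growth - 1) * (1 + rate * when))


monthly_rate = 0.06 / 12      # 6% per year, paid monthly
periods = 20 * 12             # 20 years of monthly payments
monthly_payment = pmt(monthly_rate, periods, 150_000)
```

The result is negative because it is money we pay out, and it rounds to -1074.65, matching the figure above.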
Tip: when working with financial functions in Excel, always ask yourself the
question, am I making a payment (negative) or am I receiving money
(positive)? We pay off a loan of $150,000 (positive, we received that amount)
and we make monthly payments of $1,074.65 (negative, we pay).

RATE

If Rate is the only unknown variable, we can use the RATE function to
calculate the interest rate.

NPER

If Nper is the only unknown variable, we can use the NPER function. If we make
monthly payments of $1,074.65 on a 20-year loan, with an annual interest rate
of 6%, it takes 240 months to pay off this loan.

We already knew this, but we can change the monthly payment now to see how
this affects the total number of periods.

Conclusion: if we make monthly payments of $2,074.65, it takes less than 90
months to pay off this loan.
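
The number of periods can be recovered by solving the same annuity equation for Nper. The Python sketch below assumes end-of-period payments and a non-zero rate; the `nper` helper is hypothetical and follows the sign convention from the tip above (payments negative, amount borrowed positive).

```python
from math import log


def nper(rate, pmt, pv, fv=0.0):
    """Number of periods, for end-of-period payments and rate != 0."""
    return log((pmt - fv * rate) / (pv * rate + pmt)) / log(1 + rate)


monthly_rate = 0.06 / 12
months_at_original_payment = nper(monthly_rate, -1074.65, 150_000)
months_at_higher_payment = nper(monthly_rate, -2074.65, 150_000)
```

The first call lands at about 240 months, and the second at just under 90, in line with the conclusion above.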

PV

Or we can use the PV (Present Value) function. If we make monthly payments of
$1,074.65 on a 20-year loan, with an annual interest rate of 6%, how much can we
borrow? You already know the answer.

FV

And we finish this chapter with the FV (Future Value) function. If we make
monthly payments of $1,074.65 on a 20-year loan, with an annual interest rate
of 6%, do we pay off this loan? Yes.

But, if we make monthly payments of only $1,000.00, we still have debt after
20 years.
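
How much debt is left with the lower payment? A short Python sketch of the future value formula answers that. It assumes end-of-period payments and a non-zero rate, and the `fv` helper is our own illustration rather than Excel's implementation.

```python
def fv(rate, nper, pmt, pv=0.0):
    """Balance after nper end-of-period payments (rate != 0)."""
    growth = (1 + rate) ** nper
    return -(pv * growth + pmt * (growth - 1) / rate)


monthly_rate = 0.06 / 12
periods = 20 * 12

balance_full_payment = fv(monthly_rate, periods, -1074.65, 150_000)
balance_low_payment = fv(monthly_rate, periods, -1000.00, 150_000)
```

With the full payment the balance ends within a couple of dollars of zero (the small leftover comes from rounding the payment to cents). With $1,000.00 per month the result is a negative balance of roughly $34,490, meaning that much is still owed after 20 years.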

IPMT:

Returns the interest payment for a given period for an investment based on
periodic, constant payments and a constant interest rate.

Syntax

IPMT(rate, per, nper, pv, [fv], [type])

The IPMT function syntax has the following arguments:

• Rate Required. The interest rate per period.
• Per Required. The period for which you want to find the interest.
It must be in the range 1 to nper.
• Nper Required. The total number of payment periods in an annuity.
• Pv Required. The present value, or the lump-sum amount that a
series of future payments is worth right now.
• Fv Optional. The future value, or a cash balance you want to attain
after the last payment is made. If fv is omitted, it is assumed to be 0
(the future value of a loan, for example, is 0).
• Type Optional. The number 0 or 1, indicating when payments
are due. If type is omitted, it is assumed to be 0.

PPMT:

Returns the payment on the principal for a given period for an investment based
on periodic, constant payments and a constant interest rate.

Syntax

PPMT(rate, per, nper, pv, [fv], [type])

Note: For a more complete description of the arguments in PPMT, see PV.

The PPMT function syntax has the following arguments:

• Rate Required. The interest rate per period.
• Per Required. Specifies the period. It must be in the range 1 to
nper.
• Nper Required. The total number of payment periods in an annuity.
• Pv Required. The present value, that is, the total amount that a series of
future payments is worth now.
• Fv Optional. The future value, or a cash balance you want to attain
after the last payment is made. If fv is omitted, it is assumed to be 0
(zero), that is, the future value of a loan is 0.
• Type Optional. The number 0 or 1, indicating when payments
are due.
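
IPMT and PPMT are two halves of the same constant payment: the interest part is the outstanding balance times the rate, and the principal part is whatever remains. The Python sketch below illustrates this for the $150,000 loan used earlier, assuming fv = 0, end-of-period payments and a non-zero rate. All three helper functions are our own simplified versions, not Excel's.

```python
def pmt(rate, nper, pv):
    """Constant end-of-period payment that brings the balance to 0."""
    growth = (1 + rate) ** nper
    return -pv * growth * rate / (growth - 1)


def ipmt(rate, per, nper, pv):
    """Interest part of the payment in period per (1..nper)."""
    if not 1 <= per <= nper:
        raise ValueError("per must be in the range 1 to nper")
    growth = (1 + rate) ** (per - 1)
    # Balance still owed at the start of the period.
    balance = pv * growth + pmt(rate, nper, pv) * (growth - 1) / rate
    return -balance * rate


def ppmt(rate, per, nper, pv):
    """Principal part: the payment minus its interest part."""
    return pmt(rate, nper, pv) - ipmt(rate, per, nper, pv)


rate, nper, pv = 0.06 / 12, 240, 150_000
first_interest = ipmt(rate, 1, nper, pv)
first_principal = ppmt(rate, 1, nper, pv)
total_principal = sum(ppmt(rate, k, nper, pv) for k in range(1, nper + 1))
```

In the first month the interest part is -750 (0.5% of $150,000) and the principal part is about -324.65. Over all 240 periods the principal parts add up to the full $150,000 borrowed.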
