Chapter 1 Excel Function
Statistical functions.
Statistics is the grammar of data science. You’ll notice that almost every
successful data science or analytics professional has a solid understanding of
statistics.
Just think of the most commonly used data analysis tool in most organizations:
Microsoft Excel. Excel is the Swiss Army knife of data analytics. It lets you
focus on what’s important (statistics, in our case) and handles the
calculations and customizations itself.
Excel can help us answer questions like ‘what is the largest or smallest value
in a set of data’, plot graphs, and even perform regression. We will use 10 key
statistical Excel functions to answer questions for a dummy sports company,
‘Khelo’, while perusing its data.
We will answer a series of questions about this data using various Excel
functions.
We will start with the different kinds of count functions. They are used in
much the same way as other functions such as SUM, MAX, MIN, and AVERAGE.
1. Count Function
We use the count function when we need to count the number of cells
containing a number. Remember, ONLY NUMBERS! Let’s see the function:
COUNT(value1, [value2], …)
So, let’s try to find the answer to our first question – How many items were on
discount?
2. Counta Function
While the count function only counts the numeric values, the COUNTA
function counts all the cells in a range that are not blank cells. The function is
useful for counting cells containing any type of information, including error
values and empty text.
COUNTA(value1, [value2], …)
We’ll answer the second question using the counta function since it is able to
count all the non-empty values – How many items/pieces of equipment are sold
by the store?
The total number of items sold by the store is 13.
3. Countblank Function
The COUNTBLANK function counts the empty cells in a range.
COUNTBLANK(range)
Our third question calls for counting empty cells: how many products are not in
the discount section? Let’s apply the function!
There are only 2 items not on discount.
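To see how the three count functions differ, here is a minimal Python sketch of their counting rules. The column values are made up for illustration (they are not Khelo's real data), and `None` is assumed to stand for a blank cell.

```python
# Hypothetical "Discount" column; None stands for a blank cell.
discount_column = [0.10, 0.25, None, "N/A", 0.10, None, 0.50]

def count(cells):
    """Like COUNT: only numeric cells are counted."""
    return sum(isinstance(c, (int, float)) and not isinstance(c, bool) for c in cells)

def counta(cells):
    """Like COUNTA: every non-blank cell is counted, text included."""
    return sum(c is not None for c in cells)

def countblank(cells):
    """Like COUNTBLANK: blank cells and empty text are counted."""
    return sum(c is None or c == "" for c in cells)

print(count(discount_column), counta(discount_column), countblank(discount_column))
```

In this sketch the text entry "N/A" is picked up by `counta` but not by `count`, which is exactly the difference between the two Excel functions.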
4. Countifs Function
COUNTIFS is one of the most used statistical functions in Excel. The COUNTIFS
function applies one or more conditions to the cells in the given ranges and
returns the number of cells that fulfill all of the conditions.
COUNTIFS(criteria_range1, criteria1, [criteria_range2, criteria2], …)
Note: Every additional range must have the same number of rows and columns as
the criteria_range1 argument. The ranges do not have to be adjacent to each
other.
This function seems perfect to answer the fourth question: are there any
products that cost more than 2000 and were sold at a discount rate greater
than 50%?
The question seemed complex, but it was really easy to find the answer in
Excel. Only 1 product, the sneakers, meets both conditions. Wonderful, isn’t
it? We have gone through some basic statistical functions in MS Excel so far.
Next, let’s have a look at the intermediate statistical functions.
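The "all conditions must hold" logic of COUNTIFS can be sketched in a few lines of Python. The rows, the helper name `countifs`, and the thresholds below are assumptions chosen for illustration only.

```python
# Hypothetical rows: (product, cost, discount rate). Values are made up.
rows = [
    ("ball", 50, 0.10),
    ("bat", 2000, 0.10),
    ("sneakers", 2500, 0.60),
    ("racket", 3000, 0.20),
]

def countifs(rows, *conditions):
    """Count the rows for which every condition returns True."""
    return sum(all(cond(row) for cond in conditions) for row in rows)

# Cost greater than 2000 AND discount rate greater than 50%.
matches = countifs(rows, lambda r: r[1] > 2000, lambda r: r[2] > 0.5)
print(matches)
```

A row is counted only when it passes every condition, so the racket (expensive but lightly discounted) is left out.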
Intermediate Statistical Functions in Excel
5. Average Function
The most common function we usually use in our daily lives is the average (or
mean). The AVERAGE function simply returns the arithmetic mean of all the
cells in a given range:
AVERAGE(number1, [number2], …)
But there’s one simple drawback to using averages: they are sensitive to
outliers, so they can paint an unrealistic picture in our analysis. Let’s find
out the average number of goods sold:
6. Median Function
The problem of outliers can be solved by using another function for the central
tendency – the median. The median function returns the middle value of the
given range of cells. The syntax is quite simple:
MEDIAN(number1, [number2], …)
Let’s find the median of the number of goods sold in our sports store and see
how close this is to our average value:
We see that the median comes out to be ~320, which is pretty close to the
average value. It means our data is not heavily skewed by outliers.
Let’s see if this is the case for the cost of goods. Here the median and the
average value for the cost of each item vary a lot. For example, the cost of a
ball is 50, but the cost of a bat is 2000, resulting in high dispersion.
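The effect of a single expensive item on the average is easy to reproduce. This small Python sketch uses the standard `statistics` module; the cost list is hypothetical and only borrows the 50 and 2000 price points mentioned above.

```python
from statistics import mean, median

# Hypothetical item costs: four cheap items and one expensive bat.
costs = [50, 60, 80, 100, 2000]

average_cost = mean(costs)    # pulled upwards by the 2000 outlier
median_cost = median(costs)   # middle value, unaffected by the outlier

print(average_cost, median_cost)
```

The average lands far above what most items cost, while the median stays with the typical item.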
7. Mode Function
For numerical values, mean and median usually suffice, but what if we want the
most common value instead? Here, mode comes into the picture. Mode returns the
most frequently repeated value in the given range of values. Note that
MODE.SNGL itself only accepts numeric values:
MODE.SNGL(number1,[number2],…)
Well, this is a simple one. Let’s find the most frequent discount value given by
the sports store:
The most frequent discount value is 10%.
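For comparison, here is the same idea in Python. The discount list is invented for illustration; `statistics.mode` is used as a stand-in for MODE.SNGL, with the difference that it also works on text values.

```python
from statistics import mode

# Hypothetical discount rates offered on different products.
discounts = [0.10, 0.25, 0.10, 0.50, 0.10, 0.25]
most_common_discount = mode(discounts)

# Unlike MODE.SNGL, Python's mode also handles categorical (text) data.
categories = ["ball", "bat", "ball", "racket"]
most_common_category = mode(categories)

print(most_common_discount, most_common_category)
```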
8. Standard Deviation Function
STDEV.P(number1,[number2],…)
Note: The STDEV.P function assumes that its arguments are the entire
population. If your data is only a sample of the population, use the STDEV.S
function instead. For a large sample size, the two functions return
approximately equal values. Previously, we calculated the mean and median to
get a picture of the central tendency. Let’s find out the standard deviation to
see the level of dispersion:
As expected, the standard deviation of the quantity sold is low, meaning that
the dispersion is low, whereas the standard deviation for the cost of products
is high.
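The population versus sample distinction can be checked with Python's `statistics` module, where `pstdev` plays the role of STDEV.P and `stdev` the role of STDEV.S. The quantities below are made-up numbers, not the store's data.

```python
from statistics import pstdev, stdev

# Hypothetical quantities sold for five products.
quantities = [300, 310, 320, 330, 340]

population_sd = pstdev(quantities)  # divides by n, like STDEV.P
sample_sd = stdev(quantities)       # divides by n - 1, like STDEV.S

print(round(population_sd, 3), round(sample_sd, 3))
```

With only five values the two results differ noticeably; the gap shrinks as the number of values grows.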
9. Quartile Function
This is yet another function with abundant applications in the industry. It helps
us divide the population into groups. The QUARTILE.INC function returns the
quartile of a dataset based on percentile values from 0 to 1, inclusive.
For example, you can use this function to find out the top 25% of your customer
base.
QUARTILE.INC(array, quart)
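A rough Python equivalent is `statistics.quantiles` with `method="inclusive"`, which, to the best of my understanding, uses the same inclusive interpolation as QUARTILE.INC. The data below is hypothetical.

```python
from statistics import quantiles

# Hypothetical quantities sold; values are made up for illustration.
quantity_sold = [100, 200, 300, 400, 500]

# n=4 splits the data into four groups and returns the three cut points,
# corresponding to quart = 1, 2 and 3.
q1, q2, q3 = quantiles(quantity_sold, n=4, method="inclusive")

print(q1, q2, q3)
```

Any value above `q3` belongs to the top 25% of the data.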
10. Correl Function
CORREL(array1, array2)
Well, the correlation between discount and quantity sold comes out to be ~0.8,
which is pretty high. It seems these are positively related, meaning the higher
the discount, the greater the quantity sold.
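To make the calculation behind a correlation coefficient concrete, here is a small hand-written Pearson correlation in Python. The helper name `correl` and both data lists are assumptions for illustration, so the result will not match the ~0.8 from the store's data.

```python
from math import sqrt

def correl(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical discount rates and quantities sold.
discounts = [0.10, 0.20, 0.30, 0.40, 0.50]
quantities = [100, 180, 260, 300, 420]

r = correl(discounts, quantities)
print(round(r, 2))
```

A value close to +1 means the two columns rise together; a value close to -1 means one falls as the other rises.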
Mathematical Functions:
1) Sum function:
The SUM function adds values. You can add individual values, cell references
or ranges or a mix of all three.
For example:
2) Sumif Function:
You use the SUMIF function to sum the values in a range that meet the
criteria that you specify. For example, suppose that in a column that
contains numbers, you want to sum only the values that are larger than 5.
You can use the following formula: =SUMIF(B2:B25,">5")
Syntax
SUMIF(range, criteria, [sum_range])
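The conditional sum can be sketched in Python as follows. The helper `sumif` and the sample lists are hypothetical; the optional third argument mimics the case where the criteria are tested on one range but a different range is summed.

```python
def sumif(cells, criterion, sum_cells=None):
    """Sum the cells (or the matching entries of sum_cells) where criterion holds."""
    targets = cells if sum_cells is None else sum_cells
    return sum(t for c, t in zip(cells, targets) if criterion(c))

# Sum only the values larger than 5.
values = [3, 8, 5, 12, 1]
total_over_five = sumif(values, lambda v: v > 5)

# Test one column, sum another: total amount for the "ball" rows.
products = ["ball", "bat", "ball"]
amounts = [50, 2000, 60]
ball_total = sumif(products, lambda p: p == "ball", amounts)

print(total_over_five, ball_total)
```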
3) Product function:
The PRODUCT function is useful when you need to multiply many cells
together. For example, the formula =PRODUCT(A1:A3, C1:C3) is
equivalent to =A1 * A2 * A3 * C1 * C2 * C3.
Syntax
PRODUCT(number1, [number2], …)
4) Power function:
POWER(number, power)
5) SQRT function:
Syntax
SQRT(number)
6) Round Function:
=ROUND(A1, 2)
Syntax
ROUND(number, num_digits)
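One thing to watch when reproducing ROUND results in code: as far as I know, Excel's ROUND rounds halves away from zero, while Python's built-in `round` rounds halves to the nearest even number. The sketch below uses the `decimal` module to imitate the Excel behaviour; `excel_round` is a hypothetical helper name.

```python
from decimal import Decimal, ROUND_HALF_UP

def excel_round(number, num_digits):
    """Round half away from zero, as Excel's ROUND is assumed to do."""
    quantum = Decimal(1).scaleb(-num_digits)  # 2 digits -> Decimal("0.01")
    return float(Decimal(str(number)).quantize(quantum, rounding=ROUND_HALF_UP))

print(excel_round(2.5, 0), round(2.5))         # half rounds up vs. to even
print(excel_round(2.675, 2), round(2.675, 2))  # binary float effects in round()
```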
7) Rand function:
RAND returns an evenly distributed random real number greater than or equal
to 0 and less than 1. A new random real number is returned every time the
worksheet is calculated.
Syntax
RAND()
8) Mod function:
Returns the remainder after number is divided by divisor. The result has the
same sign as divisor.
Syntax
MOD(number, divisor)
Number   Required. The number for which you want to find the remainder.
Divisor  Required. The number by which you want to divide number.
9) Quotient function:
Returns the integer portion of a division. Use this function when you want to
discard the remainder of a division.
Syntax
QUOTIENT(numerator, denominator)
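The sign rules of MOD and QUOTIENT matter once negative numbers are involved. Here is a minimal Python sketch; `excel_mod` and `excel_quotient` are hypothetical helper names, and the QUOTIENT behaviour for negative inputs (truncation towards zero) is my understanding rather than something stated above.

```python
import math

def excel_mod(number, divisor):
    """Remainder with the sign of the divisor: n - d * floor(n / d)."""
    return number - divisor * math.floor(number / divisor)

def excel_quotient(numerator, denominator):
    """Integer portion of a division, discarding the remainder."""
    return math.trunc(numerator / denominator)

print(excel_mod(3, 2), excel_mod(-3, 2), excel_mod(3, -2))
print(excel_quotient(5, 2), excel_quotient(-10, 3))
```

Python's own `%` operator follows the same sign rule as MOD, but its `//` operator rounds down rather than truncating, which is why `math.trunc` is used for the quotient.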
Database Functions:
Function   Description
DAVERAGE   Returns the average of selected database entries.
DCOUNT     Counts the cells that contain numbers in a database based on
           criteria.
DCOUNTA    Counts nonblank cells in a database based on criteria.
DGET       Extracts from a database a single record that matches the
           specified criteria.
DMAX       Returns the maximum value from selected database entries based on
           criteria.
DMIN       Returns the minimum value from selected database entries.
DPRODUCT   Multiplies the values in a particular field of records that match
           the criteria in a database.
DSTDEV     Estimates the standard deviation based on a sample of selected
           database entries based on the given criteria.
DSTDEVP    Calculates the standard deviation based on the entire population
           of selected database entries.
DSUM       Adds the numbers in the field column of records in the database
           that match the criteria.
DVAR       Estimates variance based on a sample from selected database
           entries.
DVARP      Calculates variance based on the entire population of selected
           database entries.
Read more about database functions on:
https://support.microsoft.com/en-us/office/database-functions-reference-ad87e69b-fc20-4d3d-9d52-d7dc023f5c23
Financial Functions:
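PMT
The PMT function calculates the payment for a loan based on constant payments
and a constant interest rate. Our example is a 20-year loan of $150,000 with an
annual interest rate of 6% and monthly payments.
Note: the last two arguments (Fv and Type) are optional. For loans, Fv can be
omitted (the future value of a loan equals 0; it is included here for
clarification). If Type is omitted, it is assumed that payments are due at the
end of the period.
Result. The monthly payment equals $1,074.65.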
Tip: when working with financial functions in Excel, always ask yourself the
question: am I making a payment (negative) or am I receiving money
(positive)? We pay off a loan of $150,000 (positive, we received that amount)
and we make monthly payments of $1,074.65 (negative, we pay).
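The monthly payment can be checked with the standard annuity formula. The Python function below is a sketch of that formula, not Excel's actual implementation; the parameter names mirror the Excel arguments, and `when` is a name I chose for the Type argument.

```python
def pmt(rate, nper, pv, fv=0.0, when=0):
    """Payment per period for a loan with constant payments and interest rate.

    rate is the rate per period, nper the number of periods, pv the amount
    received now. when=0 means payments at the end of each period, 1 at the start.
    """
    if rate == 0:
        return -(pv + fv) / nper
    growth = (1 + rate) ** nper
    return -(pv * growth + fv) * rate / ((growth - 1) * (1 + rate * when))

# $150,000 loan, 6% annual interest, 20 years of monthly payments.
monthly_payment = pmt(0.06 / 12, 20 * 12, 150_000)
print(round(monthly_payment, 2))
```

The result is negative because it is money we pay, in line with the sign convention from the tip above.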
RATE
If Rate is the only unknown variable, we can use the RATE function to
calculate the interest rate.
NPER
If the number of periods is the unknown, we can use the NPER function. If we
make monthly payments of $1,074.65 on a 20-year loan, with an annual interest
rate of 6%, it takes 240 months to pay off this loan. We already knew this, but
we can change the monthly payment now to see how this affects the total number
of periods.
But, if we make monthly payments of only $1,000.00, we still have debt after
20 years.
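Both statements can be verified by solving the annuity equation for the number of periods. This is a simplified sketch that assumes a future value of 0 and payments at the end of each period; the function name `nper` simply mirrors the Excel function.

```python
from math import log

def nper(rate, pmt, pv):
    """Number of periods needed to repay pv with a (negative) payment pmt.

    Assumes fv = 0 and end-of-period payments. The payment must exceed the
    interest due each period, otherwise the loan is never paid off.
    """
    payment = -pmt
    return log(payment / (payment - rate * pv)) / log(1 + rate)

months_original = nper(0.06 / 12, -1074.65, 150_000)
months_lower_payment = nper(0.06 / 12, -1000.00, 150_000)
print(round(months_original, 1), round(months_lower_payment, 1))
```

With the lower payment the loan runs well past 240 months, which is why there is still debt after 20 years.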
IPMT:
Returns the interest payment for a given period for an investment based on
periodic, constant payments and a constant interest rate.
Syntax
IPMT(rate, per, nper, pv, [fv], [type])
PPMT:
Returns the payment on the principal for a given period for an investment based
on periodic, constant payments and a constant interest rate.
Syntax
PPMT(rate, per, nper, pv, [fv], [type])
Note: For a more complete description of the arguments in PPMT, see PV.
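To see how IPMT and PPMT split each payment into interest and principal, here is a simplified Python sketch that reuses the $150,000 loan at 6% over 20 years from the PMT example. It assumes fv = 0 and end-of-period payments, and steps through the balance period by period rather than using a closed-form formula.

```python
def pmt(rate, nper, pv):
    """Constant payment per period (fv = 0, end-of-period payments)."""
    growth = (1 + rate) ** nper
    return -pv * growth * rate / (growth - 1)

def ipmt(rate, per, nper, pv):
    """Interest part of the payment in period per (1-based)."""
    payment = pmt(rate, nper, pv)
    balance = pv
    for _ in range(per - 1):
        balance += balance * rate + payment  # payment is negative
    return -balance * rate

def ppmt(rate, per, nper, pv):
    """Principal part of the payment in period per."""
    return pmt(rate, nper, pv) - ipmt(rate, per, nper, pv)

rate, periods, loan = 0.06 / 12, 240, 150_000
first_interest = ipmt(rate, 1, periods, loan)
first_principal = ppmt(rate, 1, periods, loan)
print(round(first_interest, 2), round(first_principal, 2))
```

Early payments are mostly interest; over the life of the loan the principal parts add up to the full amount borrowed.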