Excel
=
- creating formulas and referencing other cells
- Ex: =B2
SUM
- adds values. You can add individual values, cell references or
ranges or a mix of all three
- syntax: SUM(number1; [number2]; …)
- Ex: =SUM(B1:B5)
Absolute References
- use absolute references (e.g., $B$3) to keep a cell reference
constant when copying formulas
- win: F4; mac: fn+F4
Tables
- command + T
- command + uparrow / downarrow / leftarrow / rightarrow – takes
you to the top/bottom/left/right edge of the table
- command + shift + arrow – extends the selection to the last
filled cell in that direction (e.g., the rest of the row/column)
- command + A – selects the entire table
- command + E – Flash Fill: automatically fills in data based on
patterns it detects
- Frequency Table
Charts
- Histogram
o chart design menu / add chart element / add title / primary
vertical
o right click on the chart / select data / horizontal axis table
o histograms and bar (or column) charts are easily confused.
Histograms plot distributions of quantitative (numerical) data:
numerical ranges of the data are grouped into bins and charted.
Bar or column charts plot counts of categorical data
RAND
- random number generator
- random decimal:
o RAND()
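o returns a number greater than or equal to 0 and less than 1.
The function is volatile: new numbers are generated every time
the worksheet recalculates (e.g., whenever a cell is edited or F9
is pressed)
o to stabilize them, select the numbers, hit COPY, then PASTE
SPECIAL – VALUES; the cells then hold fixed numbers instead of
formulas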
- random integer:
o RANDBETWEEN(bottom; top)
o Bottom Required. The smallest integer RANDBETWEEN will
return
o Top Required. The largest integer RANDBETWEEN will
return.
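- random number with a decimal part
o RAND()+RANDBETWEEN(bottom; top) – the result is at least bottom
and less than top + 1
- random normal distribution
o Data / Data Analysis / Random Number Generation, with
Distribution set to Normal (Analysis ToolPak)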
- random array
o RANDARRAY ([rows]; [columns]; [min]; [max]; [integer])
- to generate a random real number between a and b, use:
=RAND()*(b-a)+a
- if you want to use RAND to generate a random number but don't
want the numbers to change every time the cell is calculated,
you can enter =RAND() in the formula bar, and then press F9 to
change the formula to a random number. The formula will
calculate and leave you with just a value
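To see what these formulas produce outside Excel, here is a minimal Python sketch that mirrors RAND, RANDBETWEEN and =RAND()*(b-a)+a with the standard random module. The function names rand, randbetween and rand_real are hypothetical, and Python's generator is not Excel's, so the actual numbers will differ; only the ranges correspond.

```python
import random

def rand() -> float:
    """Like RAND(): a float x with 0 <= x < 1."""
    return random.random()

def randbetween(bottom: int, top: int) -> int:
    """Like RANDBETWEEN(bottom; top): an integer with bottom <= n <= top."""
    return random.randint(bottom, top)

def rand_real(a: float, b: float) -> float:
    """Like =RAND()*(b-a)+a: a real number between a and b."""
    return rand() * (b - a) + a

random.seed(42)  # a fixed seed gives repeatable values, similar to pasting values
print(rand(), randbetween(1, 6), rand_real(10, 20))
```

Seeding plays the role of "freezing" the numbers here; in Excel the equivalent is Paste Special – Values or the F9 trick described above.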
ROUND
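- ROUND(number; num_digits)
- rounds a number to num_digits decimal places; a negative
num_digits rounds to the left of the decimal point (e.g.,
=ROUND(21.5; -1) returns 20)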
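One pitfall when checking ROUND results elsewhere: Python's built-in round() sends halves to the nearest even number (round(2.5) is 2), while Excel's ROUND rounds halves away from zero. The sketch below is an assumption-labelled approximation of Excel's behaviour using the decimal module; excel_round is a hypothetical name, and it rounds the number's shortest decimal text, so rare binary floating-point edge cases may still differ from Excel.

```python
from decimal import Decimal, ROUND_HALF_UP

def excel_round(number: float, num_digits: int) -> float:
    """Round half away from zero, as Excel's ROUND does for .5 cases.

    A negative num_digits rounds to the left of the decimal point.
    """
    step = Decimal(1).scaleb(-num_digits)   # 0.01 for 2 digits, 1E+1 for -1
    rounded = Decimal(str(number)).quantize(step, rounding=ROUND_HALF_UP)
    return float(rounded)

print(excel_round(2.15, 1))              # 2.2
print(excel_round(-1.475, 2))            # -1.48
print(excel_round(21.5, -1))             # 20.0
print(round(2.5), excel_round(2.5, 0))   # 2 3.0
```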
COUNT
- COUNT(value1, [value2], ...)
- counts the cells that contain numbers; used to determine the
number of numeric data points in a dataset
COUNTA
- COUNTA(value1, [value2], ...)
- counts the number of cells that are not empty in a range
MAX
- MAX(number1, [number2], ...)
- returns the highest value in a dataset
MIN
- MIN(number1, [number2], ...)
- returns the lowest value in a dataset
- range = MAX – MIN
AVERAGE
- AVERAGE(number1, [number2], ...)
- MEAN - sum of all data points divided by the number of data
points
MEDIAN
- MEDIAN(number1, [number2], ...)
- middle value of a dataset when it is ordered from smallest to
largest. If the dataset has an even number of values, the median
is the average of the two middle numbers
MODE
- MODE(number1,[number2],...)
- the most frequently occurring value(s) in a dataset
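- MODE.MULT(number1,[number2],...) – returns a vertical array of
all the most frequently occurring values
- MODE.SNGL(number1,[number2],...) – returns a single mode, like
MODE
- use MODE.SNGL when you expect a single mode and MODE.MULT when
the data may have several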
- if none of the values are repeated, it returns #N/A
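The summary measures above can be cross-checked with Python's statistics module. This is a sketch on a hypothetical sample; mode_sngl is a hypothetical helper that raises an error when nothing repeats, imitating the #N/A case. Among tied modes, Python returns the first value encountered in the data.

```python
import statistics

data = [3, 7, 7, 2, 9, 2, 5]   # hypothetical sample

maximum = max(data)                  # MAX
minimum = min(data)                  # MIN
value_range = maximum - minimum      # range = MAX - MIN
mean = statistics.mean(data)         # AVERAGE
median = statistics.median(data)     # MEDIAN
modes = statistics.multimode(data)   # like MODE.MULT: every most frequent value

def mode_sngl(values):
    """Like MODE / MODE.SNGL: one most frequent value, error if nothing repeats."""
    if len(set(values)) == len(values):
        raise ValueError("#N/A: no value is repeated")
    return statistics.mode(values)

print(maximum, minimum, value_range, mean, median, modes, mode_sngl(data))
# 9 2 7 5 5 [7, 2] 7
```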
FREQUENCY
- calculates how often values occur within a range of values, and
then returns a vertical array of numbers
- FREQUENCY(data_array, bins_array)
o data_array Required. An array of or reference to a set of
values for which you want to count frequencies. If data_array
contains no values, FREQUENCY returns an array of zeros
o bins_array Required. An array of or reference to intervals
into which you want to group the values in data_array. If
bins_array contains no values, FREQUENCY returns the
number of elements in data_array
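The binning rule is easiest to see in code. This Python sketch of FREQUENCY assumes bins_array is sorted ascending; frequency is a hypothetical name. Each bin counts values above the previous bin and up to and including its own limit, and one extra count is returned for values above the last bin, so the result has one more element than bins_array.

```python
from bisect import bisect_left

def frequency(data_array, bins_array):
    """Sketch of FREQUENCY: one count per bin plus one count above the last bin.

    bins_array is assumed sorted ascending; bin i counts values that are
    greater than bins_array[i - 1] and less than or equal to bins_array[i].
    """
    counts = [0] * (len(bins_array) + 1)
    for value in data_array:
        counts[bisect_left(bins_array, value)] += 1
    return counts

scores = [79, 85, 78, 85, 50, 81, 95, 88, 97]
print(frequency(scores, [70, 79, 89]))  # [1, 2, 4, 2]
```

With an empty bins list the sketch returns a single count equal to the number of values, and with empty data it returns zeros, matching the two remarks above.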
CORREL
- CORREL(array1, array2)
- returns the correlation coefficient of two cell ranges. Use the
correlation coefficient to determine the relationship between two
properties. For example, you can examine the relationship
between a location's average temperature and the use of air
conditioners.
- Ex: =CORREL(A2:A14;B2:B14)
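CORREL returns the Pearson correlation coefficient. A minimal Python sketch of that calculation follows; correl is a hypothetical name and the temperature and air-conditioner figures are made-up illustration data, not real measurements.

```python
from math import sqrt

def correl(xs, ys):
    """Pearson correlation coefficient: covariance divided by both spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("arrays must have the same length (at least 2)")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

temps = [20, 24, 28, 31, 35]   # hypothetical average temperatures
ac_use = [2, 4, 5, 7, 9]       # hypothetical air-conditioner usage
print(round(correl(temps, ac_use), 3))
```

Values near +1 or -1 indicate a strong linear relationship; values near 0 indicate little linear relationship.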
Condition:
IF
- IF(logical_test; [value_if_true]; [value_if_false])
- it can return different values based on whether a condition is true
or false. The first parameter is the condition, the second is what
the cell value should be if the condition is true, and the optional
third parameter is the cell value if the condition is false (if it
is omitted, the cell simply shows FALSE)
- Ex: =IF(OR(MAX(B2:D2)>10; E2>20); "Special Order"; "No")
IFS
- IFS(logical_test1, value_if_true1, [logical_test2,
value_if_true2], [logical_test3, value_if_true3], …)
- checks whether one or more conditions are met, and returns a
value that corresponds to the first TRUE condition. IFS can take
the place of multiple nested IF statements, and is much easier to
read with multiple conditions
COUNTIF
- COUNTIF(range, criteria)
o range (required) The group of cells you want to count.
Range can contain numbers, arrays, a named range, or
references that contain numbers. Blank and text values are
ignored
o criteria (required) A number, expression, cell reference, or
text string that determines which cells will be counted
- count the number of cells that meet a criterion
COUNTIFS
- COUNTIFS(criteria_range1, criteria1, [criteria_range2, criteria2]…)
- applies criteria to cells across multiple ranges and counts the
number of times all criteria are met
- Ex: =COUNTIFS(C:C; ">=90"; C:C; "<=90")
SUMIF
- SUMIF(range, criteria, [sum_range])
- use the SUMIF function to sum the values in a range that meet
criteria that you specify. For example, suppose that in a column
that contains numbers, you want to sum only the values that are
larger than 5
- Ex: =SUMIF(B2:B25,">5")
SUMIFS
- SUMIFS(sum_range, criteria_range1, criteria1, [criteria_range2,
criteria2], ...)
- adds all of its arguments that meet multiple criteria. For example,
you would use SUMIFS to sum the number of retailers in the
country who (1) reside in a single zip code and (2) whose profits
exceed a specific dollar value
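The four conditional aggregates share one idea: test each row, then count or sum the rows that pass. In this Python sketch the Excel criteria strings such as ">5" are replaced by Python predicates (an assumption made for clarity), and the function names and the score/region/sales data are hypothetical.

```python
def countif(values, criterion):
    """COUNTIF(range, criteria), with the criterion as a predicate."""
    return sum(1 for v in values if criterion(v))

def countifs(*pairs):
    """COUNTIFS(range1, crit1, range2, crit2, ...): rows where all criteria hold."""
    ranges, criteria = pairs[0::2], pairs[1::2]
    return sum(1 for row in zip(*ranges) if all(c(v) for c, v in zip(criteria, row)))

def sumif(values, criterion, sum_range=None):
    """SUMIF(range, criteria, [sum_range]); sums range itself if sum_range is omitted."""
    sum_range = values if sum_range is None else sum_range
    return sum(s for v, s in zip(values, sum_range) if criterion(v))

def sumifs(sum_range, *pairs):
    """SUMIFS(sum_range, criteria_range1, criteria1, ...)."""
    ranges, criteria = pairs[0::2], pairs[1::2]
    return sum(s for s, *row in zip(sum_range, *ranges)
               if all(c(v) for c, v in zip(criteria, row)))

scores = [95, 88, 72, 90, 61]                          # hypothetical data
regions = ["North", "South", "North", "North", "South"]
sales = [100, 250, 80, 120, 60]

print(countif(scores, lambda s: s >= 90))                                        # 2
print(countifs(scores, lambda s: s >= 90, regions, lambda r: r == "North"))      # 2
print(sumif(scores, lambda s: s > 80))                                           # 273
print(sumifs(sales, regions, lambda r: r == "North", scores, lambda s: s >= 90)) # 220
```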
AND
- AND(logical1, [logical2], ...)
- one of the logical functions, to determine if all conditions in a test
are TRUE
- returns TRUE if all its arguments evaluate to TRUE, and returns
FALSE if one or more arguments evaluate to FALSE
OR
- OR(logical1, [logical2], ...)
- one of the logical functions, to determine if any conditions in a
test are TRUE
- returns TRUE if any of its arguments evaluate to TRUE, and
returns FALSE if all of its arguments evaluate to FALSE
VAR.S
- VAR.S(number1,[number2],...)
- estimated variance based on a sample (divides by n - 1)
VAR.P
- VAR.P(number1,[number2],...)
- variance based on the entire population (divides by n)
STDEV
- STDEV(number1,[number2],...)
- estimates standard deviation based on a sample. The standard
deviation is a measure of how widely values are dispersed from
the average value (the mean)
- to calculate standard deviation, first find the variance (using
VAR.P for population or VAR.S for sample), then take the square
root of the variance
- interpretation: A low standard deviation means data points are
close to the mean, while a high standard deviation means they
are spread out
- empirical rule states that for a normally distributed dataset:
68% of data falls within one standard deviation of the
mean
95% of data falls within two standard deviations of the
mean
99.7% of data falls within three standard deviations of the
mean
o the empirical rule is most accurate when the data follows a
well-centered, symmetrical, bell-shaped curve
STDEV.S
- STDEV.S(number1,[number2],...) - standard deviation based on a
sample
STDEV.P
- STDEV.P(number1,[number2],...) - standard deviation based on
the entire population
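The only difference between the sample and population versions is the divisor (n - 1 versus n). This Python sketch spells that out; var_s and var_p are hypothetical names, the data set is made up, and the statistics module's variance/pstdev are printed as library equivalents.

```python
import statistics
from math import sqrt

def var_s(values):
    """VAR.S: squared deviations from the mean divided by n - 1 (sample)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

def var_p(values):
    """VAR.P: squared deviations from the mean divided by n (population)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical data, mean = 5
print(var_p(data), sqrt(var_p(data)))      # 4.0 2.0  (VAR.P, STDEV.P)
print(var_s(data), sqrt(var_s(data)))      # VAR.S, STDEV.S: divides by 7, not 8
print(statistics.variance(data), statistics.pstdev(data))  # library equivalents
```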
Z-scores
- how many standard deviations a data point is from the mean of a
data set
- Z-scores are used to measure the distance of a data point from
the mean, helping identify outliers in the dataset
=((data_point - mean) / standard_deviation)
=(data_point - AVERAGE(range)) / STDEV.P(range)
- absolute z-score: =ABS(z-score), the distance from the mean
regardless of direction
- an outlier is a data point or set of values that is significantly
different from the average or expected range in a statistical
sample or distribution
- relative distance: indicates how far a data point is from the mean
- positive/negative values: a positive z-score means the data point
is above the mean, while a negative z-score means it is below the
mean
- probability calculation: z-scores are used to calculate
probabilities in a continuous distribution
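Here is the z-score formula applied to a whole (hypothetical) data set in Python, followed by an absolute-z-score outlier check. z_scores is a hypothetical name, and the cutoff of 2 is an illustrative assumption rather than a fixed rule.

```python
import statistics

def z_scores(values):
    """(data_point - AVERAGE(range)) / STDEV.P(range) for every point."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical; mean 5, population SD 2
z = z_scores(data)
print(z)   # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# Flag points far from the mean using the absolute z-score.
# The cutoff below is an assumed, illustrative value.
THRESHOLD = 2
outliers = [x for x, score in zip(data, z) if abs(score) >= THRESHOLD]
print(outliers)  # [9]
```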
SKEW
- SKEW(number1, [number2], ...)
- skewness measures the symmetry of the data distribution,
degree of asymmetry of a distribution around its mean
- a skewness of zero indicates a symmetrical distribution
- positive skewness means a long tail on the right
- negative skewness means a long tail on the left
- SKEW.P(number1, [number2],…) - returns the skewness of a
distribution based on a population
- #DIV/0! – returned when there are fewer than three data points
or the sample standard deviation is zero (all values identical)
KURT
- KURT(number1, [number2], ...)
- kurtosis measures the "tailedness" of the data distribution
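- a kurtosis of zero indicates the same tailedness as a normal
distribution (KURT returns excess kurtosis)
- positive kurtosis means heavier tails and a sharper peak (a
taller, skinnier curve)
- negative kurtosis means lighter tails and a flatter curve
- #DIV/0! – returned when there are fewer than four data points or
the sample standard deviation is zero (all values identical)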
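SKEW and KURT use small-sample adjustment factors, which is why their values differ from simpler textbook formulas. The Python sketch below writes out the formulas given in Microsoft's documentation for SKEW, SKEW.P and KURT; the function names are hypothetical, and zero-spread data raises ZeroDivisionError, loosely mirroring #DIV/0!.

```python
import statistics

def skew(values):
    """SKEW: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3), s = sample SD."""
    n = len(values)
    if n < 3:
        raise ZeroDivisionError("#DIV/0!: fewer than three data points")
    m, s = statistics.mean(values), statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def skew_p(values):
    """SKEW.P: mean of ((x - mean) / sigma)^3, sigma = population SD."""
    m, sigma = statistics.mean(values), statistics.pstdev(values)
    return sum(((x - m) / sigma) ** 3 for x in values) / len(values)

def kurt(values):
    """KURT: sample excess kurtosis (0 = tailedness of a normal distribution)."""
    n = len(values)
    if n < 4:
        raise ZeroDivisionError("#DIV/0!: fewer than four data points")
    m, s = statistics.mean(values), statistics.stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(skew([1, 2, 3, 4, 5]), kurt([1, 2, 3, 4, 5]))  # about 0 and about -1.2
print(skew([1, 2, 3, 10]) > 0)                       # long right tail: True
```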
ABS
- ABS(number)
- returns the absolute value of a number. The absolute value of a
number is the number without its sign
Descriptive statistics
- by selecting the data range and specifying the output range, you
can easily calculate statistics such as mean, median, mode,
standard deviation, skewness, and kurtosis
- data / data analysis / descriptive statistics / input range / output
range
- data / data analysis / histogram
Probability
- probability = COUNTIF (or COUNTIFS) of the favorable outcomes /
total number of outcomes; format the result as a percentage
- addition rule for mutually exclusive events: P(A or B) = P(A) +
P(B)
- general addition rule: the probability of A or B is calculated as
P(A) + P(B) - P(A and B)
- probability of two independent events both occurring is the
product of their individual probabilities
o Ex: the probability of rolling a six on both of two dice is
calculated by multiplying the probability of rolling a six on
each die (1/6 * 1/6 = 1/36, or about 2.78%)
- probability trees are useful tools for visualizing and calculating
the probabilities of a series of events, such as flipping a coin
multiple times; they provide a systematic approach
- Discrete probability
o random variables: these are outcomes of experiments where
the result is unknown or random, such as the sum of dots on
a roll of dice or the amount of rainfall in a month
o discrete random variables: these have a limited number of
possible outcomes, such as the number of drinks a customer
orders or the sum of dots on two dice
o continuous random variables: these can take any value within
a range, like the amount of rainfall or the time you wait in
line.
o discrete mean refers to the average value of a set of discrete
random variables, which are whole numbers (e.g., number of
drinks ordered). To calculate the discrete mean:
sum the total number of observations: add up all the
counts of each discrete variable
calculate relative frequencies: divide the count of each
variable by the total number of observations
find the weighted average: multiply each discrete variable
by its relative frequency and sum these products
for example, if customers at a cafe ordered 0, 1, 2, or 3
drinks, you would calculate the mean number of drinks
ordered by following these steps
o discrete standard deviation measures the spread or variability
of a set of discrete random variables (whole numbers). It
indicates how much the values deviate from the mean
(average). To calculate it:
find the mean: calculate the average of the values
subtract the mean: for each value, subtract the mean and
square the result
multiply by relative frequency: multiply each squared
difference by its relative frequency
sum and square root: sum these values and take the
square root to get the standard deviation
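The probability rules and the discrete mean/standard deviation steps above can be checked by brute force in Python. This sketch enumerates all 36 outcomes of two dice with exact fractions, then applies the relative-frequency steps to hypothetical cafe counts; prob, first_six and second_six are hypothetical names and the drink counts are invented for illustration.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

# Two fair dice: the 36 equally likely (first, second) outcomes.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Favorable outcomes / total outcomes, like COUNTIF(...) / total."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def first_six(o):
    return o[0] == 6

def second_six(o):
    return o[1] == 6

p_both = prob(lambda o: first_six(o) and second_six(o))   # independent events
p_either = prob(lambda o: first_six(o) or second_six(o))
print(p_both, f"{float(p_both):.2%}")                     # 1/36 2.78%
print(p_either == prob(first_six) + prob(second_six) - p_both)  # True

# Discrete mean and standard deviation from counts (hypothetical cafe data).
drink_counts = {0: 10, 1: 20, 2: 15, 3: 5}   # drinks ordered -> number of customers
total = sum(drink_counts.values())
rel_freq = {x: count / total for x, count in drink_counts.items()}
mean = sum(x * p for x, p in rel_freq.items())                   # weighted average
sd = sqrt(sum((x - mean) ** 2 * p for x, p in rel_freq.items()))
print(round(mean, 4), round(sd, 4))   # 1.3 0.9
```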
PERCENTILE
- a percentile indicates the percentage of scores below a particular
value. For example, being in the 95th percentile means 95% of
scores are below yours
- PERCENTILE(array,k)
o array Required. The array or range of data that defines
relative standing
o k Required. The percentile value in the range 0..1, inclusive
- returns the k-th percentile of values in a range. You can use this
function to establish a threshold of acceptance. For example, you
can decide to examine candidates who score above the 90th
percentile
- Remarks
o If k is non-numeric, PERCENTILE returns the #VALUE! error
value.
o If k is < 0 or if k > 1, PERCENTILE returns the #NUM! error
value.
o If k is not a multiple of 1/(n - 1), PERCENTILE interpolates to
determine the value at the k-th percentile.
PERCENTILE.INC
- PERCENTILE.INC(array,k)
- Returns the k-th percentile of values in a range, where k is in the
range 0 to 1, inclusive.
- You can use the PERCENTILE.INC function to establish a threshold
of acceptance. For example, you can decide to examine
candidates who score above the 90th percentile.
PERCENTILE.EXC
- PERCENTILE.EXC(array,k)
- returns the k-th percentile of values in a range, where k is in the
range 0..1, exclusive
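The inclusive and exclusive versions differ only in where they place the k-th position before interpolating. This Python sketch follows the commonly documented definitions (position k*(n-1) counted from zero for .INC, rank k*(n+1) counted from one for .EXC). The function names are hypothetical and errors are raised as ValueError in place of #NUM!.

```python
import math

def percentile_inc(values, k):
    """PERCENTILE.INC: interpolate at 0-based position k * (n - 1), k in [0, 1]."""
    if not 0 <= k <= 1:
        raise ValueError("#NUM!")
    data = sorted(values)
    pos = k * (len(data) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])

def percentile_exc(values, k):
    """PERCENTILE.EXC: interpolate at 1-based rank k * (n + 1); outside 1..n is #NUM!."""
    data = sorted(values)
    rank = k * (len(data) + 1)
    if rank < 1 or rank > len(data):
        raise ValueError("#NUM!")
    lo = math.floor(rank)
    hi = min(lo + 1, len(data))
    return data[lo - 1] + (rank - lo) * (data[hi - 1] - data[lo - 1])

scores = [1, 2, 3, 4]
print(percentile_inc(scores, 0.25), percentile_exc(scores, 0.25))  # 1.75 1.25
```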
PERCENTRANK
- PERCENTRANK(array,x,[significance])
o array Required. The range of data (or pre-defined array) of
numeric values within which percent rank is determined.
o x Required. The value for which you want to know the rank
within the array
o significance Optional. A value that identifies the number of
significant digits for the returned percentage value. If
omitted, PERCENTRANK uses three digits (0.xxx)
- returns the rank of a value in a dataset as a percentage of the
dataset -- essentially, the relative standing of a value within the
whole dataset. For example, you could use PERCENTRANK to
determine the standing of an individual's test score among the
field of all scores for the same test
PERCENTRANK.INC
- PERCENTRANK.INC(array,x,[significance])
- returns the rank of a value in a data set as a percentage (0..1,
inclusive) of the data set
PERCENTRANK.EXC
- PERCENTRANK.EXC(array,x,[significance])
- returns the rank of a value in a data set as a percentage (0..1,
exclusive) of the data set
PERMUT
- PERMUT(number, number_chosen)
o number Required. An integer that describes the number of
objects
o number_chosen Required. An integer that describes the
number of objects in each permutation
- permutations refer to the number of ways in which objects can
be arranged in a specific order
- the formula for permutations is n! / (n - x)!, where n is the total
number of objects and x is the number of objects to be selected
- Ex: PERMUT(7,3) = 7!/4! = 210
FACT
- FACT(number)
- use a factorial to count the number of ways in which a group of
distinct items can be arranged (also called permutations)
COMBIN
- COMBIN(number, number_chosen)
- returns the number of combinations for a given number of items.
Use COMBIN to determine the total possible number of groups for
a given number of items
- combinations refer to the selection of items where the order does
not matter
- the formula for combinations is n! / [(n - x)! * x!], where n is the
total number of objects and x is the number of objects chosen at
one time
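Python's math module has direct counterparts of FACT, PERMUT and COMBIN (math.perm and math.comb need Python 3.8 or later). This short sketch also evaluates the factorial formulas above to show they agree.

```python
import math

n, x = 7, 3
print(math.factorial(n))                                      # FACT(7): 5040
print(math.perm(n, x),
      math.factorial(n) // math.factorial(n - x))             # PERMUT(7,3): 210 210
print(math.comb(n, x),
      math.factorial(n) // (math.factorial(n - x) * math.factorial(x)))  # COMBIN(7,3): 35 35
```

The only difference between the two counts is the extra division by x!, which removes the orderings of each chosen group.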
PRODUCT
- PRODUCT(number1, [number2], ...)
- multiplies all the numbers given as arguments and returns the
product
INDEX
- INDEX(array, row_num, [column_num])
- Ex: =INDEX($D$2:$D$228; RANDBETWEEN(1; 227)) – returns a
random value from D2:D228, which has 227 rows (a larger upper
bound can return #REF!)
- INDEX function returns a value or the reference to a value from
within a table or range
- Array form
o INDEX(array, row_num, [column_num])
o use the array form to return the value of a specified cell or
array of cells
o returns the value of an element in a table or an array,
selected by the row and column number indexes
- Reference form
o INDEX(reference, row_num, [column_num], [area_num])
o use the reference form to return a reference to specified cells
o returns the reference of the cell at the intersection of a
particular row and column. If the reference is made up of
non-adjacent selections, you can pick the selection to look in
(area_num)
FREQUENCY
- FREQUENCY(data_array, bins_array)
- calculates how often values occur within a range of values, and
then returns a vertical array of numbers. For example, use
FREQUENCY to count the number of test scores that fall within
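ranges of scores. Because FREQUENCY returns an array, older
versions of Excel require entering it as an array formula
(Ctrl+Shift+Enter); in Excel for Microsoft 365 the results spill
into adjacent cells automatically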
Distribution:
- in statistics, a distribution refers to how values in a dataset are
spread out or arranged. It shows the frequency of different
outcomes in a dataset. For example, a normal distribution (often
called a bell curve) is symmetrical and describes how data points
are distributed around the mean
- z-distribution – also known as the standard normal
distribution, is a bell-shaped distribution that is symmetrical
around a mean of 0 and has a standard deviation of 1
o it is used when the sample size is large and the population
standard deviation is known
o the area under the z-distribution curve is equal to one
o it has only one curve, which means there is a single table for
z-scores.
- t-distribution is a type of probability distribution that is similar
to the normal distribution but is used when the sample size is
small, and the population standard deviation is unknown
o it consists of multiple curves, each representing different
sample size
o the shape of the t-distribution depends on the degrees of
freedom, which is calculated as the sample size minus one (n-1).
Smaller sample sizes result in flatter curves with heavier tails,
while larger sample sizes make the t-distribution look more like
the normal distribution
- degrees of freedom (DF) is calculated as the sample size
minus one (n-1). It affects the shape of the t distribution curve
BINOM.DIST
- BINOM.DIST(number_s,trials,probability_s,cumulative)
o Number_s Required. The number of successes in trials
o Trials Required. The number of independent trials
o Probability_s Required. The probability of success on each
trial
o Cumulative Required. A logical value that determines the
form of the function
If cumulative is TRUE, then BINOM.DIST returns the
cumulative distribution function, which is the probability
that there are at most number_s successes
if FALSE, it returns the probability mass function, which is
the probability that there are number_s successes
- binomial distributions describe the probability of achieving a
specific number of successes in a fixed number of trials, where
each trial has only two possible outcomes (success or failure)
- returns the individual term binomial distribution probability. Use
BINOM.DIST in problems with a fixed number of tests or trials,
when the outcomes of any trial are only success or failure, when
trials are independent, and when the probability of success is
constant throughout the experiment. For example, BINOM.DIST
can calculate the probability that two of the next three babies
born are male
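A minimal Python sketch of what BINOM.DIST computes: the probability mass C(n, k) p^k (1-p)^(n-k), or its running sum when cumulative is TRUE. binom_dist is a hypothetical name, and the babies example assumes P(male) = 0.5.

```python
from math import comb

def binom_dist(number_s, trials, probability_s, cumulative):
    """Sketch of BINOM.DIST: P(X = number_s), or P(X <= number_s) if cumulative."""
    def pmf(k):
        return comb(trials, k) * probability_s ** k * (1 - probability_s) ** (trials - k)
    if cumulative:
        return sum(pmf(k) for k in range(number_s + 1))
    return pmf(number_s)

# Two of the next three babies are male, assuming P(male) = 0.5:
print(binom_dist(2, 3, 0.5, False))   # 0.375 (exactly two)
print(binom_dist(2, 3, 0.5, True))    # 0.875 (at most two)
```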
NORM.S.DIST
- NORM.S.DIST(z,cumulative)
o z Required. This is the value for which you want the
distribution.
o cumulative Required. The cumulative argument can be
either TRUE or FALSE. This logical value determines the form
of the function
if cumulative is TRUE then NORM.S.DIST returns the
cumulative distribution function
if it is FALSE, it returns the probability density function
cumulative probability refers to the probability that a
random variable is less than or equal to a certain value. In
the context of the normal distribution, it represents the
area under the curve to the left of a specific Z value
NORM.S.INV
- NORM.S.INV(probability)
- returns the inverse of the standard normal cumulative
distribution
- the distribution has a mean of zero and a standard deviation of
one
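Python's statistics.NormalDist (3.8+) provides the same standard normal quantities, which makes it handy for checking NORM.S.DIST and NORM.S.INV results. The wrapper names norm_s_dist and norm_s_inv below are hypothetical; the last line reproduces the 68% figure from the empirical rule.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def norm_s_dist(z, cumulative):
    """NORM.S.DIST: CDF if cumulative is True, otherwise the density (PDF)."""
    return std_normal.cdf(z) if cumulative else std_normal.pdf(z)

def norm_s_inv(probability):
    """NORM.S.INV: the z whose cumulative probability (area to the left) is given."""
    return std_normal.inv_cdf(probability)

print(norm_s_dist(1.96, True))    # about 0.975
print(norm_s_dist(0, False))      # about 0.3989, the peak of the density
print(norm_s_inv(0.975))          # about 1.96
# Empirical rule: area within one standard deviation of the mean.
print(norm_s_dist(1, True) - norm_s_dist(-1, True))   # about 0.6827
```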
T.INV
- T.INV(probability,deg_freedom)
- returns the left-tailed inverse of the t-distribution: the t-value
that has the given probability to its left, for the given degrees
of freedom
- useful when you have a small sample size and do not know the
population standard deviation
T.DIST
- T.DIST(x,deg_freedom, cumulative)
- returns the left-tailed t-distribution: the cumulative
distribution function if cumulative is TRUE, or the probability
density function if FALSE
T.DIST.RT
- T.DIST.RT(x,deg_freedom)
- returns the right-tailed t-distribution
- useful when you want to determine the probability that a t-
statistic is greater than a specific value
T.DIST.2T
- T.DIST.2T(x,deg_freedom)
- returns the two-tailed t-distribution
- returns the probability that a t-statistic falls in either tail
beyond ±x (the combined area of both tails)
SQRT
- SQRT(number)
- returns a positive square root
SQRTPI
- SQRTPI(number)
- returns the square root of (number * pi)
Text manipulation:
SUBSTITUTE
- SUBSTITUTE(text, old_text, new_text, [instance_num])
- substitutes new_text for old_text in a text string
- use SUBSTITUTE when you want to replace specific text in a text
string
REPLACE
- REPLACE(old_text, start_num, num_chars, new_text)
- use REPLACE when you want to replace any text that occurs in a
specific location in a text string
FIND
- FIND(find_text, within_text, [start_num])
- FIND locates one text string within a second text string and
returns the starting position of the first text string, counted
from the first character of the second text string
LEFT
- LEFT(text, [num_chars])
- LEFT returns the first character or characters in a text string,
based on the number of characters you specify
LEFTB
- LEFTB(text, [num_bytes])
- LEFTB returns the first character or characters in a text string,
based on the number of bytes you specify
RIGHT
- RIGHT(text,[num_chars])
- RIGHT returns the last character or characters in a text string,
based on the number of characters you specify.
MID
- MID(text, start_num, num_chars)
- MID returns a specific number of characters from a text string,
starting at the position you specify, based on the number of
characters you specify.
FORMULATEXT
- FORMULATEXT(reference)
- returns a formula as a string
CONCAT
- CONCAT(text1, [text2],…)
- CONCAT combines the text from multiple ranges and/or strings
- it does not add spaces between the joined strings automatically,
so add spaces as arguments if you need them (e.g.,
=CONCAT(A2; " "; B2))
TRIM
- TRIM(text)
- removes all spaces from text except for single spaces between
words, which cleans up excess whitespace in a string
PROPER
- PROPER(text)
- sets the first letter of each word to upper case, with the rest
lowercase
- capitalizes the first letter in a text string and any other letters in
text that follow any character other than a letter. Converts all
other letters to lowercase letters
UPPER
- UPPER(text)
- UPPER sets all letters to upper case
LOWER
- LOWER(text)
- LOWER sets all letters to lowercase
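The main trap when reproducing these text functions in another language is that Excel counts positions from 1, while Python slices count from 0. The sketch below re-creates several of them in Python with that offset made explicit; all function names are hypothetical stand-ins, and errors are raised as ValueError in place of #VALUE!. Python's built-in str.upper() and str.lower() behave like UPPER and LOWER.

```python
def left(text, num_chars=1):
    return text[:num_chars]

def right(text, num_chars=1):
    return text[-num_chars:] if num_chars > 0 else ""

def mid(text, start_num, num_chars):
    # Excel positions are 1-based; Python slices are 0-based.
    return text[start_num - 1:start_num - 1 + num_chars]

def find(find_text, within_text, start_num=1):
    # Case-sensitive, like FIND; returns a 1-based position.
    pos = within_text.find(find_text, start_num - 1)
    if pos == -1:
        raise ValueError("#VALUE!")
    return pos + 1

def substitute(text, old_text, new_text, instance_num=None):
    """Replace every occurrence, or only the instance_num-th one."""
    if instance_num is None:
        return text.replace(old_text, new_text)
    pos, start = -1, 0
    for _ in range(instance_num):
        pos = text.find(old_text, start)
        if pos == -1:
            return text            # fewer occurrences than instance_num: unchanged
        start = pos + len(old_text)
    return text[:pos] + new_text + text[pos + len(old_text):]

def replace(old_text, start_num, num_chars, new_text):
    # Replace by position rather than by matching text.
    return old_text[:start_num - 1] + new_text + old_text[start_num - 1 + num_chars:]

def trim(text):
    # Drop leading/trailing spaces and collapse inner runs to single spaces.
    return " ".join(part for part in text.split(" ") if part)

print(mid("Fluid Flow", 7, 20))                      # Flow
print(find("M", "Miriam McGovern", 3))               # 8
print(substitute("Quarter 1, 2011", "1", "2", 3))    # Quarter 1, 2012
print(replace("abcdefghijk", 6, 5, "*"))             # abcde*k
```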
LOOKUP
- LOOKUP(lookup_value, lookup_vector, [result_vector])
- lookup function: a function that uses a keyword and index to
"look up" a value in a table. There are both horizontal and
vertical lookup functions
- when you need to look in a single row or column and find a value
from the same position in a second row or column
VLOOKUP
- VLOOKUP (lookup_value, table_array, col_index_num,
[range_lookup])
- when you need to find things in a table or a range by row
HLOOKUP
- HLOOKUP(lookup_value, table_array, row_index_num,
[range_lookup])
- when your comparison values are located in a row across the top
of a table of data, and you want to look down a specified number
of rows
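The range_lookup argument controls whether VLOOKUP needs an exact match. This Python sketch shows both modes on a list of rows: exact match (FALSE) and approximate match (TRUE), which picks the largest first-column value that is less than or equal to the lookup value and assumes that column is sorted ascending. vlookup is a hypothetical name, the grade table is made up, and LookupError stands in for #N/A. HLOOKUP is the same idea with rows and columns swapped.

```python
from bisect import bisect_right

def vlookup(lookup_value, table_array, col_index_num, range_lookup=True):
    """Sketch of VLOOKUP over a list of rows; col_index_num is 1-based."""
    first_col = [row[0] for row in table_array]
    if range_lookup:
        # Approximate match: last row whose key is <= lookup_value (keys sorted).
        i = bisect_right(first_col, lookup_value) - 1
        if i < 0:
            raise LookupError("#N/A")
    else:
        try:
            i = first_col.index(lookup_value)   # exact match
        except ValueError:
            raise LookupError("#N/A") from None
    return table_array[i][col_index_num - 1]

grades = [[0, "F"], [60, "D"], [70, "C"], [80, "B"], [90, "A"]]   # hypothetical
print(vlookup(85, grades, 2))          # B (approximate match)
print(vlookup(70, grades, 2, False))   # C (exact match)
```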
Date
TODAY
- TODAY()
- returns the serial number of the current date
- if you want to view the serial number, you must change the cell
format to General or Number
- Ex: =TODAY()+5
EOMONTH
- EOMONTH(start_date, months)
- returns the serial number for the last day of the month that is the
indicated number of months before or after start_date
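To make the "serial number" idea concrete, this Python sketch converts a date to a serial number and reproduces EOMONTH. It assumes Excel's default 1900 date system and is only valid for dates after 28 February 1900; to_serial and eomonth are hypothetical names.

```python
import calendar
from datetime import date, timedelta

# Assumption: 1900 date system; valid for dates after 28 Feb 1900.
EXCEL_EPOCH = date(1899, 12, 30)

def to_serial(d):
    """Excel-style serial number: days counted from the epoch above."""
    return (d - EXCEL_EPOCH).days

def eomonth(start_date, months):
    """EOMONTH: last day of the month `months` before/after start_date."""
    total = start_date.year * 12 + (start_date.month - 1) + months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, last_day)

print(to_serial(date(2008, 1, 1)))        # 39448
print(eomonth(date(2011, 1, 1), 1))       # 2011-02-28
print(eomonth(date(2011, 1, 1), -3))      # 2010-10-31
print(date.today() + timedelta(days=5))   # like =TODAY()+5
```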
INDEX
- INDEX(array, row_num, [column_num])
- returns a value or the reference to a value from within a table or
range
- INDEX(reference, row_num, [column_num], [area_num])
- returns the reference of the cell at the intersection of a particular
row and column
MATCH
- MATCH(lookup_value, lookup_array, [match_type])
- searches for a specified item in a range of cells, and then returns
the relative position of that item in the range
- use MATCH instead of one of the LOOKUP functions when you
need the position of an item in a range instead of the item itself.
For example, you might use the MATCH function to provide a
value for the row_num argument of the INDEX function
- Ex: =INDEX(color_, MATCH("Banana", fruit_, 0)) – returns Yellow:
MATCH finds "Banana" in row 3 of fruit_, and INDEX returns row 3
of color_ (one criterion)
- Ex: =INDEX(score_, MATCH(1, ("Sweet"=taste_)*("Bright"=smell_),
0)) – multiple criteria
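The two INDEX/MATCH patterns can be traced step by step in Python. In this sketch the named ranges fruit_, color_, taste_, smell_ and score_ from the examples become lists; their contents are hypothetical, as are the helpers match_exact and index. The multiple-criteria case builds the same 1/0 array that ("Sweet"=taste_)*("Bright"=smell_) produces in Excel, then finds the first 1.

```python
def match_exact(lookup_value, lookup_array):
    """MATCH(lookup_value, lookup_array, 0): 1-based position of the first match."""
    for position, value in enumerate(lookup_array, start=1):
        if value == lookup_value:
            return position
    raise LookupError("#N/A")

def index(array, row_num):
    """INDEX(array, row_num) for a single column; row_num is 1-based."""
    return array[row_num - 1]

# Hypothetical contents for the named ranges used in the examples.
fruit_ = ["Apple", "Lime", "Banana", "Cherry"]
color_ = ["Red", "Green", "Yellow", "Red"]
taste_ = ["Sweet", "Sour", "Sweet", "Sweet"]
smell_ = ["Mild", "Bright", "Mild", "Bright"]
score_ = [7, 5, 8, 9]

print(index(color_, match_exact("Banana", fruit_)))   # Yellow (row 3)

# Multiple criteria: 1 where both conditions hold, 0 otherwise; MATCH the first 1.
flags = [int(t == "Sweet") * int(s == "Bright") for t, s in zip(taste_, smell_)]
print(flags)                                   # [0, 0, 0, 1]
print(index(score_, match_exact(1, flags)))    # 9
```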
OFFSET
- OFFSET(reference, rows, cols, [height], [width])
- returns a reference to a range that is a specified number of rows
and columns from a reference cell or range
- the reference that is returned can be a single cell or a range of
cells
- the optional height and width arguments set the number of rows and
columns in the returned reference