Practical Application Statistics
PRACTICAL APPLICATION OF
STATISTICS
Module Writer:
(IPIEF)
Department of Economics
June 2020
Contents
APPLICATIONS: DATA ANALYSIS IN EXCEL
PART I: INTRODUCTION
What is Microsoft Excel?
Why Should You Learn Microsoft Excel?
Where Can You Get Microsoft Excel?
How to Open Microsoft Excel?
Understanding the Ribbon
Understanding the worksheet
Customizing the Microsoft Excel Environment
Important Excel shortcuts
PART II: EXCEL BASICS
The Excel interface
Cell Basics
PART III: FORMULAS AND FUNCTIONS
Mathematical operators
The order of operations
Creating complex formulas
Functions
Logical Functions
Cell References
The Function Library
Statistical Functions
PART IV: DATA ANALYSIS
Sorting data
Filtering data
Charts
Analysis ToolPak
Histogram
Descriptive Statistics
ANOVA – Analysis of Variance
Sampling
Covariance
Correlation
Regression
F-Test
t-Test
z-Test
One-Tailed Test
Two-Tailed Test
REFERENCES
APPLICATIONS: DATA ANALYSIS IN EXCEL
PART I: INTRODUCTION
Microsoft Excel is one of the most widely used software applications of all time. You can use Excel to
enter all sorts of data and perform financial, mathematical or statistical calculations. The basic
Excel skills are: familiarity with the Excel ribbon and user interface, the ability to enter and format
data, calculating totals and summaries through formulas, highlighting data that meets certain
conditions, creating simple reports and charts, and understanding the importance of keyboard
shortcuts and productivity tricks.
Analysts, consultants, marketing professionals, bankers, and accountants all use Excel on a
regular basis. You might even find that other professionals, such as graphic designers and
engineers, are working with the powerful formulas and charts that come with Excel.
Why Should You Learn Microsoft Excel?
We all deal with numbers in one way or another. We all have daily expenses, which we pay for
from the monthly income that we earn. To spend wisely, we need to know our income versus our
expenditure. Microsoft Excel comes in handy when we want to record, analyze and store such
numeric data.
Learning Excel might even improve your job opportunities if you lack experience. Just as people
without a formal statistics education can learn to code, they can also learn Microsoft Excel. Upon
completing the right training, you make yourself more valuable in the modern-day workforce.
Alternatively, you can open Excel from the Start menu if it has been added there. You can also
open it from a desktop shortcut if you have created one, or by right-clicking on the desktop and
selecting New > Microsoft Excel Worksheet.
Understanding the Ribbon
The ribbon provides shortcuts to commands in Excel. A command is an action that the user
performs. Examples of commands are creating a new document, printing a document, etc.
The image below shows the ribbon used in Excel 2013.
Ribbon components
Ribbon start button: it is used to access commands such as creating new documents, saving
existing work, printing, and accessing the options for customizing Excel.
Ribbon tabs: the tabs are used to group similar commands together. The Home tab is used for
basic commands such as formatting the data to make it more presentable, sorting, and finding
specific data within the spreadsheet.
Ribbon bar: the bars are used to group similar commands together. As an example, the Alignment
ribbon bar is used to group all the commands that are used to align data together.
A workbook is a collection of worksheets. By default, a new workbook contains one worksheet in
Excel 2013 and later (three worksheets in Excel 2010 and earlier). You can delete or add sheets to
suit your requirements. By default, the sheets are named Sheet1, Sheet2 and so on. You can
rename the sheets to more meaningful names, e.g. Daily Expenses, Monthly Budget, etc.
Customizing the ribbon
The above image shows the default ribbon in Excel 2013. Let's start by customizing the ribbon.
Suppose you do not wish to see some of the tabs on the ribbon, or you would like to add tabs
that are missing, such as the Developer tab. You can use the Options window to achieve this.
- On the right-hand side, remove the check marks from the tabs that you do not wish to see
on the ribbon. For this example, we have removed the Page Layout, Review, and View tabs.
Setting the color theme
To set the color theme for Excel, go to the ribbon and click File > Options. This opens the Excel
Options window, where you follow these steps:
1. Look for the color scheme under General options for working with Excel
2. Click the color scheme drop-down list and select the desired color
3. Click the OK button
This option allows you to define how Excel behaves when you are working with formulas. You
can use it to set options such as autocomplete when entering formulas, changing the cell reference
style so that numbers are used for both columns and rows (the R1C1 reference style), and other options.
To activate an option, click its check box. To deactivate an option, clear the check box. You
can find these options in the Options dialog window under the Formulas tab in the left-hand
panel.
Proofing settings
These options control how Excel checks the text you enter. They allow you to set options such as
the dictionary language used when checking spelling, suggestions from the dictionary, etc. You
can find these options in the Options dialog window under the Proofing tab in the left-hand
panel.
Save settings
This option allows you to define the default file format when saving files, enable AutoRecover in
case your computer shuts down before you can save your work, etc. You can find these options in
the Options dialog window under the Save tab in the left-hand panel.
Summary
PART II: EXCEL BASICS
Excel is a spreadsheet program that allows you to store, organize, and analyze information. While
you may believe Excel is only used by certain people to process complicated data, anyone can
learn how to take advantage of the program's powerful features. Whether you're keeping a budget,
organizing a training log, or creating an invoice, Excel makes it easy to work with different types
of data.
1. The Quick Access Toolbar lets you access common commands no matter which tab is
selected. You can customize the commands depending on your preference.
2. The Ribbon contains all of the commands you will need to perform common tasks in Excel.
It has multiple tabs, each with several groups of commands.
3. The Tell me box works like a search bar to help you quickly find tools or commands you
want to use.
4. The Name box displays the location, or name, of a selected cell.
5. In the formula bar, you can enter or edit data, a formula, or a function that will appear in a
specific cell.
6. A column is a group of cells that runs from the top of the page to the bottom. In Excel,
columns are identified by letters.
7. Each rectangle in a workbook is called a cell. A cell is the intersection of a row and a
column. Simply click to select a cell.
8. A row is a group of cells that runs from the left of the page to the right. In Excel, rows are
identified by numbers.
9. Excel files are called workbooks. Each workbook holds one or more worksheets. Click the
tabs to switch between them, or right-click for more options.
10. There are three ways to view a worksheet. Simply click a command to select the desired
view.
11. Click and drag the slider to use the zoom control. The number to the right of the slider
reflects the zoom percentage.
12. The scroll bars allow you to scroll up and down or side to side. To do this, click and drag
the vertical or horizontal scroll bar.
Cell Basics
Whenever you work with Excel, you'll enter information—or content—into cells. Cells are the
basic building blocks of a worksheet. You'll need to learn the basics of cells and cell content to
calculate, analyze, and organize data in Excel.
Every worksheet is made up of thousands of rectangles, which are called cells. A cell is the
intersection of a row and a column—in other words, where a row and column meet.
Columns are identified by letters (A, B, C), while rows are identified by numbers (1, 2, 3). Each
cell has its own name—or cell address—based on its column and row. In the example below, the
selected cell intersects column B and row 2, so the cell address is B2.
Format Cells: When we format cells in Excel, we change the appearance of a number without
changing the number itself. We can apply a number format (0.8, $0.80, 80%, etc.) or other
formatting (alignment, font, border, etc.).
Cell Style
PART III: FORMULAS AND FUNCTIONS
One of the most powerful features in Excel is the ability to calculate numerical information using
formulas. Just like a calculator, Excel can add, subtract, multiply, and divide. You may have
experience working with formulas that contain only one operator, such as 7+9. More complex
formulas can contain several mathematical operators, such as 5+2*8. When there's more than one
operation in a formula, the order of operations tells Excel which operation to calculate first. To
write formulas that will give you the correct answer, you'll need to understand the order of
operations.
Mathematical operators
Excel uses standard operators for formulas, such as a plus sign for addition (+), a minus sign for
subtraction (-), an asterisk for multiplication (*), a forward slash for division (/), and a caret (^) for
exponents. All formulas in Excel must begin with an equals sign (=). This is because the cell
contains, or is equal to, the formula and the value it calculates.
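The order of operations
1. Operations enclosed in parentheses
2. Exponential calculations (3^2, for example)
3. Multiplication and division, whichever comes first
4. Addition and subtraction, whichever comes first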
Excel follows the order of operations and first adds the values inside the parentheses:
(45.80+68.70+159.60) = 274.10. It then multiplies that value by the tax rate: 274.10*0.075. The
result will show that the sales tax is $20.56.
Note: It's especially important to follow the order of operations when creating a formula.
Otherwise, Excel won't calculate the results accurately. In our example, if the parentheses
are not included, the multiplication is calculated first and the result is incorrect.
Parentheses are often the best way to define which calculations will be performed first in
Excel.
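To see why the parentheses matter, here is a small Python sketch that evaluates the sales-tax formula both ways. Python uses the same precedence as Excel for + and * (multiplication before addition), so it is a convenient way to double-check a formula; note that Python writes exponents as ** rather than ^. The variable names are our own.

```python
# The 7.5% tax rate from the sales-tax example.
tax_rate = 0.075

# Like =(45.80+68.70+159.60)*0.075: the parentheses force the addition first.
with_parentheses = (45.80 + 68.70 + 159.60) * tax_rate

# Without parentheses, multiplication is done first, so only 159.60 is taxed.
without_parentheses = 45.80 + 68.70 + 159.60 * tax_rate

print(round(with_parentheses, 2))     # 20.56
print(round(without_parentheses, 2))  # 126.47
```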
In the example below, we'll use cell references along with numerical values to create a complex
formula that will calculate the subtotal for a catering invoice. The formula will calculate the cost
of each menu item first, then add these values.
1. Select the cell that will contain the formula. In our example, we'll select cell C5
2. Enter your formula. In our example, we'll type =B3*C3+B4*C4. This formula will follow
the order of operations, first performing the multiplication: 2.79*35 = 97.65 and 2.29*20
= 45.80. It then will add these values to calculate the total: 97.65+45.80.
3. Double-check your formula for accuracy, then press Enter on your keyboard. The formula
will calculate and display the result. In our example, the result shows that the subtotal for
the order is $143.45.
Note: You can add parentheses to any equation to make it easier to read. While it won't
change the result of the formula in this example, we could enclose the multiplication
operations within parentheses to clarify that they will be calculated before the addition.
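The catering formula can be mirrored by looking up the referenced cells and applying the same operators. In this Python sketch the worksheet is a plain dictionary, and we assume, from the order of the factors in the text, that B3 = 2.79, C3 = 35, B4 = 2.29 and C4 = 20; the dictionary itself is hypothetical.

```python
# Hypothetical cell contents for the catering invoice.
cells = {"B3": 2.79, "C3": 35, "B4": 2.29, "C4": 20}

# =B3*C3+B4*C4: both products are calculated before the addition.
subtotal = cells["B3"] * cells["C3"] + cells["B4"] * cells["C4"]

# =(B3*C3)+(B4*C4): the optional parentheses do not change the result.
subtotal_with_parentheses = (cells["B3"] * cells["C3"]) + (cells["B4"] * cells["C4"])

print(round(subtotal, 2))  # 143.45
```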
Functions
A function is a predefined formula that performs calculations using specific values in a particular
order. Excel includes many common functions that can be used to quickly find the sum, average,
count, maximum value, and minimum value for a range of cells. In order to use functions correctly,
you'll need to understand the different parts of a function and how to create arguments to calculate
values and cell references.
The parts of a function: In order to work correctly, a function must be written a specific way,
which is called the syntax. The basic syntax for a function is the equals sign (=), the function name
(SUM, for example), and one or more arguments. Arguments contain the information you want to
calculate. The function in the example below would add the values of the cell range A1:A20.
Working with arguments: Arguments can refer to both individual cells and cell ranges and must
be enclosed within parentheses. You can include one argument or multiple arguments, depending
on the syntax required for the function.
For example, the function =AVERAGE(B1:B9) would calculate the average of the values in the
cell range B1:B9. This function contains only one argument.
Multiple arguments must be separated by a comma (in regional settings that use a comma as the
decimal separator, Excel uses a semicolon instead). For example, the function =SUM(A1:A3,
C1:C2, E1) will add the values of all of the cells in the three arguments.
Creating a function: There are a variety of functions available in Excel. Here are some of the most
common functions you'll use:
SUM: This function adds all of the values of the cells in the argument.
AVERAGE: This function determines the average of the values included in the argument.
It calculates the sum of the cells and then divides that value by the number of cells in the
argument that contain numbers.
COUNT: This function counts the number of cells with numerical data in the argument.
This function is useful for quickly counting items in a cell range.
MAX: This function determines the highest cell value included in the argument.
MIN: This function determines the lowest cell value included in the argument.
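The five functions above can be mirrored in a few lines of Python. The sketch assumes a hypothetical range in which an empty cell is represented by None and a text entry by a string; like Excel, it skips empty cells, text and TRUE/FALSE values inside a range, so COUNT and AVERAGE only consider the numeric cells (a cell containing 0 still counts).

```python
# A hypothetical cell range: None is an empty cell, "n/a" is a text entry.
cell_range = [12, 7.5, None, "n/a", 30, 0]

# Keep only numeric cells (bool is excluded because True/False are ints in Python).
numbers = [v for v in cell_range if isinstance(v, (int, float)) and not isinstance(v, bool)]

total = sum(numbers)                            # =SUM(range)
count = len(numbers)                            # =COUNT(range)
average = total / count                         # =AVERAGE(range): sum divided by count
largest, smallest = max(numbers), min(numbers)  # =MAX(range), =MIN(range)

print(total, count, average, largest, smallest)  # 49.5 4 12.375 30 0
```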
AutoSum command: The AutoSum command allows you to automatically insert the most
common functions into your formula, including SUM, AVERAGE, COUNT, MIN, and MAX. In
the example below, we'll use the SUM function to calculate the total cost for a list of recently
ordered items.
Example:
1. Select the cell that will contain the function. In our example, we'll select cell D13.
2. In the Editing group on the Home tab, click the arrow next to the AutoSum command.
Next, choose the desired function from the drop-down menu. In our example, we'll select
Sum.
3. Excel will place the function in the cell and automatically select a cell range for the
argument. In our example, cells D3:D12 were selected automatically; their values will be
added to calculate the total cost. If Excel selects the wrong cell range, you can manually
enter the desired cells into the argument.
4. Press Enter on your keyboard. The function will be calculated, and the result will appear
in the cell. In our example, the sum of D3:D12 is $765.29.
Note: The AutoSum command can also be accessed from the Formulas tab on the Ribbon.
Count and Sum Functions: The most used functions in Excel are the functions that count and
sum. You can count and sum based on one criterion or multiple criteria.
1. Count:
To count the number of cells that contain numbers, use the COUNT function.
2. Countif
To count cells based on one criterion (for example, greater than 9), use the following COUNTIF
function.
3. Countifs
To count cells based on multiple criteria (for example, green and greater than 9), use the following
COUNTIFS function.
4. Sum
To sum a range of cells, use the SUM function.
5. Sumif
To sum cells based on one criterion (for example, greater than 9), use the following SUMIF function
(two arguments).
6. Sumifs
To sum cells based on multiple criteria (for example, circle and red), use the following SUMIFS
function (first argument is the range to sum).
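These counting and summing functions differ only in how many criteria they test and whether they add or count. A Python sketch with a hypothetical table (columns for color, shape and value) makes the logic explicit; the range names in the Excel formulas in the comments (colors, shapes, values) are hypothetical and assume those columns hold the same data.

```python
# Hypothetical rows: each has a color, a shape and a numeric value.
rows = [
    {"color": "green", "shape": "circle", "value": 12},
    {"color": "red",   "shape": "circle", "value": 5},
    {"color": "green", "shape": "square", "value": 8},
    {"color": "red",   "shape": "circle", "value": 15},
    {"color": "green", "shape": "circle", "value": 20},
]
values = [r["value"] for r in rows]

# =COUNTIF(values,">9"): count cells that meet one criterion.
countif = sum(1 for v in values if v > 9)
# =COUNTIFS(colors,"green",values,">9"): every criterion must hold.
countifs = sum(1 for r in rows if r["color"] == "green" and r["value"] > 9)
# =SUMIF(values,">9"): with two arguments, the tested range is also the range summed.
sumif = sum(v for v in values if v > 9)
# =SUMIFS(values,shapes,"circle",colors,"red"): the range to sum comes first.
sumifs = sum(r["value"] for r in rows if r["shape"] == "circle" and r["color"] == "red")

print(countif, countifs, sumif, sumifs)  # 3 2 47 20
```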
Logical Functions
Learn how to use Excel's logical functions, such as IF, AND, OR and NOT.
1. If
The IF function checks whether a condition is met, and returns one value if true and another value
if false. For example, take a look at the IF function in cell C2 below. If the score is greater than or
equal to 60, the IF function returns Pass, else it returns Fail.
2. And
The AND Function returns TRUE if all conditions are true and returns FALSE if any of the
conditions are false. For example, take a look at the AND function in cell D2 below. The AND
function returns TRUE if the first score is greater than or equal to 60 and the second score is greater
than or equal to 90, else it returns FALSE.
3. Or
The OR function returns TRUE if any of the conditions are TRUE and returns FALSE if all
conditions are false. For example, take a look at the OR function in cell D2 below. The OR
function returns TRUE if at least one score is greater than or equal to 60, else it returns FALSE.
4. Not
The NOT function changes TRUE to FALSE, and FALSE to TRUE. For example, take a look at
the NOT function in cell D2 below. In this example, the NOT function reverses the result of the
OR function.
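The four logical functions map directly onto Python's conditional expression and its and, or and not operators. In the sketch below, the cell layout (first score in column B, second score in column C) and the scores are hypothetical; the thresholds are the ones used in the examples above.

```python
def evaluate(first, second):
    """Return the IF, AND, OR and NOT results for one row of scores."""
    result_if = "Pass" if first >= 60 else "Fail"  # =IF(B2>=60,"Pass","Fail")
    result_and = first >= 60 and second >= 90      # =AND(B2>=60,C2>=90)
    result_or = first >= 60 or second >= 60        # =OR(B2>=60,C2>=60)
    result_not = not result_or                     # =NOT(OR(B2>=60,C2>=60))
    return result_if, result_and, result_or, result_not

for first, second in [(72, 95), (58, 91), (40, 55)]:
    print(first, second, evaluate(first, second))
```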
Cell References
Cell references in Excel are very important. Understand the difference between relative, absolute
and mixed reference, and you are on your way to success.
1. Relative Reference
By default, Excel uses relative references. See the formula in cell D2 below. Cell D2 references
(points to) cell B2 and cell C2. Both references are relative.
- Select cell D2, click on the lower right corner of cell D2 and drag it down to cell D5.
- Cell D3 references cell B3 and cell C3. Cell D4 references cell B4 and cell C4. Cell D5
references cell B5 and cell C5. In other words: each cell references its two neighbors on
the left.
2. Absolute Reference
- To create an absolute reference to cell H3, place a $ symbol in front of the column letter
and row number ($H$3) in the formula of cell E3.
- Now we can quickly drag this formula to the other cells.
The reference to cell H3 is fixed (when we drag the formula down and across). As a result, the
correct lengths and widths in inches are calculated.
3. Mixed Reference
- We want to copy this formula to the other cells quickly. Drag cell F2 across one cell, and
look at the formula in cell G2.
- Do you see what happens? The reference to the price should be a fixed reference to
column B.
- Place a $ symbol in front of the column letter ($B2) in the formula of cell F2. In a similar
way, when we drag cell F2 down, the reference to the reduction should be a fixed reference
to row 6. Solution: place a $ symbol in front of the row number (B$6) in the formula of
cell F2.
Note: we don't place a $ symbol in front of the row number of $B2 (this way we allow the reference
to change from $B2 (Jeans) to $B3 (Shirts) when we drag the formula down). In a similar way, we
don't place a $ symbol in front of the column letter of B$6 (this way we allow the reference to
change from B$6 (Jan) to C$6 (Feb) and D$6 (Mar) when we drag the formula across).
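The rule behind relative, absolute and mixed references can be written as a small function: when a formula is copied, each part of a reference without a $ moves by the copy offset, and each part with a $ stays put. The Python sketch below (the function names are hypothetical, and it handles single-cell references only) reproduces the examples in this section.

```python
import re

def col_to_num(col):
    """Convert column letters (A, B, ..., Z, AA, ...) to a 1-based number."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n):
    """Convert a 1-based column number back to letters."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def shift_ref(ref, rows_down, cols_right):
    """Return the reference Excel would write after copying a formula.

    A $ in front of the column letter or row number fixes that part;
    the parts without a $ move by the copy offset.
    """
    m = re.fullmatch(r"(\$?)([A-Z]+)(\$?)(\d+)", ref)
    if m is None:
        raise ValueError(f"not a single-cell reference: {ref}")
    col_abs, col, row_abs, row = m.groups()
    new_col = col if col_abs else num_to_col(col_to_num(col) + cols_right)
    new_row = int(row) if row_abs else int(row) + rows_down
    return f"{col_abs}{new_col}{row_abs}{new_row}"

print(shift_ref("B2", 3, 0))    # B5    relative: D2 dragged down to D5
print(shift_ref("$H$3", 2, 1))  # $H$3  absolute: never moves
print(shift_ref("$B2", 1, 1))   # $B3   mixed: column fixed, row moves
print(shift_ref("B$6", 1, 1))   # C$6   mixed: row fixed, column moves
```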
The Function Library
While there are hundreds of functions in Excel, the ones you'll use the most will depend on the
type of data your workbooks contain. There's no need to learn every single function, but exploring
some of the different types of functions will help you as you create new projects. You can even
use the Function Library on the Formulas tab to browse functions by category, such as Financial,
Logical, Text, and Date & Time.
To access the Function Library, select the Formulas tab on the Ribbon. Look for the Function
Library group.
Statistical Functions
This part gives an overview of some very useful statistical functions in Excel.
Average: To calculate the average of a group of numbers, use the AVERAGE function.
Averageif: To average cells based on one criterion, use the AVERAGEIF function. For example, you
can use it to calculate the average while excluding zeros.
Median: To find the median (or middle number), use the MEDIAN function.
Mode: To find the most frequently occurring number, use the MODE function.
Standard Deviation: To calculate the standard deviation, use the STDEV function. Standard
deviation measures how spread out the numbers are around their mean. STDEV treats the data as a
sample (it divides by n - 1).
Large: To find the third largest number, use the following LARGE function.
Small:
To find the second smallest number, use the following SMALL function.
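For comparison, Python's standard statistics module offers close counterparts to these functions. The data below is hypothetical; statistics.stdev, like STDEV, is the sample standard deviation (dividing by n - 1), and LARGE and SMALL count duplicate values separately, which sorting reproduces.

```python
import statistics

# A hypothetical data range that includes zeros and repeated values.
data = [2, 0, 4, 4, 6, 0, 4, 10]

average = statistics.mean(data)                                 # =AVERAGE(range)
average_no_zeros = statistics.mean([v for v in data if v != 0])  # =AVERAGEIF(range,"<>0")
median = statistics.median(data)                                 # =MEDIAN(range)
mode = statistics.mode(data)                                     # =MODE(range)
stdev = statistics.stdev(data)                                   # =STDEV(range), n - 1
third_largest = sorted(data, reverse=True)[2]                    # =LARGE(range,3)
second_smallest = sorted(data)[1]                                # =SMALL(range,2)

print(average, average_no_zeros, median, mode, round(stdev, 4), third_largest, second_smallest)
```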
PART IV: DATA ANALYSIS
Excel workbooks are designed to store a lot of information. Whether you're working with 20 cells
or 20,000, Excel has several features to help you organize your data and find what you need.
Some of the most useful features are described below.
Sorting data
You can sort your Excel data on one column or multiple columns. You can sort in ascending or
descending order. You can quickly reorganize a worksheet by sorting your data. Content can be
sorted alphabetically, numerically, and in many other ways. For example, you could organize a list
of information by last name.
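Sorting on several columns means comparing the first column and only using the next one to break ties. A Python sketch with hypothetical records shows this with a tuple key; casefold() is used because Excel's default sort ignores upper and lower case.

```python
# Hypothetical records standing in for worksheet rows.
people = [
    {"last": "Smith", "first": "Ana", "score": 88},
    {"last": "Jones", "first": "Ben", "score": 92},
    {"last": "Smith", "first": "Ali", "score": 75},
]

# Sort on two columns: last name A to Z, then first name A to Z.
by_name = sorted(people, key=lambda p: (p["last"].casefold(), p["first"].casefold()))

# Sort on one numeric column, largest to smallest (descending).
by_score = sorted(people, key=lambda p: p["score"], reverse=True)

print([p["first"] for p in by_name])   # ['Ben', 'Ali', 'Ana']
print([p["first"] for p in by_score])  # ['Ben', 'Ana', 'Ali']
```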
Filtering data
Filter your Excel data if you only want to display records that meet certain criteria.
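A filter keeps the rows that satisfy every chosen criterion and hides the rest (Excel hides them rather than deleting them). A minimal Python sketch, with hypothetical records and a hypothetical helper name:

```python
# Hypothetical records standing in for worksheet rows.
orders = [
    {"item": "Jeans", "region": "North", "amount": 120},
    {"item": "Shirts", "region": "South", "amount": 45},
    {"item": "Jeans", "region": "South", "amount": 80},
    {"item": "Shoes", "region": "North", "amount": 60},
]

def filter_rows(rows, **criteria):
    """Keep rows whose columns equal every given value; the rest would be hidden."""
    return [r for r in rows if all(r[col] == val for col, val in criteria.items())]

north_jeans = filter_rows(orders, region="North", item="Jeans")
over_50 = [r for r in orders if r["amount"] > 50]  # like a "Greater Than 50" number filter

print(len(north_jeans))              # 1
print([r["item"] for r in over_50])  # ['Jeans', 'Jeans', 'Shoes']
```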
Charts
A simple chart in Excel can say more than a sheet full of numbers. As you'll see, creating charts is
very easy. Line charts are used to display trends over time. Use a line chart if you have text labels,
dates or a few numeric labels on the horizontal axis. Pie charts are used to display the contribution
of each value (slice) to a total (pie). Pie charts always use one data series. A bar chart is the
horizontal version of a column chart. Use a bar chart if you have large text labels. An area chart is
a line chart with the areas below the lines filled with colors. Use a stacked area chart to display the
contribution of each value to a total over time. Use a scatter plot (XY chart) to show scientific XY
data. Scatter plots are often used to find out if there's a relationship between variables X and Y.
Analysis ToolPak
The Analysis ToolPak is an Excel add-in program that provides data analysis tools for financial,
statistical and engineering data analysis.
Histogram
This example teaches you how to create a histogram in Excel.
1. First, enter the bin numbers (upper levels) in the range C4:C8.
2. On the Data tab, in the Analysis group, click Data Analysis.
3. Select Histogram and click OK.
4. Select the range A2:A19.
5. Click in the Bin Range box and select the range C4:C8.
6. Click the Output Range option button, click in the Output Range box and select cell F3.
7. Check Chart Output.
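The Histogram tool treats each bin number as an upper limit: a value is counted in the first bin that is greater than or equal to it, and anything above the last bin goes into an extra "More" row. A minimal Python sketch of that counting rule, using hypothetical data rather than the values in A2:A19:

```python
from bisect import bisect_left

def excel_histogram(values, bins):
    """Count values per bin, treating each bin number as an inclusive upper limit."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)  # the last slot is the "More" row
    for v in values:
        # Index of the first bin >= v; len(bins) when v exceeds every bin.
        counts[bisect_left(bins, v)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return dict(zip(labels, counts))

values = [1, 3, 5, 8, 10, 12, 15, 18, 21, 24]
print(excel_histogram(values, [5, 10, 15, 20]))
```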
Result
Descriptive Statistics
Descriptive statistics are used to describe the basic features of the data in a study. You can use the
Analysis ToolPak add-in to generate descriptive statistics. For example, you may have the scores
of 14 participants for a test.
To generate descriptive statistics for these scores, execute the following steps.
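The Descriptive Statistics tool reports a summary table. The Python sketch below computes a subset of its rows (it leaves out Mode, Kurtosis, Skewness and the confidence level) for 14 hypothetical scores, not the scores used in the module. Standard Error is the sample standard deviation divided by the square root of the count.

```python
import math
import statistics

def descriptive_statistics(values):
    """A subset of the Descriptive Statistics summary, using sample formulas."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }

# Hypothetical scores of 14 participants.
scores = [52, 61, 64, 68, 70, 71, 74, 75, 78, 80, 83, 85, 90, 95]
for name, value in descriptive_statistics(scores).items():
    print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```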
ANOVA – Analysis of Variance
ANOVA is a collection of statistical models and their associated estimation procedures (such as
the "variation" among and between groups) used to analyze the differences among group means in
a sample. Analysis of variance (ANOVA) is a statistical technique that is used to check if the
means of two or more groups are significantly different from each other. ANOVA checks the
impact of one or more factors by comparing the means of different samples.
This example teaches you how to perform a single factor ANOVA (analysis of variance) in Excel.
A single factor or one-way ANOVA is used to test the null hypothesis that the means of several
populations are all equal.
Example:
Below you can find the salaries of people who have a degree in economics, medicine or history.
H0: μ1 = μ2 = μ3
Result:
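Info: df = degrees of freedom (between groups: K - 1, where K = number of groups; within groups:
N - K, where N = total number of observations, which equals K(n - 1) when every group has n
observations), SS = sum of squares, MS = mean square = SS/df, F = MSB/MSW (MSB = mean square
between groups, MSW = mean square within groups).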
Conclusion: if F > F crit, we reject the null hypothesis. This is the case: 15.196 > 3.443.
Therefore, we reject the null hypothesis. The means of the three populations are not all equal; at
least one of the means is different. However, the ANOVA does not tell you where the difference
lies. You need a t-Test to test each pair of means.
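The quantities in a single-factor ANOVA table can be reproduced by hand. The Python sketch below uses three small hypothetical groups rather than the salary data; the resulting F would then be compared with the F crit value that Excel reports, which equals =F.INV.RT(0.05, df between, df within).

```python
def one_way_anova(groups):
    """Single-factor ANOVA: SS, df, MS and F for the between- and within-group rows."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n_total = len(groups), len(all_values)

    # Between groups: spread of group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of each value around its own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS between": ss_between, "df between": df_between, "MS between": ms_between,
        "SS within": ss_within, "df within": df_within, "MS within": ms_within,
        "F": ms_between / ms_within,
    }

print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```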
Objective: Analyze the differences between groups when there are two factors (ANOVA: Two-Factor With Replication)
Example:
Results:
ANOVA (significance level 0.05)
Source of Variation   SS       df   MS      F       P-value   F crit
Sample                6.25     1    6.25    3.338   0.078     4.171
Columns               5.056    2    2.53    1.350   0.275     3.316
Interaction           3.5      2    1.75    0.935   0.404     3.316
Within                56.17    30   1.872
Total                 70.972   35
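Conclusion: Sample: the P-value (0.078) is greater than 0.05, so the effect is not significant; F
(3.338) < F crit (4.171) likewise shows no significant difference between the groups. Columns: the
P-value (0.275) is greater than 0.05 and F (1.350) < F crit (3.316), so there is no significant
difference between the factors (math, statistics and language). Interaction: the P-value (0.404) is
greater than 0.05, so the interaction is also not significant.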
Objective: Analyze the differences within a group when there are two factors (ANOVA: Two-Factor Without Replication)
Example:
Result:
ANOVA
Source of Variation   SS       df   MS      F       P-value   F crit
Rows                  10.278   5    2.056   3.136   0.058     3.326
Columns               5.444    2    2.722   4.153   0.049     4.103
Error                 6.556    10   0.656
Total                 22.278   17
Conclusion: Rows (students): the P-value (0.058) is greater than 0.05 and F (3.136) < F crit
(3.326), so H0 is not rejected: there is no significant difference between the individual students.
Columns (subjects: math, statistics and language): the P-value (0.049) is less than 0.05 and F
(4.153) > F crit (4.103), so H0 is rejected: there is a significant difference between the subjects.
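The Rows/Columns/Error table comes from a two-factor ANOVA without replication (one observation per student and subject). The Python sketch below computes its sums of squares and F values for a small hypothetical 3 x 2 table; with 6 students and 3 subjects the same formulas give df of 5, 2 and 10, matching the Rows, Columns and Error rows of the table above.

```python
def two_factor_anova_without_replication(table):
    """ANOVA for a rows x columns table with one observation per cell."""
    r, c = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # what rows and columns do not explain

    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "SS": (ss_rows, ss_cols, ss_error),
        "df": (df_rows, df_cols, df_error),
        "F rows": (ss_rows / df_rows) / ms_error,
        "F columns": (ss_cols / df_cols) / ms_error,
    }

print(two_factor_anova_without_replication([[1, 2], [3, 5], [5, 8]]))
```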
Sampling
Samples are parts of a population. For example, you might have a list of information on 100 people
(your “sample”) out of 10,000 people (the “population”). You can use that list to draw conclusions
about the entire population’s behavior.
Example:
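The Analysis ToolPak's Sampling tool offers a Periodic method (take every n-th value) and a Random method (pick values from random positions, so the same value can be picked more than once). A minimal Python sketch of both, using a hypothetical population of 10,000 IDs; Python's random.sample would instead draw without repeats.

```python
import random

def periodic_sample(values, period):
    """Periodic method: the period-th value and every period-th value after it."""
    return values[period - 1::period]

def random_sample(values, size, seed=None):
    """Random method: each pick comes from a random position, so repeats are possible."""
    rng = random.Random(seed)
    return [rng.choice(values) for _ in range(size)]

population = list(range(1, 10001))  # hypothetical IDs of 10,000 people
every_100th = periodic_sample(population, 100)
random_100 = random_sample(population, 100, seed=42)

print(every_100th[:3], len(every_100th))  # [100, 200, 300] 100
```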
Covariance
Covariance is a measure of how much two random variables vary together. It’s similar to variance,
but where variance tells you how a single variable varies, covariance tells you how two variables
vary together.
Objective: Covariance gives you a positive number if the variables are positively related and a
negative number if they are negatively related. Its size depends on the units of measurement, so a
large or small covariance does not by itself tell you how strong the relationship is; the correlation
coefficient is used for that.
Example:
1. Enter your data into two columns in Excel. For example, type your X values into column
A and your Y values into column B.
2. Click the “Data” tab and then click “Data analysis.” The Data Analysis window will open.
3. Choose “Covariance” and then click “OK.”
4. Click “Input Range” and then select all of your data. Include column headers if you have
them.
5. Click the “Labels in First Row” check box if you have included column headers in your
data selection.
6. Select “Output Range” and then select an area on the worksheet. A good place to select is
an area just to the right of your data set.
7. Click “OK.” The covariance will appear in the area you selected in Step 6.
Note: After running covariance in Excel 2013, run the correlation tool as well. Correlation gives
you a standardized value for the relationship: 1 is perfect positive correlation, -1 is perfect
negative correlation and 0 is no correlation. All you can really tell from covariance is whether
there is a positive or negative relationship.
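Covariance is the average product of the deviations of X and Y from their means. Excel offers COVARIANCE.P (divide by n) and COVARIANCE.S (divide by n - 1). The Python sketch below computes both for hypothetical data and shows that rescaling Y, for example by changing units, rescales the covariance, which is why its size alone says little about strength.

```python
def covariance_population(xs, ys):
    """Like COVARIANCE.P: mean of the products of deviations (divide by n)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def covariance_sample(xs, ys):
    """Like COVARIANCE.S: the same sum of products, divided by n - 1."""
    n = len(xs)
    return covariance_population(xs, ys) * n / (n - 1)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_in_thousandths = [v * 1000 for v in y]  # same data, different unit

print(covariance_population(x, y), covariance_sample(x, y))
print(covariance_population(x, y_in_thousandths))  # 1000 times larger
```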
Correlation
The correlation coefficient (a value between -1 and +1) tells you how strongly two variables are
related to each other. We can use the CORREL function or the Analysis ToolPak add-in in Excel
to find the correlation coefficient between two variables.
Example:
A correlation coefficient near 0 indicates no correlation.
To use the Analysis ToolPak add-in in Excel to quickly generate correlation coefficients between
multiple variables, execute the following steps.
Result:
Conclusion: variables A and C are positively correlated (0.91). Variables A and B are not
correlated (0.19). Variables B and C are also not correlated (0.11). You can verify these
conclusions by looking at the graph.
Regression
Regression analysis is a set of statistical processes for estimating the relationships between a
dependent variable and one or more independent variables. The most common form of regression
analysis is linear regression, in which a researcher finds the line (or a more complex linear
combination) that most closely fits the data according to a specific mathematical criterion.
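For linear regression the usual criterion is ordinary least squares: choose the coefficients that minimise the sum of squared residuals. R Square then compares that residual sum with the total variation in y. The pure-Python sketch below solves the normal equations (XᵀX)b = Xᵀy. It is not how Excel computes its Summary Output internally. The data are hypothetical numbers built to fit y = 100 - 5·x1 + 2·x2 exactly, so the fit should recover those coefficients and give R Square = 1.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(xs, y):
    """Least-squares coefficients [a, b1, b2, ...] for y = a + b1*x1 + b2*x2 + ..."""
    rows = [[1.0] + list(x) for x in xs]  # leading 1.0 is the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(xs, y, coef):
    """1 - SS_residual / SS_total: the share of the variation in y explained."""
    fitted = [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in xs]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical (x1, x2) pairs and a response that fits the line exactly.
xs = [(1, 10), (2, 15), (3, 12), (4, 20), (5, 18)]
y = [100 - 5 * p + 2 * a for p, a in xs]

coef = ols(xs, y)
print([round(c, 6) for c in coef])       # [100.0, -5.0, 2.0]
print(round(r_squared(xs, y, coef), 6))  # 1.0
```

Solving the normal equations directly is fine for a small illustration; statistical software generally uses numerically more stable methods.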
Example: This example teaches you how to run a linear regression analysis in Excel and how to
interpret the Summary Output.
Below you can find the data. The big question is: is there a relation between Quantity Sold (Output)
and Price and Advertising (Input)? In other words, can we predict Quantity Sold if we know Price
and Advertising?
Analysis Steps:
7. Check Residuals.
8. Click OK.
1. R Square
R Square equals 0.962, which is a very good fit. 96% of the variation in Quantity Sold is explained
by the independent variables Price and Advertising. The closer to 1, the better the regression line
(read on) fits the data.
2. Significance F and P-values
To check if your results are reliable (statistically significant), look at Significance F (0.001). If this
value is less than 0.05, you're OK. If Significance F is greater than 0.05, it's probably better to stop
using this set of independent variables. Delete a variable with a high P-value (greater than 0.05)
and rerun the regression until Significance F drops below 0.05.
Note: Most or all P-values should be below 0.05. In our example this is the case. (0.000,
0.001 and 0.005).
3. Coefficients
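The regression line is: y = Quantity Sold = 8536.214 - 835.722 * Price + 0.592 * Advertising.
In general form, $Y = a + b_1 X_1 + b_2 X_2$, where $a = 8536.214$, $b_1 = -835.722$ (Price) and
$b_2 = 0.592$ (Advertising).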
In other words, for each unit increase in Price, Quantity Sold decreases by 835.722 units. For
each unit increase in Advertising, Quantity Sold increases by 0.592 units. This is valuable
information.
You can also use these coefficients to do a forecast. For example, if Price equals $4 and Advertising
equals $3000, you might be able to achieve a Quantity Sold of 8536.214 - 835.722 * 4 + 0.592 *
3000 ≈ 6969, or roughly 6970 units.
4. Residuals
The residuals show you how far away the actual data points are from the predicted data points
(using the equation). For example, the first data point equals 8500. Using the equation, the
predicted data point equals 8536.214 - 835.722 * 2 + 0.592 * 2800 = 8523.009, giving a residual
of 8500 - 8523.009 = -23.009. (Excel uses the unrounded coefficients here; with the rounded
values shown, the hand calculation gives about 8522.37.)
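The forecast and residual arithmetic above can be checked with a short Python sketch. It hard-codes the rounded coefficients printed in the Summary Output, so the predicted first data point comes out as about 8522.37 rather than Excel's 8523.009; we assume the gap is due to Excel using unrounded coefficients. The function name `predict_quantity` is hypothetical.

```python
# Coefficients as printed in the Summary Output (rounded to 3 decimals).
INTERCEPT = 8536.214
B_PRICE = -835.722
B_ADVERTISING = 0.592


def predict_quantity(price, advertising):
    """Quantity Sold predicted by the fitted regression line."""
    return INTERCEPT + B_PRICE * price + B_ADVERTISING * advertising


forecast = predict_quantity(4, 3000)
print(round(forecast, 3))  # 6969.326

predicted_first = predict_quantity(2, 2800)
residual_first = 8500 - predicted_first  # residual = actual - predicted
print(round(predicted_first, 3), round(residual_first, 3))  # 8522.37 -22.37
```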
F-Test
An “F Test” is a catch-all term for any test that uses the F-distribution. In most cases, when we
talk about the F-Test, what we are actually talking about is The F-Test to Compare Two Variances.
If you’re running an F Test using technology (for example, the F-Test Two-Sample for Variances tool
in Excel), the only steps you really need to do yourself are stating the null hypothesis and deciding
whether to reject it. The software calculates the F statistic and the critical value for you.
Example:
This example teaches you how to perform an F-Test in Excel. The F-Test is used to test the null
hypothesis that the variances of two populations are equal.
Below you can find the study hours of 6 female students and 5 male students.
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$
Result:
Note: be sure that the variance of Variable 1 is higher than the variance of Variable 2.
This is the case, 160 > 21.7. If not, swap your data. As a result, Excel calculates the correct
F value, which is the ratio of Variance 1 to Variance 2 (F = 160 / 21.7 = 7.373).
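Conclusion: if F > F Critical one-tail, we reject the null hypothesis. This is the case, 7.373 > 6.256.
Therefore, we reject the null hypothesis. The variances of the two populations are unequal.
Note: because H1 is two-sided, comparing F with the one-tail critical value at an alpha of 0.05
amounts to a two-sided test at the 10% level. For a two-sided test at the 5% level, enter 0.025 in
the Alpha box and compare F with the resulting F Critical one-tail value.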
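The F value in this example is simply the larger sample variance divided by the smaller one. The Python sketch below reproduces it from the variances in the Excel output (160 and 21.7). It assumes Variable 1 is the group of 6 female students, which is the pairing that also reproduces the t Stat in the t-Test example. The critical value 6.256 is copied from the Excel output, not computed. The function name is hypothetical.

```python
def f_test_statistic(var1, n1, var2, n2):
    """F = larger variance / smaller variance, with its (numerator, denominator) df."""
    if var1 < var2:  # Excel needs Variable 1 to have the larger variance
        var1, n1, var2, n2 = var2, n2, var1, n1
    return var1 / var2, n1 - 1, n2 - 1


# Sample variances and sizes from the study-hours example.
f, df1, df2 = f_test_statistic(160, 6, 21.7, 5)
print(round(f, 3), df1, df2)  # 7.373 5 4

F_CRITICAL_ONE_TAIL = 6.256    # read from the Excel output, not computed here
print(f > F_CRITICAL_ONE_TAIL)  # True -> reject H0
```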
t-Test
The t test tells you how significant the differences between groups are. In other words, it lets you
know whether those differences (measured in means/averages) could have happened by chance.
The t Score
The t score is a ratio between the difference between two groups and the difference within the
groups. The larger the t score, the more difference there is between groups. The smaller the t score,
the more similarity there is between groups. A t score of 3 means that the groups are three times
as different from each other as they are within each other. When you run a t test, the bigger the t-
value, the more likely it is that the results are repeatable.
Every t-value has a p-value to go with it. A p-value is the probability of getting results at least as
extreme as those in your sample if the null hypothesis (for example, "there is no difference between
the groups") were true. P-values range from 0% to 100% and are usually written as a decimal; for
example, a p-value of 5% is 0.05. Low p-values are good: they indicate that your data would be
unlikely if only chance were at work. For example, a p-value of 0.01 means that, if there were
really no difference, results this extreme would occur only 1% of the time. In most cases, a p-value
below 0.05 (5%) is accepted as statistically significant.
Paired Two Sample for Means is used when your sample observations are naturally paired. The
usual reason for performing this test is when you are testing the same group twice. For example,
if you are testing a new drug, you’ll want to compare the sample before and after they take the
drug to see if the results are different. This particular t test in Excel uses a paired two-sample test
to determine whether the before and after observations are likely to have been derived from
distributions with equal population means.
Objective: A paired two sample t test for means is normally used when you are testing twice on the same
subject. For example, in a medical trial you might want to know if a particular medicine is effective
so you test patients before the medication is administered and after. The t-test can tell you if the
results from the trial have statistical significance (i.e. it worked) or if the results probably occurred
by chance.
A two sample t test assuming equal variances is used to test data to see if there is statistical
significance or if the results may have occurred randomly. This is one of three t tests available in
Excel and of the three, it’s the one least likely to be used. Why? In the vast majority of cases in
hypothesis testing, you don’t know the population variances. This test should only be used if you
have been explicitly informed that the population variances are equal. If you don’t have this
information, you should be running the other t test (t-Test: Two-Sample Assuming Unequal
Variances).
Result:
1. Compare the alpha level you typed into the t-Test: Two-Sample Assuming Equal Variances
window (i.e. 0.05) to the p-value listed in the output on the worksheet. If the p-value in the
output is larger than the alpha level you chose, you will be unable to reject the null
hypothesis.
2. Compare the t-critical value in the output on the worksheet with the t-value listed. If the t-
value (ignoring its sign) is larger than the t-critical value, you can reject the null hypothesis.
There are two t-critical values, one-tail and two-tail. If you aren’t sure whether you have a
one-tailed test or a two-tailed test, always compare the t-value to the two-tail t critical value.
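As a rough sketch of what the equal-variances tool calculates, the Python code below pools the two sample variances and forms the t statistic, with n1 + n2 - 2 degrees of freedom. The data and the function name `pooled_t` are hypothetical. Compare the size of the result with the t Critical value from Excel's output.

```python
import math
import statistics


def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances, with its df."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, n1 + n2 - 2


t, df = pooled_t([1, 2, 3], [4, 5, 6])  # hypothetical data
print(round(t, 4), df)                  # -3.6742 4
```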
A two sample t test assuming unequal variances is the most common type of t test in Excel 2013.
You have three options in Excel for t tests: assuming equal variances, assuming unequal variances
and a paired two sample. The paired two sample for means in Excel is generally used if you have
a sample you’re testing twice (i.e. a “Before” and an “After”), while the two sample test assuming
equal variances is only used on the rare occasion that you know the population variances are equal.
Result:
1. Reject the null hypothesis if the p-value in the output is smaller than your stated alpha
level. For example, if the p-value in the output is 0.03 and the alpha level you entered was
0.05, you can reject the null hypothesis.
2. Compare the t-value with the t-critical value. If the t-value (ignoring its sign) is larger than
the t-critical value, reject the null hypothesis. There are two t-critical values, one for a
one-tailed test and one for a two-tailed test. If you don’t know whether you have a one- or
two-tailed test, use the two-tailed figure.
Steps:
1. Type your data into Excel. As the paired two sample t test for means is usually used for
“before” and “after” data, you’ll probably have three columns: the first column for the
subject identifier (e.g. a name or a number), the second column for the Before results and
the third column for the After results.
2. State your null hypothesis. For example, your null hypothesis might be that the means are
the same.
3. Click the “Data” tab and then click “Data Analysis”.
4. Click “t-Test: Paired Two Sample for Means” from the options window, then click “OK.”
5. Click the “Variable 1 Range” box and then select your first variable list (usually the Before
list).
6. Click the “Variable 2 Range” box and then select your second variable list (usually the
After list).
7. Type a number into the Hypothesized Mean Difference box. For example, if your null
hypothesis stated that there was no difference between the means, enter “0.” Otherwise, if
you are hypothesizing there is a difference, type that difference into the box.
8. Check the “Labels” box if you have included labels.
9. Type an alpha level into the alpha level box. An alpha level of 0.05, or 5%, is standard in
hypothesis testing so if you aren’t sure what alpha level you need, leave this at 0.05.
10. Click the Output Range box and select an area to the right of your data.
11. Click “OK.”
Result:
Your results will include a lot of data, some of it obvious (like the number of data items). But
when you run a t-test you’re really only looking for two things: the t-value (t Stat) and the p-value.
1. Compare the alpha level you chose (i.e. 0.05) to the p-value in the output. If the p-value in
the output is smaller than the alpha level you chose, reject the null hypothesis.
2. Compare the t-critical value in the output with the t-value. If the t-value (ignoring its sign) is
larger than the t-critical value, reject the null hypothesis. There are two t-critical values,
one-tail and two-tail. If you aren’t sure whether you have a one-tailed test or a two-tailed test,
always compare the t-value to the two-tail t critical value.
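3. Use both values (p and t) together as a cross-check. When you compare like with like
(two-tail p-value with two-tail t critical value, or one-tail with one-tail), the two comparisons
lead to the same decision. If they seem to disagree, for example the t-value suggests rejecting
but the p-value is large, check that you are using matching figures, and don’t reject the null.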
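The paired test works on the differences between the two columns. A minimal Python sketch follows, assuming differences are taken as Variable 1 minus Variable 2 (our reading of Excel's sign convention) and subtracting the Hypothesized Mean Difference from Step 7. The Before/After numbers and the function name `paired_t` are hypothetical.

```python
import math
import statistics


def paired_t(variable1, variable2, hypothesized_diff=0.0):
    """t statistic and df for paired data, using differences variable1 - variable2."""
    if len(variable1) != len(variable2):
        raise ValueError("paired data need the same number of observations")
    diffs = [v1 - v2 for v1, v2 in zip(variable1, variable2)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    t = (statistics.mean(diffs) - hypothesized_diff) / se
    return t, n - 1


before = [10, 12, 14]  # hypothetical Before results (Variable 1)
after = [11, 14, 17]   # hypothetical After results (Variable 2)
t, df = paired_t(before, after)
print(round(t, 4), df)  # -3.4641 2
```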
Example:
This example teaches you how to perform a t-Test in Excel. The t-Test is used to test the null
hypothesis that the means of two populations are equal.
Below you can find the study hours of 6 female students and 5 male students.
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \neq 0$
Result:
Conclusion: We do a two-tail test (inequality). If t Stat < -t Critical two-tail or t Stat > t Critical
two-tail, we reject the null hypothesis. This is not the case: -2.365 < 1.473 < 2.365. Therefore, we
do not reject the null hypothesis. The observed difference between the sample means (33 - 24.8)
is not convincing enough to say that the average number of study hours of female and male
students differs significantly.
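The t Stat of 1.473 can be reproduced from summary statistics with the unequal-variances (Welch) formula. The sketch below takes the means from the conclusion (33 and 24.8) and the variances from the F-Test example (160 and 21.7). It assumes the female group has variance 160 and n = 6, which is the pairing that gives 1.473. The Welch degrees of freedom come out near 6.54. The t Critical two-tail of 2.365 corresponds to 7 degrees of freedom, which suggests Excel rounds this value. The function name is hypothetical.

```python
import math


def welch_t(mean1, var1, n1, mean2, var2, n2):
    """Unequal-variance (Welch) t statistic and Welch-Satterthwaite df."""
    se1, se2 = var1 / n1, var2 / n2  # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Female: mean 33, variance 160, n = 6; Male: mean 24.8, variance 21.7, n = 5.
t, df = welch_t(33, 160, 6, 24.8, 21.7, 5)
print(round(t, 3), round(df, 2))  # 1.473 6.54
```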
z-Test
A z-test is a type of hypothesis test. Hypothesis testing is a way for you to figure out whether results
from a test are valid or repeatable. For example, if someone said they had found a new drug that
cures cancer, you would want to be sure it was probably true. A hypothesis test will tell you whether
it’s probably true or probably not true. A z-test is used when your data are approximately normally
distributed.
Use: A z-statistic, or z-score, is a number representing how many standard deviations a score
derived from a z-test lies above or below the population mean.
Objective: A z-test is a statistical test used to determine whether two population means are different
when the variances are known and the sample size is large.
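Several different types of tests are used in statistics (e.g. F test, chi-square test, t test). You would
use a z-test if the population variances are known and the sample size is large (a common rule of
thumb is more than 30 observations); otherwise, a t-test is usually the better choice.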
Steps:
1. To select the z-test tool, click the Data tab’s Data Analysis command button.
2. When Excel displays the Data Analysis dialog box, select the z-Test: Two Sample for
Means tool and then click OK.
3. In the Variable 1 Range and Variable 2 Range text boxes, identify the sample values by
telling Excel in what worksheet ranges you’ve stored the two samples.
4. Use the Hypothesized Mean Difference text box to indicate whether you hypothesize that
the means are equal.
Note: If you think that the means of the samples are equal, enter 0 (zero) into this text box
or leave the text box empty. If you hypothesize that the means are not equal, enter the
difference.
5. Use the Variable 1 Variance (Known) and Variable 2 Variance (Known) text boxes to
provide the population variance for the first and second samples.
6. In the Alpha text box, state the significance level for your z-test calculation.
Note: By default, alpha equals 0.05 (a 5-percent significance level, equivalent to 95-percent
confidence).
7. In the Output Options section, indicate where the z-test tool results should be stored.
8. Click OK.
Example:
Conclusion: Excel calculates the z-test results. Here are the z-test results for a Two Sample for
Means test. The z-test results show the mean for each of the data sets, the variance, the number of
observations, the hypothesized mean difference, the z-value, and the probability values for one-
tail and two-tail tests.
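The main numbers in that output can be sketched in Python: the z-value divides the difference in sample means (minus the hypothesized difference) by the standard error built from the known variances, and the p-values come from the standard normal distribution. The samples, the "known" variances and the function names are hypothetical. We compute the one-tail p-value from |z|, which may not match Excel's sign handling in every case.

```python
import math
import statistics


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def z_test_two_sample(x1, x2, known_var1, known_var2, hypothesized_diff=0.0):
    """Two-sample z statistic with known population variances, plus one- and two-tail p-values."""
    se = math.sqrt(known_var1 / len(x1) + known_var2 / len(x2))
    z = (statistics.mean(x1) - statistics.mean(x2) - hypothesized_diff) / se
    p_one_tail = 1 - normal_cdf(abs(z))
    return z, p_one_tail, 2 * p_one_tail


# Hypothetical samples and (assumed known) population variances of 4 each.
z, p1, p2 = z_test_two_sample([12, 14, 16, 18], [10, 11, 12, 13], 4, 4)
print(round(z, 4))  # 2.4749
print(p1, p2)       # both well below 0.05 here
```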
One-Tailed Test
Basics: A basic concept in inferential statistics is hypothesis testing. Hypothesis testing is run to
determine whether a claim about a population parameter is true or not. A test that is conducted to
show whether the mean of the sample is significantly greater than or significantly less than the
mean of a population is considered a two-tailed test. When the testing is set up to show that the
sample mean would be higher or lower than the population mean, but not both, it is referred to as a
one-tailed test. The one-tailed test gets its name from testing the area under one of the tails (sides)
of a normal distribution, although the test can be used in other non-normal distributions as well.
Note: Before the one-tailed test can be performed, null and alternative hypotheses have to
be established. A null hypothesis is a claim that the researcher hopes to reject. An
alternative hypothesis is the claim that is supported by rejecting the null hypothesis.
Characteristics:
1. A one-tailed test is a statistical hypothesis test set up to show that the sample mean would
be higher or lower than the population mean, but not both.
2. When using a one-tailed test, the analyst is testing for the possibility of the relationship in
one direction of interest, and completely disregarding the possibility of a relationship in
another direction.
3. Before running a one-tailed test, the analyst must set up a null hypothesis and an alternative
hypothesis and choose a significance level (alpha).
Example:
Let's say an analyst wants to prove that a portfolio manager outperformed the S&P 500 index in a
given year by 16.91%. He may set up the null (H0) and alternative (Ha) hypotheses as (where μ is the
portfolio's return):
$H_0: \mu \le 16.91\%$
$H_a: \mu > 16.91\%$
The null hypothesis is the measurement that the analyst hopes to reject. The alternative hypothesis
is the claim made by the analyst that the portfolio manager performed better than the S&P 500. If
the outcome of the one-tailed test results in rejecting the null, the alternative hypothesis will be
supported. On the other hand, if the outcome of the test fails to reject the null, the analyst may
carry out further analysis and investigation into the portfolio manager’s performance.
The region of rejection is on only one side of the sampling distribution in a one-tailed test. To
determine how the portfolio’s return on investment compares to the market index, the analyst must
run an upper-tailed significance test in which extreme values fall in the upper tail (right side) of
the normal distribution curve. The one-tailed test conducted in the upper or right tail area of the
curve will show the analyst whether the portfolio return is higher than the index return and
whether the difference is significant.
Note: 1%, 5% or 10% are the most common significance levels (alpha levels) used in a one-
tailed test.
To determine how significant the difference in returns is, a significance level must be specified.
The significance level is usually represented by the Greek letter α (alpha); it should not be confused
with the p-value, which is calculated from the data. The level of significance is the probability of
incorrectly concluding that the null hypothesis is false. The significance level used in a one-tailed
test is usually 1%, 5% or 10%, although any other probability can be used at the discretion of the
analyst or statistician. The p-value is calculated with the assumption that the null hypothesis is
true. The lower the p-value, the stronger the evidence that the null hypothesis is false.
If the resulting p-value is less than 5%, then the difference between both observations is
statistically significant at the 5% level, and the null hypothesis is rejected. Following our example
above, if the p-value = 0.03, or 3%, the analyst can reject H0 at the 5% level and support the claim
that the portfolio manager outperformed the index. This does not mean there is a 97% probability
that the claim is true; it means that results this extreme would occur only 3% of the time if H0 were
true. For a symmetric distribution such as the normal, the p-value calculated in only one tail is half
the two-tailed p-value for the same test statistic.
Using our example above, the analyst is interested only in whether the portfolio’s return is greater
than the market’s, so he does not need to statistically account for a situation in which the portfolio
manager underperformed the S&P 500 index. For this reason, a one-tailed test is only appropriate
when it is not important to test the outcome at the other end of a distribution.
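The halving relationship can be seen in a short Python sketch. The z value of 1.88 is hypothetical, chosen so that the upper-tail p-value is close to the 3% used in the portfolio example. At alpha = 0.05 the one-tailed test rejects H0 while the two-tailed test on the same statistic does not.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


z = 1.88  # hypothetical test statistic for H0: mu <= benchmark vs Ha: mu > benchmark
p_upper_tail = 1 - normal_cdf(z)               # one-tailed (upper) p-value
p_two_tailed = 2 * (1 - normal_cdf(abs(z)))  # two-tailed p-value for the same z

alpha = 0.05
print(round(p_upper_tail, 4), p_upper_tail < alpha)  # 0.0301 True
print(round(p_two_tailed, 4), p_two_tailed < alpha)  # 0.0601 False
```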
Two-Tailed Test
Basics: In statistics, a two-tailed test is a method in which the critical area of a distribution is two-
sided and tests whether a sample is greater than or less than a certain range of values. It is used in
null-hypothesis testing and testing for statistical significance. If the sample being tested falls into
either of the critical areas, the null hypothesis is rejected in favor of the alternative hypothesis. The
two-tailed test gets its name from testing the area under both tails of a normal distribution, although
the test can be used in other non-normal distributions.
Characteristics:
1. In statistics, a two-tailed test is a method in which the critical area of a distribution is two-
sided and tests whether a sample is greater or less than a range of values.
2. It is used in null-hypothesis testing and testing for statistical significance.
3. If the sample being tested falls into either of the critical areas, the null hypothesis is
rejected in favor of the alternative hypothesis.
4. By convention two-tailed tests are used to determine significance at the 5% level, meaning
each side of the distribution is cut at 2.5%.
Significance: A two-tailed test is designed to examine both sides of a specified data range as
designated by the probability distribution involved. The probability distribution should represent
the likelihood of a specified outcome based on predetermined standards. This requires the setting
of a limit designating the highest (or upper) and lowest (or lower) accepted variable values
included within the range. Any data point that exists above the upper limit or below the lower limit
is considered out of the acceptance range and in an area referred to as the rejection range.
Example:
As a hypothetical example, imagine that a new stockbroker (XYZ) claims that his brokerage fees
are lower than those of your current stockbroker (ABC). Data available from an independent
research firm indicates that the mean and standard deviation of all ABC broker clients are $18 and
$6, respectively.
A sample of 100 clients of ABC is taken, and brokerage charges are calculated with the new rates
of XYZ broker. If the mean of the sample is $18.75 and the sample standard deviation is $6, can
any inference be made about the difference in the average brokerage bill between ABC and XYZ
broker?
- Rejection region: $Z \le -Z_{2.5}$ or $Z \ge Z_{2.5}$ (assuming a 5% significance level, split
2.5% on either side).
- Z = (sample mean - mean) / (std-dev / sqrt(no. of samples)):
$Z = \dfrac{18.75 - 18}{6/\sqrt{100}} = \dfrac{0.75}{0.6} = 1.25$
This calculated Z value falls between the two limits defined by $-Z_{2.5} = -1.96$ and $Z_{2.5} = 1.96$.
Conclusion: There is insufficient evidence to infer that there is any difference between the rates of
your existing broker and the new broker. Alternatively, the p-value = P(Z < -1.25) + P(Z > 1.25)
= 2 * 0.1056 = 0.2112 = 21.12%, which is greater than 0.05 (5%) and leads to the same
conclusion.
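The broker calculation can be checked with a few lines of Python. The normal CDF is computed with `math.erf` rather than read from a table, so the two-tailed p-value comes out as 0.2113 instead of the table-based 0.2112.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


sample_mean, population_mean, std_dev, n = 18.75, 18, 6, 100
z = (sample_mean - population_mean) / (std_dev / math.sqrt(n))
p_value = 2 * (1 - normal_cdf(abs(z)))  # two-tailed: P(Z < -|z|) + P(Z > |z|)
z_critical = 1.96                       # 5% significance, 2.5% in each tail

print(round(z, 2), round(p_value, 4))  # 1.25 0.2113
print(abs(z) >= z_critical)            # False -> do not reject H0
```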