Practical Manual
Minitab
1. Introduction to Statistical Software
Measurements are of little use until they are 'analyzed'. Data analysis includes
organizing measurements into a meaningful order or into groups, reducing the data into
manageable quantities, forming succinct descriptions of the main features of the data,
and elucidating any anomalies for subsequent examination. Analysis is the step between
obtaining data and applying it to solve practical problems.
SPSS stands for Statistical Package for the Social Sciences. It was one of the earliest
statistical packages with Version 1 being released in 1968, well before the advent of
desktop computers. It is now on Version 23. SAS stands for Statistical Analysis
System. It was developed at the North Carolina State University in 1966, so is
contemporary with SPSS. Stata is a more recent statistical package with Version 1
being released in 1985. Since then, it has become increasingly popular in the areas of
epidemiology and economics. S-PLUS is a commercial implementation of the S statistical
programming language, released in Seattle in 1988. R is a free implementation of the S
language, developed in 1996. MINITAB is a
particularly easy package to learn and to use; it has excellent self-help facilities, has
been well tested, includes modern statistical methods and is widely used both inside and
outside the University. MINITAB is an ideal package for learning statistics.
2. The Minitab User Interface
Minitab is a software package for statistical data analysis. It exists in many versions;
these practical sessions use Minitab 16.0.
There are three main windows in Minitab. By default, Minitab opens with two windows
visible and one window minimized.
Session window
The Session window displays the results of your analyses in text format. Also, in
this window, you can enter session commands instead of using Minitab’s menus.
(Ctrl+M)
Worksheet
The worksheet, which is similar to a spreadsheet, is where you enter and arrange
your data. You can open multiple worksheets. (Ctrl+D)
Project Manager
The third window, the Project Manager, is minimized below the worksheet. (Ctrl+I)
The Project Manager contains several further icons: the history window records all the
commands you have used, the graph window lists the graphs that you have drawn,
and the worksheet window shows information about the active worksheets.
Save your work as a project file to keep all your data, graphs, dialog box settings, and
options together. If you need only to save data, save your work as a worksheet file. A
worksheet file can be used in multiple projects. Worksheets can have up to 4,000
columns. The number of worksheets that a project can have is limited only by your
computer's memory.
1. Click in the worksheet, then choose File > Save Current Worksheet As.
2. Browse to the folder that you want to save your files in.
3. Enter a name for the worksheet.
4. Select the relevant file type as the save type.
5. Click Save.
4. Data Types
A worksheet can contain the following types of data.
Numeric data
Numbers, such as 264 or 5.28125.
Text data
Letters, numbers, spaces, and special characters, such as Test #4 or North America.
Date/time data
Dates, such as Mar-17-2013, 17-Mar-2013, 3/17/13, or 17/03/13.
Times, such as 08:25:22 AM.
Date/time, such as 3/17/13 08:25:22 AM or 17/03/13 08:25:22
5. How to Open a Worksheet
You can open a new, empty worksheet at any time. You can also open one or more files
that contain data, such as a Microsoft Excel file. When you open a file, it copies the
contents of the file into the current Minitab project. Any changes that you make to the
worksheet while you are working in the Minitab project do not affect the original file.
Go to File > Open Worksheet, then browse to the file that you want to open.
In a worksheet, data are arranged in columns, which are also called variables. The column
number and name are indicated at the top of each column.
Data in the data window can be corrected by simply clicking on a cell, typing in the correct
entry, and pressing Enter. For more extensive changes, the Editor menu can be used. Under
the Editor menu, you may choose to insert cells, rows, or columns in the data set.
To insert one or more empty cells above the active cell of the data window, select
Editor > Insert Cells from the menu.
The remaining cells in the column then move down. The number of cells inserted is
equal to the number of cells selected before you choose the command.
To insert one or more empty columns to the left of the active column, select
Editor > Insert Columns from the menu.
Similarly, to insert one or more empty rows above the active row, select
Editor > Insert Rows from the menu.
Example 01
Enter the following data set, which consists of the variables Index Number, Mathematics,
Statistics and Gender (1 = male, 2 = female), into a Minitab worksheet.
Example 02:
Change the data type of “Gender” from Numeric to Text and store it in column C6.
(Type C6 in “store text column in”)
8. Coding the Data
Example 03:
Code “Gender” as follows and store the result in the same column.
1 - male 2 - female
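The recoding in Example 03 can also be sketched outside Minitab. The following Python snippet is only an illustration of the idea, not Minitab code; the function name code_gender and the sample values in gender_codes are made up for the example.

```python
# Mapping taken from Example 03: 1 = male, 2 = female.
GENDER_LABELS = {1: "male", 2: "female"}


def code_gender(codes):
    """Return the text label for each numeric gender code."""
    return [GENDER_LABELS[code] for code in codes]


# Hypothetical column of numeric codes (not the manual's data set).
gender_codes = [1, 2, 2, 1]
gender_text = code_gender(gender_codes)
print(gender_text)
```

A dictionary lookup raises a KeyError for any code other than 1 or 2, which helps to catch data entry mistakes early.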
Several options can be used to extract a portion of the data from the original
worksheet. Two of them are described below.
This option separates the data in a worksheet using a qualitative
variable. It splits the whole worksheet by a variable (usually a qualitative
variable) and produces new worksheets.
Example 04:
Split the worksheet by “Gender” and get separate worksheets for each gender.
This option is used to copy specified rows from the active worksheet to a new sub
worksheet. You can specify the subset based on row numbers or a condition.
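The two options described above, splitting by a qualitative variable and taking a subset by a condition, can be sketched in Python as follows. This is a rough illustration under stated assumptions, not what Minitab does internally: the rows, the marks, and the helper names split_by and subset are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (index number, mathematics, statistics, gender).
rows = [
    (101, 65, 70, "male"),
    (102, 80, 75, "female"),
    (103, 55, 60, "male"),
    (104, 90, 85, "female"),
]


def split_by(data, column):
    """Group rows by the value in the given column position."""
    groups = defaultdict(list)
    for row in data:
        groups[row[column]].append(row)
    return dict(groups)


def subset(data, condition):
    """Copy only the rows for which the condition is true."""
    return [row for row in data if condition(row)]


# One "worksheet" per gender (gender is column position 3).
by_gender = split_by(rows, 3)

# Rows with a mathematics mark of at least 80 (an arbitrary condition).
high_maths = subset(rows, lambda row: row[1] >= 80)

print(by_gender)
print(high_maths)
```

In both cases the original list is left unchanged, in the same way that the original worksheet is kept when new worksheets are produced.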
Example 05:
Table 01: Example of Frequency table and Cross Tabulation table
Figure 01: Pie Chart (Graph > Pie charts) and Simple Bar Chart (Graph > Bar charts)
[Pie chart of Gender: Female 32.4%, Male 67.6%. Bar chart of Percent by Gender (Female, Male); percent within all data.]
Interpretation: According to Table 01 and Figure 01, the proportion of males in the
sample is about twice the proportion of females. Males make up about 68% of the
entire sample.
10.2 Analysis of Quantitative Variables
Table 02: Descriptive Statistics (Stat > Basic statistics > Display descriptive statistics)
Interpretation: According to Figure 02 and Table 02, the monthly sales income of the
sales representatives is approximately symmetrically distributed, with a mean of Rs. 10251.
Figure 03: Histogram (Graph > Histogram)
[Histogram of Monthly Sales Income; vertical axis: Frequency.]
[Plot of Monthly Sales Income; axis range 8000 to 12000.]
[Plot of Experience against Age.]
Correlation (Stat > Basic statistics > Correlation)
Exercise 01
1. Code the Site variable as follows and store the result in C9: 1 = Urban Area,
2 = Rural Area.
2. Split the worksheet by gender.
3. Find the mean, median and standard deviation of monthly sales income for each
gender separately.
4. Obtain frequency tables for education qualification and site. Interpret the outputs.
5. Obtain pie charts for education qualification and site.
6. Obtain dot plots for the experience and age variables.
7. Obtain a histogram for the coverage variable.
8. To examine the relationship between monthly sales income and experience, obtain a
scatter plot and the correlation coefficient.
9. Interpret the results that you obtained in part 8.
11. Linear Regression
Using the above data, fit a regression line to examine the effect of a person's age on
his/her cholesterol level.
Select Cholesterol level as the response variable and Age as the predictor variable.
The result will be as follows.
The fitted regression equation, which gives the predicted cholesterol level for a given age, is:
Cholesterol = 1.089 + 0.057 (Age)
Interpretation: If the age of the patient increases by 1, the predicted cholesterol level of the
patient increases by approximately 0.057. The predicted cholesterol level of a newborn
baby (age = 0) is approximately 1.089.
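To see how the fitted equation is used for prediction, here is a small Python sketch. It only evaluates the equation quoted above; the function name predict_cholesterol and the ages tried are illustrative choices, not part of the Minitab output.

```python
# Coefficients taken from the fitted equation in the text.
INTERCEPT = 1.089
SLOPE = 0.057


def predict_cholesterol(age):
    """Predicted cholesterol level for a given age."""
    return INTERCEPT + SLOPE * age


# Illustrative ages: 0 shows the intercept, 30 and 31 show the slope.
for age in (0, 30, 31):
    print(age, round(predict_cholesterol(age), 3))

# The difference between consecutive ages equals the slope.
slope_check = predict_cholesterol(31) - predict_cholesterol(30)
print(round(slope_check, 3))
```

The predictions for ages 30 and 31 differ by the slope, which is what the interpretation of 0.057 describes.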
12. Hypothesis testing for a Single mean (Z test)
2. Suppose from the literature we came to know that two decades ago the mean weight
of people in Galle was 50 Kg. A researcher wants to see whether the mean weight of
people in Galle has changed or not. Suppose the mean weight of a sample of 40
people investigated recently is 51 Kg with SD of 2 Kg.
Interpretation: Since the p-value is less than 0.05, reject H0. (The z table value is
1.96, and the test statistic, 3.16, is greater than 1.96.) At the 5% significance level we
can conclude that the mean weight of the people in Galle has changed.
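The test statistic quoted above can be checked by hand. The following Python sketch computes the z statistic from the summary values in the question (sample mean 51, hypothesized mean 50, SD 2, n = 40) and a two-sided p-value from the standard normal distribution. The function name one_sample_z is an illustrative choice, and the sketch assumes, as the Z test does, that the SD can be treated as known.

```python
import math


def one_sample_z(sample_mean, hypothesized_mean, sd, n):
    """Return the z statistic and the two-sided p-value."""
    standard_error = sd / math.sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Summary values from the Galle weight example.
z, p = one_sample_z(51, 50, 2, 40)
print(round(z, 2))  # 3.16, the test statistic quoted in the text
print(p < 0.05)     # True, so H0 is rejected at the 5% level
```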
3. The researcher is interested in testing whether the mean weight of people in Galle today
is greater than the value observed in the past (50Kg).
4. Suppose you are conducting an experiment to see if a given therapy works to reduce
test anxiety in a sample of nursing students. A standard measure of test anxiety in this
nursing population is known to produce a µ = 20. In the sample of 20 nursing students
(n= 20) who had undergone the therapy, the mean score of test anxiety was 18 with
SD 9.
H0: The average test anxiety of nursing students who use the therapy is not different
from 20. (i.e. H0: μ = 20)
H1: The average test anxiety of nursing students who use the therapy is lower than 20.
(i.e. H1: μ < 20)
Stat > Basic Statistics > 1-Sample t > Summarized data
Interpretation: Since the p-value is greater than 0.05, do not reject H0. (The t table
value with 19 df is -1.72, and the test statistic, -0.99, is in the acceptance region.) At the 5%
significance level there is not enough evidence to conclude that the therapy has reduced
the average anxiety of nursing students.
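The t statistic in this example can be reproduced from the summarized data (sample mean 18, hypothesized mean 20, SD 9, n = 20). The Python sketch below does the arithmetic and compares the result with the table value -1.72 quoted above; the function name one_sample_t is an illustrative choice, and the critical value is taken from the text rather than computed.

```python
import math


def one_sample_t(sample_mean, hypothesized_mean, sd, n):
    """Return the one-sample t statistic from summarized data."""
    standard_error = sd / math.sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# Summary values from the test anxiety example.
t = one_sample_t(18, 20, 9, 20)

# Lower-tail table value with 19 df at the 5% level, as quoted in the text.
T_CRITICAL = -1.72

# H1 is "mean < 20", so H0 is rejected only if t falls below the table value.
reject_h0 = t < T_CRITICAL

print(round(t, 2))  # -0.99, the test statistic quoted in the text
print(reject_h0)    # False, so H0 is not rejected
```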
5. Suppose you wish to test the effect of Prozac on the well-being of depressed
individuals, using a standardized "well-being scale" that could range from 0 to 20.
Higher scores indicate greater well-being. Before and after taking Prozac, scores
obtained for the measure of well-being on 9 subjects are given below.
Subject   Well-being Score (pre)   Well-being Score (post)
1         3                        5
2         0                        1
3         6                        5
4         7                        7
5         4                        10
6         3                        9
7         2                        7
8         1                        11
9         4                        8
Interpretation: Since the p-value is less than 0.05, reject H0. (The t table value with 8 df at
the 5% level is 2.31, and the test statistic is greater than this critical value.) At the 5%
significance level we can conclude that taking Prozac produces a positive change in the
well-being of depressed individuals.
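The paired t statistic can be worked out directly from the nine pre and post scores in the table. The Python sketch below takes the differences as post minus pre, which is an assumption about the order used in the Minitab output (the other order only changes the sign). The function name paired_t is illustrative.

```python
import math
import statistics

# Well-being scores for the 9 subjects, from the table in the text.
pre = [3, 0, 6, 7, 4, 3, 2, 1, 4]
post = [5, 1, 5, 7, 10, 9, 7, 11, 8]


def paired_t(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    # Assumption: difference = after - before for each subject.
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


t, df = paired_t(pre, post)
print(df)           # 8
print(round(t, 2))  # 3.14, which exceeds the table value 2.31
```

Working with the differences turns the paired problem into a one-sample t test on a single column, which is why the degrees of freedom are 9 - 1 = 8.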
6. Suppose a medical researcher is investigating the effectiveness of two pain killer drugs
(Drug A and Drug B). Drug A was given to 15 patients and drug B was given to 12
patients. Data are given below. Which drug is more effective in reducing pain? (t test)
H0: There is no difference in the time taken to alleviate pain in the two drugs (H0:
μA = μB)
H1: There is a difference in the time taken to alleviate pain in the two drugs (H1:
μA ≠ μB)
Stat > Basic Statistics > 2-Sample t > Samples in different columns
Interpretation: Since the p-value is less than 0.05, reject H0. (The table value with 25 df at
the 5% level is -2.06, and the test statistic is in the rejection region.) At the 5% significance
level we can conclude that there is a difference in the time taken to alleviate pain in the two drugs.
Note: Looking at the sample mean values, the time taken to alleviate pain with drug B
is higher than with drug A. So drug A is better than drug B as
a pain killer.
16. Chi-square test
7. Suppose a public health nurse had investigated 60 men and 40 women in her area
and found that 50 men and 25 women were physically less active. The data (called
observed data) can be presented in a contingency table as follows. Perform the chi-
square test to see whether there is any association between gender and physical
activity.
(The table value with 1 df at the 5% level is 3.84, and the test statistic is in the rejection
region.) At the 5% significance level we can conclude that there is an association between
gender and physical activity.
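The chi-square statistic for this example can be computed by hand from the counts in the question. In the Python sketch below, the counts of physically active men (10) and women (15) are derived from the totals (60 - 50 and 40 - 25). The sketch uses the Pearson statistic without a continuity correction, which is an assumption about what the software reports, and the function name chi_square is illustrative.

```python
# Observed counts. Rows: men, women. Columns: less active, active.
# The "active" counts are derived: 60 - 50 = 10 and 40 - 25 = 15.
observed = [
    [50, 10],
    [25, 15],
]


def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


statistic, df = chi_square(observed)
print(df)                   # 1
print(round(statistic, 2))  # 5.56, which exceeds the table value 3.84
```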
17. ANOVA
8. Suppose a nurse wishes to know whether the blood glucose levels of patients who have
undergone four (4) treatments are the same. The blood glucose levels of the sample of
patients who have undergone the four different treatment are given below. Find the best
treatment. (One-way ANOVA)
H0: µ1 = µ2 = µ3 = µ4
H1: At least one of the population means is different from the others
Stat > ANOVA > One-way (Unstacked)
Interpretation: Since the p-value is less than 0.05, reject H0. At the 5% significance level
we can conclude that at least one of the treatment means is different from the others.
Note: We can determine which population means differ from the others by doing Tukey's
comparison test. (Comparisons > Tukey)
In these results, the table shows that group A contains Treatments 1 and 4, group B
contains only Treatment 2, and group C contains only Treatment 3. Differences
between means that share a letter are not statistically significant. Treatments 2 and 3 do
not share a letter, which indicates that Treatment 3 has a significantly lower mean
than Treatment 2. Treatment 3 has the lowest mean blood glucose level.
Therefore Treatment 3 is the best of the four treatments.