Vision & Mission
To impart high quality technical and professional education in order to mould the learners into
globally competitive professionals who are professionally deft, intellectually adept and socially
responsible.
To collaborate with industries and research organizations and excel in the emerging areas of
research.
VISION OF THE PROGRAM
Timetable: RMSA on TUES, WED, THU, FRI and SAT.
2b. COURSE SYLLABUS
RESEARCH METHODOLOGY & STATISTICAL ANALYSIS
I Year I Semester
Unit – I:
Unit – IV:
Analysis of Variance-One Way and Two Way ANOVA (with and without Interaction).
Chi-Square distribution: Test for a specified Population variance, Test for Goodness of fit,
Test for Independence of Attributes. Correlation Analysis-Scatter diagram, Positive and
Negative correlation, limits for coefficient of Correlation, Karl Pearson’s coefficient of
correlation, Spearman’s Rank correlation, concept of Multiple and Partial
Correlation. Regression Analysis-Concept, least squares fit of a linear regression, two lines
of regression, Properties of regression coefficients.
Unit – V:
REFERENCES:
1. Levin R.I., Rubin S. David, “Statistics for Management”, 2015, 7th Ed., Pearson.
2. Beri, “Business Statistics”, 2015, 1st Ed., TMH.
3. Gupta S.C., “Fundamentals of Statistics”, 2015, 6th Ed., HPH.
4. Amir D. Aczel and Jayavel Sounderpandian, “Complete Business Statistics”, TMH.
5. Levine, Stephan, Krehbiel, Berenson, “Statistics for Managers Using Microsoft Excel”, PHI.
6. J. K. Sharma, “Business Statistics”, 2015, 2nd Ed., Pearson.
1. To create the students as effective professionals by solving real problems through the
use of management science knowledge and with attention to team work, effective
communication, critical thinking and problem solving skills.
2. To develop professional skills that prepares students for immediate employment and
for life-long learning in advanced areas of management and related fields.
3c.COURSE OBJECTIVES
4 Understand the concept of the coefficient of skewness, i.e. Karl Pearson’s coefficient of skewness and
Bowley’s coefficient of skewness.
1 Understand how to calculate and apply measures of location and measures of dispersion-
grouped and ungrouped data cases.
2 Learn how to apply discrete and continuous series in various business problems.
3 Perform test of hypothesis as well as calculate confidence interval for a population parameter
for single sample and two sample cases.
4 Learn non-parametric tests such as the chi-square test for independence as well as goodness of fit.
4a. COURSEPLAN
TEXT BOOK
Levin R.I., Rubin S. David, “Statistics for Management”, 2015, 7th Ed. Pearson.
Beri, “Business Statistics”, 2015, 1st Ed, TMH.
REFERENCE BOOKS
Lesson No. | Unit No. | Date | No. of Periods | Topics / Sub-Topics | Course Objectives & Course Outcomes Nos. | References (Text Book, Journal…)
I COb 1 & T1, T2
29/8/14 CO 1,3
1 1
Introduction to Statistics-Overview
21/10/14
Diagrams and graphs COb 2& T1
20 23/10/14 CO 3
1
UNIT-IV Analysis of
COb 1& T1, T2
24 6/11/14 1 Variance-One Way. CO 3,7
Regression Analysis-Concept
COb 1& T1, T2
33 21/11/14 1 CO 3,7
Date: Date:
Note: 1. ENSURE THAT ALL TOPICS SPECIFIED IN THE COURSE ARE MENTIONED.
2. ADDITIONAL TOPICS COVERED, IF ANY, MAY ALSO BE SPECIFIED IN BOLD
3. MENTION THE CORRESPONDING COURSE OBJECTIVE AND OUTCOME NUMBERS AGAINST EACH
TOPIC.
REFERENCE BOOKS
UNIT-I
UNIT-II
UNIT-III
To
2
21/10/1
4
To
2
31/10/1
4
UNIT-IV
UNIT-V
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Introduction to Statistics-Overview
Assignment / Questions:
1. Define statistics. (obj-1, out-1,3)
2. Discuss the overview of statistics. (obj-1, out-1,3)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Functions of Statistics
Assignment / Questions:
1. Explain the functions of statistics. (obj-1, out-1,3)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
1. Discuss the managerial applications of statistics. (obj-1, out-1,3)
2. What are the branches of statistics, and how does statistics relate to other subjects? (obj-1, out-1)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Measures of central Tendency-Mean, Median
Assignment / Questions:
1. Define central tendency. (obj-1, out-1)
2. Problems on mean and median. (obj-1, out-1)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
1. Problems on Mode (obj-1, out-1)
Signature of faculty
Lesson No: ……5…………………… Duration of Lesson: 45 mins
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
2. Problems on Mode (obj-1, out-1)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
Assignment / Questions:
1. Problems on range and quartile deviation. (obj-1, out-1)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
1. Problems on range and quartile deviation. (obj-1, out-1)
Signature of faculty
Lesson No: ……7…………………… Duration of Lesson: 45 mins
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Mean Deviation
Assignment / Questions:
1. Problems on mean deviation. (obj-1, out-3)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Standard deviation
Assignment / Questions:
1. Problems on Standard deviation (obj-1, out-1)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Coefficient of variation
Assignment / Questions:
1. Problems on coefficient of variation. (obj-1, out-1)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
On completion of this lesson the student shall be able to:
TEACHING POINTS :
Coefficient of variation
Assignment / Questions:
1. Problems on coefficient of variation. (obj-1, out-1)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
1. What is skewness? Solve problems on Karl Pearson’s coefficient of skewness. (obj-1, out-1)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
1. Problems on Bowley’s coefficient of skewness. (obj-1, out-1)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
1. Problems on Bowley’s coefficient of skewness. (obj-1, out-1)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
1. Solve problems on Kelly’s coefficient of skewness. (obj-1, out-1)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Kurtosis.
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Tabulation of Univariate
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Diagrammatic and graphical representation of data.
Assignment / Questions:
1. What is the purpose of diagrammatic and graphical representation of data? (obj-3, out-6)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
One dimensional
Assignment / Questions:
Signature of faculty
TEACHING POINTS :
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Two dimensional and three dimensional.
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
Signature of faculty
Lesson No: ……21…………………… Duration of Lesson: 45 mins
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
1. Solve problems on testing for one and two means. (obj-4, out-2)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
1. Solve problems on testing for one and two means. (obj-4, out-2)
Signature of faculty
Lesson No: ……23…………………… Duration of Lesson: 45 mins
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Paired t-test.
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Paired t-test.
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
Signature of faculty
Lesson No: ……25…………………… Duration of Lesson: 45 mins
INSTRUCTIONAL/LESSON OBJECTIVES:
2. Understand the difference between one-way and two-way classification.
TEACHING POINTS :
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
2. Understand the difference between one-way and two-way classification.
Assignment / Questions:
Signature of faculty
Lesson Title: ……… Chi-Square distribution: Test for a specified Population variance
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
1. What is the chi-square test? Explain the properties of the chi-square test. (obj-2, out-3)
2. Solve the problems on population variance. (obj-2, out-3)
Signature of faculty
Lesson Title: ……… Test for Goodness of fit, Test for Independence of Attributes.
INSTRUCTIONAL/LESSON OBJECTIVES:
1. Learn the importance of Test for Goodness of fit and Test for Independence of Attributes.
TEACHING POINTS :
Assignment / Questions:
Signature of faculty
Lesson Title: ……… Correlation Analysis-Scatter diagram, Positive and Negative correlation.
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
2. Understand the difference between one-way and two-way classification.
TEACHING POINTS :
Assignment / Questions:
1. Solve problems on Two Way ANOVA. (obj-2, out-3)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
On completion of this lesson the student shall be able to:
TEACHING POINTS :
Regression Analysis-Concept
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
1. Solve the problems on least squares fit of a linear regression. (obj-1, out-3,7)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
Signature of faculty
Lesson No: ……35……Duration of Lesson: 45 mins
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
Signature of faculty
Lesson Title: ……… Models of Time Series–Additive, Multiplicative and Mixed models.
INSTRUCTIONAL/LESSON OBJECTIVES:
1. Learn the importance of Models of Time Series–Additive, Multiplicative and Mixed models.
TEACHING POINTS :
Models of Time Series–Additive, Multiplicative and Mixed models.
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
1. Solve the problems on free hand curve and semi averages. (obj-1, out-3,7)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Moving averages
Assignment / Questions:
Signature of faculty
TEACHING POINTS :
Moving averages
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Least Square methods
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
Lesson Title: ……… Index numbers– introduction , Characteristics and uses of index numbers
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
1. What is an index number? Explain the characteristics of index numbers. (obj-1, out-3,7)
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
Assignment / Questions:
Signature of faculty
INSTRUCTIONAL/LESSON OBJECTIVES:
TEACHING POINTS :
Assignment / Questions:
1. Solve the problems on tests of adequacy and consumer price indexes. (obj-5, out-4)
Signature of faculty
Note: Mention for each question the relevant Objectives and Outcomes Nos.
1 Knowledge
2 Comprehension (Understanding)
3 Application (of knowledge & comprehension)
4 Analysis (of whole w.r.t. its constituents)
5 Synthesis
6 Evaluation (Judgment)
Infer Reconstruct
Summarize Reorganize
Revise
6. List Of Mappings/Matrix
Course-Objectives
COb1 2 1
COb2 1 2
COb3 2 1
COb4 2
D. Courses (with title & code)-Program Outcomes (POs) Relationship Matrix (Indicate
the relationships by mark “X”)
Data Warehousing and Data Mining (DWDM) 1 2 2 2 1 2 3
8. a.INDIRECT ASSESSMENT(RUBRIC)
TUTORIAL SHEET – 2
Please write the Questions / Problems / Exercises which you would like to give to the
students and also mention the Course Outcomes to which these Questions / Problems /
Exercises are related.
Date: Date:
ASSIGNMENT SHEET – 2
Please write the Questions / Problems / Exercises which you would like to give to the
students and also mention the Course Outcomes to which these Questions / Problems /
Exercises are related.
Date: Date:
2. UNIT – II: 12/9/14 to 7/10/14, 12 periods
3. UNIT – III: 9/10/14 to 3/11/14, 14 periods
4. UNIT – IV: 6/11/14 to 25/11/14, 14 periods
5. UNIT – V: 27/11/14 to 18/12/14, 14 periods
Total No. of Instructional periods available for the course: 65 Hours / Periods
10.EVALUATION STRATEGY
1. TARGET:
All the topics of this course are covered through PPT-based and classroom lectures, and exercises and
assignments are given to the students.
3. METHOD OF EVALUATION
3.2 √ Assignments/Seminars
Implementation 70%
Diagrams/Scripts 10%
3.4 Quiz
Writing 70%
Diagrams/Scripts 10%
3.6 √ Others
Analysis 10%
Writing 80%
Diagrams/Scripts 10%
4. List out any new topic(s) or any innovation you would like to introduce in teaching the subjects in this Semester .
…………………………………………………………………………………………………..
Date: Date:
PREREQUISITES
CORE TOPICS:
1. Managerial Applications of Statistics.
2. Measures of central tendency
3. Skewness.
4. Tabulation and classification of data.
5. t-test.
6. ANOVA, one-way and two-way.
7. Regression analysis.
8. Time series.
TOPICS COVERED BEYOND THE SYLLABUS:
1. Link relative.
2. Bivariate frequency.
TEXT BOOK
Levin R.I., Rubin S. David, “Statistics for Management”, 2015, 7th Ed. Pearson.
Beri, “Business Statistics”, 2015, 1st Ed, TMH.
REFERENCE BOOKS
Date: Date:
a) Index numbers
b) Types of correlation.
c) Compute Price Index numbers for the year 2000 with 1995 as base year using
i) Laspeyre’s Method ii) Fisher Method
PART-B
1) From the following data, find Karl Pearson’s coefficient of skewness.
Measurement: 10 11 12 13 14 15
Frequency: 2 4 10 8 5 1
3) Calculate the 4-year moving average and the least squares trend line for the production of an engineering
factory.
4) You are given the data below; find the combined mean and standard deviation.
A B
Number of items 100 150
Mean 50 40
Standard deviation 5 6
5) From the following data, obtain the two regression equations and the coefficient of correlation between
sales and purchases.
Sales 91 97 108 121 67 124 51 73 111 57
Purchase 71 75 69 97 70 91 39 61 80 47
PART-B
1. Analyse the following data by a suitable method (table value 3.07).
Batches hours
I 4 14 19 22 25 27 36
II 11 18 18 25 30 32 -
III 2 11 13 16 18 20 30
IV 0 4 5 10 22 - -
Sample 1 11 11 13 11 15 9 12 14
Sample 2 9 11 10 13 9 8 10 -
Is the difference between the sample means significant? (Table value 2.160)
i) Do you drink
ii) Are you in favour of local action on sale of liquor?
iii) Can you infer that opinion on local actions is dependent on whether or not an individual
drinks?
Question (a)
Yes 56 31
No 18 6
http://itfeature.com/time-series-analysis-and-forecasting/component-of-time-series-data
http://www.statisticshowto.com/anova/
http://study.com/academy/lesson/application-of-statistics-in-business.html
http://stat420.blogspot.in/2009/07/definition-of-statistics-scope-and.html
19. Material
Introduction:
Statistics is the study of numerical information, which is called data. People use statistics as tools to
understand information. Learning to understand statistics helps a person react intelligently to statistical
claims. Statistics are used in the fields of business, math, economics, accounting, banking, government,
astronomy, and the natural and social sciences.
Businesses use statistics to find customer preferences, check the quality of products, market products,
estimate costs and make decisions. Statistics help make mathematical theories more accurate. Economists
use statistics to find the relationship between supply and demand, the relationship between imports and
exports, the inflation rate, the per capita income, and the national income rate. Accountants use statistics
to see how well a company is doing, discover trends and create projections for the next year. Bankers use
statistics to estimate the number of people depositing money versus the number requesting loans. The
government uses statistics federally and locally to make budgets, set the minimum wage and discover the
cost of living. Astronomers use statistics to estimate the distance between objects in space and the timing
of interstellar events. People working in the natural and social sciences use statistics to form theories on
what they're studying, whether it's estimating large populations, predicting the weather or studying the
human brain.
Definition of Statistics:
1. Statistics can be defined as the collection, presentation and interpretation of
numerical data. - Croxton and Cowden.
1. Statistics and planning: Statistics is indispensable to planning in the modern age,
which is termed “the age of planning”. Almost all over the world, governments are
resorting to planning for economic development.
6. Statistics and modern science: In medical science, the statistical tools for
collection, presentation and analysis of observed facts relating to the causes and
incidence of diseases, and the results of applying various drugs and medicines, are of
great importance.
2. Production: In the field of production, statistical data and methods play a very
important role. The decisions about what to produce, how to produce, when to
produce and for whom to produce are based largely on statistical analysis.
D USES OF STATISTICS
One of the ways to appreciate what is being studied is to know its origin and how it developed and helped mankind. Statistics arose from the
need of states to collect data on their people and economies, in order to administer them. Its meaning broadened in the early 19th century to
include the collection and analysis of data in general. Today statistics is widely employed in government, business, and the natural and social
sciences.
Statistics as a field of knowledge has proved to be a very powerful tool in almost all fields of work. It can be found in the sciences,
in business and economics, in education, politics, psychology and in research.
In social sciences, particularly in the field of education and psychology, statistics is an indispensable tool.
In education, the school administration and staff, the faculty and the students use statistics as a tool in performance of their
respective functions and responsibilities.
In psychology, statistical methods are used to analyze and interpret data on intelligence test scores, aptitude tests, personality test
ratings, entrance examinations. etc. for the better understanding of the individual tested, and for better administration and
management.
In government, statistics provides the various government agencies organized and systematic records of data needed in the
formulation of national policies for the betterment of the nations' well being.
In research, many scientific investigations produce large quantities of raw data that are overwhelming and difficult to interpret; hence,
statistical tools are employed.
Thus today, STATISTICS can be defined as the art and science that deals with the collection, organization, creative presentation, analysis
and interpretation of data. Before becoming a science in its modern sense, statistics had a
long history of development. Numerical data relating to particular events were already being used in antiquity.
The history of statistics can be traced back at least to the biblical times in ancient Egypt, Babylon, and Rome.
EGYPT (3,500 BC)
Used statistics by
MODERN TIMES
Statistical methods have been used to record and predict such things as:
Limitations of statistics:
A measure of central tendency is a single value that attempts to describe a set of data
by identifying the central position within that set of data. As such, measures of
central tendency are sometimes called measures of central location. They are also
classed as summary statistics. The mean (often called the average) is most likely the
measure of central tendency that you are most familiar with, but there are others, such
as the median and the mode.
The mean, median and mode are all valid measures of central tendency, but under
different conditions, some measures of central tendency become more appropriate to
use than others. In the following sections, we will look at the mean, mode and
median, and learn how to calculate them and under what conditions they are most
appropriate to be used.
Mean (Arithmetic)
The mean (or average) is the most popular and well known measure of central
tendency. It can be used with both discrete and continuous data, although its use is
most often with continuous data (see our Types of Variable guide for data types). The
mean is equal to the sum of all the values in the data set divided by the number of
values in the data set. So, if we have n values in a data set and they have values x_1, x_2,
..., x_n, the sample mean, usually denoted by \bar{x} (pronounced x bar), is:
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
This formula is usually written in a slightly different manner using the Greek capital
letter \Sigma, pronounced "sigma", which means "sum of...":
\[ \bar{x} = \frac{\sum x}{n} \]
You may have noticed that the above formula refers to the sample mean. So, why
have we called it a sample mean? This is because, in statistics, samples and
populations have very different meanings and these differences are very important,
even if, in the case of the mean, they are calculated in the same way. To acknowledge
that we are calculating the population mean and not the sample mean, we use the
Greek lower case letter "mu", denoted as µ:
\[ \mu = \frac{\sum x}{n} \]
The mean is essentially a model of your data set: a single value that stands in for all the observations.
You will notice, however, that the mean is not often one of the actual values that you
have observed in your data set. However, one of its important properties is that it
minimises error in the prediction of any one value in your data set. That is, it is the
value that produces the lowest total squared error when it is used to predict every value in the data set.
An important property of the mean is that it includes every value in your data set as
part of the calculation. In addition, the mean is the only measure of central tendency
where the sum of the deviations of each value from the mean is always zero.
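The definition of the mean and the zero-sum property of its deviations can be checked numerically. The following is a minimal sketch using only Python's standard library; the list `marks` is an illustrative data set and the variable names are assumptions, not part of the course material.

```python
from statistics import mean

# Illustrative marks (assumed data, chosen only for the demonstration).
marks = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

x_bar = mean(marks)                      # sum of the values divided by their count
deviations = [x - x_bar for x in marks]  # signed distance of each value from the mean

print("mean:", x_bar)
print("sum of deviations:", sum(deviations))  # positive and negative deviations cancel
```

Note that the computed mean need not coincide with any of the observed values, which is the point made in the text above.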
Median
The median is the middle score for a set of data that has been arranged in order of
magnitude. The median is less affected by outliers and skewed data. In order to
calculate the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark - in this case, 56 (the sixth value in the ordered list). It is the
middle mark because there are 5 scores before it and 5 scores after it. This works fine
when you have an odd number of scores, but what happens when you have an even
number of scores? What if you had only 10 scores? Well, you simply have to take the
middle two scores and average the result. So, if we look at the example below:
65 55 89 56 35 14 56 55 87 45
14 35 45 55 55 56 56 65 87 89
Only now we have to take the 5th and 6th score in our data set and average them to
get a median of 55.5.
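The odd and even cases described above can be reproduced with a short sketch. It uses `statistics.median` from Python's standard library, which sorts the data and averages the two middle values when the count is even; the variable names are illustrative assumptions.

```python
from statistics import median

# The eleven marks from the example (odd number of scores).
odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
# The ten marks from the second example (even number of scores).
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]

median_odd = median(odd_scores)    # middle value of the sorted list
median_even = median(even_scores)  # average of the 5th and 6th sorted values

print(median_odd)   # 56
print(median_even)  # 55.5
```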
Mode
The mode is the most frequent score in our data set. On a bar chart or histogram it is
represented by the highest bar. You can, therefore, sometimes consider the
mode as being the most popular option. An example of a mode is presented below:
Normally, the mode is used for categorical data where we wish to know which is the
most common category, as illustrated below:
We can see above that the most common form of transport, in this particular data set,
is the bus. However, one of the problems with the mode is that it is not unique, so it
leaves us with problems when we have two or more values that share the highest
frequency, such as below:
We are now stuck as to which mode best describes the central tendency of the data.
This is particularly problematic when we have continuous data because we are more
likely not to have any one value that is more frequent than the other. For example,
consider measuring the weight of 30 people (to the nearest 0.1 kg). How likely is it that we
will find two or more people with exactly the same weight (e.g., 67.4 kg)? The
answer is probably very unlikely - many people might be close, but with such a
small sample (30 people) and a large range of possible weights, you are unlikely to
find two people with exactly the same weight; that is, to the nearest 0.1 kg. This is
why the mode is very rarely used with continuous data.
Another problem with the mode is that it will not provide us with a very good
measure of central tendency when the most common mark is far away from the rest
of the data in the data set, as depicted in the diagram below:
In the above diagram the mode has a value of 2. We can clearly see, however, that the
mode is not representative of the data, which is mostly concentrated around the 20 to
30 value range. To use the mode to describe the central tendency of this data set
would be misleading.
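The non-uniqueness of the mode can be illustrated with a small sketch. The survey answers below are hypothetical (they are not the data behind the missing diagrams), and `statistics.multimode` requires Python 3.8 or later.

```python
from statistics import mode, multimode

# Hypothetical answers to "How do you travel to work?" (illustrative only).
transport = ["bus", "car", "bus", "train", "car", "bus"]
# A second hypothetical sample in which two categories tie for the top frequency.
tied = ["bus", "car", "car", "bus", "train"]

single_mode = mode(transport)  # the single most frequent category
all_modes = multimode(tied)    # every category sharing the highest frequency

print(single_mode)
print(all_modes)
```

When `multimode` returns more than one value, no single mode describes the central tendency of the data, which is the difficulty discussed above.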
Now we can iterate this operation with g_1 taking the place of x and h_1 taking the place
of y. In this way, two sequences (g_n) and (h_n) are defined:
\[ g_{n+1} = \sqrt{g_n h_n}, \qquad h_{n+1} = \frac{2}{\frac{1}{g_n} + \frac{1}{h_n}} \]
Both of these sequences converge to the same number, which we call the geometric–
harmonic mean M(x, y) of x and y. The geometric–harmonic mean is also designated
as the harmonic–geometric mean. (cf. Wolfram Math World below.)
The existence of the limit can be proved by the means of Bolzano–Weierstrass theorem in a
manner almost identical to the proof of existence of arithmetic–geometric mean.
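The iteration can be sketched in a few lines. This is a minimal illustration, assuming that g_1 is the geometric mean and h_1 the harmonic mean of two positive numbers x and y, and that each step replaces the pair by its geometric and harmonic means; the function name, tolerance and iteration cap are hypothetical choices.

```python
import math

def geometric_harmonic_mean(x, y, tol=1e-12, max_iter=100):
    """Approximate the common limit of the sequences (g_n) and (h_n).

    Assumes x > 0 and y > 0.
    """
    g = math.sqrt(x * y)       # g_1: geometric mean of x and y
    h = 2 * x * y / (x + y)    # h_1: harmonic mean of x and y
    for _ in range(max_iter):
        if abs(g - h) <= tol * max(g, h):
            break              # the two sequences have met
        # Both updates use the previous pair (g_n, h_n).
        g, h = math.sqrt(g * h), 2 * g * h / (g + h)
    return g

m = geometric_harmonic_mean(1, 4)
print(m)  # lies between the harmonic mean 1.6 and the geometric mean 2.0
```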
UNIT II: MEASURES OF DISPERSION:
Measures of average such as the median and mean represent the typical value for a
dataset. Within the dataset the actual values usually differ from one another and from
the average value itself. The extent to which the median and mean are good
representatives of the values in the original dataset depends upon the variability or
dispersion in the original data. Datasets are said to have high dispersion when they
contain values considerably higher and lower than the mean value.
In figure 1 the number of different sized tutorial groups in semester 1 and semester 2
are presented. In both semesters the mean and median tutorial group size is 5
students, however the groups in semester 2 show more dispersion (or variability in
size) than those in semester 1.
Range
The simplest method of studying the variation in the distribution is the range. The
range is defined as the difference between the largest item and the smallest item in
the set of observations. So, in a set of observations if L is the largest item and S is the
smallest item, then range is given by
Range = L – S
In a grouped frequency distribution, range is the difference between the upper limit of
the largest class and lower limit of the smallest class.
The range is an absolute measure of dispersion. It cannot be used to compare two
distributions with different units. So, the relative measure corresponding to the range,
known as the coefficient of range, is defined by
\[ \text{Coefficient of range} = \frac{L - S}{L + S} \]
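As a quick worked illustration of Range = L - S and the coefficient of range (L - S)/(L + S), the sketch below uses an assumed list of marks; the data and variable names are illustrative only.

```python
# Illustrative ungrouped observations (assumed data).
values = [14, 35, 45, 55, 55, 56, 56, 65, 87, 89]

largest = max(values)    # L
smallest = min(values)   # S

data_range = largest - smallest                        # absolute measure, in the units of the data
coeff_of_range = data_range / (largest + smallest)     # relative measure, unit-free

print(data_range)
print(round(coeff_of_range, 3))
```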
Merits:
It is rigidly defined.
Range is simple to understand and easy to calculate.
Only minimum time is required to know the variability with the help of range.
Demerits
The measure of dispersion depending upon the lower and upper quartiles is known as
the quartile deviation. The difference between the upper and lower quartile is known
as the Interquartile range. Half the interquartile range is known as Semi-interquartile
range or quartile deviation.
\[ \text{Quartile deviation} = \frac{Q_3 - Q_1}{2} \]
The relative measure based on the lower and upper quartiles, known as the coefficient of
quartile deviation, is given by
\[ \text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \]
The variability of the items is smaller or greater according to whether the
quartile deviation is smaller or greater. If the quartile deviation is small, then the
variability is low and the uniformity is high. In the same way, if the quartile deviation
is large, then the variability is high and the uniformity is low.
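The quartile deviation and its coefficient can be computed as in the sketch below. It relies on `statistics.quantiles` (Python 3.8+) with its default "exclusive" method, which places Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4 of the sorted data; textbooks differ on the quartile convention, so this is one assumption among several possible ones. The data are illustrative.

```python
from statistics import quantiles

# Illustrative marks (assumed data).
marks = [14, 35, 45, 55, 55, 56, 56, 65, 87, 89, 92]

# Three cut points dividing the data into four equal parts: Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(marks, n=4)

interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2   # semi-interquartile range
coeff_of_qd = (q3 - q1) / (q3 + q1)            # relative, unit-free measure

print(q1, q3)
print(quartile_deviation)
print(round(coeff_of_qd, 3))
```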
Merits
It is rigidly defined.
It is simple to understand and easy to calculate.
It is the better measure of dispersion in comparison to range as it is based on
50% of central items.
It is not affected by extreme values.
It can be calculated even when end classes are open.
Demerits
Mean deviation is defined as the arithmetic mean of the deviations of the items from the
mean, median or mode when all deviations are considered positive.
If \bar{x}, M_d and M_o are the arithmetic mean, median and mode of the set of variate
values x, then the mean deviation (M.D.) is computed by the following formulae:
\[ \text{M.D. from mean} = \frac{\sum |x - \bar{x}|}{n} = \frac{\sum |d|}{n} \]
Also, for a frequency distribution,
\[ \text{M.D. from mean} = \frac{\sum f\,|x - \bar{x}|}{N} = \frac{\sum f\,|d|}{N} \]
where |d| = |x - \bar{x}| is read as the modulus of d or of x - \bar{x}, n = number of observations and N = total
frequency.
M.D. from median and mode can similarly be obtained by replacing the mean by M_d and
M_o respectively.
\[ \text{Coeff. of M.D. from mean} = \frac{\text{M.D. from mean}}{\text{mean}} \]
\[ \text{Coeff. of M.D. from median} = \frac{\text{M.D. from median}}{\text{median}} \]
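The mean deviation formulae can be sketched as a single helper that handles both the ungrouped case (every frequency equal to 1) and the frequency-weighted case. The function name `mean_deviation` and the data are hypothetical, chosen only to illustrate the arithmetic.

```python
from statistics import mean, median

def mean_deviation(values, centre, freqs=None):
    """Average absolute deviation of the values from a chosen centre.

    With freqs=None each value counts once (sum|x - centre| / n);
    otherwise each value is weighted by its frequency (sum f|x - centre| / N).
    """
    if freqs is None:
        freqs = [1] * len(values)
    total = sum(freqs)  # n for ungrouped data, N for a frequency distribution
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / total

# Illustrative ungrouped data (assumed).
data = [2, 4, 6, 8, 10]

md_mean = mean_deviation(data, mean(data))        # M.D. from mean
md_median = mean_deviation(data, median(data))    # M.D. from median
coeff_md_mean = md_mean / mean(data)              # coefficient of M.D. from mean

print(md_mean, md_median, round(coeff_md_mean, 2))
```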
STANDARD DEVIATION:
The standard deviation is a measure that summarises the amount by which every
value within a dataset varies from the mean. Effectively it indicates how tightly the
values in the dataset are bunched around the mean value. It is the most
widely used measure of dispersion since, unlike the range and inter-quartile range, it
takes into account every value in the dataset. When the values in a dataset are
pretty tightly bunched together the standard deviation is small. When the values are
spread apart the standard deviation will be relatively large. The standard deviation is
usually presented in conjunction with the mean and is measured in the same units.
In many datasets the values deviate from the mean value due to chance and such
datasets are said to display a normal distribution. In a dataset with a normal
distribution most of the values are clustered around the mean while relatively few
values tend to be extremely high or extremely low. Many natural phenomena display
a normal distribution.
For datasets that have a normal distribution the standard deviation can be used to
determine the proportion of values that lie within a particular range of the mean
value. For such distributions approximately 68% of values are less than one
standard deviation (1SD) away from the mean value, approximately 95% of values are
less than two standard deviations (2SD) away from the mean and approximately 99.7%
of values are less than three standard deviations (3SD) away from the mean. Figure 3
shows this concept in diagrammatical form.
If the mean of a dataset is 25 and its standard deviation is 1.6, then
1. about 68% of the values in the dataset will lie between MEAN-1SD (25-1.6=23.4)
and MEAN+1SD (25+1.6=26.6)
2. about 99.7% of the values will lie between MEAN-3SD (25-4.8=20.2) and
MEAN+3SD (25+4.8=29.8).
If the dataset had the same mean of 25 but a larger standard deviation (for example,
2.3) it would indicate that the values were more dispersed. The frequency distribution
for a dispersed dataset would still show a normal distribution but when plotted on a
graph the shape of the curve will be flatter as in figure 4.
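As a rough check of these proportions, the sketch below uses Python's standard library to compute a standard deviation and the share of a normal distribution lying within 1, 2 and 3 standard deviations of the mean. The small dataset is hypothetical, and the use of the population formula (pstdev) is an assumption, since the notes do not say which divisor is intended.

```python
from statistics import NormalDist, pstdev

# Hypothetical dataset; pstdev divides by n (population standard deviation).
data = [2, 4, 4, 4, 5, 5, 7, 9]
sd_data = pstdev(data)

# Worked example from the text: mean 25, standard deviation 1.6.
mean_value, sd = 25, 1.6
dist = NormalDist(mean_value, sd)

intervals = {}
coverage = {}
for k in (1, 2, 3):
    low, high = mean_value - k * sd, mean_value + k * sd
    intervals[k] = (low, high)
    # Share of a normal distribution lying between low and high.
    coverage[k] = dist.cdf(high) - dist.cdf(low)
    print(f"within {k}SD: {low:.1f} to {high:.1f} covers {coverage[k]:.1%}")
```

The printed shares are close to 68%, 95% and 99.7%, and they do not change when the standard deviation is increased to 2.3; only the intervals become wider.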
CO-EFFICIENT OF VARIATION:
The coefficient of variation (CV) is the ratio of the standard deviation to the
mean. The higher the coefficient of variation, the greater the level of dispersion
around the mean. It is generally expressed as a percentage. Being free of units, it
allows comparison between distributions of values whose scales of measurement are
not comparable.
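A minimal sketch of this comparison follows. The two datasets are hypothetical, and the population standard deviation is assumed because the notes do not specify the divisor.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """CV as a percentage: 100 * standard deviation / mean."""
    return 100 * pstdev(values) / mean(values)

# Hypothetical measurements on two scales that are not directly comparable.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV of heights: {cv_heights:.2f}%")
print(f"CV of weights: {cv_weights:.2f}%")
```

Here the weights show the larger CV, so they are relatively more dispersed around their mean even though the units differ.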
Coefficient of Skewness (quartile-based, Bowley's) = (Q3 + Q1 − 2 Median) / (Q3 − Q1)
Coefficient of skewness lies within the limit ± 1. This method is
quite convenient for determining skewness where one has already
calculated quartiles.
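The quartile-based measure described here is usually known as Bowley's coefficient of skewness, (Q3 + Q1 − 2 Median) / (Q3 − Q1). The sketch below assumes that this is the formula intended and uses hypothetical data. Note that statistics.quantiles uses one particular quartile convention (its default "exclusive" method), so hand calculations with a different convention may give slightly different quartiles.

```python
from statistics import quantiles

def bowley_skewness(values):
    """Quartile-based skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    q1, q2, q3 = quantiles(values, n=4)   # Q1, median (Q2), Q3
    return (q3 + q1 - 2 * q2) / (q3 - q1)

symmetric = [1, 2, 3, 4, 5, 6, 7]         # hypothetical, symmetric about 4
right_skewed = [1, 2, 2, 3, 4, 7, 12]     # hypothetical, long right tail

sk_symmetric = bowley_skewness(symmetric)
sk_right = bowley_skewness(right_skewed)

print(sk_symmetric, sk_right)
```

A value of 0 indicates symmetry between the quartiles, while a positive value indicates that the upper quartile lies further from the median than the lower one.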
Univariate, bivariate and multivariate are the various types of data that are based on
the number of variables. A variable is a characteristic that is measured on each of
the objects under consideration as a sample in an experiment. Usually there are three
types of data sets. These are:
Univariate Data:
Univariate data is used for the simplest form of analysis. It is the type of data in
which the analysis is based on only one variable. For example, there are sixty
students in class VII. If the variable is the marks obtained in mathematics, then the
analysis will be based on the number of students falling into defined categories
of marks.
Bivariate Data:
Bivariate data is used for slightly more complex analysis than univariate
data. Bivariate data is the data in which the analysis is based on two variables per
observation simultaneously.
Multivariate Data:
Multivariate data is the data in which the analysis is based on more than two variables
per observation. Usually multivariate data is used for explanatory purposes.
The collected data, after classification, are recorded in rows and columns to give
them tabular form. Tabular presentation of data, more conveniently known as
tabulation, may be defined as “the orderly or systematic presentation of numerical
data in rows and columns designed to clarify the problem under consideration and to
facilitate the comparison between the figures”.
ADVANTAGES OF TABULATION:
PARTS OF TABLE:
A table consists of the following parts. These may be considered as the essentials of a
satisfactory table.
(i) Table Number: Every table should be identified by a number. It facilitates easy
reference. The table number may be given at the beginning of the title of the table, or
can be centered above the title of the table.
(ii) Title: A table must have a title which is to be written either below the table
number or after the table number in the same line. The title should convey the full
description of the contents in the table.
(iii) Stub: The extreme left hand column of the table which contains the headings of
the rows is called stub.
(iv) Caption: Caption is the headings for the columns. It is the upper part of the table.
There may be sub-heads or sub-captions in each caption.
(v) Body: It is the main part of the table containing the numerical figures.
(vi) Totals: The totals and sub-totals of all the rows and columns should be given in
the table.
(vii) Footnote: Any explanatory note concerning the table itself, written directly
beneath the table, is called ‘footnote’. The purpose of a footnote is to clarify some of
the specific items given in the table.
(viii) Source: The source or sources of the data embodied in the table should be
mentioned beneath the table if data are collected from secondary sources. It is given
below the footnote.
Usually data are presented with the help of the following diagrams:
(a) Bar Diagram (Chart)
(b) Rectangular and Square Diagrams (Chart)
(c) Circular and Pie Diagrams (or Pie Chart)
We shall limit our discussions to various types of bar diagrams as well as pie diagrams
only, as these are the most commonly used diagrams.
(a) Bar Diagram: In a bar diagram only the length is considered; the breadth may be
of any finite magnitude.
The bar diagrams are divided into the following three categories:
(i) Simple Bar Diagram
(ii) Multiple Bar Diagram
(iii) Sub-Divided or Compound Bar Diagram
(i) Simple Bar Diagram: Only one type of data is presented with the help of a simple
bar diagram. For example, the volume of production of rice in Assam during the last
five years can be presented with the help of a simple bar diagram. To draw a
simple bar diagram, a bar is drawn for each datum. All the bars are on the same
general base. The heights of the bars are proportional to the magnitudes of the data.
The breadth of each bar must be the same and the gaps between the bars must be uniform.
Generally the gap between two consecutive bars should not be less than half the
breadth of a bar. The bars are drawn either on a common horizontal or on a
common vertical base. Data can be easily compared with the help of the heights of
the bars.
Graphs enable us to study the cause and effect relationship between two variables.
Graphs help to measure the extent of change in one variable when another variable
changes by a certain amount.
Graphs also enable us to study both time series and frequency distributions as they
give a clear account and a precise picture of the problem. Graphs are also easy to
understand and eye catching.
One-dimensional diagrams:
In these diagrams only the height is used and the width is not considered. These diagrams are
2. Simple Diagram
Line Diagram:
A line diagram is prepared by drawing a vertical line for each item according to the
scale. The distance between lines is kept uniform.
A simple bar diagram can be drawn either on a horizontal or a vertical base, but
bars on a horizontal base are more common. Bars must be of uniform width and the
intervening space between bars must be equal. While constructing a simple bar
diagram, the scale is determined on the basis of the highest value in the series.
To make the diagram attractive, the bars can be coloured.
This is another form of component bar diagram. Here the components are not
the actual values but percentages of the whole. The main difference between
the sub-divided bar diagram and percentage bar diagram is that in the former
the bars are of different heights since their totals may be different whereas in
the latter the bars are of equal height since each bar represents 100 percent.
In the case of data having sub-division, percentage bar diagram will be more
appealing than sub-divided bar diagram.
Two-dimensional Diagrams:
1.Rectangles
2. Squares
3. Pie-diagrams
Rectangles:
Squares:
Three-dimensional diagrams:
Pictograms are not abstract presentation such as lines or bars but really
depict the kind of data we are dealing with. Pictures are attractive and easy to
comprehend and as such this method is
Graphs:
1. Histogram
2. Frequency Polygon
3. Frequency Curve
4. Ogive
5. Lorenz Curve
In the case of the mean, the sampling distribution was normal because the variable
was distributed normally in the population or because the Central
Limit Theorem ensured normality for large samples. In the case of proportions, the
normal distribution was used as an approximation for the binomial distribution in
large samples. When samples are small, n<30, when the population is normally
distributed, and when the population variance has to be estimated from sample data,
the standardised sample mean no longer follows the normal distribution. A small
sample distribution, known as the t-distribution, has to be used in this case. When
samples are small and the distribution of the variable in the population is not
normal, there is no readily available sampling distribution. When dealing with
proportions coming from small samples, it is necessary to use the exact binomial
distribution.
6.1 The t-distribution
Assume that the variable X is distributed normally in the population with
mean μ and variance σ², i.e. X~N(μ,σ²). If σ² is known, then the sample
mean is normally distributed, and we have no problem. However, in almost all cases
we do not in fact know the population variance, σ², and must estimate it. We have
seen that the estimator s² = Σ(x − x̄)²/(n−1) is an unbiased estimator of σ².
Replacing σ by s when standardising the sample mean gives the statistic
t = (x̄ − μ)/(s/√n).
This does not have a normal distribution. It can be shown that this statistic, the
t-statistic, has the t-distribution with n-1 degrees of freedom. For large n, the
t-distribution resembles the standard normal distribution, but we are interested here
in small samples. The formula for the t-distribution is quite complicated, and depends
on the number of degrees of freedom. However, it is symmetric about 0, so the same
useful shortcuts, such as P(t>a)=P(t<-a), can be used as for the standard normal. It
can be shown that E(t)=0 and, for k>2, Var(t)=k/(k-2), where k is the number of
degrees of freedom, so in this case Var(t)=(n-1)/(n-3). Tables of the cumulative
t-distribution for different numbers of degrees of freedom are available. There is
also a t-distribution function in Excel: for x>0 and k degrees of freedom, the
function TDIST(x,k,1) will return P(t>x), while the function TDIST(x,k,2) will return
the 2-tailed probability, P(t>x OR t<-x). There is also a function TINV(p,k), which
will return the critical value xC for a 2-tailed t-distribution with k degrees of
freedom, such that P(|t|>xC)=p. The distribution of X in the population has to be
normal for the t-statistic to follow the t-distribution exactly, but small deviations
from normality in the population will not invalidate it.
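For readers without Excel, the one-tailed and two-tailed probabilities can be approximated with the Python standard library alone. The sketch below is one possible approach, not the method used by Excel: it integrates the standard t density numerically with Simpson's rule. The function names, the number of integration steps and the choice of k = 10 are assumptions made for illustration.

```python
import math

def t_pdf(t, k):
    """Density of Student's t-distribution with k degrees of freedom."""
    coef = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return coef * (1 + t * t / k) ** (-(k + 1) / 2)

def t_upper_tail(x, k, steps=2000):
    """P(t > x) for x >= 0: 0.5 minus the Simpson's-rule integral over [0, x]."""
    if x < 0:
        raise ValueError("x must be non-negative; use symmetry for negative x")
    h = x / steps                          # steps must be even for Simpson's rule
    total = t_pdf(0, k) + t_pdf(x, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 - total * h / 3

k = 10                                     # degrees of freedom (illustrative)
one_tail = t_upper_tail(2.228, k)          # P(t > 2.228)
two_tail = 2 * one_tail                    # P(t > 2.228 OR t < -2.228), by symmetry
variance = k / (k - 2)                     # Var(t) = k/(k-2), valid for k > 2

print(round(one_tail, 4), round(two_tail, 4), variance)
```

The value 2.228 is the tabulated 5% two-tailed critical value for 10 degrees of freedom, so the two-tailed probability comes out close to 0.05.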
In statistics, t-tests are a type of hypothesis test that allows you to compare means.
They are called t-tests because each t-test boils your sample data down to one
number, the t-value. If you understand how t-tests calculate t-values, you’re well on
your way to understanding how these tests work.
Minitab statistical software offers the 1-sample t-test, paired t-test, and the
2-sample t-test. Let's look at how each of these t-tests reduces your sample data
down to the t-value.
ONE MEAN T-TEST:
The paired sample t-test, sometimes called the dependent sample t-test, is a statistical
procedure used to determine whether the mean difference between two sets of
observations is zero. In a paired sample t-test, each subject or entity is measured
twice, resulting in pairs of observations. Common applications of the paired sample
t-test include case-control studies or repeated-measures designs. Suppose you are
interested in evaluating the effectiveness of a company training program. One
approach you might consider would be to measure the performance of a sample of
employees before and after completing the program, and analyze the differences
using a paired sample t-test.
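The training-program example can be sketched as follows. The before and after scores are hypothetical, and the calculation uses the usual paired t-statistic, t = d̄ / (s_d / √n), where d̄ is the mean of the differences and s_d their sample standard deviation.

```python
import math
from statistics import mean, stdev

# Hypothetical performance scores for five employees (illustration only).
before = [72, 65, 80, 70, 68]
after = [75, 70, 82, 74, 69]

# One difference per employee: each subject is measured twice.
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
d_bar = mean(diffs)                        # mean difference
s_d = stdev(diffs)                         # sample SD of the differences (n - 1)
t_value = d_bar / (s_d / math.sqrt(n))     # paired t-statistic
df = n - 1                                 # degrees of freedom

print(f"mean difference = {d_bar}, t = {t_value:.4f}, df = {df}")
```

The t-value is then compared with the critical value of the t-distribution with n-1 degrees of freedom to decide whether the mean difference is significantly different from zero.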
ASSUMPTIONS:
The one sample t-test is a statistical procedure used to determine whether a sample of
observations could have been generated by a process with a specific mean. Suppose
you are interested in determining whether an assembly line produces laptop
computers that weigh five pounds. To test this hypothesis, you could collect a sample
of laptop computers from the assembly line, measure their weights, and compare the
sample with a value of five using a one-sample t-test.
As a parametric procedure (a procedure which estimates unknown parameters), the
one sample t-test makes several assumptions. Although t-tests are quite robust, it is
good practice to evaluate the degree of deviation from these assumptions in order to
assess the quality of the results. The one sample t-test has four main assumptions:
• The dependent variable should not contain any outliers.
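The laptop example can be sketched with the usual one-sample statistic t = (x̄ − μ0) / (s / √n). The weights below are hypothetical, the function name is an assumption, and 2.776 is the tabulated 5% two-tailed critical value for 4 degrees of freedom.

```python
import math
from statistics import mean, stdev

def one_sample_t(values, mu0):
    """Return (t, df) for testing whether the population mean equals mu0."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)   # s / sqrt(n)
    return (mean(values) - mu0) / standard_error, n - 1

# Hypothetical laptop weights in pounds (illustration only).
weights = [5.1, 4.9, 5.2, 5.0, 5.3]

t_value, df = one_sample_t(weights, mu0=5.0)
critical = 2.776          # tabulated two-tailed 5% critical value for df = 4
reject_null = abs(t_value) > critical

print(f"t = {t_value:.4f}, df = {df}, reject H0: {reject_null}")
```

With these figures the t-value is smaller than the critical value, so the sample gives no evidence that the mean weight differs from five pounds.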
The two-sample t-test (Snedecor and Cochran, 1989) is used to determine if two
population means are equal. A common application is to test if a new process or
treatment is superior to a current process or treatment.
1. The data may either be paired or not paired. By paired, we mean that there is
a one-to-one correspondence between the values in the two samples. That is,
if X1, X2, ..., Xn and Y1, Y2, ... , Yn are the two samples, then Xi corresponds to Yi.
For paired samples, the difference Xi - Yi is usually calculated. For unpaired
samples, the sample sizes for the two samples may or may not be equal. The
formulas for paired data are somewhat simpler than the formulas for
unpaired data.
2. The variances of the two samples may be assumed to be equal or unequal.
Equal variances yield somewhat simpler formulas, although with computers
this is no longer a significant issue.
3. In some applications, you may want to adopt a new process or treatment only
if it exceeds the current treatment by some threshold. In this case, we can
state the null hypothesis in the form that the difference between the two
population means is equal to some constant, μ1−μ2=d0, where the
constant is the desired threshold.
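The unpaired case with the equal-variance and unequal-variance options and the threshold d0 can be sketched as below. The function name, its parameters and the data are assumptions for illustration. The equal-variance branch uses the pooled variance, and the unequal-variance branch uses the Welch form with Satterthwaite's approximate degrees of freedom.

```python
import math
from statistics import mean, variance

def two_sample_t(x, y, d0=0.0, equal_var=True):
    """Return (t, df) for H0: mu1 - mu2 = d0 with unpaired samples x and y."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)          # sample variances (n - 1)
    diff = mean(x) - mean(y) - d0
    if equal_var:
        # Pooled variance: weighted average of the two sample variances.
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch form with Satterthwaite's approximate degrees of freedom.
        se = math.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return diff / se, df

# Hypothetical yields from a new and a current process (illustration only).
new_process = [12, 14, 16, 18, 20]
current_process = [10, 12, 14, 16, 18]

t_pooled, df_pooled = two_sample_t(new_process, current_process)
t_welch, df_welch = two_sample_t(new_process, current_process, equal_var=False)
t_threshold, _ = two_sample_t(new_process, current_process, d0=2.0)

print(t_pooled, df_pooled, t_welch, df_welch, t_threshold)
```

Because the two hypothetical samples have the same size and the same variance, both forms give the same t-value here; with unequal variances or sample sizes they generally differ.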
Data
The response variable must be numeric. You can enter the sample data from
each population into separate columns of your worksheet (unstacked case),
or you can stack the response data in one column with another column of
level values identifying the population (stacked case). In the stacked case,
the factor level column can be numeric, text, or date/time. If you wish to
change the order in which text levels are processed from their default
alphabetical order, you can define your own order. See Ordering Text
Categories in the Manipulating Data chapter of MINITAB User’s Guide 1. You
do not need to have the same number of observations in each level. You can
use Calc > Make Patterned Data to enter repeated factor levels.
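The difference between the unstacked and stacked layouts, and the one-way F ratio computed from stacked data, can be sketched in Python. This is not Minitab's implementation; the data, the level names and the function name are hypothetical, and the group sizes are deliberately unequal.

```python
from statistics import mean

# Unstacked case: one "column" of responses per population (hypothetical data).
unstacked = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [8, 9, 10, 11]}

# Stacked case: one response column plus a column of factor levels.
stacked = [(level, y) for level, ys in unstacked.items() for y in ys]

def one_way_f(stacked_rows):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    groups = {}
    for level, y in stacked_rows:
        groups.setdefault(level, []).append(y)
    all_y = [y for _, y in stacked_rows]
    grand_mean = mean(all_y)
    # Between-group sum of squares: group size times squared deviation of its mean.
    ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Within-group sum of squares: deviations of each value from its group mean.
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_between = len(groups) - 1
    df_within = len(all_y) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

f_ratio, df_between, df_within = one_way_f(stacked)
print(f"F = {f_ratio:.4f} with ({df_between}, {df_within}) degrees of freedom")
```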
Use Balanced ANOVA if your data are balanced, and use General Linear Model if
your data are unbalanced or if you wish to compare means using multiple
comparisons.
Data
The response variable must be numeric and in one worksheet column. You
must have a single factor level column for each of the two factors. These can
be numeric, text, or date/time. If you wish to change the order in which text
categories are processed from their default alphabetical order, you can define
your own order. See
A standard normal deviate is a random sample from the standard normal distribution.
The Chi Square distribution is the distribution of the sum of squared standard normal
deviates. The degrees of freedom of the distribution is equal to the number of standard
normal deviates being summed. Therefore, Chi Square with one degree of freedom,
written as χ2(1), is simply the distribution of a single normal deviate squared. The
area of a Chi Square distribution below 4 is the same as the area of a standard normal
distribution below 2, since 4 is 22.
Consider the following problem: you sample two scores from a standard normal
distribution, square each score, and sum the squares. What is the probability that the
sum of these two squares will be six or higher? Since two scores are sampled, the
answer can be found using the Chi Square distribution with two degrees of freedom.
A Chi Square calculator can be used to find that the probability of a Chi Square (with
2 df) being six or higher is 0.050.
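The figure of 0.050 can be checked without a Chi Square calculator. For two degrees of freedom the upper-tail probability has the closed form exp(−x/2), and the sketch below also repeats the sampling experiment described above by simulation. The seed and the number of trials are arbitrary choices made for illustration.

```python
import math
import random

def chi2_upper_tail_2df(x):
    """P(Chi Square with 2 df >= x); for 2 df this equals exp(-x / 2)."""
    return math.exp(-x / 2)

p_exact = chi2_upper_tail_2df(6)           # about 0.0498, i.e. 0.050 to 3 places

# Simulation: sample two standard normal scores, square each and sum the squares.
random.seed(0)
trials = 200_000
hits = sum(
    random.gauss(0, 1) ** 2 + random.gauss(0, 1) ** 2 >= 6
    for _ in range(trials)
)
p_simulated = hits / trials

print(round(p_exact, 4), round(p_simulated, 4))
```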
The mean of a Chi Square distribution is its degrees of freedom. Chi Square
distributions are positively skewed, with the degree of skew decreasing with
increasing degrees of freedom. As the degrees of freedom increases, the Chi Square
distribution approaches a normal distribution. Figure 1 shows density functions for
three Chi Square distributions. Notice how the skew decreases as the degrees of
freedom increases.
Figure 1. Chi Square distributions with 2, 4, and 6 degrees of freedom.
The Chi Square distribution is very important because many test statistics are
approximately distributed as Chi Square. Two of the more common tests using the
Chi Square distribution are tests of differences between theoretically expected and
observed frequencies (one-way tables) and tests of the relationship between
categorical variables (contingency tables). Numerous other tests beyond the scope of
this work are based on the Chi Square distribution.
Chi-Square goodness of fit test is a non-parametric test that is used to find out how
significantly the observed value of a given phenomenon differs from the expected
value. In the Chi-Square goodness of fit test, the term goodness of fit refers to
comparing the observed sample distribution with the expected probability
distribution. The test determines how well a theoretical distribution (such as
normal, binomial, or Poisson) fits the empirical distribution. The sample data is
divided into intervals. Then the numbers of points that fall into each interval are
compared with the expected numbers of points in each interval.
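Compute the value of Chi-Square goodness of fit test using the following
formula: χ² = Σ (O − E)² / E, where O is the observed frequency and E is the
expected frequency in each interval, and the sum is taken over all intervals.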
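A minimal sketch of the calculation uses the standard Pearson statistic, the sum of (O − E)² / E over all categories. The die-rolling data are hypothetical, and 11.070 is the tabulated 5% critical value of Chi Square with 5 degrees of freedom.

```python
def chi_square_gof(observed, expected):
    """Pearson goodness of fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical results of 120 rolls of a die; a fair die expects 20 per face.
observed = [18, 22, 20, 25, 15, 20]
expected = [20] * 6

chi2_value = chi_square_gof(observed, expected)
df = len(observed) - 1                     # categories minus one
critical_5pct = 11.070                     # tabulated value for df = 5 at 5%
reject_fit = chi2_value > critical_5pct

print(f"chi-square = {chi2_value:.2f}, df = {df}, reject fit: {reject_fit}")
```

Since the computed value is well below the critical value, these hypothetical rolls are consistent with a fair die.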
b) CORRELATION ANALYSIS: