MODULE - III (Part-I)
As stated earlier, analysis means computation of certain measures along with searching for patterns of
relationship that exist among the data groups. Analysis, particularly of survey or experimental data,
involves estimating the values of unknown parameters of the population and testing of hypotheses for
drawing inferences.
1. DESCRIPTIVE ANALYSIS
The goal of descriptive analysis is to describe or summarize a set of data. Descriptive analysis is the very first analysis performed in the data analysis process. It generates simple summaries about samples and measurements. It involves common descriptive statistics like measures of central tendency, variability, frequency and position.
● Descriptive Analysis Example
Take the Covid-19 statistics page on Google, for example. The line graph is a pure summary of the cases/deaths, a presentation and description of the Covid-19 data for a particular country.
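A minimal Python sketch of such summary measures is given below, assuming a small, purely hypothetical list of daily case counts (not Google's data); it shows central tendency, variability and frequency.

```python
# Descriptive analysis sketch: summary statistics for a hypothetical
# list of daily Covid-19 case counts (illustrative numbers only).
import statistics
from collections import Counter

daily_cases = [120, 135, 150, 150, 160, 175, 210, 150, 140, 130]

print("Mean   :", statistics.mean(daily_cases))              # central tendency
print("Median :", statistics.median(daily_cases))            # central tendency
print("Mode   :", statistics.mode(daily_cases))              # most frequent value
print("Std dev:", round(statistics.stdev(daily_cases), 2))   # variability
print("Range  :", max(daily_cases) - min(daily_cases))       # variability
print("Frequency table:", Counter(daily_cases))              # frequency
```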
2. DIAGNOSTIC ANALYSIS
Diagnostic analysis seeks to answer the question “Why did this happen?” by taking a more in-depth
look at data to uncover subtle patterns. Diagnostic analysis typically comes after descriptive analysis,
taking initial findings and investigating why certain patterns in data happen. Diagnostic analysis may
involve analyzing other related data sources, including past data, to reveal more insights into current
data trends. Diagnostic analysis is ideal for further exploring patterns in data to explain anomalies.
For example: A footwear store wants to review its website traffic levels over the previous 12 months. Upon
compiling and assessing the data, the company’s marketing team finds that June experienced
above-average levels of traffic while July and August witnessed slightly lower levels of traffic.
To find out why this difference occurred, the marketing team takes a deeper look. Team members
break down the data to focus on specific categories of footwear. In the month of June, they discovered
that pages featuring sandals and other beach-related footwear received a high number of views while
these numbers dropped in July and August.
Marketers may also review other factors like seasonal changes and company sales events to see if
other variables could have contributed to this trend.
3. EXPLORATORY ANALYSIS
Exploratory analysis involves examining or exploring data and finding relationships between variables that were previously unknown.
● Exploratory Analysis Example
Climate change is an increasingly important topic as the global temperature has gradually risen over the years. One example of an exploratory data analysis on climate change involves taking the rise in temperature from 1950 to 2023 and the increase in human activities and industrialization to find relationships in the data. For example, you may examine the number of factories, cars on the road and airplane flights to see how these correlate with the rise in temperature.
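A minimal sketch of this kind of exploration, assuming hypothetical yearly figures for temperature and the number of factories, is shown below; it simply measures how strongly the two series move together.

```python
# Exploratory analysis sketch: checking how strongly two variables move
# together using Pearson correlation. The yearly figures are hypothetical.
import numpy as np

avg_temperature = np.array([14.1, 14.3, 14.5, 14.8, 15.0, 15.2])   # degrees C
factories       = np.array([1200, 1500, 1900, 2400, 3000, 3700])   # count

# corrcoef returns a 2x2 correlation matrix; element [0, 1] is the pairwise value.
r = np.corrcoef(avg_temperature, factories)[0, 1]
print(f"Correlation between factories and temperature: {r:.2f}")
```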
4. INFERENTIAL ANALYSIS
Inferential analysis involves using a small sample of data to infer information about a larger
population of data.
The goal of statistical modeling is to use a small amount of information to extrapolate and generalize it to a larger group. For example, a psychological study on the benefits of sleep might have a total of 500 people involved. When the researchers followed up with the candidates, those who slept seven to nine hours reported better overall attention spans and well-being, while those who slept less or more than that range suffered from reduced attention spans and energy. This study, drawn from 500 people, covered just a tiny portion of the 7 billion people in the world, and its findings are thus an inference about the larger population.
Inferential analysis extrapolates and generalizes the information of the larger group with a smaller
sample to generate analysis and predictions.
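A minimal sketch of such an inference, assuming a hypothetical sample of 500 sleep durations, estimates the population mean together with a confidence interval.

```python
# Inferential analysis sketch: using a sample to estimate a population mean.
# The sleep-hour figures are simulated and purely hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=7.8, scale=1.1, size=500)    # hours of sleep for 500 people

mean = sample.mean()
sem = stats.sem(sample)                               # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)

print(f"Sample mean: {mean:.2f} hours")
print(f"95% confidence interval for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```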
5. PREDICTIVE ANALYSIS
Predictive analysis involves using historical or current data to find patterns and make predictions about the future.
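A minimal sketch of a prediction from historical data, assuming hypothetical monthly sales figures and a simple linear trend (only one of many possible predictive models), could look like this:

```python
# Predictive analysis sketch: fitting a linear trend to hypothetical
# monthly sales and extrapolating one month ahead.
import numpy as np

months = np.arange(1, 13)                                    # months 1..12
sales = np.array([200, 205, 215, 220, 232, 240,
                  246, 255, 263, 270, 282, 290])             # hypothetical units

slope, intercept = np.polyfit(months, sales, deg=1)          # least-squares line
next_month = 13
forecast = slope * next_month + intercept
print(f"Forecast for month {next_month}: {forecast:.0f} units")
```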
8. PRESCRIPTIVE ANALYSIS
Prescriptive analysis compiles insights from other previous data analyses and determines actions that
teams or companies can take to prepare for predicted trends.
2. Describe the importance of interpretation and discuss the precautions that a researcher
should take while interpreting his findings.
IMPORTANCE OF INTERPRETATION:
The success of research work largely depends upon the quality of interpretation. After collection of
data, it must be processed in terms of coding, classification and tabulation. The processed data is then
analysed and from the analysis carried out, the researcher draws inferences.
a. Recommendations
In commercial and social research, interpretation of data helps to make recommendations. For
instance, if research is carried out to find incidence of poverty in backward areas, then the researcher
will draw inferences based on analysed data, and then provide recommendations to overcome the
problem of poverty. Social evils and inherited debt may be the major causes of poverty. In such cases,
the researcher may suggest measures to reduce the burden of inherited debt and the ways to overcome
social evils so as to reduce poverty.
b. Decision Making
Interpretation of data helps in decision making. For example, research may be conducted to find out
the causes of decline in sales. The analysis and interpretation of data may indicate the main cause of
decline in sales, i.e. poor after-sales service. Therefore, the management may take suitable decisions to improve after-sales service, such as:
● Training to the staff
● Opening of more service centres.
● Appointment of competent staff.
● Monitoring customer feedback on a regular basis.
c. Forecasting : Data Interpretation helps in Forecasting the Trends
● In case of commercial research, the trends in sales, profits, market share etc. can be
forecasted.
● In case of social research, the trends in the growth rate of population, literacy rate,
income levels etc. can be forecasted.
● Forecasts may help the concerned authorities to take necessary actions.
d. Development of Models
● Data interpretation helps in developing new models. For example, a research on brand
loyalty may help to develop a new model on brand loyalty. The model can be used for
academic and commercial purposes. For example, the AIDA (Attention, Interest, Desire, Action) model can be used by advertisers and marketers. This model indicates that the customer's attention must be attracted, interest must be developed in the minds of the customers, strong desire must be created, and finally action can be induced, i.e. purchase or acceptance of the idea.
e. Development of Hypothesis
● A pilot study can help to develop a hypothesis. The interpretation of a pilot study enables the researcher to modify the preliminary hypothesis, and accordingly a new hypothesis can be developed to proceed with the research activity.
f. Future Reference
● Inferences drawn from a particular research activity may be used for future reference. Further studies can be conducted based on the conclusions drawn from the earlier research activity.
g. Motivation to the Researcher
● Proper interpretation may generate proper recommendations. On the basis of
recommendations, effective decisions can be taken by the organisation. If there are
excellent results, the researcher/interpreter may be rewarded with additional
incentives.
DATA PROCESSING
Data processing is the intermediate step between the collection of data and the analysis of data. This phase plays a vital role in quantitative data analysis. Both primary and secondary data collected through different data collection methods, i.e. interview schedules or questionnaires, are said to be raw data. Such data have to be reduced into a manageable and meaningful form for analysis, keeping the objectives and hypotheses of the research in mind. For this, editing, coding, verification of data, classification and tabulation become the essential components.
1. Editing
Editing is a process of examining the collected raw data to detect errors, omissions and inadequate entries and to correct these when possible. Editing is the process of checking and adjusting the data for omissions, legibility and consistency.
Field editing is carried out in the course of data collection. Generally the supervisor, who serves as the editor at that point of time, is responsible for scrutinising the schedule, mostly on the same evening or the next day, looking into problems such as partial answers to questions, blanks against certain questions, illegible handwriting, translation issues, if any, and inconsistencies in some answers.
Central editing, also known as in-house editing, is done in the central office, mostly after data collection is over, or simultaneously during data collection.
Significance of Editing
● This process enables the researcher to detect errors/omissions and to see that they are corrected.
● It facilitates coding of data.
● It coordinates with data entry and ensures uniform entries.
● It helps to remove irrelevant data.
2. Coding
Coding is the process by which response categories are summarized by numerals or other symbols to
carry out subsequent operations of data analysis. This process of assigning numerals or symbols to the
responses is called coding.
Significance of Coding
3. Classification
The data collected are usually very voluminous and as such they are not fit for analysis and interpretation. For example, if we collect data about the marks secured by 1000 students of a college, we cannot say anything about the scoring behaviour of the students, i.e. how many have passed in first class, second class, third class etc., unless the data are arranged and presented vividly in the form of tables. Such condensed data further facilitate easy comparison, analysis and interpretation of the data. Hence the first step after collecting the data is to classify and tabulate the same. Classification is the first step in tabulation, though they are two distinct processes.
Thus classification is a process of arranging the data in groups or classes according to common
characteristics possessed by the items in the data. In other words, it is a process of sorting out similar
and dissimilar properties, which are present in the data. This process can be compared to sorting of the
letters in a post office.
i. Condensation : The vast data is summarised and brought to a smaller and handy size so that it can be handled very easily. Thousands of figures having common characteristics are grouped together so that the similarities and dissimilarities contained in the data are shown very clearly.
ii. Simplification : The vast data collected is meaningless and complex in nature. When they are
classified and put in a condensed form the complexities are removed and the data are made
simple. Thus the complex data is simplified and the important characteristics are pinpointed
so that the reader can understand the subject matter very easily and quickly.
iii. Precision : Unnecessary details that are present in the data are dropped out during the process of classification and the data are made more concise and precise.
iv. Comparison : Classification enables us to sort out the individuals according to the different
characteristics possessed by them. Thus the classified data present the information divided
into different parts. This facilitates comparison of various characteristics with one another.
v. Statistical Treatment : Classification of data enables further processing and analysis of the
data collected. The different measures of central tendency, dispersion, skewness, kurtosis,
correlation and regression etc. can be very easily calculated for the classified data.
vi. Preparing the Final Report : Classification makes further analysis of the data easy. On
the basis of the analysed data one can very easily write up the report and draw meaningful
conclusions, inferences, remarks etc.
Bases of Classification :
a. Geographical Classification : In this method the data are classified on the basis of
geographical regions or places. For example, production of paper in India may be classified
according to states in the following way.
State             Production of Paper
Karnataka         20
Maharashtra       18
Kerala            12
Tamil Nadu        15
Other states      45
Total             110
b. Chronological Classification : In this method the data are classified on the basis of different
points of time. For instance, the students studying in a college may be represented during the
years 2011-12 to 2014-15 as follows :
Year Number of Students
2011-12 750
2012-13 810
2013-14 880
2014-15 1,020
c. Qualitative Classification : In this method the data are classified on the basis of an attribute or quality like sex, blindness, literacy etc. Here the data are classified into two groups, namely the group possessing the quality and the group not possessing that quality. We can take any number of qualities and classify the data. For example, the population may be classified according to sex into males and females, and each group may be further classified according to literacy into literates and illiterates.
d. Quantitative Classification : In this method the data are classified on the basis of a quantitative characteristic (a variable) that can be measured numerically, such as marks, income, height etc. For example, the marks secured by 400 students may be classified as follows :
Marks        Number of Students
0 − 20       30
20 − 40      142
40 − 60      157
60 − 80      50
80 − 100     21
Total        400
The above type of classification is also known as a frequency distribution. Here we find two important elements, namely the variable (i.e. the marks in the above example) and the frequency (i.e. the number of students in each class). Hence a frequency distribution refers to the manner in which the frequencies are distributed over a given variable.
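A minimal sketch of how such a frequency distribution can be produced, assuming a small hypothetical list of raw marks, is given below.

```python
# Classification sketch: grouping raw marks into class intervals to form a
# frequency distribution like the one above. The raw marks are hypothetical.
from collections import OrderedDict

marks = [12, 35, 47, 55, 62, 78, 81, 33, 45, 59, 21, 67, 74, 88, 41, 52]

bins = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freq = OrderedDict((f"{lo} - {hi}", 0) for lo, hi in bins)

for m in marks:
    for lo, hi in bins:
        if lo <= m < hi or (hi == 100 and m == 100):   # last class includes 100
            freq[f"{lo} - {hi}"] += 1
            break

print("Marks      Number of Students")
for interval, count in freq.items():
    print(f"{interval:<10} {count}")
print(f"{'Total':<10} {sum(freq.values())}")
```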
TABULATION
Tabulation is one of the most important methods of presenting the classified data in a meaningful and
systematic fashion. It is a process of logical listing of the classified data in the form of a table
containing horizontal rows and vertical columns with all the necessary descriptions. It summarises the
data, simplifies complexities and provides answers to the questions posed in any statistical enquiry.
Following are the important parts of a statistical table.
1. Title and table number : The title of the table is placed above the table. If there is more than one table in a research report, each should bear a number for easy reference.
2. Caption : The headings of the columns are known as captions.
3. Stub : The headings of the rows are known as stubs.
4. Body : This is the main body of information needed for the research work.
5. Footnote : This is placed below the table to convey the expansions of abbreviations used in the caption, stub or main body.
6. Source note : If the table is based on outside information, it should be mentioned in the source note below the table.
Significance of Tabulation
i. To make the data easy to read and to understand : As the data are presented systematically, all the unnecessary details and repetitions are dropped out and the data are made simple and put in a condensed form. The reader quickly gets a clear-cut meaning of the data put in columns and rows.
ii. To facilitate comparison : As the data are divided into a number of parts in various columns
and rows, the comparison of the various parts of the data is facilitated. The relationship that
exists among the various parts of the data can be studied easily.
iii. To give an identity to the data : As the table provides the title, subtitle, column headings and
row headings etc. for the data presented, distinct identification of the various parts of the data can be made very easily.
iv. To reveal trend and patterns of the data : As the table displays the manner in which the
observations are distributed in various columns and rows, along with subtotals and totals, general tendencies and patterns within the figures can be seen very easily.
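A minimal tabulation sketch, assuming a few hypothetical individual records classified by sex and literacy, uses pandas.crosstab to produce a two-way table with totals:

```python
# Tabulation sketch: a two-way table (rows x columns with totals) built from
# hypothetical individual records using pandas.crosstab.
import pandas as pd

data = pd.DataFrame({
    "sex":      ["Male", "Male", "Female", "Female", "Male", "Female", "Male", "Female"],
    "literacy": ["Literate", "Illiterate", "Literate", "Literate",
                 "Literate", "Illiterate", "Illiterate", "Literate"],
})

# margins=True adds the row and column totals that a statistical table usually shows.
table = pd.crosstab(data["sex"], data["literacy"], margins=True, margins_name="Total")
print(table)
```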
Hypothesis Testing
Hypothesis is a statement being tested to prove its validity and generalisability. Hypothesis is an
instrument through which a researcher confirms the correctness of his assumptions.
Null hypothesis is formulated to represent a neutral state of relationship. Null hypothesis is framed
stating no relationship or no difference among variables. It is framed stating that the research variables
are not related or the difference between the variables is zero. For example if a researcher wants to
prove that sales and price of the product are related, he formulates null hypothesis stating that sales
and price of the product are not significantly related. If the researcher intends to prove that there is a difference in employee engagement between male and female employees, then the null hypothesis is
worded to show no significant difference in employee engagement between male and female
employees.
We know that hypothesis is an assumption about population parameters. To investigate whether the
assumption is true or not, the researcher collects data from representative sample groups. The collected data are then put to test with the help of the hypothesis.
The alternative hypothesis is stated in opposition to the null hypothesis. It is framed to express the assumed relationship or variation among variables. It asserts that the existence of a relationship or difference
between variables is not due to sampling fluctuations but due to actual variation. It is the hypothesis that the researcher wishes to prove.
The alternative hypothesis is denoted as H1. Suppose the researcher assumes that in a particular degree course, students from a rural background score below 60 marks in the English language. If the researcher wishes to test that students from a rural background score below 60 in the English language, then he may state the null and alternative hypotheses as
H0 : The average score of rural students in the English language is not less than 60 (µ ≥ 60).
H1 : The average score of rural students in the English language is below 60 (µ < 60).
Illustration 2
H0 : Employee engagement among male and female employees is equal.
H1 : Employee engagement among male and female employees is not equal.
Two tailed test : In two tailed tests, the rejection region falls on both sides of the distribution. Two tailed tests are used in situations where either a higher or a lower value of the sample statistic than the assumed population parameter is not acceptable. Here the researcher wants the sample statistic to be exact or very accurate.
Suppose a researcher is checking whether a syringe filled with medicine to treat a particular disease, say viral fever, contains 2 ml. He frames the hypotheses as
H0 : µ = 2 ml
H1 : µ ≠ 2 ml
In this example the possible rejection of the null hypothesis arises under two situations:
1. µ > 2 ml
2. µ < 2 ml
Thus in a two tailed test, the rejection region will be on both sides of the distribution.
One tailed test : In a one tailed test, the rejection region falls on only one side of the distribution. Suppose the alternative hypothesis is that the average salary of the workers in the hotel industry is less than Rs. 8,000 (µ < 8000). Here the rejection region falls only on one side of the distribution, that is the left side, so it is a left tailed test. If the alternative hypothesis is that the average salary of the workers in the hotel industry is more than Rs. 8,000 (µ > 8000), then the rejection region will be on the right side of the distribution, and it is a right tailed test.
In simple terms, we can say that in a distribution, if the rejection region falls on both sides of the curve, it is a two tailed test. On the other hand, if the rejection region falls only on one side of the curve, it is a one tailed test.
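A minimal sketch of the corresponding rejection regions, assuming a z statistic with a standard normal sampling distribution and α = 0.05, is given below; it only illustrates where the critical values fall.

```python
# Sketch of rejection regions for a z-based test at alpha = 0.05, assuming a
# standard normal sampling distribution for the test statistic.
from scipy.stats import norm

alpha = 0.05

# Two tailed test: alpha is split equally between the two tails.
lower, upper = norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2)
print(f"Two tailed : reject H0 if z < {lower:.2f} or z > {upper:.2f}")

# Left tailed test: the whole rejection region lies in the left tail.
print(f"Left tailed : reject H0 if z < {norm.ppf(alpha):.2f}")

# Right tailed test: the whole rejection region lies in the right tail.
print(f"Right tailed: reject H0 if z > {norm.ppf(1 - alpha):.2f}")
```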
Type I and Type II error
In hypothesis testing, sample statistics are considered for acceptance or rejection of the hypothesis. If the sample is not representative of the population, then there may be a possibility of a wrong decision regarding acceptance or rejection of the hypothesis. These errors in the decision about acceptance or rejection of the hypothesis are termed Type I and Type II errors.

Decision           H0 is true              H0 is false
Accept H0          Correct decision        Type II error (β)
Reject H0          Type I error (α)        Correct decision

From the table it is clear that there are two correct decisions: one is accepting the null hypothesis when it is true and the other is rejecting the null hypothesis when it is false.
But sometimes the researcher may commit an error in accepting or rejecting the hypothesis.
A Type I error occurs when a true null hypothesis is rejected instead of being accepted. Type I error is the probability of rejecting a true null hypothesis and is denoted by the Greek letter α (alpha). Type I error is also called the level of significance. The level of significance α is usually predetermined at 5% or 1%. A level of significance of 5% means that out of 100 occasions there are 5 chances of committing a Type I error; the researcher will be 95% confident that the decision about rejection of the null hypothesis is correct. Likewise, if the level of significance is fixed at 1%, out of 100 occasions there is only 1 chance of committing a Type I error, and the researcher will be 99% confident that the decision about rejection of the null hypothesis is correct.
A Type II error occurs when a researcher accepts a false null hypothesis. Type II error is denoted by β (beta) and is known as the β error. 1 − β is called the power of the test. It is to be noted that both types of errors cannot be reduced simultaneously.
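A minimal simulation sketch of the Type I error idea, assuming hypothetical syringe fill volumes drawn from a population where H0 (µ = 2 ml) is actually true, is shown below; the rejection rate it estimates should come out close to α.

```python
# Simulation sketch: estimating the Type I error rate. Samples are repeatedly
# drawn from a population where H0 (mu = 2 ml) is true, and we count how often
# a two-tailed t-test at alpha = 0.05 wrongly rejects it.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, trials, false_rejections = 0.05, 5000, 0

for _ in range(trials):
    sample = rng.normal(loc=2.0, scale=0.05, size=30)   # H0 is true here
    result = stats.ttest_1samp(sample, popmean=2.0)
    if result.pvalue < alpha:
        false_rejections += 1

# The estimated rate should be close to alpha (about 5%).
print(f"Estimated Type I error rate: {false_rejections / trials:.3f}")
```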
Hypothesis testing is a sequential process. Following are the major steps in hypothesis testing:
1. Stating the hypotheses
2. Deciding the significance level
3. Selecting an appropriate test statistic
4. Computation
5. Decision
1. Stating the hypotheses
The first and most important aspect of hypothesis testing is to state the null and alternative hypotheses clearly. Hypotheses should be based on the research question and the objectives of the research.
Alternative hypothesis is framed opposite to null hypothesis. If a null hypothesis is rejected, an
alternative hypothesis is accepted. If one is accepted the other one is rejected.
In some cases the null hypothesis is stated with a specific population parameter value or range of values. Here the researcher assumes a particular numerical value as the population value and compares the sample value with this population parameter for decision making.
For example:
H0 : The average number of customer care calls per day is equal to 100 (µ = 100).
H1 : The average number of customer care calls per day is not equal to 100 (µ ≠ 100).
If the researcher presumes a direction, the alternative hypothesis may instead state that the average is less than 100 (µ < 100) or more than 100 (µ > 100).
When the researcher presumes that two variables are related, he frames hypotheses to examine whether there is a significant relationship between the variables.
For example:
H0 : There is no significant relationship between employee attitude and customer rating.
H1 : There is a significant relationship between employee attitude and customer rating.
2. Deciding the significance level: Selecting a suitable level of significance is the next step in hypothesis testing. The level of significance is denoted by α (alpha) and is selected before drawing a sample. The level of significance α is the probability of rejecting the null hypothesis when it is actually true and should have been accepted.
Usually, in the majority of research work, a 5% or 1% level of significance is selected. If α is set at 5% (0.05), there is only a 5 out of 100 chance of rejecting the null hypothesis when it is true; in other words, there is 95% confidence of making the right decision. Thus accepting or rejecting the null hypothesis depends on the level of significance.
The test result is said to be 'significant' when the null hypothesis is rejected at the predetermined level of significance.
3. Selecting an appropriate test statistic: Once the hypotheses are formulated and the level of significance is selected, the researcher should choose an appropriate statistical test. Selection of a suitable test statistic depends on the type of distribution. If the data follow a normal distribution, then parametric tests are applied, provided the assumptions of the particular test selected hold good. The t test, z test, F test or chi-square test is applied depending upon the distribution and the sample size.
4. Computation: The critical region means the region or value that leads to rejection of the null hypothesis. Before actually computing the test statistic, the critical region is to be specified. In two tailed tests the critical region falls on both sides of the distribution, with α/2 in each tail; if α = 0.05, then 0.025 on either side of the distribution will be the critical region.
In a one tailed test, the critical region falls either on the left or the right side of the distribution. Once the critical region is decided, the test statistic is computed either manually or with the help of software like Microsoft Excel or SPSS.
5. Decision: The decision regarding accepting or rejecting the null hypothesis depends on whether the computed value falls in the critical region. Once the calculation is done, if the value falls in the critical region, then the null hypothesis is rejected and the alternative hypothesis is accepted.
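A minimal sketch of these five steps, assuming hypothetical fill-volume measurements for the syringe example and a one-sample t-test as the chosen test (with the p-value used for the decision), could look like this:

```python
# Hypothesis-testing steps sketch for the syringe example (hypothetical data).
import numpy as np
from scipy import stats

# Step 1: State the hypotheses.  H0: mu = 2 ml   H1: mu != 2 ml (two tailed)
# Step 2: Decide the significance level.
alpha = 0.05

# Step 3: Select an appropriate test. With a small sample and unknown
# population variance, a one-sample t-test is a reasonable choice.
sample = np.array([2.02, 1.98, 2.05, 2.01, 1.97, 2.06, 2.03, 1.99,
                   2.04, 2.00, 2.07, 1.96, 2.02, 2.05, 2.01])

# Step 4: Computation.
result = stats.ttest_1samp(sample, popmean=2.0)
print(f"t = {result.statistic:.3f}, p-value = {result.pvalue:.3f}")

# Step 5: Decision. Reject H0 when the p-value is below alpha (equivalently,
# when the computed statistic falls in the critical region).
if result.pvalue < alpha:
    print("Reject H0: the mean fill volume differs significantly from 2 ml.")
else:
    print("Fail to reject H0: no significant difference from 2 ml.")
```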
Parametric tests are those statistical tests where the information about the population is completely known by means of its parameters. Parametric tests assume that the data come from a particular type of probability distribution and then make inferences about the parameters of that distribution.
Nonparametric tests are those statistical tests where there is no knowledge about the population or its parameters, but it is still required to test hypotheses about the population. Nonparametric tests cover techniques that do not rely on the data belonging to any particular distribution.
The important parametric tests are:
● The Z-test
● The Student's t-test
● The F-test
Z test:
The z-test was developed by Prof. R.A. Fisher. It is based on the normal distribution; it is widely
widely used for testing the significance of several statistics such as mean, median, mode, coefficient
of correlation and others. This test is used even when binomial distribution or ‘t’ distribution is
applicable on the assumption that such a distribution tends to approximate normal distribution as the
sample size (n) becomes larger.
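A minimal sketch of a one-sample z-test, assuming a hypothetical sample mean, a known population standard deviation and a large sample, computes the statistic directly from the normal distribution:

```python
# One-sample z-test sketch, computed directly from the normal distribution.
# The sample mean, population standard deviation and sample size are hypothetical.
import math
from scipy.stats import norm

mu0 = 100          # hypothesised population mean
sigma = 15         # assumed (known) population standard deviation
n = 64             # large sample, so the z-test is appropriate
sample_mean = 104.5

z = (sample_mean - mu0) / (sigma / math.sqrt(n))
p_two_tailed = 2 * norm.sf(abs(z))          # sf = 1 - cdf

print(f"z = {z:.2f}, two-tailed p-value = {p_two_tailed:.4f}")
```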
The t-test
The t-test was developed by W.S. Gosset in the year 1908. Since he published his findings under the pen name 'Student', the test is popularly known as Student's t-test. Student's t-test is suitable for testing the significance of a sample mean or for judging the significance of the difference between the means of two samples when the samples are small (less than 30 in number) and when the population variance is not known.
When two samples are related, the paired t-test is used. The t-test can also be used for testing the
significance of the coefficients of simple and partial correlations. The relevant test statistics, t, is
calculated from the sample data, and it is compared with its corresponding critical value in the t-distribution table for rejecting or accepting the null hypothesis.
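A minimal sketch of the independent-samples and paired t-tests, assuming small hypothetical score lists, uses scipy.stats:

```python
# t-test sketches on hypothetical data: an independent-samples test and a
# paired test for two related samples.
from scipy import stats

group_a = [68, 72, 75, 70, 66, 74, 71, 69]        # e.g. scores of sample A
group_b = [64, 67, 70, 66, 63, 69, 65, 68]        # e.g. scores of sample B

t_ind, p_ind = stats.ttest_ind(group_a, group_b)  # two independent samples
print(f"Independent t-test: t = {t_ind:.2f}, p = {p_ind:.3f}")

before = [82, 75, 90, 68, 77, 85, 74, 80]         # same subjects measured twice
after  = [85, 79, 92, 70, 80, 88, 76, 83]

t_rel, p_rel = stats.ttest_rel(before, after)     # paired (related) samples
print(f"Paired t-test:      t = {t_rel:.2f}, p = {p_rel:.3f}")
```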
The F-test (ANOVA)
The F-test is used to compare the variances of two independent samples. It is also used in analysis of variance (ANOVA) for testing the significance of more than two sample means at a time. The F-test is based on the F-distribution, which is a distribution skewed to the right that tends to become more symmetrical as the number of degrees of freedom in the numerator and denominator increases. It is also used for judging the significance of the multiple correlation coefficient.
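A minimal one-way ANOVA sketch, assuming three small hypothetical groups of scores, uses scipy.stats.f_oneway:

```python
# One-way ANOVA sketch: testing whether more than two sample means differ
# significantly, using hypothetical scores from three groups.
from scipy import stats

group1 = [23, 25, 21, 26, 24]
group2 = [30, 28, 31, 27, 29]
group3 = [22, 24, 23, 25, 21]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")
# A small p-value (e.g. below 0.05) suggests at least one group mean differs.
```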
The Chi-square test
The Chi-square test is the most popular nonparametric test of significance in social science and business research. It is used to make comparisons between two or more nominal variables. Unlike the other tests of significance, the chi-square test is used to make comparisons between frequencies rather than between mean scores. This test evaluates whether the difference between the observed frequencies and the expected frequencies under the null hypothesis can be attributed to chance or to actual population differences. The chi-square test is applicable to two or more independent samples.
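A minimal sketch of a chi-square test of independence, assuming a hypothetical 2×2 table of observed frequencies, uses scipy.stats.chi2_contingency:

```python
# Chi-square test of independence sketch on a hypothetical 2x2 table of
# observed frequencies (e.g. sex vs product preference).
from scipy.stats import chi2_contingency

observed = [[40, 60],    # row 1: e.g. male customers preferring A / B
            [55, 45]]    # row 2: e.g. female customers preferring A / B

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, degrees of freedom = {dof}, p-value = {p:.4f}")
print("Expected frequencies under H0:\n", expected)
```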
Testing of hypothesis and numerical problems: study from the book and the questions solved in class.