Data Analysis RM

Uploaded by

ranisweta6744

Data Analysis

Univariate & Bivariate Analysis

 Once the raw data is collected from both primary and secondary sources, the next step is to analyse
the same so as to draw logical inferences from them.
 The data collected in a survey could be voluminous in nature, depending upon the size of the
sample. In a typical research study there may be a large number of variables that the researcher
needs to analyse.
 The analysis could be univariate, bivariate and multivariate in nature.
 In the univariate analysis, one variable is analysed at a time.
 In the bivariate analysis two variables are analysed together and examined for any possible
association between them.
 In the multivariate analysis, the concern is to analyse more than two variables at a time.
 Descriptive analysis refers to transformation of raw data into a form that will facilitate easy
understanding and interpretation. Descriptive analysis deals with summary measures relating to the
sample data.
 The common ways of summarizing data are by calculating average, range, standard deviation,
frequency and percentage distribution. The first thing to do when data analysis is taken up is to
describe the sample.
Descriptive Analysis of Univariate Data
The first step under univariate analysis is the preparation of frequency distributions of each
variable. The frequency distribution is the counting of responses or observations for each of
the categories or codes assigned to a variable.

Missing observations should be assigned a code that cannot be confused with any valid value of the
variable obtained as part of the survey. Had the values of the missing observations been available, they
could perhaps have led to different research conclusions. The extent to which the observed results
deviate from the actual ones depends upon the number of missing observations and on how
different the missing data would have been from the observed data.

Generally, if the volume of missing data is small, it is unlikely to affect the conclusions of
the analysis, although this may not always be the case. It is for this reason that the ‘valid per cent’
column, which excludes missing observations from the base, should be used for interpreting the results.
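
To illustrate why the valid per cent column matters, here is a minimal Python sketch of a frequency table built from coded responses. The responses, the category codes 1 to 3 and the missing-value code 9 are hypothetical and are not taken from the survey discussed in these slides.

```python
from collections import Counter

MISSING = 9  # hypothetical missing-value code, outside the valid codes 1-3


def frequency_table(responses, missing_code=MISSING):
    """Return {code: (count, percent, valid_percent)} for coded responses."""
    counts = Counter(responses)
    total = len(responses)
    valid_total = total - counts.get(missing_code, 0)
    table = {}
    for code in sorted(counts):
        count = counts[code]
        percent = 100.0 * count / total
        # Valid per cent excludes missing observations from the base.
        valid = None if code == missing_code else 100.0 * count / valid_total
        table[code] = (count, percent, valid)
    return table


# Hypothetical coded answers: 1, 2, 3 are valid categories, 9 marks missing.
responses = [1, 2, 2, 3, 9, 1, 2, 9, 3, 2]
for code, (count, pct, valid) in frequency_table(responses).items():
    print(code, count, pct, valid)
```

With two of the ten observations missing, category 2 accounts for 40 per cent of all responses but 50 per cent of the valid ones, which is the gap the valid per cent column is meant to expose.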
Analysis of Multiple Responses
 At times, the researcher comes across multiple category questions where respondents could choose
more than one answer. In such a case, the preparation of frequency table and its interpretation is
slightly different.
 If the question in the research study is multiple category question and the respondents are allowed
to tick more than one choice, the percentage in such a case may not add up to 100.
 Ex- When accessing the internet at a cyber café, tick up to frequently used applications for which
you use the cyber café.
 1) E-mail
 2) Chat
 3) Browsing
 4) Downloading
 5) Shopping
 6) Entertainment
 7) Education
Analysis of Multiple Responses
 In Table 11.7 the percentages are computed on the total sample size of 414. If these
percentages are added up, they would exceed 100 per cent. This is because of the
multiplicity of answers, as respondents were given the chance to choose more than one
answer. The interpretation of the table is based on a sample of 414
and is given as:
• The most used application at a cyber café is e-mail. It is seen that 94.9 per cent of the users
make use of this.
• The second popular application is chatting, and 76.3 per cent of the sample respondents
make use of it.
• Similarly, other applications in order of preference are browsing (56 per cent),
downloading (47.6 per cent), education (35.4 per cent), entertainment (32.6 per cent) and so
on.
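
The way percentages in a multiple-response question are computed on the number of respondents, and can therefore sum to more than 100, can be sketched in Python as follows. Table 11.7 is not reproduced in these slides, so the four respondents and their answers below are hypothetical.

```python
from collections import Counter


def multiple_response_percentages(answers):
    """Percentage of respondents choosing each option (base = respondents)."""
    n_respondents = len(answers)
    # set() guards against an option being listed twice by one respondent.
    counts = Counter(option for chosen in answers for option in set(chosen))
    return {opt: 100.0 * c / n_respondents for opt, c in counts.items()}


# Hypothetical data: each inner list holds one respondent's ticked options.
answers = [
    ["E-mail", "Chat"],
    ["E-mail", "Browsing"],
    ["E-mail", "Chat", "Downloading"],
    ["Chat"],
]
pct = multiple_response_percentages(answers)
for option, value in sorted(pct.items(), key=lambda kv: -kv[1]):
    print(f"{option}: {value:.1f}%")
print("Sum of percentages:", sum(pct.values()))
```

Because the base is the number of respondents rather than the number of ticks, the percentages here add up to 200, not 100.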
Analysis of Ordinal Scaled Questions

 Rank the following five attributes while choosing a restaurant for dinner. Assign a rank of
1 to the most important, 2 to the next important ... and 5 to the least important.
• – Ambience
• – Food quality
• – Menu variety
• – Service
• – Location
 From a sample of 32, the responses obtained are given in Table 11.8. To construct
univariate tables out of the given data, one can take up one column at a time from Table
11.8 and prepare the separate frequency tables. For example, distribution of rank assigned
to attribute food quality may be considered in Table 11.9.
Grouping Large Data Sets

 Sometimes data collected is very large and needs to be collapsed for interpretation.
 Sometimes the data indicates that there are too many categories to allow quick
interpretation of the results.
 This could be facilitated by recoding the data into fewer broader categories.
 Similar analysis could be carried out in the case of interval scale data.
Measures of Central Tendency
 Mean
 Mean represents the arithmetic average of a variable.
 It is appropriate for interval and ratio scale data.
 Median
 The median can be computed for ratio, interval or ordinal scale data.
 The median is that value in the distribution such that 50 per cent of the observations are
below it and 50 per cent are above it.
 The median for the ungrouped data is defined as the middle value when the data is
arranged in ascending or descending order of magnitude.
 If the number of items in the sample is odd, the value of the ((n + 1)/2)th item gives the
median.
 However, if there is an even number of items in the sample, say of size 2n, the arithmetic
mean of the nth and (n + 1)th items gives the median.
 The median could also be computed for the grouped data.
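
The odd and even rules for the median of ungrouped data can be sketched in Python as below. The function name and the sample values are illustrative only.

```python
def median(values):
    """Median of ungrouped data, following the odd/even rule."""
    ordered = sorted(values)
    size = len(ordered)
    if size == 0:
        raise ValueError("median of empty data is undefined")
    mid = size // 2
    if size % 2 == 1:
        # Odd size: the ((size + 1)/2)th item, which is index mid from 0.
        return ordered[mid]
    # Even size 2n: arithmetic mean of the nth and (n + 1)th items.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 5]))     # odd number of items
print(median([4, 1, 3, 2]))  # even number of items
```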
Measures of Central Tendency
 Mode
 Mode is that measure of central tendency which is appropriate for nominal or higher order scales.
 It is the point of maximum frequency in a distribution around which other items of the set cluster
densely.
 Mode should not be computed for ordinal or interval data unless these data have been grouped
first.
 The concept is widely used in business, e.g. a shoe store owner would be naturally interested in
knowing the size of the shoe that the majority of the customers ask for. Similarly, a garment
manufacturer is interested in determining the size of the shirt that fits most people so as to plan its
production accordingly.
 Formula: $\text{Mode} = l + \dfrac{f - f_1}{2f - f_1 - f_2} \times h$
 l = lower limit of the modal class
 f1, f2 = frequencies of the classes preceding and following the modal class, respectively
 f = frequency of the modal class
 h = size of the class interval
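
A short Python sketch of the grouped-data mode formula follows. The class limits and frequencies are hypothetical, and treating a missing neighbouring class as having frequency zero is an assumption made here for the edge cases, not something stated in the slides.

```python
def grouped_mode(lower_limits, frequencies, h):
    """Mode of grouped data: l + (f - f1) / (2f - f1 - f2) * h."""
    # Modal class = class with the highest frequency (first one if tied).
    modal = max(range(len(frequencies)), key=frequencies.__getitem__)
    f = frequencies[modal]
    # Assumption: a missing neighbouring class counts as frequency 0.
    f1 = frequencies[modal - 1] if modal > 0 else 0
    f2 = frequencies[modal + 1] if modal < len(frequencies) - 1 else 0
    l = lower_limits[modal]
    return l + (f - f1) / (2 * f - f1 - f2) * h


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with these frequencies.
print(grouped_mode([0, 10, 20, 30], [5, 8, 12, 7], h=10))
```

Here the modal class is 20-30, so l = 20, f = 12, f1 = 8 and f2 = 7, giving 20 + (4/9) × 10.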
 Skewness
 It measures lack of symmetry in the distribution.
 In case of symmetrical distribution, mean = median = mode.
 For a positively skewed distribution, mean > median > mode.
 In such a case, the longer tail of the distribution is towards the right, the mode falls under
the peak and the mean changes its position as it is affected by extreme values. The same is
the case with negatively skewed distribution where arithmetic mean < median < mode.
 Skewness is measured by the difference between the arithmetic mean and the mode. If the
arithmetic mean is greater than the mode, the difference is positive and skewness is positive;
if the difference is negative, skewness is negative.
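
As a rough illustration, the sign of (mean − mode) can be checked in Python with the standard `statistics` module. Both data sets below are hypothetical, and the sketch reports only the direction of skewness, not a standardised coefficient.

```python
from statistics import mean, mode


def skew_direction(values):
    """Direction of skewness from the sign of (mean - mode)."""
    diff = mean(values) - mode(values)
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "none"


right_tailed = [2, 2, 2, 3, 3, 4, 5, 9, 15]   # mean 5, median 3, mode 2
left_tailed = [1, 7, 11, 12, 12, 13, 13, 13]  # mean 10.25, mode 13
print(skew_direction(right_tailed))
print(skew_direction(left_tailed))
```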
Measures of Dispersion
 The measures of central tendency locate the centre of the distribution. However, they do not provide enough
information to the researcher to fully understand the distribution being examined.
 For example, measures of central tendency do not indicate how items are spread out on either side of the centre.
Therefore, there is a need to study the spread of a distribution of a variable and the methods which provide that are
called measures of dispersion.
 The study of dispersion could help in taking better decisions. This is because small dispersion indicates high
uniformity of the items, whereas large variability denotes less uniformity. If returns on a particular investment
show a lot of variability (dispersion), it is a risky investment compared to one where variability is very
small. A company may be interested not only in the average sales of a product but also in the variability of
the sales over time.
 Measures of Dispersion:
 1) Range
 This is the simplest measure of dispersion and is defined as the distance between the highest (maximum) value and
the lowest (minimum) value in an ordered set of values.
 The range could be computed for interval scale and ratio scale data.
 Range = Xmax – Xmin
 Xmax = Maximum value of the variable
 Xmin = Minimum value of the variable
 The limitation of range as a measure of dispersion is that it considers only the extreme value and ignores
all other data points.
 The value of range could vary considerably from sample to sample.
Measures of Dispersion
 2) Variance and standard deviation
 Variance is defined as the mean squared deviation of a variable from its arithmetic mean.
 The positive square root of the variance is called standard deviation.
 The variance is a difficult measure to interpret because it is expressed in squared units and, therefore,
standard deviation is used as a measure of dispersion.
 The population standard deviation is denoted by σ and computed using the following formula:

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

 X = Value of observations
 μ = Population mean
 N = Number of observations in the population
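
The range, the population variance (mean squared deviation from the arithmetic mean) and the population standard deviation (its positive square root) can be computed with a few lines of Python. This is a minimal sketch on hypothetical values; it divides by the number of observations N, which is the population form rather than the sample form.

```python
import math


def dispersion(values):
    """Range, population variance and population standard deviation."""
    n = len(values)
    mu = sum(values) / n
    # Mean squared deviation from the arithmetic mean (divide by N).
    variance = sum((x - mu) ** 2 for x in values) / n
    return {
        "range": max(values) - min(values),
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }


# Hypothetical observations with mean 5.
print(dispersion([2, 4, 4, 4, 5, 5, 7, 9]))
```

Note that the range uses only the two extreme values, while the variance and standard deviation use every data point.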
Descriptive Analysis of Bivariate Data
 Bivariate analysis examines the relationship between two variables.
 There are three types of measures used for carrying out bivariate analysis.
1) Cross-tabulation
2) Spearman’s rank correlation coefficient
3) Pearson’s linear correlation coefficient
 In simple tabulation, the frequency and the percentage for each question are calculated separately.
 In cross-tabulation, responses to two questions are combined and the data are tabulated together.
 For example, in cross-tabulating a two- category measure of income (low- and high-income households)
with a two-category measure of purchase intention of a product (low and high purchase intentions) the
basic result is a cross-classification as shown in Table.
 The results of cross-tabulation show the number of sample respondents with low income having
low purchase intention, low income with high purchase intention, high income with low
purchase intention and high income with high purchase intention.
 As is the case with simple tabulations, the results of a cross-tabulation are more meaningful if
cell frequencies are computed as percentages.
 The percentages can be computed (1) row-wise, so that the percentages in each row add up to 100
per cent; (2) column-wise, so that the percentages in each column add up to 100 per cent; or (3)
as cell percentages, such that the percentages added across all cells equal 100 per cent. The
interpretation of percentages is different in each of the three cases.
 The basis for calculating category percentage depends upon the nature of relationship between
the variables. One of the variables could be viewed as dependent variable and the other one as
independent variable.
 The purchase intention could be treated as dependent variable, which depends upon income
(independent variable). The rule is to cast percentages in the direction of independent (causal)
variable across the dependent variable.
 The results indicate that with increase in income, the purchase intention for the product increases.
 Just because there is a high association between two variables, it does not imply a cause-and-effect
relationship.
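
The rule of casting percentages within each category of the independent variable can be sketched in Python as below. The cross-tabulation table itself is not reproduced in these slides, so the 100 respondents and their cell counts are hypothetical; income is treated as the row (independent) variable and purchase intention as the column (dependent) variable.

```python
from collections import Counter


def crosstab_row_percent(pairs):
    """pairs: (independent, dependent). Returns {(row, col): % of row}."""
    cell_counts = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    return {
        (row, col): 100.0 * n / row_totals[row]
        for (row, col), n in cell_counts.items()
    }


# Hypothetical respondents as (income, purchase intention) pairs.
pairs = (
    [("low", "low")] * 30 + [("low", "high")] * 10
    + [("high", "low")] * 15 + [("high", "high")] * 45
)
for (income, intention), pct in sorted(crosstab_row_percent(pairs).items()):
    print(f"income={income:4s} intention={intention:4s} {pct:.1f}%")
```

In this made-up data, 25 per cent of low-income respondents but 75 per cent of high-income respondents show high purchase intention. As the slide notes, such an association does not by itself establish cause and effect.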

Correlation
 Correlation measures the degree of association between two or more variables. When we
are dealing with two variables, we are talking in terms of simple correlation and when
more than two variables are involved, the subject matter of interest is called multiple
correlation.
 Positive correlation: When two variables X and Y move in the same direction, the
correlation between the two is positive. If one variable increases, the other variable also
increases. Ex- sales revenue and the advertising expenditure.
 Negative correlation: When two variables X and Y move in the opposite direction, the
correlation is negative. If one variable increases, the other decreases and vice versa. Ex-
quantity demanded and the price of the commodity.
 Zero correlation: The correlation between two variables X and Y is zero when the
variables move with no connection to each other. If the variable X increases, Y may
increase in some situations and decrease in others.
Spearman’s Rank Order Correlation
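 Suppose in a beauty contest two judges are asked to rank ten female participants. A rank
correlation coefficient between the ranks awarded by the two judges would show how
consistent they are in awarding the ranks. Spearman’s rank correlation coefficient is
given by

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where d_i is the difference between the two ranks given to the ith participant and n is the
number of participants ranked.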
 The rank correlation coefficient takes a value between –1 and +1. In case the value is +1, it
indicates a complete agreement between the ranks assigned by two judges, whereas the
value of –1 indicates a complete disagreement.
 Example: Two judges in a beauty contest evaluate ten participants. A rank of one was assigned
to the most beautiful candidate, two to the next and so on. Compute the rank order correlation
and comment on the value.

 The computed value shows a high degree of positive rank correlation, which implies that
there is strong agreement between the two judges in their opinion about the beauty of the
contestants.
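
The judges' rankings for this example are not reproduced in these slides, so the following Python sketch uses hypothetical ranks for ten participants. It applies the standard no-ties form of Spearman's coefficient, 1 − 6Σd² / (n(n² − 1)), where d is the difference between the two ranks of a participant; the assumption that no ranks are tied is ours.

```python
def spearman_rank_correlation(ranks_a, ranks_b):
    """Spearman's rank correlation for two rankings without ties."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same participants")
    n = len(ranks_a)
    # Sum of squared rank differences across participants.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks awarded by two judges to ten participants.
judge_1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
judge_2 = [2, 1, 4, 3, 5, 7, 6, 8, 10, 9]
print(round(spearman_rank_correlation(judge_1, judge_2), 3))
```

Identical rankings give +1 and exactly reversed rankings give −1, matching the interpretation of complete agreement and complete disagreement.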
Karl Pearson Linear Correlation
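 A quantitative estimate of the linear correlation between two variables X and Y is given by Karl Pearson as:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$

where X̄ and Ȳ are the arithmetic means of X and Y.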
 The linear correlation coefficient takes a value between –1 and +1 (both values inclusive).
 If the value of the correlation coefficient is equal to 1, the two variables are perfectly positively correlated and the
scatter of the points of the variables X and Y will lie on a positively sloped straight line.
 Similarly, if the correlation coefficient between the two variables X and Y is –1, the scatter of the points of these
variables will lie on a negatively sloped straight line and such a correlation will be called a perfectly negative
correlation.
 It may be noted that the closer the scatter of points is to the line, the higher is the degree of correlation between the
two variables.
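
Pearson's coefficient can be computed from deviations about the means, as the sum of cross-products divided by the square root of the product of the two sums of squares. The Python sketch below uses that standard form on hypothetical data chosen so that the points lie exactly on a straight line.

```python
import math


def pearson_r(xs, ys):
    """Pearson's linear correlation coefficient from mean deviations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical points on a positively and a negatively sloped line.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

The function is undefined when either variable is constant, because the denominator is then zero.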
Data Transformation

 Under data transformation, the original data is changed to a new format for performing
data analysis so as to achieve the objectives of the study. This is generally done by the
researcher through creating new variables or by modifying the values of the scaled data.
 At times it may become essential to collapse or combine adjacent categories of a variable
so as to reduce the number of categories of the original variable. For example, a 5-point Likert
scale with the categories strongly agree, agree, neither agree nor disagree, disagree and
strongly disagree can be clubbed into three categories. One can combine the strongly agree
and agree categories into one category. Similarly, the disagree and strongly disagree responses
could be clubbed into a second category, and neither agree nor disagree could be treated
as a separate category. This is how a five-category scale can be collapsed into a
three-category one.
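
Collapsing the five Likert categories into three amounts to a simple recoding, sketched here in Python. The three output labels ("agree", "neutral", "disagree") and the sample responses are hypothetical choices made for this illustration.

```python
from collections import Counter

# Recoding map from the five original categories to three broader ones.
COLLAPSE = {
    "strongly agree": "agree",
    "agree": "agree",
    "neither agree nor disagree": "neutral",
    "disagree": "disagree",
    "strongly disagree": "disagree",
}


def collapse_likert(responses):
    """Recode 5-point Likert responses into three categories."""
    return [COLLAPSE[r.strip().lower()] for r in responses]


# Hypothetical responses from six respondents.
raw = [
    "Strongly agree", "Agree", "Neither agree nor disagree",
    "Disagree", "Strongly disagree", "Agree",
]
print(Counter(collapse_likert(raw)))
```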
