Data Types - Research Methodology

A random sample of 987 students at XXX University was surveyed about smoking, and forty-three percent of them reported smoking regularly. In a separate poll of 1,027 U.S. adults, 69% thought using a hand-held cell phone while driving should be illegal; with a margin of error of 3%, researchers can be confident that between 66% and 72% of all U.S. adults hold that view. Researchers also studied the average heart rate of 130 male college students. Descriptive statistics like the mean, median and mode are calculated to describe the central tendency of the data. Inferential statistics like confidence intervals and p-values help determine whether results from a sample can be generalized to the larger population from which the sample was drawn.

Uploaded by

Vetri Padi

Research Methodology

October 11, 2018


Types of Data
Data Types
• Categorical (Qualitative): Nominal, Ordinal
• Numerical (Quantitative): Discrete, Continuous



Categorical Data
• Represent characteristics
• Numbers can be used to represent them, but the numbers have no mathematical meaning
• Example:
Gender: Male, Female
Communication mode: Phone, Email
Categorical Data
Categorical: Nominal, Ordinal

Nominal
- Used to label variables
- Discrete values
- No order (can switch values)
- Example: Male 1, Female 2 (1 and 2 are labels to represent gender category; you can label Male as 2 and Female as 1)

Ordinal
- Used to label variables
- Discrete values
- With order (cannot switch values)
- Example: UG 1, PG 2 (1 and 2 are ordered labels to represent educational qualification)
Numerical Data

Discrete data
• Represent discrete data
• Can only take certain values
• Cannot be measured but counted
• Example:
1. Number of accidents in a month (= 15)
2. Number of shops in a mall (= 52)

Continuous data
• Represent measurements
• Cannot be counted but measured
• Example:
1. Height of the person (= 141.3)
2. Room Temperature (= 31.4)
Numerical Data
Numerical: Interval, Ratio

Interval data
• Ordered units
• Have the same difference
• No true Zero
• Example:
Temperature: +10, +5, 0, -5, -10

Ratio data
• Ordered units
• Have the same difference
• Have an absolute Zero
• Example:
Height of the plant (in cm): 0, +1, +2, +3
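The practical consequence of "no true zero" can be sketched in a few lines of Python. The slide does not state the temperature unit, so degrees Celsius is an assumption here, and the helper `celsius_to_kelvin` is a hypothetical name introduced only for illustration. Differences on an interval scale survive a change of unit origin, while ratios do not.

```python
def celsius_to_kelvin(c: float) -> float:
    """Shift a Celsius reading onto the Kelvin scale, which has an absolute zero."""
    return c + 273.15

# Two interval-scale readings from the slide (unit assumed to be Celsius).
warm, cool = 10.0, 5.0

# Differences are meaningful on an interval scale: the gap is 5 degrees either way.
diff_celsius = warm - cool
diff_kelvin = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)

# Ratios are not: "10 is twice 5" holds only because of where Celsius puts its zero.
ratio_celsius = warm / cool
ratio_kelvin = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

print(f"difference: {diff_celsius:.2f} (Celsius) vs {diff_kelvin:.2f} (Kelvin)")
print(f"ratio:      {ratio_celsius:.3f} (Celsius) vs {ratio_kelvin:.3f} (Kelvin)")
```

For ratio data such as plant height in cm, the zero is absolute, so a statement like "twice as tall" stays true in any unit of length.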
Statistical Methods

Nominal Data:
Frequency, proportion, percentage, pie chart, bar chart
Ordinal Data:
Frequency, proportion, percentage, percentiles, median, mode, interquartile range, pie chart, bar chart
Continuous Data:
Percentiles, median, interquartile range, mean, mode, standard deviation, range, histogram, box plot
Sample data

S.No  Age (yrs)  Age Group  Gender  No. of times shopped  Amount spent in shopping
1     19         1          Male    6                     12001.50
2     24         2          Female  7                     16455.00
3     28         2          Male    4                     9800.75
4     35         3          Female  3                     4344.00

Identify the data type of each variable:
Age -
Age group –
Gender –
No of times shopped –
Amount spent in shopping -
Sample data: data type of each variable

Age - Numeric, Continuous
Age group – Categorical, Ordinal
Gender – Categorical, Nominal
No of times shopped – Numeric, Discrete
Amount spent in shopping - Numeric, Continuous
Population and Sample
Population
• Any large collection of objects or individuals about
which information is sought
• Example – Indians, students, hospitals or trees

“A study of road accidents in India”

“A study of academic achievement of private school students in America”
Population
Population - Parameter
• Summary number for population
• Pertains only to population
• Population mean - µ ( Greek letter mu)
• Population proportion – p
• Example:
The average weight of all middle-aged female Chinese - µ

The proportion of likely Indian students approving the new education policy - p
Sample
• A group drawn from population that represents the
population
• Example:
To study road accidents in India, take the sample as:
- Road accidents in 10 large cities across India
- Road accidents in 10 medium size cities across India
- Road accidents in 10 small size cities across India
- Road accidents in 30 rural places across India
Sample
Sample - Statistic
• Summary number for sample
• Pertains only to sample
• Sample mean - x̄ (x-bar)
• Sample proportion – p̂ (p-hat)
• Example:
The average weight of a random sample of 100 middle-aged female Chinese - x̄

The proportion in a random sample of 1000 likely Indian students approving the new education policy - p̂
Population and Sample

The main campus at XXX University has a population of approximately 42,000 students. A research question is "what proportion of these students smoke regularly?" A survey was administered to a sample of 987 XXX University students. Forty-three percent (43%) of the sampled students reported that they smoked regularly.

Population, Parameter, Sample, Statistic


Population and Sample

Assume that there exists a population of 7 million college students in the United States today. The average GPA of all of these college students is 2.7 (on a 4-point scale). A random sample of 100 college students was taken, and their average GPA was found to be 2.9.

Population, Parameter, Sample, Statistic


Population and Sample

• It is very difficult to find the population mean directly
• In practice it is almost always impossible to measure every member of the population
• It can be estimated from a sample and its statistics
• It can be estimated using a confidence interval and hypothesis testing

Lower value < Population mean < Upper value


Confidence Interval
Lower value < Population mean < Upper value
• Should using a hand-held cell phone while driving be illegal?
• For example, a newspaper report (ABC News poll, May 16-20, 2001) was concerned with whether or not U.S. adults thought using a hand-held cell phone while driving should be illegal. Of the 1,027 U.S. adults randomly selected for participation in the poll, 69% thought that it should be illegal. The reporter claimed that the poll's "margin of error" was 3%. Therefore, the confidence interval for the (unknown) population proportion p is 69% ± 3%. That is, we can be quite confident that between 66% and 72% of all U.S. adults think using a hand-held cell phone while driving a car should be illegal.
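The reported 3% margin of error can be roughly checked with a short Python sketch. The report does not say how the margin was computed, so two assumptions are made here: a 95% confidence level (z = 1.96) and the usual normal approximation for a sample proportion. The function name `margin_of_error` is hypothetical.

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a sample proportion.

    z = 1.96 corresponds to an assumed 95% confidence level.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.69  # sample proportion reported by the poll
n = 1027      # number of U.S. adults polled

moe = margin_of_error(p_hat, n)
lower, upper = p_hat - moe, p_hat + moe

print(f"margin of error: {moe:.3f}")
print(f"interval: {lower:.3f} to {upper:.3f}")
```

Under these assumptions the margin comes out a little under 3%, which is consistent with the rounded figure quoted in the report.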
Confidence Interval
• Let's take an example of researchers who are interested in the average heart rate of male college students. Assume a random sample of 130 male college students was taken for the study.

The following is the Minitab output of a one-sample t-interval using this data.

[Minitab output not reproduced: One-Sample T: Heart Rate]
Descriptive Statistics & Inferential Statistics
Descriptive Statistics – Measures of Central Tendency
• 3 Ms - Mean, Median, Mode
Mean - Example:
Heart beats per minute for 10 adults are given below:

58, 61, 72, 65, 68, 60, 75, 69, 69, 73


Rearrange them in the ascending order
58, 60, 61, 65, 68, 69, 69, 72, 73, 75

1. Mean is the average of the given data


Mean= (58+ 60+ 61+ 65+ 68+ 69+ 69+ 72+ 73+ 75)/10=67.0
Descriptive Statistics
• 3 Ms - Mean, Median, Mode
Median - Example with even number of data:
Heart beats per minute for 10 adults are given below:
Rearrange them in the ascending order
58, 60, 61, 65, 68, 69, 69, 72, 73, 75
(4 values lie below the middle pair 68, 69 and 4 values lie above it)

2. Median is the score that divides the data into two halves (median is the mid value)
Median = Average of 68 and 69 (since there is an even number of data points) = 68.5
Descriptive Statistics
• 3 Ms - Mean, Median, Mode
Median - Example with odd number of data:
Heart beats per minute for 9 adults are given below:
Rearrange them in the ascending order
58, 60, 61, 65, 68, 69, 72, 73, 75
(4 values lie below the middle value 68 and 4 values lie above it)

2. Median is the score that divides the data into two halves (median is the mid value)
Median = Mid value = 68 (since there is an odd number of data points)
Descriptive Statistics
• 3 Ms - Mean, Median, Mode
Mode - Example:
Heart beats per minute for 10 adults are given below:

58, 61, 72, 65, 68, 60, 75, 69, 69, 73


Rearrange them in the ascending order
58, 60, 61, 65, 68, 69, 69, 72, 73, 75

3. Mode is the most frequently occurring value

Mode = 69
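The three measures worked out by hand above can be reproduced with Python's standard `statistics` module. This is only a minimal sketch using the heart-rate values from the slides; the variable names are chosen for illustration.

```python
import statistics

# Heart beats per minute for 10 adults (from the slides).
heart_rates = [58, 61, 72, 65, 68, 60, 75, 69, 69, 73]

mean_hr = statistics.mean(heart_rates)      # sum of values / number of values
median_hr = statistics.median(heart_rates)  # sorts internally; averages the middle pair
mode_hr = statistics.mode(heart_rates)      # most frequently occurring value

# The 9-value data set used for the odd-count median example.
nine_values = [58, 60, 61, 65, 68, 69, 72, 73, 75]
median_odd = statistics.median(nine_values)  # single middle value

print(mean_hr, median_hr, mode_hr, median_odd)
```

Note that `statistics.median` does not need the data sorted beforehand; the "rearrange in ascending order" step is only required when finding the median by hand.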
Statistical Significance & p-value
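Significance:
A measure of whether the results of research could plausibly be due to chance alone
p-value:
The way in which significance is reported statistically: the probability of obtaining results at least as extreme as those observed if only chance were at work (that is, if there were no real effect)
Example:
p<0.01 means that, if there were no real effect, results at least this extreme would occur less than 1% of the time

Generally the significance threshold for the p-value is set at 0.05 or 0.01
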


Statistical Significance & p-value
Example:
• A study had one group of students (Group A) study using notes they took in class; the other group (Group B) studied using notes they took after class using a recording of the lecture. Students in Group A scored higher on a test than Group B. The study reports a significance of p<.01 for the results.
• This means that if the note-taking method made no real difference, a gap this large between the groups would arise by chance (such as Group A happening to contain stronger students than Group B) less than 1% of the time. Whatever the reason students who took notes in class did better on the test, chance alone is an unlikely explanation.
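One way to see what such a p-value measures is an exact permutation test, sketched below in Python. The slides do not say which test the study used, and they give no scores, so both the method and the data here are assumptions: the ten test scores are hypothetical values invented purely for illustration.

```python
from itertools import combinations

# Hypothetical test scores, invented for illustration only (not from the study).
group_a = [85, 90, 88, 92, 87]  # studied from notes taken in class
group_b = [78, 82, 80, 84, 79]  # studied from notes taken after class

pooled = group_a + group_b
observed_sum_a = sum(group_a)

# If the note-taking method made no difference, any 5 of the 10 scores were
# equally likely to end up labelled "Group A". Enumerate every such split.
splits = list(combinations(pooled, len(group_a)))

# Group sizes and the pooled total are fixed, so comparing Group A sums is
# equivalent to comparing the difference in group means.
at_least_as_extreme = sum(1 for s in splits if sum(s) >= observed_sum_a)

# One-sided p-value: share of splits favouring Group A at least as strongly
# as the split that was actually observed.
p_value = at_least_as_extreme / len(splits)

print(f"{at_least_as_extreme} of {len(splits)} splits, p = {p_value:.4f}")
```

With these made-up scores only the observed split is that extreme, so the p-value falls below 0.01: a gap this large would rarely appear if the labels were assigned by chance.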

Descriptive Statistics – Measures of Dispersion
Measures of dispersion:
Quantities that describe the spread of the data, or the variation around the central value

- Range
- Variance
- Standard Deviation
- Interquartile Range
Descriptive Statistics – Measures of Dispersion
Range:
• Difference between the largest and smallest sample values
• Depends only on extreme values and provides no information about how the remaining data is distributed

Example:
Heart beats per minute for 10 adults are given below:
58, 60, 61, 65, 68, 69, 69, 72, 73, 75

Range = 75-58 = 17
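Descriptive Statistics – Measures of Dispersion
Variance and Standard deviation (for sample):
Measure the degree of spread in a variable's values

Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{s^2}$

n = number of observations
x = variable value
x̄ = mean value

Note: For a population, the variance becomes:
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, where µ is the population mean and N is the population size
Variance
Coefficient of variation is the ratio between the standard deviation and the mean, expressed as a percentage. This can be either (σ / µ)*100 or (s / x̄)*100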
Variance - Example
The temperatures in chemical reactor A and chemical reactor B were measured every half hour under the same conditions.

For A: 78.1°C, 79.2°C, 78.9°C, 80.2°C, 78.3°C, 78.8°C, 79.4°C
For B: 78.5°C, 79.1°C, 80.1°C, 80.2°C, 78.6°C, 78.7°C, 78.1°C

Which one of the reactors is better in terms of maintaining the temperature over the measured period?
Variance – Example - Solution
Chemical reactor A:

Temp x    x - Mean       x - Mean    (x - Mean)²
78.1      78.1 - 79.0    -0.89       0.78
79.2      79.2 - 79.0     0.21       0.05
78.9      78.9 - 79.0    -0.09       0.01
80.2      80.2 - 79.0     1.21       1.47
78.3      78.3 - 79.0    -0.69       0.47
78.8      78.8 - 79.0    -0.19       0.03
79.4      79.4 - 79.0     0.41       0.17
Sum of (x - Mean)² = 2.99
Mean Temp x = 79.0 (rounded; the deviations above use the unrounded mean, 78.99)
Variance – Example - Solution
Chemical reactor B:

Temp y    y - Mean       y - Mean    (y - Mean)²
78.5      78.5 - 79.0    -0.54       0.29
79.1      79.1 - 79.0     0.06       0.00
80.1      80.1 - 79.0     1.06       1.12
80.2      80.2 - 79.0     1.16       1.34
78.6      78.6 - 79.0    -0.44       0.20
78.7      78.7 - 79.0    -0.34       0.12
78.1      78.1 - 79.0    -0.94       0.89
Sum of (y - Mean)² = 3.96
Mean Temp y = 79.0 (rounded; the deviations above use the unrounded mean, 79.04)
Variance – Example - Solution
Chemical reactor A:
Mean Temp x = 79.0
Sum of (x - Mean)² = 2.99
Variance = Sum of (x - Mean)² / (n - 1)
         = 2.99 / (7 - 1)
         = 2.99 / 6
Variance = 0.4983
Std devn = SQRT of Variance
         = SQRT of 0.4983
Std devn = 0.7059

Chemical reactor B:
Mean Temp y = 79.0
Sum of (y - Mean)² = 3.96
Variance = Sum of (y - Mean)² / (n - 1)
         = 3.96 / (7 - 1)
         = 3.96 / 6
Variance = 0.6600
Std devn = SQRT of Variance
         = SQRT of 0.6600
Std devn = 0.8124
Variance – Example - Solution
Chemical reactor A:
Mean Temp x = 79.0
Coeff of variation = (S.D / Mean) * 100
                   = (0.7059 / 79.0) * 100
                   = 0.89%

Chemical reactor B:
Mean Temp y = 79.0
Coeff of variation = (S.D / Mean) * 100
                   = (0.8124 / 79.0) * 100
                   = 1.03%

A has a smaller variance, standard deviation and coefficient of variation than B, which means the temperature in A is less spread out. Hence Reactor A is better, as its temperature deviates less than that of B.
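The reactor comparison can be reproduced with Python's standard `statistics` module, as a quick check on the hand calculation. This is a minimal sketch; the helper name `coefficient_of_variation` is hypothetical. Because the code works with unrounded intermediate values, its results differ from the slide figures in the third or fourth decimal place (for example, a variance of about 0.4981 for reactor A rather than 0.4983).

```python
import statistics

# Temperatures in °C, measured every half hour (from the slides).
reactor_a = [78.1, 79.2, 78.9, 80.2, 78.3, 78.8, 79.4]
reactor_b = [78.5, 79.1, 80.1, 80.2, 78.6, 78.7, 78.1]

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# statistics.variance and statistics.stdev use the sample (n - 1) formulas.
var_a, var_b = statistics.variance(reactor_a), statistics.variance(reactor_b)
sd_a, sd_b = statistics.stdev(reactor_a), statistics.stdev(reactor_b)
cv_a, cv_b = coefficient_of_variation(reactor_a), coefficient_of_variation(reactor_b)

print(f"A: variance={var_a:.4f}, std dev={sd_a:.4f}, CV={cv_a:.2f}%")
print(f"B: variance={var_b:.4f}, std dev={sd_b:.4f}, CV={cv_b:.2f}%")
```

All three measures come out smaller for reactor A, matching the conclusion above. For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead of n - 1.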