BRM Unit 3 & 5 Data Analysis

The document discusses different types of data and methods of data analysis. It defines categorical and continuous data and different types of each. It also explains various steps and methods used for data preparation, summarization, and analysis including tabulation, graphical representation, descriptive and inferential statistics.

Uploaded by

Aman Singh


Dr. Urooj A Siddiqui
 Data – Raw Facts, especially numerical facts,
collected together for reference or
information.
 Data is collected on some particular
variable/s
 Data analysis is processing of data to derive
useful information
 Information – knowledge communicated concerning some
particular fact
 The created knowledge helps in APPLICATION /
DECISION MAKING
 Categorical:Qualitative
 Continuous: Quantitative

Data
   Categorical: Nominal, Ordinal
   Continuous: Interval, Ratio

 Variable: any phenomenon which takes at least two
different values/observations

 Data: the set of values/observations collected on a
variable
 Nominal
 Ordinal
 Interval
 Ratio
1. Data Preparation / Initial Operations
 Editing / Cleaning
 Coding
 Classification
 Tabulation
 Graphical Representation

2. Summarizing Data / Data Analysis Operations
 Tables / Crosstab
 Graph / Figure
 Statistical Analysis
   1. Descriptive Methods
       Frequency, %age, Ratio
       Mean, Median, Standard Deviation (Variance)
   2. Inferential Methods
       Comparison (t/z-test/Anova)
       Association (chi square test)
       Correlation (r)
       Prediction / Regression (y = ax + b)
 Editing / Data Cleaning
    examining the collected raw data to detect any errors
     and to correct or omit them where possible
 Coding
    assigning numerals to answers so that responses can
     be put into a limited number of categories
 Classification
    grouping of data on some basis (a large volume of raw
     data is reduced into homogeneous groups)
   I. Attribute – on the basis of demographic characteristics,
      e.g. gender, rural/urban, day scholar/hosteller
   II. Class Interval – on the basis of some numeric range,
      e.g. 0-10, 10-20 etc.
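The coding and class-interval steps above can be sketched in a few lines of Python. This is only an illustration: the codebook values and the helper names `code_response` and `class_interval` are hypothetical, and the sketch assumes each interval includes its lower limit and excludes its upper limit (the slides do not say which convention applies).

```python
# Hypothetical codebook: the numerals are arbitrary labels, not values from the slides.
GENDER_CODES = {"M": 1, "F": 2}


def code_response(answer, codebook):
    """Coding: map a raw answer to its numeral; None flags an answer that needs editing."""
    return codebook.get(answer)


def class_interval(value, width=10):
    """Classification: return the class interval (lower-upper) that contains value.

    Assumption: the lower limit is included and the upper limit is excluded.
    """
    lower = (value // width) * width
    return f"{lower}-{lower + width}"


coded = [code_response(g, GENDER_CODES) for g in ["M", "F", "F", "?"]]
groups = [class_interval(v) for v in [3, 10, 17]]
print(coded)   # [1, 2, 2, None]
print(groups)  # ['0-10', '10-20', '10-20']
```

The `None` in the coded list marks a response that the editing step would have to correct or omit.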
I. Tabulation
 is the process of displaying raw data in tabular
form and summarising it for further analysis
 orderly arranging data in columns and rows
Tabulation is essential because
 It conserves space and reduces explanatory statements
 It facilitates summation of items, comparison, and
detection of errors and omissions
 It provides a basis for various statistical computations
Name     Gender  Caste   Age  Mob. No.    Edu    Yrs in school  IQ  Pain level  Temp of locality (deg cel)
Ram      M       Hindu   60   9450366367  NIL    0              16  Mild-0      -4
Akbar    M       Muslim  65   8004896712  HS     16             14  Mod-1       20
Sita     F       Hindu   309  9934876545  Int.   19             0   Mild-0      15
Shalini  F       Hindu   90   2542543598  HS     8              16  Mild-0      0
Mehnaj   F       Sikh    38   9458098734  UG     21             13  Severe-2    0
Ravi     M       Hindu   48   9412890112  PG     23             20  Mod-1       -1
Hari     M       Hindu   45   8796654398  Prim   12             10  Mod-1       30

Name  Gender  Caste  Age  Mob. No.    Edu level  Yrs in sch.  IQ  Pain level  Temp of locality (deg cel)
7     1       1      60   9450366367  1          0            16  0           4
2     1       2      65   8004896712  1          16           14  2           20
5     2       1      35   9934876545  2          19           0   0           15
4     2       1      90   2542543598  1          8            16  0           0
3     2       3      38   9458098734  3          21           13  3           0
6     1       1      48   9412890112  4          23           20  2           -1
1     1       1      45   8796654398  0          12           10  2           30

Nominal & Ordinal data are called qualitative; Interval and Ratio data are called quantitative
 Single / Multi Variable Table - one or more variables (no interaction)

Raw data
Roll No.  Age (yr)
1         22
2         24
3         23
4         26
5         19
6         25
. .
60        22

Single Variable Freq. Table
Age Group (years)  Freq.
Below 20           2
20-22              28
22-24              16
24-26              10
Above 26           4
Total              60

**Multiple Variable Table – as presented in above slide
 Crosstabs – interaction of two or more
variables
Two Variable Interaction – Crosstab

             Gender
Age Group    Male  Female  Total
Below 20     1     1       2
20-22        18    10      28
22-24        9     7       16
24-26        7     3       10
Above 26     3     1       4
Total        38    22      60
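As a rough sketch of how a frequency table and a crosstab are built from raw records, the following uses Python's `collections.Counter`. The six records are made up for illustration and are not the 60 students summarised above.

```python
from collections import Counter

# Illustrative records (age group, gender); not the 60 students from the slides.
records = [
    ("20-22", "Male"), ("20-22", "Female"), ("22-24", "Male"),
    ("20-22", "Male"), ("24-26", "Female"), ("22-24", "Male"),
]

# Single-variable frequency table: count each age group.
freq = Counter(group for group, _ in records)

# Crosstab: count each (age group, gender) combination.
crosstab = Counter(records)

for group in sorted(freq):
    male = crosstab[(group, "Male")]
    female = crosstab[(group, "Female")]
    print(f"{group}: Male={male} Female={female} Total={freq[group]}")
```

A `Counter` returns 0 for combinations that never occur, so empty cells of the crosstab need no special handling.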
Graphical Representation of Data
 Pie Chart
 Bar Graph
 Histogram
 Line Graph
 Scatter Plot
 Scatter Plot & Correlation
Pie Charts
 It is used to represent %ages, distribution of 1
variable at various levels

[Figure: pie chart "Sales (in mn)" by quarter (1st Qtr to 4th Qtr); slice labels 8.2 (58%), 3.2 (23%), 1.4 (10%), 1.2 (8%)]
Bar Chart
 It is used to represent 1 variable at various levels
 Levels can be year/ groups etc.

[Figure: bar chart "Sales" by year, 2018 to 2021; bar labels 2.5, 3.5, 4.3, 4.5]
Bar Chart
[Figure: clustered bar chart; one bar per quarter (1st to 4th) within each year, 2018 to 2020]
Histogram
 To show the distribution of a quantitative variable
[Figure: histogram of the Roll No. / Age (yr) data; x-axis: Class Interval/Variable Unit (10 to 50), y-axis: Frequency]
Line Diagram
 To show change in variable in a particular time
period / on some reference range

[Figure: line chart; x-axis: Last 10 Days (1 to 10), y-axis: Stock Price (₹ 5.60 to ₹ 7.40)]
Line Diagram
 May also be used to compare 2 or more variables
along the range
[Figure: line chart comparing Adani, Tata and Reliance over periods 1 to 8; y-axis 0 to 14]
Scatter Plot
 It is used to express relationships between two
variables
[Figure: scatter plot; x-axis: Adv Budget in 10 Lacs (0 to 4), y-axis: Sales in Crore (0 to 6)]
Scatter Plot
 Trend Lines - Correlation
Income / day  No. of families
0-500         20
500-1000      30
1000-1500     50
1500-2000     70
2000-2500     40
2500-3000     30
3000-3500     10

[Figure: scatter plot with trend line; x-axis: Income (0 to 4000), y-axis: No. of families (0 to 80)]
Student  age (xi)  x̄ - xi  (x̄ - xi) sqr.
A        21        2        4
B        22        1        1
C        23        0        0
D        24        -1       1
E        25        -2       4
Sum                0        10

Mean (x̄) = 23
Avg Sq (variance) = 10 / 5 = 2, n = 5
SD (root of variance), s = 1.41
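The arithmetic in the table above can be reproduced with a short Python sketch. It divides the sum of squared deviations by n, as the slide does; the variable names are chosen for this illustration only.

```python
import math

ages = [21, 22, 23, 24, 25]  # students A to E from the table above
n = len(ages)

mean = sum(ages) / n                             # 23.0
squared_devs = [(mean - x) ** 2 for x in ages]   # 4, 1, 0, 1, 4
variance = sum(squared_devs) / n                 # 10 / 5 = 2.0 (divides by n, as in the slide)
sd = math.sqrt(variance)                         # about 1.41

print(mean, variance, round(sd, 2))  # 23.0 2.0 1.41
```

Dividing by n − 1 instead (the usual sample formula) would give a variance of 2.5 and an SD of about 1.58.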
Raw data (Roll No.: Age in yr) – 1: 22, 2: 24, 3: 23, 4: 26, 5: 19, 6: 22, . . ., 60: 22

Age Group (years)  Freq.  Probability
Below 20           2      2/60
20-22              28     28/60
22-24              16     16/60
24-26              10     10/60
Above 26           4      4/60
Total              60

Mean = 23 years (x̄ – sample, known; µ – population, unknown)
SD = 2 years (s – sample, known; σ – population, unknown)
A distribution of the frequencies of observations is
known as a probability distribution

 Z – Normal Distribution/Test – Mean (µ), SD (σ)
    To compare means (1 or 2 means)
 t – Distribution/Test – Mean (x̄), SD (s)
    To compare means (1 or 2 means)
 Chi Square Distribution / Test
    To compare sample SD with population SD
 F Test
    To compare two sample variances
Normal distribution: a freq. distribution with a bell-shaped
curve and some known properties
 Parameters – Mean (µ), SD (σ)
 Known properties
    About 68% of values are within µ ± 1 SD
    About 95% of values are within µ ± 2 SD
    About 99.7% of values are within µ ± 3 SD

 95% CI = µ ± 2 × SD (range)

    Lower limit: µ − 2 × SD
    Upper limit: µ + 2 × SD
[Figure: normal curve centred at 23, with 21 and 25 at ± 1 SD, 19 and 27 at ± 2 SD, 17 and 29 at ± 3 SD]
Example of our case
 95% CI = µ ± 2 × SD
 Lower limit = µ − 2 × SD, Upper limit = µ + 2 × SD
 LL = 23 − 2 × 2 = 19, UL = 23 + 2 × 2 = 27
 95% CI Range = 19-27 years
 95% of the students in the class are in the range
of 19-27 yrs
 We are 95% confident that if we randomly select
a student from the class his/her age will be
within this range (19-27 yrs)
 Reverse is Hypothesis Testing
 If mean and SD of any population is known and if
some value is given can we determine whether it
belongs to this population or distribution ?
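A minimal Python sketch of the example above computes the µ ± 2 SD range and then asks the reverse question for a single value. The helper name `within_range` is made up for this illustration.

```python
mean, sd = 23, 2        # class mean age and SD from the example above

lower = mean - 2 * sd   # 19
upper = mean + 2 * sd   # 27


def within_range(age):
    """True if an age falls inside the mean ± 2 SD range."""
    return lower <= age <= upper


print(lower, upper)                        # 19 27
print(within_range(22), within_range(30))  # True False
```

An age of 30 falls outside the range, so it would be an unusual value for this class; that is the intuition behind hypothesis testing.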
[Figure: normal curve with a z-score axis marked at 0, ± 0.5, ± 1 and ± 1.5]
Finding Probability
 Calculate z score (test statistic) of the observed
value or hypothesized value with the formula below
 Determine p value associated with the particular z
score at the selected significance level (5%)
 P value can be seen in the tables of the particular
test

When Population SD is KNOWN: z = (x̄ − µ) / (σ / √n)
When Population SD is UNKNOWN: t = (x̄ − µ) / (s / √n)
 Two types of Hypothesis, Null - H0, Alternate - Ha
P Value Method
 Determine p value
 Compare with selected alpha level (0.05)
 p ≤ 0.05 – Reject Null
 p > 0.05 – Fail to Reject null / accept null
 This method is generally employed by data analysis
software – Excel, SPSS

Table Value Method
 Calculate test statistic value – TSCal
 Determine Critical value of test statistic at
selected significance level – TSTab
 If TSCal ≥ TSTab – Reject Null
 If TSCal < TSTab – Fail to Reject null / accept null
 This method is generally employed when manual
testing is done
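Both decision rules can be tried side by side in a short Python sketch. It assumes a one-sample, two-tailed z-test with z = (x̄ − µ) / (σ / √n); the function name `z_test` and the input numbers are hypothetical. `statistics.NormalDist` from the standard library stands in for the printed z table.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """One-sample, two-tailed z-test.

    Returns (z, p_value, reject_by_p_value, reject_by_table_value).
    """
    z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))  # TSCal
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))            # P value method
    critical = NormalDist().inv_cdf(1 - alpha / 2)          # TSTab, about 1.96 at alpha = 0.05
    return z, p_value, p_value <= alpha, abs(z) >= critical


# Hypothetical numbers: 36 students with mean age 24, tested against µ = 23, σ = 2.
z, p, reject_p, reject_table = z_test(sample_mean=24, pop_mean=23, pop_sd=2, n=36)
print(round(z, 2), reject_p, reject_table)  # 3.0 True True
```

The two methods agree, which is expected: comparing p with alpha and comparing TSCal with TSTab are two views of the same test.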
RN  Gender (G)  Caste (C)  Age (A)  Mob. No.    No. of Classes (N)  Marks Obtained (M)  Specialization Opted (S)
1   1           1          22       9450366367  87                  72                  HR-3
2   1           2          24       8004896712  65                  68                  HR-3
3   2           1          26       9934876545  48                  56                  Fin.-2
4   2           1          21       2542543598  95                  83                  Mktg.-1
5   2           3          22       9458098734  65                  58                  Fin.-2
6   1           1          23       9412890112  74                  65                  Mktg.-1

• Mean & Variance (SD) – e.g. A, N, M – sample statistics – x̄, s
• Correlation – e.g. N-M, A-N, A-M – r
• Association between Gender and Specialization Opted (G and S) – chi square
Note: a sample characteristic is a Statistic; a population characteristic is a Parameter
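As an illustration of the correlation statistic r, the sketch below computes Pearson's r between No. of Classes (N) and Marks Obtained (M) for the six students in the table above. The function name `pearson_r` is chosen for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


classes = [87, 65, 48, 95, 65, 74]  # N: no. of classes, RN 1 to 6
marks = [72, 68, 56, 83, 58, 65]    # M: marks obtained, RN 1 to 6

r = pearson_r(classes, marks)
print(round(r, 2))  # 0.91
```

A value near +1 indicates that students who attended more classes tended to score higher marks in this small sample.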
 Assume a population – N, µ, σ
 Now assume we take many samples of size n and
calculate the mean for each sample
    x̄1, x̄2, x̄3, x̄4, x̄5, x̄6, . . . . . . . . x̄100
 Can we make a freq. distribution of these values
and draw a curve?
 When we draw a distribution of these values
we will have an average (x̄) and an SD (s)
 This average is called the mean of means and is
taken as an estimate of the population mean
 The SD of this distribution of sample means is σ/√n
(estimated by s/√n), which is called the Standard Error

 Sample mean & their difference – z / t
 Sample correlation statistic – z / t (derived from r)
 Variance (SD²) – F
 Association – Chi Sqr.
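The idea of a distribution of sample means can be checked with a small simulation. This is only a sketch: the population is randomly generated (roughly normal, mean 23, SD 2) and the sample size and number of samples are arbitrary choices. It compares the SD of the sample means with the standard error σ/√n.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 ages, roughly normal around 23 with SD 2.
population = [random.gauss(23, 2) for _ in range(10_000)]
sigma = statistics.pstdev(population)

n = 25  # size of each sample
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(1_000)
]

mean_of_means = statistics.mean(sample_means)   # should sit close to the population mean
observed_se = statistics.pstdev(sample_means)   # SD of the sample means
theoretical_se = sigma / math.sqrt(n)           # standard error formula

print(round(mean_of_means, 2), round(observed_se, 2), round(theoretical_se, 2))
```

The mean of means should land close to the population mean, and the observed SD of the sample means close to σ/√n.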

 Central Limit Theorem

 If we collect many samples and draw the
distribution of their means, the mean of this
distribution is the population mean (µ) and its SD
is the standard error, σ/√n
 We use CLT in Hypothesis Testing
    z – when σ is Known and sample size is ≥ 30
    t – when σ is Unknown and sample size is < 30
 In sample estimation the t test is employed

 Example – H0 & H1
    H0 – There is no difference b/w the means of two groups
    H1 – There is a significant difference b/w the means of two groups
    H0 – There is no difference b/w mean marks of males &
     females
    H1 – There is a significant difference b/w mean marks of males & females
 Hypothesis Testing steps
    Set Null Value (µ1 = µ2, i.e. µ1 − µ2 = 0) – Make Null Distribution –
     Calculate z/t sample test statistic – compare with table
     value / p value – reject / fail to reject (accept) null
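The comparison of two group means can be sketched with the Marks Obtained column of the six-student table shown earlier, split by gender code. The sketch assumes a pooled (equal-variance) two-sample t statistic; the slides do not say which variant is intended, and the function name `pooled_t` is made up here.

```python
import math
import statistics


def pooled_t(group1, group2):
    """Two-sample t statistic assuming equal variances; returns (t, degrees of freedom)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se, df


marks_g1 = [72, 68, 65]  # RN 1, 2, 6 (gender code 1)
marks_g2 = [56, 83, 58]  # RN 3, 4, 5 (gender code 2)

t, df = pooled_t(marks_g1, marks_g2)
T_TAB = 2.776            # two-tailed critical t at alpha = 0.05, df = 4
print(round(t, 2), df, abs(t) >= T_TAB)  # 0.3 4 False
```

Since TSCal is below TSTab, the null hypothesis of equal mean marks is not rejected for this tiny sample.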
F Test
 Used to compare the variances of two samples
 Employed in ANOVA – analysis of variance
    When there are more than two groups and their
     means are to be compared
 Example
    Comparison of marks among three streams of
     students: arts, commerce and science
    H0 – There is no difference among mean marks of three groups
    H1 – There is a significant difference among mean marks of three
     groups

 Set Null Value (µ1 = µ2 = µ3) – Make Null Distribution – Calculate F
test statistic – compare with table value / p value – reject / fail to reject (accept)
null
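A one-way ANOVA F statistic for the three-stream example can be sketched as follows. The marks are invented for illustration and the function name `one_way_anova_f` is hypothetical; the calculation itself is the standard ratio of between-group to within-group mean squares.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n_total = len(groups), len(all_values)

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical marks for three streams; not data from the slides.
arts = [60, 62, 64]
commerce = [65, 67, 69]
science = [70, 72, 74]

f, df_b, df_w = one_way_anova_f([arts, commerce, science])
print(round(f, 2), df_b, df_w)  # 18.75 2 6
```

The calculated F is then compared with the table value of F at (2, 6) degrees of freedom, or converted to a p value by software.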
Test of Independence

 It is used to determine association between two
categorical variables (nominal & ordinal)
 Example
    Gender (M/F) and Opted Specialization (Mktg./Fin./HR)
    Questions like ‘is any specialisation preferred by
     females?’ are answered
    H0 – There is no association b/w gender and opted specialisation
    H1 – There is a significant association b/w gender & opted
     specialisation
 Here, the mean is not calculated; instead the frequency of categories
is taken into consideration
 Actual Frequency and Expected Frequency
 Cross tabs are used to calculate actual & expected freq.

Two Variable Interaction – Crosstab

                                  Gender
Opted Specialization  Total (60)  Male (40)  Female (20)
Mktg.                 30          20         8
Fin.                  15          10         2
HR                    15          10         10
Total                 60          40         20

 Hypothesis Testing steps

 Set Null Value (actual freq. = expected freq.) – Make Null
Distribution – Calculate chi sqr. sample test statistic –
compare with table value/set p value – reject/accept null
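The chi-square calculation can be sketched in Python using the Male and Female columns of the crosstab above as the actual (observed) frequencies. As an assumption of this sketch, the row totals are recomputed from those two columns rather than read from the Total column, and the expected frequency of each cell is row total × column total / grand total.

```python
# Observed counts taken from the Male and Female columns of the crosstab above.
# Row totals are recomputed from these counts (an assumption of this sketch).
observed = {
    "Mktg.": {"Male": 20, "Female": 8},
    "Fin.": {"Male": 10, "Female": 2},
    "HR": {"Male": 10, "Female": 10},
}

row_totals = {r: sum(cols.values()) for r, cols in observed.items()}
col_totals = {"Male": 0, "Female": 0}
for cols in observed.values():
    for c, count in cols.items():
        col_totals[c] += count
grand_total = sum(row_totals.values())

chi_sq = 0.0
for r, cols in observed.items():
    for c, obs in cols.items():
        expected = row_totals[r] * col_totals[c] / grand_total  # expected frequency
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)
CHI_TAB = 5.991  # critical chi-square at alpha = 0.05, df = 2
print(round(chi_sq, 2), df, chi_sq >= CHI_TAB)  # 4.29 2 False
```

With these counts the calculated value is below the table value, so the null hypothesis of no association is not rejected at the 5% level.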
 Set Null and Alternate Hypothesis – H0 H1
 Select the null value
 Null – status quo, no difference, no effect
 Status quo – no change
 No difference – 0 difference
 No relationship – 0 effect / 0 correlation
 No association – 0 relationship (b/w nominal variab.)
 It is assumed that H0 is true in population
 Draw Null Distribution – find range of expected values
if null is true (µ ± 2.SE)
 Take observed value from sample and compare with
expected null values
 If observed value is among expected null range –
accept null
 If observed value is different from null range – reject
null
1. Univariate / Bi-variate
 Mean/Variance Estimation
 Z test
 T test
 Chi Square
 F Test

2. Multi-variate
 Correlation
 Regression
 Discriminant
 Cluster Analysis etc.
 Correlation
 Regression analysis
 1 dependent variable/DV (continuous)
 many independent variables/IV (continuous)
 Y = a.x1 +b.x2 +c.x3…….+.x.n

 Discriminant analysis
 1 dependent variable (categorical)
 many independent variables (continuous)
 Z (yes/no) = a.x1 +b.x2 +c.x3…….+.x.n
 Cluster analysis
 No DV/IV
 Used to group respondents/customers into
various clusters
 Employed in market segmentation

 Factor analysis
 No DV/IV
 Used to group variables into clusters of
more condensed variables (factors)
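For the regression model listed above, the simplest case is one independent variable, y = ax + b. The sketch below fits a and b by ordinary least squares; the data are invented and the function name `fit_line` is hypothetical. Extending it to many independent variables would normally be left to statistical software.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    b = mean_y - a * mean_x
    return a, b


# Hypothetical data: advertising budget (x) and sales (y).
budget = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

a, b = fit_line(budget, sales)
print(a, b)       # 2.0 1.0
print(a * 5 + b)  # predicted sales for a budget of 5: 11.0
```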
