
FBR & IT Applications: Compiled and Presented by DR - Deepak Joshi For Academic Use Only

This document contains summaries of multiple topics including: 1) A story about Cinderella and how she was mistreated by her stepmother and stepsisters but was able to attend the ball. 2) An introduction to descriptive statistics including measures of central tendency like mean, median, and mode. 3) A discussion of additional statistical concepts like dispersion, skewness, kurtosis, and normal distribution. 4) Brief instructions on how to import an Excel file into SPSS.


21-11-2023

https://www.ibm.com/account/reg/in-en/signup?formid=urx-19774

FBR & IT Applications

Compiled and Presented by Dr.Deepak Joshi for Academic Use
Only

Story
• Once upon a time, there was a beautiful girl named Cinderella. She was 20 years old. She had blue eyes, long golden hair and a fair complexion. She lived unhappily with her two stepsisters and their mother. They treated Cinderella very badly. One day, an invitation to a ball at the palace arrived, but Cinderella's stepmother would not let her go. Cinderella was made to sew two new party gowns each for her stepmother and stepsisters, and to curl their hair. They then went to the ball, leaving Cinderella alone at home.

Data Visualisation

Introduction to Descriptive Statistics: Basic Statistical Analysis

• Suppose I want to find out how much a student of this class spends on food monthly.
• Mean (average): typically skewed by outliers (values that are too high or too low)

Now let's say this is the food spend (Rs. per month)

Student   Case 1 (8 students)   Case 2 (9 students)
1.        4200                  4200
2.        4800                  4800
3.        5000                  5000
4.        5200                  5200
5.        5500                  5500
6.        5600                  5600
7.        5600                  5600
8.        6000                  6000
9.                              60000
Total     41900                 101900
Mean      41900/8 = Rs. 5237.50 101900/9 = Rs. 11,322/- (approx.)

• Median: the midpoint of all the data. It is not skewed by outliers, but it is rarely used in further calculations.
• Arrange the data from lowest to highest (Highest minus Lowest = Range)
• Take the middlemost value if the count is odd; if it is even, take the average of the two middlemost values
Thus the median of the second case is 5500 (the spend of an average student, NOT the average spend of a person, as we sorted the spend first)
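
The mean and median of the two food-spend cases can be checked with a short sketch. It uses only Python's standard `statistics` module; the variable names are illustrative and not part of the slides.

```python
from statistics import mean, median

# Monthly food spend (Rs.) for the 8 students listed on the slide
spend_8 = [4200, 4800, 5000, 5200, 5500, 5600, 5600, 6000]
# Second case: the same students plus one outlier who spends 60000
spend_9 = spend_8 + [60000]

mean_8, median_8 = mean(spend_8), median(spend_8)
mean_9, median_9 = mean(spend_9), median(spend_9)

# The outlier roughly doubles the mean but barely moves the median
print(f"8 students: mean={mean_8:.2f}, median={median_8}")
print(f"9 students: mean={mean_9:.2f}, median={median_9}")
```

The median shifts only from 5350 to 5500 when the outlier is added, which is why it is often preferred for skewed data such as spend or income.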

Which is the most preferred food item of this class..?

• Mode: the most common value in the data set

• The value with the highest frequency

Rating   People   Total rating (rating x people)
5*       91       455
4*       21       84
3*       14       42
2*       08       16
1*       31       31
Total    165      628

• Avg Rating = Total Rating / No. of People
• 628/165 = 3.8
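
As a rough check of the arithmetic, the average rating is a weighted mean of the star values, and the mode is the star value with the most votes. The dictionary layout below is just one convenient way to hold the counts from the slide.

```python
# Star rating -> number of people who gave it (counts from the slide)
ratings = {5: 91, 4: 21, 3: 14, 2: 8, 1: 31}

total_people = sum(ratings.values())
total_points = sum(stars * count for stars, count in ratings.items())

average_rating = total_points / total_people
# Mode: the rating with the highest frequency
mode_rating = max(ratings, key=ratings.get)

print(f"average = {average_rating:.1f}, mode = {mode_rating} stars")
```

Note that the average (3.8) and the mode (5 stars) tell different stories here, because 31 people gave the lowest rating.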

We Discussed Measures of Central Tendency

• By using the mean, median & mode we are trying to find one number that minimises error and is representative of the middle.
• The mean minimises the large errors (it gives the minimum sum of squared errors)
• The median minimises the effect of outliers (as correct as possible; it gives the minimum summed absolute error)
• The mode minimises the odds that we go wrong; it gives the most correct guesses
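
The three "minimises error" claims can be illustrated numerically. The sketch below uses a small hypothetical data set (not from the slides) and a coarse grid of candidate guesses; it is an illustration under those assumptions, not a proof.

```python
from statistics import mean, median, mode

data = [1, 2, 2, 3, 12]  # hypothetical values with one outlier

def sum_squared_error(guess):
    return sum((x - guess) ** 2 for x in data)

def sum_absolute_error(guess):
    return sum(abs(x - guess) for x in data)

def exact_hits(guess):
    return sum(1 for x in data if x == guess)

# Candidate guesses 0.0, 0.5, ..., 12.0
candidates = [i / 2 for i in range(0, 25)]

best_for_squared = min(candidates, key=sum_squared_error)    # matches the mean
best_for_absolute = min(candidates, key=sum_absolute_error)  # matches the median
best_for_hits = max(candidates, key=exact_hits)              # matches the mode

print(best_for_squared, mean(data))
print(best_for_absolute, median(data))
print(best_for_hits, mode(data))
```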

Most of Us Like NORMAL

Domino's assures 30-minute delivery; let's say its standard deviation is 2 minutes.

Pic credits: https://www.geyerinstructional.com/

• Dispersion:
• Indicates the degree to which data spreads around an average value (central tendency, CT)
• Range, Inter Quartile Range, Std Dev & Variance

• Skewness:
• Indicates symmetry in data
• A dataset is symmetric if its distribution is spread evenly to the left and right of the midpoint
• +ve score: right skewed; outliers are on the right side, hence the mean is pulled to the right. -ve score: left skewed

• Kurtosis:
• Indicates concentration around the central part and measures whether the data is heavily tailed or lightly tailed relative to the normal distribution
• Datasets with low kurtosis do not concentrate heavily around the midpoint
• Normally distributed data has near 0 skewness and near 3 kurtosis
a: Kurtosis > 3, i.e. Leptokurtic
b: Kurtosis = 3, i.e. Normal
c: Kurtosis < 3, i.e. Platykurtic
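
A minimal sketch of how skewness and kurtosis can be computed from central moments. It uses the plain moment-based definitions, for which a normal distribution has kurtosis 3. Many packages instead report excess kurtosis (kurtosis minus 3) with small-sample corrections, so software output may differ. The function name and both data sets are illustrative.

```python
def skewness_and_kurtosis(data):
    """Moment-based skewness (m3 / m2^1.5) and kurtosis (m4 / m2^2)."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    m4 = sum((x - mu) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

# A symmetric data set: skewness should be zero
skew_sym, kurt_sym = skewness_and_kurtosis([1, 2, 3, 4, 5])

# Food-spend style data with one large value on the right: positive skew
skew_out, kurt_out = skewness_and_kurtosis(
    [4200, 4800, 5000, 5200, 5500, 5600, 5600, 6000, 60000]
)

print(f"symmetric: skew={skew_sym:.3f}, kurtosis={kurt_sym:.3f}")
print(f"outlier:   skew={skew_out:.3f}, kurtosis={kurt_out:.3f}")
```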

• ND is key to statistics (Central Limit Theorem, CLT): averages calculated from independent, identically distributed random variables have an approximately normal distribution. (As the sample size increases, sample means tend to follow normality, and the mean of the sample means approaches the population mean.)
• Normal Distribution:
• Std Dev is 1 (for the standard normal distribution, which also has mean 0)
• Zero skewness
• Kurtosis is 3 (normally tailed, rather than heavily tailed or lightly tailed)
• Mean, median & mode at one point

How To Import Data/Creating a Data Set

Created a dummy file in Excel; now let's import it into SPSS.

1. Click this to open/import a file
2. Select Excel
3. Select the appropriate file
4. Select Open

SPSS will automatically read the first row as variable names.

Output: Statistics Viewer, to see results when we execute something.

This is the Data Editor, where we do the work.

2 TYPES OF VIEW
1. DATA VIEW
2. VARIABLE VIEW

In Variable View you can edit variables by clicking on the cell.

OPEN OTHER TYPES OF FILE LIKE CSV ETC

Click this to open a file.

Select All Files to see all types of files you have.

Select the file you wish to open and set "Files of type"… this is a .CSV file, so we choose the given option.

See, we selected the corresponding option. Press this to open.

Look at the preview and click Next.

See what is indicated: select Yes if the top line indicates variable names.

Select Next.

Select Next.

2 TYPES OF VIEW
1. DATA VIEW (default, what you see)
2. VARIABLE VIEW

CLICK: VARIABLE VIEW to see and EDIT the variables (Name / Label / Measure etc.)

You can edit by clicking on the cell.

Frequencies   Descriptive   Explore
Flexible      Only basic    Detailed / Split

CLICK: ANALYSE > Descriptive Statistics > Frequencies

Click the Statistics tab and select: Quartiles, Mean, Median, Mode etc.

Click Charts, select Histogram, and check "Show normal curve on histogram".

Age of an Average Person / Average Age

Std Deviation = deviation from the mean score; it summarises continuous data only. Larger values indicate more spread of observations from the CT.

• One representative age?
• How old is your typical user?
• Whom is this product for?

1st, 2nd & 3rd Quartile: indicate spread, as they are less impacted by outliers.
Inter Quartile Range: Q3 - Q1, indicates differences in extremes

• Min, Max and Quartiles indicate the distribution
• Q1 = 25% are below 24
• Q2 = 50% are below 26
• Q3 = 75% are below 33

Business Case: Mean or Median depends on the context

Histogram: frequency of a variable; see the skewness from the histogram. However, I could also have got an idea of it from the mean, max and min values.

Median: 50% of the data lies to the right and 50% to the left. From the table also (median, max, min) one can make out that the data is pushed to the right, or simply SKEWED to the right.


Case
The market research team at AdRight is assigned the task to identify the profile of the typical customer for each treadmill
product offered by CardioGood Fitness.
The market research team decides to investigate whether there are differences across the product lines with respect to
customer characteristics. The team decides to collect data on individuals who purchased a treadmill at a CardioGoodFitness
retail store during the prior three months. The data are stored in the CardioGoodFitness.csv file.
The team identifies the following customer variables to study:
• product purchased, TM195, TM498, or TM798;
• gender;
• age, in years;
• education, in years;
• relationship status, single or partnered;
• annual household income ($);
• average number of times the customer plans to use the treadmill each week;
• average number of miles the customer expects to walk/run each week;
• and self-rated fitness on a 1-to-5 scale, where 1 is poor shape and 5 is excellent shape.

Perform descriptive analytics to create a customer profile for each CardioGood Fitness treadmill product line.
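
Outside SPSS, a per-product profile could be sketched as below. The column names `Product`, `Age` and `Income` are assumptions about the CSV header and should be adjusted to the actual file; `load_rows`, `profile_by_product` and the sample rows are hypothetical and for illustration only.

```python
import csv
from collections import defaultdict
from statistics import mean, median

def load_rows(path):
    """Read a CSV file into a list of dicts (one dict per customer)."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))

def profile_by_product(rows, numeric_cols, product_col="Product"):
    """Group customers by product and summarise each numeric column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[product_col]].append(row)

    summary = {}
    for product, members in groups.items():
        summary[product] = {"n": len(members)}
        for col in numeric_cols:
            values = [float(member[col]) for member in members]
            summary[product][col] = {"mean": mean(values), "median": median(values)}
    return summary

# Hypothetical rows standing in for the real data
sample_rows = [
    {"Product": "TM195", "Age": "18", "Income": "30000"},
    {"Product": "TM195", "Age": "20", "Income": "32000"},
    {"Product": "TM798", "Age": "30", "Income": "90000"},
]
summary = profile_by_product(sample_rows, ["Age", "Income"])
print(summary)

# With the real file (assuming the column names above):
# summary = profile_by_product(load_rows("CardioGoodFitness.csv"), ["Age", "Income"])
```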
The 'stem' on the left displays the first digit or digits.

The 'leaf' on the right displays the last digit(s).
For Age (TM195):
18, 19, 19, 19 (frequency 4); indicates the spread of data around a point; here most of the data is spread around

Can I put all variables in the Dependent List?

The Dependent List will not accept strings, so we can change the strings to numerical values by coding Male = 1, Female = 0.

Input the old and new values and click Add.

When you have added all values for that variable, click Continue.

After Continue you shall land here. Finally, click OK.


Probability Distribution & Hypothesis Testing


Hypothesis
• Conjecture about a population
• A statement about a population parameter
• A premise or claim that one wants to test

Null Hypothesis: There is no relationship/no effect between two variables

• Babies show no preference for food on the basis of colour
• Kids' behaviour is not affected by the type of show
• Age has no effect on learning ability
Hypothesis & Test….?

• Explore
• Propose a Question
• Form a Hypothesis
• Test (Data / Experiment)
• Analyse: Accept or Reject
• Report

Population > Sample > Individuals with characteristics:
- Age
- Gender
- Color
- Region

Variation & Uncertainty

Variance leads us to a concept: standard deviation

• Variance = $\sigma^2$
• Std deviation: $\sigma = \sqrt{\text{Variance}}$
• Std deviation ($\sigma$) is the amount of variation or dispersion from the mean value
• A high value means the values are spread out widely from the mean
• $\sigma = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 / n}$
• $\mu$ = population mean; if the sample mean is used instead, divide by $n - 1$.
• Age: 3, 6, 7, 9, 10. Mean: (3+6+7+9+10)/5 = 35/5 = 7
• $\sigma = \sqrt{\{(7-3)^2 + (7-6)^2 + (7-7)^2 + (7-9)^2 + (7-10)^2\} / 5}$
• $\sigma = \sqrt{6} = 2.449$

Domino's assures 30-minute delivery; let's say its standard deviation is 2 minutes.
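
The worked age example can be reproduced in a few lines. The sketch computes the population standard deviation by hand and compares it with `statistics.pstdev` (divide by n) and `statistics.stdev` (divide by n - 1) from Python's standard library.

```python
from math import sqrt
from statistics import pstdev, stdev

ages = [3, 6, 7, 9, 10]

mu = sum(ages) / len(ages)                               # mean = 7
variance = sum((x - mu) ** 2 for x in ages) / len(ages)  # 30 / 5 = 6
sigma = sqrt(variance)                                   # about 2.449

print(f"mean={mu}, variance={variance}, sigma={sigma:.3f}")
print(f"population sd (pstdev) = {pstdev(ages):.3f}")
print(f"sample sd (stdev, n-1) = {stdev(ages):.3f}")
```

The sample version is slightly larger (about 2.739) because the sum of squares, 30, is divided by 4 rather than 5.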

Uncertainty Leads
• α (alpha) = significance level = the probability of rejecting the null hypothesis when it is in fact true.
• Often 5% (5 times out of 100 we shall be wrong in rejecting)
• P value is a calculated probability: the probability in the tail beyond the sample mean, assuming that the null hypothesis is correct
• The calculation might differ based on the technique, but the interpretation is the same (the probability of obtaining your sample data IF the null hypothesis is true), thus:
• P value > .05 (α): we do not reject the null hypothesis (we want stronger evidence before rejecting it)
• P value < .05 (α): we reject the null hypothesis
• Confidence level + alpha = 1

• α (alpha) = significance level = the probability of rejecting the null hypothesis when it is in fact true.
• Often 5% (5 times out of 100 we shall be wrong in rejecting); if the p value rises above 5%, we shall not reject it.
• P value is a calculated probability: the probability in the tail beyond the sample mean, assuming that the null hypothesis is correct (the probability of obtaining test results at least as extreme as the actual results, under the assumption that H0 is correct)
• Significance level is the threshold value to see if the p value is low enough to reject the null hypothesis. (A smaller p value is stronger evidence against the null.)
• If p value < α (alpha): the result is significant & H0 is rejected
• H0 = the property/parameter in the population is zero / does NOT exist

Chi-Square
• Test for Independence / Pearson's Chi-square: a test of association between 2 categorical variables
• Discovers if there is a relationship b/w 2 categorical variables
• 2 categorical variables like: gender, area, profession, education level etc.
• Is gender associated with shopping frequency, defined as High & Low?
• Is gender associated with preferred buying mode (Online / Physical)?
• Are the young and the old equally likely to vote for BJP/CONGRESS etc.?

• We can reject the null hypothesis of no relationship/association at the .05 level

• Otherwise, we have insufficient evidence to reject the null hypothesis (no significant association) at the .05 level
• Reporting: There was a significant association between A and B (χ²(degrees of freedom) = value, p < .05); if p is shown as 0, report p < .001

Refer Covid Data Set

• H0=There is No Relationship between Buying Mode (Online Vs
Offline) and Safety Concern

In Statistics, check whether Chi-square is selected or not.

In Cells, don't forget to check Expected.

As the above is a chi-square table of 2x5, and not more than 20% of all the cells have an expected count of less than 5 (Yates, Moore & McCabe, 1999, p.734), and χ²(4) = 18.99, p = .001, we reject the null hypothesis and conclude that the relationship between mode of buying and safety concern is statistically significant.

Thus, based on the above hypothesis test, it is established that people prefer buying cosmetics online as they consider it safer than the offline mode.
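
To show what the software computes for a crosstab, here is a minimal sketch of Pearson's chi-square on a hypothetical 2x2 table (the counts are invented, not the Covid data set). Expected counts come from row total x column total / grand total. The statistic is compared with 3.841, the standard critical value for 1 degree of freedom at the .05 level.

```python
def chi_square(table):
    """Return (statistic, degrees of freedom, expected counts) for a crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# Hypothetical counts: rows = buying mode, columns = safety concern (yes / no)
observed = [[30, 10],
            [20, 40]]

statistic, df, expected = chi_square(observed)
CRITICAL_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = .05
reject_null = statistic > CRITICAL_05_DF1

print(f"chi-square({df}) = {statistic:.2f}, reject H0: {reject_null}")
```

The `expected` table is also what the "Expected" option in Cells displays, which is how the rule about expected counts below 5 is checked.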

Z & T Test
• Both compare 2 population means (same or different)
• Z test: when the population parameters (variance/sd) are known and the sample size is large.
• T test: when the population parameters are unknown or the sample size is small (less than 30)
• Z (z score) indicates how many std dev above or below the population mean the score calculated from the Z test is.
• z score: $z = (\bar{x} - \mu)/\sigma$ ($\bar{x}$ = sample mean, $\mu$ = population mean, $\sigma$ = std dev; for the mean of a sample of size $n$, divide by the standard error $\sigma/\sqrt{n}$)

T Test
• 1 sample t-test: compares the mean of a single group against a known mean.
• A college may claim that the average income of its entire Batch 2020 is Rs.50000 or more.
• A school may claim that all its students have an above-average IQ.
• Independent samples t-test: compares means for two groups
• MFM-D/MFM-C w.r.t Salary
• Type of Exercise (A/B) w.r.t BP Level
• Men & Women w.r.t Shopping Time
• Paired sample t-test: compares means from the same group at different times
• Spend on medicine pre Covid vs spend on medicine post Covid
• A specific training improved the running time of runners (pre-run vs post-run time)

1 Sample T Test (The Mean Air Quality of the City is 340)

1. Your dependent variable should be measured on a continuous scale. (Ratio & Interval Scale)
2. The observations are independent of each other.
3. There should be no significant outliers.
4. The dependent variable should be approximately normally distributed.

NULL HYPOTHESIS = THERE IS NO DIFFERENCE BETWEEN THE TRUE MEAN AND THE COMPARED MEAN
There is no difference between the sample mean and the population mean
Put the value you want to test here.

Click Options if you want to check the Confidence Interval (by default 95%).

Mean Air Quality Score (M = 332.577, SD = 0.53) was lower than the comparison score of 340.
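
For reference, the one-sample t statistic is the gap between the sample mean and the test value, divided by the standard error s/√n. The sketch below uses hypothetical air-quality readings (not the data behind the result above) and the standard two-tailed critical value 2.776 for df = 4 at the .05 level.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, test_value):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses n - 1
    return (mean(sample) - test_value) / standard_error, n - 1

# Hypothetical readings, tested against the claimed mean of 340
readings = [331, 333, 332, 334, 333]
t_stat, df = one_sample_t(readings, 340)

T_CRITICAL_DF4 = 2.776  # two-tailed, alpha = .05, df = 4
reject_null = abs(t_stat) > T_CRITICAL_DF4

print(f"t({df}) = {t_stat:.2f}, reject H0: {reject_null}")
```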

Independent Samples T Test: Requirements for reliable results
1. Your dependent variable should be measured on a continuous scale. (Ratio & Interval Scale)
2. Your independent variable should consist of two categorical, independent groups.
3. You should have independence of observations. (No participant in common)
4. There should be no significant outliers.
5. Your dependent variable should be approximately normally distributed for each group of the independent variable.
6. There needs to be homogeneity of variances. (Levene's Test)
NULL: THERE IS NO DIFFERENCE IN THE MEAN INCOME LEVEL BASED ON THE TRAINING PROGRAMME (TWO GROUPS)
Click on Define Groups to define the groups. Here we have 2 groups, A and B; indicate them. At times we may have 3 groups, like Education as UG, PG and XII; in that case, if we want to compare UG with XII, then we shall enter those two accordingly.

According to this, Batch B participants had a statistically significantly lower income (27833 ± 4244) at the end of the training programme compared to Batch A (32272 ± 4244), t(21) = 2.27, p = .034.

Paired Samples T Test: Requirements for reliable results
1: Your dependent variable should be measured on a continuous scale. (Ratio & Interval Scale)
2: Your independent variable should consist of two categorical, matched groups (same subjects).
3: There should be no significant outliers.
4: The distribution of the differences in the dependent variable between the two related groups should be approximately normally distributed.
NULL: NO Significant Difference in Expenditure Pre and Post Covid

Click Options if you want to check the Confidence Interval (by default 95%).

There was a statistically significant increase in expenditure after Covid 19, from 5.927 ± 1.45 to 6.395 ± std dev, t(21) = -2.185, p = .040

Excel Z….Data > Data Analysis

t-Test: Paired Two Sample for Means

                                Variable 1    Variable 2
Mean                            5.927273      6.395454545
Variance                        2.107792      1.857597403
Observations                    22            22
Pearson Correlation             0.746811
Hypothesized Mean Difference    0
df                              21
t Stat                          -2.18519
P(T<=t) one-tail                0.020175
t Critical one-tail             1.720743
P(T<=t) two-tail                0.04035
t Critical two-tail             2.079614
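
The t Stat in the Excel output can be reproduced from the summary figures alone. The variance of the paired differences equals Var1 + Var2 - 2·r·√(Var1·Var2), and t is the mean difference divided by the standard error of the differences. This is a sketch for checking the output, with illustrative variable names.

```python
from math import sqrt

# Summary values copied from the Excel output
mean_1, mean_2 = 5.927273, 6.395454545
var_1, var_2 = 2.107792, 1.857597403
r = 0.746811           # Pearson correlation between the paired columns
n = 22                 # number of pairs
t_critical_two_tail = 2.079614

# Variance of the paired differences, then its standard error
var_diff = var_1 + var_2 - 2 * r * sqrt(var_1 * var_2)
se_diff = sqrt(var_diff / n)

t_stat = (mean_1 - mean_2) / se_diff
df = n - 1
significant = abs(t_stat) > t_critical_two_tail

print(f"t({df}) = {t_stat:.5f}, significant at .05: {significant}")
```

Because |t| exceeds the two-tail critical value, the conclusion matches the two-tail p value of 0.04035 being below .05.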

Anova One Way: (Independent T Test + More Groups than 2)
• Like the independent samples t-test, but compares means for two or more groups
• MFM (MFM-D/MFM-C/MFM H) w.r.t Salary
• Type of Exercise (A/B/V) w.r.t BP Level
• Age* (Young, Middle and Old) w.r.t Shopping Time
*Converted continuous into categorical.
• Whether expected salary differed based on campus among students
• Null: There is no significant difference in expected salary based on campus
• Whether exam performance differed based on time spent on self study by students
• Null: There is no significant difference in exam performance based on time spent
ANOVA DOES NOT TELL WHICH GROUPS ARE DIFFERENT FROM EACH OTHER; IT ONLY TELLS THAT AT LEAST TWO DIFFER. FOR THAT WE DO A POST HOC TEST.

ANOVA: Requirements for reliable results

1. Your dependent variable should be measured on a continuous scale. (Ratio & Interval Scale)
2. Your independent variable should consist of two or more categorical, independent groups.
3. You should have independence of observations. (No participant in common)
4. There should be no significant outliers.
5. Your dependent variable should be approximately normally distributed for each group of the independent variable.
6. There needs to be homogeneity of variances.
NULL: THERE IS NO DIFFERENCE IN THE MEAN INCOME LEVEL BASED ON THE TRAINING PROGRAMME (THREE or MORE GROUPS)
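
As a rough illustration of what one-way ANOVA computes, the F statistic is the between-group mean square divided by the within-group mean square. The three groups below are hypothetical numbers chosen so the arithmetic is easy to follow by hand.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of the values around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical scores for three groups
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_between, df_within = one_way_anova(groups)

print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The F value alone does not say which groups differ; the Sig. value and a post hoc test are still needed for that.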
WHY THIS? WHAT TO DO?

RECODING OF DATA: NOT IN THE PURVIEW, BUT WE NEED TO LEARN IT

After entering the Name and Label of the new variable, click on Change. Then we shall move to Old and New Values.

For ANOVA, in New Value we shall put numerics, i.e. 1, 2, 3 etc. Otherwise we can give string values also, like E..English etc., of any character width as required, but for that we shall have to check this option.

When you have added all values for that variable, click Continue.

Finally, click OK.

Now we get a new variable here called SPECIALISATION. Just remember which value we coded as 1, 2, 3.

Click on Post Hoc, check Tukey, and then click Continue.

Click on Options and check Descriptives.

AT SIG .041, THERE IS A STATISTICALLY SIGNIFICANT DIFFERENCE BETWEEN THE GROUPS… BUT WHICH OF THESE GROUPS ARE DIFFERENT CAN'T BE CONFIRMED. FOR THAT WE HAVE….?

THE F STATISTIC INDICATES THE ANOVA SIGNIFICANCE

Multiple Comparisons shall indicate which groups are different from each other….
THUS TUKEY'S POST HOC (others as well) IS THE BEST WAY FOR MULTIPLE COMPARISONS
NON SIGNIFICANT B/W 3 and 2

CORRELATION IS NOT CAUSATION

Pic courtesy: www.pinterest.com/pin/179440366372836984/

CORRELATION
• Association between two variables
• Checks if one moves with the other
• Movement direction & strength vary
• Experience in years and salary
• Height & weight of kids
• Supply & price

• Pearson's product-moment correlation coefficient = Pearson's correlation 'r', which measures that direction and strength
CORRELATION

1. Draws a line between 2 variables to see the best fit (both continuous)
2. There should be a linear relationship b/w the two
3. No significant outliers
4. Both variables approximately normal
r ranges from -1 to 1
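
A minimal sketch of how Pearson's r is computed: the co-variation of the two variables divided by the square root of the product of their individual variations. The experience and salary figures are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: experience in years and salary (Rs. thousands)
experience = [1, 2, 3, 5, 8]
salary = [20, 24, 31, 40, 62]

r = pearson_r(experience, salary)
print(f"r = {r:.3f}")  # close to +1: salary moves up with experience
```

A value near +1 or -1 indicates a strong linear association; it still says nothing about which variable, if either, causes the other.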

CORRELATION
• Relationship between Mother’s Height and Babies
• Relationship between Mother’s Weight and Babies

Click on Options and check if you need descriptive analysis.
Pic courtesy: blog.vantagecircle.com/job-satisfaction/ & www.adventure-in-a-box.com/

Factor Analysis
• Many variables to fewer factors
• Many observed, correlated variables to latent variables
• Types
• Exploratory Factor Analysis
• Confirmatory Factor Analysis
• Method
• Principal Component Analysis (max variance put into 1st factor): most commonly used
• Other methods, like Common Factor Analysis (finds common variation to put into factors), which are less commonly used
Assumptions
• All variables should be continuous (ordinal variables are also often used, but they should be equidistant, like a Likert scale)
• Sampling adequacy: large sample, 10 times the number of items (KMO at least .5)
• Adequate correlation b/w variables (Bartlett's Test of Sphericity)
• No significant outliers

Some IMP Keywords

• Eigenvalues: variance explained by a particular factor
• Select components whose eigenvalue is at least 1.
• No. of components derived = no. of underlying factors
• Communalities: the variance a variable shares with all the other variables being considered. This is also the proportion of variance explained by our underlying factors.
• The value of r square shall indicate the variance
• Factor loadings: simple correlations between the variables and the factors, indicated in the Component Matrix.
• Component Matrix: gives an idea of which variables measure which underlying concept or factor
PROBLEM: IPL VIEWERSHIP DECLINING

• A questionnaire was introduced to the audience
• Now we want to find out if some of the variables indicate the same underlying concept: Factor Analysis (sometimes the concept might be known, sometimes not)
• Use the Fan Satisfaction file

Extraction: Principal Component is most preferred…. Common Factor Analysis is done by Principal Axis Factoring.

Eigenvalues more than 1; or, if we have already decided on 2/3/4 different concepts to see, then a fixed number of factors.

In Rotation: Varimax rotation, so that there is no repetition of variables in the Component Matrix.

In Options:

Tell how many factors were extracted..?

.7 / .4

• You may like to give names to these 3 underlying concepts

• Perception of Stadium: Condition of stadium, Outer appearance of stadium, Interior design of stadium
• Value: Entry Price, Price of season ticket
• Team Potential: No of Star Players, Qlty of Team Comp

Regression - Definition

A statistical technique that attempts to determine the existence of a possible relationship between one dependent variable (usually denoted by Y) and a collection of independent variables.

Regression is used for generating new hypotheses and for validating a hypothesis.

The Mother's Height has no effect on the Baby's Length

Remember: “There is no relationship between Mother's Height and Baby's Height”

A regression model establishes the existence of an association between two variables, but not causation.

y = mx + c
y = c + mx
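
As a small illustration of the line y = c + mx, the sketch below estimates the slope m and intercept c by ordinary least squares in Python. The function name `fit_line` and the data points are hypothetical and chosen so that the true line is y = 1 + 2x.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope m and intercept c for y = c + m*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Slope = covariance of x and y divided by variance of x.
    m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # The fitted line always passes through the point of means.
    c = y.mean() - m * x.mean()
    return m, c

# Hypothetical data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
m, c = fit_line(x, y)
print(m, c)
```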

Conditions: Linear Regression

• Your 2 variables should be measured on a continuous scale.
• You should have independence of observations (i.e., independence of residuals), meaning no autocorrelation (specifically in time series). Autocorrelation decreases the p value. Check with the Durbin-Watson statistic (range 0-4).
• The regression model is linear in parameters.
• Your data needs to show homoscedasticity (PP-Plot: variances along the line should remain similar as you move along).
• There should be no significant outliers.
• The residuals (errors) are approximately normally distributed (PP-Plot).
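
To show where the 0-4 range of the Durbin-Watson statistic comes from, here is a minimal Python sketch. The function name `durbin_watson` and the residual series are assumptions for illustration. The statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals; values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residuals that keep the same sign for a while (positive autocorrelation).
dw_positive = durbin_watson([1, 1, 1, -1, -1, -1])
# Hypothetical residuals that flip sign every period (negative autocorrelation).
dw_negative = durbin_watson([1, -1, 1, -1, 1, -1])
print(dw_positive, dw_negative)
```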

Click on Analyse
then Regression
then Linear

Select Dependent & Independent accordingly.

Click Statistics and select R squared change, Durbin-Watson etc. as per requirement.

Multiple Linear Regression

• Multiple linear regression means linear in regression parameters (beta values). The following are examples of multiple linear regression:

Y = β0 + β1x1 + β2x2 + ... + βkxk + ε
y = mx + c
y = c + m1x1 + m2x2 + err

An important task in multiple regression is to estimate the beta values (β1, β2, β3 etc.)

Conditions: Multiple Linear Regression

The assumptions made in the multiple linear regression model are as follows:

• Assumption #1: Your dependent variable should be measured on a continuous scale.
• Assumption #2: You have two or more independent variables, which can be either continuous (i.e., an interval or ratio variable) or categorical (i.e., an ordinal or nominal variable).
• Assumption #3: You should have independence of observations (i.e., independence of residuals), meaning no autocorrelation (specifically in time series). Autocorrelation decreases the p value. Check with the Durbin-Watson statistic (range 0-4).
• Assumption #4: The regression model is linear in parameters.
• Assumption #5: Your data needs to show homoscedasticity (PP-Plot).
• Assumption #6: Your data must not show multicollinearity, which occurs when you have two or more independent variables that are highly correlated with each other. Check the VIF (less than 7; some suggest 10).
• Assumption #7: There should be no significant outliers, high leverage points or highly influential points.
• Assumption #8: Finally, you need to check that the residuals (errors) are approximately normally distributed.
• Assumptions 1 and 2 should be checked before testing; 3-8 can be inferred from SPSS Statistics.
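
The VIF check in Assumption #6 can be sketched in Python as follows. The function name `vif` and the simulated predictors are assumptions for illustration. The sketch relies on the identity that the VIF of each predictor, 1 / (1 - R²) from regressing it on the other predictors, equals the corresponding diagonal element of the inverse of the predictors' correlation matrix.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of the predictor matrix X."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    # Diagonal of the inverse correlation matrix gives 1 / (1 - R_j^2).
    return np.diag(np.linalg.inv(corr))

# Hypothetical predictors: x3 is almost a copy of x1, x2 is unrelated.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 1))
```

In this example x1 and x3 would be flagged under either the 7 or the 10 threshold, while x2 stays close to 1.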