
FBR & IT Applications: Compiled and Presented by DR - Deepak Joshi For Academic Use Only

This document contains summaries of multiple topics including: 1) A story about Cinderella and how she was mistreated by her stepmother and stepsisters but was able to attend the ball. 2) An introduction to descriptive statistics including measures of central tendency like mean, median, and mode. 3) A discussion of additional statistical concepts like dispersion, skewness, kurtosis, and normal distribution. 4) Brief instructions on how to import an Excel file into SPSS.

Uploaded by

MuskaanKanodia

21-11-2023

https://www.ibm.com/account/reg/in-en/signup?formid=urx-19774

FBR & IT Applications


Compiled and Presented by Dr.Deepak Joshi for Academic Use
Only


Story
• Once upon a time, there was a beautiful girl named Cinderella. She
was 20 years old. She had blue eyes, long golden hair and a fair
complexion. She lived unhappily with her two stepsisters and their
mother, who treated Cinderella very badly. One day, an invitation to a
ball at the palace arrived, but Cinderella’s stepmother would not let
her go. Cinderella was made to sew two new party gowns each for
her stepmother and stepsisters, and to curl their hair. They then went to
the ball, leaving Cinderella alone at home.


Data Visualisation


Introduction to Descriptive Statistics: Basic Statistical Analysis

• If I want to find out how much a student of this class spends on food monthly..?
• Mean (Avg): the sum of all values divided by the number of values
• Typically skewed by outliers (too high/too low)


Now let's say this is the Food Spend (Rs.)

Student   Case 1 (8 students)     Case 2 (9 students)
1.        4200                    4200
2.        4800                    4800
3.        5000                    5000
4.        5200                    5200
5.        5500                    5500
6.        5600                    5600
7.        5600                    5600
8.        6000                    6000
9.        -                       60000
Mean      41900/8 ≈ Rs.5,237/-    101900/9 ≈ Rs.11,322/-

• Median: the midpoint of all the data. It is not skewed by outliers, but it is rarely used in further calculations.
• Arrange the data from lowest to highest (Highest – Lowest = Range)
• Take the middlemost value if the count is odd; if even, take the average of the two middlemost values
Thus the median of the second case is 5500 (the spend of an average student, NOT the average spend of a person, as we sorted the spend first)
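
The mean and median of the two cases can be checked with a minimal Python sketch. The spend figures are the ones listed above; the variable names are illustrative only.

```python
from statistics import mean, median

# Monthly food spend (Rs.) from the example above
case_1 = [4200, 4800, 5000, 5200, 5500, 5600, 5600, 6000]
case_2 = case_1 + [60000]  # the same class plus one outlier

mean_1, mean_2 = mean(case_1), mean(case_2)
median_1, median_2 = median(case_1), median(case_2)

print(f"Case 1: mean = {mean_1:.2f}, median = {median_1}")
print(f"Case 2: mean = {mean_2:.2f}, median = {median_2}")
```

One outlier roughly doubles the mean, while the median moves only from 5350 to 5500.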


Which is the most preferred food item of this class..?

• Mode: Most Common Value of Data Set


• The number (value) with the highest frequency


• 5* = 91 people (455 points)
• 4* = 21 people (84 points)
• 3* = 14 people (42 points)
• 2* = 08 people (16 points)
• 1* = 31 people (31 points)
• Avg Rating = Total Rating / No. of People
• 628/165 = 3.8
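
The rating arithmetic above can be reproduced with a short sketch. The counts come from the slide; the dictionary layout and names are assumptions made for illustration.

```python
# Number of people who gave each star rating (from the slide above)
rating_counts = {5: 91, 4: 21, 3: 14, 2: 8, 1: 31}

# Total points = sum of (stars x number of people)
total_points = sum(stars * count for stars, count in rating_counts.items())
total_people = sum(rating_counts.values())
average_rating = total_points / total_people

# The mode is the rating with the highest frequency
mode_rating = max(rating_counts, key=rating_counts.get)

print(total_points, total_people, round(average_rating, 1), mode_rating)
```

The average rating is 3.8 but the mode is 5 stars, so the two summaries answer different questions.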


We Discussed Measures of Central Tendency


• By using Mean, Median & Mode we are trying to find one number
that minimises error and is representative of the middle.
• The mean minimises the large errors (it gives the minimum sum of squared errors)
• The median limits the error caused by outliers (as correct as possible:
it gives the minimum summed absolute error)
• The mode minimises the odds that we go wrong: it gives the most exactly correct guesses
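
A small sketch can illustrate these three statements on the food-spend data used earlier. The helper function names are hypothetical and chosen only for this illustration.

```python
from statistics import mean, median, mode

# Food spend (Rs.) including the outlier
spend = [4200, 4800, 5000, 5200, 5500, 5600, 5600, 6000, 60000]

def squared_error(guess):
    """Sum of squared errors if we guess `guess` for everyone."""
    return sum((x - guess) ** 2 for x in spend)

def absolute_error(guess):
    """Sum of absolute errors if we guess `guess` for everyone."""
    return sum(abs(x - guess) for x in spend)

def exact_hits(guess):
    """How many people we guess exactly right."""
    return sum(1 for x in spend if x == guess)

m, med, mo = mean(spend), median(spend), mode(spend)

print("squared error :", squared_error(m), "(mean) vs", squared_error(med), "(median)")
print("absolute error:", absolute_error(med), "(median) vs", absolute_error(m), "(mean)")
print("exact hits    :", exact_hits(mo), "(mode) vs", exact_hits(med), "(median)")
```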


Most of Us Like NORMAL


Domino's assures 30-minute
delivery; let's say its
standard deviation is
2 minutes

Pic Credits: https://www.geyerinstructional.com/


• Dispersion:
• Indicates the degree to which data spreads
around an average value (central tendency, CT)
• Range, Interquartile Range, Std Dev & Variance

• Skewness:
• Indicates symmetry in the data
• A dataset (viewed as a distribution) is symmetric
if it is distributed equally to the right and left
of its midpoint
• +ve score: right skewed, outliers on the right
side, hence the mean is pulled to the right;
-ve score: left skewed

• Kurtosis:
• Indicates the concentration around the central part
and measures whether the data is heavily tailed or
lightly tailed compared with the normal distribution
• Datasets with low kurtosis do not
concentrate heavily around the midpoint
• Normally distributed data has near 0
skewness and near 3 kurtosis
a: Kurtosis>3, i.e. Leptokurtic
b: Kurtosis=3, i.e. Normal
c: Kurtosis<3, i.e. Platykurtic
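
As a minimal sketch, skewness and kurtosis can be computed from central moments. This uses the plain population (moment) definitions, which is an assumption: statistical packages often apply sample corrections, and many (SPSS among them) report excess kurtosis, i.e. kurtosis minus 3, so their numbers will differ slightly.

```python
from statistics import fmean

def central_moment(data, k):
    """k-th central moment: the average of (x - mean)^k."""
    m = fmean(data)
    return fmean([(x - m) ** k for x in data])

def skewness(data):
    """Moment skewness: 0 for symmetric data, positive when right skewed."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Non-excess kurtosis, so a normal distribution scores about 3."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [4200, 4800, 5000, 5200, 5500, 5600, 5600, 6000, 60000]

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```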

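• The Normal Distribution (ND) is key to statistics through the Central Limit Theorem (CLT): averages calculated from independent, identically distributed random variables have an approximately normal distribution. (As the sample size increases, the sample means tend to
follow normality, and the mean of the sample means ≈ the mean of the population.)
• Normal Distribution:
• Std Dev is 1 for the standard normal distribution (in general it can be any positive value)
• Zero skewness
• Kurtosis is 3 (normally tailed, rather than heavily tailed or lightly tailed)
• Mean, Median & Mode coincide at one point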
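
The CLT statement above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential population, the sample size of 50 and the number of repetitions are arbitrary choices, not taken from the slides.

```python
import random
from statistics import fmean

random.seed(0)  # fixed seed so the sketch is repeatable

def skewness(data):
    """Moment skewness: 0 for symmetric data, positive when right skewed."""
    m = fmean(data)
    m2 = fmean([(x - m) ** 2 for x in data])
    m3 = fmean([(x - m) ** 3 for x in data])
    return m3 / m2 ** 1.5

# A strongly right-skewed population: exponential with mean 1
raw_draws = [random.expovariate(1.0) for _ in range(5000)]

# Means of 2000 independent samples, each of size 50
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(50)])
    for _ in range(2000)
]

print("skewness of raw draws   :", round(skewness(raw_draws), 2))
print("skewness of sample means:", round(skewness(sample_means), 2))
print("mean of sample means    :", round(fmean(sample_means), 3))
```

The raw draws are heavily skewed, while the sample means are much closer to symmetric and centre on the population mean of 1.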


How To Import Data/Creating a Data Set

Created a dummy file in Excel; now let's
import it into SPSS


1. Click this to open/import a file
2. Select Excel
3. Select the appropriate file
4. Select Open


Will automatically
read first row as
variable names


Output: Statistics Viewer, to
see results when we
execute something

This is the Data
Editor, where
we do the work


2 TYPES OF VIEW
1. DATA VIEW
2. VARIABLE VIEW

In Variable View you can edit variables by clicking on the cell


OPEN OTHER TYPES OF FILE LIKE CSV ETC

Click this to
open a file

Select All Files to see all types
of files you have


Select the file you wish to open and set
"Files of type"… this is a .CSV file,
so we choose the given
option


Press this to
open

See, we selected
the
corresponding
option


Look at the
preview and click
next


See what is
indicated: select
Yes if the top line
contains the
variable names


Select Next


Select Next


2 TYPES OF VIEW
1. DATA VIEW (Default
what you see)
2. VARIABLE VIEW


CLICK: VARIABLE VIEW to
see and edit the variables
(Name / Label / Measure etc.)


You can edit by clicking on
the cell


Frequencies    Descriptive    Explore
Flexible       Only Basic     Detailed / Split

CLICK: ANALYZE > Descriptive Statistics > Frequencies


Click the Statistics tab and
select: Quartiles, Mean,
Median, Mode etc.


Click Charts, select
Histogram and check "Show
normal curve on
histogram"


Average Age vs. Age of an Average Person

Std Deviation = deviation from the mean score;
it summarises continuous data only. A larger value
indicates more spread of observations from the CT

• One Representative Age ?


• How old is your Typical User?
• For whom this product is for?

1st, 2nd & 3rd Quartile: indicate spread, as they
are less impacted by outliers.
Inter Quartile Range: Q3-Q1, indicates the
spread of the middle 50% of the data

• Min, Max, and Quartiles
indicate the distribution
• Q1=25% are below 24
• Q2=50% are below 26
• Q3=75% are below 33
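
A minimal sketch of the quartile and interquartile-range calculation follows. The ages are hypothetical, chosen only so that the quartiles match the values on the slide (24, 26, 33); the "inclusive" interpolation method is also an assumption, and other tools may use a slightly different convention.

```python
from statistics import quantiles

# Hypothetical ages, chosen so the quartiles match the slide above
ages = [18, 22, 24, 25, 26, 28, 33, 38, 50]

# n=4 splits the data into quarters and returns the three cut points
q1, q2, q3 = quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1

print(f"Min={min(ages)}, Q1={q1}, Q2={q2}, Q3={q3}, Max={max(ages)}, IQR={iqr}")
```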
Business Case: Mean or Median depends on the context

Histogram: frequency of a variable; see the skewness
from the histogram.
However, I could have got an idea of it from the Mean, Max and
Min values

Median: 50% of the data to the right, 50% to the left. From the table also (Median, Max, Min)
one can make out that the data is pushed to the right, or simply SKEWED to the right


Case
The market research team at AdRight is assigned the task to identify the profile of the typical customer for each treadmill
product offered by CardioGood Fitness.
The market research team decides to investigate whether there are differences across the product lines with respect to
customer characteristics. The team decides to collect data on individuals who purchased a treadmill at a CardioGoodFitness
retail store during the prior three months. The data are stored in the CardioGoodFitness.csv file.
The team identifies the following customer variables to study:
• product purchased, TM195, TM498, or TM798;
• gender;
• age, in years;
• education, in years;
• relationship status, single or partnered;
• annual household income ($);
• average number of times the customer plans to use the treadmill each week;
• average number of miles the customer expects to walk/run each week;
• and self-rated fitness on a 1-to-5 scale, where 1 is poor shape and 5 is excellent shape.

Perform descriptive analytics to create a customer profile for each CardioGood Fitness treadmill product line.
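
Outside SPSS, a per-product profile could be sketched as below. This is an illustration under assumptions: the column names `Product`, `Age` and `Income` are guesses at the CSV headers, and the sample rows are made-up values, not records from the data set.

```python
import csv
from collections import defaultdict
from statistics import mean, median

def load_rows(path):
    """Read a CSV file into a list of dicts (one per customer)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def profile_by_product(rows, numeric_cols=("Age", "Income")):
    """Group customers by product and summarise the numeric columns."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Product"]].append(row)
    profile = {}
    for product, members in groups.items():
        stats = {"n": len(members)}
        for col in numeric_cols:
            values = [float(m[col]) for m in members]
            stats[col] = {"mean": mean(values), "median": median(values)}
        profile[product] = stats
    return profile

# Made-up rows for illustration only (assumed column names)
sample_rows = [
    {"Product": "TM195", "Age": "20", "Income": "30000"},
    {"Product": "TM195", "Age": "30", "Income": "40000"},
    {"Product": "TM798", "Age": "28", "Income": "90000"},
]
profile = profile_by_product(sample_rows)
print(profile)
```

With the real file, `profile_by_product(load_rows("CardioGoodFitness.csv"))` would be the equivalent call, provided the headers match the assumed names.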

The 'stem' on the left displays the first digit or digits.


The 'leaf' on the right displays the last digit(s).
For Age, TM195:
18, 19, 19, 19 – (4 Freq). This indicates the spread of data
around a point; here most of the data is spread
around


Can I put all
variables in the
Dependent
List?

The Dependent List will not accept strings, so we can change the strings
to numeric values by coding Male = 1, Female = 0
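
The same recoding idea can be sketched in Python. The coding scheme (Male = 1, Female = 0) is from the slide; the function name and the raw values are hypothetical.

```python
# Coding scheme from the slide: Male = 1, Female = 0
GENDER_CODES = {"Male": 1, "Female": 0}

def recode_gender(values):
    """Map string labels to numeric codes; unknown labels become None."""
    return [GENDER_CODES.get(v.strip().title()) for v in values]

gender = ["Male", "Female", "female", "MALE "]  # hypothetical raw values
gender_coded = recode_gender(gender)
print(gender_coded)
```

Normalising case and whitespace before the lookup keeps stray variants such as "female" or "MALE " from ending up uncoded.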


Input Old and


New Values
and Click Add


When you
have added
all values for
that click
Continue


After
Continue you
shall land
here. Finally
Click OK


Probability Distribution & Hypothesis Testing



Hypothesis
• Conjecture about a population
• A statement about a population parameter
• A premise or claim that one wants to test


Null Hypothesis: There is no relationship/no effect between two variables

• Babies show no preference for food on the basis of colour
• Kids' behaviour is not affected by the type of show
• Age has no effect on learning ability

Hypothesis & Test….?

Explore → Propose a Question → Form a Hypothesis → Test (Data / Experiment) → Analyse → Report

Analyse
• Accept
• Reject


Population → Sample

Individuals with characteristics
- Age
- Gender
- Color
- Region

Variation & Uncertainty



With Variance we reach a concept

• Variance = $\sigma^2$
• Std deviation ($\sigma$) = $\sqrt{\text{Variance}}$
• Std deviation ($\sigma$) is the amount of variation or
dispersion from the mean value
• A high value means the values are spreading out widely from the
mean
• $\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$
• $\mu$ = population mean; if the sample mean is used instead, divide by $n-1$.
• Age: 3, 6, 7, 9, 10 Mean: (3+6+7+9+10)/5 = 35/5 = 7
• $\sigma = \sqrt{\dfrac{(7-3)^2 + (7-6)^2 + (7-7)^2 + (7-9)^2 + (7-10)^2}{5}}$
• $\sigma = \sqrt{6} = 2.449$

Domino's assures 30-minute delivery; let's say its standard deviation is 2 minutes
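
The worked example for the ages 3, 6, 7, 9, 10 can be verified with Python's standard library. A minimal sketch; the variable names are illustrative.

```python
from statistics import mean, pvariance, pstdev, stdev

ages = [3, 6, 7, 9, 10]

mu = mean(ages)                       # population mean
population_variance = pvariance(ages) # divides by n
population_sd = pstdev(ages)          # square root of the above
sample_sd = stdev(ages)               # divides by n - 1 instead

print(mu, population_variance, round(population_sd, 3), round(sample_sd, 3))
```

Dividing by n - 1 gives a slightly larger value (about 2.739) than dividing by n (about 2.449).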


Uncertainty Leads
• α (Alpha) = Significance Level = the probability of rejecting the null
hypothesis when it is in fact true.
• Often 5% (5 times out of 100 we shall be wrong in rejecting)
• P Value is the calculated probability: the probability in the tail beyond the
sample mean, assuming that the null hypothesis is correct
• The calculation might differ based on the technique, but the interpretation is the same (the
probability of obtaining sample data at least as extreme as yours, IF the null hypothesis is true),
thus:
• P Value > .05 (α): we fail to reject (loosely, "accept") the Null Hypothesis (we want stronger evidence before rejecting it)
• P Value < .05 (α): we reject the Null Hypothesis
• Confidence Level + Alpha = 1


• α (Alpha) = Significance Level = the probability of rejecting the null
hypothesis when it is in fact true.
• Often 5% (5 times out of 100 we shall be wrong in rejecting); if the P Value rises
above 5% we shall not reject it.
• P Value is the calculated probability: the probability in the tail beyond the
sample mean, assuming that the null hypothesis is correct (the probability
of obtaining test results at least as extreme as the actual ones, under the assumption that H0
is correct)
• Significance Level is the threshold value used to see if the P Value is low enough to
reject the null hypothesis. (A smaller P Value is stronger evidence against the Null)
• If P Value is < α (Alpha): the result is significant & H0 is rejected
• H0 = Property/Parameter in the Population is Zero / Does NOT exist


Chi-Square
• Test for Independence / Pearson's Chi-square: Test of Association
Between 2 Categorical Variables
• Discovers if there is a relationship b/w 2 categorical variables
• 2 Categorical Variables Like: Gender, Areas, Profession, education level etc
• Is Gender associated with Shopping Frequency defined as High & Low
• Is gender associated with preferred buying mode (Online Physical)
• Young, Old are likely to vote equally for BJP/CONGRESS ETC

• If p < .05: we can reject the null hypothesis of no relationship/association at the .05 level.
Report it as: There was a significant association between A
and B (χ2(degrees of freedom) = Value, p < .05). If the output shows p = .000, report p < .001
• If p > .05: we have insufficient evidence to reject the null hypothesis (no significant
association) at the .05 level

Refer Covid Data Set


• H0=There is No Relationship between Buying Mode (Online Vs
Offline) and Safety Concern


In Statistics,
see whether Chi-square
is selected or
not


In Cells, Don’t
Forget to
check
Expected


As the above is a chi-square table of 2x5, and not more than 20% of all the cells have an expected count of
less than 5 (Yates, Moore & McCabe, 1999, p.734), and χ2(4) = 18.99, p = .001, we reject the null hypothesis
on the basis of this result and conclude that the relationship between mode of buying and safety
concern is statistically significant.

Thus, based on the above test, it is established that people prefer buying cosmetics online as
they consider it safer than the offline mode
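
The reported result χ2(4) = 18.99 can be sanity-checked with a small sketch. Two assumptions are made here: the toy 2x3 table is invented purely to show how expected counts and the statistic are computed, and the p-value helper uses a closed-form formula that is valid only when the degrees of freedom are even (a library routine would normally be used instead).

```python
from math import exp, factorial

def chi_square_statistic(observed):
    """Pearson chi-square for a table of counts; returns (statistic, df)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

def chi_square_p_value_even_df(stat, df):
    """Upper-tail probability; this closed form holds only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = stat / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))

# Invented 2x3 table (rows: buying mode, columns: concern level)
toy_table = [[10, 20, 30], [30, 20, 10]]
toy_stat, toy_df = chi_square_statistic(toy_table)

# P value for the statistic reported on the slide
p_reported = chi_square_p_value_even_df(18.99, 4)
print(toy_stat, toy_df, round(p_reported, 4))
```

For 18.99 with 4 degrees of freedom the sketch gives a p value a little below .001, in line with the value reported above.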


Z & T Test
• Both compare 2 population means (same or different)
• Z test: when the population parameters (variance/SD) are known and the
sample size is large.
• T Test: when the population parameters are unknown and/or the sample size is small
(less than 30)
• Z (Z score) indicates how many std devs above or below the population
mean the score calculated from the Z test is.
• Z score = $(\bar{x} - \mu)/\sigma$ ($\bar{x}$ = sample mean, $\mu$ = population mean, $\sigma$ = std dev; for the mean of a sample of size $n$, $\sigma$ is replaced by the standard error $\sigma/\sqrt{n}$)
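
A minimal sketch of a one-sample Z test follows. The numbers are hypothetical: they reuse the Domino's delivery example (claimed mean 30 minutes, standard deviation 2 minutes) and assume a sample of 25 deliveries averaging 31 minutes.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, population_mean, population_sd, n):
    """Return the z score of a sample mean and its two-sided p value."""
    standard_error = population_sd / sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical sample: 25 deliveries averaging 31 minutes
z, p = z_test(sample_mean=31.0, population_mean=30.0, population_sd=2.0, n=25)
print(round(z, 2), round(p, 4))
```

Here the sample mean lies 2.5 standard errors above the claimed mean, so at α = .05 the claim of a 30-minute average would be rejected.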


T Test
• 1 sample t-test: Compares mean of a single group against a known mean.
• A college may claim that the average income of its entire Batch 2020 is Rs.50000 or
more.
• A school may claim that all its students have an above-average IQ.
• Independent samples t-test: Compares mean for two groups
• MFM-D/MFM-C w.r.t Salary
• Type of Exercise (A/B) w.r.t BP Level
• Men & Women w.r.t Shopping Time
• Paired sample t-test: Compares means from the same group at different
times
• Spend on Medicine Pre Covid Vs Spend on Medicine Post Covid
• A Specific Training improved the Running Time of Runners (Pre Run Post Run Time)


1 Sample T Test (The Mean Air Quality of the City is 340)


1. Your dependent variable should be measured on a continuous scale.
(Ratio & Interval Scale)
2. The observations are independent of each other.
3. There should be no significant outliers.
4. The dependent variable should be approximately normally distributed.

NULL HYPO = THERE IS NO DIFFERENCE BETWEEN THE TRUE MEAN AND THE
COMPARED MEAN
There is no difference between the sample mean and the hypothesised population
mean

Put the Value you


want to test here


Click Options if you want to
check the Confidence
Interval (by default
95%)


Mean Air Quality Score (M = 332.577, SD = 0.53) was lower
than the comparison score of 340.


Independent Samples T Test: Requirements for
reliable results
1. Your dependent variable should be measured on a continuous scale.
(Ratio & Interval Scale)
2. Your independent variable should consist of two categorical,
independent groups.
3. You should have independence of observations. (No Participant
common)
4. There should be no significant outliers.
5. Your dependent variable should be approximately normally distributed
for each group of the independent variable.
6. There needs to be homogeneity of variances. (Levene's Test)
NULL: THERE IS NO DIFFERENCE IN THE MEAN INCOME LEVEL BASED ON THE
TRAINING PROGRAMME (TWO GROUPS)

Click on Define Groups to define the groups:
here we have 2 groups, A and
B; indicate them. At times we may have 3
groups, like Education as
UG, PG and XII; in that case, if we
want to compare UG with XII, then
we shall enter those two accordingly


According to this, Batch B participants had a statistically
significantly lower income (27833 ± 4244) at the end of the training
programme compared to Batch A (32272 ± 4244), t(21) =
2.27, p = .034.


Paired Samples T Test: Requirements for reliable
results
1: Your dependent variable should be measured on a continuous scale.
(Ratio & Interval Scale)
2: Your independent variable should consist of two categorical,
matched groups (Same Subject).
3: There should be no significant outliers.
4: The distribution of the differences in the dependent
variable between the two related groups should be approximately
normally distributed.
NULL: NO Significant Difference in Expenditure Pre and Post Covid


Click Options if you want to
check the Confidence
Interval (by default
95%)


There was a statistically significant increase in
expenditure after Covid-19, from 5.927 ± 1.45 to
6.395 ± 1.36, t(21) = -2.185, p = .040


Excel Z…. Data > Data Analysis


t-Test: Paired Two Sample for Means

                               Variable 1   Variable 2
Mean                           5.927273     6.395454545
Variance                       2.107792     1.857597403
Observations                   22           22
Pearson Correlation            0.746811
Hypothesized Mean Difference   0
df                             21
t Stat                         -2.18519
P(T<=t) one-tail               0.020175
t Critical one-tail            1.720743
P(T<=t) two-tail               0.04035
t Critical two-tail            2.079614
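
The t Stat in this output can be rebuilt from the summary figures alone. A sketch under one assumption: the variance of the paired differences is recovered through the identity Var(X − Y) = Var(X) + Var(Y) − 2·r·SD(X)·SD(Y), since the raw data are not shown.

```python
from math import sqrt

# Summary figures copied from the Excel output above
mean_1, mean_2 = 5.927273, 6.395454545
var_1, var_2 = 2.107792, 1.857597403
r, n = 0.746811, 22

# Variance of the paired differences, recovered from the correlation
var_diff = var_1 + var_2 - 2 * r * sqrt(var_1 * var_2)
standard_error = sqrt(var_diff / n)

t_stat = (mean_1 - mean_2) / standard_error
df = n - 1

# Compare with the two-tail critical value from the table
significant = abs(t_stat) > 2.079614
print(round(t_stat, 3), df, significant)
```

The rebuilt statistic agrees with the reported t Stat to about three decimals, and its absolute value exceeds the two-tail critical value, which matches the two-tail p of 0.04035.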


Anova One Way: (Independent T Test + more
groups than 2)
• Like the independent samples t-test, but compares means for two or more groups
• MFM (MFM-D/MFM-C/MFM H) w.r.t Salary
• Type of Exercise (A/B/V) w.r.t BP Level
• Age* (Young, Middle and Old) w.r.t Shopping Time
*Converted Continuous into Categorical.
• Whether Salary Expected differed based on the Campuses among Students
• Null: There is no significant difference in Salary Expected based on Campuses
• Whether Exam Performance differed based on Time Spent on Self Study of
Students
• Null: There is no significant difference in Exam Performance based on Time Spent
ANOVA DOES NOT CONFIRM/TELL WHICH GROUPS ARE DIFFERENT FROM EACH OTHER, BUT TELLS THAT AT
LEAST TWO DIFFER
- FOR THAT WE DO A POST HOC TEST
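
As a sketch of what the one-way ANOVA computes, the F statistic is the ratio of between-group to within-group variance. The three groups below are made-up numbers for illustration; looking up the p value for F still needs an F table or a statistics package.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up scores for three groups
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(round(f_stat, 2), df_between, df_within)
```

A large F only says that at least two group means differ; it does not say which ones.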


ANOVA: Requirements for reliable results


1. Your dependent variable should be measured on a continuous scale.
(Ratio & Interval Scale)
2. Your independent variable should consist of two or more categorical,
independent groups.
3. You should have independence of observations. (No Participant
common)
4. There should be no significant outliers.
5. Your dependent variable should be approximately normally distributed
for each group of the independent variables.
6. There needs to be homogeneity of variances.
NULL: THERE IS NO DIFFERENCE IN THE MEAN INCOME LEVEL BASED ON THE
TRAINING PROGRAMME (THREE or MORE GROUPS)

WHY THIS? WHAT TO DO?


RECODING OF DATA - NOT IN THE
PURVIEW BUT WE NEED TO LEARN


After entering the
Name and Label
of the new variable,
click on
Change……
Then we shall
move to Old and
New Values


For Anova, in
New Value we
shall put
numerics, i.e.
1, 2, 3 etc.
Otherwise we
can also give
string values,
like
E for English etc.,
of any
character
width as
required… But
for that we
shall have to
check this


When
you have
added all
values for
that, click
Continue


Click OK
Finally


Now we get a new
variable here called
SPECIALISATION.
Just remember
which value
we coded as
1, 2, 3


Click on Post Hoc


and Check Tukey
and then continue


Click on Option
and Check
Descriptives


AT SIG .041,
THERE IS A STATISTICALLY
SIGNIFICANT DIFFERENCE
BETWEEN THE GROUPS…
BUT
WHICH OF THESE GROUPS
ARE DIFFERENT CAN'T BE
CONFIRMED.
FOR THAT WE
HAVE….?

THE F STATISTIC INDICATES THE
ANOVA SIG


Multiple Comparisons
shall indicate which
groups are different
from each other….
THUS TUKEY'S POST
HOC (others exist as well) IS
THE BEST WAY FOR
MULTIPLE
COMPARISONS
NON SIGNIFICANT
B/W 3 and 2


CORRELATION IS NOT CAUSATION

Pic courtesy: www.pinterest.com/pin/179440366372836984/


CORRELATION
• Association between two variables
• Checks if one moves with the other
• Movement direction & strength vary
• Experience in Years and Salary
• Height & weight of kids
• Supply & Price

• Pearson's Product-Moment correlation coefficient = Pearson's
Correlation ‘r’ measures that direction and strength

CORRELATION

1. Draws a line between 2 variables to see the best fit (both continuous)
2. There should be a linear relationship b/w the two
3. NO significant outliers
4. Both variables approximately normal
r ranges from -1 to 1
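
A minimal sketch of Pearson's r, written out from its definition (covariance divided by the product of the two standard deviations). The experience and salary figures are hypothetical.

```python
from math import sqrt
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: experience in years and salary (in thousands)
experience_years = [1, 2, 3, 4, 5]
salary = [30, 35, 45, 50, 65]

r = pearson_r(experience_years, salary)
print(round(r, 3))
```

A value near +1 means the two variables move together strongly; on its own it says nothing about which one causes the other.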


CORRELATION
• Relationship between Mother’s Height and Babies
• Relationship between Mother’s Weight and Babies


Click on Options and
check if you need
descriptive analysis

Pic Courtesy: blog.vantagecircle.com/job-satisfaction/ & www.adventure-in-a-box.com/


Factor Analysis
• Many Variables to Fewer Factors
• Many observed, correlated variables to Latent Variables
• Types
• Exploratory Factor Analysis
• Confirmatory Factor Analysis
• Method
• Principal Component Analysis (max variance put into the 1st factor): most
commonly used
• Other methods like Common Factor Analysis (finds common variation to put
into factors), which are less commonly used

Assumptions
• All variables should be continuous (ordinal variables are also commonly used, but
they should be equidistant, like a Likert Scale)
• Sampling Adequacy: large sample, 10 times the number of items (KMO at least
.5)
• Adequate correlation b/w variables. (Bartlett's Test of Sphericity)
• No significant outliers


Some IMP Keywords


• Eigen Values: Variance explained by a particular factor
• Select components whose Eigenvalue is at least 1.
• No of Components Derived = No of Underlying Factors
• Communalities: the variance a variable shares with all the other variables
being considered. This is also the proportion of variance explained
by our underlying factors.
• The value of r square shall indicate the variance
• Factor loadings: simple correlations between the
variables and the factors, indicated in the Component Matrix.
• Component Matrix: gives an idea of which variables measure which
underlying concept or factor

PROBLEM: IPL VIEWERSHIP DECLINING


• A questionnaire was introduced to the audience
• Now we want to find out if some of the variables are indicating the same
underlying concept: Factor Analysis (sometimes the concept might be known,
sometimes not)
• Use the Fan Satisfaction file


Extraction: Principal Component is
most preferred…. or Common Factor
Analysis by Principal Axis Factoring

Eigenvalues more than 1, or, if we
have already decided on 2/3/4
different concepts to see,
then a fixed no. of factors

In Rotation: choose Varimax rotation, so that variables are not repeated
across components in the Component Matrix.

In Options:

Tell how many factors were extracted.

.7 / .4

• You may like to give names to these 3 underlying concepts

• Perception of Stadium: Condition of stadium, Outer appearance of stadium, Interior design of stadium
• Value: Entry Price, Price of season ticket
• Team Potential: No of Star Players, Qlty of Team Comp


Regression - Definition

A statistical technique that attempts to determine the existence of a
possible relationship between one dependent variable (usually denoted
by Y) and a collection of independent variables.

Regression is used for generating new hypotheses and for validating a
hypothesis.

The mother's height has no effect on the baby's length.

Remember: "There is no relationship between the mother's height and the baby's
length."

A regression model establishes the existence of an association
between two variables, but not causation.

y = mx + c
y = c + mx
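
A minimal sketch of how the slope m and intercept c in y = mx + c can be estimated by least squares, using only the Python standard library. The data are made up for illustration (they are not from any course file), and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Made-up data: mother's height (cm) and baby's length (cm).
mother_height_cm = [150, 155, 160, 165, 170]
baby_length_cm = [48.0, 48.5, 49.0, 49.5, 50.0]

m, c = fit_line(mother_height_cm, baby_length_cm)
print("slope m =", m, "intercept c =", c)
```

A non-zero slope here only shows an association in the sample; it does not by itself show that one variable causes the other.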

Conditions: Linear Regression

• Your 2 variables should be measured on a continuous scale.
• You should have independence of observations (i.e., independence of residuals). Autocorrelation (especially in
time series) violates this and decreases the p-value. Check with the Durbin-Watson statistic (range 0 to 4).
• The regression model is linear in its parameters.
• Your data needs to show homoscedasticity: the variance along the line should remain similar as you move
along it (P-P plot; a scatterplot of residuals against predicted values is also commonly used).
• There should be no significant outliers.
• The residuals (errors) are approximately normally distributed (P-P plot).
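
The Durbin-Watson statistic mentioned in the independence condition can be computed from the residuals as the sum of squared successive differences divided by the sum of squared residuals. The sketch below uses made-up residuals, and `durbin_watson` is a hypothetical helper name; SPSS prints the same statistic when it is selected under Statistics.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: always between 0 and 4."""
    # Squared differences between each residual and the one before it.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Made-up residuals that flip sign every period (negative autocorrelation).
alternating = [0.5, -0.5, 0.5, -0.5]
# Made-up residuals that never change (strong positive autocorrelation).
constant = [1.0, 1.0, 1.0, 1.0]

print(durbin_watson(alternating))
print(durbin_watson(constant))
```

Values close to 2 indicate little autocorrelation; values towards 0 indicate positive autocorrelation and values towards 4 indicate negative autocorrelation.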

Click on Analyze, then Regression, then Linear.

Select the Dependent and Independent variables accordingly.
Click Statistics and select R squared change, Durbin-Watson, etc., as per
requirement.

Multiple Linear Regression

• Multiple linear regression means linear in the regression parameters
(beta values). The following are examples of multiple linear
regression:

Y = β₀ + β₁x₁ + β₂x₂ + ... + βₖxₖ + ε
y = mx + c
y = c + m₁x₁ + m₂x₂ + err

An important task in multiple regression is to estimate the beta values
(β₁, β₂, β₃, etc.)
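
One way to see what "estimating the beta values" means is to solve the least-squares normal equations (X'X)b = X'y directly. The sketch below does this with the Python standard library only. The data are made up so that the true values are known (intercept 1, slopes 2 and 3), and the helper names `solve` and `fit_multiple` are hypothetical; SPSS performs the equivalent estimation internally.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_multiple(rows, ys):
    """Return least-squares betas [b0, b1, ..., bk] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # the leading 1.0 gives the intercept b0
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)

# Made-up data generated from y = 1 + 2*x1 + 3*x2 with no error term.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

betas = fit_multiple(rows, ys)
print(betas)
```

With real data the error term is not zero, so the estimated betas are the values that make the sum of squared errors as small as possible.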
Conditions: Multiple Linear Regression

The assumptions made in the multiple linear regression model are as follows:

• Assumption #1: Your dependent variable should be measured on a continuous scale.
• Assumption #2: You have two or more independent variables, which can be either continuous (i.e.,
an interval or ratio variable) or categorical (i.e., an ordinal or nominal variable).
• Assumption #3: You should have independence of observations (i.e., independence of residuals). Autocorrelation
(especially in time series) violates this and decreases the p-value. Check with the Durbin-Watson statistic (range 0 to 4).
• Assumption #4: The regression model is linear in its parameters.
• Assumption #5: Your data needs to show homoscedasticity (P-P plot).
• Assumption #6: Your data must not show multicollinearity, which occurs when you have two or more independent
variables that are highly correlated with each other. Check the VIF (less than 7; some suggest 10).
• Assumption #7: There should be no significant outliers, high leverage points or highly influential points.
• Assumption #8: Finally, you need to check that the residuals (errors) are approximately normally distributed.
• Assumptions 1 and 2 should be checked before testing; 3-8 can be checked using SPSS Statistics.
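
For the multicollinearity assumption, the Variance Inflation Factor (VIF) of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. With exactly two predictors this reduces to 1 / (1 - r²), where r is their Pearson correlation. The sketch below covers only this two-predictor case, on made-up data, and the helper names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Made-up predictor values.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]

vif = vif_two_predictors(x1, x2)
print("VIF =", vif)
```

A VIF of 1 means the predictors are uncorrelated; the larger the VIF, the stronger the multicollinearity, to be judged against the cut-off chosen (7 or 10, as noted above).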