
Module 4

The document discusses topics related to statistics including types of data, levels of measurement, descriptive statistics, inferential statistics, analyzing individual variables through measures of central tendency and dispersion, exploring relationships among variables through correlation and hypothesis testing, and applications of statistics. Key points covered include defining categorical and quantitative data, explaining descriptive and inferential statistics, and analyzing individual and relationships among variables.

Uploaded by

jalaj.joshi2020


Topics

1. Introduction to Data, Types of Data
2. Levels of Measurement
3. Definition and Uses of Statistics
4. Types of Statistics – Descriptive, Inferential
5. Analyzing Individual Variables
• Measures of Central Tendency and Dispersion
• Using Graphs to Explore Data
• Preliminary Analysis: Outlier detection, Missing value treatment
• Normal Distribution – Bell curve, Z score
• Descriptive statistics using Excel
6. Analyzing Relationship among Variables
• Correlation: Correlation coefficient, Correlation Matrix, 2D Scatter plot
• Inferential Statistics – Testing of Hypothesis, P-Value Concept
• Frequently used tests – T-Test, F-Test etc.
7. Applications of Statistics
What is Data?
➢Data is often viewed as the lowest level of abstraction from which information and knowledge are derived.

➢Data can be numbers, words, measurements, observations or even just descriptions of things. Also, data is a
representation of a fact, figure and idea.

➢Data on its own carries no meaning. In order for data to become information, it must be interpreted and take
on a meaning.
[Figure: an example of a raw data table. On its own it is just a collection of uninterpreted facts and figures.]
Exploring Data
Generally, one of the first things to do with new data is to get to know it by asking some general questions, such as (but not limited to) the following:
• What variables are included? What information are we getting?
• What is the format of the variables: string, numeric, etc.?
• What type of variables: categorical, continuous, and discrete?
• Is this sample or population data?

After looking at the data you may want to know

• How many males/females?
• What is the average age?
• How many undergraduate/graduate students?
• What is the average SAT score? Is it the same for graduates and undergraduates?
• Who reads the newspaper more frequently: men or women?
Types of Data / Variables

Variable
A variable is a value that may change within the scope of a given problem or set of operations.

Categorical Data is data that is non-numeric,
e.g. favorite color, place of birth, types of car.

Quantitative Data is numerical. There are 2 types of quantitative data.
1. Discrete data can only take specific values;
e.g. shoe size, number of brothers, number of cars in a car park.
2. Continuous data can take any numerical value;
e.g. height, mass, length.

Discrete
Random variable which takes only isolated values in its range of variation. For example, number of heads in 10 tosses of a coin.

Continuous
Random variable which takes any value in its range of variation. For example, height of a person.

Nominal
▪ Values do not have ordering
▪ Example: categorical variables like color, nationality and so on

Ordinal
▪ Values are ordered
▪ Example: RSS scores
Examine the differences between Categorical and Quantitative data.

Categorical Data
• Deals with descriptions.
• Data can be observed but not measured.
• Colors, textures, smells, tastes, appearance, beauty, etc.
• Categorical → Category
• Ex: Oil Painting
  1. blue/green color, gold frame
  2. smells old and musty
  3. texture shows brush strokes of oil paint
  4. peaceful scene of the country
  5. masterful brush strokes

Quantitative Data
• Deals with numbers.
• Data which can be measured.
• Length, height, area, volume, weight, speed, time, temperature, humidity, sound levels, cost, members, ages, etc.
• Quantitative → Quantity
• Ex: Oil Painting
  1. picture is 10" by 14"
  2. with frame 14" by 18"
  3. weighs 8.5 pounds
  4. surface area of painting is 140 sq. in.
  5. cost $300
What is Statistics?

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.
Types of Statistics
Descriptive Statistics
• Studies the basic features of the data, describing what the data is or what it shows.
• Statistical methods can be used to summarize or describe a collection of data.
• Involves the analysis of numeric data, pictures, graphs and figures.

Inferential Statistics
• Studies patterns, randomness and uncertainty in the data.
• Used to draw inferences about the process or population being studied.
• Used to make conclusions and future predictions by analysing numeric data.
Descriptive Statistics
Descriptive Statistics: Methods of organizing, summarizing, and presenting data in an informative way.

EX 1: A Gallup poll found that 49% of the people in a survey knew the name of the first book of the Bible. The statistic 49 describes the number out of every 100 persons who knew the answer.

EX 2: According to Consumer Reports, General Electric washing machine owners reported 9 problems per 100 machines during 2001. The statistic 9 describes the number of problems out of every 100 machines.
Inferential Statistics
Inferential Statistics: A decision, estimate, prediction, or generalization about a population, based on a
sample.

A Population is a collection of all possible individuals, objects, or measurements of interest.

A Sample is a portion, or part, of the population of interest.
Examples of inferential statistics

Example 1: TV networks constantly monitor the popularity of their programs by hiring Nielsen and other organizations to sample the preferences of TV viewers.

Example 2: Wine tasters sip a few drops of wine to make a decision with respect to all the wine waiting to be released for sale.

Example 3: The accounting department of a large firm will select a sample of the invoices to check for accuracy for all the invoices of the company.
The Data Type and Statistical Analysis
Who Cares?

The type(s) of data collected in a study determine the type of statistical analysis used.

One of the primary purposes of classifying variables according to their level or scale of measurement is to facilitate the choice of a statistical analysis used to analyze the data.
There are certain statistical analyses which are only meaningful for data which are measured at certain measurement scales.
Statistical representation of data

For example ...

Categorical data are commonly summarized using "frequencies/percentages" (or "proportions").
• 11% of students have a tattoo
• 2%, 33%, 39%, and 26% of the students in class are, respectively, freshmen, sophomores, juniors, and seniors.

And for example ...

Measurement data are typically summarized using "averages" (or "means").
• Average number of siblings Fall 1998 Stat 250 students have is 1.9.
• Average weight of male Fall 1998 Stat 250 students is 173 pounds.
• Average weight of female Fall 1998 Stat 250 students is 138 pounds.
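
The two kinds of summary can be sketched in a few lines of Python. The class-year labels and sibling counts below are hypothetical values invented for illustration; they are not the Stat 250 data quoted on the slide.

```python
from collections import Counter
from statistics import mean

# Hypothetical categorical data: class year of ten students.
class_years = ["freshman", "sophomore", "junior", "junior", "senior",
               "sophomore", "junior", "senior", "sophomore", "junior"]

# Categorical data: summarize with frequencies and percentages.
counts = Counter(class_years)
percentages = {year: 100 * n / len(class_years) for year, n in counts.items()}

# Hypothetical measurement data: number of siblings per student.
siblings = [1, 2, 0, 3, 2, 1, 4, 2, 1, 3]

# Measurement data: summarize with an average.
average_siblings = mean(siblings)

print(percentages)
print(average_siblings)
```

Taking a mean of the class-year labels would be meaningless, which is the point of matching the summary to the data type.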
Analyzing Individual Variables- Univariate Independent Analysis
- Measures of Central Tendency
- Measures of Dispersion/Variability
- Using graphs to Explore data
Preliminary Analysis:
- Missing data
- Outlier detection
- Normal Distribution
Recall the Branches of Statistics
Statistics
• Descriptive Statistics: describes the data set that's being analyzed, but doesn't allow us to draw any conclusions or make any inferences about the data.
• Inferential Statistics: a set of methods that is used to draw conclusions or inferences about the characteristics of populations based on data from a sample.

Descriptive Statistics: Tools for summarizing, organizing & simplifying data
• Summary Tools: Measures of Central Tendency, Measures of Variability
• Graphic Tools: Tables & Graphs

Inferential Statistics: Tools for generalizing beyond actual observations (generalize from a sample to a population)
• Estimation
• Hypothesis Testing
Mean, median and mode are different measures of central tendency
[Figure: Histogram of weekly returns of xyz equity prices (x-axis: weekly returns, y-axis: percent). The x-axis runs from -24 to 18 with one extreme bin at 2000. Median = -1, Mean = 202.]

Measure
Mean
• It is the easiest metric to understand and communicate
• Mean is sensitive to the presence of outliers
Median
• Median is more "robust" to the presence of outliers
• It is more complicated to communicate
Mode
• Not very practical since it is affected by skewness
• Most real life distributions are multimodal
Other Measures of Central Tendency…

Weighted Mean: the sum of (weight × value) divided by the sum of the weights. In the slide's example: 588/28 = 21.

Geometric Mean: The geometric mean is the nth root of the product of the scores (used for logarithmic distributions).

Harmonic Mean:
If you travel from Gurgaon to Delhi at 40 miles per hour and from Delhi to Faridabad at 60 miles per hour, and the two legs are the same distance, then your average speed is given by the Harmonic Mean of 40 and 60, which is 48 miles per hour; that is, the total amount of time for the trip is the same as if you travelled the entire trip at 48 miles per hour.
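
As a quick check of the travel example, here is a minimal sketch using Python's standard `statistics` module. The speeds come from the slide; the growth factors used for the geometric mean are hypothetical.

```python
from statistics import geometric_mean, harmonic_mean

# Speeds from the slide (two legs, assumed to be of equal distance).
speeds_mph = [40, 60]
average_speed = harmonic_mean(speeds_mph)  # 2 / (1/40 + 1/60)

# Hypothetical yearly growth factors, to illustrate the geometric mean.
growth_factors = [1.10, 1.20, 0.95]
typical_growth = geometric_mean(growth_factors)  # cube root of the product

print(average_speed)
print(typical_growth)
```

The arithmetic mean of 40 and 60 would give 50 mph, which overstates the average speed because more time is spent on the slower leg.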
The Central Tendencies Summary
Mean:
It's just the average of the data, computed as the sum of the data points divided by the number of points

Mode:
Mode is the most common value in the data set.
Tricky circumstances:
If no value occurs more than once, then there is no mode
If two values occur as frequently as each other and more frequently than any other, then there are two modes (in
the same way, there could also be more than two modes).

Median:
Median is the value in the middle of the data set, when the data points are arranged from smallest to largest.
If there is an odd number of data points, then just arrange them and look for the middle value
Tricky circumstances:
If there is an even number of data points, you will need to take the average of the two middle values.
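
The three definitions, including the tricky cases, can be sketched as below. The helper `modes` is a hypothetical name written for this example, and the data are made up. It follows the slide's convention that a data set with no repeated value has no mode.

```python
from collections import Counter
from statistics import mean, median

def modes(data):
    """Return all modes; an empty list if no value occurs more than once."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # slide's convention: no repeats means no mode
    return [value for value, count in counts.items() if count == top]

scores = [2, 4, 4, 5, 7, 9]   # even number of points

print(mean(scores))           # sum of the points divided by their number
print(median(scores))         # average of the two middle values, 4 and 5
print(modes(scores))          # one mode
print(modes([1, 1, 2, 2, 3])) # two modes
print(modes([1, 2, 3]))       # no mode
```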
Appropriate Measures of Central Tendency
The selection should be based on level-of-measurement.

Tips for selecting

use the mode when...
variables are measured at the nominal level
you want a quick and easy measure for ordinal and interval-ratio variables
you want to report the most common score
use the median when...
variables are measured at the ordinal level
variables measured at the interval-ratio level have highly skewed distributions
you want to report the central score. The median always lies at the exact center of a distribution.
use the mean when…
variables are measured at the interval-ratio level
you want to report the typical score. The mean is "the fulcrum that exactly balances all of the scores."
you anticipate additional statistical analysis.
Examples
• What is a typical student in the class doing? - Mean
• To compare performance of any single student against the group - Median
• A parent wanting to know whether their child is doing better or worse than a typical child at his grade level - Mode
Are these sufficient?

• There is the man who drowned crossing a stream with an average depth of six inches. ~W.I.E. Gates

• Say you were standing with one foot in the oven and one foot in an ice bucket. According to the averages, you should be perfectly comfortable.

e.g. x1, x2, x3 are the times taken to get to Delhi in different modes of transport

          Auto   Office Transport   Own Car
            7           9               1
            6           9               3
            3           9               5
            8           9               7
           12           9               9
            9           9               9
            9           9               9
           13           9              11
           13           9              13
            9           9              15
           10           9              17
Mean        9           9               9
Median      9           9               9
Mode        9           9               9

NO!!!
Measures of Dispersion (Variance)
Dispersion refers to the spread or variability in the data.

It determines how spread out the scores are around the mean.

The basic question being asked is: how much do the scores deviate around the mean? The more "bunched up" the scores are around the mean, the better your ability to make accurate predictions.

[Figure: Distributions with different dispersions]
Measures of Dispersion
▪ Range
▪ Inter-Quartile Range
▪ Mean Deviation
▪ Standard Deviation
▪ Variance
▪ Percentiles/Quartiles

Box-plot
• Reveals the spread of the data
• Outliers are defined using the fences Q1 - 1.5(Q3-Q1) and Q3 + 1.5(Q3-Q1); any point outside them is an outlier
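
The box-plot fences can be sketched in Python as follows. The function name `iqr_outliers` and the travel times are hypothetical, and the choice of the "inclusive" quartile method is an assumption, since the slides do not say how quartiles are computed (different conventions give slightly different fences).

```python
from statistics import quantiles

def iqr_outliers(data):
    """Return the points outside Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # Quartile convention is an assumption; other methods shift the fences a little.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

# Hypothetical travel times with one deliberately extreme value (40).
travel_times = [3, 6, 7, 8, 9, 9, 9, 10, 12, 13, 13, 40]

print(iqr_outliers(travel_times))
```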
Variance and Standard Deviation

Variance: the arithmetic mean of the squared deviations from the mean.

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_N - \mu)^2}{N}$$

X is the value of an observation in the population
μ is the arithmetic mean of the population
N is the number of observations in the population

Standard deviation: the square root of the variance.

$$\sigma = \sqrt{\sigma^2}$$
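Now try this…

           Auto   Office Transport   Own Car
             7           9               1
             6           9               3
             3           9               5
             8           9               7
            12           9               9
             9           9               9
             9           9               9
            13           9              11
            13           9              13
             9           9              15
            10           9              17
Mean         9           9               9
Median       9           9               9
Mode         9           9               9
Std Dev     3.0         0.0             4.9
Variance    9.2         0.0            24.0

Note: the standard deviation and variance shown here are sample values, that is, the sum of squared deviations is divided by n − 1 = 10 rather than by N = 11.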
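
The table's dispersion figures can be reproduced with Python's `statistics` module. This is a sketch that assumes the slide used the sample formulas (divide by n − 1), which is what matches the printed values; the population versions (`pvariance`, `pstdev`) divide by N and give slightly smaller numbers.

```python
from statistics import pvariance, stdev, variance

# Travel times copied from the table.
auto = [7, 6, 3, 8, 12, 9, 9, 13, 13, 9, 10]
office_transport = [9] * 11
own_car = [1, 3, 5, 7, 9, 9, 9, 11, 13, 15, 17]

for name, times in [("Auto", auto),
                    ("Office Transport", office_transport),
                    ("Own Car", own_car)]:
    # Sample variance and standard deviation (divide by n - 1).
    print(name, round(variance(times), 1), round(stdev(times), 1))

# Population variance (divide by N) for comparison.
print(round(pvariance(auto), 2))
```

All three columns share the same mean, median and mode, yet their spreads differ widely, which is why the averages alone are not sufficient.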

Coefficient of Variation
The coefficient of variation (CV) is a normalized measure of dispersion of a probability distribution. It is defined as the ratio of the standard deviation to the mean:

$$CV_x = \frac{s}{\bar{x}} \, (100)$$

where s is the standard deviation and x̄ is the mean.

• Measure of relative dispersion
• Always a %
• Shows variation relative to mean
• Used to compare 2 or more groups
Which Cricketer do you like? Who is more consistent?

Dravid: 150 150 130 125 145 110 100 152 120 50 128
Sehwag: 230 240 150 50 173 23 20 300 45 1 128

          Dravid     Sehwag
Mean      123.636    123.636
Median    128        128
CV        24%        84%
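
The comparison can be checked with a short sketch. The function name `coefficient_of_variation` is chosen for this example, and the use of the sample standard deviation is an assumption that happens to reproduce the 24% and 84% on the slide.

```python
from statistics import mean, median, stdev

def coefficient_of_variation(scores):
    """Sample standard deviation as a percentage of the mean."""
    return stdev(scores) / mean(scores) * 100

# Scores copied from the slide.
dravid = [150, 150, 130, 125, 145, 110, 100, 152, 120, 50, 128]
sehwag = [230, 240, 150, 50, 173, 23, 20, 300, 45, 1, 128]

for name, scores in [("Dravid", dravid), ("Sehwag", sehwag)]:
    print(name, round(mean(scores), 3), median(scores),
          round(coefficient_of_variation(scores)))
```

Both players have the same mean and median, so only the CV separates them: the lower value indicates the more consistent batsman.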
Skewness
Lack of Symmetry
• A distribution is skewed if one of its tails is longer than the other.
• If the distribution of the data is symmetric then skewness is zero

Positive Skew
• This means that the distribution has a long tail to the right
• Mean > Median > Mode

Negative Skew
• This means that the distribution has a long tail to the left
• Mean < Median < Mode

Measure: Mean – Median or Mean – Mode
Kurtosis
• Kurtosis measures the "peakedness" of a distribution.
• Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations.
• The kurtosis of the Normal Distribution is 3.

[Figure: three curves compared]
• Leptokurtic: kurtosis greater than 3
• Mesokurtic: kurtosis equal to 3
• Platykurtic: kurtosis less than 3
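
Skewness and kurtosis can be sketched numerically as below. Note that this uses the moment-based coefficients (third and fourth central moments scaled by the variance), which are a common alternative to the simple "Mean – Median" measure on the slide; the helper names and the data are hypothetical.

```python
from statistics import fmean, median

def central_moment(data, k):
    """Average of the k-th power of deviations from the mean."""
    m = fmean(data)
    return fmean([(x - m) ** k for x in data])

def skewness(data):
    # Moment coefficient of skewness: 0 for symmetric data.
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    # Moment coefficient of kurtosis: 3 for a normal distribution.
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]  # long tail to the right

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed))
print(fmean(right_skewed) - median(right_skewed))  # the slide's Mean - Median
```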
Descriptive statistics (using Excel's data analysis tool)
Let's get some descriptive statistics for this data. In Excel go to Tools – Data Analysis. If you do not see the "Data Analysis" option you need to install it: go to Tools – Add-Ins, a window will pop up; check the "Analysis ToolPak" option, then press OK. Try running Data Analysis again.
Descriptive statistics

Now we know something about our data

Data analysis using Graphs

Tables, charts and graphs are convenient ways to clearly

show your data.
Sample data
The cafeteria wanted to collect data on how much milk was sold in 1 week. The table below shows
the results. We are going to take this data and display it in 3 different types of graphs.

Day Chocolate Strawberry White

Monday 53 78 126
Tuesday 72 97 87
Wednesday 112 73 86
Thursday 33 78 143
Friday 76 47 162

✓ Notice how each of the following examples is used to illustrate the data.
✓Choose the best graph form to express your results.
Graphical Representation of variables
Bar Graph
• A bar graph is used to show relationships between groups.
• The two items being compared do not need to affect each other.
• It's a fast way to show big differences.

Pie Graph
• A circle graph is used to show how a part of something relates to the whole.
• This kind of graph is needed to show percentages effectively.

Line Graph
• A line graph is used to show continuing data; how one thing is affected by another.
• The rises and falls of a line graph make it easy to see how things are going.
Notice how easy it is to read a bar graph.
[Figure: three charts titled "Chocolate Milk Sold", showing amount sold by day (Monday to Friday) as a bar graph, a pie graph and a line graph]

On what day did they sell the most chocolate milk?
a. Tuesday b. Friday c. Wednesday

On what day was the least amount of chocolate milk sold?
a. Monday b. Tuesday c. Thursday

On what day did they have a drop in chocolate milk sales?
a. Thursday b. Tuesday c. Monday
Graphical Representation of variables
Histogram
▪ A histogram is a special kind of bar chart which allows us to visualize the distribution of values of an ordinal/continuous variable
▪ Can be developed in Excel 2007 through Data>>>data analysis>>>histogram

Line charts
▪ A representation of data varying over time, e.g. commodity prices
▪ It provides insights like trend of the data, seasonality or presence of outliers
[Figure: Brent – 1 month forwards, $/barrel, Jul-08 to Jan-10]

Ogives
▪ In statistics, an ogive is a graph showing the curve of a cumulative distribution function.
▪ It provides insights like distribution of population within a given range

SOURCE: Wikipedia
Choosing the Right Graph

• Use a bar graph if you are not looking for trends (or patterns) over time; and the items (or categories) are not parts of a whole.

• Use a pie chart if you need to compare different parts of a whole, there is no time involved and there are not too many items (or categories).

• Use a line graph if you need to see how a quantity has changed over time. Line graphs enable us to find trends (or patterns) over time.
Common Chart Types
Outliers
•An outlier is an observation that is numerically distant from the rest of the data.

•An outlying observation, or outlier, is one that appears to deviate markedly from other
members of the sample in which it occurs.

•Outliers can occur by chance in any distribution, but they are often indicative either of
measurement error or that the population has a heavy-tailed distribution.
Bill Gates makes $500 million a year. He’s in a room with 9 teachers, 4 of whom
make $40k, 3 make $45k, and 2 make $55k a year. What is the mean salary of
everyone in the room? What would be the mean salary if Gates wasn’t included?

Mean With Gates: $50,040,500
Mean Without Gates: $45,000
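
The salary example is easy to verify in Python. The sketch below also prints the median, which is not on the slide but shows how a robust measure behaves with the same outlier in the room.

```python
from statistics import mean, median

# Salaries from the slide: 4 teachers at $40k, 3 at $45k, 2 at $55k.
teachers = [40_000] * 4 + [45_000] * 3 + [55_000] * 2
room_with_gates = teachers + [500_000_000]

print(mean(room_with_gates))    # mean with Gates
print(mean(teachers))           # mean without Gates
print(median(room_with_gates))  # median with Gates, barely affected
```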
Plots for analyzing outliers
• A Scatterplot is useful for "eyeballing" the presence of outliers.
• In a Box plot, a point beyond an inner fence on either side is considered a mild outlier. A point beyond an outer fence is considered an extreme outlier.
[Figure: an example scatterplot, and a box plot labelled min, Q1, median, Q3 and max]
[Figure: time-series plots "Stock Price of Peach Inc." and "Hourly power prices", Jul-09 to Apr-10]
Missing Values
• In the ideal data collection project, complete data would exist for all variables across all experimental units (also called subjects, cases, or observations).
• Unfortunately, for a number of reasons it is inevitable that some values won't be collected, will become lost, or will be unusable.

There are a number of reasons why data become missing.
• sensor failures
• omitted entries in databases
• non-response in questionnaires
• loss to follow up
• lack of overlap between linked data sets
• dropping out of school, graduation, etc.
• survey design: "skip patterns" between respondents

Imputation Methods
Some common imputation methods
• Mean (median, mode) imputation
• Pairwise deletion a.k.a. available case analysis
• Dummy variable adjustment
• Listwise deletion a.k.a. complete case analysis
• Multiple imputation (MI)
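
Two of the simplest treatments, listwise deletion and mean imputation, can be sketched as follows. The function names and the ages are hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean

def listwise_delete(values):
    """Complete case analysis: drop every missing entry."""
    return [v for v in values if v is not None]

def impute_mean(values):
    """Replace each missing entry with the mean of the observed ones."""
    fill = mean(listwise_delete(values))
    return [fill if v is None else v for v in values]

# Hypothetical ages with two missing responses.
ages = [23, None, 31, 27, None, 19]

print(listwise_delete(ages))  # smaller sample
print(impute_mean(ages))      # same size, gaps filled with the observed mean
```

Mean imputation keeps the sample size but understates the spread of the variable, since every filled value sits exactly at the center.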
Some facts about missing data
Why not just delete cases with missing values rather than impute values at all?
a. Deletion can introduce substantial bias into the study. And, the loss in sample size can appreciably diminish the statistical power of the analysis.

b. As a rule of thumb, if a variable has more than 5% missing values, cases are not deleted.

Should I use original data or imputed data when reporting results?

a. The original dataset may be biased by a large number of non-random missing values.

b. The imputed dataset is a "what-if" hypothetical dataset which relies on estimation, though it is a "best guess" attempt to present what choices respondents are likely to have made, given their responses on other items.

c. It is preferable to run all analyses on both the original and imputed datasets, and discuss in the report where imputation would make a difference for the substantive interpretations.
Normal Distributions

• The normal distribution is a pattern for the distribution of a set of data which follows a bell shaped curve. This is also called the Gaussian distribution.

• The Normal Distribution has the mean, the median, and the mode all coinciding at its peak, with frequencies gradually decreasing at both ends of the curve.

• The normal distribution is a theoretical ideal distribution. Real-life empirical distributions never
match this model perfectly. However, many things in life do approximate the normal distribution,
and are said to be “normally distributed.”
The Bell Shaped Curve
• The bell shaped curve has the following characteristics:
• The curve is concentrated in the center and decreases on either side.
• The bell shaped curve is symmetric and unimodal
• The curve extends to + / - infinity
• Area under the curve = 1

68-95-99.7 Rule
[Figure: bell curve with nested bands marked 68%, 95% and 99.7% of the data]

The empirical rule states that for a normal distribution:
• 68% of the data will fall within 1 SD of the mean
• 95% of the data will fall within 2 SDs of the mean
• Almost all (99.7%) of the data will fall within 3 SDs of the mean
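
The three percentages in the empirical rule can be recovered from the standard normal cumulative distribution function. This is a minimal sketch using `statistics.NormalDist`; the helper name `share_within` is chosen for this example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a normal value lies within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))
```

The rounded figures on the slide (68, 95, 99.7) are approximations of these areas.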
Are my data “normal”?
• Not all continuous random variables are normally distributed!!
• It is important to evaluate how well the data are approximated by a normal distribution

Are my data normally distributed?

1. Look at the histogram! Does it appear bell shaped?

2. Compute descriptive summary measures—are mean, median, and mode similar?
3. Do 2/3 of observations lie within 1 std dev of the mean? Do 95% of observations lie within
2 std dev of the mean?
4. Look at a normal probability plot—is it approximately linear?
5. Run tests of normality (such as Kolmogorov-Smirnov). But, be cautious, highly influenced
by sample size!
Standard (Z) Scores – standard normal variable
• A standard score (also called Z score) is the number of standard deviations that a given raw score is above or below the mean.
• All normal distributions can be converted into the standard normal curve by subtracting the mean and dividing by the standard deviation:

$$Z = \frac{X - \mu}{\sigma}$$

How good is the rule for real data?
Check some example data:
The mean of the weight of the women = 127.8
The standard deviation (SD) = 15.5
Practice problem

If birth weights in a population are normally distributed with a mean of 109 oz and a standard
deviation of 13 oz
a. What is the chance of obtaining a birth weight of 141 oz or heavier when sampling
birth records at random?
b. What is the chance of obtaining a birth weight of 120 or lighter?
Answer
a. What is the chance of obtaining a birth weight of 141 oz or heavier when sampling birth records at random?

$$Z = \frac{141 - 109}{13} = 2.46$$

From the chart or SAS → Z of 2.46 corresponds to a right tail (greater than) area of:
P(Z≥2.46) = 1-(.9931) = .0069 or .69%

b. What is the chance of obtaining a birth weight of 120 oz or lighter?

$$Z = \frac{120 - 109}{13} = .85$$

From the chart → Z of .85 corresponds to a left tail area of:
P(Z≤.85) = .8023 = 80.23%
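
The same answers can be obtained without a printed chart. The sketch below rounds each Z score to two decimals, as a table lookup would, so that it reproduces the slide's figures; using the unrounded Z values changes the results slightly.

```python
from statistics import NormalDist

MEAN_OZ, SD_OZ = 109, 13  # population parameters from the practice problem
std_normal = NormalDist()  # standard normal: mean 0, SD 1

# a. Birth weight of 141 oz or heavier (right tail).
z_a = round((141 - MEAN_OZ) / SD_OZ, 2)
p_heavier = 1 - std_normal.cdf(z_a)

# b. Birth weight of 120 oz or lighter (left tail).
z_b = round((120 - MEAN_OZ) / SD_OZ, 2)
p_lighter = std_normal.cdf(z_b)

print(z_a, round(p_heavier, 4))
print(z_b, round(p_lighter, 4))
```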

Applications of Normal Distribution to Business Administration

• Modern portfolio theory assumes that the returns of a diversified asset portfolio follow a normal distribution.

• In operations management, process variations are often normally distributed.

• In human resource management, employee performance is sometimes considered to be normally distributed.
Correlation
What is the relationship between two variables?

Relationship between hours studying (X) and grades on a midterm (Y)?

Relationship between self-esteem (X) and depression (Y)?

The relationship between two variables over a period, especially one that shows a
close match between the variables' movements

Direction and strength of relationship between two variables
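
Direction and strength are usually summarized by the Pearson correlation coefficient. Below is a minimal sketch written from the textbook definition (covariance divided by the product of the standard deviations); the function name `pearson_r` and the study-hours data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: sign gives direction, magnitude gives strength."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical data: hours studying (X) and midterm grade (Y).
hours = [1, 2, 3, 4, 5]
grades = [52, 60, 71, 75, 88]
print(round(pearson_r(hours, grades), 3))  # strong positive relationship

# A perfect quadratic relationship can still have zero linear correlation.
xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [x ** 2 for x in xs]))
```

The second example is a reminder that the coefficient measures only linear association.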

Graphical representation of data in a bivariate setup
[Figure: six scatter plots with panel titles: No association; Strong linear relationship; Strong linear relationship; Exact linear relationship; Quadratic relationship; Sinusoidal relationship (damped)]
Correlation measures may be misleading in certain scenarios

Correlation and independence
• No correlation does not imply no association

Spurious correlation
• A spurious relationship is a mathematical relationship in which two events or variables have no direct causal connection, yet it may be wrongly inferred that they do, due to either coincidence or the presence of a certain third, unseen factor (referred to as a "confounding factor" or "lurking variable")
• Another popular example is a series of Dutch statistics showing a positive correlation between the number of storks nesting in a series of springs and the number of human babies born at that time. Of course there was no causal connection; they were correlated with each other only because they were correlated with the weather nine months before the observations
Examples
• An increase in height results in a weight increase for children
• Attending lessons leads to improved grades
• The age of a car impacts its stopping distance
• The more years of education, the higher the income

Business Examples
• Rising unemployment leads to a decrease in sales of "Taste the Difference" products
• An increase in demand for a product leads to an increase in supply
• The more efficient the workers, the higher the productivity


Uploaded by

After looking at the data you may want to know

Categorical Data is the data that is non

• Deals with descriptions. 1. Deals with numbers.

1. blue/green color, gold frame 1. picture is 10" by 14"

Statistics is the science of   

involves the analysis of numeric data, pictures, graphs and figures.

EX 1: A Gallup poll found that 49% of the EX 2: According to Consumer Reports,

Example 1: TV networks constantly Example 2: Wine tasters sip a few

Example 3: The accounting department of a large firm

The type(s) of data collected

For example ...

And for example ?

Descriptive Statistics Inferential Statistics

Descriptive Statistics describes Inferential statistics is a set of

Percent Mean It is the easiest metric to

Mode Not very practical since it is

Weighted Mean : Geometric Mean: The geometric mean is the nth

Tips for selecting

To compare performance of any single student against group - Median

It determines how spread out the scores are

The basic question being asked is how much do the scores

Variance: the arithmetic mean of the squared deviations from the mean, σ² = Σ(X − μ)² / N

N is the number of observations in the population
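
A minimal sketch of the population variance and standard deviation, dividing by N, the number of observations in the population. The function names and the `scores` list are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import math


def population_variance(values):
    """Mean of the squared deviations from the mean (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def population_std_dev(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


# Hypothetical scores, not taken from the slides.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
variance = population_variance(scores)
std_dev = population_std_dev(scores)
print(f"variance = {variance}, std dev = {std_dev}")
```

For a sample rather than a whole population, the divisor is usually N − 1 instead of N.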

Std Dev 3.0 0.0 4.9

Variance 9.2 0.0 24.0

• Measure of relative dispersion

Which Cricketer do you like? Who is more consistent?
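
A rough sketch of how consistency could be compared, assuming the measure of relative dispersion meant here is the coefficient of variation (standard deviation divided by the mean, as a percentage). The two lists of scores are hypothetical and are not the figures from the slides.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


# Hypothetical innings scores with the same mean (50) for both players.
batsman_a = [40, 50, 60, 50, 50]
batsman_b = [0, 100, 10, 90, 50]

cv_a = coefficient_of_variation(batsman_a)
cv_b = coefficient_of_variation(batsman_b)
print(f"CV of A = {cv_a:.1f}%, CV of B = {cv_b:.1f}%")
```

Under this assumption, the player with the lower coefficient of variation is the more consistent one, even when both have the same average.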

Positive Skew Negative Skew

Measure : Mean – Median or Mean – Mode

Now we know something about our data

Tables, charts and graphs are convenient ways to clearly

Day Chocolate Strawberry White

Jul- Jan- Jul- Jan-

Mean With Gates: Mean Without Gates:

Should I use original data or imputed data when reporting results?

• The bell-shaped curve is symmetric and unimodal

• Area under the curve = 1 99.7% of the data

The empirical rule states that for a normal

Are my data normally distributed?

1. Look at the histogram! Does it appear bell shaped?

All normal distributions can be converted

141 − 109 120 − 109

P(Z ≥ 2.46) = 1 − 0.9931 = 0.0069, or 0.69%

P(Z ≤ 0.85) = 0.8023, or 80.23%
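
The two probabilities can be checked without a Z table. The sketch below computes the standard normal cumulative probability from the error function in Python's `math` module. The mean of 109 is taken from the numerators shown above; the standard deviation of 13 is an assumption, picked only because it reproduces the z values 2.46 and 0.85, and it is not stated in this excerpt.

```python
import math


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


def normal_cdf(z):
    """P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


MEAN = 109      # from the numerators 141 - 109 and 120 - 109
STD_DEV = 13    # assumed value, not given in the excerpt

z_high = round(z_score(141, MEAN, STD_DEV), 2)
z_low = round(z_score(120, MEAN, STD_DEV), 2)

p_upper = 1.0 - normal_cdf(z_high)   # P(Z >= z_high)
p_lower = normal_cdf(z_low)          # P(Z <= z_low)
print(f"P(Z >= {z_high}) = {p_upper:.4f}")
print(f"P(Z <= {z_low}) = {p_lower:.4f}")
```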

• In operations management, process variations often are normally distributed

• In human resource management, employee performance is sometimes considered to be normally

Relationship between hours studying (X) and grades on a midterm (Y)?

Relationship between self-esteem (X) and depression (Y)?

Direction and strength of relationship between two variables

Exact linear relationship Quadratic relationship Sinusoidal relationship (damped)

No correlation: does not imply no association
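
A small sketch of why a correlation of zero does not rule out an association. It computes the Pearson correlation coefficient by hand for an exact linear relationship and for a quadratic one. The data and the function name `pearson_r` are hypothetical, chosen for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]     # exact linear relationship
quadratic = [x ** 2 for x in xs]     # y is fully determined by x

r_linear = pearson_r(xs, linear)
r_quadratic = pearson_r(xs, quadratic)
print(f"linear: r = {r_linear:.2f}, quadratic: r = {r_quadratic:.2f}")
```

In the quadratic case y depends entirely on x, yet the coefficient is zero because the relationship is not linear; a scatter plot would reveal the pattern that the coefficient misses.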

Another popular example is a series of Dutch statistics
