Descriptive Stats

Uploaded by

e.markowicz86

DESCRIPTIVE STATISTICS

Lecturer: Vania Filipova

What Does "Statistics" Mean?

The transformation of raw data into a form that is easy to understand and interpret: rearranging, ordering, and manipulating data to generate descriptive information
 Number of people or variables in a population; measured characteristics of a population
 Trends in employment
 The objective is to determine a set of statistics that summarise or represent the data
Distribution in statistics

 The distribution of a statistical data set (or a population) is a listing or function showing all the possible values (or intervals) of the data and how often they occur.
Types of Probability Distributions

Examples of a discrete distribution are:

 The number of students in a class.
 The number of children in a family.
 The number of cars entering a carwash in an hour.
 The number of home mortgages approved by Irish Life and Permanent this week.
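The Normal Distribution

 "Bell shaped"
 Symmetrical
 Mean, median and mode are equal
 Interquartile range is approximately 1.35 σ (quoted as 1.33 σ in some textbooks)
 Random variable has infinite range
[Figure: bell-shaped density curve f(X) plotted against X, with the mean, median and mode marked at the centre]

© 2003 Prentice-Hall, Inc.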
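The interquartile-range property of the normal distribution can be checked numerically. The sketch below is illustrative only: it uses `NormalDist` from Python's standard library, and the helper name `normal_iqr` is an assumption made for this example, not something from the lecture.

```python
from statistics import NormalDist


def normal_iqr(mu: float = 0.0, sigma: float = 1.0) -> float:
    """Interquartile range (Q3 - Q1) of a normal distribution."""
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(0.75) - dist.inv_cdf(0.25)


standard = NormalDist(0.0, 1.0)

# IQR of the standard normal, i.e. the IQR expressed in units of sigma.
iqr_in_sigmas = normal_iqr()

# For a symmetric distribution the median (50th percentile) sits at the mean.
median_minus_mean = standard.inv_cdf(0.5) - standard.mean

print(f"IQR = {iqr_in_sigmas:.3f} sigma")
```

Because the IQR scales with σ, calling `normal_iqr(mu, sigma)` with any other parameters gives the same multiple of `sigma`.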
Types of Probability Distributions

Examples of a continuous distribution include:

 The distance students travel to class.

 The time it takes to drive here from Blackrock
 The length of an afternoon nap.
 The length of time of a particular phone call.
Frequency distribution

 Business researchers often answer research questions based on a single variable; this is useful for large quantities of data
 How many users of the brand may be characterized as loyal?
 What percentage of the market consists of heavy users, medium users, light users and non-users?
 Frequency distributions examine one variable at a time and provide counts of the different responses for the various values of the variable. The objective of a frequency distribution is to display the number of responses associated with each value of a variable.
 Can help detect item non-response.
Frequency distribution

Frequency Distribution for Variable X15 "Proud" on Employee Survey with Missing Data

Response              Frequency   Percent   Valid percent   Cumulative percent
Valid
  4                       10        14.1        14.5              14.5
  5                       28        39.4        40.6              55.1
  6                       21        29.6        30.4              85.5
  7 = Strongly agree      10        14.1        14.5             100.0
  Total                   69        97.2       100.0
Missing                    2         2.8
Total                     71       100.0
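The three percentage columns in this table differ only in their denominators. As a rough sketch, the code below rebuilds them from the counts; the `responses` list is a hypothetical reconstruction of the raw answers (with `None` standing for a missing answer), and `frequency_table` is an illustrative helper, not part of any survey package.

```python
from collections import Counter

# Hypothetical raw responses rebuilt from the table's counts; None marks a missing answer.
responses = [4] * 10 + [5] * 28 + [6] * 21 + [7] * 10 + [None] * 2


def frequency_table(values):
    """Return rows of (value, frequency, percent, valid percent, cumulative percent)."""
    valid = [v for v in values if v is not None]
    counts = Counter(valid)
    rows, running = [], 0
    for value in sorted(counts):
        freq = counts[value]
        running += freq
        rows.append((
            value,
            freq,
            round(100 * freq / len(values), 1),    # share of all respondents
            round(100 * freq / len(valid), 1),     # share of those who answered
            round(100 * running / len(valid), 1),  # running share of valid answers
        ))
    return rows


table = frequency_table(responses)
missing = sum(v is None for v in responses)

for row in table:
    print(row)
print("missing:", missing, f"({100 * missing / len(responses):.1f}%)")
```

"Percent" divides by all 71 respondents, while "valid percent" and "cumulative percent" divide by the 69 who answered, which is why the valid column sums to 100 and the percent column to 97.2.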
Simple survey example
Frequency distribution
Descriptive statistics can also be used
to check for mistakes…
Bar charts show the data in the form of bars that
can be displayed either horizontally or vertically
Pie Charts

 Good for categorical data
 Good for reporting survey results
 Danger of misunderstanding or confusion if too many segments
 Pull out a segment for emphasis
 Easy to construct from Excel
[Figure: two pie charts titled "Number of People", with segments Green, Blue and Brown]
Conclusions

 Statistical charts are about communication
 When assessing a chart you need to ask if it succeeds in telling you what is going on.
 There are few "right" answers in terms of which diagram to use, but some may be viewed as "more appropriate" than others
 https://www.youtube.com/watch?time_continue=7&v=kiQ6MUQZHSs
 https://www.youtube.com/watch?time_continue=79&v=EqeVXI4WNHM
Measures of location, variability and shape
 Measures of location (measures of central tendency)
 Mean (average)

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

 Where
▪ $X_i$ = the i-th observed value of variable X
▪ n = number of observations
Measures of location, variability and shape
 Mode
 The value that occurs most frequently

 Median
 Middle value when the data are arranged in ascending or descending order (with an even number of values, the average of the two middle values)
Calculating the mean = average

 Example: 5 salaries: £6500 £6500 £6500 £6500 £10500
 To calculate the mean, we add to find the total and divide by the number included.

$$\bar{x} = \frac{\sum x}{n} = \frac{6500 + 6500 + 6500 + 6500 + 10500}{5} = \frac{36500}{5} = \text{£}7300$$
The Median and the Mode
 This list is already in order:
 £6500 £6500 £6500 £6500 £10500
 The middle one is the third value
 median = £6500
 The most frequently occurring value is the salary of
£6500
 Mode = £6500
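The three measures of location for the salary example can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen for this example only.

```python
from statistics import mean, median, mode

salaries = [6500, 6500, 6500, 6500, 10500]  # the five salaries from the example, in pounds

salary_mean = mean(salaries)      # total divided by the number of values
salary_median = median(salaries)  # middle value of the sorted list
salary_mode = mode(salaries)      # most frequently occurring value

print(salary_mean, salary_median, salary_mode)
```

The single high salary pulls the mean (£7300) above the median and mode (both £6500), which is why the median is often preferred for describing typical pay.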
Exercise
 Determine the mean and the median from the following data.
 The weekly pay (x-variable) of a sample of 6 workers is as follows:
 €220, €220, €180, €215, €208, €207
 Determine the mode: the attendance at five mathematics tutorials is as follows:
 15, 18, 17, 17, 20
Second Moment
The spread of the data
Measures of location, variability and shape

 Measures of variability
 Range (max – min)
 Variance: how widely are the data spread around the mean?
▪ The difference between an observed value and the mean is called the deviation from the mean
▪ The variance is the average of the squared deviations
▪ When the data points are clustered around the mean, the variance is small
 Standard deviation
▪ Square root of the variance
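As an illustration, the measures of variability can be computed for the five salaries used in the mean example. The sketch below uses the standard `statistics` module and shows both the population form (divide by n) and the sample form (divide by n - 1), since the slides do not say which is intended.

```python
from statistics import pstdev, pvariance, stdev, variance

salaries = [6500, 6500, 6500, 6500, 10500]  # salaries from the mean example, in pounds

data_range = max(salaries) - min(salaries)  # max - min

pop_variance = pvariance(salaries)    # mean squared deviation, dividing by n
pop_std = pstdev(salaries)            # square root of the population variance

sample_variance = variance(salaries)  # squared deviations divided by n - 1
sample_std = stdev(salaries)          # square root of the sample variance

print(data_range, pop_variance, pop_std, sample_variance, round(sample_std, 2))
```

The deviations from the mean of 7300 are -800 (four times) and +3200, so the squared deviations sum to 12,800,000; dividing by 5 or by 4 gives the two variances.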
Normal (Gaussian) Distribution

[Figure: two normal curves, labelled SMALL VARIANCE and LARGE VARIANCE]

Skewness

 It is a measure of symmetry or, more precisely, of the lack of symmetry of a distribution
 A distribution, or data set, is symmetric if it looks the same to the left and right of the centre point
 If a distribution is perfectly symmetric, its skewness is zero and the mean and median are identical; if it also has a single peak, as the NORMAL DISTRIBUTION does, the mode coincides with them
DIRECTION OF SKEW

Consider the distributions in the figure. The tapering sides are called tails, and they provide a visual means for determining which of the two kinds of skewness a distribution has:

1. Positive skew: The right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed. An example is the distribution of income, in which there are a few very high incomes.
2. Negative skew: The left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed.
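The direction of skew can also be read from the sign of a skewness coefficient. The slides do not name a formula, so the sketch below assumes the moment coefficient of skewness (third central moment divided by the 1.5th power of the second); the function name `skewness` is chosen for this example.

```python
def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (population form)."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n  # second central moment (variance)
    m3 = sum((v - centre) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


salaries = [6500, 6500, 6500, 6500, 10500]  # one high value: long right tail
mirrored = [-s for s in salaries]           # same shape reflected: long left tail
symmetric = [1, 2, 3, 4, 5]

right_skew = skewness(salaries)
left_skew = skewness(mirrored)
no_skew = skewness(symmetric)

print(right_skew, left_skew, no_skew)
```

A positive result indicates a right-skewed data set, a negative result a left-skewed one, and zero a symmetric one.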
Parametric vs. Non-parametric tests
 Parametric tests
 Ratio or interval scales
 Large samples
 More powerful
 Stringent assumptions

 Non-parametric tests
 Nominal or ordinal scales
 Small samples
 Fewer assumptions
 Non-parametric counterparts exist for many parametric techniques
 Not as powerful/less sensitive
Scatter plot

 The scatter plot is one of the most important tools in data visualization.
 A scatter plot is based on two axes: the horizontal axis represents one feature and the vertical axis represents a second.
 Each instance in a dataset is represented by a point on the plot determined by the values for that instance of the two features involved.
Age and height data set
Age Height
32 5.3
34 5.4
36 5.5
16 4
18 4.2
22 4.4
34 5.6
56 6
24 6.2
Scatter plot of age and height
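To show how each instance becomes one point, the sketch below draws a rough text-only scatter plot of the age and height data using only the standard library. The function `ascii_scatter` and its grid size are assumptions made for this illustration; a charting tool would normally be used instead.

```python
ages = [32, 34, 36, 16, 18, 22, 34, 56, 24]
heights = [5.3, 5.4, 5.5, 4.0, 4.2, 4.4, 5.6, 6.0, 6.2]


def ascii_scatter(xs, ys, width=41, height=12):
    """Return text rows with a '*' at each (x, y) pair, y increasing upwards."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (horizontal axis) and a row (vertical axis).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return ["".join(cells) for cells in grid]


lines = ascii_scatter(ages, heights)
for line in lines:
    print("|" + line)
print("+" + "-" * 41 + "  age (horizontal), height (vertical)")
```

The youngest, shortest instance (16, 4.0) lands in the bottom-left corner, while the instance (24, 6.2) sits at the top of the plot away from the general trend, the kind of point worth checking during data quality review.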
Data Quality
 Missing values
 Outliers
Missing values
 The data quality report highlights the percentage of missing values for each feature in your table
 If features have missing values, the first step is to try to determine why
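A minimal sketch of the missing-value part of a data quality report is shown below. The table contents are hypothetical, `None` is assumed to mark a missing value, and `percent_missing` is an illustrative helper name.

```python
# Hypothetical table: each feature maps to its column of values; None marks a missing value.
table = {
    "age": [32, 34, None, 16, 18],
    "height": [5.3, None, None, 4.0, 4.2],
    "gender": ["F", "M", "F", "M", "F"],
}


def percent_missing(columns):
    """Percentage of missing (None) entries for each feature."""
    return {
        name: 100 * sum(value is None for value in values) / len(values)
        for name, values in columns.items()
    }


report = percent_missing(table)
for feature, pct in report.items():
    print(f"{feature}: {pct:.1f}% missing")
```

A feature with a high percentage here is the natural starting point for asking why the values are missing.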
Outliers
 Outliers are values that lie far away from the central tendency of a feature. These values are exceptionally far away from the mainstream of the data.
 There are two kinds of outliers that might occur in your sample data:
 Invalid outliers are values that have been included in a sample through error and are often referred to as noise in the data.
 Valid outliers are correct values that are simply very different from all of the rest of the values for a feature.
To identify outliers

 Order data by size and scan top and bottom (min and max values)
 Difference between mean and median
 Range, standard deviation
 Visualisations: bar plots, box plots
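Box plots conventionally flag points lying more than 1.5 interquartile ranges beyond the quartiles. The slides do not state this rule, so the sketch below should be read as one common convention applied to hypothetical data; `iqr_outliers` is an illustrative helper name.

```python
from statistics import mean, median, quantiles

# Hypothetical feature values; 95 is deliberately far from the rest.
values = [12, 14, 15, 15, 16, 17, 18, 19, 95]


def iqr_outliers(data, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3 (box plot convention)."""
    q1, _, q3 = quantiles(data, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in data if v < low or v > high]


ordered = sorted(values)                # scan the bottom and top of the ordered data
gap = mean(values) - median(values)     # a large gap hints at extreme values
flagged = iqr_outliers(values)

print(ordered[0], ordered[-1], round(gap, 2), flagged)
```

Whether a flagged value is invalid (an error) or valid (genuinely extreme) still has to be decided by looking at how it was recorded.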
Dealing with outliers

Invalid outliers should either be marked as missing values or, if possible, replaced with valid values sourced from original data sources.

Valid outliers can be allowed to remain in your data or removed.
Crosstabulations
 While a frequency distribution describes one variable at a time, a cross-tabulation describes two or more variables simultaneously
 Cross-tabulation results in tables that reflect the joint distribution of two or more variables with a limited number of categories or distinct values
Crosstabulation: Gender and Internet Usage

                         Gender
Internet usage     Male    Female    Row total
Light                 5        10           15
Heavy                10         5           15
Column total         15        15           30
Internet usage by gender

                         Gender
Internet usage     Male     Female
Light              33.3%     66.7%
Heavy              66.7%     33.3%
Column total        100%      100%
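The counts and column percentages in the two tables can be rebuilt from respondent-level records. In the sketch below the `records` list is a hypothetical reconstruction from the cell counts, and the label "Heavy" for the second usage row is an assumption; only the standard library is used.

```python
from collections import Counter

# Hypothetical respondent-level records rebuilt from the table's cell counts.
# "Heavy" is an assumed label for the second Internet usage category.
records = (
    [("Light", "Male")] * 5
    + [("Light", "Female")] * 10
    + [("Heavy", "Male")] * 10
    + [("Heavy", "Female")] * 5
)

counts = Counter(records)                                 # joint counts per (usage, gender) cell
row_totals = Counter(usage for usage, _ in records)       # totals per usage level
column_totals = Counter(gender for _, gender in records)  # totals per gender

# Column percentages: each cell as a share of its gender's total.
column_percent = {
    cell: round(100 * n / column_totals[cell[1]], 1)
    for cell, n in counts.items()
}

for (usage, gender), pct in sorted(column_percent.items()):
    print(f"{usage:<6}{gender:<8}{counts[(usage, gender)]:>3}{pct:>7}%")
```

Percentages are taken down the columns here because gender is treated as the explanatory variable; dividing by `row_totals` instead would describe the gender mix within each usage level.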
Any questions?
