Basic Statistical Descriptions of Data

Uploaded by

jaba123jaba

Basic Statistical Descriptions of Data

 For data preprocessing to be successful, it is essential to have an overall
picture of our data.
 Basic statistical descriptions can be used to identify properties of the data
and highlight which data values should be treated as noise or outliers.
 For data preprocessing tasks, we want to learn about data characteristics
regarding both central tendency and dispersion of the data.
 Measures of central tendency include mean, median, mode and
midrange.
 Measures of data dispersion include quartiles, interquartile range (IQR)
and variance.
 These descriptive statistics are of great help in understanding the distribution
of the data.

 Measuring the Central Tendency

 • We look at various ways to measure the central tendency of data, including:
Mean, Weighted mean, Trimmed mean, Median, Mode and Midrange.
 1. Mean :
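 • The mean of a data set is the average of all the data values. The sample
mean $\bar{x}$ is the point estimator of the population mean $\mu$.
 • Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, that is, the sum of the
values of the n observations divided by the number of observations in the
sample.
 • Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, that is, the sum of the
values of the N observations divided by the number of observations in the
population.
 2. Median :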
 • The median of a data set is the value in the middle when the data items are
arranged in ascending order. Whenever a data set has extreme values, the
median is the preferred measure of central location.
 • The median is the measure of location most often reported for annual
income and property value data, because a few extremely large incomes or
property values can inflate the mean.
 • For an odd number of observations:
 7 observations= 26, 18, 27, 12, 14, 29, 19.
 Numbers in ascending order = 12, 14, 18, 19, 26, 27, 29
 • The median is the middle value.
 Median=19
 • For an even number of observations :
 8 observations = 26, 18, 29, 12, 14, 27, 30, 19
 Numbers in ascending order = 12, 14, 18, 19, 26, 27, 29, 30
 The median is the average of the middle two values.
 Median = (19 + 26) / 2 = 22.5
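
The two median examples above can be checked with a short Python sketch. It assumes Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median

# Observations taken from the two examples in the text.
odd_sample = [26, 18, 27, 12, 14, 29, 19]        # 7 observations
even_sample = [26, 18, 29, 12, 14, 27, 30, 19]   # 8 observations

# median() sorts the data internally. For an odd count it returns the
# middle value; for an even count it averages the two middle values.
odd_median = median(odd_sample)
even_median = median(even_sample)

# The mean of the same data, for comparison with the median.
odd_mean = mean(odd_sample)

print(sorted(odd_sample), "median =", odd_median)
print(sorted(even_sample), "median =", even_median)
print("mean of the 7 observations =", round(odd_mean, 2))
```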
 3. Mode:
 • The mode of a data set is the value that occurs with greatest frequency. The
greatest frequency can occur at two or more different values. If the data have
exactly two modes, the data are bimodal. If the data have more than two
modes, the data are multimodal.
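
As a minimal sketch of the definitions above, the modes of a small data set can be found with `statistics.multimode`. The helper `describe_modes` and the sample values are hypothetical and not part of the original notes.

```python
from statistics import multimode

def describe_modes(values):
    """Return the modes of `values` and a label for how many there are.

    Hypothetical helper for illustration. Note that if every value occurs
    exactly once, multimode() returns all of them.
    """
    modes = multimode(values)  # every value tied for the highest frequency
    if len(modes) == 1:
        label = "one mode"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

# Made-up data in which 14 and 19 both occur twice.
sample = [12, 14, 14, 18, 19, 19, 26]
modes, label = describe_modes(sample)
print(modes, label)
```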
 • Weighted mean: Sometimes each value in a set is associated with a
weight. The weights reflect the significance, importance or occurrence
frequency attached to their respective values. With weights $w_i$, the
weighted mean is $\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}$.
 • Trimmed mean: A major problem with the mean is its sensitivity to
extreme (e.g., outlier) values. Even a small number of extreme values can
corrupt the mean. The trimmed mean is the mean obtained after cutting off
values at the high and low extremes.
 • For example, we can sort the values and remove the top and bottom 2 %
before computing the mean. We should avoid trimming too large a portion
(such as 20 %) at both ends as this can result in the loss of valuable
information.
 • Holistic measure is a measure that must be computed on the entire data set
as a whole. It cannot be computed by partitioning the given data into subsets
and merging the values obtained for the measure in each subset.
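
The weighted mean and trimmed mean described above can be sketched in plain Python. The function names, the sample values and the 10 % trimming fraction are assumptions chosen for illustration; they do not come from the original notes.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the sorted values at EACH end."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * fraction)       # number of values cut per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Made-up data: the third value counts twice as much as the others.
w_mean = weighted_mean([10, 20, 30], [1, 1, 2])

# Made-up data with one extreme value at each end (1 and 500).
data = [1, 12, 14, 18, 19, 26, 27, 29, 30, 500]
plain = sum(data) / len(data)
trimmed = trimmed_mean(data, 0.10)         # drops 1 and 500

print(w_mean, plain, trimmed)
```

In this made-up data the extreme value 500 pulls the plain mean far above most observations, while the trimmed mean stays close to the bulk of the data.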

 Measuring the Dispersion of Data

 • An outlier is an observation that lies an abnormal distance from other
values in a random sample from a population.
 • First quartile (Q1): The first quartile is the value below which 25% of
the values fall and above which 75% fall.
 • Third quartile (Q3): The third quartile is the value below which 75% of
the values fall and above which 25% fall.
 • The box plot is a useful graphical display for describing the behavior of
the data in the middle as well as at the ends of the distributions. The box plot
uses the median and the lower and upper quartiles. If the lower quartile is
Q1 and the upper quartile is Q3, then the difference (Q3 - Q1) is called the
interquartile range or IQR.
 • Range: Difference between highest and lowest observed values
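
A small sketch of the quartiles, IQR and range, reusing the seven observations from the median example. It assumes `statistics.quantiles` with its default "exclusive" method; other quartile conventions can give slightly different values for the same data.

```python
from statistics import quantiles

data = [12, 14, 18, 19, 26, 27, 29]

# n=4 splits the data into four equal-probability groups and returns
# the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1                       # interquartile range
data_range = max(data) - min(data)  # highest minus lowest observed value

print("Q1 =", q1, "median =", q2, "Q3 =", q3)
print("IQR =", iqr, "range =", data_range)
```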
 Variance :
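 • The variance is a measure of variability that utilizes all the data. It is based
on the difference between the value of each observation ($x_i$) and the mean
($\bar{x}$ for a sample, $\mu$ for a population).
 • The variance is the average of the squared differences between each data
value and the mean. For a population,
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$; for a sample, the sum is
conventionally divided by n - 1 instead of n:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$.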


 Standard Deviation :
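 • The standard deviation of a data set is the positive square root of the
variance. It is measured in the same units as the data, making it
more easily interpreted than the variance.
 • The standard deviation is computed as follows: $s = \sqrt{s^2}$ for a
sample and $\sigma = \sqrt{\sigma^2}$ for a population, where $s^2$ is the
sample variance and $\sigma^2$ is the population variance.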
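
As a rough sketch, the variance and standard deviation can be computed step by step and compared with Python's `statistics` module. It assumes the usual convention that the sample variance divides by n - 1 and the population variance divides by N; the data are the seven observations from the median example.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance, stdev, variance

data = [12, 14, 18, 19, 26, 27, 29]
n = len(data)
x_bar = mean(data)

# Squared difference between each observation and the mean.
squared_diffs = [(x - x_bar) ** 2 for x in data]

sample_var = sum(squared_diffs) / (n - 1)   # sample variance
population_var = sum(squared_diffs) / n     # population variance

# The standard deviation is the positive square root of the variance,
# so it is expressed in the same units as the data.
sample_sd = sqrt(sample_var)
population_sd = sqrt(population_var)

print(round(sample_var, 3), round(sample_sd, 3))
print(round(population_var, 3), round(population_sd, 3))
```

The library functions `variance`, `stdev`, `pvariance` and `pstdev` should agree with these hand-computed values up to floating-point rounding.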


 Difference between Standard Deviation and Variance


 Graphic Displays of Basic Statistical Descriptions

 • There are many types of graphs for the display of data summaries and
distributions, such as Bar charts, Pie charts, Line graphs, Boxplot,
Histograms, Quantile plots and Scatter plots.
 1. Scatter diagram
 • Also called scatter plot, X-Y graph.
 • While working with statistical data it is often observed that there are
connections between sets of data. For example, the mass and height of
persons are related: the taller the person, the greater his/her mass.
 • To find out whether or not two sets of data are connected, scatter diagrams
can be used. For example, a scatter diagram can show the relationship
between children's age and height.
 • A scatter diagram is a tool for analyzing relationship between two
variables. One variable is plotted on the horizontal axis and the other is
plotted on the vertical axis.
 • The pattern of their intersecting points can graphically show relationship
patterns. Commonly a scatter diagram is used to investigate suspected
cause-and-effect relationships.
 • While a scatter diagram shows relationships, it does not by itself prove that
one variable causes the other. In addition to showing possible cause-and-effect
relationships, a scatter diagram can show that two variables result from a
common cause that is unknown, or that one variable can be used as a
surrogate for the other.
 2. Histogram
 • A histogram is used to summarize discrete or continuous data. In a
histogram, the data are grouped into ranges (e.g. 10-19, 20-29) and then
plotted as connected bars. Each bar represents a range of data.
 • To construct a histogram from a continuous variable you first need to split
the data into intervals, called bins. Each bin contains the number of
occurrences of scores in the data set that are contained within that bin.
 • The width of each bar is proportional to the width of each category and the
height is proportional to the frequency or percentage of that category.
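
The binning step can be sketched in a few lines of Python. The helper `histogram_counts` is hypothetical, assumes integer-valued data and equal-width bins such as 10-19 and 20-29, and reuses the eight observations from the median example.

```python
from collections import Counter

def histogram_counts(values, bin_width, start):
    """Count integer values falling into bins of equal width.

    Each bin is reported as an inclusive (low, high) pair, for example
    (10, 19). Values are assumed to be integers not smaller than `start`.
    """
    counts = Counter()
    for v in values:
        index = (v - start) // bin_width   # which bin the value falls in
        low = start + index * bin_width
        counts[(low, low + bin_width - 1)] += 1
    return dict(sorted(counts.items()))

data = [26, 18, 29, 12, 14, 27, 30, 19]
bins = histogram_counts(data, bin_width=10, start=10)

# Text rendering: one row per bin, one '#' per observation.
for (low, high), count in bins.items():
    print(f"{low}-{high}: {'#' * count}")
```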
 3. Line graphs
 • A line graph is also called a stick graph. It shows relationships between
variables.
 • Line graphs are usually used to show time series data, that is, how one or
more variables vary over a continuous period of time. They can also be used
to compare two different variables over time.
 • Typical examples of the types of data that can be presented using line
graphs are monthly rainfall and annual unemployment rates.
 • Line graphs are particularly useful for identifying patterns and trends in the
data such as seasonal effects, large changes and turning points. Fig. 1.12.1
shows a line graph.

 • As well as time series data, line graphs can also be appropriate for
displaying data that are measured over other continuous variables such as
distance.
 • For example, a line graph could be used to show how pollution levels vary
with increasing distance from a source or how the level of a chemical varies
with depth of soil.
 • In a line graph the x-axis represents the continuous variable (for example
year or distance from the initial measurement) whilst the y-axis has a scale
and indicates the measurement.
 • Several data series can be plotted on the same line chart and this is
particularly useful for analysing and comparing the trends in different
datasets.
 • A line graph is often used to visualize the rate of change of a quantity. It is
more useful when the given data has peaks and valleys. Line graphs are very
simple to draw and quite convenient to interpret.
 4. Pie charts
 • A pie chart is a type of graph in which a circle is divided into sectors that
each represent a proportion of the whole. Each sector shows the relative size
of each value.
 • A pie chart displays data, information and statistics in an easy to read "pie
slice" format with varying slice sizes telling how much of one data element
exists.
 • A pie chart is also known as a circle graph. The bigger the slice, the more of
that particular data was gathered. The main use of a pie chart is to show
comparisons. Fig. 1.12.2 shows a pie chart.

 • Various applications of pie charts can be found in business, school and at
home. In business, pie charts can be used to show the success or failure of
certain products or services.
 • At school, pie chart applications include showing how much time is
allotted to each subject. At home, pie charts can be useful to see how monthly
income is spent on different needs.
 • Reading a pie chart is as easy as figuring out which slice of an actual pie is
the biggest.
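
As an illustration of how slice sizes follow from the data, the sketch below converts amounts into proportions of the whole and sector angles out of 360 degrees. The helper `pie_sectors` and the monthly expenditure figures are made up for this example.

```python
def pie_sectors(amounts):
    """Map each label to (share of the whole, sector angle in degrees)."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("the amounts must sum to a positive total")
    return {
        label: (value / total, 360 * value / total)
        for label, value in amounts.items()
    }

# Hypothetical monthly expenditure, in arbitrary currency units.
spending = {"rent": 500, "food": 300, "transport": 100, "savings": 100}
sectors = pie_sectors(spending)

for label, (share, angle) in sectors.items():
    print(f"{label}: {share:.0%} of income, {angle:.0f} degrees")
```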
 Limitations of the pie chart:
 • It is difficult to tell the difference between estimates of similar size.
 • Error bars or confidence limits cannot be shown on a pie graph.
 • Legends and labels on pie graphs are hard to align and read.
 • The human visual system is more efficient at perceiving and discriminating
between lines and line lengths than two-dimensional areas and angles.
 • Pie graphs work poorly when values must be compared precisely.
