ST1009 - Week 1

ST1009

EXPLORATORY DATA ANALYSIS

Dr. K. A. D. Deshani
Department of Statistics, University of Colombo

YOU WILL LEARN…

• The importance of statistics
• About different types of data
• To organize data numerically
• To organize data graphically
• To interpret summary statistics

WHAT IS STATISTICS?

• Statistics are all around us!
  • Statements about rain
  • Best buy of chocolates
  • Most convenient path to travel…
  … in virtually every aspect of our lives, professional and personal.
• Statistics is the science of:
  • collecting
  • organizing
  • analyzing and interpreting data.

WHY DO WE NEED STATISTICS?

• Describe the situation as found during the study
• Compare the study results with other studies
• Study the relationships between the variables
• Study trends and changes over time

SOME EXAMPLES…

• In agriculture, we may want to find out which factors affect the paddy yield.
• In medicine, we may have to compare a new drug with a standard drug to examine whether it is more effective in reducing blood pressure.

EXAMPLES CONTINUED…

• In marketing, when a company introduces a new type of product, it would be desirable to know the approximate demand for that product.
• In computer science, computer systems and networks are subject to failure, so methods are needed to evaluate their reliability and availability.

SOME TERMS TO REMEMBER

• The population is the complete collection of individuals or objects that are of interest to the study.

Eg: If we are interested in studying the problems of Colombo University students, our population is “all the students of the Colombo University.”

SOME TERMS TO REMEMBER

• A sample is a subset of the population. Large populations are difficult to study, so information is very often obtained from a sample of the population.

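Drawing a sample from a population can be sketched in Python as follows. This is only an illustration: the population is a hypothetical list of 1,000 student ID numbers, and the sample size of 10 and the seed are arbitrary assumptions.

```python
import random

# Hypothetical population: ID numbers for 1,000 students (not real data).
population = list(range(1, 1001))

# Fix the seed so the same sample is drawn on every run (arbitrary choice).
rng = random.Random(2024)

# Simple random sample without replacement: every student is equally
# likely to be chosen, and no student is chosen twice.
sample = rng.sample(population, k=10)

print(f"Population size: {len(population)}")
print(f"Sample size:     {len(sample)}")
print(f"Sampled IDs:     {sorted(sample)}")
```

Any conclusions would then be based on the 10 sampled students rather than on all 1,000.
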
SOME TERMS TO REMEMBER

• Data is a collection of information about some individuals (individuals need not be humans; they may be objects).
• A variable is some characteristic of an individual.
• An observation is a value that a variable assumes for a single element of a population or sample.

EXAMPLE…

• Number of defective items in 50 batches of electronic components produced in a factory:

 3  4  7  1  1  1  4  3  6  2
 4  2  2  1  1  1  3  1 15  2
 1  2  1  3  5  2  1  4  2  4
 1  3  2  5  3  2  7  2  5  8
 1  3  5  1  4  1  1  1  5  2

• What do these numbers mean to you?
• Are there any interesting features you need to know?

EXAMPLE…

• It is difficult to look at each number in turn and draw conclusions.
• We need to organize and summarize the data.
• We can use numerical and graphical methods to summarize data.

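As one possible illustration, the 50 defect counts from the example can be organized into a frequency table with a few numerical summaries. This is a minimal Python sketch using only the standard library; the choice of summaries (mean, median, mode, maximum) and the text bar chart are assumptions made for illustration, not a method prescribed by the slides.

```python
from collections import Counter
from statistics import mean, median, mode

# Number of defective items in each of the 50 batches (from the example).
defects = [
    3, 4, 7, 1, 1, 1, 4, 3, 6, 2,
    4, 2, 2, 1, 1, 1, 3, 1, 15, 2,
    1, 2, 1, 3, 5, 2, 1, 4, 2, 4,
    1, 3, 2, 5, 3, 2, 7, 2, 5, 8,
    1, 3, 5, 1, 4, 1, 1, 1, 5, 2,
]

# Numerical organization: how many batches had each defect count.
freq = Counter(defects)

# Graphical organization: a simple text bar chart of the frequencies.
for value in sorted(freq):
    print(f"{value:>2} | {freq[value]:>2} | {'*' * freq[value]}")

# A few summary statistics.
summary = {
    "n": len(defects),
    "mean": mean(defects),
    "median": median(defects),
    "mode": mode(defects),
    "max": max(defects),
}
print(summary)
```

Organized this way, it is easy to see that more than half of the batches had only one or two defective items, while the single batch with 15 stands out as unusual.
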
IN ORDER TO ORGANIZE AND ANALYZE DATA, WE MAY USE

• Descriptive methods
  • Procedures used to summarize information about samples in a convenient and understandable form, without drawing any conclusions beyond the data at hand.
• Inferential methods
  • Procedures used to draw conclusions about a population from sample data.

(A mixture of the two would be ideal in most situations.)

EXAMPLE

• Marks of student A: 63, 41, 55
• Marks of student B: 60, 58, 59

On the basis of this information, we can report that student A had an average of 53 and student B had an average of 59.

Here we have described the two data sets. That is descriptive statistics.

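The two averages can be checked with a short Python sketch. The range (highest mark minus lowest mark) is an extra descriptive measure added here as an assumption for illustration; it does not appear on the slide.

```python
from statistics import mean

# Marks of the two students, as given in the example.
marks = {
    "A": [63, 41, 55],
    "B": [60, 58, 59],
}

# Average mark for each student.
averages = {student: mean(scores) for student, scores in marks.items()}

# Range of marks for each student (added for illustration only).
ranges = {student: max(scores) - min(scores) for student, scores in marks.items()}

for student in marks:
    print(f"Student {student}: average = {averages[student]}, range = {ranges[student]}")
```

Both numbers only describe the marks that were observed; nothing is concluded about how either student would perform in other examinations.
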
VARIABLES CAN BE BROADLY CLASSIFIED INTO TWO:

1. A quantitative (numerical) variable is a variable whose values are numerical in nature.
  • Weight of a person
  • Exam marks
  • Income

VARIABLES CAN BE BROADLY CLASSIFIED INTO TWO:

2. A qualitative variable is a variable having categories or classifications that are not numerical in nature.
  • Gender (Male, Female) - dichotomous
  • Social class of a person (High, Med., Low) - multinomial

QUANTITATIVE VARIABLES CAN BE FURTHER SUBDIVIDED INTO DISCRETE OR CONTINUOUS…

• A discrete variable is a variable that can take only a finite or countable number of values.
Eg: 1. Number of customers arriving at a supermarket.
    2. Number of children in a family.

QUANTITATIVE VARIABLES CAN BE FURTHER SUBDIVIDED INTO DISCRETE OR CONTINUOUS…

• A continuous variable is a variable that can take an uncountable number of values, that is, any real value within some range.
Eg: 1. Amount of rainfall.
    2. Time taken to complete a computer job.

Discrete data -- gaps between possible values

Continuous data -- theoretically, no gaps between possible values

SUMMARY

Types of data
  • Quantitative (numerical)
    • Discrete
    • Continuous
  • Qualitative (categorical)
    • Discrete

EXAMPLES

• There are many situations where a continuous quantitative variable is divided into arbitrary categories and treated as a qualitative variable.
  • Age is often considered in age categories.
  • Monthly salary is often treated as a qualitative variable by grouping it into classes.

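Grouping a continuous variable into categories can be sketched in Python as below. The cut points (18, 40, 60), the labels, and the sample ages are hypothetical choices made for illustration; as noted above, such categories are arbitrary.

```python
from bisect import bisect_right

# Hypothetical cut points and labels (assumptions, not from the slides).
AGE_BREAKS = [18, 40, 60]
AGE_LABELS = ["under 18", "18-39", "40-59", "60 and over"]


def age_group(age: float) -> str:
    """Return the age category that a given age falls into."""
    # bisect_right counts how many cut points are <= age,
    # which is exactly the index of the matching label.
    return AGE_LABELS[bisect_right(AGE_BREAKS, age)]


# Hypothetical ages (continuous, quantitative).
ages = [12.5, 18.0, 27.3, 39.9, 40.0, 64.2]

# The same observations, now treated as a qualitative variable.
groups = [age_group(age) for age in ages]
print(groups)
```

After grouping, the exact ages are no longer available: 18.0 and 39.9 fall into the same category even though they are far apart.
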
SCALES OF MEASUREMENT

• Scales of measurement become important when deciding which statistical methods can be used with the data. There are four types:
  • Nominal
  • Ordinal
  • Interval
  • Ratio

NOMINAL SCALE

• A qualitative grouping
• A question could be “What types of dogs do you have?”
• The answer would be the types, and we could give counts for each type.
• We may also talk about the mode (the type with the highest count) with this measurement type.

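Counts and the mode for nominal data can be sketched in Python as follows. The list of dog types is made-up data used only for illustration.

```python
from collections import Counter

# Hypothetical answers to the question about dog types (made-up data).
dog_types = ["Labrador", "Pug", "Labrador", "Beagle", "Pug", "Labrador"]

# Count how many times each type occurs.
counts = Counter(dog_types)

# The mode is the type with the highest count.
mode_type, mode_count = counts.most_common(1)[0]

print(dict(counts))
print(f"Mode: {mode_type} ({mode_count} dogs)")
```

Counting and finding the mode are the only summaries used here because the categories have no order and no numerical value.
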
ORDINAL SCALE

• There is ‘order’ in the measurement values.
• Class rank is a typical example.
• Student A with rank 1 performed better at the examination than student B with rank 2.
• We do not know how much better student A is than student B.
• Mode and median can be used to describe this measurement.

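A median for ordinal data can be sketched in Python by mapping each category to its position in the ordering. The ordered levels Low, Med and High and the list of responses are hypothetical data chosen for illustration.

```python
from collections import Counter
from statistics import median_low

# Ordered categories, from lowest to highest (hypothetical example).
LEVELS = ["Low", "Med", "High"]

# Hypothetical observations on this ordinal scale (made-up data).
responses = ["Low", "High", "Med", "Med", "Low", "Med", "High"]

# Replace each category by its position in the ordering.
codes = [LEVELS.index(response) for response in responses]

# median_low always returns one of the observed values, so the result
# can be mapped back to a category.
median_level = LEVELS[median_low(codes)]

# The mode is also meaningful for ordinal data.
mode_level = Counter(responses).most_common(1)[0][0]

print(f"Median: {median_level}, mode: {mode_level}")
```

The position codes 0, 1 and 2 only record the order. Averaging them would assume that the gap between Low and Med equals the gap between Med and High, which an ordinal scale does not tell us.
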
INTERVAL SCALE

• Preserves the order and tells you how far apart the observations are.
• 30 degrees F is 10 degrees warmer than 20 degrees F, and 80 degrees F is 5 degrees cooler than 85 degrees F.
• The zero of the Fahrenheit scale is arbitrary; it is not an absolute zero.

RATIO SCALE

• A one-unit difference is the same everywhere on the scale.
• There is a true zero point.
• 4 units on a ratio scale is twice as high as 2 units.
• 30 K is twice as hot as 15 K, because the Kelvin scale has a true zero point.
• Fahrenheit (like Celsius) does not have a true zero point, so such ratios are not meaningful on those scales.

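The role of the zero point can be checked numerically. The Python sketch below expresses the same two temperatures in Celsius, Fahrenheit and Kelvin using the standard conversion formulas; the pair 30 and 15 degrees Celsius is chosen only for illustration.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def celsius_to_kelvin(celsius: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin."""
    return celsius + 273.15


warm_c, cool_c = 30.0, 15.0

# The same two temperatures on the other two scales.
warm_f, cool_f = celsius_to_fahrenheit(warm_c), celsius_to_fahrenheit(cool_c)
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Ratio of the warmer to the cooler temperature on each scale.
ratio_c = warm_c / cool_c
ratio_f = warm_f / cool_f
ratio_k = warm_k / cool_k

print(f"Celsius ratio:    {ratio_c:.3f}")
print(f"Fahrenheit ratio: {ratio_f:.3f}")
print(f"Kelvin ratio:     {ratio_k:.3f}")
```

The three ratios differ for the same pair of temperatures. Only the Kelvin ratio, computed on a scale with a true zero, can be read as "how many times as hot".
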
SUMMARY

• Statistics
• Applications
• Basic terms
• Types of variables
