Statistic and Data Analysis Topic 01b
Enroll in Smartv3
2. Anaconda
KD24103
Statistic and Data Analysis
TOPIC 01
THE NATURE OF PROBABILITY AND STATISTICS
1
Outline
i. Descriptive and Inferential Statistics
ii. Variables and Types of Data
iii. Data Collection and Sampling Techniques
2
Descriptive and Inferential Statistics
A variable is a characteristic or attribute that can assume different values.
The values that a variable can assume are called data.
A population consists of all subjects (human or otherwise) that are studied.
A sample is a subset of the population.
3
Descriptive and Inferential Statistics
Descriptive statistics consists of the collection, organization,
summarization, and presentation of existing data.
[Chart: a weight data collection of 38 students]
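As a rough illustration of summarizing existing data, the sketch below computes a few descriptive statistics with Python's standard `statistics` module. The weights are made-up values for illustration only; the slide's actual 38 measurements are not given.

```python
import statistics

# Hypothetical weights in kg (NOT the 38 values from the slide).
weights_kg = [52.0, 61.5, 58.0, 70.2, 65.0, 49.8, 73.4, 58.0]

# Descriptive statistics only describe the data we actually collected.
summary = {
    "count": len(weights_kg),
    "mean": statistics.mean(weights_kg),
    "median": statistics.median(weights_kg),
    "min": min(weights_kg),
    "max": max(weights_kg),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Every number printed here is a fact about the collected data; nothing is being estimated or predicted.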
4
Descriptive and Inferential Statistics
Inferential statistics consists of generalizing from samples to
populations, performing estimations and hypothesis tests,
determining relationships among variables, and making predictions.
[Figure: values of RM 10,000, RM 15,000 and RM 20,000, with one unknown value (?)]
5
Descriptive and Inferential Statistics
Descriptive: The number/info that is being presented is a real value/fact based on existing data.
Inferential: The number/info that is being presented is an estimation/prediction/hypothesis based on existing data.
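The contrast can be sketched in a few lines of Python. The salary figures are hypothetical, and the interval uses the normal-approximation multiplier 1.96, which is an assumption made here for simplicity and is crude for such a small sample.

```python
import math
import statistics

# Hypothetical sample of monthly salaries in RM (not from the slides).
sample = [10_000, 15_000, 20_000, 12_000, 18_000]

# Descriptive: a fact about the data we actually have.
sample_mean = statistics.mean(sample)

# Inferential: an estimate about the wider population, with uncertainty.
# Assumes a simple random sample; 1.96 is the usual multiplier for
# roughly 95% confidence under a normal approximation.
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
interval = (
    sample_mean - 1.96 * standard_error,
    sample_mean + 1.96 * standard_error,
)

print("Descriptive (sample mean):", sample_mean)
print("Inferential (approx. interval for population mean):", interval)
```

The sample mean is a real value; the interval is a statement about a population we did not fully observe.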
6
Descriptive and Inferential Statistics
Example:
7
Variables and Data
A variable is any characteristic, number, or quantity
that can be measured or counted.
E.g., height, time, number of children in a family
8
Types of Data
Data
Quantitative: numerical variables
Qualitative: categorical variables
9
Continuous
Examples of continuous variables include height, time, age, and temperature.
Length of battery:
3.1 (one decimal place) - Ruler
3.11 (two decimal places)
3.1111 (four decimal places)
Discrete
Examples of discrete variables include the number of registered cars, number of business locations, and number
of children in a family, all of which are measured as whole units (i.e. 1, 2, 3 cars).
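A small sketch of the difference: a continuous measurement can be recorded to any precision the instrument allows, while a discrete count only takes whole values. The value `3.1111` and the list of car counts are taken from the slide's examples; the variable names are illustrative.

```python
# Continuous: the same battery length recorded at different precisions.
# A more precise instrument simply reveals more decimal places.
true_length = 3.1111
recorded = {places: round(true_length, places) for places in (1, 2, 4)}
for places, value in recorded.items():
    print(f"{places} decimal place(s): {value}")

# Discrete: counts come in whole units only (1, 2, 3 cars).
cars_registered = [1, 2, 3]
all_whole = all(isinstance(n, int) for n in cars_registered)
print("All counts are whole numbers:", all_whole)
```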
10
Types of Data – Qualitative Data
Data
Qualitative: categorical variables
11
Recorded Values and Boundaries
Variable   Recorded Value         Boundaries
Length     15 centimeters (cm)    14.5-15.5 cm
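The row above is consistent with the common rule that boundaries lie half a unit of precision below and above the recorded value. The helper below is a minimal sketch of that rule; the function name `boundaries` is hypothetical, and the value is passed as text so the number of recorded decimal places is preserved.

```python
from decimal import Decimal


def boundaries(recorded: str) -> tuple[Decimal, Decimal]:
    """Return (lower, upper) boundaries for a recorded value given as text."""
    value = Decimal(recorded)
    # One unit of precision is one step in the last recorded digit,
    # e.g. 1 for "15" and 0.1 for "3.1".
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half = unit / 2
    return value - half, value + half


lower, upper = boundaries("15")
print(f"15 cm lies between {lower} and {upper} cm")
```

Under the same assumption, a value recorded as 3.1 would have boundaries 3.05 and 3.15.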
13
Types of Data – Levels of Measurement
Ordinal variable
Data that can be logically ordered or ranked.
i. Academic grades (A, B, C, D, E)
ii. Clothing size (XXL, XL, L, M, S, SS)
iii. Place finished in a race: 1st, 2nd, 3rd (where 1st is better than 2nd)
iv. Temperature (100 degrees Celsius, 75 degrees Celsius, 5 degrees Celsius)
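Because ordinal data has a logical order, it can be sorted, but the order has to be stated explicitly rather than left to alphabetical sorting. A minimal sketch using the clothing sizes above (the `shirts` list is made up for illustration):

```python
# Logical order of the categories, smallest to largest.
SIZE_ORDER = ["SS", "S", "M", "L", "XL", "XXL"]
size_rank = {size: rank for rank, size in enumerate(SIZE_ORDER)}

# Hypothetical data to be ranked.
shirts = ["L", "SS", "XXL", "M", "S"]

# Sort by rank, not by the text of the label.
sorted_shirts = sorted(shirts, key=size_rank.__getitem__)
print(sorted_shirts)

# Plain alphabetical sorting does not respect the logical order.
print(sorted(shirts))
```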
14
Types of Data – Levels of Measurement
Nominal variable
i. Data that cannot be organized in a logical sequence.
ii. However, we can assign numbers to nominal variable’s data where different numbers
indicate different objects.
iii. For example:
• Assigning ‘0’ to male, ‘1’ to female
• Assigning ‘1’ to true, ‘0’ to false
• Phone brand (variable): iPhone = 1, Oppo = 2, Samsung = 3, Vivo = 4
Data: Samsung=1, Vivo=2, Oppo=3, iPhone=4, …
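The sketch below uses one of the phone-brand codings from the slide to show that the numbers act only as labels. The `phones` list is hypothetical data.

```python
from collections import Counter

# Codes are arbitrary labels: different numbers mean different brands.
brand_code = {"iPhone": 1, "Oppo": 2, "Samsung": 3, "Vivo": 4}

# Hypothetical observations.
phones = ["Samsung", "Vivo", "Samsung", "iPhone"]
encoded = [brand_code[p] for p in phones]

# Counting categories is meaningful for nominal data.
counts = Counter(phones)
most_common_brand = counts.most_common(1)[0][0]

# Arithmetic on the codes is not: this "mean" matches no brand at all.
meaningless_mean = sum(encoded) / len(encoded)

print(encoded, most_common_brand, meaningless_mean)
```

Swapping the codes (for example Samsung=1, Vivo=2) would change the "mean" but not the counts, which is why only counts and modes make sense here.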
15
Types of Data – Levels of Measurement
Interval variable
i. Data that can be logically ordered (same as Ordinal) and has equal intervals between data.
ii. For example:
• Temperature in degree Celsius.
• The difference between 35 degrees and 36 degrees is the same as the difference between 17
degrees and 18 degrees.
• The difference is 1 degree.
• Coffee Cup A is ?? degrees, Coffee Cup B is ?? degrees. The difference is around 35 degrees.
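The equal-interval property can be checked directly. As a sketch, the code below also converts to Fahrenheit (the helper `c_to_f` is introduced here only for illustration) to show that equal differences stay equal after a change of scale.

```python
import math


def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Equal intervals in Celsius: both differences are 1 degree.
diff_a = 36 - 35
diff_b = 18 - 17
same_in_c = diff_a == diff_b

# The same two differences are still equal to each other in Fahrenheit.
diff_a_f = c_to_f(36) - c_to_f(35)
diff_b_f = c_to_f(18) - c_to_f(17)
same_in_f = math.isclose(diff_a_f, diff_b_f)

print(same_in_c, same_in_f)
```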
16
Types of Data – Levels of Measurement
Ratio variable
It is data where:
i. We have meaningful differences (like an interval variable)
ii. We also have meaningful ratios
• For example: 10 kg is twice as much as 5 kg (10/5 = 2, where 2 is a meaningful ratio)
• Rice (beras): rice package A and rice package B. Person A (stronger) and Person B (normal). Package A
is 3 times heavier than package B.
• A, B, C, D, E
Temperature ?
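One way to think about whether a ratio is meaningful is to check whether it survives a change of unit. The sketch below compares weight with temperature in degrees Celsius, which is usually treated as interval rather than ratio data because 0 degrees Celsius is not a true zero. The helper functions are illustrative.

```python
import math


def kg_to_g(kg: float) -> float:
    """Convert kilograms to grams."""
    return kg * 1000


def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Weight: the ratio is the same in any unit, so "twice as heavy" is meaningful.
weight_ratio_kg = 10 / 5
weight_ratio_g = kg_to_g(10) / kg_to_g(5)

# Celsius temperature: the ratio changes with the unit,
# so "twice as hot" is not meaningful on this scale.
temp_ratio_c = 10 / 5
temp_ratio_f = c_to_f(10) / c_to_f(5)

print(weight_ratio_kg, weight_ratio_g)
print(temp_ratio_c, temp_ratio_f)
print(math.isclose(temp_ratio_c, temp_ratio_f))
```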
17
Data Collection and Sampling Techniques
Some Sampling Techniques
Random – random number generator
Systematic – every kth subject
Convenience – mall surveys (only include people who are easy to reach)
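The three techniques can be sketched with Python's standard `random` module. The population of 100 numbered subjects and the sample size of 10 are assumptions for illustration, and the seed is fixed only so the sketch is repeatable.

```python
import random

population = list(range(1, 101))  # hypothetical subject IDs 1..100
n = 10                            # desired sample size

rng = random.Random(42)  # seeded only for repeatability

# Random: a random number generator picks the subjects.
random_sample = rng.sample(population, n)

# Systematic: pick a random starting point, then every k-th subject.
k = len(population) // n
start = rng.randrange(k)
systematic_sample = population[start::k]

# Convenience: whoever is easiest to reach, here simply the first n.
convenience_sample = population[:n]

print(random_sample)
print(systematic_sample)
print(convenience_sample)
```

The convenience sample is the cheapest to collect, but it only ever sees one end of the list, which hints at why it may not represent the population well.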
18
No.   Name   Age   Height   Weight   Salary   Covid19+ / Covid19-
1                                             Positive
2                                             Negative
19
Data Collection and Sampling Techniques
Stratified Sampling
We then group the data according to category, or we “stratify” the data.
No.      Name   Age   Height   Address   Salary   Color
1                                                 Blue
2                                                 Red
3                                                 Black
…                                                 …
10,000                                            Red
20
Stratified sampling = You need to make sure that the ratio between your categories is
always the same regardless of the size of your sampled data.
The example below shows a 40% sample of the data based on the color attribute.
What if you decided to sample 20% of the whole data based on the color attribute?
1/8 = 0.125 = 12.5%
2/8 = 0.25 = 25%
Actual data
Blue = 2,000 out of 10,000
Black = 7,000 out of 10,000
Red = 1,000 out of 10,000
Original ratio = 2,000 : 7,000 : 1,000 = 2 : 7 : 1
Stratified sampled data (1,000 out of 10,000)
If I want to sample 10% of the whole data using stratified sampling,
I need to select 10% from each color.
Sampled ratio = 200 : 700 : 100 = 2 : 7 : 1
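The arithmetic above can be sketched in Python: group the records by color, then draw the same fraction from every group. The record layout `(id, color)` and the function name `stratified_sample` are assumptions for illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical population of 10,000 records as (id, color) pairs,
# with the 2,000 : 7,000 : 1,000 split described above.
population = (
    [(i, "Blue") for i in range(0, 2000)]
    + [(i, "Black") for i in range(2000, 9000)]
    + [(i, "Red") for i in range(9000, 10000)]
)


def stratified_sample(records, fraction, seed=0):
    """Draw the same fraction from each color group (stratum)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[record[1]].append(record)  # group by color
    sample = []
    for members in strata.values():
        size = round(len(members) * fraction)
        sample.extend(rng.sample(members, size))
    return sample


counts_10 = Counter(color for _, color in stratified_sample(population, 0.10))
counts_20 = Counter(color for _, color in stratified_sample(population, 0.20))

print(dict(counts_10))  # 10% from each color
print(dict(counts_20))  # 20% from each color
```

Whatever fraction is chosen, the sampled counts keep the 2 : 7 : 1 ratio of the original data.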
21
No.   Name   Age   Height   Address   Salary   Covid19+ / Covid19-
1                                              Positive
2                                              Negative
Kota Kinabalu
Tuaran
22
Data Collection and Sampling Techniques
Sampling error
20 balls (10% from each color)
10 red (10%)
10 green (10%)
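Sampling error is commonly described as the difference between a result from a sample and the corresponding value for the whole population. The sketch below assumes a population of 100 red and 100 green balls (inferred from "10 red (10%)", not stated on the slide) and draws a simple random sample of 20 without stratifying, so the colors need not come out as exactly 10 and 10.

```python
import random

# Assumed population: 100 red and 100 green balls.
population = ["red"] * 100 + ["green"] * 100
population_share_red = population.count("red") / len(population)

# Simple random sample of 20 balls (seeded only for repeatability).
rng = random.Random(1)
sample = rng.sample(population, 20)
sample_share_red = sample.count("red") / len(sample)

# Sampling error: sample result minus the true population value.
sampling_error = sample_share_red - population_share_red

print("Red in sample:", sample.count("red"), "of", len(sample))
print("Sampling error in share of red:", sampling_error)
```

Taking 10% from each color separately, as in the slide, would force the sample to contain 10 red and 10 green balls and make this particular error zero.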
23
Conclusion
We have covered:
i. Definition of descriptive and inferential statistics and their differences
ii. Variables and types of data
• Quantitative data
• Qualitative data
24