
Statistic and Data Analysis Topic 01b

These are the notes for the probability and statistics course for software engineering at UMS, year 2, semester 3.

Uploaded by

Khalifah Bumi

1. Enroll in Smartv3
2. Anaconda

KD24103
Statistic and Data Analysis
TOPIC 01
THE NATURE OF PROBABILITY AND STATISTICS

Outline
i. Descriptive and Inferential Statistics
ii. Variables and Types of Data
iii. Data Collection and Sampling Techniques

Descriptive and Inferential Statistics
A variable is a characteristic or attribute that can assume different values.
The values that a variable can assume are called data.
A population consists of all subjects (human or otherwise) that are studied.
A sample is a subset of the population.

Descriptive and Inferential Statistics
Descriptive statistics consists of the collection, organization,
summarization, and presentation of existing data.
[Figure: bar chart. Example: weight data collected from 38 students.]

Descriptive and Inferential Statistics
Inferential statistics consists of generalizing from samples to
populations, performing estimations and hypothesis tests,
determining relationships among variables, and making predictions.
[Figure: rental sales by year, 2019 to 2022, with axis ticks at RM 10,000, RM 15,000 and RM 20,000; the 2022 value is marked "?".]
Example: It is estimated that in 2022 we will make RM25,000 of rental sales.

Descriptive and Inferential Statistics
Descriptive: the numbers/information presented are real values or facts based on existing data.
Inferential: the numbers/information presented are an estimate, prediction, or hypothesis based on existing data.
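
The distinction above can be sketched in a few lines of Python. The yearly figures are hypothetical values chosen to sit on the chart's tick marks (the slides do not give the actual sales), and the straight-line trend is only one possible way to make a prediction.

```python
from statistics import mean

# Hypothetical rental sales in RM; not taken from the slides.
years = [2019, 2020, 2021]
sales = [10_000, 15_000, 20_000]

# Descriptive: summarize the data we already have.
average_sales = mean(sales)

# Inferential: fit a least-squares line and predict a year we have not seen.
x_bar, y_bar = mean(years), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, sales)) / sum(
    (x - x_bar) ** 2 for x in years
)
intercept = y_bar - slope * x_bar
predicted_2022 = slope * 2022 + intercept

print("Average sales 2019-2021 (descriptive):", average_sales)
print("Predicted sales for 2022 (inferential):", predicted_2022)
```

The average is a fact about the existing data; the 2022 figure is an estimate that could turn out to be wrong.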

Descriptive and Inferential Statistics
Example:

Determine whether descriptive or inferential statistics were used.

1. The average life expectancy in Malaysia is 75.3 years. (Answer: D, descriptive)

2. A diet high in fruits and vegetables will lower blood pressure.

Variables and Data
A variable is any characteristic, number, or quantity that can be measured or counted.
E.g., height, time, number of children in a family

Data can be defined as a systematic record of a particular variable.
E.g., weight: 65 kg; 48 kg, 50 kg, 53 kg, 61 kg, …

Types of Data

Data

Quantitative (numerical variables)
- Height, weight, time: 170 cm, 65 kg, 60 secs

Qualitative (categorical variables)
- Phone brand: Apple, Nokia, Samsung, Xiaomi
- Gender: male, female, Apache

Types of Data – Quantitative Data

Quantitative data (numerical variables)

Continuous: Examples of continuous variables include height, time, age, and temperature.
E.g., length of a battery: 3.1 (one decimal place, e.g. with a ruler), 3.11 (two decimal places), 3.1111 (four decimal places).

Discrete: Examples of discrete variables include the number of registered cars, number of business locations, and number of children in a family, all of which are measured as whole units (i.e. 1, 2, 3 cars).

Types of Data – Qualitative Data
Qualitative data (categorical variables)

Information that can’t actually be measured:


- Softness of skin
- Happiness rating (1-10)
- Coffee temperature (very hot, or not too hot)

Recorded Values and Boundaries
Variable       Recorded value                Boundaries
Length         15 centimeters (cm)           14.5-15.5 cm
Temperature    86 degrees Fahrenheit (F)     85.5-86.5 F
Time           0.43 second (sec)             0.425-0.435 sec
Mass           1.6 grams (g)                 1.55-1.65 g
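
The rule behind the table appears to be "recorded value plus or minus half a unit of the last recorded digit". A minimal sketch of that rule follows; the function name `boundaries` is made up for illustration, and the recorded value is passed as text so that the number of decimal places is preserved.

```python
from decimal import Decimal


def boundaries(recorded: str):
    """Return (lower, upper) boundaries for a recorded value given as text."""
    value = Decimal(recorded)
    # Size of one unit in the last recorded digit: "15" -> 1, "0.43" -> 0.01
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half_unit = unit / 2
    return value - half_unit, value + half_unit


for recorded in ["15", "86", "0.43", "1.6"]:
    lower, upper = boundaries(recorded)
    print(recorded, "->", lower, "to", upper)
```

`Decimal` is used instead of `float` so that values such as 0.425 come out exactly rather than with binary rounding noise.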


Types of Data – Levels of Measurement
Measurement types: ordinal variable, nominal variable, interval variable, ratio variable

Types of Data – Levels of Measurement
Ordinal variable
Data that can be logically ordered or ranked.
i. Academic grades (A, B, C, D, E)
ii. Clothing size (XXL, XL, L, M, S, SS)
iii. Place finished in a race: 1st, 2nd, 3rd (where 1st is better than 2nd)
iv. Temperature (100 degrees Celsius, 75 degrees Celsius, 5 degrees Celsius)

Types of Data – Levels of Measurement
Nominal variable
i. Data that cannot be organized in a logical sequence.
ii. However, we can assign numbers to a nominal variable’s data, where different numbers indicate different objects.
iii. For example:
• Assigning ‘0’ to male, ‘1’ to female
• Assigning ‘1’ to true, ‘0’ to false
• Phone brand (variable): iPhone = 1, Oppo = 2, Samsung = 3, Vivo = 4
• Phone brand (variable), another possible coding: Samsung = 1, Vivo = 2, Oppo = 3, iPhone = 4, …
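
As a small illustration of nominal coding, the sketch below uses the first brand-to-number mapping from the slide. The list of observations is hypothetical. The codes only label the brands, so counting how often each occurs is meaningful, while averaging the codes would not be.

```python
from collections import Counter

# Coding taken from the slide; any other assignment of distinct numbers works too.
brand_codes = {"iPhone": 1, "Oppo": 2, "Samsung": 3, "Vivo": 4}

# Hypothetical survey answers, for illustration only.
observations = ["Samsung", "iPhone", "Vivo", "Samsung"]

encoded = [brand_codes[brand] for brand in observations]
counts = Counter(observations)

print("Encoded:", encoded)
print("Frequency per brand:", dict(counts))
```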

Types of Data – Levels of Measurement
Interval variable
i. Data that can be logically ordered (same as ordinal) and has equal intervals between values.
ii. For example:
• Temperature in degrees Celsius.
• The difference between 35 degrees and 36 degrees is the same as the difference between 17 degrees and 18 degrees.
• The difference is 1 degree.
• Coffee Cup A is ?? degrees, Coffee Cup B is ?? degrees. The difference is around 35 degrees.

iii. Basically, the data has meaningful differences (intervals).

Types of Data – Levels of Measurement
Ratio variable
Data where:
i. We have meaningful differences (like an interval variable)
ii. We also have meaningful ratios
• For example: 10 kg is twice as much as 5 kg (10/5 = 2, where 2 is a meaningful ratio)
• Rice (beras): package A and package B. Person A (stronger) and Person B (normal). Rice package A is 3 times heavier than rice package B.
• A, B, C, D, E

iii. We also have a true zero point
• For example: 0 kg means there is no weight (true zero point)

Temperature?
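
One way to explore the temperature question is to check whether a ratio survives a change of unit. The sketch below uses made-up readings of 20 and 10 degrees Celsius; the conversion factors are standard, and the helper names are invented for this example.

```python
def kg_to_lb(kg):
    return kg * 2.20462


def celsius_to_kelvin(celsius):
    return celsius + 273.15


# Mass has a true zero, so the ratio is the same in any unit.
mass_ratio_kg = 10 / 5
mass_ratio_lb = kg_to_lb(10) / kg_to_lb(5)

# Celsius has an arbitrary zero, so the "ratio" changes with the scale.
temp_ratio_celsius = 20 / 10
temp_ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print("Mass ratio (kg, lb):", mass_ratio_kg, mass_ratio_lb)
print("Temperature ratio (Celsius, Kelvin):", temp_ratio_celsius, temp_ratio_kelvin)
```

Under these assumptions, 20 degrees Celsius is not "twice as hot" as 10 degrees Celsius, which is why temperature in Celsius is usually treated as an interval variable, while temperature in Kelvin (with a true zero) behaves as a ratio variable.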

Data Collection and Sampling Techniques
Some Sampling Techniques
Random – random number generator
Systematic – every kth subject
Convenience – mall surveys (only include people who are easy to reach)
Stratified – divide population into “layers”
Cluster – use intact groups
Sampling error – sample vs. population
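
The first two techniques in the list can be sketched with Python's standard `random` module. The population of 100 numbered subjects, the sample size of 10, and the fixed seed are all assumptions made for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: subjects numbered 1 to 100.
population = list(range(1, 101))

# Random sampling: every subject has the same chance of being picked.
random_sample = rng.sample(population, 10)

# Systematic sampling: pick a random starting point, then every kth subject.
k = 10
start = rng.randrange(k)
systematic_sample = population[start::k]

print("Random sample:", sorted(random_sample))
print("Systematic sample:", systematic_sample)
```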

Data Collection and Sampling Techniques
Stratified Sampling
Refer to the randomly arranged data in the figure below.

No.      Name   Age   Height   Weight   Salary   Covid19+ / Covid19-
1                                                Positive
2                                                Negative
3                                                Positive
4                                                Positive
5                                                Positive
6                                                Negative
7                                                Negative
…                                                …
10,000                                           Positive

Data Collection and Sampling Techniques
Stratified Sampling
We then group the data according to category, that is, we “stratify” the data.

No.      Name   Age   Height   Address   Salary   Color
1                                                 Blue
2                                                 Red
3                                                 Black
…                                                 …
10,000                                            Red

Data Collection and Sampling Techniques
Stratified Sampling
Stratified sampling = you need to make sure that the ratio between your categories is always the same, regardless of the size of your sampled data.
(1/8 = 0.125 = 12.5%; 2/8 = 0.25 = 25%)

Let's say we want to sample 40% of the whole data; we then randomly/systematically select 40% of the data from each category.
The example below works through a 10% sample (1,000 out of 10,000) based on the color attribute.
What if you decided to sample 20% from the whole data based on the color attribute?

Actual data:
Blue = 2,000 out of 10,000
Black = 7,000 out of 10,000
Red = 1,000 out of 10,000
Original ratio = 2000 : 7000 : 1000 = 2 : 7 : 1

If I want to sample 10% of the whole data using stratified sampling, I need to select 10% from each color.
Stratified sampled data = 200 : 700 : 100 = 2 : 7 : 1
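
A minimal sketch of the procedure, assuming a population with the slide's color counts. The record layout `(record number, color)` and the function name `stratified_sample` are invented for this example; each stratum is sampled at random, although the slide notes that systematic selection would also work.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical records (record number, color) matching the slide's counts.
colors = ["Blue"] * 2000 + ["Black"] * 7000 + ["Red"] * 1000
rng.shuffle(colors)  # the data starts out randomly arranged
population = list(enumerate(colors, start=1))


def stratified_sample(records, fraction, rng):
    """Draw the same fraction at random from every color group."""
    strata = {}
    for record in records:
        strata.setdefault(record[1], []).append(record)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample


sample_10 = stratified_sample(population, 0.10, rng)
counts_10 = Counter(color for _, color in sample_10)
print(counts_10)
```

Calling the same function with a fraction of 0.20 selects 400 blue, 1,400 black and 200 red records, which is again 2 : 7 : 1.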

Data Collection and Sampling Techniques
Cluster Sampling
In this case, we select the data by cluster.

No.      Name   Age   Height   Address   Salary   Covid19+ / Covid19-
1                                                 Positive
2                                                 Negative
3                                                 Positive
…                                                 …
10,000                                            Positive

Area (clusters): Kota Kinabalu, Tuaran, Penampang
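
Cluster sampling can be sketched as "pick whole groups, then keep everyone in them". The three areas come from the slide; the 30 records and their assignment to areas are made up for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

areas = ["Kota Kinabalu", "Tuaran", "Penampang"]

# Hypothetical records (record number, area); 10 records per area.
records = [(number, areas[number % 3]) for number in range(1, 31)]

# Choose one intact cluster at random and take every record in it.
chosen_area = rng.choice(areas)
cluster_sample = [record for record in records if record[1] == chosen_area]

print("Chosen cluster:", chosen_area)
print("Records sampled:", len(cluster_sample))
```

Unlike stratified sampling, nothing is drawn from the areas that were not chosen.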

Data Collection and Sampling Techniques
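Sampling error

Sampling error is the difference between a result obtained from a sample and the result for the whole population. It happens when you unintentionally select data which does not represent the overall population. In other words, you unintentionally create bias in your sample data.

Example: the population is 100 green balls and 100 red balls. A sample of 20 balls that takes 10% from each color contains 10 red (10%) and 10 green (10%).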
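
A small simulation of the ball example, assuming the 20 balls are drawn purely at random instead of 10% from each color. The seed is arbitrary, so the exact error will differ from run to run if it is changed.

```python
import random

rng = random.Random(7)  # arbitrary seed, for repeatability only

# Population from the slide: 100 green and 100 red balls.
population = ["green"] * 100 + ["red"] * 100
population_share_red = population.count("red") / len(population)

# Draw 20 balls at random, ignoring color.
sample = rng.sample(population, 20)
sample_share_red = sample.count("red") / len(sample)

# Sampling error: how far the sample result is from the population result.
sampling_error = sample_share_red - population_share_red

print("Share of red in population:", population_share_red)
print("Share of red in sample:", sample_share_red)
print("Sampling error:", sampling_error)
```

Taking 10 balls from each color, as in the slide, forces the sample share of red to match the population share of 0.5.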

Conclusion
We have covered:
i. Definitions of descriptive and inferential statistics and their differences
ii. Variables and types of data
• Quantitative data
• Qualitative data

iii. Data collection and sampling techniques
