Chapter 1: Introduction
Population:
- The statistical process starts when we identify the group we want to study (the population).
The Big Picture of Statistics
Producing data:
- The population, in most cases, is so large that
we can’t study all of it.
- A practical approach is to examine and collect
data from a subgroup of the population
(sample).
- Choosing a sample that represents the population well
takes deliberate effort.
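One common way to aim for a representative sample is simple random sampling, where every individual has the same chance of being selected. The sketch below assumes a hypothetical population of 10,000 ID numbers and a sample size of 200; these numbers and the variable names are illustrative and do not come from the course material.

```python
import random

# Hypothetical population: ID numbers for 10,000 individuals (illustrative only).
population = list(range(10_000))

# Fix the seed so the sketch is reproducible.
rng = random.Random(42)

# Simple random sample: every individual has the same chance of being chosen,
# and no individual is chosen twice (sampling without replacement).
sample = rng.sample(population, k=200)

print(len(sample), "individuals sampled out of", len(population))
```

Sampling without replacement keeps each individual from being counted more than once in the sample.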
Exploratory Data Analysis (EDA):
- Summarizing the collected data so that we can explore it, make sense of it,
and answer questions about it.
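As a small illustration of summarizing collected data, the sketch below counts categorical survey answers and converts the counts to percentages. The ten responses are made up for the example and are not real survey data.

```python
from collections import Counter

# Hypothetical survey responses (illustrative only, not real data).
responses = ["favor", "oppose", "favor", "favor", "oppose",
             "favor", "favor", "oppose", "favor", "favor"]

# Count how often each answer occurs.
counts = Counter(responses)
total = len(responses)

# Convert the counts to percentages of all responses.
percentages = {answer: 100 * count / total for answer, count in counts.items()}

for answer, pct in percentages.items():
    print(f"{answer}: {counts[answer]} responses ({pct:.0f}%)")
```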
Probability:
- Probability is the machinery that allows us to draw conclusions about the population based on the data
collected from the sample.
Inference:
- We can use what we've discovered
about our sample to draw
conclusions about our population
- This final step can take the form of statistical
estimation, hypothesis testing, or
prediction (e.g., regression,
classification).
Example:
- In August 2021, abcd company conducted a poll of Canadian adults' opinions about prime minister candidate X.
- Producing Data: a representative sample of 2,048 adults was chosen, and each adult was asked whether he or she favored or opposed voting for X.
- Exploratory Data Analysis (EDA): The collected data was summarized, and it was found that 65% of the sampled adults favor voting for X’s political party.
- Probability and Inference: Based on the sample result and our knowledge of probability, it was concluded with 95% confidence that the percentage of Canadian adults who favor voting for X is within 3% of what was obtained in the sample (between 62% and 68%).
[Figure: What percentage support Candidate X? Sample of 2,048 responses (favor or oppose); 65% are in favor.]
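The interval in this example can be checked with the usual normal-approximation formula for a sample proportion, margin = z * sqrt(p(1 - p) / n). This is a minimal sketch that assumes a simple random sample and z = 1.96 for 95% confidence; the poll's actual methodology is not given in the slides, and the function name is illustrative.

```python
import math

def proportion_margin_of_error(p_hat, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion.

    Assumes a simple random sample; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

n = 2048       # sample size from the example
p_hat = 0.65   # share of sampled adults in favor

moe = proportion_margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe

print(f"margin of error: {moe:.3f}")
print(f"approximate 95% interval: ({low:.3f}, {high:.3f})")
```

Under these assumptions the margin comes out to roughly 2 percentage points, which sits inside the "within 3%" range quoted in the example.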
Course Structure
Second section: Chapter 3
Third section: Chapter 4
Fourth section:
- Chapter 5: Statistical Estimation &
Hypothesis Testing
- Chapter 6: Introduction to Statistical ML
Course Evaluation & Reading Books
Course evaluation:
Project 40%
- Individual implementation & report (80%)
- Group presentation (20%)
Labs 20%
Assignments/Quizzes 15%
Attendance 5%
Reading books:
2. Introduction to Statistics with Python: With Applications in the Life Sciences. Thomas Haslwanter. Switzerland: Springer, 2016. (UofW library link)
Introduction to Python Programming Language
Why Python for statistics:
Python packages for statistics:
Exploratory Data Analysis
EDA - Examining Distributions
Instructor: Hieu Dang, PhD
The Big Picture
Data and Variables
Definitions:
- A variable is a characteristic of the individual.
[Figure: diagram labeled Variables and Individuals]
Variables can be classified into one of two
types: categorical or quantitative.
- Categorical variables place each individual into one
of several groups. Each observation can be
placed in only one category, and the
categories are mutually exclusive.
- Quantitative variables take numerical values that represent some kind of
measurement.
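To make the distinction concrete, the sketch below builds a tiny dataset in which each record is an individual and each key is a variable, then labels every variable as categorical or quantitative. The records, variable names, and the `classify_variable` helper are all hypothetical, and the rule used (numeric values mean quantitative) is a simplification.

```python
# Hypothetical dataset: each dict is one individual, each key is a variable.
individuals = [
    {"name": "A", "age": 34, "blood_type": "O", "height_cm": 171.5},
    {"name": "B", "age": 27, "blood_type": "AB", "height_cm": 163.0},
    {"name": "C", "age": 45, "blood_type": "A", "height_cm": 180.2},
]

def classify_variable(values):
    """Label a variable quantitative if every value is a number, else categorical."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "categorical"

# Classify each variable from the values it takes across all individuals.
variable_types = {
    variable: classify_variable([person[variable] for person in individuals])
    for variable in individuals[0]
}

print(variable_types)
```

The numeric check is only a first guess: numbers used as labels, such as area codes, are stored as numbers but behave as categorical variables, so the classification still needs human judgment.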