Chapter 1: Introduction

The document outlines the fundamental concepts of statistics, including data collection, summarization, and interpretation. It emphasizes the importance of sampling from a population to make inferences and discusses exploratory data analysis and probability. Additionally, it provides an overview of course structure, evaluation, and the use of Python for statistical analysis.

Uploaded by

felixhope30

Applied Statistics

Instructor: Hieu Dang, PhD


The Big Picture of Statistics
Statistics is a process in which we:
- Collect data,
- Summarize data, and
- Interpret data

Population:
- The process of statistics starts when we identify the group we want to study.

Producing data:
- The population, in most cases, is so large that we can't study all of it.
- A practical approach is to examine and collect data from a subgroup of the population (a sample).
- It takes effort to choose a sample in a way that represents the population well.
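
The idea of drawing a sample that represents the population can be sketched with a simple random sample. This is only an illustration: the population below is invented, and in practice the population proportion is exactly what we do not know.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 adults, 65% of whom favor a proposal.
# (Invented values; a real population proportion would be unknown.)
population = ["favor"] * 65_000 + ["oppose"] * 35_000

# Simple random sample: every individual is equally likely to be chosen.
sample = random.sample(population, k=2_000)

population_share = population.count("favor") / len(population)
sample_share = sample.count("favor") / len(sample)

print(f"population: {population_share:.3f}, sample: {sample_share:.3f}")
```

With a sample chosen this way, the sample share is expected to land close to the population share, which is what makes inference from the sample reasonable.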

Exploratory Data Analysis (EDA):
- Summarizing the collected data in order to explore it, make sense of it, and answer questions about it.
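
As a minimal sketch of what summarizing data can look like in Python, the snippet below uses only the standard library. The `responses` and `ages` lists are made-up values for illustration, not data from the course.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical survey records (invented for illustration).
responses = ["favor", "oppose", "favor", "favor", "oppose", "favor"]
ages = [34, 51, 27, 45, 62, 39]

# Label-valued data: count each category and convert to percentages.
counts = Counter(responses)
percentages = {label: 100 * n / len(responses) for label, n in counts.items()}

# Numeric data: summarize with a center (mean, median) and the range.
age_summary = {
    "mean": mean(ages),
    "median": median(ages),
    "min": min(ages),
    "max": max(ages),
}

print(percentages)
print(age_summary)
```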

Probability:
- Probability is the machinery that allows us to draw conclusions about the population based on the data collected from the sample.

Inference:
- We can use what we have discovered about our sample to draw conclusions about our population.
- This final step can be statistical estimation, hypothesis testing, or prediction (e.g., regression, classification).

Example:
- In August 2021, a poll was conducted by abcd company on the opinions of Canadian adults about prime minister candidate X.
- Producing Data: A representative sample of 2,048 adults was chosen, and each adult was asked whether he or she favored or opposed voting for X.
- Exploratory Data Analysis (EDA): The collected data was summarized, and it was found that 65% of the sampled adults favor voting for X.
- Probability and Inference: Based on the sample result and our knowledge of probability, it was concluded that, with 95% confidence, the percentage of Canadian adults who favor voting for X is within 3 percentage points of what was obtained in the sample (between 62% and 68%).

Conclusion: We can be 95% confident that the population percentage in favor of voting for X is within 3 percentage points of 65%, that is, between 62% and 68%.

[Figure: the population question "What percentage support Candidate X?", a sample of 2,048 responses (favor or oppose), and the sample result that 65% are in favor]
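
The arithmetic behind a margin of error like this can be sketched as follows. The slides do not say how the 3% figure was computed, so this sketch assumes simple random sampling and the normal-approximation (Wald) interval for a proportion. Under those assumptions the margin comes out at roughly 2 percentage points, which fits inside the quoted 3 and suggests the quoted figure is rounded or conservative.

```python
from math import sqrt
from statistics import NormalDist

n = 2048        # sample size from the example
p_hat = 0.65    # sample proportion in favor
confidence = 0.95

# Two-sided critical value of the standard normal (about 1.96 for 95%).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Normal-approximation (Wald) margin of error for a proportion.
# Assumption: simple random sampling; the poll's actual method is not stated.
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"margin: {margin:.3f}, interval: ({lower:.3f}, {upper:.3f})")
```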

Course Structure
First section: Chapter 2

Second section: Chapter 3

Third section: Chapter 4

Fourth section:
- Chapter 5: Statistical Estimation & Hypothesis testing
- Chapter 6: Introduction to Statistical ML

Course Evaluation & Reading Books
Course evaluation:

Component                                      Value (%)
Project                                        40
  - Individual implementation & report (80%)
  - Group presentation (20%)
Labs                                           20
Assignments/Quizzes                            15
Final exam                                     20
Attendance                                     5

Reference books (optional):

1. Statistics. Robert S. Witte and John S. Witte, Hoboken, NJ: Wiley, 11th edition, 2017. (UofW library link)

2. An Introduction to Statistics with Python: With Applications in the Life Sciences. Thomas Haslwanter, Switzerland: Springer, 2016. (UofW library link)

Introduction to Python Programming Language
Why Python for statistics:

- It is an elegant, readable programming language.
- It is free.
- It is powerful and commonly used in production for data science projects.
- It has huge ML/DS community support.
- It is a tool that data scientists use every day.

Python packages for statistics:
[Figure: overview of Python packages for statistics]

Exploratory Data Analysis
EDA - Examining Distributions
The Big Picture

Data and Variables
Definitions:

Data are pieces of information about individuals, organized into variables. By an individual, we mean a particular person or object. By a variable, we mean a particular characteristic of the individual.

A dataset is a set of data identified with particular circumstances.

[Figure: example data table with the labels "Variables" and "Individuals"]
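
These definitions map naturally onto a small table in code. The sketch below assumes a layout in which each record is one individual and each key is one variable. The field names and values are invented for illustration and are not the insurance records used in the lecture.

```python
# Hypothetical insurance-style records (all names and values are invented).
# Each dict is one individual; each key is one variable.
dataset = [
    {"id": 1, "age": 34, "region": "west", "smoker": "no", "annual_premium": 1250.0},
    {"id": 2, "age": 51, "region": "east", "smoker": "yes", "annual_premium": 3890.5},
    {"id": 3, "age": 27, "region": "west", "smoker": "no", "annual_premium": 980.0},
]

individuals = len(dataset)            # number of individuals (rows)
variables = list(dataset[0].keys())   # names of the variables (columns)

# One variable observed across all individuals.
ages = [record["age"] for record in dataset]

print(individuals, variables, ages)
```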

Variables can be classified into one of two types: categorical or quantitative.

Categorical variables take category or label values and place an individual into one of several groups. Each observation can be placed in only one category, and the categories are mutually exclusive.

In our example of insurance records, which variables are categorical variables?


Quantitative variables take numerical values and represent some kind of measurement.

In our example of insurance records, which variables are quantitative variables?
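
One rough way to separate the two types in code is to check whether every value of a variable is a number. The column names and values below are hypothetical, and `classify` is an illustrative helper, not part of any library.

```python
from numbers import Number

# Hypothetical insurance-style columns (invented for illustration).
columns = {
    "age": [34, 51, 27],
    "region": ["west", "east", "west"],
    "smoker": ["no", "yes", "no"],
    "annual_premium": [1250.0, 3890.5, 980.0],
}

def classify(values):
    """Label a column quantitative if every value is a number, else categorical."""
    numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "categorical"

variable_types = {name: classify(values) for name, values in columns.items()}
print(variable_types)
```

This check is only a first pass. A variable stored as numbers, such as a postal code or a numeric group label, is still categorical because it does not represent a measurement, so the final classification depends on what the values mean.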
