
1.4.1. Estimation and Inference

Uploaded by

havietthang02
Copyright © All Rights Reserved

Estimation and Inference

1
Learning Goals
In this section, we will cover:
- Statistical estimation and inference
- Parametric and non-parametric approaches to modeling
- Common statistical distributions
- Frequentist vs. Bayesian statistics

2
Estimation vs. Inference
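Estimation: the application of an algorithm to data, for example taking an average: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Inference: putting a measure of accuracy on the estimate, for example the standard error of an average: $\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation and $n$ is the sample size.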
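To make the distinction concrete, here is a minimal sketch using only Python's standard library. The sample values are hypothetical and chosen purely for illustration; the mean is the estimate, and the standard error is the accuracy statement attached to it.

```python
import math
import statistics

# Hypothetical sample, e.g. monthly spend for eight customers (illustration only).
sample = [52.0, 48.5, 61.2, 55.0, 47.3, 58.8, 50.1, 53.6]

n = len(sample)
estimate = statistics.mean(sample)         # estimation: the point estimate
sample_sd = statistics.stdev(sample)       # sample standard deviation (n - 1 denominator)
standard_error = sample_sd / math.sqrt(n)  # inference: accuracy of the estimate

print(f"mean = {estimate:.2f}, standard error = {standard_error:.2f}")
```

A larger sample with the same spread would give a smaller standard error, which is the sense in which more data makes the estimate more accurate.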

3
Machine Learning and Statistical Inference
Machine learning and statistical inference are similar
(a case of computer science borrowing from a long history in statistics).

In both cases, we're using data to learn/infer qualities of a distribution that generated the data (often termed the data-generating process).

We may care either about the whole distribution or just features (e.g. mean).

Machine learning applications that focus on understanding parameters and individual effects involve more tools from statistical inference (some applications are focused only on results).

4
Example: Customer Churn
Customer churn occurs when a customer leaves a company.

Data related to churn may include a target variable for whether or not the customer left.

Features could include:
- The length of time as a customer
- The type and amount purchased
- Other customer characteristics

Churn prediction is often approached by predicting a score for individuals that estimates the probability the customer will leave.
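One common way to turn features into such a score is a logistic function, sketched below. This is an assumption for illustration: the slides do not say which model is used, and the function name `churn_score`, the feature names, and the coefficients are all hypothetical and not fitted to any data.

```python
import math

def churn_score(tenure_months, monthly_spend, weights, bias):
    """Map customer features to a score in (0, 1) with a logistic function."""
    linear = (
        bias
        + weights["tenure_months"] * tenure_months
        + weights["monthly_spend"] * monthly_spend
    )
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients, chosen for illustration only (not fitted to data).
weights = {"tenure_months": -0.05, "monthly_spend": 0.01}
bias = 0.2

new_customer = churn_score(2, 80, weights, bias)
long_time_customer = churn_score(60, 80, weights, bias)

print(f"new customer score:       {new_customer:.3f}")
print(f"long-time customer score: {long_time_customer:.3f}")
```

With these made-up coefficients, longer tenure lowers the score, so the long-time customer is rated as less likely to leave than the new one.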
5
Customer Churn: Estimation
Estimation of factors driving customer churn involves measuring the impact of each factor in predicting churn.

Inference involves determining whether these measured impacts are statistically significant.
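As a sketch of both steps for a single factor, the example below estimates the difference in churn rate between two groups and then checks its significance with a two-proportion z-test. The choice of test, the function name, and the counts are assumptions for illustration; they do not come from the dataset discussed in these slides.

```python
import math

def two_proportion_z_test(churned_a, total_a, churned_b, total_b):
    """Return (rate difference, z statistic, two-sided p-value)."""
    rate_a = churned_a / total_a
    rate_b = churned_b / total_b
    # Pooled churn rate under the null hypothesis of no difference.
    pooled = (churned_a + churned_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a - rate_b, z, p_value

# Hypothetical counts: churned customers out of all customers, per contract type.
diff, z, p_value = two_proportion_z_test(120, 400, 45, 300)

print(f"estimated difference in churn rate: {diff:.3f}")  # estimation
print(f"z = {z:.2f}, p-value = {p_value:.2g}")            # inference
```

The difference in rates is the estimate; the p-value is the inferential statement about whether a difference that large could plausibly arise by chance.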

6
Customer Churn: Example Dataset
IBM Cognos Customer Churn Dataset:
- Data from a fictional telecommunications firm
- Includes account type, customer characteristics, revenue per customer, satisfaction score, estimate of customer lifetime value
- Includes information on whether the customer churned (and some categories of churn type)

7
Customer Churn Example: Plotting
(Slides 8-11: plots of the churn dataset; figures not recovered.)

11
Parametric vs. Non-parametric

If inference is about trying to find out the data-generating process (DGP), then a statistical model (of the data) is a set of possible distributions, or possibly regressions, that could have generated the data.

A parametric model is a particular type of statistical model: it is also a set of distributions or regressions, but one that can be described by a finite number of parameters.

12
Non-parametric Statistics

In non-parametric statistics, we make fewer assumptions.

In particular, we don't assume that the data belong to any particular distribution (also called distribution-free inference).

This doesn't mean that we know nothing, though!

13
Non-parametric Inference

An example of non-parametric inference is estimating the distribution of the data directly from the observations, for example with a histogram or an empirical cumulative distribution function (CDF).

In this case, we're not specifying parameters.
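A minimal sketch of a non-parametric estimate is the empirical CDF: for any value x, it reports the fraction of observations at or below x. The helper name `empirical_cdf` and the observations are hypothetical.

```python
from bisect import bisect_right

def empirical_cdf(data):
    """Return a function F where F(x) is the share of observations <= x."""
    ordered = sorted(data)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return cdf

# Hypothetical observations (illustration only).
observations = [3.1, 0.4, 2.2, 5.9, 1.7, 2.2, 4.0, 0.9]
F = empirical_cdf(observations)

for x in (0.0, 2.2, 10.0):
    print(f"F({x}) = {F(x)}")
```

No distribution family or parameter is chosen anywhere: the estimate is built from the data alone.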

14
Parametric Models
A parametric model is a particular type of statistical model: it's also a
set of distributions or regressions, but they have a finite number of
parameters.
An example of a parametric model is the normal distribution, which is fully specified by two parameters: the mean and the standard deviation.

15
Example: Customer Lifetime Value
Customer lifetime value is an estimate of the customer's value to the company.

Data related to customer lifetime value might include:
- The expected length of time as a customer
- The expected amount spent over time

To estimate lifetime value, we make assumptions about the data.

These assumptions can be parametric (assuming a specific distribution) or non-parametric.

16
Parametric Models: Maximum Likelihood
The most common way of estimating parameters in a parametric model
is through maximum likelihood estimation (MLE).

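The likelihood function is related to probability and is a function of the parameters of the model. For independent observations $X_1, \dots, X_n$ with density $f(x; \theta)$, it is $L_n(\theta) = \prod_{i=1}^{n} f(X_i; \theta)$.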

17
Parametric Models: Maximum Likelihood
We choose the value of θ (the parameters) that maximizes the likelihood function.
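As a sketch of maximum likelihood in practice, the example below fits the rate of an exponential distribution to simulated data. The exponential model, the simulated data, and the value of `true_rate` are assumptions made for illustration. The closed-form maximizer is cross-checked against a coarse grid search over candidate rates.

```python
import math
import random

def exponential_log_likelihood(rate, data):
    """Log-likelihood of i.i.d. exponential data: n*log(rate) - rate*sum(data)."""
    return len(data) * math.log(rate) - rate * sum(data)

rng = random.Random(42)
true_rate = 0.5  # hypothetical value, used only to simulate data
data = [rng.expovariate(true_rate) for _ in range(5_000)]

# Closed form: set the derivative n/rate - sum(data) to zero and solve for rate.
rate_mle = len(data) / sum(data)

# Cross-check: pick the best candidate on a grid from 0.01 to 3.00.
grid = [0.01 * k for k in range(1, 301)]
rate_grid = max(grid, key=lambda r: exponential_log_likelihood(r, data))

print(f"closed-form MLE: {rate_mle:.4f}")
print(f"grid-search MLE: {rate_grid:.2f}")
```

Working with the log of the likelihood is a standard convenience: it turns the product into a sum and has the same maximizer.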

18
Commonly Used Distributions
(Slides 19-23: figures not recovered. The Summary lists the distributions covered as uniform, normal, log-normal, exponential, and Poisson.)

23
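The figures for these slides are not available, so the sketch below instead draws samples from the five distributions named in the Summary (uniform, normal, log-normal, exponential, Poisson) and compares each sample mean with its theoretical mean. All parameter values are hypothetical. Python's `random` module has no Poisson sampler, so the `sample_poisson` helper is an illustrative implementation of Knuth's multiplication method.

```python
import math
import random
import statistics

def sample_poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(0)
n = 20_000

# name -> (samples, theoretical mean); parameters are illustrative choices.
draws = {
    "uniform(0, 1)": ([rng.uniform(0, 1) for _ in range(n)], 0.5),
    "normal(0, 1)": ([rng.gauss(0, 1) for _ in range(n)], 0.0),
    "log-normal(0, 0.5)": ([rng.lognormvariate(0, 0.5) for _ in range(n)],
                           math.exp(0.5 ** 2 / 2)),
    "exponential(rate=2)": ([rng.expovariate(2) for _ in range(n)], 0.5),
    "Poisson(3)": ([sample_poisson(rng, 3) for _ in range(n)], 3.0),
}

sample_means = {name: statistics.mean(xs) for name, (xs, _) in draws.items()}
for name, (_, expected) in draws.items():
    print(f"{name:>20}: sample mean {sample_means[name]:.3f}, theory {expected:.3f}")
```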
Frequentist vs. Bayesian Statistics
A Frequentist is concerned with repeated observations in the limit.

Processes may have true frequencies, but we're interested in modeling probabilities as many, many repeats of an experiment.

Frequentist approach:
1. Derive the probabilistic property of a procedure
2. Apply the probability directly to the observed data
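The two steps can be sketched with a confidence interval for a mean. The simulation first checks a property of the procedure over many repeated experiments (how often the interval covers the true mean), then applies the same procedure to one observed data set. The normal data, the parameter values, and the 1.96 multiplier for an approximate 95% interval are assumptions for illustration.

```python
import math
import random
import statistics

def confidence_interval(sample, z=1.96):
    """Approximate 95% interval for the mean: mean plus or minus z standard errors."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * se, mean + z * se

rng = random.Random(1)
true_mean, true_sd, n, repeats = 10.0, 2.0, 50, 2_000  # hypothetical setup

# Step 1: study the procedure over many repeated experiments.
covered = 0
for _ in range(repeats):
    sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    low, high = confidence_interval(sample)
    covered += low <= true_mean <= high
coverage = covered / repeats

# Step 2: apply the same procedure to the one data set actually observed.
observed = [rng.gauss(true_mean, true_sd) for _ in range(n)]
low, high = confidence_interval(observed)

print(f"coverage over {repeats} repeats: {coverage:.3f}")
print(f"interval for the observed data: ({low:.2f}, {high:.2f})")
```

The coverage figure describes the procedure across repeats, not any single interval; that is the frequentist reading of "95%".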

24
Frequentist vs. Bayesian: Bayesian
A Bayesian describes parameters by probability distributions.

Before seeing any data, a prior distribution (based on the experimenters' belief) is formulated.

This prior distribution is then updated after seeing data (a sample from the distribution).

After updating, the distribution is called the posterior distribution.

25
Frequentist vs. Bayesian: Bayesian
We will consider two examples of probabilistic systems:
● Coin flips - What is the probability of an unfair coin coming up heads?
● Election of a particular candidate for UK Prime Minister - What is the probability that a candidate who has not stood before will win?
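For the coin-flip example, the prior-to-posterior update can be sketched with a Beta prior on the probability of heads. The Beta prior is an assumption made here because it gives a closed-form update; the slides do not specify a prior. The prior parameters and flip counts are hypothetical.

```python
def update_beta(prior_alpha, prior_beta, heads, tails):
    """Posterior Beta parameters after observing coin flips (conjugate update)."""
    return prior_alpha + heads, prior_beta + tails

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical prior belief: Beta(2, 2), centered on a fair coin but uncertain.
prior = (2.0, 2.0)

# Hypothetical data: 14 heads and 6 tails.
heads, tails = 14, 6
posterior = update_beta(*prior, heads, tails)

prior_mean = beta_mean(*prior)
posterior_mean = beta_mean(*posterior)

print(f"prior mean of P(heads):     {prior_mean:.3f}")
print(f"posterior mean of P(heads): {posterior_mean:.3f}")
```

The posterior mean sits between the prior mean and the observed share of heads (14 of 20), and it moves closer to the observed share as more flips are recorded.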

26
Frequentist vs. Bayesian: Bayesian
(Slide 27: figure not recovered.)

27
Frequentist vs. Bayesian Statistics
We use much of the same math and the same formulas in both
Frequentist and Bayesian statistics.

The element that differs is the interpretation.

We will point out the difference in interpretation, where appropriate.

28
Summary
● Estimation and Inference
○ Inferential statistics consists of learning characteristics of the population from a sample. The population characteristics are parameters, while the sample characteristics are statistics. A parametric model uses a finite number of parameters, such as the mean and standard deviation.
○ The most common way of estimating parameters in a parametric model is through maximum likelihood estimation.
○ Through a hypothesis test, you test for a specific value of the parameter.
○ Estimation is the process of determining a population parameter based on a model fitted to the data.
○ The most common distributions are: uniform, normal, log-normal, exponential, and Poisson.
○ A frequentist approach focuses on observing many repeats of an experiment. A Bayesian approach describes parameters through probability distributions.

29
Learning Recap
In this section, we discussed:
- Statistical estimation and inference
- Parametric and non-parametric approaches to modeling
- Common statistical distributions
- Frequentist vs. Bayesian statistics

30
