PROJECT:
MICROECONOMETRICS
I. Introduction:
In modern manufacturing, predicting defects is vital for reducing waste and costs and for enhancing
customer satisfaction. This project uses a dataset of 34,516 observations with 14 variables to
develop a predictive model for defect occurrence on production lines. By applying Logit and
Probit models, the aim is to identify significant variables and optimize the model for better
accuracy and performance, ultimately helping businesses improve product quality and efficiency.
Evaluation is based on metrics such as accuracy, the True Positive Rate (TPR), and the False
Positive Rate (FPR).
Statistical Summary: The dataset is summarized (mean, median, standard deviation, etc.) for each
explanatory variable (X1 to X13) and for the dependent variable (Y); see the data summary table
below.
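A minimal sketch of how such a summary can be produced in R, assuming the data frame is named data (as in the glm call below) with numeric columns Y and X1 to X13:

summary(data)      # min, 1st quartile, median, mean, 3rd quartile, max for each column
sapply(data, sd)   # standard deviation of each column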
Call:
glm(formula = Y ~ ., family = binomial, data = data)

Coefficients:
       Estimate  Std. Error  z value  Pr(>|z|)
...
X5   -4.987e-01   5.229e-01   -0.954  0.340216
X6   -9.620e-01   1.283e+00   -0.750  0.453416
X7    4.207e+00   4.344e+00    0.968  0.332819
X8   -1.212e+02   7.132e+01   -1.699  0.089394 .
X9    4.804e-01   2.521e+00    0.191  0.848888
X10   1.598e-01   4.239e-02    3.770  0.000163 ***
X11   6.175e-03   1.271e-02    0.486  0.627213
X12   1.039e-02   1.390e-02    0.747  0.454980
X13  -7.231e+00   2.369e+00   -3.052  0.002275 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The output above summarizes a logistic regression model fitted to the data: I am predicting the
binary outcome Y from all the explanatory variables (X1 to X13) in the dataset.
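For reference, the logit specification models the log-odds of a defect as a linear function of the predictors, while the probit specification (used later for comparison) replaces the logistic link with the standard normal CDF Φ:

log( P(Y = 1 | X) / (1 - P(Y = 1 | X)) ) = β0 + β1·X1 + ... + β13·X13   (logit)
P(Y = 1 | X) = Φ(β0 + β1·X1 + ... + β13·X13)                            (probit)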
Intercept: the log-odds of Y = 1 when all predictors are zero. Its estimate is very large but not
significant at the usual levels (p-value = 0.079).
Predictors X1, X2, X3, X4, X5, X6, X7, X8, X9, X11, and X12 have p-values greater than
0.05, suggesting they are not statistically significant in the model; they do not provide strong
evidence against the null hypothesis that their coefficients are zero.
X10: a significant predictor (p-value = 0.000163) with a positive effect on the log-odds
of Y, indicating that a higher value of X10 increases the likelihood of Y = 1.
X13: also significant (p-value = 0.002275), with a large negative effect on the log-odds
of Y, suggesting that higher values of X13 decrease the likelihood of Y = 1.
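Because the coefficients are on the log-odds scale, exponentiating them gives odds ratios. A sketch, assuming the fitted model object is named fit (a hypothetical name):

exp(coef(fit)[c("X10", "X13")])
# X10: exp(0.1598)  ≈ 1.17   -> each one-unit increase in X10 multiplies the odds of a defect by about 1.17
# X13: exp(-7.231)  ≈ 0.0007 -> a one-unit increase in X13 reduces the odds by more than 99%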
Model Fit Statistics
Degrees of Freedom: 15,887 (null model) and 15,874 (residual model); the difference of
13 matches the 13 predictors used.
Deviance Reduction: the drop from the null deviance to the residual deviance suggests
that the model with predictors fits better than the null model; the size of the drop can be tested
against a chi-square distribution with 13 degrees of freedom (see the sketch after this list).
AIC (Akaike Information Criterion): 2063.3. AIC penalizes model complexity, so lower
values indicate a better fit per parameter; it is mainly useful for comparing candidate models
fitted to the same data.
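The chi-square comparison can be carried out directly from the fitted object; a sketch, again assuming the model object is named fit:

lr_stat <- fit$null.deviance - fit$deviance        # deviance reduction
df_diff <- fit$df.null - fit$df.residual           # 15887 - 15874 = 13
pchisq(lr_stat, df = df_diff, lower.tail = FALSE)  # p-value of the likelihood-ratio test
AIC(fit)                                           # 2063.3 for the model above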
Significance Codes:
‘***’: Highly significant (p < 0.001)
‘**’: Significant (0.001 < p < 0.01)
‘*’: Moderately significant (0.01 < p < 0.05)
‘.’: Marginally significant (0.05 < p < 0.1)
‘ ’: Not significant (p > 0.1)
Graphical Analysis:
1. Check for missing values: if the data contain missing values, they need to be
addressed (by removing rows, imputing values, etc.) before building the model, as they can bias
the analysis results.
2. Check for outliers: outliers can negatively affect the model, especially regression
models; detecting and handling them can improve the model's accuracy. A sketch of both checks
follows.
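A minimal sketch of both checks in R, using the 1.5×IQR rule and X10 purely as an illustration:

colSums(is.na(data))      # count of missing values per column
data <- na.omit(data)     # simplest remedy: drop incomplete rows (one option among several)

q   <- quantile(data$X10, c(0.25, 0.75))
iqr <- q[2] - q[1]
outlier <- data$X10 < q[1] - 1.5 * iqr | data$X10 > q[2] + 1.5 * iqr
sum(outlier)              # number of flagged observations
boxplot(data$X10)         # visual check of the same rule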
The correlation analysis shows no strong linear relationship between Y and the
independent variables (X1 - X13). X1 and X2 are highly correlated with each other, potentially
causing multicollinearity issues, while X3 and X4 are moderately correlated. Variables X5 - X8
have low correlations, reducing the multicollinearity risk. Variable selection methods such as
backward elimination or the AIC/BIC criteria are recommended for model refinement, as
sketched below.
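One way to apply this in R is the built-in step() function, which performs AIC-based backward elimination; a sketch with hypothetical object names:

fit_full <- glm(Y ~ ., family = binomial, data = data)
fit_back <- step(fit_full, direction = "backward")  # drop predictors while AIC improves
summary(fit_back)                                   # the refined model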
Correlation heatmap
The correlation heatmap shows the relationships between the dependent variable Y and the
independent variables (X1 - X13). The color scale ranges from 0 to 1, with lighter colors
representing higher correlations. The heatmap indicates that none of the variables is strongly
correlated with Y, as the values are generally low. However, some variables, such as X1 and X2,
are more strongly correlated with each other, which could indicate multicollinearity. This
suggests that further investigation is needed to refine the model.
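A heatmap like the one described can be reproduced with base R; a minimal sketch, assuming all columns of data are numeric (the exact color scale may differ from the figure):

cor_mat <- cor(data)                                 # correlations among Y and X1..X13
round(cor_mat, 2)                                    # numeric view of the same matrix
heatmap(cor_mat, symm = TRUE, Rowv = NA, Colv = NA)  # keep original variable order, no dendrograms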
2. Accuracy, TPR, and FPR for Logit Model
Confusion Matrix:
                 Predicted FALSE   Predicted TRUE
Actual Y = 0              25,575            8,635
Actual Y = 1                 147              158
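From this matrix the three metrics follow directly. A sketch, assuming the fitted logit model is fit and that the classification cutoff (0.5 here) matches the one behind the matrix above:

pred <- fitted(fit) > 0.5                        # classify at an assumed 0.5 cutoff
cm   <- table(Actual = data$Y, Predicted = pred)
TN <- cm["0", "FALSE"]; FP <- cm["0", "TRUE"]
FN <- cm["1", "FALSE"]; TP <- cm["1", "TRUE"]
(TP + TN) / sum(cm)  # accuracy: (25575 + 158) / 34515 ≈ 0.745
TP / (TP + FN)       # TPR:      158 / 305             ≈ 0.518
FP / (FP + TN)       # FPR:      8635 / 34210          ≈ 0.252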
- ROC (Receiver Operating Characteristic) curves are used to evaluate the performance of
classification models by plotting the True Positive Rate (TPR) against the False Positive Rate
(FPR) at various threshold levels.
- Purpose in this context: The ROC curve helps compare the performance of two models: Logit
and Probit. The closer the curve is to the top-left corner, the better the model's performance in
distinguishing between classes.
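A sketch of how both curves can be produced, using the pROC package (an assumption; any ROC utility would do) and hypothetical model names fit_logit and fit_probit:

library(pROC)

fit_logit  <- glm(Y ~ ., family = binomial(link = "logit"),  data = data)
fit_probit <- glm(Y ~ ., family = binomial(link = "probit"), data = data)

roc_logit  <- roc(data$Y, fitted(fit_logit))
roc_probit <- roc(data$Y, fitted(fit_probit))

plot(roc_logit, col = "blue")                      # TPR vs. FPR across thresholds
lines(roc_probit, col = "red")
legend("bottomright", legend = c("Logit", "Probit"),
       col = c("blue", "red"), lwd = 1)
auc(roc_logit); auc(roc_probit)                    # area under each curve, higher is better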
V. Conclusion:
Choosing the Best Model: Based on the ROC curves, the Probit model performs slightly better
than the Logit model on this dataset: its curve is consistently closer to the top-left corner,
indicating a higher True Positive Rate (TPR) with fewer false positives across the thresholds
considered. This suggests the Probit model is better at correctly classifying positive cases while
keeping errors low. In addition, the Probit model may offer a better fit when the latent variable
driving the outcome is normally distributed, further supporting its predictive accuracy. The
difference is not large, and both models perform relatively well, but the Probit model's slight
edge in classification performance makes it the preferred choice for this dataset.
---THE END---