CQF EXAM 3-Answer
Yifan Lu
Logistic regression is used to predict the probability of an event (a value between 0 and 1), and its cost function is the log-loss. The overall cost function for the training set is the average of the individual log-losses over all N training examples.
To minimize the cost function, one can use the gradient descent method: the parameters are repeatedly adjusted in the direction that reduces the cost the most, based on the slope (gradient) of the cost function, until the lowest point is found.
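As a minimal sketch of these two ideas (not part of the submitted code), the Python snippet below assumes a feature matrix X that already includes an intercept column and binary labels y, and minimizes the average log-loss by batch gradient descent:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def log_loss(y, p, eps=1e-12):
        # average log-loss over all N training examples
        p = np.clip(p, eps, 1 - eps)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def fit_logistic(X, y, lr=0.1, n_iter=1000):
        # batch gradient descent: step against the gradient of the average log-loss
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            p = sigmoid(X @ w)
            grad = X.T @ (p - y) / len(y)
            w -= lr * grad
        return w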
1. Hard Voting
Each model (base estimator) makes a distinct class prediction, and the class that receives the most votes is chosen.
2. Soft Voting
Each base estimator provides a probability for each class; the probabilities are averaged, and the class with the highest average probability is chosen.
Example: if four classifiers output class probabilities [0.3, 0.7], [0.2, 0.8], [0.1, 0.9], and [0.5, 0.5], the average probability for class 1 is (0.7+0.8+0.9+0.5)/4 = 0.725, so the final prediction is class 1 (both schemes are sketched in code below).
- Easy to implement and understand compared to other ensemble methods such as stacking
- Combining multiple models often results in higher accuracy
- If the base estimators are highly correlated and make similar errors, those errors are reinforced rather than averaged out, which hurts the entire combined model
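To illustrate the two voting schemes, here is a minimal sketch using scikit-learn's VotingClassifier; the base estimators and the synthetic dataset are illustrative assumptions, not the models used in the rest of this report:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    estimators = [
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),  # probability=True is required for soft voting
    ]

    hard = VotingClassifier(estimators, voting="hard").fit(X_train, y_train)  # majority vote
    soft = VotingClassifier(estimators, voting="soft").fit(X_train, y_train)  # averaged probabilities
    print(hard.score(X_test, y_test), soft.score(X_test, y_test))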
I chose Coca-Cola (KO) as the stock to investigate and used more than 20 years of daily data to conduct the study. I downloaded the daily open, high, low, close, and volume data and made sure the prices are all forward-adjusted so that they are suitable for further analysis. The data was sourced from the Bloomberg Terminal and loaded into a Pandas DataFrame. Initial examination confirmed the correct structure and types of the data columns.
1. Missing Values: The dataset was checked for missing values, and none were found.
2. Duplicates: No duplicate rows were present in the data.
3. Data Types: Relevant columns (Open, Close, High, Low) were converted to float, and Volume was converted to integer.
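A sketch of these checks in Pandas; the file name and DataFrame variable are assumptions based on the description above:

    import pandas as pd

    df = pd.read_csv("ko_daily.csv", parse_dates=["Date"], index_col="Date")  # assumed file/column names

    print(df.isna().sum())        # 1. missing values per column
    print(df.duplicated().sum())  # 2. duplicate rows

    # 3. data types: prices as float, volume as integer
    df[["Open", "High", "Low", "Close"]] = df[["Open", "High", "Low", "Close"]].astype(float)
    df["Volume"] = df["Volume"].astype(int)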
Step 3: Feature Engineering
New features were created to capture various aspects of the stock's behavior:
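As an illustration, features with the names used later in this report could be constructed as in the sketch below; the exact windows and formulas are assumptions, and df is the cleaned DataFrame from the previous step:

    import numpy as np

    df["H-L"] = df["High"] - df["Low"]                     # intraday range
    df["O-C"] = df["Close"] - df["Open"]                   # open-to-close move
    df["Momentum"] = df["Close"].diff(10)                  # 10-day price change (assumed window)
    df["SMA60"] = df["Close"].rolling(60).mean()
    df["SMA120"] = df["Close"].rolling(120).mean()
    df["EMA60"] = df["Close"].ewm(span=60).mean()
    df["EMA120"] = df["Close"].ewm(span=120).mean()
    df["Volume Change"] = df["Volume"].pct_change()
    df["Volume Ratio"] = df["Volume"] / df["Volume"].rolling(20).mean()
    df["Volume Volatility"] = df["Volume Change"].rolling(20).std()
    df["OBV"] = (np.sign(df["Close"].diff()) * df["Volume"]).fillna(0).cumsum()  # on-balance volume
    df = df.dropna()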
Feature EDA
A grid of histograms was created for each feature in the dataset to visualize its distribution. By doing so, one can gain insight into the characteristics of the data, such as the range of values, the central tendency, and the presence of outliers or skewness. This step helps in understanding the data and can guide the subsequent feature engineering process.
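A short Pandas sketch of this grid (figure size and bin count are arbitrary choices):

    import matplotlib.pyplot as plt

    df.hist(bins=50, figsize=(14, 10))  # one histogram panel per feature column
    plt.tight_layout()
    plt.show()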
Feature Scaling
Robust Scaler: Uses median and interquartile range to scale data, less sensitive to outliers.
From the charts, one can see:
Standard Scaler:
Robust Scaler:
- Works well for features with extreme values (like H-L, Volume Change).
Overall
Considering SVM's sensitivity to feature scaling and the distribution of each feature in the charts, I chose the Standard Scaler. It works well for most features and is commonly used with SVM models.
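A sketch of applying the chosen scaler with scikit-learn; the feature list matches the engineered columns named in this report:

    from sklearn.preprocessing import RobustScaler, StandardScaler

    features = ["H-L", "O-C", "Momentum", "SMA60", "SMA120", "EMA60", "EMA120",
                "Volume Change", "Volume Ratio", "Volume Volatility", "OBV"]

    scaler = StandardScaler()                      # chosen: centre on the mean, scale by the standard deviation
    # scaler = RobustScaler()                      # alternative: median / IQR, robust to outliers
    X_scaled = scaler.fit_transform(df[features])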
Correlation Matrix: A correlation matrix was used to identify and visualize relationships between
features.
Feature selection using High Correlation Elimination: Features with high correlations (above
0.95) were removed to reduce redundancy:
Result: EMA60, EMA120, OBV, SMA120 were dropped.
Explanation:
- EMA60 and EMA120 are highly correlated with SMA60 and SMA120. Keeping them adds no new information and only makes the model more complex.
- OBV is also highly correlated with other features, so it was removed to simplify the model.
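A sketch of this elimination step; the 0.95 threshold is taken from above, while the use of the upper triangle of the correlation matrix is an implementation assumption:

    import numpy as np

    corr = df[features].corr().abs()  # absolute pairwise correlations between features
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
    df = df.drop(columns=to_drop)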
The dataset was split into training and testing sets (80:20 ratio). The training set was further split
into training and validation sets (80:20 ratio).
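A sketch of this two-stage split; the target definition (next-day up vs. down) and shuffle=False (to preserve time order) are assumptions consistent with the uptrend/downtrend labels used later:

    from sklearn.model_selection import train_test_split

    X = df.drop(columns=["Open", "High", "Low", "Close", "Volume"])        # keep engineered features only
    y = (df["Close"].shift(-1) > df["Close"]).astype(int)                  # assumed label: 1 = next-day uptrend

    # 80:20 train/test, then 80:20 train/validation
    X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
    X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, test_size=0.2, shuffle=False)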
Step 7: Model Training and Tuning
Feature Importance Calculation: A Random Forest classifier was used to determine the
importance of each feature. This helped in selecting the most significant features for the SVM
model.
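A sketch of the feature-importance step; the Random Forest settings are illustrative:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(X_train, y_train)
    importances = pd.Series(rf.feature_importances_, index=X_train.columns).sort_values(ascending=False)
    print(importances)  # rank features before passing the top ones to the SVM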
Hyperparameter Tuning:
Best Selected Features: ['H-L', 'Momentum', 'Volume Volatility', 'O-C', 'Volume Ratio', 'Volume
Change', 'SMA60']
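A sketch of tuning the SVM on the selected features with a grid search; the parameter grid, the 5-fold cross-validation, and the scaler-plus-SVC pipeline are assumptions, while the feature list is the one reported above:

    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    best_features = ["H-L", "Momentum", "Volume Volatility", "O-C", "Volume Ratio", "Volume Change", "SMA60"]

    pipe = make_pipeline(StandardScaler(), SVC())
    param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.1]}  # assumed grid
    grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
    grid.fit(X_train[best_features], y_train)
    print(grid.best_params_, grid.best_score_)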
Accuracy: 0.6887
Precision: 0.6
Recall: 0.0135
F1 Score: 0.0264
AUC-ROC: 0.4693
Confusion Matrix
Interpretation:
The model failed to predict 219 uptrend days, incorrectly labeling them as downtrend days (false negatives).
The model has a high number of true negatives but performs poorly in identifying true positives,
which indicates an imbalance in the prediction for the positive class.
Classification Report
The model shows high precision and recall for class 0 but fails to adequately identify instances of class 1. This is evident from the very low recall and F1-score for class 1.
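A sketch of how the confusion matrix and classification report are produced with scikit-learn; grid and best_features continue from the tuning sketch above:

    from sklearn.metrics import classification_report, confusion_matrix

    y_pred = grid.best_estimator_.predict(X_test[best_features])
    print(confusion_matrix(y_test, y_pred))       # rows: actual class, columns: predicted class
    print(classification_report(y_test, y_pred))  # per-class precision, recall, F1-score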
ROC Curve
The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) for different
threshold values.
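A sketch of computing and plotting this curve from the fitted SVM's decision scores; grid and best_features continue from the tuning sketch above:

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_auc_score, roc_curve

    scores = grid.best_estimator_.decision_function(X_test[best_features])  # SVM decision scores
    fpr, tpr, _ = roc_curve(y_test, scores)
    plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_test, scores):.2f}")
    plt.plot([0, 1], [0, 1], linestyle="--", color="grey")                  # random-guess diagonal
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()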
Interpretation:
The ROC curve is close to the diagonal, indicating that the model performs no better than
random guessing.
The AUC value of 0.47 further confirms the poor performance in distinguishing between uptrend
and downtrend days.
Summary
The model's performance metrics, confusion matrix, and ROC curve were used to evaluate
prediction quality. The accuracy was relatively high, but the recall and F1 scores for predicting
uptrends were low, indicating the model struggles to predict positive moves accurately. The ROC
curve and AUC value suggest the model does not effectively differentiate between uptrends and
downtrends.