
CQF EXAM 3

Yifan Lu

- Q1 What is the cost function of Logistic Regression? Explain in detail.
The cost function used in logistic regression measures how well the model's predictions match the actual outcomes. Training adjusts the model parameters to minimize the difference between the predicted values and the actual values; by minimizing this cost function, one improves the logistic regression model and makes its predictions more reliable.

Logistic regression predicts the probability of an event (a value between 0 and 1), and its cost function is the log-loss. For a single example with true label y ∈ {0, 1} and predicted probability ŷ, the log-loss is −[y·log(ŷ) + (1 − y)·log(1 − ŷ)]. The overall cost function for the training set is the average of these individual log-losses over all N training examples.

To minimize the cost function, one can use gradient descent: repeatedly adjust the parameters in the direction that reduces the cost the most (the negative gradient of the cost function) until the lowest point is found. Because the log-loss is convex in the parameters, this lowest point is the global minimum.
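A minimal NumPy sketch of the log-loss and one gradient-descent update (variable names are my own, not from the original answer):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, y_hat, eps=1e-12):
    # Average of the individual log-losses over all N examples
    y_hat = np.clip(y_hat, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def gradient_step(X, y, theta, lr=0.1):
    # For log-loss, the gradient w.r.t. theta is X.T @ (y_hat - y) / N
    y_hat = sigmoid(X @ theta)
    grad = X.T @ (y_hat - y) / len(y)
    return theta - lr * grad  # step against the gradient
```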

- Q2 What are voting classifiers in ensemble learning?


A voting classifier is used in ensemble learning to combine the predictions of multiple models into a final prediction. It uses several different models (called base estimators) and combines their outputs to improve overall prediction performance.

Types of voting classifiers:

1. Hard Voting
Each base estimator makes a discrete class prediction, and the class that gets the most votes is chosen.

Example: if 5 classifiers predict [0, 1, 1, 0, 0], the final prediction is 0 because it has the majority of votes (3 out of 5).

2. Soft Voting
Each base estimator provides a probability for each class; these probabilities are averaged, and the class with the highest average probability is chosen.

Example: if 4 classifiers output class probabilities [0.3, 0.7], [0.2, 0.8], [0.1, 0.9], and [0.5, 0.5], the average probability for class 1 is (0.7 + 0.8 + 0.9 + 0.5) / 4 = 0.725, so the final prediction is class 1.

Advantages of voting classifiers:

- Easy to implement and understand compared to other ensemble methods like stacking
- Combining multiple models often results in higher accuracy than any single base estimator

Limitations of voting classifiers:

- If the base estimators are highly correlated and make similar errors, those errors reinforce each other, and the combined model can be confidently wrong
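Both voting modes are available in scikit-learn; a minimal sketch on synthetic data (the choice of base estimators here is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

base_estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("rf", RandomForestClassifier(random_state=42)),
    ("svc", SVC(probability=True, random_state=42)),  # probability=True is required for soft voting
]

for mode in ("hard", "soft"):
    clf = VotingClassifier(estimators=base_estimators, voting=mode).fit(X_train, y_train)
    print(mode, "voting accuracy:", clf.score(X_test, y_test))
```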

- Q3 Follow the 7 steps to model building for your selected ticker


Step 1: Data Collection

I chose Coca-Cola (KO) and used 20+ years of daily data to conduct the study. I downloaded the daily open, high, low, close, and volume data and made sure the prices are all forward-adjusted so that they are fit for further analysis. The data was sourced from a Bloomberg Terminal and loaded into a pandas DataFrame. Initial examination confirmed the correct structure and types of the data columns.
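A sketch of the loading and initial inspection (the file name and column layout of the Bloomberg export are assumptions):

```python
import pandas as pd

# Assumed CSV export of the Bloomberg daily KO data
df = pd.read_csv("KO_daily.csv", parse_dates=["Date"], index_col="Date")

print(df.head())   # confirm column structure
print(df.dtypes)   # confirm column types
print(df.shape)    # roughly 20+ years of daily rows
```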

Step 2: Data Cleaning

1. Missing Values: The dataset was checked for missing values, and none were found.
2. Duplicates: No duplicate rows were present in the data.

3. Data Sorting: The data was verified to be sorted by date.

4. Data Types: Relevant columns (Open, Close, High, Low) were converted to float, and
Volume was converted to integer.
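A pandas sketch of these four checks (column names as in Step 1):

```python
# 1. Missing values: none expected
assert df.isna().sum().sum() == 0

# 2. Duplicates: no duplicate rows expected
assert not df.duplicated().any()

# 3. Sorting: verify the index is in date order
assert df.index.is_monotonic_increasing

# 4. Data types
df[["Open", "Close", "High", "Low"]] = df[["Open", "Close", "High", "Low"]].astype(float)
df["Volume"] = df["Volume"].astype(int)
```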
Step 3: Feature Engineering

New features were created to capture various aspects of the stock's behavior:

1. O-C: Difference between Open and Close prices.
2. H-L: Difference between High and Low prices.
3. Sign: Sign of the log return.
4. Return: Log return of the close prices.
5. Momentum: Difference between the current close price and the close price 5 days ago.
6. SMA60: 60-day Simple Moving Average.
7. SMA120: 120-day Simple Moving Average.
8. EMA60: 60-day Exponential Moving Average.
9. EMA120: 120-day Exponential Moving Average.
10. Volume Change: Change in volume from the previous day.
11. Volume Ratio: Ratio of volume to its 20-day moving average.
12. OBV: On-Balance Volume.
13. Volume Volatility: 20-day standard deviation of volume.
Threshold Determination: The 68th percentile of returns was chosen as the threshold for categorizing near-zero returns. This gives a statistical definition of "small" returns, which are treated as non-positive (negative) daily price movements.
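A pandas sketch of the feature construction (a representative subset; the label construction at the end is my assumption about how the threshold was applied):

```python
import numpy as np

df["O-C"] = df["Open"] - df["Close"]
df["H-L"] = df["High"] - df["Low"]
df["Return"] = np.log(df["Close"] / df["Close"].shift(1))
df["Sign"] = np.sign(df["Return"])
df["Momentum"] = df["Close"] - df["Close"].shift(5)
df["SMA60"] = df["Close"].rolling(60).mean()
df["SMA120"] = df["Close"].rolling(120).mean()
df["EMA60"] = df["Close"].ewm(span=60).mean()
df["EMA120"] = df["Close"].ewm(span=120).mean()
df["Volume Change"] = df["Volume"].diff()
df["Volume Ratio"] = df["Volume"] / df["Volume"].rolling(20).mean()
df["Volume Volatility"] = df["Volume"].rolling(20).std()

# Assumed labeling: returns above the 68th-percentile threshold count as
# positive moves; everything else is treated as non-positive
threshold = df["Return"].quantile(0.68)
df["Target"] = (df["Return"] > threshold).astype(int)
```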

Step 4: Feature EDA & Scaling

Feature EDA

A grid of histograms was created for each feature in the dataset to visualize its distribution. By doing so, one can gain insights into the characteristics of the data, such as the range of values, the central tendency, and the presence of any outliers or skewness. This step helps in understanding the data and can guide the subsequent feature engineering process.
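A one-line pandas sketch of the histogram grid:

```python
import matplotlib.pyplot as plt

df.hist(bins=50, figsize=(16, 12))  # one histogram per feature column
plt.tight_layout()
plt.show()
```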
Feature Scaling

Different scaling methods were compared:

Standard Scaler: Centers data with mean 0 and standard deviation 1.

MinMax Scaler: Scales data to the range [0, 1].

Robust Scaler: Uses median and interquartile range to scale data, less sensitive to outliers.
From the charts, one can see:

Standard Scaler:

- Works well for most features (like O-C, Momentum, return).

Robust Scaler:

- Works well for features with extreme values (like H-L, Volume Change).

Overall

Considering the SVM's sensitivity to feature scaling and the distribution of each feature in the charts, I chose the Standard Scaler. It works well for most features and is commonly used with SVM models.
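A scikit-learn sketch of the comparison and the final choice (the feature subset shown is illustrative):

```python
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

feature_cols = ["O-C", "H-L", "Return", "Momentum", "SMA60",
                "Volume Change", "Volume Ratio", "Volume Volatility"]
features = df[feature_cols].dropna()

for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler()):
    scaled = scaler.fit_transform(features)
    print(type(scaler).__name__, "-> mean:", scaled.mean().round(3))

# Standard Scaler chosen for the SVM pipeline
X_scaled = StandardScaler().fit_transform(features)
```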

Step 5: Feature Selection

Correlation Matrix: A correlation matrix was used to identify and visualize relationships between
features.
Feature Selection using High-Correlation Elimination: features with pairwise correlations above 0.95 were removed to reduce redundancy (a code sketch follows the explanation below).
Result: EMA60, EMA120, OBV, SMA120 were dropped.

Explanation:

- EMA60 and EMA120 are highly similar to SMA60 and SMA120. Keeping them doesn't
add new information and makes the model more complex.

- OBV is also highly similar to other features, so it is removed to simplify the model.
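A sketch of the elimination step (the upper-triangle idiom is a common pattern and not necessarily the original code):

```python
import numpy as np

corr = features.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))

# Drop any feature correlated above 0.95 with an earlier-listed feature
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
features = features.drop(columns=to_drop)
print("Dropped:", to_drop)  # EMA60, EMA120, OBV, SMA120 per the results above
```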

Step 6: Data Splitting

The dataset was split into training and testing sets (80:20 ratio). The training set was further split
into training and validation sets (80:20 ratio).
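A sketch of the splits, assuming they were made chronologically without shuffling (consistent with the TimeSeriesSplit used in Step 7):

```python
y = df.loc[features.index, "Target"].values  # labels from Step 3

n = len(X_scaled)
split = int(n * 0.8)
X_train_full, X_test = X_scaled[:split], X_scaled[split:]
y_train_full, y_test = y[:split], y[split:]

# Further split the training data into training and validation sets (80:20)
val_split = int(len(X_train_full) * 0.8)
X_train, X_val = X_train_full[:val_split], X_train_full[val_split:]
y_train, y_val = y_train_full[:val_split], y_train_full[val_split:]
```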
Step 7: Model Training and Tuning

Feature Importance Calculation: A Random Forest classifier was used to determine the
importance of each feature. This helped in selecting the most significant features for the SVM
model.

Hyperparameter Tuning:

GridSearchCV was used with a range of parameters for C and gamma.

TimeSeriesSplit was employed to maintain temporal integrity during cross-validation.

Best Features and Model

Best Number of Features: 7

Best Selected Features: ['H-L', 'Momentum', 'Volume Volatility', 'O-C', 'Volume Ratio', 'Volume
Change', 'SMA60']

Best Model Parameters: Identified as C=1, gamma=1 from GridSearchCV output.
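A sketch of the tuning setup (the grid values shown are assumptions; only the best parameters C=1, gamma=1 are reported in the text):

```python
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVC

param_grid = {"C": [0.01, 0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
tscv = TimeSeriesSplit(n_splits=5)  # cross-validation folds respect temporal order

grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=tscv, scoring="accuracy")
grid.fit(X_train, y_train)
print(grid.best_params_)  # reported best: {'C': 1, 'gamma': 1}
```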


Model Evaluation
Overall Performance Metrics

Accuracy: 0.6887

Precision: 0.6

Recall: 0.0135

F1 Score: 0.0264

AUC-ROC: 0.4693
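A sketch of how these metrics are computed with scikit-learn:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_pred = grid.predict(X_test)
y_score = grid.decision_function(X_test)  # continuous scores for AUC-ROC

print("Accuracy: ", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
print("F1 Score: ", f1_score(y_test, y_pred))
print("AUC-ROC:  ", roc_auc_score(y_test, y_score))
print(confusion_matrix(y_test, y_pred))  # [[TN, FP], [FN, TP]]
```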
Confusion Matrix

True Positives (TP): 3

True Negatives (TN): 486

False Positives (FP): 2

False Negatives (FN): 219

Interpretation:

The model correctly predicted 486 downtrend days (True Negatives).

It only correctly predicted 3 uptrend days (True Positives).

It incorrectly predicted 2 uptrend days as downtrend (False Positives).

It failed to predict 219 uptrend days, incorrectly labeling them as downtrend (False Negatives).

The model has a high number of true negatives but performs poorly in identifying true positives, indicating that its predictions are heavily biased against the positive (uptrend) class.
Classification Report

The model shows a high precision and recall for class 0 but fails to adequately identify instances
of class 1. This is evident from the very low recall and F1-score for class 1.

ROC Curve

The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) for different
threshold values.
Interpretation:

The ROC curve is close to the diagonal, indicating that the model performs no better than
random guessing.

The AUC value of 0.47 further confirms the poor performance in distinguishing between uptrend
and downtrend days.
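A sketch of the ROC plot (requires scikit-learn ≥ 1.0 for RocCurveDisplay):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import RocCurveDisplay

RocCurveDisplay.from_predictions(y_test, y_score)
plt.plot([0, 1], [0, 1], "k--", label="random guess")  # diagonal reference
plt.legend()
plt.show()
```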

Summary

The model's performance metrics, confusion matrix, and ROC curve were used to evaluate
prediction quality. The accuracy was relatively high, but the recall and F1 scores for predicting
uptrends were low, indicating the model struggles to predict positive moves accurately. The ROC
curve and AUC value suggest the model does not effectively differentiate between uptrends and
downtrends.
