CQF EXAM 3-Answer
Yifan Lu
Logistic regression is used to predict the probability of an event (a value between 0 and 1), and its cost function is the log-loss. The overall cost function for the training set is the average of the individual log-losses over all N training examples.
To minimize the cost function, one can use the gradient descent method: the parameters are repeatedly adjusted in the direction that reduces the cost the most, based on the slope (gradient) of the cost function, until the lowest point is found.
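As a minimal sketch of these two ideas (not part of the submitted code), the Python snippet below assumes a feature matrix X that already includes an intercept column and binary labels y, and minimizes the average log-loss by batch gradient descent:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def log_loss(y, p, eps=1e-12):
        # average log-loss over all N training examples
        p = np.clip(p, eps, 1 - eps)
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def fit_logistic(X, y, lr=0.1, n_iter=1000):
        # batch gradient descent: step against the gradient of the average log-loss
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            p = sigmoid(X @ w)
            grad = X.T @ (p - y) / len(y)
            w -= lr * grad
        return w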
1. Hard Voting
Each model (base estimator) makes a distinct class prediction, and the class that receives the most votes is chosen.
2. Soft Voting
Each base estimator provides a probability for each class; the probabilities are averaged, and the class with the highest average probability is chosen.
Example: if four classifiers output class probabilities [0.3, 0.7], [0.2, 0.8], [0.1, 0.9], and [0.5, 0.5], the average probability for class 1 is (0.7+0.8+0.9+0.5)/4 = 0.725, so the final prediction is class 1 (both schemes are sketched in code below).
- Easy to implement and understand compared to other ensemble methods such as stacking
- Combining multiple models often results in higher accuracy
- If the base estimators are highly correlated and make similar errors, those errors are reinforced rather than averaged out, which hurts the entire combined model
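To illustrate the two voting schemes, here is a minimal sketch using scikit-learn's VotingClassifier; the base estimators and the synthetic dataset are illustrative assumptions, not the models used in the rest of this report:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    estimators = [
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),  # probability=True is required for soft voting
    ]

    hard = VotingClassifier(estimators, voting="hard").fit(X_train, y_train)  # majority vote
    soft = VotingClassifier(estimators, voting="soft").fit(X_train, y_train)  # averaged probabilities
    print(hard.score(X_test, y_test), soft.score(X_test, y_test))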
I chose Coca-Cola (KO) as the stock to investigate and used more than 20 years of daily data to conduct the study. I downloaded the daily open, high, low, close, and volume data and made sure the prices are all forward-adjusted so that they are suitable for further analysis. The data was sourced from the Bloomberg Terminal and loaded into a Pandas DataFrame. Initial examination confirmed the correct structure and types of the data columns.
1. Missing Values: The dataset was checked for missing values, and none were found.
2. Duplicates: No duplicate rows were present in the data.
3. Data Types: Relevant columns (Open, Close, High, Low) were converted to float, and Volume was converted to integer.
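A sketch of these checks in Pandas; the file name and DataFrame variable are assumptions based on the description above:

    import pandas as pd

    df = pd.read_csv("ko_daily.csv", parse_dates=["Date"], index_col="Date")  # assumed file/column names

    print(df.isna().sum())        # 1. missing values per column
    print(df.duplicated().sum())  # 2. duplicate rows

    # 3. data types: prices as float, volume as integer
    df[["Open", "High", "Low", "Close"]] = df[["Open", "High", "Low", "Close"]].astype(float)
    df["Volume"] = df["Volume"].astype(int)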
Step 3: Feature Engineering
New features were created to capture various aspects of the stock's behavior:
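As an illustration, features with the names used later in this report could be constructed as in the sketch below; the exact windows and formulas are assumptions, and df is the cleaned DataFrame from the previous step:

    import numpy as np

    df["H-L"] = df["High"] - df["Low"]                     # intraday range
    df["O-C"] = df["Close"] - df["Open"]                   # open-to-close move
    df["Momentum"] = df["Close"].diff(10)                  # 10-day price change (assumed window)
    df["SMA60"] = df["Close"].rolling(60).mean()
    df["SMA120"] = df["Close"].rolling(120).mean()
    df["EMA60"] = df["Close"].ewm(span=60).mean()
    df["EMA120"] = df["Close"].ewm(span=120).mean()
    df["Volume Change"] = df["Volume"].pct_change()
    df["Volume Ratio"] = df["Volume"] / df["Volume"].rolling(20).mean()
    df["Volume Volatility"] = df["Volume Change"].rolling(20).std()
    df["OBV"] = (np.sign(df["Close"].diff()) * df["Volume"]).fillna(0).cumsum()  # on-balance volume
    df = df.dropna()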
Feature EDA
A grid of histograms was created for each feature in the dataset to visualize its distribution. By doing so, one can gain insight into the characteristics of the data, such as the range of values, the central tendency, and the presence of outliers or skewness. This step helps in understanding the data and can guide the subsequent feature engineering process.
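A short Pandas sketch of this grid (figure size and bin count are arbitrary choices):

    import matplotlib.pyplot as plt

    df.hist(bins=50, figsize=(14, 10))  # one histogram panel per feature column
    plt.tight_layout()
    plt.show()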
Feature Scaling
Robust Scaler: Uses median and interquartile range to scale data, less sensitive to outliers.
From the charts, one can see:
Standard Scaler:
Robust Scaler:
- Works well for features with extreme values (like H-L, Volume Change).
Overall
Considering SVM's sensitivity to feature scaling and the distribution of each feature in the charts, I chose the Standard Scaler. It works well for most features and is commonly used with SVM models.
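A sketch of applying the chosen scaler with scikit-learn; the feature list matches the engineered columns named in this report:

    from sklearn.preprocessing import RobustScaler, StandardScaler

    features = ["H-L", "O-C", "Momentum", "SMA60", "SMA120", "EMA60", "EMA120",
                "Volume Change", "Volume Ratio", "Volume Volatility", "OBV"]

    scaler = StandardScaler()                      # chosen: centre on the mean, scale by the standard deviation
    # scaler = RobustScaler()                      # alternative: median / IQR, robust to outliers
    X_scaled = scaler.fit_transform(df[features])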
Correlation Matrix: A correlation matrix was used to identify and visualize relationships between
features.
Feature selection using High Correlation Elimination: Features with high correlations (above
0.95) were removed to reduce redundancy:
Result: EMA60, EMA120, OBV, SMA120 were dropped.
Explanation:
- EMA60 and EMA120 are highly correlated with SMA60 and SMA120. Keeping them adds no new information and only makes the model more complex.
- OBV is also highly correlated with other features, so it was removed to simplify the model.
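A sketch of this elimination step; the 0.95 threshold is taken from above, while the use of the upper triangle of the correlation matrix is an implementation assumption:

    import numpy as np

    corr = df[features].corr().abs()  # absolute pairwise correlations between features
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
    df = df.drop(columns=to_drop)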
The dataset was split into training and testing sets (80:20 ratio). The training set was further split
into training and validation sets (80:20 ratio).
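A sketch of this two-stage split; the target definition (next-day up vs. down) and shuffle=False (to preserve time order) are assumptions consistent with the uptrend/downtrend labels used later:

    from sklearn.model_selection import train_test_split

    X = df.drop(columns=["Open", "High", "Low", "Close", "Volume"])        # keep engineered features only
    y = (df["Close"].shift(-1) > df["Close"]).astype(int)                  # assumed label: 1 = next-day uptrend

    # 80:20 train/test, then 80:20 train/validation
    X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
    X_train, X_val, y_train, y_val = train_test_split(X_train_full, y_train_full, test_size=0.2, shuffle=False)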
Step 7: Model Training and Tuning
Feature Importance Calculation: A Random Forest classifier was used to determine the
importance of each feature. This helped in selecting the most significant features for the SVM
model.
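A sketch of the feature-importance step; the Random Forest settings are illustrative:

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(X_train, y_train)
    importances = pd.Series(rf.feature_importances_, index=X_train.columns).sort_values(ascending=False)
    print(importances)  # rank features before passing the top ones to the SVM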
Hyperparameter Tuning:
Best Selected Features: ['H-L', 'Momentum', 'Volume Volatility', 'O-C', 'Volume Ratio', 'Volume
Change', 'SMA60']
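A sketch of tuning the SVM on the selected features with a grid search; the parameter grid, the 5-fold cross-validation, and the scaler-plus-SVC pipeline are assumptions, while the feature list is the one reported above:

    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    best_features = ["H-L", "Momentum", "Volume Volatility", "O-C", "Volume Ratio", "Volume Change", "SMA60"]

    pipe = make_pipeline(StandardScaler(), SVC())
    param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.1]}  # assumed grid
    grid = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
    grid.fit(X_train[best_features], y_train)
    print(grid.best_params_, grid.best_score_)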
Accuracy: 0.6887
Precision: 0.6
Recall: 0.0135
F1 Score: 0.0264
AUC-ROC: 0.4693
Confusion Matrix
Interpretation:
The model failed to predict 219 uptrend days, incorrectly labeling them as downtrend days (false negatives).
The model has a high number of true negatives but performs poorly in identifying true positives,
which indicates an imbalance in the prediction for the positive class.
Classification Report
The model shows high precision and recall for class 0 but fails to adequately identify instances of class 1. This is evident from the very low recall and F1-score for class 1.
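A sketch of how the confusion matrix and classification report are produced with scikit-learn; grid and best_features continue from the tuning sketch above:

    from sklearn.metrics import classification_report, confusion_matrix

    y_pred = grid.best_estimator_.predict(X_test[best_features])
    print(confusion_matrix(y_test, y_pred))       # rows: actual class, columns: predicted class
    print(classification_report(y_test, y_pred))  # per-class precision, recall, F1-score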
ROC Curve
The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) for different
threshold values.
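A sketch of computing and plotting this curve from the fitted SVM's decision scores; grid and best_features continue from the tuning sketch above:

    import matplotlib.pyplot as plt
    from sklearn.metrics import roc_auc_score, roc_curve

    scores = grid.best_estimator_.decision_function(X_test[best_features])  # SVM decision scores
    fpr, tpr, _ = roc_curve(y_test, scores)
    plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_test, scores):.2f}")
    plt.plot([0, 1], [0, 1], linestyle="--", color="grey")                  # random-guess diagonal
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.legend()
    plt.show()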
Interpretation:
The ROC curve is close to the diagonal, indicating that the model performs no better than
random guessing.
The AUC value of 0.47 further confirms the poor performance in distinguishing between uptrend
and downtrend days.
Summary
The model's performance metrics, confusion matrix, and ROC curve were used to evaluate
prediction quality. The accuracy was relatively high, but the recall and F1 scores for predicting
uptrends were low, indicating the model struggles to predict positive moves accurately. The ROC
curve and AUC value suggest the model does not effectively differentiate between uptrends and
downtrends.