Assign 3
Code:
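from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Sample data
X = [[1], [2], [3], [4]]
y = [0, 1, 1, 0]
# Leave-One-Out Cross-Validation: each sample is the test set exactly once
loo = LeaveOneOut()
scores = []
for train_idx, test_idx in loo.split(X):
    model = LogisticRegression()
    model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
    predictions = model.predict([X[i] for i in test_idx])
    scores.append(accuracy_score([y[i] for i in test_idx], predictions))
print("LOOCV mean accuracy:", sum(scores) / len(scores))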
K-Fold Cross-Validation
Advantages | Disadvantages
More computationally efficient than LOOCV, as it involves fewer model training iterations. | Still computationally demanding for large datasets and complex models.
Typically provides a lower-variance estimate of model performance than LOOCV, at the cost of slightly higher bias. | Can be sensitive to the choice of K.
Code:
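from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Sample data
X = [[1], [2], [3], [4]]
y = [0, 1, 1, 0]
# K-Fold Cross-Validation: each fold serves as the test set exactly once
kf = KFold(n_splits=2)
scores = []
for train_idx, test_idx in kf.split(X):
    model = LogisticRegression()
    model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
    predictions = model.predict([X[i] for i in test_idx])
    scores.append(accuracy_score([y[i] for i in test_idx], predictions))
print("K-Fold mean accuracy:", sum(scores) / len(scores))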
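To make the efficiency comparison with LOOCV concrete, the following is a minimal pure-Python sketch of how K-Fold partitions the sample indices, assuming contiguous, unshuffled folds. The helper name k_fold_indices is hypothetical and introduced only for illustration; it is not part of scikit-learn. Setting k equal to the number of samples gives LOOCV, so the number of model fits grows from k to n.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

n = 4  # same number of samples as the toy data in this section
two_fold = list(k_fold_indices(n, k=2))
loocv = list(k_fold_indices(n, k=n))  # k == n is Leave-One-Out

print("2-fold model fits:", len(two_fold))
print("LOOCV model fits:", len(loocv))
for train, test in two_fold:
    print("train:", train, "test:", test)
```

With four samples the difference is only 2 fits versus 4, but the gap widens linearly with the size of the dataset.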
Holdout Method
Advantages | Disadvantages
The implementation of this method is simple and it is computationally efficient. | Can have high variance in performance estimates, as it depends on the specific data split.
Useful for large datasets where computationally expensive methods are impractical. | Might not fully utilize all available data for training.
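Code:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Sample data
X = [[1], [2], [3], [4]]
y = [0, 1, 1, 0]
# Holdout Method: a single 50/50 split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42
)
model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print("Holdout accuracy:", accuracy_score(y_test, predictions))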
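The dependence of the holdout estimate on the particular split can be seen by enumerating every possible 50/50 split of the same toy data. This is only a sketch under stated assumptions: it uses a hand-written 1-nearest-neighbour rule (the hypothetical helper predict_1nn) instead of LogisticRegression so that the example has no dependencies, and ties are broken in favour of the first training sample.

```python
from itertools import combinations

X = [1, 2, 3, 4]   # the single feature of each sample, flattened
y = [0, 1, 1, 0]

def predict_1nn(train_idx, x):
    """Return the label of the training sample closest to x (first one wins ties)."""
    nearest = min(train_idx, key=lambda i: abs(X[i] - x))
    return y[nearest]

accuracies = []
for test_idx in combinations(range(len(X)), 2):   # every choice of 2 test samples
    train_idx = [i for i in range(len(X)) if i not in test_idx]
    correct = sum(predict_1nn(train_idx, X[i]) == y[i] for i in test_idx)
    accuracies.append(correct / len(test_idx))

print("accuracy per split:", accuracies)
print("lowest:", min(accuracies), "highest:", max(accuracies))
```

On this data the estimate is either 0.0 or 0.5, depending only on which samples land in the test set, which is the high variance listed as a disadvantage above.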
Time Series Split
Advantages | Disadvantages
Avoids "data leakage" from future time points into model training. | Might not capture long-term patterns or trends if the validation set is too short.
Note: the time series split method cannot be applied to the given dataset.
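Although it is not applied to the dataset in this assignment, the leakage-avoidance property can be illustrated with a small forward-chaining sketch. It assumes six hypothetical observations that are already in time order, and the helper name expanding_window_splits is invented for this example. scikit-learn offers a TimeSeriesSplit class for the same purpose; the code below only shows the idea and is not that class's implementation.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train, test) index lists where every training index precedes the test block."""
    test_size = n_samples // (n_splits + 1)
    for split in range(1, n_splits + 1):
        # The training window grows with each split; the test block follows it directly.
        train_end = n_samples - (n_splits - split + 1) * test_size
        train = list(range(0, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test

splits = list(expanding_window_splits(n_samples=6, n_splits=2))
for train, test in splits:
    # No leakage: the latest training observation comes before the earliest test one.
    assert max(train) < min(test)
    print("train:", train, "test:", test)
```

Because the model is never trained on observations that come after the test block, information from the future cannot leak into training.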
Shuffle Split
Advantages | Disadvantages
Introduces randomness, ensuring diverse training and testing sets. | Can disrupt patterns or relationships in data if shuffling is not appropriate.
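Code:
from sklearn.model_selection import ShuffleSplit
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Sample data
X = [[1], [2], [3], [4]]
y = [0, 1, 1, 0]
# Shuffle Split: two independent random 50/50 splits
shuffle_split = ShuffleSplit(n_splits=2, test_size=0.5, random_state=42)
for train_idx, test_idx in shuffle_split.split(X):
    y_train = [y[i] for i in train_idx]
    if len(set(y_train)) < 2:
        continue  # LogisticRegression needs both classes in the training data
    model = LogisticRegression().fit([X[i] for i in train_idx], y_train)
    predictions = model.predict([X[i] for i in test_idx])
    print("Split accuracy:", accuracy_score([y[i] for i in test_idx], predictions))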
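The randomness that Shuffle Split introduces can be sketched without scikit-learn. The example below assumes each split is drawn from a fresh, independent permutation of the indices. The helper shuffle_split_indices is a hypothetical name, and because it uses Python's random module it will not reproduce the exact splits that ShuffleSplit generates for the same seed.

```python
import random

def shuffle_split_indices(n_samples, n_splits, test_size, seed):
    """Yield (train, test) index lists from independent random permutations."""
    rng = random.Random(seed)
    n_test = int(round(n_samples * test_size))  # simplified rounding, an assumption
    for _ in range(n_splits):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh permutation for every split
        yield sorted(order[n_test:]), sorted(order[:n_test])

splits = list(shuffle_split_indices(n_samples=4, n_splits=3, test_size=0.5, seed=42))
for train, test in splits:
    print("train:", train, "test:", test)

# Count how often each sample was used for testing; unlike K-Fold, the counts
# are not guaranteed to be equal.
test_counts = [sum(i in test for _, test in splits) for i in range(4)]
print("times each sample was tested:", test_counts)
```

Since the splits are independent, a sample may appear in several test sets or in none, which is the main practical difference from K-Fold.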