Class Notes: The Basics of Machine Learning
Supervised Learning:
o Involves labeled data, where the model learns from input-output pairs.
o Common tasks include classification (e.g., spam detection) and regression
(e.g., predicting house prices).
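The regression task above can be sketched as a one-feature least-squares fit. This is a minimal illustration, not a production workflow; the square-footage and price values are invented:

```python
# Supervised learning sketch: fit a one-feature linear regression
# (square footage -> price) with the closed-form least-squares solution.

def fit_line(xs, ys):
    """Return slope and intercept minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Labeled input-output pairs: square footage -> price (in $1000s).
sizes = [1000, 1500, 2000, 2500]
prices = [200, 300, 400, 500]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 1800 + intercept  # predict the price of an unseen house
```

The model never sees the 1800-sqft house during fitting; it predicts from the input-output relationship it learned.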
Unsupervised Learning:
o Uses unlabeled data, allowing the model to find hidden patterns or groupings
without specific output labels.
o Common tasks include clustering (e.g., customer segmentation) and
association (e.g., market basket analysis).
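Clustering can be sketched with a tiny 1-D k-means: the algorithm receives only unlabeled spend amounts and discovers the two customer groups on its own. Values are invented for illustration:

```python
# Unsupervised learning sketch: 1-D k-means with k=2.

def kmeans_1d(points, iters=10):
    """Cluster 1-D points into two groups by alternating between
    assigning points to the nearest centroid and recomputing centroids."""
    c1, c2 = min(points), max(points)  # initialize at the extremes
    for _ in range(iters):
        g1 = [p for p in points if abs(p - c1) <= abs(p - c2)]
        g2 = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return sorted([c1, c2])

# Customer spend in dollars: two obvious groups, but no labels given.
spend = [10, 12, 11, 95, 100, 98]
centroids = kmeans_1d(spend)
```

Initializing the two centroids at the minimum and maximum guarantees neither cluster starts empty; real implementations use more careful initialization (e.g., k-means++).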
Reinforcement Learning:
o Involves an agent learning to make a sequence of decisions to maximize a
reward.
o Common in robotics, game playing, and control problems such as
autonomous driving.
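The reward-maximizing loop can be sketched with tabular Q-learning on a toy "corridor" environment. The environment, reward of 1 at the goal, and hyperparameters are all invented for illustration:

```python
# Reinforcement learning sketch: tabular Q-learning on a 5-state corridor.
# The agent starts at state 0 and earns a reward of 1 for reaching state 4.
import random

N_STATES = 5          # states 0..4; state 4 is the goal
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA = 0.5, 0.9

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
random.seed(0)

for _ in range(200):                      # episodes
    s = 0
    while s != N_STATES - 1:
        a = random.choice(ACTIONS)        # explore randomly
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        # Q-learning update: move Q toward reward + discounted future value.
        Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
        s = s_next
```

After training, the learned Q-values rank "move right" above "move left" in every state, i.e., the agent has learned the sequence of decisions that maximizes reward.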
Training Data:
o Data used to teach the model. It contains examples the model will learn from.
Testing Data:
o Data used to evaluate the model's performance after training.
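The split between the two datasets can be sketched in a few lines; in practice, library helpers (e.g., scikit-learn's `train_test_split`) do the same thing:

```python
# Train/test split sketch: hold out a fraction of the examples so the
# model is evaluated on data it never saw during training.
import random

def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle the examples and split them into train and test lists."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

examples = list(range(20))        # stand-in for (input, label) pairs
train, test = train_test_split(examples)
```

Shuffling before splitting matters: if the data is ordered (e.g., by date or by class), an unshuffled split produces unrepresentative train and test sets.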
Features:
o The input variables used to make predictions. For instance, in a housing price
model, features might include square footage, location, and number of
bedrooms.
Labels:
o The outputs associated with the training data, only present in supervised
learning.
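Features and labels from the housing example map onto the usual feature matrix X and label vector y; the column names and values here are invented:

```python
# Features vs. labels sketch, using the housing example from the notes.
houses = [
    {"sqft": 1500, "bedrooms": 3, "location_score": 7},
    {"sqft": 2200, "bedrooms": 4, "location_score": 9},
]
prices = [300_000, 520_000]       # labels: one output per training example

# Models usually expect a numeric feature matrix X and label vector y.
feature_names = ["sqft", "bedrooms", "location_score"]
X = [[h[name] for name in feature_names] for h in houses]
y = prices
```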
Data Collection:
o Gather relevant data to solve the problem.
Data Preprocessing:
o Clean the data, handle missing values, and scale features for better
performance.
Feature Engineering:
o Selecting or creating features that improve model accuracy.
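Creating a feature can be as simple as combining two raw columns into a more informative ratio; the derived column name and values here are invented:

```python
# Feature engineering sketch: derive bedrooms per 1000 sqft, a ratio
# that may separate examples better than either raw input alone.
houses = [{"sqft": 2000, "bedrooms": 4}, {"sqft": 1000, "bedrooms": 1}]
for h in houses:
    h["bedrooms_per_ksqft"] = h["bedrooms"] / (h["sqft"] / 1000)
```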
Model Selection:
o Choosing the right algorithm for the task (e.g., linear regression, neural
networks).
Training:
o The process of feeding data to the model and adjusting its parameters.
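"Adjusting its parameters" usually means gradient descent: repeatedly nudge each weight in the direction that reduces the error. A minimal sketch with a single weight and invented data:

```python
# Training sketch: fit the one-parameter model y = w * x by gradient
# descent on mean squared error.

def train(pairs, lr=0.01, epochs=100):
    """Start from w = 0 and repeatedly step against the error gradient."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of MSE: mean of 2 * x * (w*x - y) over the data.
        grad = sum(2 * x * (w * x - y) for x, y in pairs) / len(pairs)
        w -= lr * grad
    return w

pairs = [(1, 3), (2, 6), (3, 9)]   # true relationship: y = 3x
w = train(pairs)
```

Each epoch feeds the whole dataset to the model and moves `w` a small step toward the value that minimizes the error; after 100 epochs it is very close to the true slope of 3.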
Evaluation:
o Testing the model’s performance using metrics like accuracy, precision, recall,
and F1 score.
Optimization:
o Fine-tuning model parameters to improve performance.
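One common form of fine-tuning is a hyperparameter search: try several settings and keep the one with the lowest validation error. The scoring function below is a hypothetical stand-in; in practice you would retrain and evaluate a real model for each candidate:

```python
# Hyperparameter tuning sketch: pick the learning rate with the lowest
# validation error from a small grid of candidates.

def validation_error(lr):
    """Hypothetical stand-in: pretend error is minimized near lr = 0.1."""
    return (lr - 0.1) ** 2

candidates = [0.001, 0.01, 0.1, 1.0]
best_lr = min(candidates, key=validation_error)
```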
Evaluation Metrics
Accuracy:
o The proportion of correct predictions out of all predictions made.
Precision:
o The ratio of true positives to all predicted positives (true positives
plus false positives).
Recall:
o The ratio of true positives to all actual positives (true positives plus
false negatives).
F1 Score:
o The harmonic mean of precision and recall, especially useful for
imbalanced datasets.
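All four metrics follow directly from counting true positives, false positives, and false negatives; the label sequences below are invented (1 = positive class):

```python
# Evaluation metrics sketch: accuracy, precision, recall, and F1
# computed from predicted vs. true binary labels.

def metrics(y_true, y_pred):
    """Assumes at least one predicted positive and one actual positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
acc, prec, rec, f1 = metrics(y_true, y_pred)
```

Note how accuracy (0.75) looks better than precision and recall (both 2/3) here: with mostly negative examples, getting the negatives right inflates accuracy, which is why F1 is preferred on imbalanced data.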
Overfitting:
o Occurs when the model learns the noise in the training data, performing well
on training data but poorly on new data.
o Solutions include using regularization, simplifying the model, or collecting
more data.
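An exaggerated but concrete sketch of overfitting: a "model" that simply memorizes its training set scores perfectly on training data and poorly on unseen data. All example pairs are invented:

```python
# Overfitting sketch: a lookup-table "model" that memorizes every
# training example verbatim and guesses class 0 for anything unseen.

def memorizer(train_pairs):
    table = dict(train_pairs)
    return lambda x: table.get(x, 0)

train_pairs = [(1, 1), (2, 0), (3, 1), (4, 0)]
model = memorizer(train_pairs)

train_acc = sum(model(x) == y for x, y in train_pairs) / len(train_pairs)
test_pairs = [(5, 1), (6, 1), (7, 0), (8, 1)]
test_acc = sum(model(x) == y for x, y in test_pairs) / len(test_pairs)
```

The gap between perfect training accuracy and poor test accuracy is the telltale sign of overfitting; regularization and simpler models shrink that gap.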
Underfitting:
o Occurs when the model is too simple to capture patterns in the data.
o Can be improved by using a more complex model or including additional
features.
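A sketch of underfitting and the "additional features" fix: predicting y = x² with only a constant (the mean) leaves large errors, while adding the squared feature lets a weight of 1.0 (chosen by inspection here, rather than learned) fit the data exactly. The data is invented:

```python
# Underfitting sketch: too-simple model (predict the mean) vs. a model
# that includes the engineered feature x**2.
xs = [1, 2, 3, 4]
ys = [1, 4, 9, 16]        # true relationship: y = x**2

mean_pred = sum(ys) / len(ys)                  # simplest possible model
mean_err = sum((y - mean_pred) ** 2 for y in ys) / len(ys)

# With the squared feature, a weight of 1.0 reproduces y exactly.
sq_err = sum((y - 1.0 * x ** 2) ** 2 for x, y in zip(xs, ys)) / len(ys)
```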
Data Quality:
o High-quality data is critical; missing values and noisy measurements
can significantly degrade performance.
Bias and Fairness:
o Models may inherit biases present in the training data, leading to unfair
predictions.
Computational Costs:
o Training large models, especially deep learning models, requires significant
resources.
Interpretability:
o Complex models (e.g., neural networks) are often hard to interpret, which is
crucial for certain applications like healthcare.
Summary: Machine learning is transforming multiple fields by enabling data-driven
decision-making and automation. Its success depends on high-quality data, careful selection
of algorithms, and a structured approach to model evaluation and improvement.