Data Science Notes C
Machine Learning (ML) is a branch of artificial intelligence (AI) focused on building systems
that can learn from and make decisions based on data. Unlike traditional programming, where
the rules are written out explicitly, an ML model improves its performance by identifying
patterns in data and adjusting its parameters to make better predictions or decisions.
1. Supervised Learning
o Definition: Models learn from labeled data (input-output pairs).
o Goal: Predict the output for new, unseen data.
o Algorithms:
Classification (e.g., spam detection, image classification):
Logistic Regression, Decision Trees, KNN, Naive Bayes, SVM,
Neural Networks.
Regression (e.g., predicting house prices):
Linear Regression, Ridge/Lasso Regression, Support Vector
Regression.
o Evaluation Metrics:
Classification: Accuracy, Precision, Recall, F1-Score, ROC-AUC.
Regression: Mean Absolute Error (MAE), Mean Squared Error (MSE),
R².
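The classification metrics listed above can be computed directly from the counts of true/false positives and negatives. The following is a minimal sketch for the binary case; the function name and the toy labels are illustrative assumptions, not part of the original notes.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (made up for illustration): 1 = spam, 0 = not spam.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

ROC-AUC is left out of this sketch because it needs predicted scores or probabilities rather than hard labels.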
2. Unsupervised Learning
o Definition: Models learn from data that has no labels. The goal is to identify
hidden patterns or structures.
o Algorithms:
Clustering (grouping similar items):
K-Means, DBSCAN, Hierarchical Clustering.
Dimensionality Reduction (reducing feature space):
Principal Component Analysis (PCA), t-SNE, Autoencoders.
o Evaluation Metrics:
Clustering: Silhouette Score, Davies-Bouldin Index, Adjusted Rand
Index.
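As an illustration of clustering, here is a minimal K-Means sketch in plain Python. It assumes 2-D points, squared Euclidean distance, and a naive initialization that uses the first k points as starting centroids; practical implementations usually use smarter initialization and several restarts.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm: alternate between assigning points and moving centroids."""
    centroids = [tuple(p) for p in points[:k]]  # naive init (assumption)
    labels = [None] * len(points)
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # no change means the algorithm has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Toy data (made up): two visibly separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Because no labels are used, the cluster numbers themselves are arbitrary; only the grouping is meaningful.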
3. Reinforcement Learning
o Definition: An agent learns by interacting with an environment and maximizing
cumulative rewards.
o Key Concepts:
Agent: Learner or decision maker.
Environment: The external system with which the agent interacts.
Reward: Feedback received after taking actions.
Policy: Strategy used by the agent to make decisions.
o Algorithms:
Q-Learning, Deep Q-Networks (DQN), Policy Gradient Methods.
o Applications: Game-playing AI, Robotics, Autonomous Vehicles.
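The agent/environment/reward/policy loop can be made concrete with a small tabular Q-Learning sketch. The environment here is a hypothetical five-cell corridor invented for illustration (the agent starts at cell 0 and receives a reward of 1 on reaching cell 4), and the learning rate, discount factor, and exploration rate are arbitrary choices rather than recommended values.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal (terminal state)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Environment: apply an action, return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q_row, epsilon, rng):
    """Policy: epsilon-greedy over one row of the Q-table, ties broken at random."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Agent: learn Q(state, action) from interaction with the environment."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-Learning target: reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal cells (1 means "move right").
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a])
                 for s in range(N_STATES - 1)]
print(greedy_policy)
```

Deep Q-Networks follow the same update idea but replace the table with a neural network that approximates the Q-values.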
Hyperparameter Tuning:
Hyperparameters are settings chosen before training rather than learned from the data
(e.g., learning rate, number of trees in a forest). Common methods:
o Grid Search: Exhaustively tries every combination from a predefined grid of values.
o Random Search: Samples random combinations of hyperparameters from given ranges.
o Bayesian Optimization: Builds a probabilistic model of the validation score and uses
it to choose promising hyperparameters to try next.
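The difference between grid search and random search can be sketched as follows. The `validation_score` function is a hypothetical stand-in for "train a model with these hyperparameters and score it on validation data"; its formula, the grid values, and the sampling ranges are all made up for illustration.

```python
import itertools
import random


def validation_score(learning_rate, n_trees):
    """Hypothetical objective (higher is better); peaks at 0.1 and 200."""
    return -100 * (learning_rate - 0.1) ** 2 - ((n_trees - 200) / 100) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    trials = []
    for values in itertools.product(*grid.values()):
        params = dict(zip(names, values))
        trials.append((validation_score(**params), params))
    return max(trials, key=lambda t: t[0])


def random_search(samplers, n_trials, seed=0):
    """Evaluate n_trials randomly sampled combinations."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        params = {name: draw(rng) for name, draw in samplers.items()}
        trials.append((validation_score(**params), params))
    return max(trials, key=lambda t: t[0])


grid = {"learning_rate": [0.01, 0.1, 0.5], "n_trees": [50, 100, 200]}
samplers = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform
    "n_trees": lambda rng: rng.randint(50, 500),
}

grid_best = grid_search(grid)                      # 3 x 3 = 9 evaluations
random_best = random_search(samplers, n_trials=20)
print(grid_best, random_best)
```

Grid search cost grows multiplicatively with each added hyperparameter, while random search keeps a fixed budget of trials. Bayesian optimization is not shown here since it usually relies on a dedicated library.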
Model Evaluation:
o Classification: Use metrics like accuracy, precision, recall, F1-score, and
ROC-AUC to evaluate model performance.
o Regression: Use MAE, MSE, and RMSE to measure prediction error, and R² to measure
the share of variance the model explains.
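The regression metrics above follow directly from the prediction errors. Below is a minimal sketch; the function name and the toy values are illustrative assumptions.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE, RMSE and R² for paired true/predicted values (illustrative helper)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)  # same units as the target, unlike MSE

    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1 - ss_res / ss_tot  # undefined if all true values are identical
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Toy values (made up for illustration).
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]
reg = regression_metrics(y_true, y_pred)
print(reg)
```

MAE weights all errors equally, whereas MSE and RMSE penalize large errors more heavily because the errors are squared.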
5. Ensemble Learning
Neural Networks:
o Loosely inspired by the human brain; built from layers of interconnected artificial neurons.
o Feedforward Neural Networks (FNN): Basic type where input flows through
hidden layers to an output.
o Convolutional Neural Networks (CNN): Used for image processing and
computer vision tasks.
o Recurrent Neural Networks (RNN): Used for sequential data (e.g., time series,
text).
o Deep Learning: Involves deep (multiple-layer) neural networks that can capture
highly complex patterns in data.
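To show how input flows through hidden layers to an output in a feedforward network, here is a minimal forward-pass sketch in plain Python. The layer sizes, weights, and activation choices are arbitrary values picked for illustration; no training is performed.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Pass the input through each layer in order."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output (sigmoid).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, -2.0]], [0.0], sigmoid),
]
output = forward([2.0, 1.0], layers)
print(output)
```

Frameworks such as TensorFlow or PyTorch perform the same computation with tensors and also handle the gradient calculations needed for training.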
Key Frameworks:
o TensorFlow, Keras, PyTorch, MXNet for building deep learning models.
Data Quality:
ML models require clean, relevant, and sufficient data. Data preprocessing, cleaning, and
feature engineering are essential tasks.
Interpretability:
Some models, especially deep neural networks, are often considered “black boxes,” meaning it
is difficult to explain how they arrive at a prediction. Methods like LIME and SHAP help
interpret these models.
Scalability:
Handling large-scale data requires efficient algorithms and infrastructure (e.g., distributed
computing frameworks like Apache Spark).
Bias and Fairness:
Models may inherit biases from training data, leading to unfair outcomes. Addressing
these biases is crucial to building ethical ML systems.
8. Advanced Topics and Trends
AutoML:
Automation of the machine learning pipeline, making it easier for non-experts to build
and deploy models.
Explainable AI (XAI):
Research into making ML models more interpretable and understandable, particularly for
high-stakes decisions like healthcare or criminal justice.
Federated Learning:
A decentralized approach where models are trained locally on devices (e.g., smartphones)
and only model updates, not the raw sensitive data, are sent to the server.
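One common way to combine on-device training is federated averaging (FedAvg). The notes do not name a specific algorithm, so the sketch below is an assumption chosen for illustration. Each hypothetical client fits a one-parameter model y ≈ w·x on its own data, and the server only ever sees the resulting weights, which it averages in proportion to each client's sample count.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Client side: a few gradient-descent steps on local data only."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w, clients):
    """Server side: collect client weights and average them by sample count."""
    updates = [local_update(global_w, data) for data in clients]
    total = sum(len(data) for data in clients)
    return sum(w * len(data) for w, data in zip(updates, clients)) / total


# Toy client datasets (made up); both follow y = 2x and never leave the "device".
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
]

w = 0.0
for _ in range(30):
    w = federated_round(w, clients)
print(w)
```

Real deployments add further pieces that this sketch omits, such as client sampling, secure aggregation, and handling of devices that drop out mid-round.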
Conclusion
Machine learning drives many modern technologies. Understanding the
key concepts, algorithms, and challenges is crucial to applying ML effectively. With continuous
advancements, ML is evolving toward more accessible, transparent, and robust models capable
of solving complex real-world problems.