1. Introduction and Basics of Machine Learning
Example: Teaching a spam filter to classify emails as spam or not based on previous emails.
Bias-Variance Tradeoff
● Bias: The error due to overly simplistic assumptions in the model (often leads to
underfitting).
● Variance: The error due to a model's sensitivity to small fluctuations in the training data
(often leads to overfitting).
● Goal: Find a balance between bias and variance to minimize the total error and achieve
good generalization (see the decomposition after this list).
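For squared-error loss, this balance is captured by the standard bias-variance decomposition:
Expected test error = Bias² + Variance + Irreducible noise
A more flexible model lowers bias but raises variance, and vice versa, which is why model complexity is tuned rather than maximized.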
2. Supervised Learning
What is Supervised Learning?
Supervised learning trains a model on labeled data, where each example has an input and a
known output (target). The goal is to learn a function that maps inputs to outputs.
Regression
● Linear Regression: Predicts a continuous target variable as a weighted sum of input
features.
○ Example: Predict house price based on size, number of bedrooms, etc.
○ Formula: y = w₀ + w₁x₁ + w₂x₂ + ⋯ + wₙxₙ
● Polynomial Regression: Extends linear regression by adding polynomial terms to
capture non-linear relationships.
● Logistic Regression: Used for binary classification (output is 0 or 1). Models the
probability that an input belongs to a class using a sigmoid function (see the sketch after this list).
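A minimal scikit-learn sketch of both models; the tiny arrays X, y_cont, and y_bin are made-up placeholder data, not from these notes:
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[50], [80], [120], [160]])   # e.g., house size in square meters
y_cont = np.array([150, 220, 310, 400])    # e.g., price in $1000s (continuous target)
y_bin = np.array([0, 0, 1, 1])             # e.g., "expensive" yes/no (binary target)

lin = LinearRegression().fit(X, y_cont)    # learns the weights w0..wn
print(lin.intercept_, lin.coef_)           # w0 and w1

log = LogisticRegression().fit(X, y_bin)   # sigmoid over a weighted sum of features
print(log.predict_proba([[100]]))          # class probabilities for a new input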
Classification
● k-Nearest Neighbors (k-NN): Classifies based on the majority class of the k closest
training examples in the feature space.
● Support Vector Machines (SVM): Finds the best boundary (hyperplane) that
separates classes with maximum margin.
● Decision Trees: Splits data based on feature thresholds to form a tree where leaves
represent class labels.
● Random Forests: An ensemble of decision trees trained on different subsets of data
and features, voting for the final class (a short training sketch follows this list).
● Gradient Boosting Machines: Builds models sequentially to correct errors of previous
models (e.g., XGBoost).
● Neural Networks: Layers of interconnected nodes that can model complex
relationships.
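A minimal sketch of training one of the classifiers above (a random forest) with scikit-learn; the Iris dataset stands in for real data, and any labeled table works the same way:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)             # each tree sees a bootstrap sample of the data
print(clf.score(X_test, y_test))      # mean accuracy on held-out data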
Performance Metrics
● Accuracy: Fraction of correct predictions.
● Precision: How many predicted positives are actually positive.
● Recall: How many actual positives are correctly predicted.
● F1-score: Harmonic mean of precision and recall.
● ROC Curve & AUC: Plots true positive rate vs false positive rate; AUC summarizes
performance.
● Confusion Matrix: Table showing true positives, false positives, true negatives, and false
negatives (a sketch computing these metrics follows this list).
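A minimal sketch computing these metrics with scikit-learn; y_true, y_pred, and y_score are toy values chosen for illustration:
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.3]   # predicted probabilities for class 1

print(accuracy_score(y_true, y_pred))       # fraction of correct predictions
print(precision_score(y_true, y_pred))      # TP / (TP + FP)
print(recall_score(y_true, y_pred))         # TP / (TP + FN)
print(f1_score(y_true, y_pred))             # harmonic mean of precision and recall
print(confusion_matrix(y_true, y_pred))     # [[TN, FP], [FN, TP]]
print(roc_auc_score(y_true, y_score))       # area under the ROC curve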
Hyperparameter Tuning
Hyperparameters (like number of trees, max depth, learning rate) are settings chosen before
training rather than learned from the data.
Techniques:
● Grid Search: Try all combinations exhaustively.
● Randomized Search: Randomly sample combinations to save time (see the sketch after this list).
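A minimal sketch of grid search with scikit-learn's GridSearchCV; the parameter values are illustrative, not recommendations:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5)       # tries all 6 combinations with 5-fold CV
search.fit(X, y)
print(search.best_params_, search.best_score_)
# RandomizedSearchCV has the same interface but samples n_iter combinations instead.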
3. Unsupervised Learning
What is Unsupervised Learning?
It finds patterns or structure in data without labeled outputs.
Clustering
● K-means Clustering: Assigns data points to k clusters by minimizing the squared distance
between points and their cluster centers (see the sketch after this list).
● Hierarchical Clustering: Builds a tree of clusters by either merging (agglomerative) or
splitting (divisive) clusters.
● DBSCAN: Density-based clustering that groups points that are closely packed and
marks points in low-density areas as noise.
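A minimal k-means sketch with scikit-learn; the four 2-D points and the choice k=2 are assumptions for illustration:
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.5, 2], [8, 8], [8, 9]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster assignment for each point
print(km.cluster_centers_)  # learned cluster centers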
Dimensionality Reduction
● Principal Component Analysis (PCA): Projects data into fewer dimensions while
preserving as much variance as possible (see the sketch after this list).
● t-SNE: Visualizes high-dimensional data by reducing dimensions while preserving local
structure.
● Autoencoders: Neural networks trained to compress and then reconstruct data,
learning efficient representations.
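A minimal PCA sketch with scikit-learn, projecting the 4-dimensional Iris features down to 2 components:
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)              # shape (150, 4) -> (150, 2)
print(pca.explained_variance_ratio_)     # fraction of variance kept per component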
Association Rule Learning
● Apriori Algorithm: Finds frequent itemsets in data to identify association rules (e.g.,
market basket analysis; see the sketch below).
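A minimal pure-Python sketch of the counting idea behind Apriori (not the full algorithm): count item-pair co-occurrences in toy transactions and keep pairs above a support threshold:
from itertools import combinations
from collections import Counter

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk", "butter"}]
min_support = 2   # keep pairs appearing in at least 2 transactions

pair_counts = Counter(pair for t in transactions
                      for pair in combinations(sorted(t), 2))
frequent = {p: c for p, c in pair_counts.items() if c >= min_support}
print(frequent)   # e.g., ('bread', 'milk') appears in 2 transactions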
4. Reinforcement Learning
Basics of RL
An agent interacts with an environment through actions, receives rewards, and learns a policy
to maximize cumulative rewards.
Markov Decision Processes (MDP)
Framework defining states, actions, transition probabilities, and rewards.
Q-Learning
A value-based method where the agent learns a function Q(s,a) that estimates the expected
return of taking action a in state s.
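A minimal sketch of the standard tabular Q-learning update; the state/action sizes and the sampled transition are placeholder values:
# Update rule: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99                  # learning rate and discount factor

s, a, r, s_next = 0, 1, 1.0, 2            # one observed transition (toy values)
td_target = r + gamma * Q[s_next].max()   # bootstrapped estimate of the return
Q[s, a] += alpha * (td_target - Q[s, a])  # move Q(s,a) toward the target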
Deep Reinforcement Learning
Combines deep neural networks with RL (e.g., Deep Q-Networks) to handle high-dimensional
inputs like images.
5. Deep Learning
Neural Networks Basics
Composed of layers of neurons (nodes). Each neuron receives inputs, multiplies them by
weights, adds a bias, applies an activation function, and passes the output to the next layer.
Activation Functions
● ReLU (Rectified Linear Unit): Outputs input if positive, else zero.
○ f(x)=max(0,x)
○ Popular for hidden layers.
● Sigmoid: Outputs between 0 and 1, useful for probabilities.
○ f(x) = 1 / (1 + e⁻ˣ)
● Tanh: Outputs between -1 and 1, zero-centered (all three are sketched below).
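A minimal NumPy sketch of all three activations:
import numpy as np

def relu(x):    return np.maximum(0, x)       # f(x) = max(0, x)
def sigmoid(x): return 1 / (1 + np.exp(-x))   # f(x) = 1 / (1 + e^-x)
def tanh(x):    return np.tanh(x)             # zero-centered, outputs in (-1, 1)

x = np.array([-2.0, 0.0, 2.0])
print(relu(x), sigmoid(x), tanh(x))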
Feedforward Neural Networks
Information flows from input to output layer through hidden layers. Used for
regression/classification on tabular data.
Backpropagation and Gradient Descent
● Backpropagation: Calculates gradients of loss with respect to weights.
● Gradient Descent: Updates weights to minimize loss (a one-variable sketch follows this list).
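A minimal one-variable sketch of gradient descent on the toy loss L(w) = (w - 3)², whose gradient 2(w - 3) stands in for what backpropagation would compute for every weight in a network:
w, lr = 0.0, 0.1
for step in range(50):
    grad = 2 * (w - 3)    # gradient of the loss at the current weight
    w -= lr * grad        # update rule: w <- w - learning_rate * gradient
print(w)                  # converges toward the minimizer w = 3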
Convolutional Neural Networks (CNNs)
Designed for images. Uses convolutional layers that apply filters to detect edges, shapes,
textures. Followed by pooling layers to reduce spatial size.
Recurrent Neural Networks (RNNs), LSTM, GRU
Designed for sequential data (time series, text). RNNs have loops to maintain a state. LSTM
and GRU are special units that handle long-term dependencies better by controlling
information flow.
Transfer Learning
Using a pretrained model on a new but related task. Saves training time and improves
performance.
Generative Adversarial Networks (GANs)
Two networks: Generator creates fake data, Discriminator tries to distinguish real vs fake.
Trains both networks simultaneously to improve data generation.
TensorFlow
● Type: Open-source deep learning framework developed by Google.
● Focus: Building and training large-scale deep learning models.
● Features:
○ Flexible computation graphs for complex model building.
○ Supports CPUs, GPUs, and TPUs.
○ High-level APIs (Keras) built on top for easier model design.
○ TensorBoard for visualization.
● Use Cases:
○ Deep learning projects like image recognition, NLP, reinforcement learning.
○ Production-level deployment with TensorFlow Serving and TensorFlow Lite for
mobile.
● Example:
import tensorflow as tf

# input_dim, X_train, and y_train are placeholders for your own data.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(input_dim,)),
    tf.keras.layers.Dense(10, activation='softmax')        # 10-class output
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',      # integer class labels
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)
PyTorch
● Type: Open-source deep learning framework developed by Facebook.
● Focus: Dynamic computation graphs, flexibility, and ease of use for research.
● Features:
○ Dynamic graph allows modification on-the-fly, great for debugging.
○ Strong Python integration, intuitive API.
○ TorchVision for computer vision tasks.
● Use Cases:
○ Research and experimentation in deep learning.
○ Rapid prototyping and complex model architectures.
○ Production-ready with TorchScript and deployment tools.
● Example:
import torch
import torch.nn as nn

# input_dim is a placeholder for the number of input features.
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_dim, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)   # raw logits: CrossEntropyLoss applies softmax internally

model = SimpleNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
# Training loop would follow here
Keras
● Type: High-level neural network API written in Python.
● Focus: User-friendly API to build and train deep learning models.
● Features:
○ Runs on top of TensorFlow (the standard backend today; the older Theano and CNTK backends are discontinued).
○ Simplifies model building with Sequential and Functional APIs.
○ Easy to use for beginners and prototyping.
● Use Cases:
○ Quick prototyping of neural networks.
○ Ideal for beginners in deep learning.
○ Production-ready since it integrates well with TensorFlow ecosystem.
● Example:
from tensorflow import keras

# input_dim, X_train, and y_train are placeholders for your own data.
model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(input_dim,)),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)
Summary Table
● Scikit-learn: Best for classical ML algorithms. Strengths: easy to use, extensive algorithms. Typical use cases: tabular data, prototyping.
● TensorFlow: Best for deep learning. Strengths: scalable, production-ready. Typical use cases: large-scale DL, production.
● PyTorch: Best for deep learning research. Strengths: flexible, dynamic graphs. Typical use cases: research, experimentation.
● Keras: Best for deep learning beginners/prototyping. Strengths: simple, high-level API. Typical use cases: quick prototyping, education.