Introduction to Machine Learning

This document provides an introduction to Machine Learning (ML), defining it as a subset of Artificial Intelligence that enables systems to learn from data. It covers various types of ML, data types, data quality issues, preprocessing steps, model selection, and the training/testing process. Key concepts such as model parameters, hypotheses, and loss functions are also discussed.

Uploaded by

Hari
Copyright © All Rights Reserved

UNIT I: INTRODUCTION TO MACHINE LEARNING

1. Introduction to Machine Learning

• Definition:
  Machine Learning (ML) is a subset of Artificial Intelligence (AI) that gives systems
  the ability to learn and improve automatically from experience (data) without being
  explicitly programmed.
• Objective:
  To develop algorithms that generalize from training data to make predictions or
  decisions on new, unseen data.
• Applications:
  Email (spam) filtering, speech recognition, recommendation systems, medical diagnosis,
  stock market prediction, etc.

2. Machine Learning Types

Type                     | Description                                            | Example
------------------------ | ------------------------------------------------------ | --------------------------
Supervised Learning      | Learn from labeled data; predict output for new input. | Regression, Classification
Unsupervised Learning    | Discover hidden patterns in unlabeled data.            | Clustering, Association
Semi-supervised Learning | Mix of labeled and unlabeled data for training.        | Text classification
Reinforcement Learning   | Learn through rewards and punishments.                 | Game AI, Robotics
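The first two rows of the table differ mainly in whether labels are available. As a minimal pure-Python sketch on made-up 1-D data, the example below contrasts a 1-nearest-neighbor classifier (supervised: it needs labeled pairs) with a two-cluster k-means loop (unsupervised: it only sees the inputs). The helper names `nearest_neighbor_predict` and `two_means` are hypothetical, chosen for illustration, and are not a library API.

```python
# Supervised: labeled examples (input, label) are available.
def nearest_neighbor_predict(train, x):
    """Predict the label of x as the label of the closest training input (1-NN)."""
    _, closest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return closest_label


# Unsupervised: only inputs are available; we look for structure (clusters).
def two_means(points, iterations=10):
    """Split 1-D points into two clusters with a basic k-means loop (k = 2)."""
    centers = [min(points), max(points)]  # simple deterministic initialization
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            # Assign each point to its nearest center.
            idx = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            clusters[idx].append(p)
        # Move each center to the mean of its points (keep the old center if empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


labeled = [(1.0, "small"), (1.5, "small"), (8.0, "large"), (9.0, "large")]
print(nearest_neighbor_predict(labeled, 2.0))  # small

unlabeled = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
centers, clusters = two_means(unlabeled)
print(centers)   # [1.5, 9.0]
print(clusters)  # [[1.0, 1.5, 2.0], [8.0, 9.0, 10.0]]
```

The classifier can only answer with labels it was shown, while k-means discovers the two groups without ever being told what they mean.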

3. Types of Data

• Structured Data: Tabular format with fixed columns, e.g., CSV files, SQL tables.
• Unstructured Data: No predefined schema, e.g., text, images, audio, videos.
• Semi-structured Data: Not strictly tabular but organized with tags or keys, e.g., JSON, XML.
• Categorical Data: Values that represent categories (e.g., gender: male/female).
• Numerical Data: Integer or floating-point values (e.g., age, price).
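A small sketch, assuming made-up CSV and JSON snippets, of how structured and semi-structured data look once loaded with Python's standard `csv` and `json` modules, plus a hypothetical `column_kind` helper that labels a column numerical if every value parses as a number and categorical otherwise.

```python
import csv
import io
import json

# Structured data: fixed columns, one record per row (CSV text inlined for the example).
csv_text = "age,gender,income\n25,female,42000\n31,male,51000\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured data: keys organize the record, but fields can nest or vary.
json_text = '{"id": 7, "skills": ["python", "sql"], "address": {"city": "Pune"}}'
record = json.loads(json_text)


def column_kind(values):
    """Label a column 'numerical' if every value parses as a number, else 'categorical'."""
    try:
        for v in values:
            float(v)
        return "numerical"
    except ValueError:
        return "categorical"


kinds = {col: column_kind([r[col] for r in rows]) for col in rows[0]}
print(kinds)                      # {'age': 'numerical', 'gender': 'categorical', 'income': 'numerical'}
print(record["address"]["city"])  # Pune
```

The parse-as-number rule is only a heuristic: numeric codes such as PIN codes or category IDs would be labeled numerical even though they are categorical, so column types are usually confirmed by inspection.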

4. Exploring Structure of Data

• Steps:
  o Understand the dataset’s shape and size (rows, columns).
  o Check for missing values.
  o Compute summary statistics (mean, median, standard deviation).
  o Visualize distributions (histograms, box plots).
  o Analyze correlations between features.
• Tools: Pandas, NumPy, Matplotlib, Seaborn
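In Pandas these steps correspond to `df.shape`, `df.isna().sum()`, `df.describe()` and `df.corr()`. To keep the sketch runnable without third-party packages, the version below performs the non-plotting steps on a made-up two-column dataset using only Python's `statistics` module; `pearson` is a hypothetical helper, and the visualization step is left out.

```python
import statistics

# Made-up dataset stored column-wise; None marks a missing value.
data = {
    "hours_studied": [2.0, 4.0, 6.0, 8.0, None],
    "exam_score": [50.0, 60.0, 70.0, 80.0, 90.0],
}

# Shape and size as (rows, columns).
n_rows = len(next(iter(data.values())))
shape = (n_rows, len(data))
print("shape:", shape)  # shape: (5, 2)

# Missing values per column.
missing = {col: sum(v is None for v in vals) for col, vals in data.items()}
print("missing:", missing)  # missing: {'hours_studied': 1, 'exam_score': 0}

# Summary statistics over the values that are present.
summary = {}
for col, vals in data.items():
    present = [v for v in vals if v is not None]
    summary[col] = {
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "std": statistics.stdev(present),  # sample standard deviation
    }
print(summary["exam_score"]["mean"])  # 70.0

# Histograms and box plots need a plotting library such as Matplotlib; omitted here.


def pearson(xs, ys):
    """Pearson correlation, using only rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mx = statistics.mean(x for x, _ in pairs)
    my = statistics.mean(y for _, y in pairs)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = sum((x - mx) ** 2 for x, _ in pairs) ** 0.5
    sy = sum((y - my) ** 2 for _, y in pairs) ** 0.5
    return cov / (sx * sy)


r = pearson(data["hours_studied"], data["exam_score"])
print("correlation:", round(r, 3))  # correlation: 1.0
```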

5. Data Quality and Remediation

• Common Data Issues:
  o Missing values
  o Duplicates
  o Outliers
  o Inconsistent formatting
• Remediation Techniques:
  o Imputation (mean, median, mode)
  o Removing duplicates
  o Normalizing/standardizing data
  o Outlier detection and handling (Z-score, IQR)
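The remediation techniques above can be sketched with the standard library on made-up records: duplicates removed, a missing value filled by median imputation, and outliers flagged with the IQR rule (the usual fences at 1.5 * IQR beyond the quartiles) and, for comparison, z-scores. Dropping the flagged value is just one handling choice shown here; the data and thresholds are illustrative assumptions.

```python
import statistics

# Made-up (id, value) records: None is a missing value and (3, 12.0) appears twice.
records = [(1, 10.0), (2, None), (3, 12.0), (3, 12.0), (4, 11.0), (5, 95.0), (6, 13.0)]

# Removing duplicates: dict.fromkeys keeps the first occurrence and preserves order.
deduped = list(dict.fromkeys(records))

# Imputation: fill missing values with the median of the observed values
# (the median is less affected by the extreme value 95.0 than the mean would be).
observed = [v for _, v in deduped if v is not None]
fill = statistics.median(observed)
values = [fill if v is None else v for _, v in deduped]

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]
print(outliers)  # [95.0]

# Z-scores for comparison: in a sample this small the outlier inflates the standard
# deviation, so its z-score is only about 2 and a |z| > 3 rule would miss it.
mean, std = statistics.mean(values), statistics.stdev(values)
z_scores = [(v - mean) / std for v in values]

# One way to handle flagged values: drop them (capping or correcting are alternatives).
cleaned = [v for v in values if low <= v <= high]
print(cleaned)  # [10.0, 12.0, 12.0, 11.0, 13.0]
```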

6. Data Preprocessing

• Purpose: Prepare raw data so that ML models can use it.
• Steps:
  o Cleaning: Remove noise and inconsistencies.
  o Encoding: Convert categorical features to numerical ones (label or one-hot encoding).
  o Normalization: Scale numerical features to a common range (e.g., [0, 1]).
  o Feature extraction and selection: Derive useful features and keep the most relevant ones.
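A minimal sketch of the encoding and normalization steps on made-up columns, written in plain Python so each transformation is visible (scikit-learn offers `OneHotEncoder` and `MinMaxScaler` in `sklearn.preprocessing` for the same purpose). Min-max scaling to [0, 1] is assumed as the "standard range"; standardization to zero mean and unit variance is a common alternative.

```python
# Made-up raw columns: one categorical feature and one numerical feature.
cities = ["Delhi", "Mumbai", "Delhi", "Chennai"]
incomes = [30000.0, 60000.0, 45000.0, 90000.0]

# Label encoding: map each category to an integer code (sorted for a stable order).
categories = sorted(set(cities))  # ['Chennai', 'Delhi', 'Mumbai']
code_of = {c: i for i, c in enumerate(categories)}
labels = [code_of[c] for c in cities]
print(labels)  # [1, 2, 1, 0]

# One-hot encoding: one 0/1 column per category, so no artificial order is implied.
one_hot = [[1 if c == cat else 0 for cat in categories] for c in cities]
print(one_hot)  # [[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]

# Min-max normalization: x' = (x - min) / (max - min) maps values into [0, 1].
lo, hi = min(incomes), max(incomes)
scaled = [(x - lo) / (hi - lo) for x in incomes]
print(scaled)  # [0.0, 0.5, 0.25, 1.0]
```

In practice the minimum and maximum (and the category list) are computed on the training set only and then reused for validation and test data, so no information leaks from the test set into preprocessing.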

7. Model Selection

• Goal: Choose the algorithm best suited to the problem and the data.
• Factors to Consider:
  o Nature of the data (linear or nonlinear relationships)
  o Training time
  o Accuracy
  o Interpretability
• Common Algorithms:
  o Linear Regression, Decision Tree, k-Nearest Neighbors (KNN), Support Vector Machine (SVM),
    Random Forest, Neural Networks
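One common way to weigh the accuracy factor is to train each candidate on the same training data and compare their error on held-out validation data. The sketch below, assuming made-up, roughly linear data, compares a mean-prediction baseline with a least-squares line using mean squared error; the helpers `fit_mean`, `fit_line` and `mse` are hypothetical, and the other factors (training time, interpretability) still have to be judged separately.

```python
def fit_mean(xs, ys):
    """Baseline: ignore x and always predict the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Least-squares line y = w1*x + w0 (closed form for a single feature)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    w1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    w0 = my - w1 * mx
    return lambda x: w1 * x + w0


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: a training part and a held-out validation part.
train_x, train_y = [1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
val_x, val_y = [7, 8], [14.1, 15.9]

candidates = {"mean baseline": fit_mean, "linear regression": fit_line}
scores = {name: mse(fit(train_x, train_y), val_x, val_y) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # lowest validation error wins
print(scores)
print("selected:", best)  # selected: linear regression
```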

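8. Training and Testing the Model

• Training Set: Used to fit the model’s parameters.
• Testing Set: Used to evaluate the model’s performance on unseen data.
• Validation Set (optional): Used during model tuning (e.g., choosing hyperparameters)
  before the final test.
• Cross-Validation: Split the data into k parts (folds); train on k-1 folds and validate on
  the remaining one, rotating until every fold has been used for validation. This gives a
  more reliable performance estimate than a single split and helps detect overfitting.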
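The splitting schemes above can be sketched in plain Python. `split_train_test` and `k_fold_indices` are hypothetical helpers written for illustration (scikit-learn provides `train_test_split` and `KFold` in `sklearn.model_selection` for real projects); the 80/20 ratio, the seed and k = 5 are assumed values.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data, then hold out the last test_fraction of it for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_test
    return shuffled[:n_train], shuffled[n_train:]


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds; each index is validated once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


data = list(range(10))
train, test = split_train_test(data, test_fraction=0.2)
print(len(train), len(test))  # 8 2

# 5-fold cross-validation: fit on train_idx, score on val_idx, then average the 5 scores.
for train_idx, val_idx in k_fold_indices(len(data), k=5):
    print(val_idx)  # [0, 1] on the first fold, [8, 9] on the last
```

The folds here are contiguous, so data stored in a meaningful order (for example sorted by class) should be shuffled before it is split into folds.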

9. Model Representation

• Model Parameters: Values learned from the data during training (e.g., the weights in
  linear regression).
• Model Hypothesis: Mathematical function approximating the relationship between input
  and output.
• Loss Function: Measures the error between the predicted and actual outputs; training
  adjusts the parameters to minimize it.
• Example:
  o Linear Regression:
      $y = w_1 x + w_0$
    where $w_1$ (slope) and $w_0$ (intercept) are the parameters.
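For linear regression a common choice of loss is the mean squared error, $L(w_1, w_0) = \frac{1}{n}\sum_{i=1}^{n} (w_1 x_i + w_0 - y_i)^2$, and training means adjusting the parameters to make it small. The sketch below, assuming made-up noise-free data generated from $y = 2x + 1$, learns $w_1$ and $w_0$ by batch gradient descent; the helper names, learning rate and step count are illustrative choices, and a closed-form least-squares solution would give the same answer here.

```python
def mse_loss(w1, w0, xs, ys):
    """Mean squared error of the hypothesis y_hat = w1*x + w0."""
    return sum((w1 * x + w0 - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Learn the parameters w1, w0 by batch gradient descent on the MSE loss."""
    w1, w0 = 0.0, 0.0  # arbitrary starting values for the parameters
    n = len(xs)
    for _ in range(steps):
        errors = [w1 * x + w0 - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w1 and w0.
        grad_w1 = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_w0 = 2 / n * sum(errors)
        # Step against the gradient to reduce the loss.
        w1 -= lr * grad_w1
        w0 -= lr * grad_w0
    return w1, w0


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # made-up data generated from y = 2x + 1

w1, w0 = fit_gradient_descent(xs, ys)
print(round(w1, 3), round(w0, 3))          # 2.0 1.0
print(round(mse_loss(w1, w0, xs, ys), 6))  # 0.0
```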
