
UNIT-I

Learning systems:-

A machine learning system is a computer system that is responsible for managing the data and the
programs that train and operate the machine learning models that power an AI-enabled application or
service.

Four types of Machine Learning Systems:-

1. real-time interactive applications that take user input and use a model to make a prediction;
2. batch applications that use models to make predictions on a schedule;
3. stream processing applications that use models to make predictions on streaming data;
4. embedded/edge applications that use models and sensors in resource constrained environments.

Machine learning systems are both trained and operated using cleaned and processed data (called features),
created by a program called a feature pipeline. The feature pipeline writes its output feature data to a feature
store that feeds data to both the training pipeline (which trains the model) and the inference pipeline. The
inference pipeline makes predictions on new data that comes from the feature pipeline. Real-time, interactive
ML systems also take new data as input from the user. Feature pipelines and inference pipelines are operational
services - part of the operational ML system. In contrast, an ML system also has an offline component: model
training. The training of models is typically not an operational part of an ML system. Training pipelines can be
run on separate systems using separate resources (e.g., GPUs). Models are sometimes retrained on a schedule
(e.g., once a day or week), but are often retrained when a new, improved model becomes available, e.g., because
new training data is available or the existing model's performance has degraded and it needs to be retrained on
more recent data.
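The structure above can be made concrete with a minimal sketch. The function and variable names below are illustrative, not part of any particular framework; scikit-learn stands in for the model-training step.

import numpy as np
from sklearn.linear_model import LogisticRegression

def feature_pipeline(raw_rows):
    """Turn raw data into features (the output written to the feature store)."""
    data = np.asarray(raw_rows, dtype=float)
    # Stand-in feature engineering: keep raw columns, add one derived feature.
    return np.column_stack([data, data[:, 0] * data[:, 1]])

def training_pipeline(features, labels):
    """Offline component: fit a model on stored features."""
    return LogisticRegression().fit(features, labels)

def inference_pipeline(model, new_raw_rows):
    """Operational component: featurize incoming data and predict."""
    return model.predict(feature_pipeline(new_raw_rows))

# Toy usage with random data in place of a real data source.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 2))
y = (X_raw[:, 0] > 0).astype(int)
model = training_pipeline(feature_pipeline(X_raw), y)
print(inference_pipeline(model, rng.normal(size=(5, 2))))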

Goals:-

The goals of machine learning (ML) encompass a wide range of objectives aimed at leveraging data to improve
processes, make predictions, and enhance decision-making. Here are some key goals:

1. Automation:
o Goal: Reduce human intervention by automating tasks and processes.
o Example: Automating customer service with chatbots.
2. Prediction:
o Goal: Predict future events or trends based on historical data.
o Example: Forecasting stock prices or weather conditions.
3. Classification:
o Goal: Categorize data into predefined classes.
o Example: Spam email detection.
4. Clustering:
o Goal: Group similar data points together without predefined labels.
o Example: Market segmentation in customer analysis.
5. Anomaly Detection:
o Goal: Identify outliers or unusual patterns in data.
o Example: Detecting fraudulent transactions.
6. Recommendation:
o Goal: Suggest products, services, or content to users based on their preferences.
o Example: Movie recommendations on streaming platforms.
7. Optimization:
o Goal: Improve efficiency and performance of systems and processes.
o Example: Optimizing supply chain logistics.
8. Personalization:
o Goal: Tailor experiences to individual users based on their behavior and preferences.
o Example: Customized news feeds on social media.
9. Understanding and Insight Extraction:
o Goal: Gain insights and understand patterns in large datasets.
o Example: Analyzing customer feedback to improve products.
10. Decision Support:
o Goal: Aid human decision-making by providing data-driven insights.
o Example: Clinical decision support systems in healthcare.
11. Natural Language Processing (NLP):
o Goal: Enable machines to understand and interpret human language.
o Example: Sentiment analysis of social media posts.
12. Computer Vision:
o Goal: Enable machines to interpret and understand visual information.
o Example: Autonomous vehicle navigation using image recognition.

Applications of machine learning:-

Image Recognition:

• Definition: The ability of a computer to identify and classify objects, people, or other items in images.

• Applications: Face recognition, disease diagnosis in healthcare, and employee attendance tracking.

Speech Recognition:

• Definition: The process of converting spoken language into text by a computer.

• Applications: Voice assistants like Alexa and Siri, and voice-activated searches on Google.

Recommender Systems:

• Definition: Algorithms that suggest products, services, or information to users based on analysis of their preferences and behaviors.

• Applications: Content recommendations on platforms like YouTube and Netflix.

Fraud Detection:

• Definition: The use of algorithms to identify suspicious activities and potential fraudulent transactions.

• Applications: Monitoring user activities for red flags in financial transactions and alerting administrators.

Self-Driving Cars:

• Definition: Vehicles that use machine learning and deep learning to navigate and drive without human intervention.

• Applications: Autonomous driving systems in cars like Tesla.

Medical Diagnosis:

• Definition: The use of machine learning models to detect and diagnose diseases with high accuracy.

• Applications: Projects for detecting diseases such as breast cancer, Parkinson’s disease, and pneumonia.

Stock Market Trading:

• Definition: The application of machine learning to predict future stock prices and market trends.

• Applications: Time series forecasting for stock prices to make investment decisions.

Virtual Try On:

• Definition: Technology that allows users to try on products virtually using computer vision and facial recognition.

• Applications: Virtual fitting of spectacles on platforms like Lenskart.

Aspects of developing a learning system:-

Developing a learning system involves several key aspects that contribute to the effectiveness and accuracy of
the machine learning model. Here are the main aspects, followed by a short code sketch that ties them together:

1. Training Data:
o Definition: The dataset used to train the machine learning model, consisting of input-output pairs that the model learns from.
o Importance: High-quality, representative training data is crucial for the model to generalize well to new, unseen data.
o Considerations:
 • Quantity and Quality: Adequate volume and diverse data points to cover different scenarios.
 • Labeling: Accurate and consistent labeling for supervised learning tasks.
 • Preprocessing: Cleaning and transforming raw data into a suitable format for training.
2. Concept Representation:
o Definition: The way in which the data and the underlying concept to be learned are represented within the model.
o Importance: Proper representation can simplify the learning process and improve model performance.
o Considerations:
 • Features: Selecting relevant features (attributes) that capture the essential aspects of the data.
 • Encoding: Transforming categorical data into numerical formats suitable for the model (e.g., one-hot encoding, embeddings).
 • Dimensionality: Balancing the complexity of the representation to avoid overfitting or underfitting (e.g., feature selection, dimensionality reduction techniques like PCA).
3. Function Approximation:
o Definition: The process of finding a function that maps inputs to outputs based on the training data.
o Importance: Determines the model’s ability to make accurate predictions or classifications.
o Considerations:
 • Model Selection: Choosing the appropriate algorithm or model architecture (e.g., linear regression, neural networks, decision trees).
 • Training: Adjusting model parameters using optimization techniques (e.g., gradient descent) to minimize error.
 • Evaluation: Assessing model performance using metrics like accuracy, precision, recall, and F1-score.
 • Regularization: Techniques to prevent overfitting, such as L1/L2 regularization, dropout, and cross-validation.
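A minimal sketch of these three aspects on toy data, with made-up column names; pandas and scikit-learn are used for illustration:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Training data: a tiny labeled dataset (in practice, quantity,
#    quality, and labeling accuracy all matter).
df = pd.DataFrame({
    "size": [1200, 800, 1500, 950],
    "colour": ["red", "blue", "red", "green"],
    "label": [1, 0, 1, 0],
})

# 2. Concept representation: one-hot encode the categorical feature so
#    the model sees a purely numerical representation.
X = pd.get_dummies(df[["size", "colour"]], columns=["colour"])
y = df["label"]

# 3. Function approximation: fit a linear model and evaluate it.
model = LogisticRegression().fit(X, y)
print(accuracy_score(y, model.predict(X)))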

In summary, developing a learning system involves carefully curating and preprocessing training data,
effectively representing the concepts within the data, and accurately approximating the function that maps
inputs to outputs. Each of these aspects plays a critical role in the success of a machine learning model.

Types of Learning:-

Machine learning can be broadly categorized into two main types: supervised learning and unsupervised
learning. Each type serves different purposes and is used in different scenarios.

1. Supervised Learning

Definition: Supervised learning is a type of machine learning where the model is trained on labeled data. This
means that each training example is paired with an output label.

Characteristics:

• Labeled Data: The training dataset includes both input data and the corresponding correct output.
• Objective: The goal is to learn a mapping from inputs to outputs, which can then be used to predict the outputs for new, unseen inputs.

Types of Problems:

• Classification: Predicting a discrete label for an input. Examples include:
o Email spam detection (spam or not spam).
o Image classification (e.g., recognizing digits or animals in pictures).
• Regression: Predicting a continuous value for an input. Examples include:
o House price prediction based on features like location, size, etc.
o Predicting temperature based on historical weather data.

Examples of Algorithms:

• Linear Regression
• Logistic Regression
• Support Vector Machines (SVM)
• Decision Trees
• Random Forests
• Neural Networks

Applications:

• Speech recognition (converting spoken words to text).
• Medical diagnosis (predicting the presence of a disease).
• Fraud detection (identifying fraudulent transactions).
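A minimal sketch of supervised classification on a labeled dataset (the Iris flowers bundled with scikit-learn, chosen purely for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Labeled data: each input row is paired with a class label.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn the input-to-output mapping, then predict on unseen inputs.
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))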

2. Unsupervised Learning

Definition: Unsupervised learning is a type of machine learning where the model is trained on unlabeled data.
The model tries to learn the underlying structure or distribution in the data without specific output labels.

Characteristics:

• Unlabeled Data: The training dataset includes only input data without corresponding output labels.
• Objective: The goal is to infer the natural structure present within a set of data points.

Types of Problems:

• Clustering: Grouping similar data points together. Examples include:
o Customer segmentation for targeted marketing.
o Grouping similar documents or articles.
• Dimensionality Reduction: Reducing the number of random variables under consideration. Examples include:
o Principal Component Analysis (PCA) for reducing data dimensions while retaining variance.
o t-Distributed Stochastic Neighbor Embedding (t-SNE) for visualization of high-dimensional data.
• Anomaly Detection: Identifying rare items, events, or observations which raise suspicions by differing significantly from the majority of the data. Examples include:
o Detecting unusual network traffic.
o Identifying fraudulent activities.

Examples of Algorithms:

• K-Means Clustering
• Hierarchical Clustering
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
• Principal Component Analysis (PCA)
• t-Distributed Stochastic Neighbor Embedding (t-SNE)

Applications:

• Market basket analysis (identifying items frequently bought together).
• Social network analysis (detecting communities within a network).
• Anomaly detection in various fields (finance, cybersecurity, etc.).
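A minimal sketch of unsupervised learning: k-means is given only unlabeled points (two synthetic blobs here) and recovers the grouping on its own:

import numpy as np
from sklearn.cluster import KMeans

# Two blobs of points; the algorithm never sees which blob a point came from.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)   # roughly (0, 0) and (3, 3)
print(kmeans.labels_[:10])       # cluster assignment per point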

Overview of classification:-

Classification is a supervised learning task where the goal is to assign a label to an input based on its features.
Here's an overview of the classification process, including the setup, training, testing, validation, and handling
overfitting.

Setup

1. Problem Definition:
o Clearly define the classification problem, specifying the input features and the possible output
classes.
o Example: Email classification into spam or not spam.
2. Data Collection:
o Gather a dataset containing input features and their corresponding labels.
o Example: Collect emails labeled as "spam" or "not spam."
3. Data Preprocessing:
o Clean the data by handling missing values, outliers, and noise.
o Normalize or standardize features if necessary.
o Encode categorical variables into numerical formats.
4. Dataset Split:
o Split the dataset into three parts: training set, validation set, and test set (see the sketch after this list).
 • Training Set: Used to train the model.
 • Validation Set: Used to tune model parameters and make decisions about model selection.
 • Test Set: Used to evaluate the final model's performance.
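A minimal sketch of the three-way split using two calls to scikit-learn's train_test_split; the 60/20/20 ratio is just an example:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve off 40% of the data, then split that portion half-and-half
# into validation and test sets (60/20/20 overall).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)
print(len(X_train), len(X_val), len(X_test))  # 90 30 30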

Training

1. Model Selection:
o Choose an appropriate classification algorithm based on the problem and data characteristics.
o Examples: Logistic Regression, Decision Trees, Support Vector Machines, Neural Networks.
2. Model Training:
o Use the training set to train the selected model.
o The model learns the mapping from input features to output labels by minimizing a loss function.
3. Hyperparameter Tuning:
o Adjust hyperparameters using the validation set to improve model performance.
o Techniques include grid search, random search, or more advanced methods like Bayesian optimization (a grid-search sketch follows this list).
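A minimal grid-search sketch over the SVM regularization parameter C; note that GridSearchCV uses internal cross-validation in place of a single held-out validation set:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Try each candidate value of C with 5-fold cross-validation.
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)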

Validation and Testing

1. Validation:
o Evaluate the model on the validation set to tune hyperparameters and prevent overfitting.
o Common metrics: Accuracy, Precision, Recall, F1-Score, ROC-AUC.
2. Cross-Validation:
o Use k-fold cross-validation to ensure that the model performs well on different subsets of the data (see the example after this list).
o This helps in assessing the model's generalizability.
3. Testing:
o Once the model is tuned, evaluate its performance on the test set.
o This gives an unbiased estimate of how the model will perform on new, unseen data.
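A minimal 5-fold cross-validation example using scikit-learn's cross_val_score:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Five train/validate rounds, each holding out a different fifth of the data.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores, scores.mean())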

Overfitting

1. Definition:
o Overfitting occurs when the model learns not only the underlying patterns but also the noise in
the training data.
o An overfitted model performs well on the training data but poorly on the validation and test data.
2. Detection:
o Large discrepancies between training and validation/test performance indicate overfitting.
o Monitoring metrics such as validation loss or error can help identify overfitting.
3. Prevention:
o Regularization: Techniques like L1/L2 regularization add a penalty to the loss function to discourage complex models (a brief sketch follows this list).
o Pruning: For decision trees, prune the tree to remove unnecessary branches.
o Dropout: In neural networks, randomly drop neurons during training to prevent reliance on
specific paths.
o Early Stopping: Stop training when the performance on the validation set starts to degrade.
o Data Augmentation: Increase the diversity of the training data by adding variations of the
existing data.
o Cross-Validation: Ensure that the model generalizes well to different subsets of data.
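A minimal sketch of how regularization curbs overfitting: on a noisy dataset with many uninformative features, a near-unregularized logistic regression typically fits the training set almost perfectly but generalizes worse than a more strongly penalized one (exact scores vary with the random seed):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Many features, few of them informative: easy to overfit.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

weak = LogisticRegression(C=1e6, max_iter=5000).fit(X_tr, y_tr)    # ~no penalty
strong = LogisticRegression(C=0.1, max_iter=5000).fit(X_tr, y_tr)  # stronger L2 penalty
print("weak  : train", weak.score(X_tr, y_tr), "test", weak.score(X_te, y_te))
print("strong: train", strong.score(X_tr, y_tr), "test", strong.score(X_te, y_te))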

Summary

Classification involves defining the problem, collecting and preprocessing data, splitting the dataset, selecting
and training a model, validating and testing the model, and addressing overfitting. By carefully managing each
step, you can develop a robust classification model that generalizes well to new data.

1. Linear Discriminative Models

Description: Linear discriminative models aim to find a linear decision boundary that separates different
classes in the feature space.

Examples:

• Logistic Regression: It models the probability of the binary outcome using a logistic function.
• Linear Support Vector Machines (SVM): Maximizes the margin between different classes using a linear hyperplane.

2. Non-linear Discriminative Models

Description: Non-linear discriminative models can capture more complex decision boundaries that are not
linear.

Examples:

• Kernel SVM: Utilizes kernel functions to map the input data into a higher-dimensional space where a linear decision boundary can be applied.
• Non-linear Support Vector Machines: Utilizes non-linear kernels such as polynomial or Gaussian radial basis function (RBF) kernels.
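A minimal sketch of a non-linear decision boundary: an RBF-kernel SVM separates two concentric circles that no linear hyperplane can split:

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# A ring of one class around a core of the other: not linearly separable.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)
print("linear:", linear.score(X, y), " rbf:", rbf.score(X, y))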

3. Decision Trees

Description: Decision trees recursively split the feature space into regions based on the feature values to make
classification decisions.

Examples:

• CART (Classification and Regression Trees): Constructs binary trees where each internal node represents a decision based on a feature value, and each leaf node represents a class label.
• Random Forest: Ensemble of decision trees where each tree is trained on a random subset of the training data and features, and final classification is determined by a majority vote.
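A minimal sketch comparing a single decision tree with a random-forest ensemble under 5-fold cross-validation:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# The forest averages out the variance of individual trees via majority vote.
print("tree  :", cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean())
print("forest:", cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean())
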
4. Probabilistic Models

Description: Probabilistic models estimate the probability distribution of each class and make predictions based
on these probabilities.

Examples:

• Naive Bayes: Assumes that the features are conditionally independent given the class and calculates the posterior probability of each class using Bayes' theorem.
• Gaussian Discriminant Analysis (GDA): Models the probability distributions of each class using Gaussian distributions and calculates the posterior probability of each class using Bayes' theorem.
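A minimal Gaussian naive Bayes sketch: the model fits one Gaussian per feature per class and reports posterior class probabilities:

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
nb = GaussianNB().fit(X, y)

# Posterior probabilities P(class | x) for the first sample, via Bayes' theorem.
print(nb.predict_proba(X[:1]))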

5. Nearest Neighbor Methods

Description: Nearest neighbor methods classify data points based on the majority class of their nearest
neighbors in the feature space.

Examples:

• k-Nearest Neighbors (k-NN): Classifies a data point by a majority vote of its k nearest neighbors in the feature space.
• Radius Neighbors Classifier: Classifies a data point based on the class of all neighbors within a fixed radius.
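A minimal k-NN sketch with k = 3; each query point takes the majority label of its three nearest training points:

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# Each prediction is a majority vote over the 3 closest training samples.
print(knn.predict(X[:5]))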

Each family of classification algorithms has its strengths and weaknesses, and the choice of algorithm depends
on factors such as the nature of the data, the complexity of the decision boundary, and computational
considerations.

Probabilistic modeling, whether conditional or generative, is a framework used to represent uncertainty and
make predictions or inferences based on probability theory.

Conditional Probabilistic Models: In conditional probabilistic models, the goal is to model the conditional
probability distribution P(Y|X), where X represents the input data and Y represents the output or target
variable. These models are often used for tasks like classification, regression, and sequence modeling.
Examples include logistic regression, conditional random fields, and neural networks trained with supervised
learning.

Generative Probabilistic Models: Generative probabilistic models aim to model the joint probability
distribution P(X, Y) of both the input data X and the target variable Y. Unlike conditional models, which focus
on predicting Y given X, generative models can generate new samples from the learned distribution. Examples
include Gaussian Mixture Models (GMMs), Hidden Markov Models (HMMs), and Variational Autoencoders
(VAEs).
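A minimal generative-model sketch: a Gaussian mixture is fitted to two-dimensional data and then asked to generate new samples from the learned distribution:

import numpy as np
from sklearn.mixture import GaussianMixture

# Data drawn from two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
new_samples, component_ids = gmm.sample(5)  # draw new points from P(X)
print(new_samples)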

Both types of models have their applications. Conditional models are often used for tasks where the goal is
prediction based on input data, while generative models are used in tasks such as data generation, unsupervised
learning, and semi-supervised learning, where learning the underlying structure of the data is important.

Probabilistic models play a crucial role in various fields such as machine learning, statistics, artificial
intelligence, and more, providing a principled way to reason under uncertainty and make informed decisions.
