Unit 1

Introduction to Machine Learning

Mrs. Vrishali Prabhu


AI vs ML vs DL

Introduction to ML
• Machine Learning is the science (and art) of programming computers so they can learn from data.
• Definition 1: Machine Learning is the field of study that gives computers the ability to learn
without being explicitly programmed. —Arthur Samuel, 1959
• Definition 2: A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P, improves with
experience E. —Tom Mitchell, 1997
• A spam filter is a machine learning program that learns to identify spam by looking at examples of
spam emails (marked by users) and regular emails. These examples make up the training set, and
each email in this set is a training instance.
• Task (T): Flag new emails as spam or not spam.
• Experience (E): The training data, which includes examples of spam and regular emails.
• Performance Measure (P): How well the filter correctly classifies emails. A common measure is
accuracy, which is the ratio of correctly classified emails to the total number of emails.
• In simpler terms, the spam filter gets better at recognizing spam by learning from a bunch of
emails already labeled as spam or not spam, and its success is measured by how accurately it can
classify new emails.
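The performance measure P described above can be computed in a few lines. This is a minimal sketch; the function name `accuracy` and the two label lists are made up for illustration.

```python
def accuracy(predicted, actual):
    """Ratio of correctly classified emails to the total number of emails."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one email to score")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical labels: True means spam, False means a regular email.
actual_labels = [True, False, True, False, False]
predicted_labels = [True, False, False, False, True]

# Three of the five predictions match the true labels.
print(accuracy(predicted_labels, actual_labels))
```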

Why Use Machine Learning?
• 1: The traditional approach –
• i) First you would look at what spam typically looks like. You might notice that some words or phrases (such as “4U,” “credit
card,” “free,” and “amazing”) tend to come up a lot in the subject. Perhaps you would also notice a few other patterns in the
sender’s name, the email’s body, and so on.
• ii) You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as
spam if a number of these patterns are detected.
• iii) You would test your program, and repeat steps i) and ii) until it is good enough.
• This approach leads to a complex set of rules that can be hard to maintain.
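The traditional approach can be sketched as a list of hand-written rules. The patterns come from the phrases mentioned above; the function name `is_spam_by_rules` and the threshold `min_hits` are assumptions made for this example.

```python
import re

# Hand-written patterns, one per phrase noticed in spam subjects.
SPAM_PATTERNS = [
    re.compile(r"\b4U\b", re.IGNORECASE),
    re.compile(r"\bcredit card\b", re.IGNORECASE),
    re.compile(r"\bfree\b", re.IGNORECASE),
    re.compile(r"\bamazing\b", re.IGNORECASE),
]

def is_spam_by_rules(subject, min_hits=2):
    """Flag a subject line when at least min_hits patterns match."""
    hits = sum(1 for pattern in SPAM_PATTERNS if pattern.search(subject))
    return hits >= min_hits

print(is_spam_by_rules("Amazing FREE offer 4U"))      # three patterns match
print(is_spam_by_rules("Meeting agenda for Monday"))  # no pattern matches
print(is_spam_by_rules("Free offer For U"))           # only "free" matches
```

Every new spam phrase needs another pattern added by hand, which is how the rule list grows and becomes hard to maintain.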

2: Machine Learning approach -
• A machine learning-based spam filter automatically learns which words and
phrases are likely spam by analyzing common patterns in spam emails.
• This makes the program shorter, easier to maintain, and often more accurate.
Unlike traditional methods, it adapts when spammers change tactics, such as
switching "4U" to "For U," without needing constant manual updates.
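As a rough illustration of learning from examples instead of writing rules, the sketch below counts words in labeled subjects and scores new subjects with smoothed log-ratios. This scoring scheme (similar in spirit to naive Bayes) and the tiny training set are assumptions chosen for the example, not a method prescribed by these notes.

```python
import math
from collections import Counter

def train(examples):
    """Count how often each word appears in spam and in regular subjects."""
    spam_counts, regular_counts = Counter(), Counter()
    for subject, is_spam in examples:
        target = spam_counts if is_spam else regular_counts
        target.update(subject.lower().split())
    return spam_counts, regular_counts

def spam_score(subject, spam_counts, regular_counts):
    """Sum of log-ratios; a positive score means the words lean toward spam."""
    spam_total = sum(spam_counts.values())
    regular_total = sum(regular_counts.values())
    vocabulary = len(set(spam_counts) | set(regular_counts))
    score = 0.0
    for word in subject.lower().split():
        # Add-one smoothing so unseen words never give a zero probability.
        p_spam = (spam_counts[word] + 1) / (spam_total + vocabulary)
        p_regular = (regular_counts[word] + 1) / (regular_total + vocabulary)
        score += math.log(p_spam / p_regular)
    return score

# Made-up training set: (subject, is_spam).
training_set = [
    ("free offer 4u", True),
    ("amazing credit card offer for u", True),
    ("free prize for u", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
    ("project report attached", False),
]
model = train(training_set)

print(spam_score("free gift for u", *model) > 0)         # leans toward spam
print(spam_score("monday project meeting", *model) > 0)  # leans toward regular
```

Because "for u" appears in the labeled spam, the filter picks it up from the data; nobody had to add a new rule.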

• Machine learning is especially useful for:
• Problems for which existing solutions require a lot of hand-tuning or long lists of
rules: one Machine Learning algorithm can often simplify code and perform better.
• Complex problems for which there is no good solution at all using a traditional
approach: the best Machine Learning techniques can perhaps find a solution.
• Fluctuating environments: a Machine Learning system can adapt to new data.
• Getting insights about complex problems and large amounts of data.

Types of ML-
• Supervised/Unsupervised Learning -
• Supervised learning –
• In supervised learning, the training data you feed to the algorithm includes the
desired solutions, called labels.
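• A typical supervised learning task is classification. The spam filter is an example: it is trained
on emails labeled spam or not spam and must learn to classify new emails.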
• Another typical task is to predict a target numeric value, such as the price of a car,
given a set of features (mileage, age, brand, etc.) called predictors. This sort of
task is called regression.
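A regression task can be sketched with ordinary least squares on a single predictor. The mileage and price figures are invented, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: y is modeled as slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up training set: mileage in thousands of km (predictor) and price (label).
mileage = [10, 20, 30, 40, 50]
price = [18000, 16000, 14000, 12000, 10000]

slope, intercept = fit_line(mileage, price)
predicted_price = slope * 25 + intercept  # prediction for a car with 25,000 km
print(slope, intercept, predicted_price)
```

The labels (prices) are part of the training data, which is what makes this supervised learning.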

Types of ML
• Unsupervised learning-
•Definition: Unsupervised learning is a type of machine learning where the algorithm is trained on
unlabeled data without explicit instructions on what to learn.
•Goal: The main objective is to discover hidden patterns, groupings, or structures in the data.
•Common Techniques: Popular methods include clustering (e.g., K-means) and dimensionality
reduction (e.g., PCA).
•Applications: It is used in market segmentation, anomaly detection, and exploratory data analysis.
•Advantages: Unsupervised learning can work with large amounts of unlabeled data and often
uncovers insights that are not immediately apparent.
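Clustering with K-means can be sketched on one-dimensional data. The values, the starting centroids, and the function name `k_means_1d` are assumptions for illustration; real uses normally involve many features and a library implementation.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain k-means on numbers: assign to the nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Made-up, unlabeled data: monthly spend of six customers.
spend = [12, 15, 14, 90, 95, 88]
centroids, clusters = k_means_1d(spend, centroids=[0, 100])
print(clusters)   # two groups: low spenders and high spenders
print(centroids)
```

No labels were given; the two customer segments emerge from the data alone.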

Types of ML
• Semisupervised learning -
• Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and
a little bit of labeled data. This is called semisupervised learning.
• Reinforcement Learning -
• The learning system, called an agent in this context, can observe the environment, select and
perform actions, and get rewards in return (or penalties in the form of negative rewards as well).
It must then learn by itself what is the best strategy, called a policy, to get the most reward over
time.
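The agent, action, reward, and policy vocabulary can be made concrete with a toy example. The sketch below assumes a very simple environment (each action always returns the same reward) and an epsilon-greedy strategy; both are choices made for illustration, not something these notes specify.

```python
import random

def run_agent(rewards, steps=500, epsilon=0.2, seed=0):
    """Epsilon-greedy agent on a toy environment with one fixed reward per action."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)   # estimated reward of each action
    counts = [0] * len(rewards)        # how often each action was tried
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))   # explore a random action
        else:
            action = max(range(len(rewards)), key=lambda a: estimates[a])  # exploit
        reward = rewards[action]                   # the environment responds
        counts[action] += 1
        # Incremental average of the rewards seen so far for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    policy = max(range(len(rewards)), key=lambda a: estimates[a])
    return policy, estimates, total_reward

# Hypothetical environment: action 2 carries a penalty (negative reward).
policy, estimates, total_reward = run_agent([0.2, 1.0, -0.5])
print(policy, estimates)
```

The learned policy here is simply "pick the action with the highest estimated reward"; the agent discovers it by trying actions and observing rewards, not from labeled examples.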

Types of ML
• Online vs Offline Machine Learning-

• Online Machine Learning-


• In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or
in small groups called mini-batches.
• Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.
• Online learning is great for systems that receive data as a continuous flow (e.g., stock prices) and need to adapt to change
rapidly or autonomously.
• It is also a good option if you have limited computing resources: once an online learning system has learned about
new data instances, it does not need them anymore, so you can discard them.
• Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine’s main memory.
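Incremental training can be sketched with one stochastic-gradient step per arriving instance. The one-weight model, the learning rate, and the `stream` generator standing in for a live data flow are all assumptions for this example.

```python
def online_update(weight, x, y, learning_rate=0.05):
    """One stochastic-gradient step for the model y = weight * x."""
    error = y - weight * x
    return weight + learning_rate * error * x

def stream():
    """Stand-in for a continuous data flow; here y is always 2 * x."""
    for step in range(300):
        x = 1 + step % 3          # cycles through 1, 2, 3
        yield x, 2 * x

weight = 0.0
for x, y in stream():
    # Learn from the instance as it arrives; it is never stored,
    # so memory use stays constant however long the stream runs.
    weight = online_update(weight, x, y)

print(weight)  # close to 2, the slope used to generate the stream
```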

Types of ML -
• Offline (Batch) Machine Learning –
• In batch learning, the system is incapable of learning incrementally.
• It must be trained using all the available data.
• This will generally take a lot of time and computing resources, so it is typically done offline.
• First the system is trained, and then it is launched into production and runs without learning anymore; it
just applies what it has learned. This is called offline learning.
• For a batch learning system to know about new data (such as a new type of spam), we need to train a
new version of the system from scratch on the full dataset (not just the new data, but also the old data),
then we need to stop the old system and replace it with the new one.

• Instance-based / Model-based learning -


• Instance-based learning -
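• Instance-based learning involves memorizing specific examples and using them as references for future
decision-making, such as flagging emails identical to known spam emails. With a similarity measure, the system
can also flag emails that are very similar to known spam, not only identical ones.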
• Model-based learning -
• Another way to generalize from a set of examples is to build a model of these examples, then use that model
to make predictions. This is called model-based learning.
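Instance-based learning can be sketched as a nearest-neighbor lookup. The similarity measure used here (number of shared words) and the stored examples are assumptions made for illustration.

```python
def shared_words(a, b):
    """Similarity measure: number of distinct words two emails have in common."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def classify_by_nearest(email, labeled_examples):
    """Copy the label of the most similar memorized example."""
    _, label = max(labeled_examples, key=lambda example: shared_words(email, example[0]))
    return label

# Memorized examples (made up): the "training" step is just storing them.
known_emails = [
    ("win a free prize now", "spam"),
    ("cheap credit card offer", "spam"),
    ("team meeting at noon", "regular"),
    ("quarterly report attached", "regular"),
]

print(classify_by_nearest("free prize inside", known_emails))
print(classify_by_nearest("meeting moved to noon", known_emails))
```

A model-based learner would instead summarize the examples into a few parameters (for instance the slope and intercept of a fitted line) and predict from those, without consulting the stored examples again.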
Challenges in ML -
• 1) Insufficient Quantity of Training Data-
• It takes a lot of data for most Machine Learning algorithms to work properly.
• Even for very simple problems you typically need thousands of examples, and for
complex problems such as image or speech recognition you may need millions of
examples (unless you can reuse parts of an existing model).
• 2) Nonrepresentative Training Data-
• To make accurate predictions, it's important that the examples you use to train your model reflect the variety of situations it
will encounter in the real world.
• If the training examples don't cover this range well, the model might not perform well when faced with new cases.

• 3) Poor-Quality Data –
• If the training data is full of errors, outliers, and noise (e.g., due to poor-quality measurements), it will be
harder for the system to detect the underlying patterns, so it won’t perform well.
• It is often well worth the effort to spend time cleaning up your training data. Most data scientists spend a
significant part of their time doing just that.

Challenges in ML –
• 4) Irrelevant Features –
• Our system will only be capable of learning if the training data contains enough relevant features and not too many
irrelevant ones.
• A critical part of the success of a Machine Learning project is coming up with a good set of features to train on.
• This process, called feature engineering, involves
• i) Feature selection: selecting the most useful features to train on among existing features.
• ii) Feature extraction: combining existing features to produce a more useful one (as we saw earlier, dimensionality
reduction algorithms can help).
• iii) Creating new features by gathering new data.
• 5) Overfitting the Training Data-
• When a model learns the training data too well, including its noise and outliers, it performs excellently on the
training data but poorly on new, unseen data because it hasn't generalized well.
• 6) Underfitting the Training Data-
• When a model is too simple to capture the underlying patterns in the data, it performs poorly on both the training
data and new data because it hasn't learned enough from the training data.
• 7) Stepping Back-
• We have gone through so many Machine Learning concepts that you may be feeling a little lost, so let’s step
back and look at the big picture of the development life cycle.
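Overfitting and underfitting (challenges 5 and 6 above) can be seen on a toy dataset by comparing training error with error on unseen data. The data and the three model builders below are assumptions for illustration: a constant (too simple), a straight line, and a polynomial forced through every training point (too flexible).

```python
def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def constant_model(xs, ys):
    """Too simple: always predicts the average training label."""
    average = sum(ys) / len(ys)
    return lambda x: average

def linear_model(xs, ys):
    """Least-squares straight line."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: slope * x + (mean_y - slope * mean_x)

def interpolating_model(xs, ys):
    """Too flexible: Lagrange polynomial through every training point, noise included."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

train_x = [0, 1, 2, 3, 4]
train_y = [1, 1, 5, 5, 9]        # 2 * x plus alternating noise of +1 and -1
test_x = [0.5, 1.5, 2.5, 3.5]
test_y = [1, 3, 5, 7]            # noise-free 2 * x, never seen during training

for name, build in [("constant", constant_model),
                    ("linear", linear_model),
                    ("interpolating", interpolating_model)]:
    model = build(train_x, train_y)
    print(name,
          round(mean_squared_error(model, train_x, train_y), 3),
          round(mean_squared_error(model, test_x, test_y), 3))
```

The interpolating model reaches zero training error but does worse than the line on the test points (overfitting), while the constant model does poorly on both sets (underfitting).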

Application of ML-

•Healthcare: ML algorithms are used for disease diagnosis, medical imaging analysis, drug
discovery, personalized treatment plans, and predicting patient outcomes.
•Finance: In finance, ML is applied for fraud detection, algorithmic trading, credit scoring, risk
management, and personalized financial advice.
•Retail: ML helps in demand forecasting, recommendation systems, inventory management,
customer segmentation, and personalized marketing.
•Transportation: Self-driving cars, route optimization, traffic prediction, and predictive maintenance
of vehicles all leverage ML algorithms.
•Entertainment: ML is used for content recommendation on platforms like Netflix and Spotify,
automated content moderation, and audience sentiment analysis.
•Manufacturing: Predictive maintenance, quality control, supply chain optimization, and defect
detection are key applications in manufacturing.

Application of ML

•Customer Service: Chatbots, virtual assistants, sentiment analysis, and customer feedback analysis improve
customer support and experience.
•Agriculture: Precision farming, crop yield prediction, pest detection, and soil health monitoring benefit from
ML.
•Marketing: ML is used for customer segmentation, predictive analytics, targeted advertising, and customer
lifetime value prediction.
•Cybersecurity: Threat detection, anomaly detection, spam filtering, and intrusion detection systems
enhance cybersecurity measures.
•Energy: ML algorithms are applied for demand forecasting, grid management, predictive maintenance of
infrastructure, and optimizing energy consumption.
•Education: Personalized learning, automated grading, plagiarism detection, and student performance
prediction are some educational applications.
•Human Resources: Resume screening, employee performance analysis, talent management, and predicting
employee turnover benefit from ML.
•Environmental Science: Climate modeling, natural disaster prediction, species identification, and
environmental monitoring leverage ML algorithms.

Data Preprocessing

•Data Collection:
•Gather data from various sources such as databases, APIs, web scraping, or files.
•Ensure the collected data is relevant and sufficient for the problem at hand.
•Data Cleaning:
•Handling Missing Values: Replace or impute missing values using techniques like mean/mode/median imputation,
forward/backward fill, or by using more sophisticated methods.
•Removing Duplicates: Identify and remove duplicate records to avoid redundancy.
•Handling Outliers: Detect and either remove or adjust outliers that can skew the results.
•Fixing Errors: Correct any inaccuracies or inconsistencies in the data.
•Data Transformation:
•Normalization/Standardization: Scale the data to a common range (e.g., 0-1) or to have a mean of 0 and standard
deviation of 1. This is especially important for algorithms sensitive to the scale of data.
•Encoding Categorical Variables: Convert categorical data into numerical format using techniques like one-hot
encoding, label encoding, or ordinal encoding.
•Feature Engineering: Create new features or modify existing ones to better represent the underlying patterns in the
data. This may involve combining features, creating interaction terms, or decomposing features into more meaningful
components.
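Several of the cleaning and transformation steps above can be sketched with the standard library alone. The helper names and sample values are made up; in practice a data library would usually handle these steps.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def remove_duplicates(rows):
    """Drop repeated records while keeping the original order."""
    return list(dict.fromkeys(rows))

def min_max_scale(values):
    """Normalization: rescale to the range 0-1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def standardize(values):
    """Standardization: mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def one_hot(categories):
    """One-hot encoding: one 0/1 column per distinct category, in sorted order."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]

ages = impute_mean([20, None, 40, 30, 30])   # the missing age becomes 30
print(ages)
print(remove_duplicates([("a", 1), ("b", 2), ("a", 1)]))
print(min_max_scale(ages))
print(standardize(ages))
print(one_hot(["red", "blue", "red"]))       # columns are ordered blue, red
```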

Data Preprocessing

•Data Reduction:
•Dimensionality Reduction: Reduce the number of features using techniques like Principal
Component Analysis (PCA), Linear Discriminant Analysis (LDA), or feature selection methods to
simplify the model and reduce computation time.
•Sampling: Reduce the size of the dataset by sampling, which can make the training process faster
and more efficient without losing significant information.
•Data Splitting:
•Split the dataset into training, validation, and test sets to evaluate the model's performance and
ensure it generalizes well to unseen data.
•Data Integration:
•Combine data from multiple sources or tables to create a comprehensive dataset for analysis.
•Data Annotation (if applicable):
•Label the data, especially in supervised learning tasks, where each instance needs a corresponding
label or target value.
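The data splitting step can be sketched as a shuffle followed by two cuts. The 60/20/20 proportions, the fixed seed, and the function name are assumptions for this example.

```python
import random

def train_val_test_split(rows, val_fraction=0.2, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then cut it into three disjoint parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test

rows = list(range(10))                       # stand-in for ten data instances
train, validation, test = train_val_test_split(rows)
print(len(train), len(validation), len(test))
```

The model is fitted on the training set, tuned against the validation set, and scored once on the test set, which it has never seen.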

Introduction to Datasets
• Discuss different datasets, such as the diabetes dataset.

ML Development Life Cycle
• The machine learning life cycle involves seven major steps, which are
given below:
• 1. Gathering Data
• 2. Data preparation
• 3. Data Wrangling
• 4. Analyze Data
• 5. Train the model
• 6. Test the model
• 7. Deployment

Assignment 1