
ML Unit1.2

Uploaded by

bhargavramkp02

MACHINE LEARNING

Unit 1: Fundamentals of Machine Learning
Introduction to Machine Learning: What is Machine Learning?
Why Use Machine Learning?, Types of Machine Learning Systems,
Main Challenges of Machine Learning, Applications of Machine
Learning. Why Python, Scikit-learn, Essential Libraries & Tools.

Machine Learning:
Machine learning (ML) is a subdomain of artificial intelligence (AI)
that focuses on developing systems that learn, or improve their
performance, based on the data they ingest.
It develops algorithms and statistical models that enable computers
to perform tasks without being explicitly programmed for each task.
Instead of relying on explicit instructions, machine learning
algorithms use patterns and inference to learn from data and make
predictions or decisions.
How does Machine Learning work?
A machine learning system learns from historical data, builds a
prediction model, and uses that model to predict the output for new
data as it arrives. The accuracy of the predicted output depends on the
amount and quality of the training data: more representative data
generally yields a better model.

Features of Machine Learning:


• Machine learning uses data to detect patterns in a given dataset.
• It can learn from past data and improve automatically.
• It is a data-driven technology.
• Machine learning is similar to data mining in that it also
deals with large amounts of data.
Classification of Machine Learning
At a broad level, machine learning can be classified into three types:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
• Supervised Learning
In supervised learning, labeled sample data is provided to the
machine learning system for training, and the system then predicts the
output for new inputs based on what it learned from that training data.
The system uses the labeled data to build a model that captures the
relationship between inputs and labels. After training, we test the
model with held-out sample data to see whether it predicts the output
accurately.
Supervised learning can be further grouped into two categories of algorithms:
• Classification
• Regression

• Unsupervised Learning
Unsupervised learning is a learning method in which a machine
learns without any supervision.
The machine is trained on a set of data that has not been labeled,
classified, or categorized, and the algorithm must act on that data
without any supervision. The goal of unsupervised learning is to
restructure the input data into new features or into groups of
objects with similar patterns.
Unsupervised learning is classified into two categories of algorithms:

• Clustering
• Association

• Reinforcement Learning

Reinforcement learning is a feedback-based learning method in
which a learning agent gets a reward for each right action and a
penalty for each wrong action. The agent learns automatically from
this feedback and improves its performance. In reinforcement
learning, the agent interacts with the environment and explores it. The
goal of the agent is to collect the most reward, and it improves its
performance in pursuit of that goal.
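The reward-and-penalty loop can be sketched with tabular Q-learning, one common reinforcement learning technique (the notes do not name a specific algorithm, so this choice is an assumption). The corridor environment, the reward values, and all parameter names are hypothetical and chosen only for illustration.

```python
import random

N_STATES = 5          # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right

def step(state, action):
    """Hypothetical environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    if next_state == N_STATES - 1:
        return next_state, 1.0, True      # reward for reaching the goal
    return next_state, -0.1, False        # small penalty for every other move

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon, otherwise exploit the best known action.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward + discounted future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy policy for every non-goal state.
policy = ["left" if q[0] > q[1] else "right" for q in q_table[:-1]]
```

After training, the greedy policy heads toward the goal from every position, because moving right is the only way to earn the reward.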
Applications of Machine learning

• Image Recognition:

Image recognition is one of the most common applications of machine
learning. It is used to identify objects, persons, places, and so on in
digital images.

For example, Facebook provides an automatic friend-tagging
suggestion feature. Whenever we upload a photo with our Facebook friends,
we automatically get a tagging suggestion with names, and the
technology behind this is machine learning's face detection and
recognition algorithm.

• Speech Recognition

While using Google, we get the option to "Search by voice." This comes
under speech recognition, a popular application of machine
learning.

Speech recognition is the process of converting voice instructions into
text; it is also known as "speech to text" or "computer speech
recognition." At present, machine learning algorithms are widely
used in speech recognition applications. Google Assistant,
Siri, Cortana, and Alexa use speech recognition technology to
follow voice instructions.

• Traffic prediction:

If we want to visit a new place, we take the help of Google Maps, which
shows us the shortest route and predicts the traffic
conditions.

It predicts whether traffic is clear,
slow-moving, or heavily congested in two ways:

• Real-time location of vehicles from the Google Maps app and sensors
• Average time taken on past days at the same time of day

Everyone who uses Google Maps helps make the app
better. It takes information from the user and sends it back to its database
to improve performance.

• Product recommendations:

Machine learning is widely used by e-commerce and
entertainment companies such as Amazon and Netflix for product
recommendations. Whenever we search for a product
on Amazon, we start seeing advertisements for the same
product while browsing the internet in the same browser, and this is because
of machine learning.

• Self-driving cars:

One of the most exciting applications of machine learning is
self-driving cars, where it plays a significant role. Tesla, a
well-known car manufacturer, is working on self-driving cars and trains
its models to detect people and objects while driving.
• Email Spam and Malware Filtering:

Whenever we receive a new email, it is filtered automatically as
important, normal, or spam. Important mail arrives in
our inbox marked with the important symbol, and spam goes to the spam
box. The technology behind this is machine learning.

• Virtual Personal Assistant:

We have various virtual personal assistants such as
Google Assistant, Alexa, Cortana, and Siri. As the name suggests, they
help us find information using voice instructions.
These assistants can help us in
various ways through voice instructions alone, such as playing music, calling
someone, opening an email, or scheduling an appointment.

• Online Fraud Detection:

Machine learning makes our online transactions safe and secure by
detecting fraudulent transactions. Whenever we perform an online
transaction, fraud can take place in various ways, such as fake
accounts, fake IDs, or money stolen in
the middle of a transaction. To detect this, a feed-forward neural
network can check whether a transaction is genuine or
fraudulent.

• Stock Market trading:

Machine learning is widely used in stock market trading. In the stock
market, share prices always carry a risk of ups and downs, so
long short-term memory (LSTM) neural networks are used
to predict stock market trends.

• Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis.
With it, medical technology is advancing quickly and can build
3D models that predict the position of lesions in the brain.

This helps in finding brain tumors and other brain-related diseases.

• Automatic Language Translation:

Nowadays, if we visit a new place and do not know the
language, it is much less of a problem, because machine
learning helps by converting text into languages we know.
Google's GNMT (Google Neural Machine Translation) provides this
feature: it is a neural machine translation system that translates text
into our familiar language, and this is called automatic translation.

Machine learning Life cycle

• Gathering Data:

Data gathering is the first step of the machine learning life cycle. The
goal of this step is to identify and obtain all the data relevant to the problem.

In this step, we need to identify the different data sources, as data can
be collected from various sources such as files, databases, the internet, or
mobile devices. It is one of the most important steps of the life cycle.
The quantity and quality of the collected data determine the
quality of the output. In general, the more good-quality data we have,
the more accurate the prediction will be.

This step includes the below tasks:

• Identify various data sources


• Collect data
• Integrate the data obtained from different sources

Example code
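A minimal sketch of the gathering tasks (identify sources, collect, integrate), using only Python's standard library. The two in-memory sources and their field names are hypothetical stand-ins for real files, databases, or web APIs.

```python
import csv
import io
import json

# Hypothetical source 1: a CSV export (for example, from a file or database dump).
CSV_SOURCE = "id,age,income\n1,25,30000\n2,40,52000\n"
# Hypothetical source 2: JSON records (for example, from a web API or mobile app).
JSON_SOURCE = '[{"id": 3, "age": 31, "income": 41000}]'

def load_csv(text):
    """Read CSV text into a list of dicts with integer values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: int(value) for key, value in row.items()} for row in reader]

def load_json(text):
    """Read a JSON array of records into a list of dicts."""
    return json.loads(text)

# Integrate the data obtained from the different sources into one dataset.
dataset = load_csv(CSV_SOURCE) + load_json(JSON_SOURCE)
```

In practice a library such as Pandas is commonly used for this step; plain lists of dicts are used here only to keep the sketch self-contained.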

• Data preparation

After collecting the data, we need to prepare it for further steps. Data
preparation is the step where we put our data in a suitable place and
prepare it for use in machine learning training.

In this step, we first put all the data together and then randomize its
ordering.

This step can be further divided into two processes:

• Data exploration:
It is used to understand the nature of the data we have to work
with. We need to understand the characteristics, format, and
quality of the data. A better understanding of the data leads to a more
effective outcome. Here we look for correlations, general trends,
and outliers.

• Data pre-processing: The next step is to preprocess the
data for analysis.

Example code
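One way to sketch randomizing the order and exploring the data, assuming a small hypothetical dataset of (hours studied, exam score) pairs. The two-standard-deviation outlier rule is a common rule of thumb, not something the notes prescribe.

```python
import random
import statistics

# Hypothetical dataset: (hours_studied, exam_score) pairs, ordered as collected.
records = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70), (6, 71), (7, 78), (8, 12)]

# Randomize the ordering so later splits are not biased by collection order.
rng = random.Random(42)          # fixed seed keeps the shuffle reproducible
shuffled = records[:]
rng.shuffle(shuffled)

# Data exploration: summary statistics for the score column.
scores = [score for _, score in records]
mean_score = statistics.mean(scores)
stdev_score = statistics.stdev(scores)

# Flag outliers: scores more than two standard deviations from the mean.
outliers = [s for s in scores if abs(s - mean_score) > 2 * stdev_score]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Correlation between hours and score once the flagged outliers are set aside.
clean = [(h, s) for h, s in records if s not in outliers]
corr = pearson([h for h, _ in clean], [s for _, s in clean])
```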

• Data Wrangling

Data wrangling is the process of cleaning and converting raw data into
a usable format. It involves cleaning the data, selecting the
variables to use, and transforming the data into a proper format to make
it more suitable for analysis in the next step. It is one of the most
important steps of the complete process. Cleaning is required
to address quality issues.

In real-world applications, collected data may have various issues, including:

• Missing values
• Duplicate data
• Invalid data
• Noise

So, we use various filtering techniques to clean the data.

It is important to detect and remove these issues because they can
negatively affect the quality of the outcome.

Example code:
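A small sketch of cleaning records that have duplicates, invalid values, and missing values. The records, the valid age range (0 to 120), and the choice of median imputation are illustrative assumptions.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
raw = [
    {"id": 1, "age": 25, "income": 30000},
    {"id": 2, "age": None, "income": 52000},   # missing value
    {"id": 1, "age": 25, "income": 30000},     # duplicate of the first record
    {"id": 3, "age": -4, "income": 41000},     # invalid: age cannot be negative
    {"id": 4, "age": 35, "income": 48000},
]

def clean(records):
    # 1. Remove exact duplicates while preserving order.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    # 2. Drop records whose age is present but outside the assumed valid range.
    valid = [r for r in unique if r["age"] is None or 0 <= r["age"] <= 120]
    # 3. Impute missing ages with the median of the known ages.
    median_age = statistics.median(r["age"] for r in valid if r["age"] is not None)
    return [{**r, "age": median_age if r["age"] is None else r["age"]} for r in valid]

cleaned = clean(raw)
```

Dropping invalid rows and imputing with the median are only two of several possible policies; the right choice depends on the dataset.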
• Data Analysis

Now the cleaned and prepared data is passed on to the analysis step.
This step involves:

• Selecting analytical techniques
• Building models
• Reviewing the result

The aim of this step is to build a machine learning model that analyzes
the data using various analytical techniques and to review the outcome.
It starts with determining the type of problem, where we
select a machine learning
technique such as classification, regression, cluster analysis, or
association; we then build the model using the prepared data and
evaluate it.

Example code:
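Determining the type of problem can be sketched as a rough rule of thumb based on the target column. The function name and the rule itself are hypothetical simplifications: real projects decide this from the business question, not only from the data type.

```python
def suggest_technique(targets):
    """Suggest a family of techniques from the target values.

    Rough heuristic (an assumption of this sketch):
    - no target column at all        -> unsupervised methods
    - every target is a float        -> regression
    - anything else (labels, flags)  -> classification
    """
    if targets is None:
        return "clustering or association"
    if all(isinstance(t, float) for t in targets):
        return "regression"
    return "classification"

house_prices = [250000.0, 310500.0, 199999.0]     # continuous target
spam_labels = ["spam", "not spam", "spam"]        # categorical target

price_task = suggest_technique(house_prices)
spam_task = suggest_technique(spam_labels)
unlabeled_task = suggest_technique(None)
```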

• Train Model

The next step is to train the model: we train it
to improve its performance on the problem.
We use datasets to train the model with various machine learning
algorithms. Training is required so that the model can learn the
relevant patterns, rules, and features.

Example code:
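Training can be illustrated with gradient descent on a one-feature linear model, where each pass over the data nudges the parameters to reduce the error. The data, learning rate, and epoch count are hypothetical values picked for this sketch; the notes do not prescribe a training algorithm.

```python
# Hypothetical training data that roughly follows y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

def mse(w, b):
    """Mean squared error of the line y = w*x + b on the training data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(epochs=1000, lr=0.05):
    w, b = 0.0, 0.0
    history = []                      # training loss after every epoch
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
        history.append(mse(w, b))
    return w, b, history

w, b, history = train()
```

The recorded loss falls as training proceeds, which is the "improve its performance" part of this step, and the learned slope and intercept end up close to 2 and 1.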

• Test Model

Once our machine learning model has been trained on a given dataset,
we test it. In this step, we check the accuracy of our
model by giving it a test dataset.

Testing determines the model's percentage accuracy against
the requirements of the project or problem.

Example code:
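A minimal sketch of holding out a test set and measuring accuracy on it. The dataset, the toy threshold model, and the helper names are hypothetical; scikit-learn offers a ready-made `train_test_split` in `sklearn.model_selection`, which is not used here so that the example has no dependencies.

```python
import random

# Hypothetical labeled data: (feature, label), label is 1 when the feature is large.
data = [(x, 1 if x >= 50 else 0) for x in range(0, 100, 5)]   # 20 samples

def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """Toy model: put the decision threshold midway between the two class means."""
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def accuracy(threshold, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum((1 if x >= threshold else 0) == y for x, y in samples)
    return correct / len(samples)

train_set, test_set = split_train_test(data)
threshold = fit_threshold(train_set)
test_accuracy = accuracy(threshold, test_set)   # multiply by 100 for a percentage
```

The model never sees the test samples during fitting, so the accuracy measured on them estimates how it will behave on new data.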

• Deployment
The last step of the machine learning life cycle is deployment, where we
deploy the model in a real-world system.

Example Code:
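Deployment usually means saving the trained model and loading it inside the application that serves predictions. The sketch below stores the parameters of a hypothetical threshold model as JSON; the file name, the `handle_request` function, and the response format are all assumptions made for illustration.

```python
import json
import os
import tempfile

def predict(params, x):
    """Apply the (hypothetical) trained threshold model to one input."""
    return 1 if x >= params["threshold"] else 0

# Parameters produced by an earlier training step (hypothetical value).
trained_params = {"threshold": 47.5}

# Training side: save the model parameters to disk.
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(model_path, "w") as f:
    json.dump(trained_params, f)

# Serving side: load the saved parameters once at start-up.
with open(model_path) as f:
    deployed_params = json.load(f)

def handle_request(value):
    """Answer one prediction request using the deployed model."""
    return {"input": value, "prediction": predict(deployed_params, value)}
```

Larger models are typically serialized with tools such as `pickle` or `joblib` rather than JSON, but the save, load, and serve pattern is the same.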

Supervised Machine Learning

Types of supervised Machine learning Algorithms:

Supervised learning can be further divided into two types of problems:

• Regression

Regression algorithms are used when the output variable is a
continuous value that depends on the input variables. They are used to predict
continuous variables, as in weather forecasting and market trend prediction.
Below are some popular regression algorithms that come under
supervised learning:
• Linear Regression
• Regression Trees
• Non-Linear Regression
• Bayesian Linear Regression
• Polynomial Regression
• Ridge Regression
• Lasso Regression
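As an illustration of the first item in the list, simple linear regression with one input can be fitted in closed form by ordinary least squares. The temperature and sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: daily temperature (degrees C) and ice-cream sales (units).
temps = [20.0, 22.0, 24.0, 26.0, 28.0]
sales = [200.0, 220.0, 240.0, 260.0, 280.0]

slope, intercept = fit_line(temps, sales)
predicted_sales = slope * 30.0 + intercept   # prediction for an unseen 30-degree day
```

The output is a continuous number rather than a category, which is what distinguishes regression from classification.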

• Classification

Classification algorithms are used when the output variable is
categorical, which means it takes one of a fixed set of classes, such as
Yes-No, Male-Female, or True-False. There may be two classes or more.

Example: Spam filtering.

Below are some popular classification algorithms that come under
supervised learning:

• Random Forest
• Decision Trees
• Logistic Regression
• Support Vector Machines
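A compact sketch of classification with more than two classes, using a k-nearest-neighbors vote. KNN is not in the list above and is used here only because it fits in a few lines; the animal measurements are hypothetical.

```python
from collections import Counter

# Hypothetical labeled training data: (height_cm, weight_kg) -> animal label.
training = [
    ((25, 4), "cat"), ((23, 5), "cat"), ((27, 4), "cat"),
    ((60, 30), "dog"), ((55, 25), "dog"), ((65, 32), "dog"),
    ((150, 400), "horse"), ((160, 450), "horse"), ((155, 420), "horse"),
]

def knn_predict(point, k=3):
    """Predict the label of `point` by majority vote among its k nearest neighbors."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(training, key=lambda item: sq_dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

small_animal = knn_predict((24, 4))
medium_animal = knn_predict((58, 28))
large_animal = knn_predict((158, 430))
```

Because every training example carries a label, this is supervised learning even though the model is just the stored data.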

Unsupervised Machine Learning

Types of Unsupervised Learning Algorithm:

Unsupervised learning algorithms can be further categorized into
two types of problems:
• Clustering: Clustering is a method of grouping objects into
clusters such that objects with the most similarities remain in one
group and have few or no similarities with the objects of another
group. Cluster analysis finds the commonalities between the
data objects and categorizes them according to the presence and
absence of those commonalities.

• Association: Association rule learning is an unsupervised learning
method used for finding relationships between
variables in a large database. It determines the sets of items
that occur together in the dataset. Association rules make
marketing strategy more effective: for example, people
who buy item X (say, bread) also tend to purchase item Y
(butter or jam). A typical example of association rule learning is
market basket analysis.
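The bread-and-butter example can be made concrete with the two standard measures for association rules, support and confidence. The baskets below are hypothetical.

```python
# Hypothetical shopping baskets for a market basket analysis.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also
    contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

bread_butter_support = support({"bread", "butter"})    # 3 of the 5 baskets
bread_to_butter = confidence({"bread"}, {"butter"})    # 3 of the 4 bread baskets
```

Algorithms such as Apriori search for all item sets whose support and confidence exceed chosen thresholds, instead of checking one hand-picked rule.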

Unsupervised Learning algorithms:

Below is the list of some popular unsupervised learning algorithms:

• K-means clustering
• KNN (k-nearest neighbors); note that KNN classification and regression are supervised, since they predict from labeled examples
• Hierarchical clustering
• Anomaly detection
• Neural Networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition
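K-means, the first algorithm in the list, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below works on one-dimensional points for brevity; the data and starting centroids are hypothetical.

```python
def k_means(points, centroids, iterations=10):
    """Basic k-means on 1-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]      # unlabeled data
centroids, clusters = k_means(points, centroids=[0.0, 5.0])
```

No labels are used at any point: the two groups emerge purely from the distances between the values.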

Tools For Machine Learning

What is Python?

Python is a general-purpose, dynamically typed, high-level,
garbage-collected programming language. Source code is compiled to
bytecode, which is then interpreted. Python supports procedural,
object-oriented, and functional programming.
Features of Python:

• Easy to use and read
• Dynamically typed
• High-level
• Rich standard library

Python Libraries
→ NumPy

• NumPy is the fundamental package for numerical computation on
arrays in Python.

• Numerical computation: adding, subtracting, multiplying, and
accessing the elements of an array.

→ Pandas

• Pandas (Python data analysis) is a staple of the data science life cycle.

• Pandas is used to perform
exploratory data analysis (EDA), that is, understanding the data.

• EDA: max, min, average, sum, etc.

→ Matplotlib
• Matplotlib is one of the most popular Python packages used for
Data Visualization

• It is a library for making 2D Plots/ Graphs from data in arrays.

→ Seaborn

• Seaborn is a Python data visualization library based on the
Matplotlib library.

• It provides a high-level interface for drawing attractive and
informative statistical graphs.

→ Scikit-learn

• Scikit-learn (sklearn) is one of the most widely used libraries for
machine learning in Python.

• It provides machine learning and statistical modeling tools, including
classification, regression, clustering, and dimensionality reduction, via
a consistent interface in Python.

→ TensorFlow

• TensorFlow is an open-source machine learning framework.

• It is used for implementing machine learning and deep learning applications.


→ PyTorch

• PyTorch is an open-source machine learning library for Python.

• It is widely used for deep learning applications such as computer vision and natural language processing.

→ Keras

• Keras is an open-source deep learning framework for Python.

• Leading organizations like Google, Square, Netflix, Huawei, and
Uber are currently using Keras.

→ NLTK

• NLTK (Natural Language Toolkit) is a toolkit for natural language processing.

• It includes tools for processing, understanding, and generating text.

→ BeautifulSoup

• BeautifulSoup is a lightweight, easy-to-learn Python library for
parsing HTML and XML, and a highly effective way to programmatically
extract information from a single webpage at a time.

Challenges and Issues in Machine Learning

Machine learning (ML) is a powerful technology with many applications, but it
also comes with various challenges and issues. Here are some of the primary
challenges:

1. Data Quality and Quantity:


• Quality Issues: Inconsistent, incomplete, or noisy data can significantly
impact model performance. Cleaning and preprocessing data is time-
consuming and requires expertise.
• Quantity Issues: Many ML algorithms require large datasets to train
effectively. Obtaining sufficient data can be challenging, particularly in
specialized domains.

2. Overfitting and Underfitting


• Overfitting: When a model learns the training data too well, including the
noise, it performs poorly on new, unseen data. This is often due to a model
being too complex relative to the amount of training data.
• Underfitting: When a model is too simple, it fails to capture the underlying
patterns in the data, resulting in poor performance both on training and new
data.

3. Computational Complexity
• Resource Intensity: Training complex models, especially deep learning
models, requires significant computational resources. This includes powerful
hardware (e.g., GPUs) and time, which can be expensive.
• Scalability: Handling large-scale data and real-time processing can be
difficult, necessitating distributed computing frameworks and efficient
algorithms.

4. Ethical and Bias Issues


• Bias in Data: If the training data contains biases, the model will learn and
perpetuate these biases, leading to unfair or discriminatory outcomes.
• Ethical Concerns: ML applications can raise ethical issues related to privacy,
surveillance, and the potential misuse of technology. Ensuring ethical
standards and fairness is an ongoing challenge.

5. Data Privacy and Security


• Privacy: Collecting and using personal data raises significant privacy
concerns. Ensuring compliance with regulations like GDPR and CCPA is
crucial.
• Security: ML models are vulnerable to attacks such as adversarial attacks,
where small changes to input data can drastically alter the model's output.

6. Deployment and Maintenance


• Model Deployment: Moving models from development to production
environments can be complex. Issues include integrating with existing
systems, ensuring scalability, and maintaining performance.
• Model Maintenance: ML models can degrade over time as data distributions
change (data drift). Regular monitoring and retraining are required to
maintain performance.

7. Expertise and Skill Gap


• Lack of Skilled Professionals: Developing and deploying effective ML
models requires expertise in data science, statistics, and domain-specific
knowledge. There is a high demand for skilled professionals, creating a gap
in the workforce.

8. Cost
• Development and Maintenance Costs: Developing ML solutions can be
expensive due to the need for specialized hardware, software, and skilled
personnel. Ongoing maintenance and updates also add to the cost.
Measures to Reduce the Above Challenges and Issues

1. Data Quality and Quantity


• Data Cleaning and Preprocessing: Use automated tools and frameworks like
Pandas, Dask, or Apache Spark to clean and preprocess data efficiently.
Techniques such as imputation for missing values, normalization, and
standardization are essential.

• Data Augmentation: For image data, techniques like rotation, scaling, and
flipping can generate more training samples. For text data, synonym
replacement and back-translation are useful.
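Normalization and standardization, two of the preprocessing techniques named above, can be sketched as follows. The income values are hypothetical, and both functions assume the values are not all identical (otherwise the divisor would be zero).

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Shift and scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

incomes = [30000, 41000, 48000, 52000]     # hypothetical feature column
normalized = min_max_normalize(incomes)
standardized = standardize(incomes)
```

Both put features on comparable scales, which helps algorithms that are sensitive to feature magnitude, such as gradient-based or distance-based methods.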

2. Overfitting and Underfitting


• Regularization: Techniques such as L1/L2 regularization, dropout in neural
networks, and pruning decision trees can help prevent overfitting.

• Cross-Validation: Use k-fold cross-validation to check that the model
generalizes well to unseen data.
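A minimal sketch of k-fold cross-validation: the data is cut into k folds, and each fold in turn serves as the validation set while the model is fitted on the rest. The helper names, the toy through-the-origin model, and the data are hypothetical; scikit-learn provides `KFold` and `cross_val_score` in `sklearn.model_selection` for real use.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Average validation score over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)

def fit_slope(xs, ys):
    """Toy model: least-squares slope of a line through the origin."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def mean_abs_error(slope, xs, ys):
    return sum(abs(slope * x - y) for x, y in zip(xs, ys)) / len(xs)

xs = [float(i) for i in range(1, 11)]
ys = [2.0 * x for x in xs]                 # hypothetical noise-free data
cv_error = cross_validate(xs, ys, 5, fit_slope, mean_abs_error)
```

The folds here are taken in order, so real data should be shuffled first if its order carries any pattern.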

3. Computational Complexity
• Dimensionality Reduction: Techniques like Principal Component Analysis
(PCA) can reduce the computational load by decreasing the
number of features. t-SNE also reduces dimensionality but is used mainly
for visualization.

• Efficient Algorithms: Use more efficient algorithms and hardware
acceleration (e.g., GPUs, TPUs) for training models.

4. Ethical and Bias Issues


• Bias Mitigation: Use techniques such as re-sampling the dataset, applying
fairness-aware algorithms, and auditing models for bias.

• Ethical Guidelines: Establish clear ethical guidelines and conduct
regular ethical reviews of ML projects.

5. Data Privacy and Security


• Privacy-Preserving Techniques: Use techniques like differential privacy and
federated learning to protect user data.
• Compliance: Ensuring compliance with data protection regulations like
GDPR and CCPA.

6. Deployment and Maintenance


• Model Monitoring: Continuously monitor model performance in production
to detect and address data drift and model degradation.
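One simple way to monitor for data drift is to compare recent production inputs against a reference sample from training time. The check below flags a shift in the mean; the function name, the threshold of 3, and the data are assumptions of this sketch, and it is not a complete drift test (it ignores changes in spread or shape).

```python
import statistics

def drift_detected(reference, recent, threshold=3.0):
    """Flag drift when the recent mean is far from the reference mean,
    measured in standard errors of the recent sample mean."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    standard_error = ref_std / len(recent) ** 0.5
    z = abs(statistics.mean(recent) - ref_mean) / standard_error
    return z > threshold

# Hypothetical feature values seen during training.
reference = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]

stable_batch = [10, 9, 11, 10]       # recent inputs that resemble the training data
shifted_batch = [14, 15, 13, 16]     # recent inputs whose distribution has moved

stable_alert = drift_detected(reference, stable_batch)
shifted_alert = drift_detected(reference, shifted_batch)
```

When the alert fires, the usual response is to investigate the input data and, if the shift is real, retrain the model on more recent data.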

7. Expertise and Skill Gap


• Training and Education: Investing in training programs and workshops for
employees to build their ML skills.
• Collaborative Projects: Engaging in collaborative projects with academic
institutions or industry partners to bridge the skill gap.

8. Cost
• Cost-Effective Tools: Leveraging open-source tools and cloud-based ML
services to reduce costs.
