
Unit 3

Uploaded by Kranium A
Copyright © All Rights Reserved

SCET Ownership Course: DATA SCIENCE

UNIT - III
AI Vs. ML

Artificial Intelligence

What is AI?
John McCarthy:
Artificial Intelligence is concerned with the design of intelligence in an artificial device.

The term was coined by McCarthy in the 1955 proposal for the 1956 Dartmouth workshop.

There are two ideas in the definition.

1. Intelligence
2. Artificial device
The Turing Test
(Can machines think? A. M. Turing, 1950)

• Passing the test requires:
– Natural language processing
– Knowledge representation
– Automated reasoning
– Machine learning
– Computer vision and robotics, for the full test
Agents

An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators/effectors.

Ex: human being, calculator, etc.

An agent has a goal: the objective that the agent has to satisfy.

Actions can potentially change the environment.

An agent perceives the current percept or a sequence of percepts.

(Figure: agent, sensors, actuators and environment)

• Human agent: eyes, ears, and other organs used as sensors; hands, legs, mouth, and other body parts used as actuators/effectors

• Robotic agent:
– Sensors: cameras (picture analysis), infrared range finders, solar sensors
– Actuators: various motors, speakers, wheels

• Software agent (softbot):
– Functions act as sensors
– Functions act as actuators
– Ex. Askjeeves.com, google.com
Example: Vacuum Cleaner Agent

• Percepts: location and contents, e.g., [A, Dirty]
• Actions: Left, Right, Suck, NoOp
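The vacuum-cleaner world is small enough to write the agent out directly. Below is a minimal sketch of a simple reflex agent that maps each percept [location, status] straight to one of the four actions above. It assumes the classic two-square world from the Russell and Norvig textbook, with locations "A" and "B"; the second square "B" and the "move to the other square when clean" rule are assumptions, not stated on the slide.

```python
def reflex_vacuum_agent(percept):
    """Map a percept [location, status] directly to an action.

    Assumes a two-square world with locations "A" and "B".
    """
    location, status = percept
    if status == "Dirty":
        return "Suck"          # clean the current square first
    if location == "A":
        return "Right"         # square A is clean: move to B
    if location == "B":
        return "Left"          # square B is clean: move to A
    return "NoOp"              # unknown location: do nothing


# Usage with the percept [A, Dirty] from the slide
print(reflex_vacuum_agent(["A", "Dirty"]))
print(reflex_vacuum_agent(["A", "Clean"]))
```

A reflex agent like this looks only at the current percept; an agent that kept the percept sequence could, for example, stop with NoOp once it knows both squares are clean.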

8/7/2021 Unit -1 Introduction


AI Vs. ML

INTRODUCTION

How a computer works?
Cntd..

To solve a problem on a computer, we need an algorithm.

An algorithm is a sequence of instructions that should be carried out to transform the input to output.
Ex. Sorting: input: a set of numbers; output: an ordered list.

For some tasks, however, we do not have an algorithm, and this is where machine learning comes in.
Ex. Telling spam emails from legitimate emails.
Input: an email document (a file of characters); output: a yes/no answer indicating whether the message is spam or not.
We want the computer (machine) to extract the algorithm for this task automatically from examples.
Cntd..

(Source: https://medium.com/analytics-vidhya/introduction-to-machine-learning-e1b9c055039c)

• Machine learning is a "field of study that gives computers the ability to learn without being explicitly programmed." (Arthur Samuel, 1959)
• In other words, it is concerned with the question of how to construct computer programs that automatically improve with experience.
Cntd..
• A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E. (Tom M. Mitchell)

• Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.

• Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
Cntd..
Example 1
Classify email as spam or not spam
• Task (T): classify emails as spam or not spam
• Experience (E): watching the user mark/label emails as spam or not spam
• Performance (P): the number or fraction of emails correctly classified as spam or not spam
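To make T, E and P concrete, here is a deliberately tiny sketch: it learns from emails labelled by the user (E) which words occur in spam and non-spam, uses those counts to classify new emails (T), and reports the fraction classified correctly (P). The word-count scoring rule and the example emails are hypothetical illustrations chosen for readability, not a real spam filter.

```python
from collections import Counter

def train(emails, labels):
    """Experience E: count words separately in spam and non-spam emails."""
    spam_words, ham_words = Counter(), Counter()
    for text, is_spam in zip(emails, labels):
        words = text.lower().split()
        (spam_words if is_spam else ham_words).update(words)
    return spam_words, ham_words

def classify(model, text):
    """Task T: call an email spam if its words occur more often in spam."""
    spam_words, ham_words = model
    words = text.lower().split()
    spam_score = sum(spam_words[w] for w in words)
    ham_score = sum(ham_words[w] for w in words)
    return spam_score > ham_score

def performance(model, emails, labels):
    """Performance P: fraction of emails classified correctly."""
    correct = sum(classify(model, t) == y for t, y in zip(emails, labels))
    return correct / len(emails)

# Hypothetical emails labelled by the user (True = spam)
train_emails = ["win a free prize now", "free money click now",
                "meeting agenda for monday", "lunch on monday"]
train_labels = [True, True, False, False]
model = train(train_emails, train_labels)

test_emails = ["claim your free prize", "monday meeting moved"]
test_labels = [True, False]
print(performance(model, test_emails, test_labels))
```

Adding more labelled emails (more experience E) changes the word counts, which is exactly the sense in which performance P can "improve with experience".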

Cntd..
Example 2
Recognizing hand-written digits/characters
• Task (T): recognizing hand-written digits
• Experience (E): watching the user mark/label hand-written digits with one of 10 classes (0-9), and identifying the underlying pattern
• Performance (P): the number or fraction of hand-written digits correctly classified
Why is Machine Learning Important?
• Human expertise does not exist
– navigating on Mars
– industrial/manufacturing control
– mass spectrometer analysis, drug design, astronomic discovery
• Human expertise exists but cannot be explained (black box), or the task cannot be defined well except by examples
– face/handwriting/speech recognition, recognizing people
– driving a car, flying a plane
• Relationships and correlations can be hidden within large amounts of data (e.g., stock market analysis)
• Environments change over time (e.g., routing on a computer network)
• The amount of knowledge available about certain tasks might be too large for explicit encoding by humans (e.g., medical diagnosis)
• New knowledge about tasks is constantly being discovered by humans, and it may be difficult to continuously re-design systems "by hand"
• Rapidly changing phenomena
– credit scoring, financial modeling
– diagnosis, fraud detection
• Need for customization/personalization
– personalized news reader
– movie/book recommendation
How does machine learning help us in daily life?
Social networking:
• Suggesting appropriate emojis, suggesting friend tags on Facebook, filters on Instagram, content recommendations and suggested followers on social media platforms are examples of how machine learning helps us in social networking.
Personal finance and banking solutions:
• Whether it's fraud prevention, credit decisions, or checking deposits on our smartphones, machine learning plays a role in all of them.
Commute estimation:
• Identifying the route to our selected destination, estimating the time required to reach that destination using different transportation modes, calculating traffic time, and so on are all done with machine learning.
Applications of Machine Learning
• Face detection
• Speech recognition
• Stock prediction
• Hand-written digit recognition
• Spam email detection
• Computational biology
• Machine translation
• Recommender systems
• Self-parking cars
• Guiding robots
• Airplane navigation systems
• Space exploration
• Medicine
• Supermarket chain
• Data mining
Examples…

Example 1: Hand-written digit recognition
Learn a classifier f(x) such that f : x → {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Input: training data, e.g. 500 samples

Example 2: Face detection
The input is an image. For face detection, the classes are [non-face, frontal-face, profile-face]; for face recognition, the classes are the people to be recognized, and the learning program should learn to associate face images with identities.

This problem is more difficult because there are more classes, the input image is larger, and a face is 3-dimensional, so differences in pose and lighting cause significant changes in the image. There may also be occlusion (blockage) of certain inputs; e.g., glasses may hide the eyes and eyebrows, and a beard may hide the chin.
Example 3: Spam detection
(Figure: an email placed in the Spam folder, with the note "It's similar to messages that were detected by our spam filters.")
• This is a classification problem
• The task is to classify email into spam/non-spam
• It requires a learning system, as the "enemy" (the spammer) keeps innovating
Example 4: Stock price prediction
• The task is to predict the stock price at a future date
• This is a regression task, as the output is continuous
Example: Weather prediction
Example: Medical diagnosis
❖ The inputs are the relevant information about the patient and the classes are the illnesses.
❖ The inputs contain the patient's age, gender, past medical history, and current symptoms.
❖ Some tests may not have been applied to the patient, and thus these inputs would be missing.
❖ Tests take time, may be costly, and may inconvenience the patient, so we do not want to apply them unless we believe that they will give us valuable information.
❖ In medical diagnosis, a wrong decision may lead to wrong or no treatment, and in cases of doubt it is preferable that the classifier reject the case and defer the decision to a human expert.
Example: Agriculture

A Crop Yield Prediction App in Senegal Using Satellite Imagery (video link)
https://www.youtube.com/watch?v=4OnBGkhA4jc&t=160s
Types of Learning
• Supervised learning
Supervised Learning:
The aim is to learn a mapping from the input to an output whose correct values are provided by a supervisor.
1. Classification: the data is labelled, meaning each example is assigned a class, for example spam/non-spam or fraud/non-fraud.
e.g. for a financial institution, the inputs to the classifier are savings and income, and the output is one of the classes high risk or low risk, based on a classification rule such as:
❑ if income > δ1 and savings > δ2 then low risk else high risk
2. Regression: the data is labelled with a real value rather than a class label.
e.g. the price of a stock over time.
e.g. predicting the price of a used car:
Input: brand, year, engine capacity, mileage and other information
Output: price of the car
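The two kinds of supervised output can be contrasted in a few lines. The sketch below writes the risk rule above as a function (the threshold values δ1 = 30000 and δ2 = 5000 are hypothetical), and fits a straight-line regression of used-car price on a single input, mileage, by ordinary least squares. Using only mileage is a simplifying assumption; the slide lists several inputs, and all numbers here are invented for illustration.

```python
# --- Classification: the rule-based risk classifier from the slide ---
def risk_class(income, savings, delta1, delta2):
    """Return 'low risk' if income > delta1 and savings > delta2, else 'high risk'."""
    if income > delta1 and savings > delta2:
        return "low risk"
    return "high risk"

# --- Regression: fit price = w * mileage + b by least squares ---
def fit_line(xs, ys):
    """Closed-form least-squares fit of a straight line y = w*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

# Hypothetical thresholds and customers: output is a class label
print(risk_class(income=45000, savings=8000, delta1=30000, delta2=5000))
print(risk_class(income=45000, savings=2000, delta1=30000, delta2=5000))

# Hypothetical used cars: mileage (thousands of km) -> price (thousands)
mileage = [10, 40, 70, 100]
price = [20.0, 17.0, 14.0, 11.0]
w, b = fit_line(mileage, price)
print(w, b)          # slope and intercept of the fitted line
print(w * 55 + b)    # predicted (continuous) price for 55 thousand km
```

In a learning setting the thresholds δ1 and δ2 would themselves be learned from labelled customers rather than fixed by hand.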
Unsupervised Learning
• Unsupervised learning
Example of Unsupervised learning
• Clustering
• Association
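Clustering groups unlabelled data by similarity alone. As one illustration, here is a minimal sketch of k-means on one-dimensional data; k-means is an assumed choice of clustering algorithm (the slide does not name one), and the spending figures and starting centers are hypothetical. Association is not covered by this sketch.

```python
def kmeans_1d(points, centers, iterations=10):
    """Cluster 1-D points around the given initial centers (plain k-means)."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, clusters

# Hypothetical unlabelled data, e.g. monthly spend of eight customers
spend = [12, 15, 14, 10, 95, 100, 102, 98]
centers, clusters = kmeans_1d(spend, centers=[0, 50])
print(centers)
print(clusters)
```

No labels are given anywhere: the two groups (low spenders and high spenders) emerge from the data itself.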

Example of Semi-supervised learning

Reinforcement Learning
• Learning from mistakes
• Place a reinforcement learning algorithm into any environment and it will make a lot of mistakes in the beginning.
• If we provide some sort of signal to the algorithm that associates good behaviours with a positive signal and bad behaviours with a negative one, we can reinforce the algorithm to prefer good behaviours over bad ones.
• Over time, the learning algorithm learns to make fewer mistakes than it used to.
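The reward-signal idea can be seen in the simplest possible setting. The sketch below is an epsilon-greedy agent on a multi-armed bandit (a single state with several actions); this is a substituted, simplified example, not an algorithm named on the slides. Each action returns +1 (good behaviour) or -1 (bad behaviour) with a hidden probability, and the agent learns to prefer the action with the best average reward. The probabilities, epsilon and step count are hypothetical.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learning on a multi-armed bandit.

    Each action gives reward +1 with its own hidden probability,
    otherwise -1. The agent keeps a running average reward per action.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    value = [0.0] * n_actions   # estimated average reward of each action
    count = [0] * n_actions     # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                        # explore
        else:
            action = max(range(n_actions), key=lambda a: value[a])   # exploit
        # Environment's signal: positive for good behaviour, negative for bad
        reward = 1 if rng.random() < reward_probs[action] else -1
        count[action] += 1
        # Incremental average: nudge the estimate toward the observed reward
        value[action] += (reward - value[action]) / count[action]
    return value, count

values, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(counts)), key=lambda a: counts[a])
print(best, counts)
```

Early on the agent tries poor actions and collects negative rewards; as its estimates improve it chooses the third action (index 2) most of the time, which is the "fewer mistakes over time" behaviour described above.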
Where is reinforcement learning used in the real world?
• Video games
• Industrial simulation
• Resource management
Key Elements of Machine Learning

• There are tens of thousands of machine learning algorithms, and hundreds of new algorithms are developed every year.
• Every machine learning algorithm has three components:
1. Representation: how to represent knowledge.
Examples include decision trees, sets of rules, instances, graphical models, neural networks, support vector machines, model ensembles and others.
2. Evaluation: the way to evaluate candidate programs (hypotheses).
Examples include accuracy, precision and recall, squared error, likelihood, posterior probability, cost, margin, entropy, K-L divergence and others.
3. Optimization: the way candidate programs are generated, known as the search process.
For example, combinatorial optimization, convex optimization, constrained optimization.
• All machine learning algorithms are combinations of these three components.
• A framework for understanding all algorithms.
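The three components can be seen side by side in one small learner. In the sketch below, the representation is a linear model, the evaluation is mean squared error, and the optimization is gradient descent; these particular choices, the learning rate and the data (generated from y = 2x + 1) are assumptions for illustration, picked from the example lists above.

```python
# Representation: a linear model y_hat = w * x + b (the hypothesis space)
def predict(w, b, x):
    return w * x + b

# Evaluation: mean squared error of a candidate hypothesis (w, b)
def mse(w, b, xs, ys):
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Optimization: gradient descent searches for the (w, b) with the lowest MSE
def gradient_descent(xs, ys, lr=0.01, epochs=5000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n  # dMSE/dw
        grad_b = 2 * sum(errors) / n                             # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3), round(mse(w, b, xs, ys), 6))
```

Swapping any one component (for example, a decision tree as the representation, or accuracy as the evaluation) gives a different learning algorithm, which is the point of the framework.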

Aspects of developing a learning system:
training data, concept representation, function approximation
• For training and testing our model, we need to split the dataset into three distinct datasets: a training set, a validation set and a test set
• Training set:
– A set of data used to train the model
– It is used to fit the model
– The model sees and learns from this data
– Later on, the trained model can be deployed and used to make accurate predictions on new data that it has not seen before
– Labeled data is used
Validation set

• The validation set is a set of data separate from the training data
• It is used to validate our model during training
• It gives information which is used for tuning the model's hyperparameters
• It helps ensure that our model is not overfitting to the data in the training set
• Labeled data is used
Test Set
• A set of data used to test the model
• The test set is separate from both the training set and the validation set
• Once the model has been trained and validated using the training and validation sets, it is used to predict the output for the data in the test set
• Labeled data is used: the labels are not shown to the model, but they are needed to measure how well its predictions match the true outputs
Data Split

(Figure: the dataset divided into Train | Validation | Test)

• Rules for performing the data split:
– To avoid correlation between the splits (e.g., when the data is stored sorted by class), the original dataset must be randomly shuffled before splitting
– All splits must represent the original distribution
– The split is most often 60% for training, 20% for validation and 20% for testing
• With scikit-learn, this can be done using the train_test_split() function
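The shuffle-then-split rule can be written out by hand to show what happens. This is a minimal sketch of a 60/20/20 split of a hypothetical 100-sample dataset, using a fixed seed so the split is reproducible; the function name split_dataset is hypothetical and not part of any library.

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle, then split into train / validation / test (default 60/20/20)."""
    shuffled = list(samples)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # shuffle first to break any ordering
    n = len(shuffled)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]        # the remainder (about 20%) is the test set
    return train, val, test

data = list(range(100))  # hypothetical dataset of 100 samples
train, val, test = split_dataset(data)
print(len(train), len(val), len(test))
```

With scikit-learn, the same 60/20/20 split is commonly obtained by calling train_test_split twice: first with test_size=0.2 to hold out the test set, then with test_size=0.25 on the remaining 80% (0.25 × 80% = 20%). Its stratify argument keeps class proportions equal across splits, which addresses the "represent the original distribution" rule; the plain shuffle above does not guarantee that.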
Data Preparation

Data Preparation Pipeline


Why is Data Preparation important?

Sometimes data sets have missing or incomplete information, which leads to less accurate or incorrect predictions.
Further, sometimes data sets are clean but not adequately shaped (e.g., aggregated or pivoted), and some have little business context.
Hence, after collecting data from various data sources, data preparation is needed to transform the raw data.
Significant advantages of data preparation in machine learning are as follows:
• It helps to provide reliable prediction outcomes in various analytics operations.
• It helps identify data issues or errors and significantly reduces the chances of errors.
• It increases decision-making capability.
• It reduces overall project cost (data management and analytics cost).
• It helps to remove duplicate content, making the data worthwhile for different applications.
• It increases model performance.
Steps in Data Preparation Process

1. Understand the problem:
• Understand the actual problem before trying to solve it.
2. Data collection:
• Collect data from various potential sources. These data sources may be either within the enterprise or third-party vendors.
• Careful data collection helps reduce and mitigate bias in the ML model;
• hence, always check that the data set was collected from diverse people, geographical areas, and perspectives.
3. Profiling and data exploration:
• Explore the data for trends, outliers, exceptions, and incorrect, inconsistent, missing, or skewed information.
• Data exploration also helps to detect problems such as collinearity (strongly correlated features), a situation in which standardization of the data set and other data transformations are necessary.
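Collinearity can be spotted during exploration by measuring how strongly two features move together. This minimal sketch z-score standardizes each feature and computes the Pearson correlation as the mean product of the standardized values; the house-area features are hypothetical, and any cut-off for "too correlated" is a judgment call not given on the slide.

```python
import math

def standardize(values):
    """Z-score standardization: rescale to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

def pearson(xs, ys):
    """Pearson correlation = mean product of the standardized values."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical features: area in m^2 and in ft^2 carry the same information
area_m2 = [50, 80, 120, 65, 150]
area_ft2 = [538, 861, 1292, 700, 1615]
rooms = [2, 3, 3, 2, 5]
print(round(pearson(area_m2, area_ft2), 3))  # close to 1: collinear features
print(round(pearson(area_m2, rooms), 3))
```

A correlation near +1 or -1 suggests one of the two features is redundant and can be dropped or combined.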
4. Data cleaning and validation:
• Data cleaning and validation techniques help identify and resolve inconsistencies, outliers, anomalies, incomplete data, etc.
• Clean data helps to find valuable patterns and information in the data and to ignore irrelevant data in the datasets.
5. Data formatting:
• After cleaning and validating the data, the next step is to ensure that the data is correctly and consistently formatted.
6. Feature engineering and selection:
• Feature engineering is the process of selecting, manipulating, and transforming raw data into useful features.
There are various feature engineering techniques used in machine learning, as follows:
Imputation:
• Feature imputation is the technique of filling in incomplete fields in the datasets.
• It is essential because most machine learning models don't work when there is missing data in the dataset.
• The missing-values problem can be reduced by using techniques such as single-value imputation, multiple imputation, K-nearest-neighbor imputation, deleting the row, etc.
Encoding:
• Feature encoding is the method of converting string (categorical) values into numeric form.
• This is important because most ML models require all values to be numeric.
• Feature encoding includes label encoding and one-hot encoding (in pandas, one-hot encoding is done with get_dummies()).
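Both techniques are short enough to write by hand. The sketch below shows single-value (mean) imputation for a numeric column, and label and one-hot encoding for a categorical column; the column values are hypothetical, and None is assumed to mark a missing entry.

```python
def mean_impute(column):
    """Single-value imputation: replace None with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def label_encode(column):
    """Label encoding: map each distinct string to an integer code (sorted order)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [codes[v] for v in column], codes

def one_hot_encode(column):
    """One-hot encoding: one 0/1 indicator per distinct category."""
    categories = sorted(set(column))
    rows = [[1 if v == cat else 0 for cat in categories] for v in column]
    return rows, categories

ages = [25, None, 35, 40]                     # hypothetical column with a gap
cities = ["Pune", "Delhi", "Pune", "Mumbai"]  # hypothetical categorical column

print(mean_impute(ages))
print(label_encode(cities))
print(one_hot_encode(cities))
```

Label encoding gives the categories an artificial order (Delhi < Mumbai < Pune), so one-hot encoding is usually the safer choice for unordered categories; in pandas, pd.get_dummies(df, columns=[...]) produces the one-hot columns directly.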
Data Pre-Processing
(Figure: Data preprocessing)
1. Data cleaning
2. Data integration
3. Data transformation
4. Data reduction
5. Data discretization
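Of the five steps, discretization is the easiest to show in isolation. This minimal sketch uses equal-width binning, one common discretization method (an assumed choice; the slide does not specify a method), to turn hypothetical ages into three labelled ranges.

```python
def equal_width_bins(values, n_bins):
    """Discretize numbers into n_bins intervals of equal width; returns bin indices."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        bins.append(min(idx, n_bins - 1))  # the maximum value falls in the last bin
    return bins

ages = [5, 17, 25, 33, 48, 60, 71, 90]   # hypothetical ages
labels = ["young", "middle-aged", "senior"]
print([labels[b] for b in equal_width_bins(ages, 3)])
```

Equal-frequency binning (each bin holding the same number of values) is a common alternative when the data is skewed.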
Cntd..
• Data preparation is also known as "data pre-processing," "data wrangling," "data cleaning," and "feature engineering."
• It is a later stage of the machine learning lifecycle, which comes after data collection.
The data preparation process can be complicated by issues such as:
1. Missing or incomplete records: missing data sometimes appears as empty cells, placeholder values (e.g., NULL or N/A), or a particular character, such as a question mark
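Because missing data can hide behind several different markers, a useful first step is to map them all to one value and count the gaps per column. This sketch treats the markers named above (empty cell, NULL, N/A, "?") as missing; the table is hypothetical, and a real data set may use other markers that would need adding to the set.

```python
MISSING_MARKERS = {"", "NULL", "N/A", "?"}  # markers named on the slide; extend as needed

def normalize_missing(rows):
    """Replace any missing-value marker with None so all gaps look the same."""
    return [[None if str(v).strip() in MISSING_MARKERS else v for v in row]
            for row in rows]

def count_missing(rows, header):
    """Count missing (None) entries per column."""
    return {name: sum(row[i] is None for row in rows)
            for i, name in enumerate(header)}

header = ["age", "income", "city"]
raw = [["25", "50000", "Pune"],
       ["?", "62000", ""],
       ["41", "NULL", "Delhi"],
       ["N/A", "48000", "Mumbai"]]
clean = normalize_missing(raw)
print(count_missing(clean, header))  # {'age': 2, 'income': 1, 'city': 1}
```

Once all gaps are None, they can be handled uniformly, for example by imputation or by deleting the affected rows.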

Cntd..
2. Outliers or anomalies: unexpected values
• ML algorithms are sensitive to the range and distribution of values when data comes from unknown sources.
• These values can spoil the entire machine learning training process and degrade the performance of the model.
• Hence, it is essential to detect these outliers or anomalies, for example through visualization techniques.
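Besides plotting, outliers can be flagged numerically. This sketch uses the interquartile-range rule (Tukey's fences, the same rule box plots use to draw their whiskers), which is an assumed choice rather than a method named on the slide; the readings are hypothetical. statistics.quantiles requires Python 3.8 or later.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings with one suspicious value
readings = [21, 22, 23, 22, 21, 24, 23, 22, 95]
print(iqr_outliers(readings))  # [95]
```

A flagged value is not automatically an error; whether to drop, cap or keep it is a domain decision.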

Cntd..
3. Unstructured data format:
• Data comes from various sources and needs to be extracted into a suitable format.
• Hence, before deploying an ML project, always consult with domain experts or import data from known sources.
4. Limited or sparse features/attributes:
• Whenever data comes from a single source, it contains limited features,
• so it is necessary to import data from various sources for feature enrichment, or to build multiple features in the datasets.
5. Understanding feature engineering:
• Feature engineering helps develop additional content in the ML models, increasing model performance and accuracy in predictions.