AI Lab6
Practical No. 6
To understand the basics of Machine Learning
Student’s Roll no: _______________ Points Scored: __________________________
OBJECTIVES: Upon successful completion of this practical, the students will be able to:
Understand machine learning and its basic concepts.
Understand types of machine learning.
Build a machine learning model.
Machine Learning
Machine Learning allows systems to make decisions autonomously, without external
support. These decisions become possible once the machine has learned from
the data and captured the underlying patterns contained within it.
Machine Learning means making the computer learn by studying data and statistics.
Machine Learning is a step in the direction of artificial intelligence (AI).
A Machine Learning program analyses data and learns to predict an outcome.
Supervised Learning
Supervised machine learning can be classified into two types of problems, which are given below:
o Classification
o Regression
a) Classification
Classification algorithms are used to solve classification problems, in which the output variable
is categorical, such as Yes or No, Male or Female, Red or Blue, etc. A classification algorithm
predicts which of the categories present in the dataset a new observation belongs to. Some
real-world applications of classification are spam detection, email filtering, etc.
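The idea of predicting a category can be sketched in a few lines. This is only an illustration: it assumes scikit-learn is installed, and the choice of the bundled iris dataset and of LogisticRegression is ours, not something this handout prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load a small labelled dataset: 150 flowers, 4 features, 3 species.
iris = load_iris()
X, y = iris.data, iris.target

# Fit a classifier; max_iter is raised so the solver has room to converge.
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# Predict the category (species) of the first sample.
predicted_index = clf.predict(X[:1])[0]
print(iris.target_names[predicted_index])
```

The output is one of the category names in the dataset, not a number on a continuous scale.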
b) Regression
Regression algorithms are used to solve regression problems, in which the output variable is
continuous and depends on the input variables; the relationship may be linear, but it does not
have to be. They are used to predict continuous quantities, such as market trends, weather
prediction, etc.
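A minimal sketch of predicting a continuous value is given below. The toy data (hours studied against exam score) is hypothetical and was made exactly linear so the result is easy to check; LinearRegression is just one possible regression algorithm.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: hours studied (input) and exam score (continuous output).
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
scores = np.array([52.0, 54.0, 56.0, 58.0, 60.0])  # exactly 2 * hours + 50

reg = LinearRegression()
reg.fit(hours, scores)

# Predict a continuous value for an unseen input (6 hours).
predicted = reg.predict(np.array([[6.0]]))[0]
print(round(predicted, 2))
```

Because the toy data follows 2 * hours + 50 exactly, the prediction for 6 hours is 62 up to floating-point rounding.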
Unsupervised Learning
Unsupervised learning differs from supervised learning in that, as its name suggests, no
supervision is needed. In unsupervised machine learning, the machine is trained on an
unlabeled dataset and produces its output without any supervision.
The models are trained with data that is neither classified nor labelled,
and the model acts on that data without any guidance.
The main aim of an unsupervised learning algorithm is to group or categorize the unsorted dataset
according to similarities, patterns, and differences. The machine is instructed to find the hidden
patterns in the input dataset.
Unsupervised Learning can be further classified into two types, which are given below:
o Clustering
o Association
1) Clustering
The clustering technique is used when we want to find the inherent groups in the data. It is a
way to group objects into clusters such that the objects with the most similarities remain in one
group and have few or no similarities with the objects of other groups. An example application
of clustering is grouping customers by their purchasing behavior.
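The customer-grouping example can be sketched as follows. The customer figures are made up for illustration, and k-means (KMeans in scikit-learn) is only one of several clustering algorithms that could be used here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customers: [visits per month, average spend per visit].
customers = np.array([
    [2, 10], [3, 12], [2, 11],      # infrequent, low spend
    [20, 95], [22, 100], [21, 98],  # frequent, high spend
])

# Ask for two groups; no labels are supplied, only the raw features.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)
print(labels)
```

The first three customers receive one cluster label and the last three the other. Which group is numbered 0 and which 1 is arbitrary.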
2) Association
Association rule learning is an unsupervised learning technique that finds interesting relations
among variables within a large dataset. The main aim of this learning algorithm is to find the
dependency of one data item on another and map those items accordingly, so that the
relationships can be exploited, for example to increase profit. It is mainly applied in Market
Basket analysis, Web usage mining, continuous production, etc.
Some popular algorithms for Association rule learning are the Apriori algorithm and the FP-growth algorithm.
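The dependency of one item on another is usually measured with support and confidence. The sketch below computes both by hand on a hypothetical set of shopping baskets. It is not an implementation of Apriori or FP-growth; it only shows the quantities that such algorithms search over.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support)     # 2 of 4 baskets contain both bread and milk
print(rule_confidence)  # 2 of the 3 baskets with bread also contain milk
```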
Semi-Supervised Learning
Semi-Supervised learning is a type of Machine Learning that lies between Supervised and
Unsupervised machine learning. It represents the intermediate ground between Supervised (with
labelled training data) and Unsupervised (with no labelled training data) algorithms and
uses a combination of labelled and unlabeled data during the training period.
Although Semi-supervised learning is the middle ground between supervised and unsupervised
learning and operates on data that has a few labels, the data is mostly unlabeled.
Labels are costly to obtain, so in practice, for example in a corporate setting, only a few
labelled examples may be available. It differs from purely supervised and purely unsupervised
learning, which are defined by the presence and the absence of labels respectively.
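A rough sketch of training on mostly unlabeled data is shown below. It assumes scikit-learn's LabelPropagation, which treats a label of -1 as "unlabelled"; the 20% labelling rate and the random seed are arbitrary choices made for the illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelPropagation

X, y = load_iris(return_X_y=True)

# Keep roughly 20% of the labels; mark the rest as unlabelled with -1.
rng = np.random.RandomState(0)
y_partial = y.copy()
unlabelled = rng.rand(len(y)) > 0.2
y_partial[unlabelled] = -1

# The model sees all the features but only the few remaining labels.
model = LabelPropagation()
model.fit(X, y_partial)

# transduction_ holds the label inferred for every training sample.
inferred = model.transduction_
accuracy_on_unlabelled = (inferred[unlabelled] == y[unlabelled]).mean()
print(accuracy_on_unlabelled)
```

The true labels in y are used here only to check how well the hidden labels were recovered; the model itself never sees them.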
Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (a software
component) automatically explores its surroundings by trial and error: taking actions, learning
from experience, and improving its performance. The agent is rewarded for each good action and
punished for each bad action; hence the goal of a reinforcement learning agent is to maximize
its rewards. In reinforcement learning there is no labelled data as in supervised learning, and
agents learn from their experiences only.
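The reward-driven loop can be sketched with a very small example. The two-action environment and its reward probabilities are hypothetical, and the epsilon-greedy strategy used here is one simple way to balance exploring and exploiting, not the only one. For simplicity a bad outcome gives a reward of 0 rather than an explicit punishment.

```python
import random

random.seed(0)

# Hypothetical environment: two actions with different chances of a reward.
reward_probability = {"left": 0.2, "right": 0.8}

value = {"left": 0.0, "right": 0.0}  # the agent's running reward estimates
count = {"left": 0, "right": 0}      # how often each action was taken
epsilon = 0.1                        # fraction of steps spent exploring

for step in range(2000):
    # Explore occasionally; otherwise exploit the best-known action.
    if random.random() < epsilon:
        action = random.choice(list(value))
    else:
        action = max(value, key=value.get)

    # Reward of 1 for a good outcome, 0 otherwise.
    reward = 1 if random.random() < reward_probability[action] else 0

    # Update the running average of the rewards seen for this action.
    count[action] += 1
    value[action] += (reward - value[action]) / count[action]

best_action = max(value, key=value.get)
print(best_action, value)
```

No labelled examples are given to the agent; its estimates in value come only from the rewards it experienced.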
A dataset is nothing but a collection of data. A dataset generally has two main components:
Features (also known as predictors, inputs, or attributes): these are simply the variables of our
data. There can be more than one, and hence they are represented by a feature matrix (‘X’ is a
common notation for the feature matrix). A list of all the feature names is termed feature names.
Response (also known as the target, label, or output): this is the output variable, which depends
on the feature variables. We generally have a single response column, and it is represented by a
response vector (‘y’ is a common notation for the response vector). All the possible values taken
by a response vector are termed target names.
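Loading exemplar dataset: scikit-learn comes loaded with a few example datasets, such as the iris
and digits datasets for classification and the Boston house prices dataset for regression. Note
that the Boston loader (load_boston) was removed in scikit-learn 1.2, so it is only available in
older versions.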
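Loading a bundled dataset and inspecting its features and response can be sketched as follows, using the iris dataset mentioned above. The variable names X and y follow the notation introduced earlier.

```python
from sklearn.datasets import load_iris

# Load the bundled iris dataset.
iris = load_iris()

X = iris.data    # feature matrix
y = iris.target  # response vector

print(iris.feature_names)  # names of the feature columns
print(iris.target_names)   # possible values of the response
print(X.shape, y.shape)    # 150 samples with 4 features; 150 responses
```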
Loading external dataset: Now, consider the case when we want to load an external dataset. For
this purpose, we can use the pandas library for easily loading and manipulating datasets.
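A minimal sketch of loading external data with pandas is given below. The CSV content and the column names are hypothetical, and the data is read from an in-memory string so the example is self-contained; with a real file, its path would be passed to read_csv instead.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would be a file such as "data.csv".
csv_text = """height,weight,label
150,50,A
160,60,B
170,70,A
"""

df = pd.read_csv(io.StringIO(csv_text))

X = df[["height", "weight"]]  # feature matrix
y = df["label"]               # response vector

print(df.head())
print(X.shape, y.shape)
```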
Step 2: Splitting the dataset
o Split the dataset into two pieces: a training set and a testing set.
o Train the model on the training set.
o Test the model on the testing set, and evaluate how well our model did.
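The three steps above can be sketched as follows. The iris dataset, the 80:20 split, the random seed, and the k-nearest-neighbours classifier are all choices made for this illustration; any dataset and model could take their place.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Split the dataset: 80% for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Train the model on the training set only.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Test the model on the testing set and evaluate how well it did.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(X_train.shape, X_test.shape)
print(accuracy)
```

Fixing random_state makes the split reproducible, so the same rows land in the training and testing sets on every run.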
Lab Tasks
2. Differentiate between classification and regression with suitable examples. Also list ML
algorithms that can be used for classification and regression.
4. Load the Boston house prices dataset from sklearn and create a train-test split in an 80:20
ratio. Add code and output screenshot here.
The End