AIM:
Basic syntax of Python in terms of variables, conditional statements, loops, operators,
and functions.
Conditional statement
There are situations in real life when we need to perform some specific task, and
based on certain conditions, we decide what to do next.
If a simple block of code is to be executed when a condition holds true, the if
statement is used.
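Example (a minimal sketch; the variable and threshold are illustrative):
age = 20
if age >= 18:
    print("Eligible to vote")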
if-elif-else Statement in Python
The if-elif statement is a shortcut for an if..else chain. When using an if-elif
ladder, an else block is added at the end, which is executed if none of the preceding
if-elif conditions is true.
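Example (the marks and grade boundaries are illustrative):
marks = 72
if marks >= 90:
    print("Grade A")
elif marks >= 60:
    print("Grade B")
else:
    print("Grade C")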
Loops
In Python, loops are used to repeatedly execute a block of code. There are
two main types of loops in Python: for loop and while loop.
1. for Loop:
The for loop is typically used for iterating over a sequence (such as a list,
tuple, string, or range) or other iterable objects.
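Example (iterating over a list and a range; the values are illustrative):
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)

# Iterating over a range of numbers
for i in range(3):
    print(i)  # prints 0, 1, 2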
2. while Loop:
The while loop is used to repeatedly execute a block of code as long as the
specified condition is true.
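Example (counting with an illustrative condition):
count = 0
while count < 3:
    print(count)
    count += 1  # update the condition variable to avoid an infinite loop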
Function
A function is a named, reusable block of code that performs a specific task. The
general syntax is:
Example-
def function_name(parameter1, parameter2, ...):
    # Code to perform the task
    # ...
    # Optionally, return a value
    return result
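As a concrete sketch (the function name and values are illustrative):
def add(a, b):
    # Return the sum of the two arguments
    return a + b

result = add(3, 5)
print(result)  # prints 8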
Pandas:
➢ Pandas provides data structures like Series and DataFrame, which are designed
to handle and analyze structured data easily.
Example
import pandas as pd

# Creating a DataFrame
data = {'Name': ['John', 'Jane', 'Bob'],
        'Age': [25, 30, 22]}
df = pd.DataFrame(data)

# Performing operations on the DataFrame
mean_age = df['Age'].mean()
Scikit-learn (sklearn):
➢ Scikit-learn is a machine-learning library that provides simple and efficient
tools for data analysis and modelling.
Supervised Learning:
Supervised learning works with labeled data: the algorithm learns a mapping from inputs to
known outputs.
Regression Models: Used for predicting continuous outcomes. Examples include linear regression,
polynomial regression, and support vector regression.
Classification Models: Used for predicting categorical outcomes. Examples include logistic
regression, decision trees, random forests, and support vector machines.
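For illustration, a minimal scikit-learn sketch of one regression and one classification model
(the toy data is made up):
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3], [4]])       # one feature per sample

# Regression: predict a continuous target
y_cont = np.array([2.0, 4.1, 5.9, 8.2])
reg = LinearRegression().fit(X, y_cont)
print(reg.predict([[5]]))                # roughly 10

# Classification: predict a categorical target
y_cat = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y_cat)
print(clf.predict([[5]]))                # class 1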
Unsupervised Learning:
Unsupervised learning deals with unlabeled data. The algorithm tries to find patterns or structures
in the data without explicit guidance.
Clustering Models: Group similar data points together based on their features. Examples include
K-means clustering and hierarchical clustering.
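A minimal K-means sketch in scikit-learn (the toy data is made up):
import numpy as np
from sklearn.cluster import KMeans

# Two obvious groups of 2-D points (made up)
X = np.array([[1, 1], [1.2, 0.9], [0.8, 1.1],
              [5, 5], [5.1, 4.9], [4.8, 5.2]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # e.g. [0 0 0 1 1 1] (cluster indices)
print(kmeans.cluster_centers_)   # the two cluster centres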
Dimensionality Reduction Models: Reduce the number of features in the dataset while preserving
important information. Examples include principal component analysis (PCA).
Semi-supervised Learning:
Semi-supervised learning combines elements of both supervised and unsupervised learning. It uses
a small amount of labeled data along with a large amount of unlabeled data to improve learning
accuracy.
Semi-supervised Learning Models: These models utilize both labeled and unlabeled data for
training. Algorithms like self-training and co-training are examples of semi-supervised learning
approaches.
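As a self-training sketch, scikit-learn's SelfTrainingClassifier wraps a base classifier and
treats samples labeled -1 as unlabeled (the toy data is made up):
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Made-up 1-D data; -1 marks unlabeled samples
X = np.array([[0.0], [0.1], [0.2], [0.9], [1.0], [1.1], [0.15], [0.95]])
y = np.array([0, 0, 0, 1, 1, 1, -1, -1])

# The wrapper fits on the labeled points, then pseudo-labels
# confident unlabeled points and refits
model = SelfTrainingClassifier(LogisticRegression()).fit(X, y)
print(model.predict([[0.05], [1.05]]))   # expected: [0 1]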
Reinforcement Learning:
Reinforcement learning trains an agent to make decisions by interacting with an environment,
learning from rewards and penalties rather than from labeled examples.
Neural Network Models:
Convolutional Neural Networks (CNNs): Designed for processing structured grid data like images or
sequences. CNNs are commonly used in image recognition and computer vision tasks.
Recurrent Neural Networks (RNNs): Specialized for sequence data, RNNs have feedback connections
allowing them to process sequences of inputs. Long Short-Term Memory Networks (LSTMs) and Gated
Recurrent Units (GRUs) are variants of RNNs.
Generative Models: Generate new data samples that resemble the training data. Examples include
Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs).
Ensemble Learning Models:
Random Forests: Consist of multiple decision trees trained on different subsets of the data. The final
prediction is the aggregation of predictions from individual trees.
Gradient Boosting Machines (GBMs): Build a sequence of decision trees where each subsequent tree
corrects the errors made by the previous ones.
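A brief random-forest sketch in scikit-learn (the toy data is generated):
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# 100 trees, each trained on a bootstrap sample of the data;
# the final prediction aggregates the individual trees' votes
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:5]))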
Supervised Machine Learning Regression models:
Linear Regression models the relationship between the variables with a straight line:
y = β0 + β1X
where:
• y is the dependent variable
• X is the independent variable
• β0 is the intercept
• β1 is the slope
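A minimal sketch of fitting these coefficients with scikit-learn (the data is made up):
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])   # roughly y = 1 + 2X

model = LinearRegression().fit(X, y)
print(model.intercept_)   # estimate of β0, close to 1
print(model.coef_)        # estimate of β1, close to 2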
Support Vector Regression (SVR)
SVR was initially proposed by Drucker; it is a supervised learning technique based on
the concept of Vapnik's support vectors. SVR aims to reduce the error by determining the
hyperplane and minimising the range between the predicted and the observed values.
Picture two parallel boundary lines (the red lines in the original figure) as the decision
boundary and the line between them (the green line) as the hyperplane. The objective in SVR
is to consider the points that lie within the decision boundary. The best-fit line is the
hyperplane that contains the maximum number of points.
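A minimal scikit-learn sketch; epsilon sets the width of the tube around the hyperplane
(the data and parameter values are illustrative):
import numpy as np
from sklearn.svm import SVR

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])

# epsilon defines the tube around the hyperplane within which
# errors are ignored; points outside it become support vectors
svr = SVR(kernel='linear', epsilon=0.1).fit(X, y)
print(svr.predict([[6]]))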
Experiment – No. 8
AIM:
Dimensionality reduction (using PCA) and modelling (K-means) of a
dataset.
The main feature of unsupervised learning algorithms, when compared to classification and
regression methods, is that input data are unlabeled (i.e. no labels or classes given) and that the
algorithm learns the structure of the data without any assistance.
This creates two main differences. First, it allows us to process large amounts of data because the
data does not need to be manually labeled.
Second, it is difficult to evaluate the quality of an unsupervised algorithm due to the absence of
an explicit goodness metric as used in supervised learning.
One of the most common tasks in unsupervised learning is dimensionality reduction.
On one hand, dimensionality reduction may help with data visualization (e.g. the t-SNE method).
On the other hand, it may help deal with the multicollinearity of your data and prepare the data
for a supervised learning method (e.g. decision trees).
There are two types of Dimensionality Reduction techniques:
1. Feature Selection
2. Feature Extraction
PCA is used for:
➢ Noise filtering
➢ Visualization
➢ Feature Extraction
▪ Standardize the data.
▪ Obtain the Eigenvectors and Eigenvalues from the covariance matrix or correlation matrix, or
perform Singular Value Decomposition.
▪ Sort eigenvalues in descending order and choose the k eigenvectors that correspond to the k
largest eigenvalues, where k is the number of dimensions of the new feature subspace.
▪ Construct the projection matrix W from the selected k eigenvectors and use it to transform
the original dataset into the new k-dimensional feature subspace.
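A sketch of these steps with NumPy on made-up data, followed by K-means on the reduced
features (scikit-learn's PCA would give the same subspace up to sign):
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # made-up dataset: 200 samples, 5 features

# Standardize the data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenvectors and eigenvalues of the covariance matrix
cov = np.cov(X_std.T)
eig_vals, eig_vecs = np.linalg.eigh(cov)      # eigh suits symmetric matrices

# Sort eigenvalues in descending order and keep the k = 2 largest
order = np.argsort(eig_vals)[::-1]
k = 2
W = eig_vecs[:, order[:k]]                    # projection matrix (5 x 2)

# Transform the dataset into the new k-dimensional subspace
X_pca = X_std @ W

# Model the reduced data with K-means
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_pca)
print(kmeans.labels_[:10])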