Sample Assignment - Artificial Intelligence
DUBAI, U.A.E
NAME :
REGISTER NO :
DEPARTMENT :
UNIT NAME :
AL SHABAKA TECHNICAL INSTITUTIONAL ACADEMY
Any piece of student’s work submitted without a signed declaration will not be accepted for marking.
LEARNER DECLARATION
1) This assignment is the product of individual work.
2) I am aware of what plagiarism / collusion is and of the penalties I would suffer if I were found to have committed plagiarism / collusion.
3) The work submitted is the product of my original work and where material and ideas have been taken from the
published and unpublished work of others, reference to all original sources has been made in the text and via the
reference, bibliography or notes sections, or by some other means.
4) I adhere to the given time period and understand that any kind of late submission is not acceptable.
LEARNER SIGNATURE FOR RECEIVING THE ASSIGNMENT
(Signature should not exceed the box)
ASSIGNMENT BRIEF
Qualification Title DIPLOMA IN ARTIFICIAL INTELLIGENCE
SCENARIO
Company X is a telecommunications company that provides services such as internet, cable TV, and
phone lines. They are facing a challenge with customer churn, where customers are canceling their
subscriptions and moving to competitors. To address this issue, they decide to apply machine learning
techniques to predict customer churn and take proactive measures to retain customers.
TASK 1
ASSESSMENT CRITERIA P1.1
1.1) What is machine learning? Explain the different perspectives on, and issues in, machine learning.
TASK 2
ASSESSMENT CRITERIA P2.1
2.1) Describe the types of machine learning with examples.
2.3) Discuss how a multi-layer network learns using a gradient descent algorithm.
TASK 3
ASSESSMENT CRITERIA P3.1
ASSESSMENT CRITERIA P3.2
FOR TASK 1
Your evidence should be presented in the form of a report which contains a detailed introduction of each task.
Relevant and necessary diagrams should be presented in answers.
References must be included.
FOR TASK 2
Your evidence should be presented in the form of a report which contains a detailed introduction of each task.
Relevant and necessary diagrams should be presented in answers.
References must be included.
Task-1 Solution
1.1)
Machine Learning (ML) is a field of artificial intelligence (AI) that focuses on developing algorithms and
techniques that allow computer systems to learn from and make predictions or decisions based on data,
without being explicitly programmed. ML algorithms learn from data patterns and statistical relationships
to identify and generalize patterns, make predictions, or perform specific tasks.
Perspectives in machine learning:
a. Supervised Learning: In this perspective, the ML algorithm is trained on a labeled dataset, where each
data instance is associated with a known output or target. The goal is to learn a mapping between inputs
and outputs, enabling the algorithm to make accurate predictions or classifications on unseen data.
b. Unsupervised Learning: Here, the ML algorithm is exposed to an unlabeled dataset, without any
predefined outputs or targets. The algorithm discovers hidden patterns or structures in the data, such as
clusters or associations, to gain insights or extract meaningful representations.
c. Reinforcement Learning: This perspective involves an agent that learns to interact with an environment
to maximize a reward signal. The agent takes actions in the environment and receives feedback in the
form of rewards or penalties. Through trial and error, it learns to make optimal decisions and develop
strategies to achieve long-term goals.
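To make the reinforcement learning loop concrete, here is a minimal Python sketch: a three-armed bandit with an epsilon-greedy agent. The reward probabilities, exploration rate, and number of steps are arbitrary illustrations, not taken from the assignment brief.

# Tiny illustration of the reinforcement learning idea: epsilon-greedy action selection.
import numpy as np

rng = np.random.default_rng(0)
true_reward = np.array([0.2, 0.5, 0.8])   # unknown to the agent
estimates = np.zeros(3)                   # the agent's learned value of each action
counts = np.zeros(3)

for step in range(1000):
    # Explore with probability 0.1, otherwise exploit the best-known action.
    action = rng.integers(3) if rng.random() < 0.1 else int(np.argmax(estimates))
    reward = float(rng.random() < true_reward[action])   # stochastic 0/1 reward from the environment
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]   # incremental mean update

print(estimates)   # estimates move toward the true reward probabilities through trial and error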
Issues in machine learning:
a. Data Quality: ML algorithms heavily rely on high-quality data for training. Issues such as missing
values, outliers, or biased data can significantly impact the performance and fairness of the models.
Data preprocessing and cleaning techniques are employed to address these challenges.
b. Overfitting and Underfitting: Overfitting occurs when a model performs well on the training data but
fails to generalize to unseen data. It happens when the model captures noise or irrelevant patterns in the
training set. Underfitting, on the other hand, occurs when a model is too simple to capture the underlying
relationships in the data. Balancing model complexity is crucial to avoid these issues (a short sketch after this list illustrates the trade-off).
c. Bias and Fairness: ML models can inherit biases present in the training data, leading to discriminatory
or unfair outcomes. Addressing bias and ensuring fairness in ML models is an active area of research and
involves techniques such as data augmentation, algorithmic fairness, and careful evaluation of model
outputs.
d. Interpretability and Explainability: Many ML models, such as deep neural networks, can be complex
and difficult to interpret. Understanding the reasons behind model predictions or decisions is crucial,
especially in sensitive domains like healthcare or finance. Techniques like model-agnostic interpretation
or rule extraction aim to enhance interpretability.
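Returning to point b above, a small experiment on synthetic data makes the overfitting/underfitting trade-off visible. The dataset and tree depths below are arbitrary choices made purely for illustration:

# Comparing an underfitting (very shallow) and an overfitting-prone (unrestricted) decision tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, None):   # depth 1 tends to underfit; unlimited depth tends to overfit
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print(depth, tree.score(X_train, y_train), tree.score(X_test, y_test))

A large gap between training and test accuracy for the unrestricted tree signals overfitting; low accuracy on both signals underfitting.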
1.2)
Applications and Goals of Machine Learning:
a. Image and Speech Recognition: ML is used in applications like facial recognition, object detection, and
speech recognition. It enables systems to identify and understand visual or auditory data, leading to
advancements in fields like computer vision and natural language processing.
c. Fraud Detection: ML algorithms can analyze large volumes of financial transactions and detect
anomalous patterns or behaviors indicative of fraud. This helps in minimizing fraudulent activities in
sectors like banking, insurance, or e-commerce.
d. Healthcare and Medicine: ML plays a crucial role in medical image analysis, disease diagnosis, drug
discovery, and personalized medicine. It aids in identifying patterns or biomarkers in medical data,
predicting patient outcomes, and optimizing treatment plans.
e. Autonomous Vehicles: ML is vital for self-driving cars, enabling them to perceive the environment,
make decisions, and navigate safely. ML algorithms analyze sensor data to recognize objects, detect road
signs, and predict the behavior of other vehicles.
f. Natural Language Processing (NLP): ML techniques power NLP applications like machine translation,
sentiment analysis, chatbots, and voice assistants. These techniques help computers understand and generate human language.
1.3)
Designing a learning system involves several stages to ensure the development of an effective and
successful machine learning model. The following are the various stages typically involved in designing a
learning system:
Problem Definition: Clearly define the problem or task that the learning system aims to solve.
Identify the goals, objectives, and requirements of the system. This stage involves understanding the
problem domain, defining the target variables, and determining the available data.
Data Collection and Preparation: Gather the relevant data required for training and evaluation. This stage
involves identifying the data sources, collecting the data, and preprocessing it. Data preprocessing tasks
may include cleaning the data, handling missing values, removing outliers, and transforming the data into
a suitable format for training.
Data Exploration and Analysis: Explore the collected data to gain insights and a deeper understanding of
its characteristics. This stage involves analyzing the statistical properties of the data, visualizing data
distributions, identifying correlations, and conducting feature engineering, which involves selecting,
transforming, and creating appropriate features for training the model.
Model Selection: Choose the appropriate machine learning model or algorithm that suits the problem and
the available data. Consider factors such as the problem type (classification, regression, clustering, etc.),
the size of the dataset, and the desired model interpretability or complexity.
Model Training: Train the selected model on the prepared training data. This involves feeding the data
into the model and optimizing the model's parameters or weights based on a specific learning algorithm.
The training process aims to minimize the error or loss between the model's predictions and the actual
target values.
Model Evaluation: Assess the performance of the trained model to determine its effectiveness and
generalization capabilities. This stage involves evaluating the model's performance metrics, such as
accuracy, precision, recall, F1 score, or mean squared error, using appropriate evaluation techniques like
cross-validation or holdout validation. It helps in understanding how well the model performs on unseen
data.
Model Optimization and Tuning: Fine-tune the model to improve its performance or address any
identified issues. This stage involves optimizing hyperparameters, which are settings that control the
behavior of the model, to find the best configuration. Techniques like grid search or random search can be
used to systematically explore different hyperparameter combinations.
Deployment and Integration: Once the model is deemed satisfactory, it can be deployed and integrated
into the intended system or application. This stage involves creating APIs or interfaces to allow the model
to receive input data and provide predictions or decisions in real-time. Proper monitoring and
maintenance are crucial to ensure the model's continued performance and accuracy.
Iteration and Improvement: Machine learning systems are often iterative processes. Regularly monitor the
performance of the deployed model, collect feedback, and gather additional data to further refine the
model. Continuous iteration and improvement help adapt the model to changing conditions, improve
accuracy, and address any emerging challenges.
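The stages above can be compressed into a minimal end-to-end sketch for the churn scenario. The file name and column names (churn.csv, tenure_months, monthly_charges, churn) are hypothetical placeholders, and logistic regression is just one reasonable model choice:

# End-to-end sketch: collect, prepare, split, train, and evaluate a churn model.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("churn.csv")                                    # data collection (hypothetical file)
df = df.fillna({"tenure_months": df["tenure_months"].median()})  # simple data preparation step

X = df[["tenure_months", "monthly_charges"]]                     # illustrative features
y = df["churn"]                                                  # target: 1 = cancelled, 0 = retained

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # model selection and training
print(classification_report(y_test, model.predict(X_test)))      # model evaluation on unseen data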
Task-2 Solution
2.1)
Types of Machine Learning:
a. Supervised Learning: In supervised learning, the machine learning algorithm is trained on labeled
examples, where each input data instance is associated with a corresponding target output. The goal is to
learn a mapping between input variables and their corresponding outputs. For example, predicting
housing prices based on features like square footage, number of bedrooms, and location.
b. Unsupervised Learning: Unsupervised learning involves training the machine learning algorithm on
unlabeled data. The algorithm learns patterns, structures, or relationships in the data without specific
target labels. It aims to discover hidden patterns or groupings within the data. For example, clustering
similar customer groups based on their purchasing behavior without prior knowledge of their preferences.
e. Deep Learning: Deep learning involves training deep neural networks with multiple layers to learn
hierarchical representations of data. Deep learning has been particularly successful in tasks such as image
and speech recognition. For example, training a deep neural network to recognize and classify objects in
images.
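The supervised and unsupervised examples above can be sketched in a few lines of Python. The numbers here are synthetic and purely illustrative:

# Supervised learning: regression on labelled data. Unsupervised learning: clustering without labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Supervised example: predict price (in thousands) from square footage.
sqft = np.array([[800], [1000], [1500], [2000]])
price = np.array([100, 130, 190, 250])
reg = LinearRegression().fit(sqft, price)
print(reg.predict([[1200]]))   # predicted price for an unseen house

# Unsupervised example: group customers by two behavioural features, with no labels given.
spend = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 11.0]])
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(spend))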
2.2)
Support Vector Machines (SVM) are a popular machine learning technique used for classification and
regression tasks. SVMs are particularly effective when dealing with complex, high-dimensional datasets.
The main idea behind SVM is to find the optimal hyperplane that best separates the data points of
different classes. Here's an explanation of SVM with an example:
Let's consider a binary classification problem where we have two classes: Class A and Class B. We have
a dataset consisting of several data points, each described by two features, X1 and X2. The goal is to
create an SVM model that can accurately classify new data points into Class A or Class B.
First, we plot the data points on a 2D scatter plot, with X1 on the x-axis and X2 on the y-axis. The points
of Class A are represented by blue dots, and the points of Class B are represented by red dots.
The SVM algorithm aims to find the optimal hyperplane that maximizes the margin between the two
classes. The margin is the distance between the hyperplane and the nearest data points of each class. The
hyperplane that achieves the maximum margin is considered the best decision boundary.
The support vectors are the data points that lie closest to the hyperplane. These points are crucial for
defining the decision boundary and determining the hyperplane's position.
Once the optimal hyperplane is determined, we can use it to classify new, unlabeled data points. We
classify a new data point by checking which side of the hyperplane it lies on.
Example:
Sample X1 X2 Class
1 1 2 Class A
2 2 3 Class A
3 3 4 Class A
4 5 5 Class B
5 6 6 Class B
6 7 7 Class B
We can plot these samples on a scatter plot, with X1 on the x-axis and X2 on the y-axis. Based on the
data distribution, we can see that a line separating the two classes can be drawn roughly at X1 = 4.
Training a linear SVM on these samples places the separating hyperplane (a straight line in this two-dimensional case) in that gap, positioned so that the margin to the nearest points of each class is as large as possible. Those nearest points, here roughly samples 3 and 4, act as the support vectors that fix the position and orientation of the boundary.
Please note that the resulting boundary depends on various factors, including the data distribution, the kernel chosen, and the regularization settings of the specific SVM implementation used. It is important to consider the characteristics of, and relationships between, the attributes and classes in order to interpret the learned boundary sensibly.
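Because the six samples in the table are linearly separable, a short scikit-learn sketch can learn the separating hyperplane and classify a new point:

# Linear SVM sketch using the six samples from the table above.
from sklearn.svm import SVC

X = [[1, 2], [2, 3], [3, 4], [5, 5], [6, 6], [7, 7]]   # (X1, X2) pairs
y = ["A", "A", "A", "B", "B", "B"]                     # class labels

clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)    # the points that define the margin
print(clf.predict([[4, 4]]))   # classify a new, unseen sample by which side of the hyperplane it falls on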
2.3)
A multi-layer network, also known as a multi-layer perceptron (MLP), learns using the gradient descent
algorithm. Gradient descent is an iterative optimization algorithm used to minimize the error or loss
function of a neural network during the training process.
Here's how a multi-layer network learns using the gradient descent algorithm:
Initialization: Initially, the weights and biases of the network are randomly assigned or initialized to small
values.
Forward Propagation: In the forward propagation step, input data is fed into the network, and the
activations of each neuron are calculated layer by layer. Each neuron applies a weighted sum of its inputs,
followed by an activation function to produce an output.
Loss Calculation: After the forward propagation, the output of the network is compared to the expected
output using a loss function. The loss function quantifies the difference between the predicted output and
the desired output.
Backpropagation: The backpropagation algorithm is used to calculate the gradients of the loss function
with respect to the weights and biases of the network. The gradients are computed by
propagating the error backward through the network. This involves calculating the partial derivatives of
the loss function with respect to the weights and biases at each layer.
Gradient Descent Update: Once the gradients are computed, the weights and biases are updated to
minimize the loss function. The update is performed by subtracting a fraction of the gradient from the
current weights and biases. This fraction is determined by the learning rate, which controls the
step size of the update. The update equation for a weight parameter w is:
w_new = w_old - η · ∂L/∂w
where η is the learning rate and ∂L/∂w is the gradient of the loss L with respect to that weight.
Iterations: Steps 2 to 5 are repeated for a fixed number of iterations or until a convergence criterion is
met. Each iteration is called an epoch. During each epoch, the network processes a mini-batch or
individual samples from the training data and updates the weights and biases accordingly.
Model Evaluation: After training, the performance of the trained network is evaluated using a separate
validation set or by testing it on unseen data. This step helps to assess the generalization ability of the
network and avoid overfitting.
By iteratively adjusting the weights and biases through the backpropagation and gradient descent steps,
the multi-layer network gradually learns to approximate the desired mapping between the input data and
the output. The goal is to minimize the loss function, which leads to better predictions and improved
performance of the network on unseen data.
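The loop described above can be written out as a minimal NumPy sketch. The task (XOR), network size, learning rate, and epoch count below are arbitrary illustrations; constant factors of the loss gradient are folded into the learning rate:

# A tiny two-layer network trained with batch gradient descent on the XOR problem.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0.0, 0.5, (2, 4)), np.zeros(4)   # initialization: small random weights
W2, b2 = rng.normal(0.0, 0.5, (4, 1)), np.zeros(1)
lr = 0.5                                             # learning rate (step size)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(10000):
    h = sigmoid(X @ W1 + b1)                 # forward propagation, hidden layer
    out = sigmoid(h @ W2 + b2)               # forward propagation, output layer
    loss = np.mean((out - y) ** 2)           # loss calculation (mean squared error)
    d_out = (out - y) * out * (1 - out)      # backpropagation: gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)       # backpropagation: gradient at the hidden layer
    W2 -= lr * (h.T @ d_out)                 # gradient descent update of weights and biases
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)
    if epoch % 2000 == 0:
        print(epoch, loss)                   # the loss should decrease over the epochs

print(out.round(2))   # with enough epochs the outputs move toward [0, 1, 1, 0]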
Task-3 Solution
3.1)
Feature selection is the process of selecting a subset of relevant features or variables from a larger set of
available features. It is an important step in machine learning and data analysis because it helps to
improve model performance, reduce overfitting, and enhance interpretability. The goal of feature
selection is to identify the most informative and discriminative features that contribute the most to the
prediction or analysis task.
Here are some common techniques and approaches used in feature selection:
Filter Methods: These methods use statistical measures or scoring techniques to rank the features based
on their relevance to the target variable. Examples include correlation coefficient, mutual information,
and chi-square tests. Features are selected based on predefined criteria, such as selecting the top-k
features with the highest scores.
Wrapper Methods: Wrapper methods involve evaluating the performance of a machine learning model
using different subsets of features. It searches for an optimal subset by considering different combinations
and iteratively training and evaluating the model. Examples include forward selection, backward
elimination, and recursive feature elimination (RFE). These methods are computationally expensive but
can provide better feature subsets.
Embedded Methods: Embedded methods incorporate feature selection within the model building process.
They automatically select the most relevant features during model training. Examples include Lasso
regression, which performs both feature selection and regularization, and decision trees, which naturally
select features based on their importance in the tree structure.
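As a small illustration of the filter and wrapper approaches above, the following sketch uses synthetic data and scikit-learn; the number of features kept (3) is an arbitrary choice:

# A filter method (SelectKBest) and a wrapper method (RFE) side by side.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, n_informative=3, random_state=0)

filt = SelectKBest(score_func=f_classif, k=3).fit(X, y)   # filter: rank features by a statistical score
print(filt.get_support(indices=True))                     # indices of the kept features

rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3).fit(X, y)
print(rfe.get_support(indices=True))                      # wrapper: recursive feature elimination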
The choice of feature selection technique depends on the specific problem, dataset characteristics, and the
goals of the analysis. It is important to carefully consider the trade-offs between model complexity,
interpretability, and performance when selecting features. Additionally, feature selection should be
evaluated in conjunction with the chosen machine learning algorithm to ensure that the selected features
are relevant and improve the model's predictive power.
3.2)
Logistic regression is a statistical algorithm used for binary classification problems, where the goal is to
predict a binary outcome variable based on one or more predictor variables. It is a popular and widely
used algorithm in machine learning and statistics.
In logistic regression, the outcome variable is modeled as a function of the predictor variables using the
logistic function (also known as the sigmoid function). The logistic function maps any real-valued
number to a value between 0 and 1, which can be interpreted as the probability of the outcome being in
the positive class.
The logistic regression model assumes a linear relationship between the predictor variables and the log-
odds of the outcome variable. The log-odds (or logit) is the logarithm of the odds, where the odds are
the probability of the outcome occurring divided by the probability of the outcome not occurring.
The linear relationship is achieved by taking the dot product of the predictor variables and their
corresponding coefficients.
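In standard notation (a generic statement of the model, not tied to any particular dataset), with predictors x1 ... xn and coefficients b0 ... bn:
p(y = 1 | x) = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))
log(p / (1 - p)) = b0 + b1*x1 + ... + bn*xn
The sigmoid maps the linear combination to a probability, and the logit of that probability is linear in the predictors.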
The logistic regression model is trained by optimizing the coefficients (also known as weights or
parameters) to maximize the likelihood of the observed outcomes given the predictor variables.
This process is typically done using maximum likelihood estimation or gradient descent optimization
algorithms.
Once the model is trained, it can be used to make predictions by calculating the probability of the
outcome being in the positive class based on the predictor variables. A threshold can be chosen to convert
the probabilities into binary predictions.
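A short scikit-learn sketch of this workflow for the churn scenario is shown below. The toy feature values (tenure in months, monthly charge) and the 0.5 threshold are illustrative assumptions:

# Logistic regression probabilities converted to binary churn predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: [tenure in months, monthly charge]; y: 1 = churned, 0 = stayed.
X = np.array([[2, 90], [5, 80], [40, 30], [60, 25], [3, 85], [55, 20]])
y = np.array([1, 1, 0, 0, 1, 0])

model = LogisticRegression(max_iter=1000).fit(X, y)
probs = model.predict_proba(X)[:, 1]          # probability of the positive (churn) class
labels = (probs >= 0.5).astype(int)           # 0.5 is the default threshold; it can be tuned
print(model.coef_, model.intercept_)          # fitted coefficients (the log-odds weights)
print(labels)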
Advantages of logistic regression:
1. Simplicity: It is relatively simple and interpretable compared to more complex models like neural
networks.
2. Efficiency: It can handle large datasets efficiently and can be trained quickly.
3. Interpretability: The coefficients of logistic regression can provide insights into the relationships
between the predictor variables and the outcome.
4. Probability estimation: Logistic regression can provide probabilities as outputs, allowing for a more
nuanced understanding of the predictions.
Limitations of logistic regression:
1. Linearity assumption: Logistic regression assumes a linear relationship between the predictor variables
and the log-odds of the outcome. If the relationship is non-linear, additional transformations or more
complex models may be needed.
2. Independence assumption: Logistic regression assumes that the predictor variables are independent of
each other. Violations of this assumption, such as multicollinearity, can affect the model's performance.
3. Limited to binary outcomes: Logistic regression is specifically designed for binary classification
problems. It can be extended to handle multiple classes (multinomial logistic regression) or ordinal
outcomes (ordinal logistic regression) with appropriate modifications.
3.3)
Generative learning algorithms are a class of machine learning algorithms that aim to model the
underlying probability distribution of the input data. They learn the joint probability distribution of the
input features and the corresponding class labels (if applicable) to generate new samples from the learned
distribution. Generative models can be used for both unsupervised learning and supervised learning tasks.
1. Naive Bayes Classifier: Naive Bayes is a simple and popular generative algorithm used for
classification tasks. It assumes that the features are conditionally independent given the class label, which
simplifies the modeling process. Naive Bayes estimates the class-conditional probability distribution for
each class, and then applies Bayes' theorem to calculate the posterior probabilities of the classes given the
input features (a short sketch appears at the end of this section).
2. Gaussian Mixture Models (GMMs): GMMs are probabilistic models that assume the data is generated
from a mixture of Gaussian distributions. Each Gaussian component represents a cluster in the data.
GMMs estimate the parameters of the Gaussian components, including mean, covariance, and mixture
weights, to model the data distribution. They can be used for tasks such as clustering, density estimation,
and data generation.
3. Hidden Markov Models (HMMs): HMMs are generative models that are commonly used for sequence
modeling tasks, such as speech recognition and natural language processing. HMMs assume that the
observed sequence is generated from a sequence of hidden states, and the transition between states
follows a Markov process. HMMs estimate the transition probabilities and emission probabilities
(probability of observing an output given the hidden state) to model the sequence.
4. Variational Autoencoders (VAEs): VAEs are generative models that combine ideas from deep learning
and probabilistic modeling. They learn a low-dimensional latent representation of the input data and a
generative model that maps samples from the latent space back to the input space. VAEs are trained using
a combination of a reconstruction loss and a regularization term that encourages the latent space to follow a chosen prior distribution, typically a standard Gaussian.
5. Generative Adversarial Networks (GANs): GANs are a powerful class of generative models that
consist of two neural networks: a generator and a discriminator. The generator network learns to generate
synthetic samples that resemble the real data, while the discriminator network learns to distinguish
between real and generated samples. GANs are trained in a competitive manner, where the generator tries
to fool the discriminator, and the discriminator aims to correctly classify real and generated samples.
Generative learning algorithms offer several advantages, including the ability to generate new samples,
handle missing data, and capture the underlying data distribution. However, they can be computationally
expensive and may suffer from overfitting if the training data is limited.
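As a concrete illustration of the first algorithm above, here is a minimal Gaussian Naive Bayes sketch on synthetic data (the dataset size and feature count are arbitrary):

# Gaussian Naive Bayes: estimate class-conditional feature distributions, then apply Bayes' theorem.
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

nb = GaussianNB().fit(X, y)
print(nb.theta_)                # per-class feature means (the learned generative model)
print(nb.predict_proba(X[:3]))  # posterior class probabilities for the first three samples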
References
1) Machine Learning for Absolute Beginners by Oliver Theobald
2) Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
3) The Hundred-Page Machine Learning Book by Andriy Burkov