AI Model Paper Answers
1. b) For the given scenarios you are required to build an AI solution. Which
AI techniques can be applied / are best suited for the stated problems? Justify.
1. Extract and digitize the customer information from the Know Your
Customer (KYC) forms.
Deep Learning techniques are best suited to extract and digitize the customer information
from KYC forms. Optical Character Recognition (OCR) and handwriting recognition, implemented
with OpenCV, TensorFlow and Keras, can be used here. Deep learning has networks
capable of performing unsupervised learning from unlabeled or unstructured data.
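A minimal sketch of the OCR step, using OpenCV for preprocessing and the pytesseract wrapper for text extraction (the file name kyc_form.png and the choice of pytesseract are illustrative assumptions, not part of the question):

# OCR sketch: read a scanned KYC form and extract its text
import cv2
import pytesseract

img = cv2.imread("kyc_form.png")                      # hypothetical scanned form
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)          # grayscale improves OCR
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
text = pytesseract.image_to_string(thresh)            # digitized customer details
print(text)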
3. To identify and narrow down tumour regions and further predict if the
tumour is malignant or not.
Deep Learning with a Convolutional Neural Network (CNN) is most suitable. Deep learning replicates
functions of the human brain, one of them being image detection. It does this through neural
networks, algorithms that help a machine recognize relationships in large amounts of data, and
a CNN learns the relevant image features by itself, without manual feature extraction.
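A minimal Keras sketch of such a CNN for classifying a tumour region as malignant or not; the input shape and layer sizes are illustrative assumptions:

# Minimal CNN sketch for binary tumour classification (sizes are illustrative)
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')   # probability that the tumour is malignant
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=10, validation_split=0.2)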
Deep learning has been successfully applied to numerous problems in many application areas.
These include natural language processing (NLP), sentiment analysis, cyber security, business,
virtual assistants, visual recognition, healthcare, robotics, speech recognition software,
self-driving cars, language translation services and many more.
Visual recognition tasks such as image classification, localization and detection are key
components of computer vision.
Feature engineering is one of the best ways to increase the accuracy of the classification
model.
2)b) For the following scenarios you are required to build a predictive model. Which
machine learning technique / algorithm can be applied / is best suited for the stated problems?
Justify your recommendation.
The Linear Regressor does best at reducing the number of orders that are 10 minutes late,
by around 10%. Interestingly, the ensemble model does fairly well on both
metrics and, notably, it has the smallest MAE of the three models.
Predict the probability of a mechanical system breakdown, based on its system vibration
and operating temperature
The Naïve Bayes classifier is a supervised learning algorithm that makes predictions based on
the probability of an object belonging to a class. It is named Naïve Bayes because it is based
on Bayes' theorem and follows the naïve assumption that the input variables are independent of
each other. Since this problem asks for the probability of a breakdown from two features
(vibration and operating temperature), a probabilistic classifier such as Naïve Bayes fits well.
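A minimal scikit-learn sketch of this recommendation; the machine_logs.csv file and the vibration, temperature and breakdown column names are illustrative assumptions:

# Naive Bayes sketch: predict breakdown probability from vibration and temperature
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("machine_logs.csv")                 # hypothetical sensor log
X = df[['vibration', 'temperature']]                 # assumed feature columns
y = df['breakdown']                                  # assumed 0/1 label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

nb = GaussianNB().fit(X_train, y_train)
print(nb.predict_proba(X_test)[:5])                  # breakdown probability per sample
print(nb.score(X_test, y_test))                      # classification accuracy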
Section-2
3.a) How to handle the missing values in the dataset? Explain.
The missing values can be handled in the following ways:
• Keep the missing value as is.
• Remove data objects with missing values.
• Remove the attributes with missing values.
• Estimate and impute missing values.
A program to identify the attributes containing missing values and the number of missing
values, and to perform data cleaning by removing missing values, can be built as follows:
import pandas as pd
df = pd.read_csv("titanic.csv")
df.head()
df.isnull().sum()      # number of missing values per attribute
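A short sketch of the remaining handling options from the list above, assuming the standard Titanic columns Age (numeric) and Embarked (categorical):

# Option 1: remove data objects (rows) that contain missing values
df_rows_dropped = df.dropna()

# Option 2: remove attributes (columns) that contain missing values
df_cols_dropped = df.dropna(axis=1)

# Option 3: estimate and impute missing values
df_imputed = df.copy()
df_imputed['Age'] = df_imputed['Age'].fillna(df_imputed['Age'].mean())                    # numeric: mean
df_imputed['Embarked'] = df_imputed['Embarked'].fillna(df_imputed['Embarked'].mode()[0])  # categorical: mode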
The statistical summary of the data can be obtained using the describe() function. From the
summary it returns, the following can be analysed:
count - the number of non-empty values for each attribute
mean - the average (mean) value
std - the standard deviation
min - the minimum value
25% - the 25th percentile
50% - the 50th percentile (median)
75% - the 75th percentile
max - the maximum value
4.a) Consider a real estate company that has a dataset containing the prices
of properties in the Delhi region. It wishes to use the data to optimise the
sale prices of the properties based on important factors such as area,
bedrooms, parking, etc.
import pandas as pd

df = pd.read_csv("C:/Users/Shilpa/Desktop/Housing (1).csv")
df.head(10)
df.info()
df.describe()

# One-hot encode the furnishing status and map the yes/no columns to 1/0
status = pd.get_dummies(df['furnishingstatus'])
col = ['mainroad', 'guestroom', 'basement', 'hotwaterheating',
       'airconditioning', 'prefarea']

def binary_map(x):
    return x.map({'yes': 1, 'no': 0})

df[col] = df[col].apply(binary_map)
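A minimal sketch of fitting a linear regression on the prepared data to model sale prices; the price, area, bedrooms and parking column names are assumed to be present in the Housing dataset:

# Fit a linear regression to predict price from selected features
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = df[['area', 'bedrooms', 'parking'] + col]        # assumed feature set
y = df['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

lr = LinearRegression().fit(X_train, y_train)
print(r2_score(y_test, lr.predict(X_test)))          # goodness of fit on held-out data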
Section-3
4) a) N-grams are defined as the combination of N keywords together. Consider the
given sentence: “Data Visualization is a way to express your data in a visual context so
that patterns, correlations, trends between the data can be easily understood.” Generate
bi-grams and tri-grams for the above sentence
• Before performing text cleaning steps.
• After performing following text cleaning steps:
a. Stop word Removal
b. Replacing punctuations by a single space
text = ("Data Visualization is a way to express your data in a visual context so that "
        "patterns, correlations, trends between the data can be easily understood.")
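Continuing from the text variable above, a sketch using NLTK (assuming the stopwords corpus has been downloaded) that generates the bi-grams and tri-grams before and after the cleaning steps:

import re
import string
from nltk import ngrams
from nltk.corpus import stopwords      # requires nltk.download('stopwords')

# Bi-grams and tri-grams before any cleaning
tokens = text.split()
print(list(ngrams(tokens, 2)))
print(list(ngrams(tokens, 3)))

# Cleaning: replace punctuation with a single space, then remove stop words
cleaned = re.sub('[%s]' % re.escape(string.punctuation), ' ', text.lower())
stop_words = set(stopwords.words('english'))
clean_tokens = [w for w in cleaned.split() if w not in stop_words]

# Bi-grams and tri-grams after cleaning
print(list(ngrams(clean_tokens, 2)))
print(list(ngrams(clean_tokens, 3)))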
5)b) K-means clustering with Euclidean distance suffers from the curse of
dimensionality. Is the statement true and why?
True, k-means clustering suffers from the curse of dimensionality. The reason is that as the
number of dimensions increases, a distance-based similarity measure converges to a constant
value between any given examples. This convergence means k-means becomes less effective at
distinguishing between examples. The ratio of the standard deviation to the mean of the distance
between examples decreases as the number of dimensions increases. To mitigate this, reduce
dimensionality either by applying PCA to the feature data or by using spectral clustering to
modify the clustering algorithm.
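A short sketch of that mitigation using scikit-learn; the random data and the choice of 10 components simply stand in for a real high-dimensional dataset:

# Reduce dimensionality with PCA before clustering with k-means
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.rand(500, 100)                          # stand-in for a 100-dimensional dataset
X_reduced = PCA(n_components=10).fit_transform(X)     # project onto 10 principal components

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_reduced)
print(labels[:20])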
6) a) The sinking of the Titanic is one of the most infamous shipwrecks in history.
On April 15, 1912, during her maiden voyage, the widely considered
“unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately,
there were not enough lifeboats for everyone onboard, resulting in the death of
1502 out of 2224 passengers and crew. You are asked to build a machine
learning model to predict whether a passenger survived or not. Describe each
step used to build the model.
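The usual steps are: load and explore the data, handle missing values, encode categorical attributes, select features and split into training and test sets, train a classifier, and evaluate it. A condensed sketch of that pipeline, assuming the standard titanic.csv columns (Survived, Pclass, Sex, Age, Fare):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Load and explore the data
df = pd.read_csv("titanic.csv")

# 2. Handle missing values and encode categorical attributes
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})

# 3. Select features and split into training and test sets
X = df[['Pclass', 'Sex', 'Age', 'Fare']]
y = df['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 4. Train a classifier and 5. evaluate it on the held-out set
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))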
Section-4
7) a)A machine learning model was built to classify patient as covid +ve
or -ve. The confusion matrix for the model is as shown below.
Compute other performance metrics and analyze the performance of
the model.
TP = 397, FP = 103, FN = 126, TN = 142
Accuracy = (TP + TN) / (TP + FP + FN + TN) = (397 + 142) / 768 ≈ 0.702
Recall = TP / (TP + FN) = 397 / (397 + 126) ≈ 0.759
Precision = TP / (TP + FP) = 397 / (397 + 103) ≈ 0.794
F1_score = 2 * Precision * Recall / (Precision + Recall) = 2 * 0.794 * 0.759 / (0.794 + 0.759) ≈ 0.776
The model classifies about 70% of the patients correctly. A precision of about 0.79 means
roughly one in five positive predictions is a false alarm, while a recall of about 0.76 means
126 of the 523 actually COVID-positive patients are missed; for a screening task these false
negatives are the more serious weakness.
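The same figures can be verified with a few lines of Python:

# Recompute the metrics from the confusion-matrix counts
TP, FP, FN, TN = 397, 103, 126, 142

accuracy  = (TP + TN) / (TP + FP + FN + TN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1        = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
# 0.702 0.794 0.759 0.776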
8 a)The data scientist observes that, during multiple runs with the identical
parameters the loss function converges to different, yet stable values. What
should the data scientist do to improve the training process? Justify.
It is most likely that the loss function is very curvy and has multiple local minima
where the training is getting stuck. Decreasing the batch size would help the data
scientist stochastically get out of the local minima, and decreasing the learning
rate would prevent overshooting the global minimum of the loss function.
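In Keras terms the adjustment looks like the sketch below; the model, the synthetic data and the specific values (learning_rate=1e-4, batch_size=16) are illustrative assumptions:

# Smaller batch size adds stochastic noise; smaller learning rate avoids overshooting
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(1000, 8)
y = np.random.rand(1000, 1)

model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(1)
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4), loss='mse')
model.fit(X, y, batch_size=16, epochs=5, verbose=0)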
Section-5
9) a) What are the deployment strategies borrowed from DevOps that can
be utilized in MLOps? Explain any one strategy.
• Rolling Strategy and Canary Deployments.
• Recreate Strategy.
• Custom Strategy.
• Blue-Green Deployment using routes.
• A/B Deployment and canary deployments using routes.
• One Service, Multiple Deployment Configuration
Blue/Green
This form of deployment is essentially a server swap. There are two identical
systems available. The user requests are routed to one of the systems, and the
newer changes and updates are done on the other system. Once the updates are
tested and verified, the user requests are routed to the newer system, essentially
swapping out the old model for a new one.
9)b) Machine learning models can be resource heavy. They require a good
amount of processing power to predict, validate, and recalibrate, millions
of times over. How can containerisation of ML model solve this problem?
Training an ML model on your own machine, without containerizing, can:
1. Slow down your machine or even make it unusable.
2. Hog local resources for a long time.
3. Waste time on repetitive tasks. If your machine fails during training, training
times can start all over.
4. Limit your resources to only what’s on your machine.
The container can be spun up anywhere, anytime, and will always be an exact duplicate
of the original container image, whether it is right now, in a month or in years.
Furthermore, exposing your model container as an endpoint will separate it from its
serving infrastructure, supporting a decoupled architecture. This means that if you ever want
to exchange the existing model for another, or implement it with other services, it is an
easy switch and integration. Orchestration can help you distribute the job over several
nodes or containers, reducing the total amount of time to finish.
10. a) How will you deploy a trained machine learning model as a predictive
service in a production environment? Explain.
Step1: Get your data pipeline ready
Long before you reach the predictive model deployment stage, you need to make
sure that your data pipelines are structured effectively and are giving you high-
quality, relevant data.
Step2: Access the right external data
When you are building a predictive model for production, you need to be sure
that you are working with the best possible data, from the most relevant sources,
right up until the moment you launch. If it’s already stale, your carefully crafted
models won’t be much use. Part of the challenge is getting hold of enough
historical data to get a complete picture. Few organizations can collect all the data
they need internally.
Step3: Automate training and testing
Rigorous training and testing is essential before you can progress to the predictive
model deployment stage, but it can be a time-consuming process. To avoid getting
slowed down, you need to automate as much as you can. This doesn't mean
simply working in a few time-saving tools or tricks. The goal is to create models
that can ultimately run without any action on your part.
Step4: Design robust auditing, monitoring and retraining protocols
Before you can deploy your predictive model, you need to know that it’s actually
delivering the kind of results you’re looking for, that these results are accurate,
and that the data you’re feeding into the model will keep these models relevant
over time. Relying on the same, tired old data can cause model drift, leading to
inaccurate results. This means you need to build training pipelines and processes
that bring in new data, audit your internal data sources, and tell you which features
are still giving you valuable insights.
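As a concrete illustration, the trained model can then be exposed as an HTTP prediction endpoint. The sketch below assumes a model serialized to model.pkl and uses Flask; both are illustrative choices rather than something prescribed by the question:

# Minimal prediction service: load a serialized model and serve it over HTTP
import joblib
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load("model.pkl")          # hypothetical serialized model

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]      # e.g. {"features": [[5.1, 3.5, 1.4, 0.2]]}
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)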
10) b) For the below given scenarios, suggest the best suited cloud deployment
model and list the challenges with it.
1. For: a. Variable workload, b. Test and development
2. For: a. Cloud bursting, b. On-demand access, c. Sensitive data