AI Model Paper Answers
1. b) For the given scenarios you are required to build an AI solution. Which
AI techniques can be applied / are best suited for the stated problems? Justify.
1. Extract and digitize the customer information from the Know Your
Customer (KYC) forms.
Deep Learning techniques are best suited to extract and digitize the customer information
from KYC forms. Optical Character Recognition (OCR) and handwriting recognition, implemented
with OpenCV, TensorFlow and Keras, can be used here. Deep learning has networks
capable of performing unsupervised learning from unlabeled or unstructured data.
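A minimal sketch of the OCR step, using OpenCV for preprocessing and the pytesseract wrapper for text extraction (the file name kyc_form.png and the choice of pytesseract are illustrative assumptions, not part of the question):

# OCR sketch: read a scanned KYC form and extract its text
import cv2
import pytesseract

img = cv2.imread("kyc_form.png")                      # hypothetical scanned form
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)          # grayscale improves OCR
_, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
text = pytesseract.image_to_string(thresh)            # digitized customer details
print(text)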
3. To identify and narrow down tumour regions and further predict if the
tumour is malignant or not.
Deep Learning with a Convolutional Neural Network (CNN) is most suitable. Deep learning replicates
functions of the human brain, one of them being image detection. It does this through neural
networks, algorithms that help a machine recognize relationships in large amounts of data, and
a CNN learns the relevant image features by itself, without manual feature extraction.
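A minimal Keras sketch of such a CNN for classifying a tumour region as malignant or not; the input shape and layer sizes are illustrative assumptions:

# Minimal CNN sketch for binary tumour classification (sizes are illustrative)
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')   # probability that the tumour is malignant
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=10, validation_split=0.2)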
Deep learning has been successfully applied to numerous problems in many application areas.
These include natural language processing (NLP), sentiment analysis, cyber security, business,
virtual assistants, visual recognition, healthcare, robotics, speech recognition software,
self-driving cars, language translation services and many more.
Visual recognition tasks such as image classification, localization and detection are key
components of computer vision.
Feature engineering is one of the best ways to increase the accuracy of the classification
model.
2)b) For the following scenarios you are required to build a predictive model. Which
machine learning technique / algorithm can be applied / is best suited for the stated problems?
Justify your recommendation.
The Linear Regressor does best at reducing the number of orders that are 10 minutes late,
by around 10%. Interestingly, the ensemble model does fairly well on both
metrics and, notably, it has the smallest MAE of the three models.
Predict the probability of a mechanical system breakdown, based on its system vibration
and operating temperature
The Naïve Bayes classifier is a supervised learning algorithm that makes predictions based on
the probability of an object belonging to a class. It is named Naïve Bayes because it is based
on Bayes' theorem and follows the naïve assumption that the input variables are independent of
each other. Since this problem asks for the probability of a breakdown from two features
(vibration and operating temperature), a probabilistic classifier such as Naïve Bayes fits well.
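A minimal scikit-learn sketch of this recommendation; the machine_logs.csv file and the vibration, temperature and breakdown column names are illustrative assumptions:

# Naive Bayes sketch: predict breakdown probability from vibration and temperature
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("machine_logs.csv")                 # hypothetical sensor log
X = df[['vibration', 'temperature']]                 # assumed feature columns
y = df['breakdown']                                  # assumed 0/1 label
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

nb = GaussianNB().fit(X_train, y_train)
print(nb.predict_proba(X_test)[:5])                  # breakdown probability per sample
print(nb.score(X_test, y_test))                      # classification accuracy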
Section-2
3.a) How to handle the missing values in the dataset? Explain.
The missing values can be handled in the following ways:
• Keep the missing value as is.
• Remove data objects with missing values.
• Remove the attributes with missing values.
• Estimate and impute missing values.
A program to identify the attributes containing missing values and the number of missing
values, and to perform data cleaning by removing missing values, can be built as follows:
import pandas as pd
df = pd.read_csv("titanic.csv")
df.head()
df.isnull().sum()      # number of missing values per attribute
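A short sketch of the remaining handling options from the list above, assuming the standard Titanic columns Age (numeric) and Embarked (categorical):

# Option 1: remove data objects (rows) that contain missing values
df_rows_dropped = df.dropna()

# Option 2: remove attributes (columns) that contain missing values
df_cols_dropped = df.dropna(axis=1)

# Option 3: estimate and impute missing values
df_imputed = df.copy()
df_imputed['Age'] = df_imputed['Age'].fillna(df_imputed['Age'].mean())                    # numeric: mean
df_imputed['Embarked'] = df_imputed['Embarked'].fillna(df_imputed['Embarked'].mode()[0])  # categorical: mode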
The statistical summary of the data can be obtained using the describe() function. From the
summary it returns, the following can be analysed:
count - the number of non-empty values for each attribute
mean - the average (mean) value
std - the standard deviation
min - the minimum value
25% - the 25th percentile
50% - the 50th percentile (median)
75% - the 75th percentile
max - the maximum value
4.a) Consider a real estate company that has a dataset containing the prices
of properties in the Delhi region. It wishes to use the data to optimise the
sale prices of the properties based on important factors such as area,
bedrooms, parking, etc.
import pandas as pd

df = pd.read_csv("C:/Users/Shilpa/Desktop/Housing (1).csv")
df.head(10)
df.info()
df.describe()

# One-hot encode the furnishing status and map the yes/no columns to 1/0
status = pd.get_dummies(df['furnishingstatus'])
col = ['mainroad', 'guestroom', 'basement', 'hotwaterheating',
       'airconditioning', 'prefarea']

def binary_map(x):
    return x.map({'yes': 1, 'no': 0})

df[col] = df[col].apply(binary_map)
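A minimal sketch of fitting a linear regression on the prepared data to model sale prices; the price, area, bedrooms and parking column names are assumed to be present in the Housing dataset:

# Fit a linear regression to predict price from selected features
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

X = df[['area', 'bedrooms', 'parking'] + col]        # assumed feature set
y = df['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

lr = LinearRegression().fit(X_train, y_train)
print(r2_score(y_test, lr.predict(X_test)))          # goodness of fit on held-out data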
Section-3
4) a) N-grams are defined as the combination of N keywords together. Consider the
given sentence: “Data Visualization is a way to express your data in a visual context so
that patterns, correlations, trends between the data can be easily understood.” Generate
bi-grams and tri-grams for the above sentence
• Before performing text cleaning steps.
• After performing following text cleaning steps:
a. Stop word Removal
b. Replacing punctuations by a single space
text = ("Data Visualization is a way to express your data in a visual context so that "
        "patterns, correlations, trends between the data can be easily understood.")
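Continuing from the text variable above, a sketch using NLTK (assuming the stopwords corpus has been downloaded) that generates the bi-grams and tri-grams before and after the cleaning steps:

import re
import string
from nltk import ngrams
from nltk.corpus import stopwords      # requires nltk.download('stopwords')

# Bi-grams and tri-grams before any cleaning
tokens = text.split()
print(list(ngrams(tokens, 2)))
print(list(ngrams(tokens, 3)))

# Cleaning: replace punctuation with a single space, then remove stop words
cleaned = re.sub('[%s]' % re.escape(string.punctuation), ' ', text.lower())
stop_words = set(stopwords.words('english'))
clean_tokens = [w for w in cleaned.split() if w not in stop_words]

# Bi-grams and tri-grams after cleaning
print(list(ngrams(clean_tokens, 2)))
print(list(ngrams(clean_tokens, 3)))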
5)b) K-means clustering with Euclidean distance suffers from the curse of
dimensionality. Is the statement true and why?
True, k-means clustering suffers from the curse of dimensionality. The reason is that as the
number of dimensions increases, a distance-based similarity measure converges to a constant
value between any given examples. This convergence means k-means becomes less effective at
distinguishing between examples. The ratio of the standard deviation to the mean of the distance
between examples decreases as the number of dimensions increases. To mitigate this, reduce
dimensionality either by applying PCA to the feature data or by using spectral clustering to
modify the clustering algorithm.
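A short sketch of that mitigation using scikit-learn; the random data and the choice of 10 components simply stand in for a real high-dimensional dataset:

# Reduce dimensionality with PCA before clustering with k-means
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = np.random.rand(500, 100)                          # stand-in for a 100-dimensional dataset
X_reduced = PCA(n_components=10).fit_transform(X)     # project onto 10 principal components

labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_reduced)
print(labels[:20])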
6) a) The sinking of the Titanic is one of the most infamous shipwrecks in history.
On April 15, 1912, during her maiden voyage, the widely considered
“unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately,
there were not enough lifeboats for everyone onboard, resulting in the death of
1502 out of 2224 passengers and crew. You are asked to build a machine
learning model to predict whether a passenger survived or not. Describe each
step used to build the model.
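The usual steps are: load and explore the data, handle missing values, encode categorical attributes, select features and split into training and test sets, train a classifier, and evaluate it. A condensed sketch of that pipeline, assuming the standard titanic.csv columns (Survived, Pclass, Sex, Age, Fare):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Load and explore the data
df = pd.read_csv("titanic.csv")

# 2. Handle missing values and encode categorical attributes
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})

# 3. Select features and split into training and test sets
X = df[['Pclass', 'Sex', 'Age', 'Fare']]
y = df['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 4. Train a classifier and 5. evaluate it on the held-out set
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))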
Section-4
7) a)A machine learning model was built to classify patient as covid +ve
or -ve. The confusion matrix for the model is as shown below.
Compute other performance metrics and analyze the performance of
the model.
TP = 397, FP = 103, FN = 126, TN = 142
Accuracy = (TP + TN) / (TP + FP + FN + TN) = (397 + 142) / 768 ≈ 0.702
Recall = TP / (TP + FN) = 397 / (397 + 126) ≈ 0.759
Precision = TP / (TP + FP) = 397 / (397 + 103) ≈ 0.794
F1_score = 2 * Precision * Recall / (Precision + Recall) = 2 * 0.794 * 0.759 / (0.794 + 0.759) ≈ 0.776
The model classifies about 70% of the patients correctly. A precision of about 0.79 means
roughly one in five positive predictions is a false alarm, while a recall of about 0.76 means
126 of the 523 actually COVID-positive patients are missed; for a screening task these false
negatives are the more serious weakness.
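The same figures can be verified with a few lines of Python:

# Recompute the metrics from the confusion-matrix counts
TP, FP, FN, TN = 397, 103, 126, 142

accuracy  = (TP + TN) / (TP + FP + FN + TN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1        = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
# 0.702 0.794 0.759 0.776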
8 a)The data scientist observes that, during multiple runs with the identical
parameters the loss function converges to different, yet stable values. What
should the data scientist do to improve the training process? Justify.
It is most likely that the loss function is very curvy and has multiple local minima
where the training is getting stuck. Decreasing the batch size would help the data
scientist stochastically get out of the local minima, and decreasing the learning
rate would prevent overshooting the global minimum of the loss function.
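In Keras terms the adjustment looks like the sketch below; the model, the synthetic data and the specific values (learning_rate=1e-4, batch_size=16) are illustrative assumptions:

# Smaller batch size adds stochastic noise; smaller learning rate avoids overshooting
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(1000, 8)
y = np.random.rand(1000, 1)

model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(1)
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4), loss='mse')
model.fit(X, y, batch_size=16, epochs=5, verbose=0)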
Section-5
9) a) What are the deployment strategies borrowed from DevOps that can
be utilized in MLOps? Explain any one strategy.
• Rolling Strategy and Canary Deployments.
• Recreate Strategy.
• Custom Strategy.
• Blue-Green Deployment using routes.
• A/B Deployment and canary deployments using routes.
• One Service, Multiple Deployment Configuration
Blue/Green
This form of deployment is essentially a server swap. There are two identical
systems available. The user requests are routed to one of the systems, and the
newer changes and updates are done on the other system. Once the updates are
tested and verified, the user requests are routed to the newer system, essentially
swapping out the old model for a new one.
9)b) Machine learning models can be resource heavy. They require a good
amount of processing power to predict, validate, and recalibrate, millions
of times over. How can containerisation of ML model solve this problem?
Training an ML model on your own machine, without containerizing, can:
1. Slow down your machine or even make it unusable.
2. Hog local resources for a long time.
3. Waste time on repetitive tasks. If your machine fails during training, training
times can start all over.
4. Limit your resources to only what’s on your machine.
The container can be spun up anywhere, anytime, and will always be an exact duplicate
of the original container image, whether it is right now, in a month or in years.
Furthermore, exposing your model container as an endpoint will separate it from its
serving infrastructure, supporting a decoupled architecture. This means that if you ever want
to exchange the existing model for another, or implement it with other services, it is an
easy switch and integration. Orchestration can help you distribute the job over several
nodes or containers, reducing the total amount of time to finish.
10. a) How will you deploy a trained machine learning model as a predictive
service in a production environment? Explain.
Step1: Get your data pipeline ready
Long before you reach the predictive model deployment stage, you need to make
sure that your data pipelines are structured effectively and are giving you high-
quality, relevant data.
Step2: Access the right external data
When you are building a predictive model for production, you need to be sure
that you are working with the best possible data, from the most relevant sources,
right up until the moment you launch. If it’s already stale, your carefully crafted
models won’t be much use. Part of the challenge is getting hold of enough
historical data to get a complete picture. Few organizations can collect all the data
they need internally.
Step3: Automate training and testing
Rigorous training and testing is essential before you can progress to the predictive
model deployment stage, but it can be a time-consuming process. To avoid getting
slowed down, you need to automate as much as you can. This doesn't mean
simply working in a few time-saving tools or tricks. The goal is to create models
that can ultimately run without any action on your part.
Step4: Design robust auditing, monitoring and retraining protocols
Before you can deploy your predictive model, you need to know that it’s actually
delivering the kind of results you’re looking for, that these results are accurate,
and that the data you’re feeding into the model will keep these models relevant
over time. Relying on the same, tired old data can cause model drift, leading to
inaccurate results. This means you need to build training pipelines and processes
that bring in new data, audit your internal data sources, and tell you which features
are still giving you valuable insights.
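As a concrete illustration, the trained model can then be exposed as an HTTP prediction endpoint. The sketch below assumes a model serialized to model.pkl and uses Flask; both are illustrative choices rather than something prescribed by the question:

# Minimal prediction service: load a serialized model and serve it over HTTP
import joblib
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load("model.pkl")          # hypothetical serialized model

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]      # e.g. {"features": [[5.1, 3.5, 1.4, 0.2]]}
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)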
10) b) For the below given scenarios, suggest the best suited cloud deployment
model and list the challenges with it.
1. For: a. Variable workload, b. Test and development
2. For: a. Cloud bursting, b. On-demand access, c. Sensitive data