
SHRI VAISHNAV ACADEMY

CLASS: XII
ARTIFICIAL INTELLIGENCE (843)
Subject Specific Skills (Part B) Marks: 50
Blue-print for Sample Question Paper for Class XII (Session 2024-25)
Max. Time: 2 Hours Max. Marks: 50

UNIT NO.   NAME OF THE UNIT              OBJECTIVE TYPE       SHORT ANSWER TYPE    DESCRIPTIVE/LONG ANS.   TOTAL
                                         QUESTIONS            QUESTIONS            TYPE QUESTIONS          QUESTIONS
                                         (1 MARK EACH)        (2 MARKS EACH)       (4 MARKS EACH)
1          Capstone Project              10                   4                    2                       16
2          Model Life Cycle              8                    1                    1                       10
3          Story Telling through Data    6                    1                    2                       8
TOTAL QUESTIONS                          24                   6                    5                       34
NO. OF QUESTIONS TO BE ANSWERED          20                   Any 4                Any 3                   27
TOTAL MARKS                              1 x 20 = 20          2 x 4 = 8            4 x 3 = 12              40 MARKS

PART B - SUBJECT SPECIFIC SKILLS (40 MARKS):


Unit 1- Capstone Project,
Unit 2- Model Life Cycle,
Unit 3- Storytelling Through Data
Unit 1 & Unit 2-
Key concept: AI Project Cycle, Model Validation, RMSE, MSE, MAPE
Multiple Choice questions:
1. A researcher wants to study the association between gender and mobile phone use.
Data collected for this study will be ______.
a. Qualitative data b. Quantitative data c. Continuous data d. Classified data
2. What is the primary way to collect data (the data gathering process)?
a. Experiment b. Survey c. Interview d. Observation
3. The data scientist will use ___________ for predictive modelling?
a. Artificial Intelligence b. Machine Learning
c. Training Set d. Deep Learning
4. Which one does NOT belong to classification loss functions?
a. Log loss b. Mean Absolute Error
c. Exponential Loss d. Hinge Loss
5. Which process does NOT come under the Capstone Project?
a. AI Model b. AI Project Cycle
c. Deployment d. Data Gathering
6. Which one does NOT belong to regression loss functions?
a. Log Loss b. Mean Absolute Error
c. Log cosh Loss d. Quantile Loss
7. Choose the correct option:
a. Scope >> Acquire >> Explore >> Prepare>> Model>> Assess >>Deploy>> Batch
b. Scope >> Acquire >> Explore >> Prepare>> Model>> Deploy>>Real Time>> Batch
c. Scope >> Acquire >> Prepare >> Assess >>Deploy>> Batch>>Real Time>> Explore
d. Scope >> Acquire >> Explore >> Model >> Prepare >> Assess >> Deploy >> Batch
8. Adding a non-important feature to a linear regression model may result in:
1)Increase in R-square
2)Decrease in R-square
a. Only 1 is correct b. Only 2 is correct c. Either 1 or 2 d. Neither 1 nor 2
9. Which of the following options is/are true for K-fold cross-validation?
1) Increase in K will result in higher time required to cross validate the result.
2) Higher values of K will result in higher confidence on the cross-validation result as compared to lower
value of K.
3) If K=N, then it is called Leave one out cross validation, where N is the number of observations
a. 1 and 2 b. 2 and 3 c. 1 and 3 d. 1, 2 and 3
10. Which stage of Design Thinking is missing from [Prototype, Ideate, Test, Define]?
a. Evaluation b. Empathize c. Evolution d. Enrichment
11. In AI development which framework is used?
a. Scikit-learn b. Tkinter c. PyCharm d. Matplotlib
12. Which of these statements about deep learning programming frameworks are true?
1) A programming framework allows you to code up deep learning algorithms with typically
fewer lines of code than a lower-level language such as Python.
2) Even if a project is currently open source, good governance of the project helps ensure
that it remains open even in the long term, rather than becoming closed or modified to
benefit only one company.
3) Deep learning programming frameworks require cloud-based machines to run.
a. 1 b. 1 & 2 c. 1, 2 & 3 d. 1 & 3
13. Choose the correct option:
a. Data Requirements >> Data Collection >> Data Understanding >> Data Preparation
b. Data Requirements >> Data Understanding >> Data Collection >> Data Preparation
c. Data Requirements >> Data Deployment >> Data Collection >> Data Gathering
d. Data Collection >> Data Request >> Data Filtering >> Data Evaluation
14. If your Neural Network model seems to have high variance, which of the following would be promising
things to try?
a. Make the Neural Network deeper b. Get more training data
c. Get more test data d. Increase the number of units in each hidden layer
15. Why do we normalize the inputs x?
a. Normalization is another word for regularization; it helps to reduce variance
b. It makes it easier to visualize the data
c. It makes the cost function faster to optimize
d. It makes the parameter initialization faster
16. Which language is Most suitable for developing AI?
a. Kotlin b. Swift c. Python d. HTML
17. A random sample of n = 6 taken from the population has the elements 6, 10, 13, 14, 18, 20. Then, which
option is False?
a. Point estimate for population mean is 13.5
b. Point estimate for population standard deviation is 4.68
c. Point estimate for population standard deviation is 3.5
d. Point estimate for standard error of mean is 1.91
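A quick NumPy check of these point estimates (assuming the sixth sample element is 20, as the stated mean of 13.5 implies; the population formula is used for the standard deviation):

import numpy as np

sample = np.array([6, 10, 13, 14, 18, 20])   # sixth element 20 is inferred from the stated mean of 13.5
mean = sample.mean()                          # point estimate of the population mean
std = sample.std()                            # standard deviation with the population formula (ddof=0)
se = std / np.sqrt(len(sample))               # standard error of the mean
print(mean, round(std, 2), round(se, 2))      # 13.5 4.68 1.91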
18. Which of the following statements is False in the case of the KNN Algorithm?
a. For a very large value of K, points from other classes may be included in the neighborhood.
b. For the very small value of K, the algorithm is very sensitive to noise.
c. KNN is used only for classification problem statements.
d. KNN is a lazy learner
19.The robotic arm will be able to paint every corner in the automotive parts while
minimizing the quantity of paint wasted in the process. Which learning technique is
used in this problem?
a. Supervised Learning. b. Unsupervised Learning.
c. Reinforcement Learning. d. Both (a) and (b).
20.Which of the following statements is/are INCORRECT
i) The volume of test data can be large, which presents complexities.
ii) Your testing team should test the AI and ML algorithms keeping model validation,
successful learnability, and algorithm effectiveness in mind.
iii) Test data should include all irrelevant subsets of training data, i.e., the data you
will use for training the AI system.
a. None of the Below b. ii) c. iii) d. i),ii) & iii)
21. Which of the following is FALSE about Correlation and Covariance?
a. A zero correlation does not necessarily imply independence between variables.
b. Correlation and covariance values are the same.
c. The covariance and correlation are always the same sign.
d. Correlation is the standardized version of Covariance.
22. Which of these is NOT an analytic approach based on the type of question?
a. Descriptive b. Statistical Analysis
c. Forecasting d. Data evaluation
23.Which of the following statements is/are INCORRECT:
i) Different transforms of the data used to train the same machine learning model.
ii) Different machine learning models cannot be trained on the same data.
iii) Different configurations for a machine learning model trained on the same data
a. i) b. ii) c. Both ii) & iii) d. Both i) & ii)
24. Which of the following is FALSE about Deep Learning and Machine Learning algorithms?
a. Deep Learning algorithms work efficiently on a high amount of data.
b. Feature Extraction needs to be done manually in both ML and DL algorithms.
c. Deep Learning algorithms are best suited for unstructured data.
d. Deep Learning algorithms require high computational power.
25. If the problem is based on probabilities of an action, then which analytic approach can be used?
a. Predictive Model b. Prescriptive
c. Diagnostic d. Descriptive
26. Which of the following is FALSE for neural networks?
a. Artificial neurons are similar in operation to biological neurons.
b. Training time for a neural network depends on network size.
c. Neural networks can be simulated on conventional computers.
d. The basic unit of neural networks are neurons.

27.The following data is used to apply a linear regression algorithm with least squares
regression line Y=a1X. Then, the approximate value of a1 is given by:
(X - Independent variable, Y - Dependent variable)

X 1 20 30 40

Y 1 400 800 1300

a. 27.876 b. 32.650 c. 40.541 d. 28.956
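For a least-squares line through the origin (Y = a1·X), the slope is a1 = Σxy / Σx². A short sketch checking this with the table above:

import numpy as np

# Least squares for a line through the origin: a1 = sum(x*y) / sum(x^2)
x = np.array([1, 20, 30, 40])
y = np.array([1, 400, 800, 1300])
a1 = (x * y).sum() / (x ** 2).sum()
print(round(a1, 3))   # 28.956, i.e. option (d)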


28.In Design Thinking, ____ phase involves gathering user feedback on the prototypes
you've created as well as obtaining a better understanding of your users.
a) Prototype b) Test c) Ideate d) Empathize
29. _______ is the first step involved in telling an effective data story.
(a) Creating visuals (b) Adding narrative
(c) Understanding the Audience (d) Gathering data
30 Match the following
1) Which category? A. (Anomaly Detection)
2) How much or how many? B. (Regression)
3) Which group? C. (Recommendation)
4) Is this unusual? D. (Classification)
5) Which option should be taken? E. (Clustering)
a) 1=D, 2=B, 3=E, 4=A, 5=C b) 1=C, 2=D, 3=B, 4=E, 5=A
c) 1=D, 2=B, 3=C, 4=E, 5=A d) 1=E, 2=A, 3=D, 4=C, 5=B
31. Identify two AI development tools from the following:
1) DataRobot 2) Python 3) Scikit Learn 4) Watson Studio
(a) 1 & 2 (b) 2 & 3 (c) 1 & 3 (d) 1 & 4
32. You want to predict future house prices. The price is a continuous value, and therefore we want to do
regression. Which loss function should be used here?
(a) RMSE (b) MSE (c) Exponential error (d) MAE
33. The design phase of the AI Model Life Cycle is an ______ process.
(a) compact (b) permanent (c) periodic (d) iterative
34.Techniques like descriptive statistics and visualizations can be applied to datasets after the original data
gathering to analyze the content. To close the gap, additional data collecting may be required. Identify the
stage of this analytic approach.
(a) Data Requirements (b) Data Gathering (c) Data Understanding (d) Data Preparation
35.In this phase, we define the project's strategic business objectives and desired outcomes,
align all stakeholders' expectations as well as establish success metrics.
Identify this phase of the AI Model Life Cycle.
(a) Design (b) Scoping (c ) Evaluation d) Data Collection
36. A loss function is a measure of how good a prediction model is in terms of being
able to predict the expected outcome.
i. The loss function will output a lower number if the predictions are good.
ii. The loss function will output a greater number if the predictions are incorrect.
a. Only i is correct b. Only ii is correct
c. Either i or ii is correct d. Both i & ii are correct
37. Which of the following is not a feature of RMSE?
(a) It tells about the accuracy of the model.
(b) A higher value means hyperparameters need to be tweaked.
(c) Lower RMSE values are not good for the AI model.
(d) RMSE is a measure of how evenly distributed residual errors are.
38. Once you have got an AI model that's ready for production, AI engineers then ____ a trained model,
making it available for external inference requests.
(a) Evaluate (b) Test (c) Deploy (d) Redesign
39.Data Validation for human biases is conducted in _________ phase of AI Model Life Cycle.
(a) Scoping (b) Data Collection (c) Design (d) Testing
40.Which of the following is a disadvantage of Cross Validation Technique?
(a) Cross-validation provides insight into how the model will generalize to a new dataset.
(b) Cross-validation aids in determining a more accurate model prediction performance estimate.
(c) As we need to train on many training sets, cross-validation is computationally expensive.
(d) Cross-validation could result in more precise models.
41. Hyper parameters are parameters whose values govern the learning process.
(a) True (b) False
42. Choose the difference between Regression and Classification Loss functions from the following:
(a) Regression functions predict a quantity, and classification functions predict a label.
(b) Regression functions predict a label, and classification functions predict a quantity.
(c) Regression functions predict a qualitative value, and classification functions predict a label.
(d) Regression functions predict a label, and classification functions predict a qualitative value.
43. A good model should have an _____ value less than 180.
(a) RMSE (b) MSE (c) Focal Loss (d) MAE
44. Which of the following is incorrect?
1) Testing data is the one on which we train and fit our model basically to fit the parameters
2) Training data is used only to assess performance of model
3) Testing data is the unseen data for which predictions have to be made
a) 1) and 3) only b) 1) and 2) only c) 2) and 3) only d) 1), 2) and 3)
45. Choose an example of an AI predictive model.
(a) YouTube (b) Spam detection
(c) weather forecast (d) Sentiment Analysis
46. Which of the following are the objectives of the testing team in AI modelling?
1) Model Validation
2) Security compliance
3) Understanding data
4) Minimizing bias
(a) (1), (2) and (3) b. (2), (3) and (4) c. (1), (3) and (4) d. (1), (2) and (4)
47.If AI techniques are to be applied to a dataset, the data must have a ________.
a. Association b. relationship c. pattern d. Either a or b
48. Which of these are common split percentages between train and test data?
i. Train: 5%, Test: 95%   ii. Train: 50%, Test: 50%   iii. Train: 80%, Test: 20%   iv. Train: 67%, Test: 33%
Choose the correct option:
a. i & ii b. ii, iii & iv c. ii & iv d. i, ii & iii
49. Which of the following is a key success factor to be considered in designing an AI model?
a. Initiative b. Visual Modelling
c. Effective evaluation d. Model validation
50. A training set is a set of ______ data in which the outcomes are already known.
(a) current (b) instant (c) historical (d) output

51. ______ involves a combination of three key elements that help to explain to the audience what's happening
in the data in an engaging and entertaining manner.
(a) Data Analysis (b) Data Visualization (c) Data Storytelling (d) Data Narrative
52. In AI development, which of the following frameworks is used?
(a) Python (b) TensorFlow (c) Visual Basic (d) C++
53. All the algorithms in machine learning rely on minimizing or maximizing a function, called the ______.
(a) Objective function (b) Focal Loss (c) Loss function (d) Gradient Descent
54. Which of the following is incorrect?
(a) The testing phase is essentially an iterative process.
(b) The first fundamental step when starting an AI initiative is scoping.
(c) In the scoping phase, it's crucial to precisely define the strategic business objectives and desired
outcomes of the project.
(d) Test data should include all relevant subsets of training data.
55. It is believed that if there is a _______ in the data, then AI development techniques may be
employed.
(a) pattern (b) program (c) language (d) logic
56. Assertion (A): It's crucial to precisely define the strategic business objectives and desired outcomes of
the project, align all the different stakeholders' expectations, anticipate the key resources and steps, and
define the success metrics.
Reason (R): Selecting the AI or machine learning use cases and being able to evaluate the Return on
Investment (ROI) is critical to the success of any data project.
Select the appropriate option for the statements given above:
(a) Both (A) and (R) are true and (R) is the correct explanation of (A).
(b) Both (A) and (R) are true and (R) is not the correct explanation of (A).
(c) (A) is true, but (R) is false.
(d) (A) is false, but (R) is true.
57. Once the relevant projects have been selected and properly scoped, which of the following phase will be
the next step of the machine learning lifecycle?
(a) Problem Scoping (b) Deployment (c) Design (d) Testing
58. Which of the following is the first stage of an AI Model Life Cycle?
(a) Build (b) Scoping (c) Design (d) Testing
59. Which of the following is the most preferred language for building an AI model?
(a) Python (b) VB (c) Java (d) C++
60. First four steps of writing Python code to find out RMSE values of the model are given here. Arrange
them in proper order-
1. Splitting the data into training and test.
2. Reading the data.
3. Fitting simple linear regression to the training set.
4. Import required libraries.
(a) 2-4-1-3 (b) 4-3-2-1 (c) 1-2-3-4 (d) 4-2-1-3
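A minimal sketch of these four steps in the correct order, 4-2-1-3 (the file name and column names below are illustrative assumptions):

# Step 4: import required libraries
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Step 2: read the data (hypothetical file and columns)
data = pd.read_csv("prices.csv")
X, y = data[["area"]], data["price"]

# Step 1: split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Step 3: fit simple linear regression to the training set
model = LinearRegression().fit(X_train, y_train)

# The RMSE of the model can then be computed on the test set
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(rmse)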
61. Which of the following is NOT True for Testing?
(a) The volume of test data should be very small.
(b) Data validation is important.
(c) Your testing team should test the AI and ML algorithms keeping model validation in mind.
(d) Your team must create test suites that help you validate your ML models.
62. Which of the following is NOT true for Train-Test Split Evaluation?
(a) The procedure involves taking a dataset and dividing it into two subsets.
(b) The train-test procedure is appropriate when there is a larger dataset.
(c) The objective is to estimate the performance of the user. (Correct: of the learning model on new data)
(d) It can be used for classification or regression problems.
2. Fill in the Blanks
1. Every project, regardless of its size, starts with business understanding, which lays the
foundation for successful resolution of the business problem.
2. If the problem is to determine probabilities of an action, then a predictive model might be
used.
3. If the problem is to show relationships, a descriptive approach may be required.
4. If the problem requires a yes/ no answer, then a classification approach to predicting a
response would be suitable.
5. Techniques such as descriptive statistics and visualization can be applied to the data set, to
assess the content, quality, and initial insights about the data.
2. What is a capstone project?
A capstone project is a project where students must research a topic independently to find a deep
understanding of the subject matter. It gives an opportunity for the student to integrate all their knowledge
and demonstrate it through a comprehensive project.
Examples:
1. Stock Prices Predictor
2. Develop A Sentiment Analyzer
3. Movie Ticket Price Predictor
4. Students Results Predictor
5. Human Activity Recognition using Smartphone Data set
6. Classifying humans and animals in a photo
3. What is the importance of pattern in problem solving?
Ans: The premise that underlies all Machine Learning disciplines is that there needs to be a pattern. If
there is no pattern, then the problem cannot be solved with AI technology. It is fundamental that this
question is asked before deciding to embark on an AI development journey.
4. List the different problem categories that come under predictive analysis, and write one example
for each.
Ans: 1) Which category? (Classification)- Eg: Spam mail classification
2) How much or how many? (Regression)- Eg: Flight fare prediction
3) Which group? (Clustering)- Eg: Email marketing
4) Is this unusual? (Anomaly Detection) – Eg: Credit card fraud detection
5) Which option should be taken? (Recommendation) – Video recommendation system
5. Given below are four graphs. Pick the approach based on the type of question asked:

A- Which graph is used for statistical analysis?
B- Which graph shows the current status?
C- Which graph is used for predictive forecasting?
D- Which graph is used for the probability of an action?
Ans:
A- The second graph shows statistical analysis, i.e., what has happened and why it is happening.
B- The first graph shows the current status (descriptive analysis).
C- The third graph shows predictive forecasting, i.e., what will happen next or what happens if the trend
continues.
D- The fourth graph is used to determine the probability of an action; for this, a prescriptive model might be used.

6.What is design thinking? Draw the diagram and briefly explain each stage of design thinking?
Ans: Design Thinking is a design methodology that provides a solution-based approach to solving
problems. It is extremely useful in tackling complex problems that are ill-defined or unknown.
The five stages of Design Thinking are as follows: Empathize, Define, Ideate, Prototype, and Test.
1.Empathize
● Observe consumers to gain a deeper understanding of the problem
● Observation must be made with empathy
● Use 5W1H method for right questioning
● Who, What, When, Where, Why
● How
Empathy Map
It is a collaborative visualization used to clarify our understanding of a specific type of user.

2.Define
● Define the problem statement
● Determining the cause of the problem
● Brainstorming to generate possible solutions
● Selecting most suitable solution
3.Ideate
● Gather ideas to solve the problem you defined
● Brainstorm to arrive at various creative solutions
4.Prototype

● A prototype is a simple experimental model for a proposed solution
● Build representations (charts, models) of one or more ideas
5.Test
● Test the prototype and gain user feedback
● Iterate
● Design thinking is an iterative process
7. What is problem decomposition? Write down the steps involved in problem decomposition?
Problem decomposition is the process of breaking down the problem into smaller units before coding.
Problem decomposition steps
1. Understand the problem and then restate the problem in your own words
2. Break the problem down into a few large pieces.
3. Break complicated pieces down into smaller pieces.
4. Code one small piece at a time.
● Think about how to implement it
● Write the code/query
● Test it on its own
● Fix problems, if any
8. Explain Train-Test Split Evaluation?
● The train-test split is a technique for evaluating the performance of a machine learning
algorithm.
● It can be used for classification or regression problems and can be used for any supervised
learning algorithm.
● The procedure involves taking a dataset and dividing it into two subsets.
● The first subset is used to fit the model and is referred to as the training dataset.
● The second subset is not used to train the model but to evaluate the fitted machine learning
model. It is referred to as the test dataset.
9. How will you configure the train-test split procedure?
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# OR, equivalently, specify the training proportion instead:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.67)
● The procedure has one main configuration parameter, which is the size of the train and test
sets.
● This is most commonly expressed as a percentage between 0 and 1 for either the train or test
datasets.
● For example, a training set with the size of 0.67 (67 percent) means that the remainder
percentage 0.33 (33 percent) is assigned to the test set.
● There is no optimal split percentage.
Nevertheless, common split percentages include:
● Train: 80%, Test: 20%
● Train: 67%, Test: 33%
● Train: 50%, Test: 50%
10.Explain cross validation?
Ans: It is a resampling technique for evaluating machine learning models on a sample of data.
● The process includes a parameter k, which specifies the number of groups into which a given
data sample should be divided.
● The process is referred to as k-fold cross validation. For example, k=10 for 10-fold cross
validation.
● More reliable, though it takes longer to run.
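A minimal sketch of k-fold cross-validation with scikit-learn; the dataset, estimator, and k=5 below are illustrative assumptions:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)           # sample dataset, for illustration only
model = LogisticRegression(max_iter=1000)

# k=5 folds: each fold is used once as test data, the rest as training data
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())                # per-fold accuracy and the mean accuracy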

11. Explain the difference between cross validation and train test split?
Ans: On small datasets, the extra computational burden of running cross-validation isn’t a big deal. So, if
your dataset is smaller, you should run cross-validation
● If your dataset is larger, you can use train-test-split method.
12.What are hyper parameters?
Ans: Hyper parameters are parameters whose values govern the learning process. They also determine the
value of model parameters learned by a learning algorithm.
Eg: The ratio of train-test-split, Number of hidden layers in neural network, Number of clusters in
clustering task.
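A small sketch of this distinction, assuming scikit-learn's KMeans on toy data: the number of clusters is chosen before training (a hyperparameter), while the cluster centres are learned from the data (model parameters):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [10, 2], [10, 4]])   # toy data for illustration

# n_clusters is a hyperparameter: we set its value before the learning process
kmeans = KMeans(n_clusters=2, n_init=10).fit(X)

# cluster_centers_ are model parameters: their values are learned from the data
print(kmeans.cluster_centers_)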
13. How are MSE and RMSE related? What is their range? Are they sensitive to outliers?
MSE: One of the most used regression loss functions is MSE. We determine the error in
Mean-Squared-Error, also known as L2 loss, by squaring the difference between the predicted and actual
values and average it throughout the dataset.

● Squaring the error gives outliers more weight, resulting in a smooth gradient for minor errors.
● Because the errors are squared, MSE can never be negative. The error value varies from 0 to
infinity.
● The MSE grows quadratically as the error grows. An MSE value close to zero indicates a
good model.
● It is especially useful in removing outliers with substantial errors from the model by giving
them additional weight.

RMSE: The square root of MSE is used to calculate RMSE. The Root Mean Square Deviation (RMSD) is
another name for the Root Mean Square Error.

● A RMSE value of 0 implies that the model is perfectly fitted. The model and its
predictions perform better when the RMSE is low.
● A greater RMSE indicates a substantial discrepancy between the residual and the ground
truth.
● The RMSE of a good model should be less than 18
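Since the formulas themselves are not reproduced above: MSE is the average of the squared differences between predicted and actual values, and RMSE is its square root. A minimal sketch with illustrative numbers:

import numpy as np
from sklearn.metrics import mean_squared_error

y_actual = np.array([3.0, 5.0, 2.5, 7.0])    # observed values (illustrative)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])      # model predictions (illustrative)

mse = mean_squared_error(y_actual, y_pred)   # average of the squared residuals
rmse = np.sqrt(mse)                          # RMSE is the square root of MSE
print(round(mse, 3), round(rmse, 3))         # 0.875 0.935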
14. What is loss function? What are the different categories of loss function?
● A loss function is a measure of how good a prediction model does in terms of being able to
predict the expected outcome.
● Loss functions can be broadly categorized into 2 types: Classification and Regression
Loss.
● Regression functions predict a quantity, and classification functions predict a label.
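A short illustration of the two categories using scikit-learn metrics (the numbers are made up): a regression loss compares predicted quantities with actual quantities, while a classification loss compares predicted class probabilities with actual labels.

from sklearn.metrics import log_loss, mean_absolute_error

# Regression loss: predictions are quantities
print(mean_absolute_error([3.0, 5.0, 7.0], [2.5, 5.0, 8.0]))        # MAE = 0.5

# Classification loss: predictions are probabilities for the labels 0 and 1
print(log_loss([1, 0, 1], [[0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]))    # log loss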

15. Differentiate between training and test data.


Ans: A training set is a set of historical data in which the outcomes are already known.
Train dataset: used to fit the machine learning model.
Test dataset: used to evaluate the fitted machine learning model.
16. Explain the Cross-Validation Procedure. In which situation is it better than a Train-Test
Split?
Ans: We use cross-validation to obtain numerous metrics of model quality by running
our modelling procedure on various subsets of the data.

The original dataset is evenly divided into k subparts or folds for k-fold cross-validation. For each
iteration, one group from the k-folds is chosen as the test data, and the remaining (k-1) groups are chosen
as the training data. K times of this process are carried out. By using the mean accuracy, the model's final
accuracy is calculated.

Cross-validation should be used when the dataset is smaller for greater accuracy.

15. Consider the following data:

Regression line equation: Y=0.681x + 15.142. Calculate MSE and RMSE from the above
information

4. Consider the following data:

x:  44   46   48   50   52
y:  47   48   55   58   49

Regression Equation: y = 0.7x + 17.8
Calculate the RMSE (Root Mean Square Error) for the above data.
Ans: Calculation of RMSE, using predicted y1 = 0.7x + 17.8:

x     y (Observed)   Predicted y1 = 0.7x + 17.8   Residual (y1 - y)   Squared Residual (y1 - y)^2
44    47             0.7*44 + 17.8 = 48.6         1.6                 2.56
46    48             50.0                         2.0                 4.00
48    55             51.4                         -3.6                12.96
50    58             52.8                         -5.2                27.04
52    49             54.2                         5.2                 27.04

Sum of squared residuals: Σ(y1 - y)^2 = 73.6
MSE = 73.6 / 5 = 14.72
RMSE = √14.72 ≈ 3.836
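A short NumPy check of this calculation (the regression equation and data are taken from the question above):

import numpy as np

x = np.array([44, 46, 48, 50, 52])
y = np.array([47, 48, 55, 58, 49])

y_pred = 0.7 * x + 17.8                   # predictions from the given regression equation
residuals = y_pred - y                    # residual = predicted - observed
rmse = np.sqrt(np.mean(residuals ** 2))   # sqrt(73.6 / 5) = sqrt(14.72)
print(round(rmse, 2))                     # 3.84 (quoted as 3.836 in the worked answer)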
15. Draw the diagram of the Analytic Approach and explain each stage.

1. Business understanding
• What problem are you trying to solve?
• Every project, whatever its size, begins with the understanding of the business.
• Business partners who need the analytics solution play a critical role in this phase by
defining the problem, the project objectives, and the solution requirements from a
business perspective.
2. Analytic approach
• How can you use the data to answer the question?
• The problem must be expressed in the context of statistical learning to identify the
appropriate machine learning techniques to achieve the desired result.
3. Data Requirement
What data do you need to answer the question?
• Analytic approach determines the data requirements – specific content, formats, and data
representations, based on domain knowledge.
4. Data collection
• Where is the data coming from (identify all sources) and how will you get it?
• The Data Scientist identifies and collects data resources (structured, unstructured and
semi-structured) that are relevant to the problem area.
• If the data scientist finds gaps in the data collection, he may need to review the data
requirements and collect more data.
5. Data understanding
• Is the data that you collected representative of the problem to be solved?
• Descriptive statistics and visualization techniques can help a data scientist understand
the content of the data, assess its quality, and obtain initial information about the data.
6. Data preparation
• What additional work is required to manipulate and work with the data?
• The Data preparation step includes all the activities used to create the data set used
during the modeling phase.
• This includes cleansing data, combining data from multiple sources, and transforming
data into more useful variables.
• In addition, feature engineering and text analysis can be used to derive new structured
variables to enrich all predictors and improve model accuracy.
7. Model Training
• In What way can the data be visualized to get the answer that is required?
• From the first version of the prepared data set, Data scientists use a Training dataset
(historical data in which the desired result is known) to develop predictive or descriptive
models.
• The modeling process is very iterative.
8. Model Evaluation
• Does the model used really answer the initial question or does it need to be adjusted?
• The Data Scientist evaluates the quality of the model and verifies that the business
problem is handled in a complete and adequate manner.
9. Deployment
• Can you put the model into practice?
• Once a satisfactory model has been developed and approved by commercial sponsors, it
will be implemented in the production environment or in a comparable test environment.
10.Feedback
• Can you get constructive feedback into answering the question?
• By collecting the results of the implemented model, the organization receives feedback
on the performance of the model and its impact on the implementation environment.
Train-Test Split Evaluation vs Cross-Validation Procedure
Ans:
1. Train-test split: The train-test procedure measures the performance of a machine learning algorithm.
   Cross-validation: Cross-validation is a resampling technique for evaluating machine learning models.
2. Train-test split: It can be used for classification or regression problems.
   Cross-validation: Cross-validation is a very necessary tool to evaluate your model for accuracy in
   classification. Logistic Regression, Random Forest, and Support Vector Machine (SVM) have their own
   advantages and drawbacks as models.
3. Train-test split: It is quick and easy to use; it is only suitable for huge data.
   Cross-validation: It does not work for complex data; it is only used for small data.
4. Train-test split: This process works for large data models that are extremely expensive to train.
   Cross-validation: This process works for small data models that are easy to train and accurately
   predict data.
5. Train-test split: The size of the train and test sets is the procedure's key configuration parameter.
   For either the train or test dataset, this is usually given as a percentage between 0 and 1. Commonly
   used split percentages are:
   ● Train: 80%, Test: 20%
   ● Train: 67%, Test: 33%
   ● Train: 50%, Test: 50%
   Cross-validation: The process includes only one parameter, k, which specifies the number of groups
   into which a given data sample is divided. The process is frequently referred to as k-fold cross
   validation, for example, k=10 for 10-fold cross-validation.
6. Train-test split: The technique divides the provided dataset into two subsets.
   Training data: the training dataset is used to fine-tune the machine learning model and train the algorithm.
   Testing data: the trained algorithm makes predictions using the input elements of the test dataset.
   Cross-validation: Cross-validation is a statistical method of evaluating and comparing learning
   algorithms by dividing data into two segments: one used to learn or train a model and the other used
   to validate the model.
7. Train-test split: Deep Neural Network models are one example.
   Cross-validation: k-fold cross-validation is one of the most popular strategies, widely used by data
   scientists.

16. What is the difference between parameter and hyperparameter?


Ans:

17.What is a Model Parameter?


Ans: A model parameter is a variable whose value is estimated from the dataset. Parameters are the values
learned during training from the historical data sets.
The values of model parameters are not set manually. They are estimated from the training
data. These are variables that are internal to the machine learning model.
Based on the training, the values of parameters are set. These values are used by machine learning while
making predictions. The accuracy of the values of the parameters defines the skill of your model.
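A minimal sketch of learned model parameters, assuming a scikit-learn linear regression on illustrative data: the coefficient and intercept are not set manually but estimated during training.

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4]])     # illustrative feature values
y = np.array([3, 5, 7, 9])             # generated from y = 2x + 1

model = LinearRegression().fit(X, y)

# coef_ and intercept_ are model parameters: estimated from the training data
print(model.coef_, model.intercept_)   # approximately [2.] and 1.0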
18.What are the different stages of the AI Project Life cycle?
Ans:The steps involved in AI project cycle are as given:
● The first step is Scope the Problem by which you set the goal for your AI project by stating the
problem which you wish to solve with it. Under problem scoping, we look at various parameters
which affect the problem we wish to solve so that the picture becomes clearer.

● Next step is to acquire data which will become the base of your project as it will help you in
understanding the parameters that are related to problem scoping.
● Next, you go for data acquisition by collecting data from various reliable and authentic sources.
Since the data you collect would be in large quantities, you can try to give it a visual image of
different types of representations like graphs, databases, flow charts, maps, etc.This makes it easier
for you to interpret the patterns in which your acquired data follows.
● After exploring the patterns, you can decide upon the type of model you would build to achieve the
goal. For this, you can research online and select various models which give a suitable output.
● You can test the selected models and figure out which is the most efficient one.
● The most efficient model is now the base of your AI project and you can develop your algorithm
around it.
● Once the modelling is complete, you now need to test your model on some newly fetched data. The
results will help you in evaluating your model and hence improving it. Finally, after evaluation, the
project cycle is now complete and what you get is your AI project.

19.What are the key considerations for testing in the AI and Data Science Life Cycle or Analytics
Project Life Cycle?
Ans:AI and Data Science Lifecycle: Key Steps and Considerations

1. AI Project Scoping
The first fundamental step when starting an AI initiative is scoping and selecting the relevant use case(s)
that the AI model will be built to address. In this phase, it's crucial to precisely define the strategic
business objectives and desired outcomes of the project, align all the different stakeholders' expectations,
anticipate the key resources and steps, and define the success metrics. Selecting the AI or machine
learning use cases and being able to evaluate the return on investment (ROI) is critical to the success of
any data project.
2. Building the Model
Once the relevant projects have been selected and properly scoped, the next step of the machine learning
lifecycle is the Design or Build phase, which can take from a few days to multiple months, depending on
the nature of the project. The Design phase is essentially an iterative process comprising all the steps
relevant to building the AI or machine learning model: data acquisition, exploration, preparation, cleaning,
feature engineering, testing and running a set of models to try to predict behaviors or discover insights in
the data.
3. Deploying to Production
In order to realize real business value from data projects, machine learning models must not sit on the
shelf; they need to be operationalized, or deployed into production for use across the organization.
Sometimes the cost of deploying a model into production is higher than the value it would bring. Ideally,
this should be anticipated in the project scoping phase, before the model is actually built, but this is not
always possible.
Another crucial factor to consider in the deployment phase of the machine learning lifecycle is the
replicability of a project: think about how this project can be reused and capitalized on by other teams,
departments, regions, etc., than the ones it's initially built to serve.

*************************************************************************************
Unit 3: Storytelling
Multiple choice questions:
1. According to Storytelling with Data, data is Computer readable information &
________________?
Ans:Information collected about the physical world
2. Data storytelling makes information memorable and __________ .
Ans:Easier to retain
3. ___________ is the first step involved in Data Storytelling.
Ans:Data Exploration
4. When visuals are applied to data, they provide _________ to the audience.
Ans:Insights
5. Narrative is the way we simplify and ____________.
Ans:Make sense of a complex world
6. A well-told story is an inspirational narrative that is crafted to engage the audience
across________.
Ans:Boundaries and Culture
7. Stories create _______ experiences that transport the audience to another space and time.
(a) unpleasant (b) tedious (c ) repetitive (d) engaging
8. The steps that assist in finding compelling stories in the data sets are as follows. Arrange them in
proper order:
1) Visualize the data.
2) Examine data relationships.
3) Get the data and organize it.
4) Create a simple narrative embedded with conflict.
(a) 1-2-3-4 (b) 2-3-1-4 (c ) 4-1-3-2 (d) 3-1-2-4
9. Stories change the way that we interact with data, transforming it from a dry collection of ______
to something that can be entertaining, thought provoking, and inspiring change.
(a) visuals (b) points (c ) images (d) facts
10. When visuals are applied to data, they can _______ the audience to the insights that they wouldn't
perceive without the charts or graphs.
(a) engage (b) explain (c) enlighten (d) change

11. Assertion (A): Stories that combine statistics and analytics are more persuasive. Reason (R):
When we talk about data storytelling, we're talking about stories in which data plays a central role.
Select the appropriate option for the statements given above:
(a) Both A and R are true and R is the correct explanation of A
(b) Both A and R are true and R is not the correct explanation of A
(c) A is true but R is false d. A is False but R is true
12. Which of the following shows the audience where to look and what not to miss and also keeps the
audience engaged?
(a) data (b) narrative (c) charts (d) story
13. Data storytelling is a_____ approach for communicating insights drawn from data.
(a) iterative (b) procedural (c) sequential (d) structured
14. Stories create _____experiences that transport the audience to another space and time.
(a) visualizations (b) engaging (c) testing (d) necessary
15. When the ______ is accompanied with data, it helps to explain to the audience what's happening
in the data and why a particular insight has been generated.
(a) visuals (b) narrative (c) project (d) graphs
16. ______is the first step involved in telling an effective data story.
Ans:Data
17. Stories that incorporate ____and analytics are more convincing than those based entirely on
anecdotes or personal experience.
Ans: Data and analytics
18. _____change the way in which we interact with data.
Ans:Augmented reality and virtual reality
19. Stories create ____ experiences that transport the audience to another space and time.
Ans: Engaging
20. Each data point holds some____which may be unclear and contextually different on its own.
Ans: Information
Very short answers: (2 Marks each)
1.Identify the below elements used to make a compelling data story

2.Name any two key elements of data storytelling?


Ans: 1.Visuals 2.Data 3.Narrative
3. What are the steps involved in telling an effective data story?
Ans: 1. Understanding the audience
2. Choosing the right data and visualizations
3. Drawing attention to key information
4. Developing a narrative
5. Engaging your audience
4. What are the two possible graphs that can be used to represent this data?
Ans:Bar Graph, Line Graph

5.What elements of data storytelling, when merged together can engage the audience?
Ans: Narrative and Visual

6. What is the importance of a narrative in a story?


Ans: Stories are more likely to drive action than statistics and numbers alone. Therefore, when
data is told in the form of a narrative, it reduces ambiguity, connects data with context, and
describes a specific interpretation, communicating the important messages in the most
effective way.
7.Explain how data storytelling can bring about change using a diagram.
Ans:Data storytelling is a structured approach for communicating insights drawn from data,and invariably
involves a combination of three key elements: data, visuals, and narrative. When the narrative is
accompanied with data, it helps to explain to the audience what's happening in the data and why a particular
insight has been generated. When visuals are applied to data, they can enlighten the audience to the
insights that they wouldn’t perceive without the charts or graphs. Finally, when narrative and visuals are
merged together, they can engage or even entertain an audience. When you combine the right visuals and
narrative with the right data, you have a data story that can influence and drive change.

8. Visualize the following data on a bar graph: Meals served over time.

Ans:
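The source data table and answer chart are not reproduced here; the sketch below shows how such a bar graph could be drawn with matplotlib, using purely hypothetical values:

import matplotlib.pyplot as plt

# Hypothetical values; the actual table from the question is not reproduced here
years = ["2019", "2020", "2021", "2022"]
meals_served = [40000, 127000, 168000, 225000]

plt.bar(years, meals_served)
plt.xlabel("Year")
plt.ylabel("Meals served")
plt.title("Meals served over time")
plt.show()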
9. Write any four factors that make storytelling a powerful tool.
Ans: 1.Audience 2. Theme or Structure 3.Emotions 4. Branded
10. With reference to data storytelling consider the following diagram, Name A,B,C,D.
Ans:

A → Narrator   B → Data   C → Visualization   D → Change


11. Explain the term data storytelling.
Ans: Data storytelling is the art of presenting data with a contextual narrative. There are a few different
ways to present your data story. Data storytelling is the process of translating data analyses into
understandable terms in order to influence a business decision or action.
12.”A well told story is an inspirational narrative” . Give two points to justify the given statement.
Ans: A true inspirational story involves action, and we all need a motivational force to keep our hopes
alive and increase our productivity.
13.”Storytelling is considered as a powerful element to enhance global networking”
justify.
Ans: True. Storytelling is considered a powerful element to enhance global networking; stories
appeal to our senses and our emotions, not only drawing our attention more easily, but also leaving
an impact on us as audiences.
Justification: Stories help us understand users' motivations, aspirations, problems, and needs, which
justifies what type of product or service we are going to design.
14. What happens when visuals are applied to data?
Ans:Data visualization helps to tell stories by curating data into a form easier to understand, highlighting
the trends and outliers. A good visualization tells a story, removing the noise from data and highlighting
useful information.
15. What is the purpose of narrative in data storytelling? Discuss briefly.
Ans:Narratives are especially powerful in changing people's beliefs and behaviors when people are
transported into the story. When this happens, people become emotionally engaged, are less likely to
critically evaluate facts and are more open to changing their beliefs
Long Answers: (4 Marks each)
1.Why data storytelling has acquired a place of importance?
Ans:1. It is an effective tool to transmit human experience. Narrative is the way we simplify and make
sense of a complex world. It supplies context, insight, interpretation—all the things that make data
meaningful, more relevant and interesting.
2. No matter how impressive an analysis, or how high-quality the data, it is not going to compel
change unless the people involved understand what is explained through a story.

3. Stories that incorporate data and analytics are more convincing than those based entirely on
anecdotes or personal experience.
4. It helps to standardize communications and spread results.
5. It makes information memorable and easier to retain in the long run.
2. What is data storytelling? Explain in detail.
Ans:Data storytelling is a structured approach for communicating insights drawn from data, and invariably
involves a combination of three key elements: data, visuals, and narrative. When the narrative is
accompanied with data, it helps to explain to the audience what’s happening in the data and why a
particular insight has been generated. When visuals are applied to data, they can enlighten the audience to
the insights that they wouldn’t perceive without the charts or graphs. Finally, when narrative and visuals
are merged together, they can engage or even entertain an audience. When you combine the right visuals
and narrative with the right data, you have a data story that can influence and drive change.

3.(a) List the steps of creating an effective data story.


(b) Which of the following is a better data story? Give reasons.
Option A:

Option B:

Ans:(a) The steps involved in telling an effective data story are given below:
1. Understanding the audience
2. Choosing the right data and visualizations
3. Drawing attention to key information
4. Developing a narrative
5. Engaging your audience
(b) Option A is a better data story.
Reasons 1. The insight given at the bottom of the visual is giving a clear idea about context which is not
there in Option B. 2. Option A visual provides a compelling narrative.

4. Consider the following graph. Mention the steps that can assist in finding compelling stories in the data
sets.

Ans: Mention Year (2008-2014) and million tonnes (0-1,000) as the axes of the graph.


Steps that can assist in finding compelling stories in the data sets are as follows:
Step 1: Get the data and organize it.
Step 2: Visualize the data.
Step 3: Examine data relationships.
Step 4: Create a simple narrative embedded with conflict.
*************************************************************************
