Unit 1: Capstone Project
A capstone project is a project in which students research a topic independently to gain a deep understanding of the subject matter. It gives students an opportunity to integrate all their knowledge and demonstrate it through a comprehensive project.
1) Understanding The Problem
Artificial Intelligence is perhaps the most transformative technology available today. At a high level, every AI project follows six steps:
1) Problem definition, i.e. understanding the problem
2) Data gathering
3) Feature definition
4) AI model construction
5) Evaluation and refinement
6) Deployment
The premise that underlies all Machine Learning disciplines is that there needs to be a pattern in the data. If there is no pattern, then the problem cannot be solved with AI technology, so it is fundamental that this question is asked before deciding to embark on an AI development journey. If it is believed that there is a pattern in the data, then AI development techniques may be employed. Applied uses of these techniques are typically geared towards answering five types of questions, all of which fall under the umbrella of predictive analysis:
1) Which category? (Classification)
2) How much or how many? (Regression)
3) Which group? (Clustering)
4) Is this unusual? (Anomaly Detection)
5) Which option should be taken? (Recommendation)
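As a rough illustration of how these five questions map to tooling, here is a sketch of common scikit-learn choices (the estimators named are typical examples, not prescribed by the text):

# a sketch: one typical scikit-learn estimator per question type
from sklearn.linear_model import LogisticRegression   # which category? (classification)
from sklearn.linear_model import LinearRegression     # how much or how many? (regression)
from sklearn.cluster import KMeans                    # which group? (clustering)
from sklearn.ensemble import IsolationForest          # is this unusual? (anomaly detection)
from sklearn.neighbors import NearestNeighbors        # which option? (a common building block for recommendation)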
2) Decomposing The Problem Through DT Framework
Design Thinking is a design methodology that provides a solution-based approach to solving problems. It is extremely useful in tackling complex problems that are ill-defined or unknown. The five stages of Design Thinking are as follows: Empathize, Define, Ideate, Prototype, and Test.
Real computational tasks are complicated. To accomplish them you need to break the problem down into smaller units before coding.
Problem decomposition steps
1. Understand the problem and then restate the problem in your own words. Know what the desired inputs and outputs are. Ask questions for clarification (in class these questions might be directed to your instructor, but most of the time you will be asking yourself or your collaborators).
2. Break the problem down into a few large pieces. Write
these down, either on paper or as comments in a file.
3. Break complicated pieces down into smaller pieces.
Keep doing this until all of the pieces are small.
4. Code one small piece at a time.
1. Think about how to implement it
2. Write the code/query
3. Test it on its own
4. Fix problems, if any
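A minimal sketch of this workflow in Python, using an invented task (counting word frequencies in a text) where each small piece is coded and tested on its own:

# a sketch: code one small piece at a time, test each piece on its own

def clean(text):
    # piece 1: lowercase the text and strip punctuation
    return ''.join(c for c in text.lower() if c.isalnum() or c.isspace())

def count_words(text):
    # piece 2: split the cleaned text and tally each word
    counts = {}
    for word in clean(text).split():
        counts[word] = counts.get(word, 0) + 1
    return counts

# test each piece on its own, then fix problems, if any
assert clean('Hi, there!') == 'hi there'
assert count_words('a b a') == {'a': 2, 'b': 1}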
Decomposition applies to data as well as to code. Consider, for example, decomposing a time series into trend, seasonal, and residual components. Reviewing a line plot of such a series may suggest a linear trend, but it is hard to be sure from eyeballing alone. There may also be seasonality, and if the amplitude (height) of the cycles appears to increase over time, this suggests that the seasonality is multiplicative.
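A decomposition along these lines can be produced with the seasonal_decompose function from statsmodels; a minimal sketch, assuming a monthly series in a CSV file (the file name is a placeholder):

# a sketch: multiplicative seasonal decomposition with statsmodels
from pandas import read_csv
from statsmodels.tsa.seasonal import seasonal_decompose
from matplotlib import pyplot

# 'series.csv' is a placeholder: a date column plus one value column
series = read_csv('series.csv', header=0, index_col=0, parse_dates=True).squeeze('columns')
result = seasonal_decompose(series, model='multiplicative', period=12)
result.plot()
pyplot.show()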
Running the example plots the observed, trend, seasonal,
and residual time series. We can see that the trend and seasonality information extracted from the series does seem reasonable. The residuals are also interesting, showing periods of high variability in the early and later years of the series.
3) Analytic Approach
Those who work in the domain of AI and Machine
Learning solve problems and answer questions through data every day. They build models to predict outcomes or discover underlying patterns, all to gain insights leading to actions that will improve future outcomes.
Every project, regardless of its size, starts with business understanding, which lays the foundation for successful resolution of the business problem. The business sponsors who need the analytic solution play the critical role in this stage by defining the problem, project objectives, and solution requirements from a business perspective. And, believe it or not, even with nine stages still to go, this first stage is the hardest.
After clearly stating a business problem, the data scientist can define the analytic approach to solving it. Doing so involves expressing the problem in the context of statistical and machine learning techniques, so that the data scientist can identify techniques suitable for achieving the desired outcome. Selecting the right analytic approach depends on the question being asked. Once the problem to be addressed is defined, the appropriate analytic approach is selected in the context of the business requirements. This is the second stage of the data science methodology.
If the question is to determine probabilities of an action, then a predictive model might be used. If the question is to show relationships, a descriptive approach may be required. Statistical analysis applies to problems that require counts; if the question requires a yes/no answer, then a classification approach to predicting a response would be suitable.
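To make the classification case concrete, a minimal sketch using scikit-learn's LogisticRegression (the data is invented for illustration):

# a sketch: a yes/no question framed as classification
from sklearn.linear_model import LogisticRegression

X = [[1], [2], [3], [4], [5], [6]]   # illustrative feature values
y = [0, 0, 0, 1, 1, 1]               # illustrative yes (1) / no (0) outcomes

model = LogisticRegression()
model.fit(X, y)
print(model.predict([[3.5]]))        # the yes/no answer
print(model.predict_proba([[3.5]]))  # the probabilities behind it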
4) Data Requirements
If the problem that needs to be resolved is "a recipe", so to speak, and data is "an ingredient", then the data scientist needs to identify:
1. which ingredients are required?
2. how to source or collect them?
3. how to understand or work with them?
4. how to prepare the data to meet the desired outcome?
Prior to undertaking the data collection and data preparation stages of the methodology, it is vital to define the data requirements (in this example, for decision-tree classification). This includes identifying the necessary data content, formats, and sources for initial data collection. In this phase the data requirements are revised and decisions are made as to whether the collection requires more or less data. Once the data ingredients are collected, the data scientist will have a good understanding of what they will be working with. Techniques such as descriptive statistics and visualization can be applied to the data set to assess its content and quality and to gain initial insights about the data. Gaps in the data will be identified, and plans will have to be made to either fill them or make substitutions. In essence, the ingredients are now sitting on the cutting board.
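As an illustration of this assessment step, a minimal sketch with pandas (the file name is a placeholder for whatever tabular data was collected):

# a sketch: a first look at freshly collected data
import pandas as pd

df = pd.read_csv('ingredients.csv')   # placeholder file name

print(df.describe())      # descriptive statistics: count, mean, spread, range
print(df.isnull().sum())  # gaps in the data: missing values per column
print(df.dtypes)          # formats: the type of each column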
5) Modeling Approach
Data Modeling focuses on developing models that are either descriptive or predictive. A descriptive model might examine things like: if a person did this, then they're likely to prefer that. A predictive model tries to yield yes/no or stop/go type outcomes. These models are based on the analytic approach that was taken, either statistically driven or machine learning driven.
The data scientist will use a training set for predictive modelling. A training set is a set of historical data in which the outcomes are already known. The training set acts like a gauge to determine if the model needs to be calibrated. In this stage, the data scientist will experiment with different algorithms to ensure that the variables in play are actually required.
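A minimal sketch of this experimentation, assuming a training set of historical data with known outcomes (the data and the two algorithms are illustrative):

# a sketch: trying different algorithms on the same training set
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# placeholder historical data in which the outcomes (y_train) are known
X_train, y_train = make_classification(n_samples=200, n_features=5, random_state=0)

for model in (LogisticRegression(max_iter=1000), DecisionTreeClassifier(max_depth=3)):
    model.fit(X_train, y_train)
    # accuracy against the known outcomes acts as a rough gauge
    print(type(model).__name__, model.score(X_train, y_train))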
The success of data compilation, preparation and modelling depends on understanding the problem at hand and taking the appropriate analytical approach. The data supports the answering of the question and, like the quality of the ingredients in cooking, sets the stage for the outcome. Constant refinement, adjustment and tweaking are necessary within each step to ensure the outcome is solid. The framework is geared to do three things:
First, understand the question at hand.
Second, select an analytic approach or method to solve the problem.
Third, obtain, understand, prepare, and model the data.
The end goal is to move the data scientist to a point where a data model can be built to answer the question.
6) How to Validate Model Quality
6.1) Train-Test Split Evaluation
The train-test split is a technique for evaluating the performance of a machine learning algorithm. It can be used for classification or regression problems and for any supervised learning algorithm. The procedure involves taking a dataset and dividing it into two subsets. The first subset is used to fit the model and is referred to as the training dataset. The second subset is not used to train the model; instead, the input element of the dataset is provided to the model, then predictions are made and compared to the expected values. This second dataset is referred to as the test dataset.
Train Dataset: Used to fit the machine learning model.
Test Dataset: Used to evaluate the fit machine learning model.
The objective is to estimate the performance of the machine learning model on new data: data not used to train the model. This is how we expect to use the model in practice. Namely, to fit it on available data with known inputs and outputs, then make predictions on new examples in the future where we do not have the expected output or target values. The train-test procedure is appropriate when there is a sufficiently large dataset available.
How to Configure the Train-Test Split
The procedure has one main configuration parameter, which is the size of the train and test sets. This is most commonly expressed as a percentage between 0 and 1 for either the train or test dataset. For example, a train set size of 0.67 (67 percent) means that the remaining 0.33 (33 percent) is assigned to the test set.
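A minimal sketch with scikit-learn's train_test_split (the synthetic dataset is illustrative):

# a sketch: a 67/33 train-test split with scikit-learn
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=1)

# test_size=0.33 assigns 33 percent of the rows to the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=1)
print(X_train.shape, X_test.shape)   # (670, 20) (330, 20)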
There is no optimal split percentage.
You must choose a split percentage that meets your project's objectives, with considerations that include:
Computational cost in training the model.
Computational cost in evaluating the model.
Training set representativeness.
Test set representativeness.
Nevertheless, common split percentages include:
Train: 80%, Test: 20%
Train: 67%, Test: 33%
Train: 50%, Test: 50%
6.2) The Concept of Cross-Validation
You will face choices about which predictive variables to use, what types of models to use, what arguments to supply to those models, and so on. We make these choices in a data-driven way by measuring the model quality of the various alternatives. You have already learned to use train_test_split to split the data, so you can measure model quality on the test data. Cross-validation extends this approach to model scoring (or "model validation"). Compared to train_test_split, cross-validation gives you a more reliable measure of your model's quality, though it takes longer to run.
The Shortcoming of Train-Test Split
Imagine you have a dataset with 5000 rows. The train_test_split function has a test_size argument that you can use to decide how many rows go to the training set and how many go to the test set. The larger the test set, the more reliable your measures of model quality will be. At an extreme, you could imagine having only 1 row of data in the test set; if you compared alternative models, which one makes the best predictions on a single data point would be mostly a matter of luck. You will typically keep about 20% as a test dataset. But even with 1000 rows in the test set, there is some random chance in determining model scores. A model might do well on one set of 1000 rows even if it would be inaccurate on a different 1000 rows. The larger the test set, the less randomness (aka "noise") there is in our measure of model quality.
The Cross-Validation Procedure
In cross-validation, we run our modeling process on
different subsets of the data to get multiple measures of model quality. For example, we could have 5 folds or experiments. We divide the data into 5 pieces, each being 20% of the full dataset.
We run an experiment called experiment 1 which uses the
first fold as a holdout set, and everything else as training data. This gives us a measure of model quality based on a 20% holdout set, much as we got from using the simple train-test split.
We then run a second experiment, where we hold out data from the second fold (using everything except the 2nd fold for training the model). This gives us a second estimate of model quality. We repeat this process, using every fold once as the holdout. Putting this together, 100% of the data is used as a holdout at some point. Returning to our example above from train-test split: if we have 5000 rows of data, we end up with a measure of model quality based on 5000 rows of holdout (even though we don't use all 5000 rows simultaneously).
Trade-offs Between Cross-Validation and Train-Test Split
Cross-validation gives a more accurate measure of model quality, which is especially important if you are making a lot of modeling decisions. However, it can take longer to run, because it estimates a model once for each fold, so it is doing more total work.
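A minimal sketch of 5-fold cross-validation with scikit-learn's cross_val_score (the data and model are illustrative):

# a sketch: 5-fold cross-validation, one score per holdout fold
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, random_state=0)
model = DecisionTreeClassifier(random_state=0)

# cv=5: each fold serves once as the 20% holdout set
scores = cross_val_score(model, X, y, cv=5)
print(scores)          # five estimates of model quality
print(scores.mean())   # their average is the cross-validated score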
Given these tradeoffs, when should you use each
approach? On small datasets, the extra computational burden of running cross-validation isn't a big deal. These are also the problems where model quality scores would be least reliable with train-test split. So, if your dataset is smaller, you should run cross-validation. For the same reasons, a simple train-test split is sufficient for larger datasets. It will run faster, and you may have enough data that there's little need to re-use some of it for holdout.
There's no simple threshold for what constitutes a large vs. small dataset. If your model takes a couple of minutes or less to run, it's probably worth switching to cross-validation. If your model takes much longer to run, cross-validation may slow down your workflow more than it's worth. Alternatively, you can run cross-validation and see whether the scores for each experiment seem close. If each experiment gives nearly the same results, a train-test split is probably sufficient.
7) Metrics of Model Quality by Simple Math and Examples
After you make predictions, you need to know if they are any good. There are standard measures that we can use to summarize how good a set of predictions actually is. Knowing how good a set of predictions is allows you to estimate how good a given machine learning model of your problem is; you must estimate the quality of a set of predictions when training a machine learning model. Performance metrics like classification accuracy and root mean squared error can give you a clear, objective idea of how good a set of predictions is, and in turn how good the model that generated them is. This is important, as it allows you to tell the difference and select among:
Different transforms of the data used to train the same machine learning model.
Different machine learning models trained on the same data.
Different configurations for a machine learning model trained on the same data.
As such, performance metrics are a required building block in implementing machine learning algorithms from scratch. All the algorithms in machine learning rely on minimizing or maximizing a function, which we call the "objective function". The group of functions that are minimized are called "loss functions". A loss function is a measure of how well a prediction model does in terms of being able to predict the expected outcome. The most commonly used method of finding the minimum point of a function is "gradient descent". Think of a loss function as an undulating mountain, and gradient descent as sliding down the mountain to reach the bottommost point.
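A minimal sketch of gradient descent on a simple one-dimensional loss, f(w) = (w - 3)^2 (the loss, learning rate, and starting point are all illustrative):

# a sketch: gradient descent sliding down a simple loss function
def loss(w):
    return (w - 3) ** 2       # minimum at w = 3

def gradient(w):
    return 2 * (w - 3)        # derivative of the loss

w = 0.0                       # arbitrary starting point
learning_rate = 0.1
for step in range(50):
    w -= learning_rate * gradient(w)   # step downhill along the slope

print(w, loss(w))             # w approaches 3, loss approaches 0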
Loss functions can be broadly categorized into 2 types: Classification Loss and Regression Loss.
7.1) RMSE (Root Mean Squared Error)
In this section, we will look at one of the methods to determine the accuracy of our model in predicting the target values. You have likely heard of the term RMS, i.e. Root Mean Square, and may have used RMS values in statistics as well. In machine learning, when we want to look at the accuracy of our model, we take the root mean square of the error between the test values and the predicted values. Mathematically:
RMSE = sqrt( (1/n) * Σ (predicted_i - actual_i)^2 )
where n is the number of predictions, predicted_i is the i-th predicted value, and actual_i is the corresponding test value.
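A minimal sketch of this calculation by simple math (the values are illustrative):

# a sketch: RMSE computed step by step
from math import sqrt

actual = [3.0, -0.5, 2.0, 7.0]      # test values (illustrative)
predicted = [2.5, 0.0, 2.0, 8.0]    # model predictions (illustrative)

# mean of the squared errors, then the square root
squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
rmse = sqrt(sum(squared_errors) / len(squared_errors))
print(rmse)   # 0.6123...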
7.2) MSE (Mean Squared Error)
Mean Squared Error (MSE) is the most commonly used regression loss function. MSE is the average of the squared differences between our target values and the predicted values:
MSE = (1/n) * Σ (predicted_i - actual_i)^2
Why use mean squared error
MSE is sensitive to outliers, and given several examples with the same input feature values, the optimal prediction will be their mean target value. This should be compared with Mean Absolute Error, where the optimal prediction is the median. MSE is thus good to use if you believe that your target data, conditioned on the input, is normally distributed around a mean value, and when it is important to penalize outliers heavily.
When to use mean squared error
Use MSE when doing regression, if you believe that your target, conditioned on the input, is normally distributed, and you want large errors to be penalized significantly (quadratically) more than small ones.
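A minimal sketch contrasting MSE with Mean Absolute Error on data containing one outlier (the values are illustrative):

# a sketch: MSE penalizes the outlier quadratically, MAE does not
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 4.5, 6.5, 18.0]   # the last prediction is a large outlier

errors = [p - a for p, a in zip(predicted, actual)]
mse = sum(e ** 2 for e in errors) / len(errors)
mae = sum(abs(e) for e in errors) / len(errors)
print(mse)   # 25.1875: dominated by the outlier's squared error of 100
print(mae)   # 2.875: the outlier contributes only its absolute size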