CAPSTONE PROJECT
• A capstone project is a project in which students research a topic independently to gain a deep understanding of the subject matter.
• It gives students an opportunity to integrate all of their knowledge and to demonstrate it through a comprehensive project.
• Example capstone projects:
1. Stock Prices Predictor
2. Develop A Sentiment Analyzer
3. Movie Ticket Price Predictor
4. Students Results Predictor
5. Human Activity Recognition using Smartphone Dataset
6. Classifying humans and animals in a photo
• Artificial Intelligence is perhaps the most transformative technology available today. At a high level, every AI project follows the same six steps.
IS THERE A PATTERN?
• The premise that underlies all Machine Learning disciplines is that
there needs to be a pattern.
• If there is no pattern, then the problem cannot be solved with AI technology.
• Applied uses of these techniques are typically geared towards answering five types of questions, all of which fall under the umbrella of predictive analytics:
1) Which category? (Classification)
2) How much or how many? (Regression)
3) Which group? (Clustering)
4) Is this unusual? (Anomaly Detection)
5) Which option should be taken? (Recommendation)
It is important to determine which of these questions you’re asking, and how answering
it helps you solve your problem.
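As an illustrative sketch, each question type corresponds to a family of scikit-learn estimators. The pairings below are assumptions for illustration, not part of the original material:

# A minimal sketch mapping four of the five question types to
# representative scikit-learn estimators; the pairings are illustrative
# assumptions, not an exhaustive list.
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

question_to_estimator = {
    "Which category?":       LogisticRegression(),  # classification
    "How much or how many?": LinearRegression(),    # regression
    "Which group?":          KMeans(n_clusters=3),  # clustering
    "Is this unusual?":      IsolationForest(),     # anomaly detection
}
# "Which option should be taken?" (recommendation) usually calls for a
# dedicated recommender library rather than a single sklearn estimator.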
DECOMPOSING THE PROBLEM THROUGH THE DESIGN THINKING (DT) FRAMEWORK
• Design Thinking is a design methodology that provides a solution-
based approach to solving problems. It’s extremely useful in tackling
complex problems that are ill-defined or unknown.
PROBLEM DECOMPOSITION STEPS
• 1. Understand the problem and then restate the problem in your own
words
• Know what the desired inputs and outputs are
• Ask questions for clarification (in class these questions might be directed to your instructor, but most of the time you will be asking yourself or your collaborators)
• 2. Break the problem down into a few large pieces. Write these down,
either on paper or as comments in a file.
• 3. Break complicated pieces down into smaller pieces. Keep doing this
until all of the pieces are small.
• 4. Code one small piece at a time.
1. Think about how to implement it
2. Write the code/query
3. Test it… on its own.
4. Fix problems, if any
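A minimal sketch of step 4: implement one small piece, then test it on its own. The normalize_column function here is a hypothetical piece of a larger data-cleaning task, not from the original material:

def normalize_column(values):
    """Scale a list of numbers into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Test this one piece on its own before wiring it into the larger program.
assert normalize_column([2, 4, 6]) == [0.0, 0.5, 1.0]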
• Example: Imagine that you want to create your first app. This is a complex problem.
How would you decompose the task of creating an app?
• To decompose this task, you would need to know the answers to a series of smaller problems.
Statistical analysis applies to problems that require counts: if the question requires a
yes/ no answer, then a classification approach to predicting a response would be
suitable.
DATA REQUIREMENTS
• Prior to undertaking the data collection and data preparation stages of the
methodology, it's vital to define the data requirements for decision-tree
classification.
• This includes identifying the necessary data content, formats and sources for
initial data collection.
• In this phase the data requirements are revised and decisions are made as to whether or not the collection requires more or less data.
• Once the data ingredients are collected, the data scientist will have a good
understanding of what they will be working with.
• Techniques such as descriptive statistics and visualization can be applied to
the data set, to assess the content, quality, and initial insights about the data.
• Gaps in data will be identified and plans to either fill or make substitutions will
have to be made.
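A minimal sketch of this assessment step, assuming the forestfires.csv dataset that is used later in these slides:

import pandas as pd

data = pd.read_csv('forestfires.csv')

print(data.describe())     # descriptive statistics: content and ranges
print(data.dtypes)         # check the format of each column
print(data.isna().sum())   # identify gaps (missing values) per column

# One possible substitution for gaps: fill numeric gaps with the median.
data = data.fillna(data.median(numeric_only=True))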
MODELING APPROACH
• Data Modeling focuses on developing models that are either descriptive or
predictive.
An example of a descriptive model might examine things like: if a person did this,
then they're likely to prefer that.
A predictive model tries to yield yes/no or stop/go type outcomes. These models are based on the analytic approach that was taken: either statistically driven or machine learning driven.
• The data scientist will use a training set for predictive modelling.
• A training set is a set of historical data in which the outcomes are already known.
• The training set acts like a gauge to determine if the model needs to be
calibrated.
• In this stage, the data scientist will experiment with different algorithms to ensure that the variables in play are actually required.
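A minimal sketch of trying different algorithms on the same training set; the synthetic data and the two candidate models are assumptions for illustration:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic historical data in which the outcomes are already known.
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit each candidate algorithm on the same training set and compare.
for model in (LinearRegression(), DecisionTreeRegressor(max_depth=4, random_state=0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, round(model.score(X_test, y_test), 3))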
• The success of data compilation, preparation, and modelling depends on understanding the problem at hand and on taking the appropriate analytical approach.
• Constant refinement, adjustment, and tweaking are necessary within each step to ensure that the outcome is solid.
• The framework is geared to do 3 things:
First, understand the question at hand.
Second, select an analytic approach or method to solve the problem.
Third, obtain, understand, prepare, and model the data.
HOW TO VALIDATE MODEL QUALITY
• Train-Test Split Evaluation
• For example, a training set size of 0.67 (67 percent) means that the remaining 0.33 (33 percent) is assigned to the test set.
We use pandas to import the dataset and sklearn to perform the splitting. You can import these
packages as:
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split
• Loading the dataset: let's load the forestfires dataset using pandas.
>>> data = pd.read_csv('forestfires.csv')
>>> data.head()
Let's split this data into labels and features. Features are the data we use to make predictions; labels are the values we want to predict.
>>> y = data.temp
>>> x = data.drop('temp', axis=1)
Here temp is the label (the temperature value we want to predict), stored in y; the drop() function removes it so that x keeps all the other columns as features.
>>> x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
>>> x_train.head()
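Here test_size=0.2 keeps 20% of the rows for testing. As a hypothetical continuation of this session, we can fit a model on the training split and score it on the held-out test split (this assumes the non-numeric columns, such as month and day, have already been encoded or dropped):
>>> from sklearn.linear_model import LinearRegression
>>> model = LinearRegression()
>>> model.fit(x_train, y_train)
>>> model.score(x_test, y_test)   # R^2 on unseen data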
CROSS-VALIDATION
• You've already learned to use train_test_split to split the data, so you can measure
model quality on the test data.
• Cross-validation extends this approach to model scoring (or "model validation").
• Compared to train_test_split, cross-validation gives you a more reliable measure of
your model's quality, though it takes longer to run.
THE SHORTCOMING OF TRAIN-TEST SPLIT
• Imagine you have a dataset with 5000 rows.
• The train_test_split function has an argument for test_size that you can use to
decide how many rows go to the training set and how many go to the test set.
• The larger the test set, the more reliable your measures of model quality will
be.
• You will typically keep about 20% as a test dataset. But even with 1000 rows in
the test set, there's some random chance in determining model scores.
• A model might do well on one set of 1000 rows, even if it would be inaccurate
on a different 1000 rows.
• The larger the test set, the less randomness (aka "noise") there is in our
measure of model quality.
• In cross-validation, we run our modeling process on different subsets of the data to
get multiple measures of model quality.
• For example, we could have 5 folds or experiments: we divide the data into 5 pieces, each being 20% of the full dataset, and each piece takes a turn as the test set.
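A minimal sketch of 5-fold cross-validation with scikit-learn's cross_val_score; the synthetic dataset is an assumption for illustration:

from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=0)

# cv=5 runs five experiments, each holding out a different 20% for scoring.
scores = cross_val_score(LinearRegression(), X, y, cv=5)
print(scores)          # one quality measure per fold
print(scores.mean())   # averaged into a single, more reliable measure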
TRADE-OFFS BETWEEN CROSS-VALIDATION AND TRAIN-TEST SPLIT
• Cross-validation gives a more accurate measure of model quality, which is
especially important if you are making a lot of modeling decisions.
• However, it can take more time to run, because it trains the model once for each fold.
• On small datasets, the extra computational burden of running cross-
validation isn't a big deal. These are also the problems where model
quality scores would be least reliable with train-test split.
• So, if your dataset is smaller, you should run cross-validation.
• For the same reasons, a simple train-test split is sufficient for larger
datasets.
• It will run faster, and you may have enough data that there's little need to
re-use some of it for holdout.
• Using cross-validation gave us much better measures of model quality, with the added benefit of cleaning up our code (no longer needing to keep track of separate train and test sets).
• Cross-validation is also useful for comparing:
Different transforms of the data used to train the same machine learning model.
Different machine learning models trained on the same data.
Different configurations for a machine learning model trained on the same data.
• All the algorithms in machine learning rely on minimizing or maximizing a function, which we call the "objective function". The group of functions that are minimized are called "loss functions".
• A loss function measures how well a prediction model performs in terms of predicting the expected outcome.
• The most commonly used method of finding the minimum point of a function is "gradient descent".
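A minimal sketch of gradient descent on a toy loss function f(w) = (w - 3)^2, whose gradient is 2(w - 3); the starting point and learning rate are arbitrary choices for illustration:

w = 0.0                 # starting guess
learning_rate = 0.1
for _ in range(100):
    gradient = 2 * (w - 3)           # derivative of (w - 3)^2
    w -= learning_rate * gradient    # step against the gradient
print(w)                # converges toward the minimum at w = 3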
• Loss functions can be broadly categorized into 2 types: classification loss and regression loss.
• Regression functions predict a quantity, and classification functions predict a label.
RMSE (ROOT MEAN SQUARED ERROR)
• In machine learning, when we want to look at the accuracy of a regression model, we take the root mean square of the error between the test values and the predicted values. Mathematically:
RMSE = sqrt( (1/n) * Σ (Y_true - Y_pred)^2 )
• Mean Square Error (MSE) is the most commonly used regression loss function.
• MSE is the average of the squared differences between our target variable and the predicted values.
from sklearn.metrics import mean_squared_error
import numpy as np

# Given values
Y_true = [1, 1, 2, 2, 4]               # Y_true = Y (original values)
# Calculated (predicted) values
Y_pred = [0.6, 1.29, 1.99, 2.69, 3.4]  # Y_pred = Y'

mse = mean_squared_error(Y_true, Y_pred)  # mean squared error
print(mse, np.sqrt(mse))                  # MSE and its square root, RMSE