Chapter 1 Capstone Project AI Class 12

Capstone project: a project in which students must research a topic independently to gain a
deep understanding of the subject matter. It gives students an opportunity to integrate all
their knowledge and demonstrate it through a comprehensive project.
Some capstone project ideas:
1. Stock Prices Predictor
2. Develop A Sentiment Analyzer
3. Movie Ticket Price Predictor
4. Students Results Predictor
5. Human Activity Recognition using Smartphone Data set
6. Classifying humans and animals in a photo
Artificial Intelligence is perhaps the most transformative technology available today. Every AI project
follows six steps:
1) Problem definition i.e. Understanding the problem
2) Data gathering
3) Feature definition
4) AI model construction
5) Evaluation & refinements
6) Deployment
First step: "understanding the problem"
Begin formulating your problem by asking: is there a pattern? If there is no pattern, then the problem cannot
be solved with AI technology.
If it is believed that there is a pattern in the data, then AI development techniques may be employed.
Five types of questions fall within the umbrella of predictive analytics:
1) Which category? (Classification)
2) How much or how many? (Regression)
3) Which group? (Clustering)
4) Is this unusual? (Anomaly Detection)
5) Which option should be taken? (Recommendation)
1. Decomposing The Problem Through DT Framework
Design Thinking is a design methodology that provides a solution-based approach to solving problems. It’s
extremely useful in tackling complex problems that are ill-defined or unknown.
The five stages of Design Thinking are as follows: Empathize, Define, Ideate, Prototype, and Test.

Important
Problem decomposition steps: break the problem down into smaller units before coding.
1. Understand the problem, then restate it in your own words.
Know what the desired inputs and outputs are.
Ask questions for clarification (in class these questions might go to your instructor, but most of the
time you will be asking yourself or your collaborators).
2. Break the problem down into a few large pieces. Write these down, either on paper or as
comments in a file.
3. Break complicated pieces down into smaller pieces. Keep doing this until all of the pieces are small.
4. Code one small piece at a time.
1. Think about how to implement it
2. Write the code/query
3. Test it… on its own.
4. Fix problems, if any

How to validate model quality (IMPORTANT)
1.1 Train-Test Split Evaluation
The train-test split is a technique for evaluating the performance of a machine learning algorithm.
It can be used for classification or regression problems and can be used for any supervised learning
algorithm.
The procedure involves taking a dataset and dividing it into two subsets. The first subset is used to fit
the model and is referred to as the training dataset. The second subset of the dataset is provided to the
model, then predictions are made and compared to the expected values. This second dataset is referred
to as the test dataset.
Train Dataset: Used to fit the machine learning model.
Test Dataset: Used to evaluate the fit machine learning model.
The train-test procedure is appropriate when there is a sufficiently large dataset available.
How to Configure the Train-Test Split
The procedure has one main configuration parameter, which is the size of the train and test sets.
There is no optimal split percentage.
You must choose a split percentage that meets your project’s objectives with considerations that
include:
Computational cost in training the model.
Computational cost in evaluating the model.
Training set representativeness.
Test set representativeness.
Nevertheless, common split percentages include:
Train: 80%, Test: 20%
Train: 67%, Test: 33%
Train: 50%, Test: 50%
Example 1: Training and Test Data in Python Machine Learning
A machine learning model works in two stages: training and testing. We usually split the data
around 80%-20% between the training and testing stages. Under supervised learning, we split a dataset into
training data and test data in Python.
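The 80/20 split described above can be sketched in plain Python. This is an illustrative implementation, not the standard one; in practice scikit-learn's `train_test_split` helper does the same job in one call.

```python
import random

def train_test_split(data, test_size=0.2, seed=42):
    """Shuffle a copy of the data, then slice off the last test_size fraction."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = data[:]              # copy, so the original order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_size))
    return shuffled[:cut], shuffled[cut:]

data = list(range(10))              # ten toy samples
train, test = train_test_split(data)
print(len(train), len(test))        # 8 2
```

Every sample ends up in exactly one of the two subsets, so the test set never overlaps the training set.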

1.2 The Concept of Cross-Validation


Cross-validation extends this approach to model validation. Compared to train_test_split, cross-validation
gives you a more reliable measure of your model's quality, though it takes longer to run.
The Shortcoming of Train-Test Split: it is not well suited to small datasets.
The larger the test set, the more reliable your measure of model quality will be, and the less randomness
(aka "noise") there is in that measure. But on a small dataset, a large test set leaves too little data for training.
The Cross-Validation Procedure
In cross-validation, we run our modeling process on different subsets of the data to get multiple
measures of model quality. For example, we could have 5 folds or experiments. We divide the data into 5
pieces, each being 20% of the full dataset.
We run an experiment called experiment 1 which uses the first fold as a holdout set, and everything else as
training data. This gives us a measure of model quality based on a 20% holdout set, much as we got from using the
simple train-test split.
We then run a second experiment, where we hold out data from the second fold (using everything except
the 2nd fold for training the model.) This gives us a second estimate of model quality. We repeat this
process, using every fold once as the holdout. Putting this together, 100% of the data is used as a holdout at
some point.
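The 5-fold procedure above can be sketched in plain Python. The helper name `kfold_indices` is made up for illustration; libraries such as scikit-learn provide `KFold` for real use.

```python
def kfold_indices(n, k=5):
    """Yield (train, holdout) index lists; each fold serves as the holdout once."""
    fold_size = n // k
    indices = list(range(n))
    for i in range(k):
        holdout = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, holdout

# With 10 samples and 5 folds, every sample is held out exactly once,
# so 100% of the data is used as a holdout at some point.
held_out = []
for train, holdout in kfold_indices(10, 5):
    held_out.extend(holdout)
print(sorted(held_out))
```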
Trade-offs Between Cross-Validation and Train-Test Split-IMPORTANT
Cross-validation gives a more accurate measure of model quality. However, it can take more time to
run, because it estimates models once for each fold. So it is doing more total work.
On small datasets, the extra computational burden of running cross-validation isn't a big deal, so if your
dataset is small, you should run cross-validation.
A simple train-test split is sufficient for larger datasets. It will also run faster.
There's no simple threshold for what constitutes a large vs. small dataset, but if your model takes a couple
of minutes or less to run, it's probably worth switching to cross-validation.

2. Metrics of Model Quality by Simple Math and Examples


After you make predictions, you need to know if they are any good.
Performance metrics like classification accuracy and root mean squared error can give you a clear objective
idea of how good a set of predictions is, and in turn how good the model is that generated them.
All algorithms in machine learning rely on minimizing or maximizing a function, which we call the
"objective function". The group of functions that are minimized are called "loss functions". A loss
function is a measure of how well a prediction model is able to predict the
expected outcome. The most commonly used method of finding the minimum point of a function is
"gradient descent".
Loss functions can be broadly categorized into 2 types: classification loss and regression loss.

Regression functions predict a quantity, and classification functions predict a label.
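As a small illustration of gradient descent (the loss function and learning rate here are toy values chosen for this sketch, not taken from the text), consider minimizing the one-dimensional loss f(w) = (w − 3)², whose minimum lies at w = 3:

```python
def gradient(w):
    return 2 * (w - 3)          # derivative of the loss f(w) = (w - 3)**2

w = 0.0                          # starting guess
learning_rate = 0.1
for _ in range(100):             # step repeatedly against the gradient
    w -= learning_rate * gradient(w)
print(round(w, 3))               # converges very close to the minimum at 3.0
```

Each step moves w in the direction that decreases the loss, which is the core idea behind training most ML models.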

2.1 RMSE (Root Mean Squared Error)
In machine learning, when we want to look at the accuracy of our model, we take the root mean square of
the errors between the test values and the predicted values. Mathematically, for a single value:
Let a = (predicted value − actual value)²
Let b = mean of a = a (for a single value)
Then RMSE = √b
For a set of n values, RMSE is defined as follows:

RMSE = √( (1/n) · Σ (actual − predicted)² )

Graphically:

In the scatter graph, the red dots are the actual values and the blue line is the set of
predicted values drawn by our model. The distance X between an actual value and the predicted line
represents the error; similarly, we can draw a straight line from each red dot to the blue line.
Squaring all those distances, taking their mean, and finally taking the square root gives us the RMSE
of our model.
A good model has a low RMSE relative to the scale of the target values. Example:
Consider the following data
x y
5 6.4
15 13
25 14.2
35 21.5
45 27.5

Regression Equation: Y = 0.5X + 2.5. Calculate the RMSE (Root Mean Squared Error) for the above data.
Ans:-
x     actual y   predicted y' = 0.5x + 2.5   error = y − y'   error²
5     6.4        5.0                          1.4              1.96
15    13         10.0                         3.0              9
25    14.2       15.0                         −0.8             0.64
35    21.5       20.0                         1.5              2.25
45    27.5       25.0                         2.5              6.25

Sum of squared errors = 20.1
Sum of squared errors / number of observations = 20.1 / 5 = 4.02
RMSE = √4.02 ≈ 2.0050
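The worked RMSE example can be checked with a few lines of Python, using the same data and regression equation as above:

```python
import math

x = [5, 15, 25, 35, 45]
y_actual = [6.4, 13, 14.2, 21.5, 27.5]
y_pred = [0.5 * xi + 2.5 for xi in x]       # regression equation Y = 0.5X + 2.5

squared_errors = [(a - p) ** 2 for a, p in zip(y_actual, y_pred)]
mse = sum(squared_errors) / len(squared_errors)
rmse = math.sqrt(mse)
print(round(mse, 2), round(rmse, 4))        # 4.02 2.005
```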
MSE (Mean Squared Error)
Mean Squared Error (MSE) is the most commonly used regression loss function. MSE is the average of the
squared distances between our target values and the predicted values:

MSE = (1/n) · Σ (actual − predicted)²

Why use mean squared error


MSE is sensitive to outliers. MSE is thus a good choice if you believe that your target data,
conditioned on the input, is normally distributed around a mean value, and it is important to
penalize outliers heavily.
When to use mean squared error
Use MSE when doing regression, if you believe that your target, conditioned on the input, is normally
distributed, and you want large errors to be penalized significantly (quadratically) more than small ones.
Example :
Consider the following data
x y
5 6.4
15 13
25 14.2
35 21.5
45 27.5

Regression Equation: Y = 0.5X + 2.5. Calculate the MSE (Mean Squared Error) for the above data.

x     actual y   predicted y' = 0.5x + 2.5   error = y − y'   error²
5     6.4        5.0                          1.4              1.96
15    13         10.0                         3.0              9
25    14.2       15.0                         −0.8             0.64
35    21.5       20.0                         1.5              2.25
45    27.5       25.0                         2.5              6.25

Sum of squared errors = 20.1
MSE = sum of squared errors / number of observations = 20.1 / 5 = 4.02
