ML Projects For Final Year
The goal of this document is to provide a common framework for approaching machine learning
projects that can be referenced by practitioners. If you build ML models, this post is for you. If
you collaborate with people who build ML models, I hope that this guide provides you with a
good perspective on the common project workflow. Knowledge of machine learning is assumed.
Overview
This overview intends to serve as a project "checklist" for machine learning practitioners.
Subsequent sections will provide more detail.
Project lifecycle
Machine learning projects are highly iterative; as you progress through the ML lifecycle, you’ll
find yourself iterating on a section until reaching a satisfactory level of performance, then
proceeding forward to the next task (which may be circling back to an even earlier step).
Moreover, a project isn’t complete after you ship the first version; you get feedback from real-
world interactions and redefine the goals for the next iteration of deployment.
3. Model exploration
• Establish baselines for model performance (a minimal baseline sketch follows this list)
• Stay nimble and try many parallel (isolated) ideas during early stages
• Find SoTA model for your problem domain (if available) and reproduce results, then apply to
your dataset as a second baseline
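As a loose illustration of the first bullet, the sketch below compares a trivial majority-class baseline against a simple learned model using scikit-learn; the stand-in dataset and model choices are assumptions for illustration, not part of the original checklist.

    # Baseline comparison sketch (assumes scikit-learn is installed;
    # load_breast_cancer is just a stand-in for your own dataset).
    from sklearn.datasets import load_breast_cancer
    from sklearn.dummy import DummyClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Baseline 1: predict the most frequent class, ignoring the features entirely.
    dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

    # Baseline 2: a simple, well-understood learned model.
    simple = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    print("majority-class accuracy:", dummy.score(X_test, y_test))
    print("logistic regression accuracy:", simple.score(X_test, y_test))

Any later model should clearly beat both of these numbers before it earns more of your time.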
4. Model refinement
• Perform model-specific optimizations (i.e. hyperparameter tuning); see the sketch after this list
• Revisit Step 2 for targeted data collection and labeling of observed failure modes
• Revisit model evaluation metric; ensure that this metric drives desirable downstream user
behavior
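For the hyperparameter-tuning bullet, one possible sketch using scikit-learn's RandomizedSearchCV is shown below; the model, search space, and scoring metric are illustrative assumptions rather than recommendations.

    # Hyperparameter tuning sketch (synthetic data stands in for your own training set).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X_train, y_train = make_classification(n_samples=500, n_features=20, random_state=0)

    # Illustrative search space; tailor it to whichever model you actually use.
    param_distributions = {
        "n_estimators": [100, 200, 400],
        "max_depth": [None, 5, 10, 20],
        "min_samples_leaf": [1, 2, 5],
    }

    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        param_distributions=param_distributions,
        n_iter=20,
        cv=5,
        scoring="f1",  # pick the metric that matches your evaluation criteria
        random_state=0,
    )
    search.fit(X_train, y_train)
    print(search.best_params_, search.best_score_)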
6. Model deployment
• Expose model via a REST API
• Deploy new model to a small subset of users to ensure everything goes smoothly, then roll out to
all users (a simple traffic-splitting sketch follows this list)
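The gradual-rollout bullet can be implemented as simple deterministic traffic splitting. The sketch below is a hypothetical helper (not part of any standard library) that routes a small, stable fraction of users to the new model based on a hash of their ID.

    # Canary-rollout sketch: route a small, stable fraction of users to the new model.
    # The function names and 5% threshold are illustrative assumptions.
    import hashlib

    def use_new_model(user_id: str, rollout_fraction: float = 0.05) -> bool:
        """Deterministically assign roughly rollout_fraction of users to the new model."""
        digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 10_000
        return bucket < rollout_fraction * 10_000

    def predict(user_id: str, features, current_model, new_model):
        """Serve from the known-good model unless this user falls in the canary bucket."""
        model = new_model if use_new_model(user_id) else current_model
        return model.predict(features)

Because the assignment is hashed from the user ID, each user consistently sees the same model version, which makes it easier to attribute any change in behavior to the new model.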
It's worth noting that defining the model task is not always straightforward. There are often many
different approaches you can take towards solving a problem, and it's not always immediately
evident which is optimal. If your problem is vague and the modeling task is not clear, jump over
to my post on defining requirements for machine learning projects before proceeding.
Prioritizing projects
Ideal: project has high impact and high feasibility.
Mental models for evaluating project impact:
• Look for complicated rule-based software where we can learn rules instead of programming
them
When evaluating projects, it can be useful to have a common language and understanding of the
differences between traditional software and machine learning software. Andrej
Karpathy's Software 2.0 is recommended reading for this topic.
Software 1.0
• Explicit instructions for a computer written by a programmer using a programming
language such as Python or C++. A human writes the logic such that when the system is
provided with data it will output the desired behavior.
Software 2.0
• Implicit instructions, "written" by an optimization algorithm that tunes the parameters of a
specified model architecture using data. The system logic is learned from a provided
collection of data examples and their corresponding desired behavior.
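To make the contrast concrete, here is a small sketch; the spam-filter task, keyword list, and toy dataset are invented purely for illustration. The first function encodes the decision logic by hand (Software 1.0), while the second learns it from labeled examples (Software 2.0).

    # Software 1.0: a human writes the decision logic explicitly.
    def is_spam_v1(message: str) -> bool:
        suspicious = ["free money", "act now", "winner"]
        return any(phrase in message.lower() for phrase in suspicious)

    # Software 2.0: the decision logic is learned from (example, label) pairs.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    messages = ["free money, act now!", "lunch at noon?", "you are a winner", "see you tomorrow"]
    labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (tiny toy dataset)

    learned_filter = make_pipeline(CountVectorizer(), LogisticRegression())
    learned_filter.fit(messages, labels)

    def is_spam_v2(message: str) -> bool:
        return bool(learned_filter.predict([message])[0])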
A quick note on Software 1.0 and Software 2.0 - these two paradigms are not mutually exclusive.
Software 2.0 is usually used to scale the logic component of traditional software systems by
leveraging large amounts of data to enable more complex or nuanced decision logic.
For example, the code for Google Translate used to be a very
complicated system consisting of ~500k lines of code. Google was able to simplify this product
by leveraging a machine learning model to perform the core logical task of translating text to a
different language, requiring only ~500 lines of code to describe the model. However, this
model still requires some "Software 1.0" code to process the user's query, invoke the machine
learning model, and return the desired information to the user.
In summary, machine learning can drive large value in applications where decision logic is
difficult or complicated for humans to write, but relatively easy for machines to learn. On that
note, we'll continue to the next section to discuss how to evaluate whether a task is "relatively
easy" for machines to learn.
Determining feasibility
Some useful questions to ask when determining the feasibility of a project:
• 90% coverage (the fraction of examples where model confidence exceeds the required threshold for a prediction to be considered valid)
The optimization metric may be a weighted sum of several quantities we care about. Revisit
this metric as performance improves.
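As a sketch of what such a weighted metric might look like in code, the example below combines accuracy on confident predictions with coverage; the weights, threshold, and component metrics are arbitrary assumptions.

    import numpy as np

    def combined_metric(y_true, y_pred, confidences, threshold=0.8,
                        w_accuracy=0.7, w_coverage=0.3):
        """Toy weighted-sum metric: accuracy on confident predictions plus coverage.

        Coverage here is the fraction of examples where model confidence exceeds
        the threshold required to treat a prediction as valid.
        """
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        confidences = np.asarray(confidences)

        confident = confidences >= threshold
        coverage = confident.mean()
        accuracy = (y_true[confident] == y_pred[confident]).mean() if confident.any() else 0.0
        return w_accuracy * accuracy + w_coverage * coverage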
Some teams may choose to ignore a certain requirement at the start of the project, with the goal
of revising their solution (to meet the ignored requirements) after they have discovered a
promising general approach.
Some teams aim for a “neutral” first launch: a first launch that explicitly deprioritizes machine
learning gains, to avoid getting distracted.
The motivation behind this approach is that the first deployment should involve a simple model
with focus spent on building the proper machine learning pipeline required for prediction. This
allows you to deliver value quickly and avoid the trap of spending too much of your time trying
to “squeeze the juice”.
Setting up an ML codebase
A well-organized machine learning codebase should modularize data processing, model
definition, model training, and experiment management.
configs/
    baseline.yaml
    latest.yaml
data/
docker/
project_name/
    api/
        app.py
    models/
        base.py
        simple_baseline.py
        cnn.py
    datasets.py
    train.py
    experiment.py
scripts/
data/ provides a place to store raw and processed data for your project. You can also include
a data/README.md file which describes the data for your project.
docker/ is a place to specify one or many Dockerfiles for the project. Docker (and other
container solutions) helps ensure consistent behavior across multiple machines and
deployments.
api/app.py exposes the model through a REST API for predictions. You will likely choose to
load the (trained) model from a model registry rather than importing directly from your library.
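A minimal sketch of what api/app.py might look like, assuming Flask and a model serialized with joblib; the artifact path, route, and payload format are assumptions for illustration.

    # Sketch of api/app.py: load a trained model once at startup and serve predictions.
    import joblib
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    # In practice this artifact would come from a model registry / artifact store
    # rather than being imported directly from the library code.
    model = joblib.load("artifacts/model-latest.joblib")

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()
        features = payload["features"]  # e.g. a list of feature values
        prediction = model.predict([features])[0]
        return jsonify({"prediction": prediction.item()})  # .item() converts numpy scalars

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8000)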
models/ defines a collection of machine learning models for the task, unified by a common API
defined in base.py. These models include code for any necessary data preprocessing and output
normalization.
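One possible shape for the common API in base.py is sketched below; the method names are assumptions, and the point is only that every model in models/ exposes the same interface, including its own preprocessing.

    # Sketch of models/base.py: a shared interface that simple_baseline.py, cnn.py, etc. implement.
    from abc import ABC, abstractmethod

    class BaseModel(ABC):
        """Common API for all models in models/; concrete classes own their preprocessing."""

        @abstractmethod
        def preprocess(self, raw_inputs):
            """Convert raw inputs (e.g. text, images) into model-ready features."""

        @abstractmethod
        def fit(self, inputs, targets):
            """Train the model on preprocessed inputs and targets."""

        @abstractmethod
        def predict(self, raw_inputs):
            """Run preprocessing and return normalized predictions."""

        def save(self, path: str) -> None:
            """Optional shared persistence hook; subclasses may override."""
            raise NotImplementedError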
train.py defines the actual training loop for the model. This code interacts with the optimizer
and handles logging during training.
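A minimal sketch of the kind of loop train.py might contain, assuming a PyTorch model and dataloader; the framework choice, function signature, and print-based logging are assumptions rather than anything this guide prescribes.

    # Sketch of train.py: one possible training loop with optimizer interaction and logging.
    import torch

    def train(model, dataloader, loss_fn, epochs=10, lr=1e-3, device="cpu"):
        model.to(device)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)

        for epoch in range(epochs):
            running_loss = 0.0
            for inputs, targets in dataloader:
                inputs, targets = inputs.to(device), targets.to(device)

                optimizer.zero_grad()
                loss = loss_fn(model(inputs), targets)
                loss.backward()
                optimizer.step()

                running_loss += loss.item()

            # Logging hook: swap print for your experiment tracker of choice.
            print(f"epoch {epoch + 1}: mean loss = {running_loss / len(dataloader):.4f}")

        return model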