Project Ideas
http://MachineLearningMastery.com
Small Projects Methodology
Project Ideas
http://SmallProjectsMethodology.com
Table of Contents
Introduction
Study a Machine Learning Tool
Beginner
Intermediate
Advanced
Study a Machine Learning Dataset
Beginner
Intermediate
Advanced
Study a Machine Learning Algorithm
Beginner
Intermediate
Advanced
Introduction
This guide provides ideas for small projects that you can complete.
Project ideas are listed for each type of small project (tool, dataset, and algorithm) at each of the levels below:
● Beginner
● Intermediate
● Advanced
You can use these ideas directly to complete small projects, or use them as inspiration and
tailor them to problems and algorithms of your own choosing.
Study a Machine Learning Tool
Note that suggestions focus on free, open source tools for accessibility reasons.
Beginner
Beginner projects focus on identifying and describing the tools and platforms that are available.
Intermediate
Intermediate projects focus on learning how to effectively use machine learning tools.
Advanced
Advanced projects focus on teaching and extending machine learning tools.
Study a Machine Learning Dataset
Note that all specifically mentioned datasets were selected from the UCI Machine Learning
Repository. I suggest sticking to its small experimental datasets because they typically fit
into memory and there are existing results that you can read about.
Beginner
Beginner projects focus on summarizing data.
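A minimal sketch of what summarizing a dataset can involve, using only the Python standard library: it reports the minimum, maximum, mean, and sample standard deviation of each numeric attribute, plus the class distribution. The six rows are illustrative values in the format of the Iris dataset (four measurements and a class label), and the attribute names are chosen here for readability; for a real project you would load the full file downloaded from the UCI repository instead.

```python
import statistics
from collections import Counter

# Illustrative rows in the Iris format: four numeric attributes, then a class label.
# Replace with the full dataset loaded from the UCI file for an actual project.
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (7.0, 3.2, 4.7, 1.4, "versicolor"),
    (6.4, 3.2, 4.5, 1.5, "versicolor"),
    (6.3, 3.3, 6.0, 2.5, "virginica"),
    (5.8, 2.7, 5.1, 1.9, "virginica"),
]
# Attribute names chosen here for readability (an assumption, not the file header).
ATTRIBUTES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def summarize(rows, attributes):
    """Return descriptive statistics per attribute and the count of each class."""
    summary = {}
    for i, name in enumerate(attributes):
        values = [row[i] for row in rows]
        summary[name] = {
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),  # sample standard deviation
        }
    class_counts = Counter(row[-1] for row in rows)
    return summary, class_counts


if __name__ == "__main__":
    stats, counts = summarize(ROWS, ATTRIBUTES)
    for name, s in stats.items():
        print(f"{name:>12}: min={s['min']:.2f} max={s['max']:.2f} "
              f"mean={s['mean']:.2f} stdev={s['stdev']:.2f}")
    print("class counts:", dict(counts))
```

The same per-attribute summary is a natural starting point before plotting histograms or pairwise scatter plots.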
Intermediate
Intermediate projects focus on summarizing data and running small experiments on the data.
● Perform a cluster analysis on the Iris dataset and compare identified clusters to classes.
● Create pairwise scatter plots of the attributes in the Yeast dataset.
● Run feature analysis on the attributes in the Poker Hand dataset.
● Perform a spot check of tree-based algorithms on the Abalone dataset.
● Investigate various projection methods on the Forest Fires dataset and visualize the
results.
● Report the best performing SVM kernel on the Heart Disease dataset.
● Perform a feature selection analysis on the Internet Advertisements dataset.
● Investigate various projection methods on the Wine Quality dataset and visualize the
results.
● Perform a spot check of regression regularization on the Internet Advertisements
dataset.
● Investigate clustering algorithms on the Heart Disease dataset and compare to classes.
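The cluster-analysis ideas above (Iris, Heart Disease) compare discovered clusters against the known class labels. A minimal sketch of that comparison follows, assuming plain k-means with a deterministic farthest-first initialization and cluster purity as the comparison measure; both are choices made here for illustration, and other measures (for example, the adjusted Rand index) are equally valid. The points are hypothetical toy data.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=100):
    """Cluster points with plain k-means (Lloyd's algorithm).

    Initialization is deterministic farthest-first: start from the first
    point, then repeatedly add the point farthest from every chosen centroid.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment


def purity(assignment, labels):
    """Fraction of points that share the majority class label of their cluster."""
    per_cluster = {}
    for cluster, label in zip(assignment, labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    majority = sum(counts.most_common(1)[0][1] for counts in per_cluster.values())
    return majority / len(labels)


# Hypothetical toy data: two well-separated groups with known class labels.
POINTS = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
LABELS = ["a", "a", "a", "b", "b", "b"]

if __name__ == "__main__":
    clusters = kmeans(POINTS, k=2)
    print("cluster assignment:", clusters)
    print("purity:", purity(clusters, LABELS))
```

Purity ignores the arbitrary numbering of clusters, which is why it suits a clusters-versus-classes comparison; note that it trivially rises as k grows, so compare it at a fixed k.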
Advanced
Advanced projects focus on designing and running more complete experiments on the data.
● Report on the best result you can achieve for the Iris dataset (the most studied dataset!).
● Perform a feature analysis and report on the best SVM result for the Wisconsin Breast
Cancer dataset.
● Perform a study on the Wine Quality dataset and identify a well performing algorithm and
its best tuned parameterization using 10-fold cross validation.
● Investigate and report on the best performing LVQ method on the Pima Indians Diabetes
dataset.
● Report on the best configuration of LASSO for the regression regularization on the
Internet Advertisements dataset.
● Perform a study on the Heart Disease dataset and identify a well performing algorithm and its
best tuned parameterization using 10-fold cross validation.
● Investigate and report on the best configuration of stacked autoencoders on the KDD
Cup 1998 dataset (return from a direct mailing).
● Investigate and report on the best classification result you can achieve on the Statlog
Shuttle dataset.
● Report on the best configuration of the Random Forest algorithm for the Adult dataset.
● Investigate and report on the best configuration of convolutional networks on the Dorothea
dataset.
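Several of the advanced ideas ask for a best tuned parameterization found with 10-fold cross validation. The sketch below shows only the mechanics, with the standard library. The model is a hypothetical stand-in, one-feature ridge regression without an intercept (closed form w = Σxy / (Σx² + λ)), chosen to keep the example short, and the data are synthetic; in a real project you would substitute the algorithm and dataset under study.

```python
import random


def k_fold_indices(n, k=10, seed=1):
    """Shuffle row indices once and deal them into k disjoint test folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Round-robin dealing keeps fold sizes within one of each other.
    return [indices[i::k] for i in range(k)]


def fit_ridge_1d(xs, ys, lam):
    """One-feature ridge regression without intercept: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cross_validated_mse(xs, ys, lam, k=10, seed=1):
    """Mean squared error pooled over all held-out predictions from k-fold CV."""
    squared_errors = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        w = fit_ridge_1d([xs[i] for i in train_idx], [ys[i] for i in train_idx], lam)
        squared_errors.extend((ys[i] - w * xs[i]) ** 2 for i in test_idx)
    return sum(squared_errors) / len(squared_errors)


def tune(xs, ys, candidates, k=10):
    """Score every candidate parameter with k-fold CV and return the best one."""
    scores = {lam: cross_validated_mse(xs, ys, lam, k) for lam in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data: y = 2x with alternating +/-0.5 offsets (illustrative only).
XS = [float(i) for i in range(1, 21)]
YS = [2.0 * x + (0.5 if i % 2 else -0.5) for i, x in enumerate(XS)]

if __name__ == "__main__":
    best, scores = tune(XS, YS, candidates=[0.0, 1.0, 10.0, 1000.0])
    for lam, mse in scores.items():
        print(f"lambda={lam:>7}: 10-fold CV MSE={mse:.4f}")
    print("best lambda:", best)
```

Because every candidate uses the same seed, all parameter values are scored on identical folds, so differences in error reflect the parameter rather than the split.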
Study a Machine Learning Algorithm
Beginner
Beginner projects focus on researching and describing an algorithm.
● Research and describe the best practices for configuring and applying the logistic
regression algorithm (at least 10 bullet points).
● Research and describe the k-nearest neighbor algorithm using a flow diagram.
● List and summarize the parameters to the random forest algorithm.
● Research and describe the standard backpropagation algorithm using pseudocode (may
require reading one or more open source implementations).
● Research and describe the information processing objectives of the Naive Bayes
algorithm.
● Create a flow chart of the gradient boosted machines algorithm.
● Describe the type of problems best suited for the k-Means algorithm.
● List the primary sources for the Learning Vector Quantization algorithm.
● Research and summarize the different types of support vector machine algorithms.
● Describe the main uses for the Principal Component Analysis algorithm.
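Writing a tiny implementation can make an algorithm description or flow diagram concrete. As one example for the k-nearest neighbor idea, here is a minimal sketch of a k-NN classifier with a choice of Euclidean or Manhattan distance and a majority vote. The function names and training points are hypothetical, and ties are broken toward the nearer neighbor because `Counter.most_common` orders equal counts by first appearance.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k=3, distance=euclidean):
    """Predict the label of `query` by majority vote of its k nearest neighbors."""
    # Sort training examples by distance to the query and keep the closest k.
    neighbors = sorted(
        zip(train_points, train_labels), key=lambda pair: distance(pair[0], query)
    )[:k]
    # Labels are counted in distance order, so ties go to the nearer neighbor.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two groups of 2-D points.
TRAIN_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
TRAIN_Y = ["red", "red", "red", "blue", "blue", "blue"]

if __name__ == "__main__":
    print(knn_predict(TRAIN_X, TRAIN_Y, (2.0, 2.0)))
    print(knn_predict(TRAIN_X, TRAIN_Y, (6.0, 5.0), distance=manhattan))
```

Note that k-NN has no training step: all the work happens at prediction time, which is worth capturing in a flow diagram.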
Intermediate
Intermediate projects focus on fuller descriptions of algorithms and on characterizing specific
behaviors of an algorithm on one dataset.
● Compare and contrast pruning methods on stepwise regression over one regression
dataset.
● Create a detailed description of the Multidimensional Scaling algorithm.
Advanced
Advanced projects focus on characterizing the interactions of behaviors of an algorithm,
typically across multiple datasets.
● Describe the effect of various scalings of input data (0-1, zero mean, -1 to 1, etc.) on the
behavior of the logistic regression algorithm.
● Summarize the effect of varying the distance measure (Euclidean, Manhattan, etc.) and
k for the k-nearest neighbor algorithm over multiple datasets.
● Describe the interactions of momentum and learning rate on the backpropagation
algorithm.
● Describe the effect of varying the kernel function in the Support Vector Machine
algorithm.
● Contrast the linear, quadratic, flexible, and mixture discriminant analysis methods.
● Describe the interaction of the number of layers and size of layers when using stacked
autoencoders.
● Describe the effect of regularization with varying values of lambda in gradient boosted machines.
● Describe the relationship between tree depth and number of trees in the random forest
algorithm.
● Investigate the effects of different linear methods as the blending algorithm in stacked
generalization across multiple regression problems.
● Investigate the effect of different stopping criteria on logistic regression.
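For the input-scaling idea, a minimal sketch of the three named transforms (0-1, -1 to 1, and zero mean) applied to a single attribute is shown below, in plain Python. The function names and sample values are hypothetical. In an actual study you would compute the scaling parameters (min, max, mean, standard deviation) on the training split only, apply them unchanged to the test split, and then compare the logistic regression results under each scaling.

```python
import statistics


def min_max(values, low=0.0, high=1.0):
    """Linearly rescale values into [low, high]; 0-1 by default, or -1 to 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant attribute carries no information; map it to the lower bound.
        return [low for _ in values]
    return [low + (v - lo) * (high - low) / span for v in values]


def standardize(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    attribute = [2.0, 4.0, 6.0, 10.0]  # hypothetical raw attribute values
    print("0-1:      ", min_max(attribute))
    print("-1 to 1:  ", min_max(attribute, -1.0, 1.0))
    print("zero mean:", standardize(attribute))
```

Min-max scaling is sensitive to outliers because a single extreme value sets the range, whereas standardization spreads that influence across the mean and standard deviation; this difference is one of the behaviors such a study could measure.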