Small Projects Methodology

Project Ideas


Machine Learning Mastery


Web: http://MachineLearningMastery.com
Email: [email protected]

Small Projects Methodology


Project Ideas
by Jason Brownlee, PhD

Copyright © 2014 Jason Brownlee, All Rights Reserved.

Share this Guide


If you know someone who can benefit from this guide, just send them this link:

http://SmallProjectsMethodology.com


Table of Contents
Introduction
Study a Machine Learning Tool
Beginner
Intermediate
Advanced
Study a Machine Learning Dataset
Beginner
Intermediate
Advanced
Study a Machine Learning Algorithm
Beginner
Intermediate
Advanced


Introduction
This guide provides ideas for small projects that you can complete.

Project ideas are listed for each of the small project types below:

● Study a Machine Learning Tool
● Study a Machine Learning Dataset
● Study a Machine Learning Algorithm

Project ideas are also separated by skill level:

● Beginner
● Intermediate
● Advanced

You can use these ideas directly and complete small projects. You can also use the ideas as
inspiration and tailor them to problems and algorithms of your own choosing.


Study a Machine Learning Tool


You may use this type of project to build up an understanding of many types of machine
learning tools or to target a specific tool and build up a deep knowledge of how to use it.

Note that the suggestions focus on free, open source tools for accessibility reasons.

Beginner
Beginner projects focus on identifying and describing the tools and platforms that are available.

● List 10 benefits of using scikit-learn for machine learning.
● Describe 10 benefits and limitations of using R for machine learning.
● Create a list and summary of 10 popular Python machine learning frameworks.
● Compare and contrast Weka, R and scikit-learn for machine learning using 5 attributes
of the tools.
● Identify and describe 10 libraries and frameworks that offer implementations of support
vector machines.
● Describe 10 features and benefits of using Weka for machine learning work.
● Compare and contrast 10 libraries and frameworks for deep learning.
● Summarize the top 10 popular Java machine learning tools and libraries.
● Summarize the features of the Waffles machine learning framework.
● Identify and describe 10 libraries and frameworks that offer implementations of random
forest.

Intermediate
Intermediate projects focus on learning how to effectively use machine learning tools.

● Complete the Generalized Linear Models tutorials for scikit-learn.
● Use the Weka Explorer interface to investigate bagging different types of models on the
Yeast dataset.
● Complete the Waffles examples for visualizing data.
● Compare decision tree algorithms on the Iris dataset using the Weka Experimenter
interface.
● Complete the Orange Python tutorial for classification.
● Complete tutorials on how to use the caret package in R.
● Complete Octave homework from the Stanford Machine Learning course.
● Complete tutorials for running convolutional networks on the MNIST dataset using
pylearn2.
● Complete the nearest neighbors tutorials for scikit-learn.
● Complete the tutorials for using Google Predict.


Advanced
Advanced projects focus on teaching and extending machine learning tools.

● Create tutorials for how to use the kernlab R package.
● Write a tutorial on how to use stacked autoencoders in pylearn2 to address the
Wisconsin Breast Cancer dataset.
● Create a tutorial on how to use the Weka command line interface to experiment with
ensemble methods.
● Implement a generic cross validation test harness in R.
● Create an implementation of logistic regression as a plugin for Weka.
● Implement the Perceptron algorithm for the Orange machine learning platform.
● Create tutorials on how to use different neural networks in the RSNNS R package.
● Create a tutorial on how to use the Weka Experimenter interface to perform a sensitivity
analysis on an algorithm.
● Implement a Support Vector Machine algorithm for scikit-learn.
● Create a tutorial on how to use Google Predict on a larger experimental dataset such as
the Kaggle Heritage Prize.


Study a Machine Learning Dataset


You can use this type of project to build up an understanding of how to address multiple
different dataset types or to learn deeply about a specific dataset.

Note that all specifically mentioned datasets were selected from the UCI Machine
Learning Repository. I suggest sticking to small experimental datasets because they
typically fit into memory and there are existing results that you can read about.

Beginner
Beginner projects focus on summarizing data.

● Locate and describe 5 standard experimental multi-class classification datasets.
● Describe the Wine Quality dataset using 5-number summaries.
● Describe the correlations between attributes in the Iris dataset.
● Create histograms of the attributes in the Yeast dataset and show breakdowns by class.
● Summarize the attributes in the Adult dataset.
● Create boxplots of the attributes in the Wisconsin Breast Cancer dataset.
● Summarize the Car Evaluation dataset using histograms showing output classes.
● Identify and describe 5 standard experimental regression datasets with large numbers of
attributes.
● Describe the attributes of the Iris dataset using 5-number summaries.
● Locate and describe 5 standard experimental 2-class output datasets.
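
Several of the ideas above ask for 5-number summaries (minimum, first quartile, median, third quartile, maximum). The following is a minimal sketch of one way to compute them using only the Python standard library. The function name and the sample values are made up for illustration and are not taken from any of the datasets named above; the quartile convention used ("inclusive" interpolation) is an assumption, and other tools may report slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # method="inclusive" interpolates quartiles between observed values,
    # treating the smallest and largest values as the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Made-up measurements for illustration only (not real dataset values).
sample = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
summary = five_number_summary(sample)
print(summary)
```

For a real project you would apply a function like this to each numeric attribute of the dataset in turn and report the results in a table.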

Intermediate
Intermediate projects focus on summarizing data and running small experiments on the data.

● Perform a cluster analysis on the Iris dataset and compare identified clusters to classes.
● Create pairwise scatter plots of the attributes in the Yeast dataset.
● Run feature analysis on the attributes in the Poker Hand dataset.
● Perform a spot check of tree-based algorithms on the Abalone dataset.
● Investigate various projection methods on the Forest Fires dataset and visualize the
results.
● Report the best performing SVM kernel on the Heart Disease dataset.
● Perform a feature selection analysis on the Internet Advertisements dataset.
● Investigate various projection methods on the Wine Quality dataset and visualize the
results.
● Perform a spot check of regression regularization on the Internet Advertisements
dataset.
● Investigate clustering algorithms on the Heart Disease dataset and compare to classes.


Advanced
Advanced projects focus on designing and running fuller, more complete experiments on the
data.

● Report on the best result you can achieve for the Iris dataset (the most studied dataset!).
● Perform a feature analysis and report on the best SVM result for the Wisconsin Breast
Cancer dataset.
● Perform a study on the Wine Quality dataset and identify a well performing algorithm and
its best tuned parameterization using 10-fold cross validation.
● Investigate and report on the best performing LVQ method on the Pima Indians Diabetes
dataset.
● Report on the best configuration of LASSO for regression regularization on the
Internet Advertisements dataset.
● Perform a study on the Heart Disease dataset and identify a well performing algorithm and its
best tuned parameterization using 10-fold cross validation.
● Investigate and report on the best configuration of stacked autoencoders on the KDD
Cup 1998 dataset (return from a direct mailing).
● Investigate and report on the best classification result you can achieve on the Statlog
Shuttle dataset.
● Report on the best configuration of the Random Forest algorithm for the Adult dataset.
● Investigate and report on the best configuration of convolutional networks on the Dorothea
dataset.
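
Two of the ideas above call for 10-fold cross validation. The sketch below shows the general shape of a k-fold test harness in plain Python. The `fit` and `score` callables, the majority-label "model", and the toy rows and labels are all hypothetical placeholders chosen for illustration; a real study would plug in an actual learning algorithm and dataset.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Split row indices 0..n_rows-1 into k shuffled folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, labels, k, fit, score, seed=0):
    """Return one score per fold.

    fit(train_rows, train_labels) -> model
    score(model, test_rows, test_labels) -> float
    Both callables are supplied by the caller (hypothetical interface).
    """
    scores = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(rows)) if i not in held_out_set]
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        scores.append(
            score(model, [rows[i] for i in held_out], [labels[i] for i in held_out])
        )
    return scores


# Toy "algorithm": always predict the most common training label.
def fit_majority(train_rows, train_labels):
    return max(set(train_labels), key=train_labels.count)


def accuracy(model, test_rows, test_labels):
    return sum(1 for y in test_labels if y == model) / len(test_labels)


rows = [[i] for i in range(20)]       # placeholder features
labels = ["a"] * 15 + ["b"] * 5       # made-up labels, for illustration only
fold_scores = cross_validate(rows, labels, k=10, fit=fit_majority, score=accuracy)
mean_accuracy = sum(fold_scores) / len(fold_scores)
print(mean_accuracy)
```

Keeping the split logic separate from `fit` and `score` means the same harness can be reused to compare several algorithms or parameter settings on identical folds.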


Study a Machine Learning Algorithm


You can use this type of project to build up a breadth of experience across multiple machine
learning algorithms or to dive deeply into a specific algorithm.

Beginner
Beginner projects focus on researching and describing an algorithm.

● Research and describe the best practices for configuring and applying the logistic
regression algorithm (at least 10 bullet points).
● Research and describe the k-nearest neighbor algorithm using a flow diagram.
● List and summarize the parameters to the random forest algorithm.
● Research and describe the standard backpropagation algorithm using pseudocode (may
require reading one or more open source implementations).
● Research and describe the information processing objectives of the naive bayes
algorithm.
● Create a flow chart of the gradient boosted machines algorithm.
● Describe the type of problems best suited for the k-Means algorithm.
● List the primary sources for the Learning Vector Quantization algorithm.
● Research and summarize the different types of support vector machine algorithms.
● Describe the main uses for the Principal Component Analysis algorithm.

Intermediate
Intermediate projects focus on fuller descriptions of algorithms and characterizing specific
behaviors of an algorithm on one dataset.

● Create a description of the Linear Discriminant Analysis algorithm including information
processing objective, pseudocode, heuristics and primary sources.
● Describe the effect of varying the k parameter on the k-nearest neighbors algorithm on
one classification dataset.
● Select one regression dataset and investigate the use of different transfer functions in
the standard backpropagation algorithm.
● Investigate the effect of varying maximum tree depth in C4.5, CART (or similar decision
tree method) on one or more datasets.
● Create a detailed description of the LOESS algorithm.
● Select one regression dataset and describe the effect of varying lambda in the LASSO
method.
● Run a Principal Component Analysis on a high dimensional dataset and visualize the
result.
● Describe the effect of varying the data subset size in boosting with linear regression on a
2D dataset; visualize each sub-model and the combined model.
● Compare and contrast pruning methods on stepwise regression over one regression
dataset.
● Create a detailed description of the Multidimensional Scaling algorithm.
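
One of the ideas above is to describe the effect of varying k in the k-nearest neighbors algorithm. The sketch below is a minimal, unoptimized version of the algorithm with Euclidean distance and majority voting, written so that k can be varied in a loop. The function name and the 2D points are made up for illustration and do not come from any dataset mentioned in this guide.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict the label of `query` by majority vote among its k nearest points.

    `train` is a list of (point, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up 2D points for illustration only.
train = [
    ((1.0, 1.0), "red"), ((1.5, 2.0), "red"), ((2.0, 1.0), "red"),
    ((6.0, 6.0), "blue"), ((6.5, 7.0), "blue"), ((7.0, 6.0), "blue"),
    ((3.0, 3.0), "blue"),  # an isolated point that sits closer to the red group
]
query = (2.8, 2.8)

for k in (1, 3, 5):
    print(k, knn_predict(train, query, k))
```

On this toy data the prediction for the query point changes as k grows, because a single nearby point dominates at k=1 while larger neighborhoods are outvoted by the surrounding group. A project write-up would repeat this kind of comparison on a real classification dataset and report accuracy for each k.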

Advanced
Advanced projects focus on characterizing how the behaviors of an algorithm interact,
typically across multiple datasets.

● Describe the effect of various scalings of input data (0-1, zero mean, -1 to 1, etc.) on the
behavior of the logistic regression algorithm.
● Summarize the effect of varying the distance measure (Euclidean, Manhattan, etc.) and
k for the k-nearest neighbor algorithm over multiple datasets.
● Describe the interactions of momentum and learning rate on the backpropagation
algorithm.
● Describe the effect of varying the kernel function in the Support Vector Machine
algorithm.
● Contrast the linear, quadratic, flexible, and mixture discriminant analysis methods.
● Describe the interaction of the number of layers and size of layers when using stacked
autoencoders.
● Describe the effect of regularization with varying lambdas on gradient boosted machines.
● Describe the relationship between tree depth and number of trees in the random forest
algorithm.
● Investigate the effects of different linear methods as the blending algorithm in stacked
generalization across multiple regression problems.
● Investigate the effect of different stopping criteria on logistic regression.
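
The first idea in the list above mentions several input scalings (0-1, zero mean, -1 to 1). The sketch below shows one possible reading of each for a single numeric attribute, using only the Python standard library. The function names and sample values are made up for illustration, and "zero mean" is interpreted here as standardization (zero mean and unit standard deviation), which is an assumption; simple mean-centering would be another valid reading.

```python
import statistics


def scale_zero_one(values):
    """Min-max scale to the range [0, 1]. Assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def scale_minus_one_one(values):
    """Min-max scale to the range [-1, 1]. Assumes the values are not all equal."""
    return [2.0 * v - 1.0 for v in scale_zero_one(values)]


def scale_zero_mean(values):
    """Subtract the mean and divide by the population standard deviation.

    Assumes the values are not all equal (the standard deviation must be non-zero).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Made-up attribute values for illustration only.
values = [2.0, 4.0, 6.0, 8.0, 10.0]
print(scale_zero_one(values))
print(scale_minus_one_one(values))
print(scale_zero_mean(values))
```

In an experiment, each scaling would be fitted on the training data only and then applied to both training and test data before the algorithm is run, so that results for the different scalings are comparable.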
