Part I - Installing Weka: HW Assignment 1

This homework assignment involves installing and running the Weka data mining software toolbox. Students are instructed to download the Weka software, load a training data set, use the software to construct an initial decision tree classifier on the training data, test the classifier on a separate test data set, and include the results in a submission. The assignment also involves proposing three ideas for a classification or prediction problem that could use publicly available, WRDS, or personal firm data.

Uploaded by

Leonard Tambunan
Copyright
© Attribution Non-Commercial (BY-NC)

HW Assignment 1 Due: June 19th 2009

Part I: Installing Weka


The purpose of this assignment is to install and run Weka, a widely used, FREE data mining software toolbox written in Java. This homework will walk you through the basic steps of installing and running the software, building classifiers, and labeling test cases. For this assignment, you will need to download the TRAINING and TEST sets from the course website. Note: It is important that you properly install and learn how to run Weka because we will use it for future hands-on assignments as well as for the data mining competition and course project.

Step 1: Installing Weka


Go to the Weka website, http://www.cs.waikato.ac.nz/ml/weka/, and download the software. On the left-hand side, click on the link that says "download". Select the link for the version of the software that matches your operating system and whether or not you already have a Java VM installed on your machine (if you don't know what a Java VM is, then you probably don't have one). The link will forward you to a mirror site where you can download the software. Save the self-extracting executable to disk and then double-click on it to install Weka. Answer "yes" or "next" to the questions during the installation. Click "yes" to accept the Java agreement if necessary. After you install the program, Weka should appear on your Start menu under Programs (if you are using Windows).

Step 2: Running Weka

From the start menu select Programs, then Weka, then Weka 3*. You will see the Weka GUI Chooser. Select Explorer. The Weka Explorer will then launch.

Step 3: Load Training Set


You will find the training set, TRAIN.arff, on the course website. The training set includes the records you will use in your next homework assignment. The TRAINING set contains the following data:

In the Weka Explorer, click the button that says "Open file". Open TRAIN.arff.
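If you are curious what Weka reads when it opens an .arff file, the following Python sketch parses the basic layout of the ARFF format: a header of @attribute lines followed by a @data section with one comma-separated record per line. The sample text and its attribute names (age, owns_home, class) are hypothetical stand-ins, not the contents of TRAIN.arff, and the parser deliberately ignores quoted names and sparse records.

```python
import io


def read_arff(stream):
    """Return (attribute_names, rows) from a simple dense ARFF stream."""
    attributes, rows, in_data = [], [], False
    for raw in stream:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            # Every line after @data is one record.
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            # The second token is the attribute name (unquoted names only).
            attributes.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


# Hypothetical sample; the real TRAIN.arff is on the course website.
SAMPLE = """\
@relation toy
@attribute age numeric
@attribute owns_home {yes,no}
@attribute class {good,bad}
@data
34,yes,good
51,no,bad
"""

names, rows = read_arff(io.StringIO(SAMPLE))
print(names)
print(len(rows), "records")
```

To try it on the real file, replace io.StringIO(SAMPLE) with open("TRAIN.arff"), assuming the file is in your working directory.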

Step 5: Constructing the Initial Decision Tree


Select the tab that says "Classify". In the box that says "Classifier", you can choose a classifier. Click on the Choose button and you will be presented with a hierarchy of methods. Pick weka, classifiers, trees, J48. Click on the text box in the Classifier box (which now says J48 and some cryptic options instead of ZeroR, the default classifier). In the popup, change the following settings: set minNumObj to 1 and unpruned to True, then click OK. (Note: The order in which the options appear might vary depending on which mirror site you choose. For example, we found minNumObj is closer to the top of the GUI in some versions.)

You will find the test set, TEST.arff, on the course website. The TEST set includes the records you will use in future homework assignments. The TEST set contains the data below. In the box that says "Test options", pick "Supplied test set". Click on the Set button and select your TEST.arff file.

Now press Start and WATCH WEKA GO!
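The same run can also be started from a terminal instead of the Explorer. The Python sketch below only assembles the command line; it does not run Weka. It assumes weka.jar is in the current directory (the actual location depends on your installation), and the helper name j48_command is made up for this example. The flags are J48's command-line options: -t for the training file, -T for the supplied test set, -M for minNumObj, and -U for an unpruned tree.

```python
def j48_command(train_path, test_path, min_num_obj=1, unpruned=True,
                weka_jar="weka.jar"):
    """Build the argument list for running J48 from a terminal.

    weka_jar is an assumed path; point it at the weka.jar in your install.
    """
    cmd = [
        "java", "-cp", weka_jar, "weka.classifiers.trees.J48",
        "-t", train_path,         # training set (TRAIN.arff)
        "-T", test_path,          # supplied test set (TEST.arff)
        "-M", str(min_num_obj),   # minNumObj setting from the popup
    ]
    if unpruned:
        cmd.append("-U")          # unpruned = True
    return cmd


print(" ".join(j48_command("TRAIN.arff", "TEST.arff")))
```

Pasting the printed line into a terminal should produce output similar to what appears in the Explorer's classifier output box, provided Java and weka.jar are available.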

Step 6: Results

You may have to scroll up and down in the classifier output box to see all the results. Copy and paste the results in the classifier output window to a text editor and HAND IN (or email) with your assignment. You will compare these results with a future homework assignment. Don't worry that you don't yet know how to interpret the output. In a short time, you will. This exercise is only to get you started with Weka.

In the results box, on the bottom left, right-click on the item that says trees.J48. Select "Visualize classification errors" from the list. Click Save, and save the results as RESULTS.arff. This file will include your original TEST set plus an extra column for the predicted classification. Copy and paste the text in the RESULTS.arff file to the end of your assignment and HAND IN.

So, for the first part of the assignment, you simply need to hand in (or email to me) a text document with the results output from Weka along with the prediction results found in your RESULTS.arff file.
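As an optional sanity check on RESULTS.arff, the sketch below compares the predicted column against the actual class and reports the fraction that match. This is not required for the assignment. The column names class and predictedclass and the four sample records are assumptions for illustration; check the @attribute lines in your own file for the real names.

```python
def accuracy(names, rows, actual="class", predicted="predictedclass"):
    """Fraction of rows whose predicted label equals the actual label."""
    a = names.index(actual)      # column holding the true class
    p = names.index(predicted)   # extra column holding Weka's prediction
    hits = sum(1 for row in rows if row[a] == row[p])
    return hits / len(rows)


# Hypothetical records in the shape described above: the original test
# attributes plus one extra column for the predicted classification.
names = ["age", "owns_home", "predictedclass", "class"]
rows = [
    ["34", "yes", "good", "good"],
    ["51", "no", "good", "bad"],
    ["29", "no", "bad", "bad"],
    ["45", "yes", "bad", "bad"],
]

print(accuracy(names, rows))  # 3 of the 4 sample rows match
```

The number it prints for your real file should agree with the "Correctly Classified Instances" percentage in the classifier output.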

Part II: Classification/prediction problem ideas


List three prediction problem ideas for your class project based on publicly available data, Wharton Research Data Services (WRDS) data (see shawndra.pbwiki.com), or data you have at your firm.

Append your three ideas to the text file with your answers to Part I.
