An Introduction to the WEKA Data Mining System

Zdravko Markov
Central Connecticut State University
New Britain, CT, USA
01-860-832-2712
[email protected]

Ingrid Russell
University of Hartford
West Hartford, CT, USA
01-860-768-4191
[email protected]

ABSTRACT

This is a proposal for a half-day tutorial on Weka, an open-source Data Mining software package written in Java and available from www.cs.waikato.ac.nz/~ml/weka/index.html. The goal of the tutorial is to introduce faculty to the package and to the pedagogical possibilities for its use in the undergraduate computer science and engineering curricula. The Weka system provides a rich set of powerful Machine Learning algorithms for Data Mining tasks, some not found in commercial data mining systems. These include basic statistics and visualization tools, as well as tools for pre-processing, classification, and clustering, all available through an easy-to-use graphical user interface.

Data Mining studies algorithms and computational paradigms that allow computers to discover structure in databases, perform prediction and forecasting, and generally improve their performance through interaction with data. Machine learning is concerned with building computer systems that have the ability to improve their performance in a given domain through experience. Machine learning and Data Mining are becoming increasingly important areas of engineering and computer science and have been successfully applied to a wide range of problems in science and engineering. Recently, in recognition of their importance, more work is being done to incorporate these areas into the undergraduate curriculum.

Weka is a widely used package that is particularly popular for educational purposes. It is the companion software package of the book “Data Mining: Practical Machine Learning Tools and Techniques” by Ian H. Witten and Eibe Frank. The Weka team was recently awarded the 2005 ACM SIGKDD Service Award for their development of the Weka system, including the accompanying book. As Gregory Piatetsky-Shapiro writes in the news item about this event (KDnuggets news, June 28, 2005), “Weka is a landmark system in the history of the data mining and machine learning research communities, because it is the only toolkit that has gained such widespread adoption and survived for an extended period of time (the first version of Weka was released 11 years ago)”.

The purpose of this tutorial is to present an introduction to the Weka system and outline the major approaches to using Weka for teaching Machine Learning, Data Mining, and Web Mining. We will also present our experiences using Weka as a main tool for implementing Machine Learning and Web Mining student projects that have been developed in the framework of a National Science Foundation grant. In this framework, two basic applications of Weka will be used to illustrate the various Weka topics presented:

• Web document classification. Some basic classification schemes provided by Weka (Nearest Neighbor, Naïve Bayes and Decision trees) are used to create models of web documents in topic directories and then to classify new documents according to their topic.
• Intelligent web browser. Web documents are labeled with the preferences of web users and ML models are created. These models are then used to classify documents returned by web searches according to the user preferences.

In this framework the following topics will be covered:

• Data preprocessing and visualization
• Attribute selection
• Association rules
• Classification algorithms (OneR, Decision trees, Covering rules)
• Prediction algorithms (Naïve Bayes, Nearest neighbor, Linear models)
• Evaluation techniques
• Clustering (K-means, EM, Cobweb)

For each of these topics, examples of using Weka will be presented. No background in machine learning or data mining is needed.

Categories and Subject Descriptors
K.3.2 [Computers and Education]: Computer Science Education

General Terms: Experimentation

Keywords: Artificial Intelligence, Projects

Copyright is held by the author/owner(s).
ITiCSE’06, June 26–28, 2006, Bologna, Italy.
ACM 1-59593-055-8/06/0006.

REFERENCES

[1] Kumar, A., Kumar, D., Russell, I., “Non-Traditional Projects in the Undergraduate AI Course”, Proceedings of the Thirty-Seventh SIGCSE Technical Symposium on Computer Science Education, ACM Press, New York, NY, February 2006.

[2] Markov, Z., Russell, I., Neller, T. Proceedings of the Thirty-Fifth Annual Frontiers in Education Conference, IEEE Press, October 2005.

[3] Mitchell, T., “Does Machine Learning Really Work?”, AI Magazine, Vol. 18, No. 3, AAAI Press, Fall 1997.

[4] Neller, T., Presser, C., Russell, I., Markov, Z., “Pedagogical Possibilities for the Dice Game Pig”, The Journal of Computing Sciences in Colleges, 21(5), May 2006.

[5] Neller, T., Markov, Z., Russell, I., “Clue Deduction: Professor Plum Teaches Logic”, Proceedings of the International FLAIRS Conference, AAAI Press, May 2006.

[6] Russell, I., Markov, Z., Neller, T., “Unifying an Introduction to Artificial Intelligence Course through Machine Learning Laboratory Experiences”, Proceedings of the 2005 Annual American Society for Engineering Education Conference, June 2005.

[7] Russell, S. J. and Norvig, P., Artificial Intelligence: A Modern Approach, second edition, Upper Saddle River, NJ: Prentice-Hall, 2002.

[8] Witten, I.H. and Frank, E., Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999.
