Python Meetup Talk 21/07/2009
Data Mining
And the results are
A view of the present and the future
July of 2009
Index
Introduction
Data Mining
The results
The future
The problem
Support tools
Objectives:
Analyse the application of data mining techniques to data stored in
support tools, with the aim of improving software quality.
Develop an experimental prototype tool.
Applications:
Reduce the error rate.
Provide a so far unexploited source of documentation.
Provide a new source of support tools for IDEs.
Data mining
Methods
Types:
Traditional Data Mining (K-Means, C4.5, Bayesian Networks).
Relational Data Mining (ILP, Markov Logic Networks,
Relational Bayesian methods, Dependency Networks).
Categories:
Clusterers
Classifiers
Association rules
Network models
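
As an illustration of the clusterer category, here is a minimal K-Means sketch in plain Python. The data points are hypothetical (each tuple stands for a file described by two made-up metrics, such as lines changed and number of commits), and the initial centroids are fixed by hand so the run is repeatable. It is a sketch of the general algorithm, not the prototype's implementation.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, centroids, max_iter=100):
    """Cluster points (tuples of numbers) around the given initial centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for centroid, cluster in zip(centroids, clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroid)  # empty cluster: keep centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: (lines changed, number of commits) per file.
points = [(1, 1), (2, 1), (1, 2), (10, 10), (11, 10), (10, 11)]
centroids, clusters = k_means(points, [(0, 0), (12, 12)])
```

With this data, the rarely changed files and the frequently changed files end up in separate clusters.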
Data mining
Issue detection
Question: Does this file contain an undetected error? The exact number of
errors can be predicted too.
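
The yes/no question above can be framed as a classification task. The sketch below uses a small naive Bayes classifier with Laplace smoothing, written in plain Python. The features (number of authors, size of the change) and the training history are hypothetical, chosen only to illustrate the idea; the talk does not state which features or classifier the prototype uses.

```python
from collections import Counter, defaultdict


def train(samples):
    """Count labels and feature values; samples is a list of (features, label)."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for features, label in samples:
        label_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
    return label_counts, feature_counts


def predict(model, features, n_values=2):
    """Return the most probable label; n_values = distinct values per feature."""
    label_counts, feature_counts = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = count / total  # prior probability of the label
        for i, value in enumerate(features):
            # Laplace smoothing avoids zero probabilities for unseen values.
            seen = feature_counts[(i, label)][value]
            score *= (seen + 1) / (count + n_values)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical history: (authors, change size) -> file later had an error?
history = [
    (("many", "large"), True),
    (("many", "small"), True),
    (("few", "large"), True),
    (("few", "small"), False),
    (("few", "small"), False),
    (("few", "large"), False),
]
model = train(history)
risky = predict(model, ("many", "large"))
safe = predict(model, ("few", "small"))
```

Predicting the exact number of errors would need a regression or multi-class variant, which this sketch does not cover.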
Error prediction
The Prototype
Presentation:
Graph drawing: NetworkX, with nice results. There are some
other libraries, but they look incomplete.
GUI: PyQt, wxPython, PyGTK. It’s a matter of taste XD!
SVN, CVS processing:
SVN: pysvn - Python interface to Subversion.
CVS: It seems nothing is available.
GIT: PyGit - Pythonic git bindings targeted towards
porcelains.
XML processing can be done with Python’s built-in support, using any
SAX or DOM parser.
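
A minimal sketch of the built-in SAX support, using only the standard library's xml.sax module. The sample document is hypothetical and loosely modelled on the shape of `svn log --xml --verbose` output; the element names and the `ChangeCounter` class are assumptions made for this illustration, not part of the prototype.

```python
import xml.sax
from collections import Counter

# Hypothetical sample, loosely shaped like a verbose SVN XML log.
SAMPLE_LOG = b"""<?xml version="1.0"?>
<log>
  <logentry revision="2">
    <author>alice</author>
    <paths>
      <path action="M">/trunk/core.py</path>
      <path action="A">/trunk/gui.py</path>
    </paths>
    <msg>Fix crash</msg>
  </logentry>
  <logentry revision="1">
    <author>bob</author>
    <paths>
      <path action="A">/trunk/core.py</path>
    </paths>
    <msg>Initial import</msg>
  </logentry>
</log>
"""


class ChangeCounter(xml.sax.ContentHandler):
    """Count how many log entries touched each path."""

    def __init__(self):
        super().__init__()
        self.changes = Counter()
        self._in_path = False
        self._text = []

    def startElement(self, name, attrs):
        if name == "path":
            self._in_path = True
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._in_path:
            self._text.append(content)

    def endElement(self, name):
        if name == "path":
            self.changes["".join(self._text).strip()] += 1
            self._in_path = False


handler = ChangeCounter()
xml.sax.parseString(SAMPLE_LOG, handler)
```

After parsing, `handler.changes` maps each path to the number of revisions that touched it, which is the kind of per-file change count a mining step could take as input.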
The future
Known issues:
Data preprocessing performance.
Database performance, is the relational model valid?
Dynamic procedure addition.
The Todo List:
Develop new procedures for related topics, such as
software visualization, change support, etc.
Develop more mature software. Python could help in some
parts. This software must be easily extensible.
Improve the performance of the whole process.
The end
Questions?