Data Mining Concepts and Techniques
◼ Retail industry
◼ Telecommunication industry
sets
◼ Need a multidimensional view in selection
◼ Data types: relational, transactional, text, time sequence,
spatial?
◼ System issues
◼ Running on only one or on several operating systems?
◼ A client/server architecture?
◼ Data sources
◼ ASCII text files, multiple relational data sources
◼ Scalability
◼ Row (or database size) scalability
◼ SGI MineSet
◼ Multiple data mining algorithms and advanced statistics
◼ Clementine (SPSS)
◼ An integrated data mining development environment
for end-users and developers
◼ Multiple data mining algorithms and visualization tools
◼ Purpose of Visualization
◼ Gain insight into an information space by mapping
data onto graphical primitives
◼ Provide qualitative overview of large data sets
◼ Search for patterns, trends, structure, irregularities,
relationships among data.
◼ Help find interesting regions and suitable parameters
for further quantitative analysis.
◼ Provide a visual proof of computer representations
derived
◼ Data visualization
◼ Data in a database or data warehouse can be viewed
dimensions
◼ Data can be presented in various visual forms
Understand variations with visualized data
regression
◼ Mixed-effect models
◼ For analyzing grouped data, i.e., data that can be classified
according to one or more grouping variables
◼ Typically describe relationships between a response variable and
some covariates in data grouped according to one or more factors
◼ Regression trees
◼ Binary trees used for classification and prediction
◼ Similar to decision trees: tests are performed at the internal nodes
◼ In a regression tree, the mean of the objective attribute over the
training samples in a leaf is computed and used as the predicted value
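The leaf-mean prediction described for regression trees can be illustrated with a minimal sketch. It assumes a single numeric input and a one-level tree (a "stump") whose test threshold is chosen to minimize the summed squared error of the two leaves; the function names `fit_stump` and `predict` and the sample data are hypothetical, not taken from the slides.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors of the values around their mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_stump(xs, ys):
    """Fit a one-level regression tree on a single numeric attribute.

    Returns (threshold, left_mean, right_mean). The internal node tests
    x <= threshold; each leaf stores the mean of the objective attribute.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict(stump, x):
    """Route x through the single test and return the leaf mean."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical training data: one attribute, one objective value per row.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = fit_stump(xs, ys)
```

A full regression tree would apply the same split search recursively inside each leaf until a stopping rule is met; the stump keeps only the first split.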
◼ Analysis of variance
◼ Analyze experimental data for two or
◼ Factor analysis
◼ Determine which variables are combined to generate a given factor
◼ e.g., for many psychiatric data sets, one can measure other
quantities (such as test scores) that indirectly reflect the
factor of interest
◼ Discriminant analysis
◼ Predict a categorical response variable; commonly used in the
social sciences
◼ Attempts to determine several discriminant functions (linear
combinations of the independent variables) that discriminate among
the groups defined by the response variable
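As an illustration of a discriminant function, the sketch below uses Fisher's linear discriminant, which is one standard way to build such a linear combination and is not named in the slides. It assumes only two groups and two independent variables, so it produces a single discriminant function rather than several; the function names and sample data are hypothetical.

```python
def mean_vec(rows):
    """Component-wise mean of a list of 2-D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, m):
    """2x2 within-group scatter matrix: sum of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fisher_discriminant(group0, group1):
    """Return (w, cutoff) for the discriminant function w[0]*x0 + w[1]*x1.

    w = Sw^-1 (m1 - m0), where Sw is the pooled within-group scatter.
    The cutoff is the midpoint of the two projected group means.
    """
    m0, m1 = mean_vec(group0), mean_vec(group1)
    s0, s1 = scatter(group0, m0), scatter(group1, m1)
    sw = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-group scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    proj0 = w[0] * m0[0] + w[1] * m0[1]
    proj1 = w[0] * m1[0] + w[1] * m1[1]
    return w, (proj0 + proj1) / 2


def classify(w, cutoff, x):
    """Assign x to group 1 if its discriminant score exceeds the cutoff."""
    return 1 if w[0] * x[0] + w[1] * x[1] > cutoff else 0


# Hypothetical observations: two independent variables per case.
group0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
group1 = [(6, 7), (7, 6), (7, 8), (8, 7)]
w, cutoff = fisher_discriminant(group0, group1)
```

With more than two groups, the same idea yields several discriminant functions, usually obtained from an eigen-decomposition rather than the closed form used here.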
February 18, 2022 Data Mining: Concepts and Techniques 36
Scientific and Statistical Data Mining (5)
◼ Survival analysis
◼ Predicts the probability that a patient undergoing a medical
treatment would survive at least to time t (life span prediction)
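One common way to estimate such a survival probability from follow-up data is the Kaplan-Meier product-limit estimator, sketched below. The slides do not name an estimator, so this choice is an assumption, as are the function names and the sample data. Note that the estimate S(t) is the probability of surviving beyond time t, and that censored patients (those lost to follow-up before an event) count as at risk only up to their last observed time.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct observed event time.

    durations[i] is the follow-up time of patient i; observed[i] is True
    if the event (death) was observed and False if the record is censored.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Events that occur exactly at time t.
        deaths = sum(1 for d, e in zip(durations, observed) if e and d == t)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve, t):
    """Evaluate the step function: estimated probability of surviving past t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Hypothetical follow-up data (time units are arbitrary).
durations = [2, 3, 3, 5, 8, 8]
observed = [True, True, False, True, False, True]
curve = kaplan_meier(durations, observed)
```

Here `survival_at(curve, 4)` evaluates the estimate after the events at times 2 and 3, giving (5/6) × (4/5) = 2/3.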
Theoretical Foundations of Data Mining (1)
◼ Data reduction
◼ The basis of data mining is to reduce the data
representation
◼ Trades accuracy for speed in response
◼ Data compression
◼ The basis of data mining is to compress the given
data by encoding in terms of bits, association rules,
decision trees, clusters, etc.
◼ Pattern discovery
◼ The basis of data mining is to discover patterns
occurring in the database, such as associations,
classification models, sequential patterns, etc.
◼ Probability theory
◼ The basis of data mining is to discover joint probability
databases,
◼ The task is to query the data and the theory (i.e., patterns)
of the database
◼ Popular among many researchers in database systems
◼ Biometric encryption
◼ Anonymous databases
◼ Application exploration
◼ Development of application-specific data mining
systems
◼ Invisible data mining (mining as built-in function)
References (3)