Data Mining

Data mining involves extracting useful patterns from large amounts of data. It can be used for applications like market analysis, risk analysis, and fraud detection. The data mining process includes data preprocessing, pattern evaluation, and knowledge presentation. A typical data mining system architecture includes components for data storage, data mining algorithms, knowledge bases, and user interfaces. Major issues in data mining include evaluating interesting patterns, handling noisy data, and improving performance on large datasets.

Uploaded by

Rishika Singh

Data Mining: Concepts and Techniques


1.2 What Is Data Mining?

• Data mining (knowledge discovery from data)
• Extraction of interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge from
huge amounts of data
• Data mining: a misnomer?
• Alternative names
• Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data
dredging, information harvesting, business intelligence, etc.
1.1 Why Data Mining?

• The explosive growth of data: from terabytes (1000^4 bytes) to yottabytes (1000^8 bytes)
• Data collection and data availability
• Automated data collection tools, database systems, web
• Major sources of abundant data
• Business: Web, e-commerce, transactions, stocks, …
• Science: bioinformatics, scientific simulation, medical research …
• Society and everyone: news, digital cameras, …
• Data rich but information poor!
• What do those data mean?
• How to analyze data?

• Data mining — Automated analysis of massive data sets


Potential Applications
• Data analysis and decision support
• Market analysis and management
• Target marketing, customer relationship management (CRM),
market basket analysis, cross selling, market segmentation
• Risk analysis and management
• Forecasting, customer retention, improved underwriting, quality
control, competitive analysis
• Fraud detection and detection of unusual patterns (outliers)
• Other Applications
• Text mining (news group, email, documents) and Web mining
• Stream data mining
• Bioinformatics and bio-data analysis
Ex.: Market Analysis and Management

• Where does the data come from? Credit card transactions, loyalty cards,
discount coupons, customer complaint calls, surveys …
• Target marketing
• Find clusters of “model” customers who share the same characteristics: interests,
income level, spending habits, etc.
• E.g. most customers with an income of 60k – 80k and food expenses of $600 – $800 a month live in a particular
area
• Determine customer purchasing patterns over time
• E.g. customers between 20 and 29 years old with an income of 20k – 29k usually buy a particular type of
CD player
• Cross-market analysis: find associations/correlations between product sales
and predict based on such associations
• E.g. customers who buy computer A usually buy software B
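The "customers who buy computer A usually buy software B" example can be made concrete with two standard association-rule measures, support and confidence. The sketch below is only an illustration: the baskets are invented, and the function names `support` and `confidence` are hypothetical helpers, not part of the slides.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    wanted = frozenset(items)
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimate P(consequent | antecedent) from the transactions."""
    antecedent = frozenset(antecedent)
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    both = support(transactions, antecedent | frozenset(consequent))
    return both / base


# Hypothetical baskets, invented for illustration only.
baskets = [
    frozenset({"computer_A", "software_B"}),
    frozenset({"computer_A", "software_B", "printer"}),
    frozenset({"computer_A"}),
    frozenset({"printer"}),
]

# Rule: computer_A -> software_B
rule_support = support(baskets, {"computer_A", "software_B"})
rule_confidence = confidence(baskets, {"computer_A"}, {"software_B"})
```

With these made-up baskets, half of all transactions contain both items, and two of the three computer A buyers also bought software B.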
Ex.: Market Analysis and Management (2)

• Customer requirement analysis


• Identify the best products for different customers
• Predict what factors will attract new customers
• Provision of summary information
• Multidimensional summary reports
• E.g. summarize all transactions of the first quarter from three different branches;
summarize all transactions of last year from a particular branch;
summarize all transactions of a particular product
• Statistical summary information
• E.g. What is the average age for customers who buy product A?

• Fraud detection
• Find outliers, i.e. unusual transactions
• Financial planning
• Summarize and compare the resources and spending
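One simple way to flag unusual transactions is a robust outlier score. The slides do not name a method, so the sketch below swaps in a common choice, the modified z-score based on the median absolute deviation (MAD), with the conventional 3.5 cutoff. The amounts and the function name `find_outliers` are hypothetical.

```python
import statistics
from typing import List


def find_outliers(amounts: List[float], threshold: float = 3.5) -> List[float]:
    """Return amounts whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation. It is less distorted by extreme values than
    a mean/standard-deviation z-score.
    """
    if not amounts:
        return []
    median = statistics.median(amounts)
    mad = statistics.median([abs(x - median) for x in amounts])
    if mad == 0:
        # All values (or most of them) are identical; the score is undefined.
        return []
    return [x for x in amounts if 0.6745 * abs(x - median) / mad > threshold]


# Hypothetical transaction amounts, invented for illustration only.
amounts = [42.0, 38.5, 51.0, 47.25, 40.0, 44.0, 2500.0]
suspicious = find_outliers(amounts)
```

A flagged value is only a candidate for review; whether it is actually fraudulent depends on domain knowledge that this score does not capture.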
Knowledge Discovery (KDD) Process
KDD Process: Several Key Steps
• Learning the application domain
• relevant prior knowledge and goals of application
• Identifying a target data set: data selection
• Data Pre-processing
• Data cleaning (remove noise and inconsistent data)
• Data integration (multiple data sources may be combined)
• Data selection (data relevant to the analysis task are retrieved from database)
• Data transformation (data transformed or consolidated into forms appropriate for mining)
(Done with data preprocessing)
• Data mining (an essential process where intelligent methods are applied to
extract data patterns)
• Pattern evaluation (identify the truly interesting patterns)
• Knowledge presentation (mined knowledge is presented to the user with
visualization or representation techniques)

• Use of discovered knowledge
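The four preprocessing steps listed above can be sketched as small functions applied in sequence. This is a minimal illustration under stated assumptions: records are plain dictionaries, "cleaning" just drops records with missing values, and "transformation" is min-max scaling of one field. The data and all function names are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def clean(records: List[Record]) -> List[Record]:
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources: List[Record]) -> List[Record]:
    """Data integration: combine records from several sources."""
    return [dict(r) for source in sources for r in source]


def select(records: List[Record], fields: List[str]) -> List[Record]:
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records: List[Record], field: str) -> List[Record]:
    """Data transformation: min-max scale one field to the range [0, 1]."""
    if not records:
        return []
    values = [r[field] for r in records]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero for constant fields
    return [{**r, field: (r[field] - low) / span} for r in records]


# Hypothetical customer records from two stores.
store_a = [
    {"age": 25.0, "income": 30000.0, "visits": 3.0},
    {"age": None, "income": 52000.0, "visits": 1.0},  # missing age
]
store_b = [{"age": 40.0, "income": 80000.0, "visits": 7.0}]

prepared = transform(
    select(integrate(clean(store_a), clean(store_b)), ["age", "income"]),
    "income",
)
```

The resulting `prepared` list is what would be handed to the data mining step; real systems typically repair or impute noisy values rather than simply dropping them.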


Data Mining and Business Intelligence

[Figure: layered pyramid. The potential to support business decisions increases from bottom to top. Layers, from top to bottom:]
• Decision Making
• Data Presentation: Visualization Techniques
• Data Mining: Information Discovery
• Data Exploration: Statistical Summary, Querying, and Reporting
• Data Preprocessing/Integration, Data Warehouses
• Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
[Roles shown alongside the layers, from top to bottom: End User, Business Analyst, Data Analyst, DBA]
A typical DM System Architecture
• Database, data warehouse, WWW or other information
repository (store data)
• Database or data warehouse server (fetch and combine data)
• Knowledge base (turn data into meaningful groups
according to domain knowledge)
• Data mining engine (perform mining tasks)
• Pattern evaluation module (find interesting patterns)
• User interface (interact with the user)
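To show how these components might hand data to one another, here is a toy sketch with one function per component. It is an assumption-laden illustration, not a description of any real system: the repository is an in-memory list, the knowledge base is a single age-grouping rule, and the mining task is simple frequency counting.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple

# Repository: stored raw data (hypothetical rows).
REPOSITORY: List[Dict[str, Any]] = [
    {"age": 23, "item": "CD player", "branch": "north"},
    {"age": 27, "item": "CD player", "branch": "south"},
    {"age": 45, "item": "laptop", "branch": "north"},
    {"age": 24, "item": "CD player", "branch": "north"},
]


def fetch(repository: List[Dict[str, Any]], fields: List[str]) -> List[Dict[str, Any]]:
    """Database/warehouse server: fetch the relevant fields."""
    return [{f: row[f] for f in fields} for row in repository]


def age_group(age: int) -> str:
    """Knowledge base: domain knowledge that groups ages into ranges."""
    return "20-29" if 20 <= age <= 29 else "other"


def mine(rows: List[Dict[str, Any]]) -> Counter:
    """Data mining engine: count (age group, item) combinations."""
    return Counter((age_group(r["age"]), r["item"]) for r in rows)


def evaluate(patterns: Counter, min_count: int) -> List[Tuple[Tuple[str, str], int]]:
    """Pattern evaluation module: keep patterns seen at least `min_count` times."""
    return [(p, c) for p, c in patterns.most_common() if c >= min_count]


def present(patterns: List[Tuple[Tuple[str, str], int]]) -> List[str]:
    """User interface: render patterns as readable lines."""
    return [f"{group} buys {item}: {count} transactions" for (group, item), count in patterns]


report = present(evaluate(mine(fetch(REPOSITORY, ["age", "item"])), min_count=2))
```

Keeping each component behind its own function means the counting step could be replaced by another mining algorithm without touching the fetch or presentation code.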
A typical DM System Architecture (2)
Data Warehouses

• A repository of information collected from multiple sources, stored
under a unified schema, and usually residing at a single site.
• Constructed via a process of data cleaning, data integration, data
transformation, data loading and periodic data refreshing.



Data Warehouses (2)

• Data are organized around major subjects, e.g. customer, item, supplier and
activity.
• Provide information from a historical perspective (e.g. from the past 5 – 10
years)
• Typically summarized to a higher level (e.g. a summary of the
transactions per item type for each store)
• Users can perform drill-down or roll-up operations to view the data at
different degrees of summarization
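Roll-up and drill-down amount to aggregating the same facts at coarser or finer levels. The sketch below illustrates this with invented sales rows and a hypothetical `summarize` helper; a real warehouse would do this through its own query or OLAP layer.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical fact rows: (store, item_type, item, amount).
SALES: List[Tuple[str, str, str, float]] = [
    ("store_1", "audio", "CD player", 120.0),
    ("store_1", "audio", "speakers", 80.0),
    ("store_1", "computing", "laptop", 900.0),
    ("store_2", "audio", "CD player", 110.0),
]


def summarize(
    rows: List[Tuple[str, str, str, float]], key_indexes: Sequence[int]
) -> Dict[Tuple[str, ...], float]:
    """Total the amount (column 3) grouped by the chosen key columns."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[3]
    return dict(totals)


# Summary per item type for each store (the level named in the slide).
by_store_and_type = summarize(SALES, (0, 1))

# Roll-up: drop the item-type dimension to get one total per store.
by_store = summarize(SALES, (0,))

# Drill-down: add the item dimension to see more detail.
by_store_type_item = summarize(SALES, (0, 1, 2))
```

Fewer key columns give a higher-level summary and more key columns give a more detailed one, all from the same underlying rows.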
1.9 Major Issues in Data Mining
• Presentation and visualization of results
• Knowledge should be easily understood and directly usable
• High-level languages, visual representations or other expressive forms
• Require the DM system to adopt the above techniques
• Handling noisy or incomplete data
• Require data cleaning methods and data analysis methods that can handle noise
• Pattern evaluation – the interestingness problem
• How to develop techniques to assess the interestingness of discovered patterns, especially
with subjective measures based on user beliefs or expectations
1.9 Major Issues in Data Mining (2)
• Performance Issues
• Efficiency and scalability
• Huge amount of data
• Running time must be predictable and acceptable
• Parallel, distributed and incremental mining algorithms
• Divide the data into partitions and process them in parallel
• Incorporate database updates without having to mine the entire data again from
scratch

• Diversity of Database Types
• Other databases that contain complex data objects, multimedia data,
spatial data, etc.
• Expect to have different DM systems for different kinds of data
• Heterogeneous databases and global information systems
• Web mining becomes a very challenging and fast-evolving field in data mining
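The partitioned and incremental mining ideas listed under Performance Issues can be sketched with item counting, where partial results merge by simple addition. This is a minimal illustration: the transactions and function names are hypothetical, and a thread pool is used only to keep the example short (in CPython, CPU-bound work would normally need processes or a distributed framework for a real speed-up).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List

Transaction = List[str]


def count_items(partition: List[Transaction]) -> Counter:
    """Count, for one partition, how many transactions contain each item."""
    counts: Counter = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts


def mine_in_parallel(partitions: List[List[Transaction]]) -> Counter:
    """Mine each partition separately, then merge the partial counts."""
    with ThreadPoolExecutor() as pool:
        partial_counts = list(pool.map(count_items, partitions))
    total: Counter = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def apply_update(current: Counter, new_transactions: List[Transaction]) -> Counter:
    """Incremental mining: fold in new transactions without a full re-scan."""
    updated = current.copy()
    updated.update(count_items(new_transactions))
    return updated


# Hypothetical transactions split into two partitions.
partitions = [
    [["milk", "bread"], ["milk"]],
    [["bread", "eggs"]],
]
counts = mine_in_parallel(partitions)

# A later database update arrives; only the new rows are processed.
counts_after_update = apply_update(counts, [["eggs", "milk"]])
```

Counting works here because the merged result does not depend on how the data was split; algorithms without that property need more careful partitioning and merge steps.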
