Unit 1
(03105430)
Dheeraj Kumar Singh, Assistant Professor
Department of Information Technology
CHAPTER-1
Introduction to Data Mining
Introduction of Data Mining
•Drowning in data, but starving for knowledge!
•“Necessity is the mother of invention”—Data mining
• Automated analysis of massive data sets
• As data grows at a remarkable rate, there is a need to analyze large,
complex, information-rich data sets to uncover hidden information. This may
lead to greater customer satisfaction and higher turnover for the firm.
Data vs. Information
Data
• raw facts
• no context
• just numbers and text
Example:
– 51007
Information
• data with context
• processed data
• value-added to data
– summarized
– organized
– analyzed
Example:
– 5/10/20: date of your final exam.
– $51,007: your salary.
– 51007: ZIP code of a place.
Data, Information, Knowledge
[Figure: progression from Data to Information to Knowledge]
Why Do We Need Data Mining?
Classification of Data Mining System
• Databases to be mined:
- Relational, transactional, object-oriented, object-relational, active,
spatial, time-series, text, multimedia, heterogeneous, legacy, WWW, etc.
• Knowledge to be mined:
- Characterization, discrimination, association, classification, clustering,
trend, deviation and outlier analysis, etc.
- Multiple/integrated functions and mining at multiple levels
Classification of Data Mining System (Contd.....)
• Techniques utilized
- Database-oriented, data warehouse (OLAP), machine learning, statistics,
visualization, neural network, etc.
• Applications adapted
- Retail, telecommunication, banking, fraud analysis, DNA mining, stock
market analysis, Web mining, Weblog analysis, etc.
Architecture of Data Mining System
• Four kinds of data mining architecture
- No coupling
- Loose coupling
- Semi-tight coupling
- Tight coupling
No Coupling
• Data is retrieved from data sources such as the file system, processed using
data mining algorithms, and the results are stored in the file system.
Loose Coupling
• The loose coupling data mining system uses a database or data warehouse for
data retrieval.
• In this architecture, the data mining system retrieves data from the database
or data warehouse, processes it using data mining algorithms, and stores the
result in those systems.
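The loose coupling flow described above (retrieve from the database, mine outside it, store the result back) can be sketched as follows. This is a minimal illustration only: the `sales` and `top_products` tables, the `mine_top_products` function, and the trivial "count the best sellers" mining step are all assumptions made for the example, and an in-memory SQLite database stands in for a real database or data warehouse.

```python
import sqlite3
from collections import Counter

def mine_top_products(conn, top_n=2):
    """Loose coupling sketch: read from the database, mine in Python, write back."""
    # 1. Retrieve data from the database (hypothetical `sales` table).
    rows = conn.execute("SELECT product FROM sales").fetchall()
    # 2. Process the data with a deliberately trivial mining step:
    #    count how often each product was sold.
    counts = Counter(product for (product,) in rows)
    top = counts.most_common(top_n)
    # 3. Store the result back in the same database (hypothetical table).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS top_products (product TEXT, sales INTEGER)"
    )
    conn.execute("DELETE FROM top_products")
    conn.executemany("INSERT INTO top_products VALUES (?, ?)", top)
    conn.commit()
    return top

# Made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?)",
    [("milk",), ("bread",), ("milk",), ("eggs",), ("milk",), ("bread",)],
)
result = mine_top_products(conn)
print(result)
```

Note that the database is used only for storage and retrieval here; all the mining logic runs in the application, which is the characteristic of loose coupling described above.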
Data mining: A KDD Process
• Data integration
- Multiple (heterogeneous) data sources may be combined into a common source.
• Data selection
- Data relevant to the analysis is identified and retrieved from the data
collection.
Data mining: A KDD Process (Contd.....)
• Data transformation
- Also known as data consolidation.
- A phase in which the selected data is transformed into forms appropriate
for the mining procedure.
• Data mining
- Clever techniques are applied to extract potentially useful patterns.
• Pattern evaluation
- Interesting patterns representing knowledge are identified based on given
measures.
Data mining: A KDD Process (Contd.....)
• Knowledge representation
- The final phase, in which the discovered knowledge is presented visually to
the user.
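The KDD steps listed above (integration, selection, transformation, mining, pattern evaluation, knowledge representation) can be sketched as a chain of small functions. This is a toy illustration under stated assumptions: the two data sources, the function names, and the "at least two customers" interestingness measure are all invented for the example, and each step is far simpler than it would be in practice.

```python
from collections import Counter

# Two hypothetical heterogeneous sources with different layouts.
store_a = [{"customer": "c1", "item": "Milk", "qty": 2},
           {"customer": "c2", "item": "bread", "qty": 1}]
store_b = [("c3", "MILK", 1), ("c1", "eggs", 12), ("c4", "milk", 3)]

def integrate(a, b):
    # Data integration: combine both sources into one common format.
    return a + [{"customer": c, "item": i, "qty": q} for c, i, q in b]

def select(records):
    # Data selection: keep only the fields relevant to the analysis.
    return [(r["customer"], r["item"]) for r in records]

def transform(pairs):
    # Data transformation: normalise item names to a consistent form.
    return [(customer, item.lower()) for customer, item in pairs]

def mine(pairs):
    # Data mining: count the distinct customers who bought each item.
    return Counter(item for _, item in set(pairs))

def evaluate(counts, min_customers=2):
    # Pattern evaluation: keep only patterns that meet the given measure.
    return {item: n for item, n in counts.items() if n >= min_customers}

def present(patterns):
    # Knowledge representation: show the result as a simple text bar chart.
    return [f"{item:<6} {'#' * n} ({n} customers)"
            for item, n in sorted(patterns.items())]

patterns = evaluate(mine(transform(select(integrate(store_a, store_b)))))
report = present(patterns)
print("\n".join(report))
```

Each function corresponds to exactly one phase, so the nesting order of the calls mirrors the order of the phases in the process.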
Issues in Data Mining
• Other applications:
- Text mining (news group, email, documents) and Web analysis
- Intelligent query answering
Advantages of Data Mining (Contd.....)
• Marketing / Retail
- Data mining helps marketing companies build models from historical data
that predict who is likely to respond to new marketing campaigns.
- Data mining helps retail companies as well. Using market basket analysis, a
store can arrange its products so that customers can conveniently buy items
that are frequently purchased together. It also helps retail companies offer
targeted discounts that attract more customers.
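As a rough illustration of the market basket analysis mentioned above, the sketch below counts how often pairs of items appear in the same basket and keeps the pairs whose support (the share of baskets containing both items) reaches a threshold. The basket data, the `frequent_pairs` name, and the 0.5 threshold are assumptions made for the example; practical systems typically use algorithms such as Apriori or FP-growth on much larger data.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support=0.5):
    """Return item pairs whose support meets min_support.

    Support of a pair = number of baskets containing both items / total baskets.
    """
    pair_counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    return {pair: count / n
            for pair, count in pair_counts.items()
            if count / n >= min_support}

# Made-up transactions for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "eggs"],
    ["bread", "milk", "eggs"],
]
pairs = frequent_pairs(baskets)
print(pairs)
```

Here bread and milk appear together in three of the four baskets, so a store might choose to shelve them near each other.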
Advantages of Data Mining (Contd.....)
• Finance / Banking
- By building a model from historical customer loan data, banks and
financial institutions can distinguish good loans from bad ones.
- Data mining also helps banks detect fraudulent credit card transactions.
• Manufacturing
- Applied to operational engineering data, data mining can detect faulty
equipment and determine optimal control parameters.
- It can identify the range of control parameters that yields products of the
desired quality.
Advantages of Data Mining (Contd.....)
• Governments
- Data mining helps government agencies analyze records of financial
transactions to build patterns that can detect money laundering or other
criminal activities.
• Market segmentation
- Data mining helps to identify the common characteristics of customers who
buy the same products from your company.
Advantages of Data Mining (Contd.....)
• Customer anticipation
- It helps to predict which customers may leave your company and go to a
competitor.
• Fraud detection
- It identifies which transactions are most likely to be fraudulent.
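One simple way to picture fraud detection is as outlier analysis: a transaction is suspicious when its amount lies far from what is typical. The sketch below is an assumption-laden toy, not how production fraud systems work: it scores each amount by its distance from the median, measured in units of the median absolute deviation (MAD), and the `flag_suspicious` name, the threshold of 5, and the sample amounts are all invented for illustration.

```python
from statistics import median

def flag_suspicious(amounts, threshold=5.0):
    """Return indices of amounts that lie far from the median, in MAD units."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        # No spread to measure against, so this sketch flags nothing.
        return []
    return [i for i, a in enumerate(amounts) if abs(a - med) / mad > threshold]

# Made-up card transactions: five ordinary purchases and one very large one.
amounts = [20, 25, 22, 19, 24, 500]
suspicious = flag_suspicious(amounts)
print(suspicious)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the latter two and can hide itself.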
Advantages of Data Mining (Contd.....)
• Direct marketing
- It identifies which prospects should be targeted to obtain the highest
response rate.
• Interactive marketing
- It is useful for predicting what each user on a Web site is most likely
interested in seeing.
• Trend analysis
- It identifies how a typical customer this month differs from a typical
customer last month.
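The trend analysis idea above can be sketched by summarising each month as a "typical customer" and subtracting. In this illustration the typical customer is taken to be the average of each attribute, which is only one possible choice; the `visits` and `spend` attributes, the function names, and the figures are assumptions made for the example.

```python
from statistics import mean

def typical_customer(records):
    """Summarise a month as the average value of each numeric attribute."""
    keys = records[0].keys()
    return {k: mean([r[k] for r in records]) for k in keys}

def trend(last_month, this_month):
    """Difference between the typical customer this month and last month."""
    before = typical_customer(last_month)
    after = typical_customer(this_month)
    return {k: after[k] - before[k] for k in before}

# Made-up customer records for two consecutive months.
last_month = [{"visits": 2, "spend": 40}, {"visits": 4, "spend": 60}]
this_month = [{"visits": 3, "spend": 45}, {"visits": 5, "spend": 75}]
change = trend(last_month, this_month)
print(change)
```

A positive value means the typical customer did more of that thing this month than last month.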
Disadvantages of Data Mining
• Privacy Issues
- With the Internet booming with social networks, e-commerce, blogs, etc.,
concerns about personal privacy have been increasing.
- Users worry that their information might be collected and used unethically,
which can cause considerable harm.
- Businesses collect information about their users to shape marketing
strategies, but a business may be acquired by another firm or shut down,
which raises concerns about personal information being misused or leaked.
Disadvantages of Data Mining (Contd.....)
• Security Issues
- Security is a major concern in data mining. Businesses hold information
about their employees, including personal and financial information, and
there is a chance that hackers will misuse this data, causing serious trouble
for the organization and its employees.
Disadvantages of Data Mining (Contd.....)
• Misuse of information/inaccurate information
- Information collected through data mining may be exploited by unethical
people or businesses to take advantage of vulnerable people.