Data Mining L-5
IIIT Surat
Unit I
Summary and Q/A
Interesting Facts
❖ Who is the founder of data mining? Gregory Piatetsky-Shapiro, who coined the term "Knowledge Discovery in Databases" (KDD) for the first KDD workshop in 1989.
❖ The term "data mining" came into wide use in the 1990s; Dr. Usama Fayyad was among those who popularized it.
❖ Who is the father of data science? William S. Cleveland is often credited.
❖ Father of the relational database: E. F. Codd.
Myths And Mistakes About Data Mining
Myths
❖ Data Mining Is Always Invasive And Violates Privacy
❖ Data Mining Is Illegal
❖ Data Mining Is Expensive
❖ Data Mining Is Only For Technical Experts
❖ Data Mining is for Large Companies with Lots of Customer Data
Mistakes
❖ Collecting Too Much Data
❖ Failing To Secure Data
❖ Misinterpreting Data
❖ Ignoring Privacy Regulations
Data Mining Privacy
❖ Obtain consent: Organizations should obtain consent from individuals before collecting and
using their data for data mining purposes.
❖ Anonymize data: Organizations should anonymize data before using it for data mining
purposes to protect individuals’ privacy.
❖ Use secure methods: Organizations should use secure methods to store and transmit data
to prevent unauthorized access.
❖ Limit access: Organizations should limit access to data mining tools and data to only
authorized personnel to prevent misuse or unauthorized access.
❖ Be transparent: Organizations should be transparent about their data mining practices and
inform individuals about the purpose and scope of data mining activities.
❖ Educate users: Educate users about data privacy and the importance of protecting their
personal information. Provide clear and concise information about how their data will be
used and give them the option to opt out if they choose to do so.
❖ Regularly review and update policies: Organizations should regularly review and update
their data mining policies to ensure they comply with privacy laws and regulations.
❖ Provide transparency: Be transparent about the data mining process and provide
individuals with information about how their data will be used.
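The "Anonymize data" point above can be illustrated with a minimal Python sketch. It replaces direct identifiers with keyed hashes before the data reaches a mining tool. Strictly speaking this is pseudonymization, a weaker guarantee than full anonymization, and the field names, the key, and the function name `pseudonymize` are assumptions made for illustration only.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept outside the data set.
SECRET_KEY = b"replace-with-a-real-secret"

# Assumed names of the columns that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(record: dict) -> dict:
    """Return a copy of `record` with direct identifiers replaced by keyed hashes."""
    out = {}
    for field_name, value in record.items():
        if field_name in DIRECT_IDENTIFIERS:
            digest = hmac.new(SECRET_KEY, str(value).encode("utf-8"), hashlib.sha256)
            out[field_name] = digest.hexdigest()[:16]  # shortened token
        else:
            out[field_name] = value  # non-identifying fields pass through
    return out


record = {"name": "A. Customer", "email": "a@example.com", "city": "Surat", "spend": 1200}
safe = pseudonymize(record)
```

The same input always maps to the same token, so records can still be joined and counted, but fields such as city or spend may still allow re-identification when combined, which is why this step alone is not full anonymization.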
Non-Compliance With Privacy Regulations
Correlation Analysis
Descriptive Data Mining
Descriptive data mining focuses on summarising and describing the characteristics of data. It helps
organisations gain a deeper understanding of their existing data and identify patterns that can inform
strategic decisions.
❖ Data Characterization: Involves summarising the general characteristics of a data set or a specific
group within it. For instance, analysing customer demographics or product attributes.
❖ Data Discrimination: Compares the characteristics of target classes with those of contrasting
classes. This helps identify differentiating factors between groups.
❖ Association Rule Mining: Discovers relationships between items or events that occur frequently
together. Commonly used in market basket analysis to identify product affinities.
❖ Clustering: Groups similar data points together without prior knowledge of group membership.
Useful for customer segmentation, anomaly detection, and image analysis.
❖ Visualisation: Presents data in a graphical format to facilitate understanding and interpretation.
Effective for exploring patterns, trends, and outliers.
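Association rule mining, listed above, is usually judged by two standard measures: support (how often an itemset occurs) and confidence (how often the consequent appears when the antecedent does). A minimal Python sketch on a made-up basket data set follows; the transactions and function names are illustrative assumptions.

```python
# Toy market-basket data, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


s = support({"bread", "milk"}, transactions)
c = confidence({"bread"}, {"milk"}, transactions)
```

Here the rule bread → milk has support 2/4 and confidence 2/3, since bread appears in three baskets and milk accompanies it in two of them.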
Predictive Data Mining
Predictive data mining goes beyond description to forecast future trends and outcomes based on historical
data. It enables organisations to make informed predictions and optimise decision-making processes.
❖ Classification: Assigns data instances to predefined categories or classes. Used for customer churn
prediction, fraud detection, and risk assessment.
❖ Regression: Predicts numerical values based on input variables. Applications include sales
forecasting, price prediction, and demand estimation.
❖ Prediction: Encompasses both classification and regression, aiming to forecast future values or
categories.
❖ Outlier Detection: Identifies data points that deviate significantly from the norm. Helpful in fraud
detection, anomaly detection in sensor data, and quality control.
❖ Evolution and Deviation Analysis: Tracks changes in data patterns over time. Valuable for trend
analysis, market analysis, and monitoring system performance.
❖ Correlation Analysis: Measures the strength and direction of relationships between variables. Used
for identifying dependencies, candidate cause-and-effect relationships (correlation alone does not
establish causation), and feature selection.
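The strength and direction measured in correlation analysis are commonly expressed with the Pearson coefficient, which ranges from -1 (perfect negative linear relationship) to +1 (perfect positive). A minimal Python sketch follows; the sample figures and variable names are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up figures: advertising spend and sales for five periods.
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 46, 55]
r = pearson(ad_spend, sales)
```

A value of `r` close to +1 signals a strong positive linear relationship, but it says nothing by itself about which variable, if either, drives the other.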
Data Mining Primitives
Five primitives specify a data mining task in the form of a data mining query:
❖ Task-relevant data,
❖ The kind of knowledge to be mined,
❖ Background knowledge,
❖ Interestingness measures, and
❖ Knowledge presentation and visualization techniques
Query Language in Data Mining
❖ Data mining query languages can be designed to support ad hoc and
interactive data mining.
❖ A data mining query language, such as DMQL, should provide commands for
specifying each of the data mining primitives.
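To make the idea of "one command per primitive" concrete, the sketch below models a mining query as a small Python data structure. It is not DMQL syntax: the class `MiningQuery`, its field names, and the example values are hypothetical. It follows the usual textbook grouping, in which task-relevant data is one primitive and presentation and visualization together form another.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container with one field per data mining primitive."""
    task_relevant_data: str            # which data to mine
    knowledge_kind: str                # e.g. "association", "classification"
    background_knowledge: list = field(default_factory=list)  # e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)       # thresholds on measures
    presentation: str = "table"        # how results are displayed

    def describe(self):
        """Return one human-readable line per primitive."""
        return [
            f"data: {self.task_relevant_data}",
            f"knowledge: {self.knowledge_kind}",
            f"background: {', '.join(self.background_knowledge) or 'none'}",
            f"interestingness: {self.interestingness or 'none'}",
            f"presentation: {self.presentation}",
        ]


query = MiningQuery(
    task_relevant_data="purchases of customers in one city",
    knowledge_kind="association",
    background_knowledge=["location hierarchy"],
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
)
lines = query.describe()
```

A real query language would express the same five pieces as statements in a single query, which is what allows mining to be ad hoc and interactive.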
What are the components of data mining?
❖ Databases
❖ Data warehouse server
❖ Knowledge base
❖ Data mining engine
❖ Pattern evaluation module
❖ User interface
OLAP vs. OLTP