The document provides an overview of data mining, distinguishing it from simple data querying and outlining its classification schemes, tasks, and techniques. It introduces the CRISP-DM framework for data mining projects, detailing its iterative and flexible nature, and discusses the components and architecture of data mining systems. Additionally, it covers predictive analytics, its challenges, and various applications across different industries.
Unit 1 - Lecture 2
Data Mining
Content
Data mining Introduction
KDD
What is (not) Data Mining?
What is not Data Mining?
– Look up phone number in phone directory
– Query a Web search engine for information about “Amazon”
– Querying or searching
What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in Boston area)
– Group together similar documents returned by search engine according to their context (e.g. Amazon rainforest, Amazon.com)
– Finding trends and patterns
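The contrast between querying and mining can be sketched in a few lines of Python. This is only an illustration: the directory entries, phone numbers, and both function names are invented here and are not part of the lecture. The first function retrieves a stored fact, while the second summarises a pattern that no single record states.

```python
from collections import Counter

# Hypothetical toy directory: (surname, city, phone). Invented for illustration.
DIRECTORY = [
    ("O'Brien", "Boston", "555-0101"),
    ("O'Reilly", "Boston", "555-0102"),
    ("O'Rourke", "Boston", "555-0103"),
    ("Smith", "Boston", "555-0104"),
    ("Smith", "Denver", "555-0105"),
    ("Garcia", "Denver", "555-0106"),
    ("O'Brien", "Denver", "555-0107"),
    ("Nguyen", "Denver", "555-0108"),
]


def lookup_phone(surname, city):
    """Not data mining: an exact-match query that returns stored values."""
    return [phone for s, c, phone in DIRECTORY if s == surname and c == city]


def prefix_share_by_city(prefix):
    """Closer to data mining: the share of surnames per city that start
    with `prefix`, a regularity that is not stored in any single row."""
    totals = Counter()
    matches = Counter()
    for surname, city, _phone in DIRECTORY:
        totals[city] += 1
        if surname.startswith(prefix):
            matches[city] += 1
    return {city: matches[city] / totals[city] for city in totals}


print(lookup_phone("Smith", "Denver"))
print(prefix_share_by_city("O'"))
```

On this toy data the second call reports a much higher share of "O'" surnames for Boston than for Denver, which is the kind of trend the slide describes.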
Data Mining: Classification Schemes
Decisions in data mining
– Kinds of databases to be mined
– Kinds of knowledge to be discovered
– Kinds of techniques utilized
– Kinds of applications adapted
Data mining tasks
– Descriptive data mining
– Predictive data mining
Decisions in data mining
– Databases to be mined: relational, transactional, object-oriented, spatial, time-series, text, multi-media, heterogeneous, WWW, etc.
– Knowledge to be mined: characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc.; multiple/integrated functions and mining at multiple levels
– Techniques utilized: database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc.
– Applications adapted: retail, telecommunication, banking, fraud analysis, etc.
Data mining tasks/techniques
– Predictive modeling: use some variables to predict unknown or future values of other variables.
– Descriptive modeling: find human-interpretable patterns that describe the data.
Predictive Modeling:
– Classification: Assigning data instances to predefined classes (e.g., decision trees, neural networks, support vector machines, logistic regression).
– Regression: Predicting continuous numerical values (e.g., linear regression).
– Time Series Analysis: Analyzing data points collected at specific time intervals (e.g., ARIMA, exponential smoothing).
Descriptive Modeling:
– Clustering: Grouping similar data points together (e.g., k-means, hierarchical clustering).
– Association Rule Mining: Discovering relationships between items (e.g., market basket analysis).
– Outlier Detection: Identifying abnormal data points.
CRISP-DM: Framework for Data Mining
– CRISP-DM stands for Cross-Industry Standard Process for Data Mining.
– Widely adopted methodology.
– Provides a structured approach for planning & executing DM projects.
– Designed to be adaptable across various industries and applications.
Key Characteristics of CRISP-DM
– Iterative: The process is not strictly linear. You may need to revisit previous phases as you progress.
– Flexible: It can be adapted to various project sizes and …
CRISP-DM: Data Mining Operations
1. Business Understanding:
   1. Determine business objectives and requirements.
   2. Assess situation and resources.
   3. Determine data mining goals.
2. Data Understanding:
   1. Collect initial data.
   2. Describe data.
   3. Explore data.
   4. Verify data quality.
3. Data Preparation:
   1. Select and Clean data.
   2. Construct data.
4. Data Modeling:
   1. Select modeling techniques.
   2. Generate test design.
   3. Build and Assess models.
5. Evaluation:
   1. Evaluate results.
   2. Review process.
   3. Determine next steps.
6. Deployment:
   1. Plan deployment.
   2. Plan monitoring and maintenance.
   3. Produce final report.
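As a rough illustration of how the Data Preparation, Data Modeling, and Evaluation phases fit together, the sketch below walks a tiny classification problem through them. Everything in it is an assumption made for the example: the churn records, the function names, the hold-out rule, the 0.9 accuracy target, and the single-threshold classifier (chosen for brevity, not because CRISP-DM prescribes it).

```python
# Hypothetical records: (monthly_spend, churned). None marks a missing value.
RAW = [
    (10.0, 1), (12.0, 1), (15.0, 1), (None, 0), (18.0, 1),
    (40.0, 0), (45.0, 0), (50.0, 0), (55.0, 0), (14.0, 1), (48.0, 0),
]


def prepare(records):
    """Data Preparation: select and clean data (drop rows with missing values)."""
    return [(x, y) for x, y in records if x is not None]


def split(records, every=3):
    """Test design: hold out every `every`-th row for evaluation."""
    train = [r for i, r in enumerate(records) if i % every != 0]
    test = [r for i, r in enumerate(records) if i % every == 0]
    return train, test


def _mean(values):
    return sum(values) / len(values)


def build_model(train):
    """Modeling: a single-threshold classifier whose threshold is the
    midpoint between the mean spend of the two classes."""
    churn_mean = _mean([x for x, y in train if y == 1])
    stay_mean = _mean([x for x, y in train if y == 0])
    threshold = (churn_mean + stay_mean) / 2
    low_class = 1 if churn_mean < stay_mean else 0

    def predict(x):
        return low_class if x < threshold else 1 - low_class

    return predict


def evaluate(model, test):
    """Evaluation: fraction of held-out rows classified correctly."""
    return sum(model(x) == y for x, y in test) / len(test)


clean = prepare(RAW)
train, test = split(clean)
model = build_model(train)
accuracy = evaluate(model, test)
# "Determine next steps": the iterative nature of CRISP-DM shows up here.
next_step = "deploy" if accuracy >= 0.9 else "revisit earlier phases"
print(accuracy, next_step)
```

If the accuracy falls below the chosen target, the sketch points back to earlier phases, which mirrors the iterative character of the process.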
Components of Data Mining
– Data Source: This is the origin of the data, which can be databases, data warehouses, or other repositories.
– Data Warehouse Server: This component retrieves relevant data from the data source based on user requests.
– Data Mining Engine: The heart of the data mining process, it applies various algorithms and techniques to extract patterns from the data.
– Pattern Evaluation Module: Assesses the discovered patterns based on predefined criteria to determine their significance and usefulness.
– Graphical User Interface (GUI): This provides a user-friendly interface for interaction with the data mining system.
Data Mining Architecture
Predictive Analytics
– It is the use of data to predict future trends and events.
– Attempts to answer the question, “What might happen next?”
– It leverages historical data, statistical modeling, and machine learning algorithms to identify patterns and make forecasts.
– It works by identifying correlations between different elements in selected datasets.
– There are broadly two types of predictive analytics models: classification models and regression models.
Predictive Analytics Challenges
– Data Quality: Inaccurate, incomplete, or biased data can lead to unreliable models.
– Data Availability: Insufficient or limited data can hinder model development.
– Model Complexity: Complex models can be difficult to interpret and explain.
– Overfitting: Models that are too closely fitted to the training data may not perform well on new data.
– Ethical Considerations: Concerns about privacy, bias, and fairness in model development and deployment.
– Computational Resources: Handling large datasets and complex models requires significant computational power.
Predictive Analytics Applications
– Finance: Fraud detection, credit risk assessment, investment portfolio optimization, market trend prediction.
– Healthcare: Disease outbreak prediction, patient risk assessment, drug discovery, personalized medicine.
– Retail: Customer segmentation, demand forecasting, inventory management, recommendation systems.
– Marketing: Customer churn prediction, campaign optimization, targeted advertising.
– Manufacturing: Predictive maintenance, supply chain optimization, quality control.
– Insurance: Risk assessment, fraud detection, customer churn prediction.
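A regression-type predictive model can be sketched with ordinary least squares on historical data, here in the spirit of the retail demand-forecasting application. The monthly sales figures and the function name `fit_line` are hypothetical, and a real project would also hold out data to check for the overfitting problem listed under the challenges.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: units sold in months 1 to 5.
months = [1, 2, 3, 4, 5]
units_sold = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, units_sold)

# "What might happen next?": extrapolate the fitted trend to month 6.
forecast = slope * 6 + intercept
print(forecast)  # 22.0
```

Because the toy history is perfectly linear, the fit is exact. With real data the forecast carries error, and the fitted line describes a correlation in the history, not a guarantee about the future.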