2 - Unit 1 - Lecture 3

Uploaded by

sihagmukesh05
Data Mining

Content

 Data mining Introduction


 KDD
What is (not) Data Mining?

What is not Data Mining?
– Look up a phone number in a phone directory
– Query a Web search engine for information about “Amazon”
– Querying or searching

What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
– Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)
– Finding trends and patterns
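
The difference between looking something up and mining can be sketched in a few lines of Python. This is only an illustration: the directory records and both helper functions are invented for this example and do not come from the lecture.

```python
from collections import Counter, defaultdict

# Hypothetical toy directory: (surname, city, phone). All values are invented.
DIRECTORY = [
    ("O'Brien", "Boston", "555-0101"),
    ("O'Reilly", "Boston", "555-0102"),
    ("O'Brien", "Boston", "555-0103"),
    ("Smith", "Boston", "555-0104"),
    ("Smith", "Denver", "555-0105"),
    ("Garcia", "Denver", "555-0106"),
    ("O'Brien", "Denver", "555-0107"),
    ("Garcia", "Denver", "555-0108"),
]


def lookup_phone(surname, city):
    """Not data mining: return stored facts that match a query."""
    return [phone for s, c, phone in DIRECTORY if s == surname and c == city]


def most_common_surname_by_city(records):
    """Closer to data mining: summarise all records to expose a pattern."""
    counts = defaultdict(Counter)
    for surname, city, _phone in records:
        counts[city][surname] += 1
    # Keep only the most frequent surname for each city.
    return {city: c.most_common(1)[0][0] for city, c in counts.items()}


phones = lookup_phone("Smith", "Denver")
pattern = most_common_surname_by_city(DIRECTORY)
print(phones)
print(pattern)
```

The lookup returns something that was already stored, while the second function produces a statement about the data as a whole that no single record contains.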


Data Mining: Classification Schemes

 Decisions in data mining
– Kinds of databases to be mined
– Kinds of knowledge to be discovered
– Kinds of techniques utilized
– Kinds of applications adapted

 Data mining tasks
– Descriptive data mining
– Predictive data mining
Decisions in data mining
 Databases to be mined
 Relational, transactional, object-oriented, spatial, time-
series, text, multi-media, heterogeneous, WWW, etc.
 Knowledge to be mined
 Characterization, discrimination, association,
classification, clustering, trend, deviation and outlier
analysis, etc.
 Multiple/integrated functions and mining at multiple
levels
 Techniques utilized
 Database-oriented, data warehouse (OLAP), machine
learning, statistics, visualization, neural network, etc.
 Applications adapted
 Retail, telecommunication, banking, fraud analysis,
Data mining tasks/techniques
 Predictive modeling
 Use some variables to predict unknown or future values
of other variables
 Descriptive modeling
 Find human-interpretable patterns that describe the
data.
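
A minimal sketch of the two kinds of task, assuming a tiny invented dataset of advertising spend and sales. The predictive part fits a straight line by ordinary least squares and uses it on an unseen value; the descriptive part only summarises the data that is already there. The variable names and numbers are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, constructed to lie exactly on y = 2x + 1.
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

# Predictive: use known variables to estimate an unknown value.
slope, intercept = fit_line(spend, sales)
predicted = slope * 5.0 + intercept

# Descriptive: a human-readable summary of the existing data.
summary = {"min": min(sales), "max": max(sales), "mean": sum(sales) / len(sales)}

print(slope, intercept, predicted)
print(summary)
```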
Data mining tasks/techniques
 Predictive Modeling:
 Classification: Assigning data instances to predefined
classes (e.g., decision trees, neural networks, support
vector machines).
 Regression: Predicting continuous numerical values
(e.g., linear regression). Logistic regression, despite its name, is typically used for classification.
 Time Series Analysis: Analyzing data points collected at
specific time intervals (e.g., ARIMA, exponential
smoothing).
 Descriptive Modeling:
 Clustering: Grouping similar data points together (e.g.,
k-means, hierarchical clustering).
 Association Rule Mining: Discovering relationships
between items (e.g., market basket analysis).
 Outlier Detection: Identifying abnormal data points
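
As one concrete instance of the descriptive tasks listed above, association rule mining is usually described in terms of support and confidence. The sketch below computes both for a rule "bread -> milk" over a handful of invented baskets; the transactions and function names are assumptions for illustration, not part of the lecture.

```python
# Hypothetical market baskets; each set is one transaction.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the fraction that also
    contain the consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


# 3 of the 5 baskets contain both bread and milk.
sup = support({"bread", "milk"}, TRANSACTIONS)
# 4 baskets contain bread, and 3 of those also contain milk.
conf = confidence({"bread"}, {"milk"}, TRANSACTIONS)
print(sup, conf)
```

Real market basket analysis would also search for candidate itemsets (for example with the Apriori algorithm); this sketch only scores one rule that is given in advance.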
CRISP-DM: Framework for Data Mining
CRISP-DM stands for Cross-Industry Standard Process for Data
Mining.
 Widely adopted methodology
 Provides a structured approach for planning & executing DM
projects.
 Designed to be adaptable across various industries and
applications.
 Key Characteristics of CRISP-DM
 Iterative: The process is not strictly linear. You may need to
revisit previous phases as you progress.
 Flexible: It can be adapted to various project sizes and
SELF->Key Characteristics
Here’s a simplified explanation of the key characteristics of CRISP-DM:
1. Iterative: The CRISP-DM process isn’t a straight line; it’s more like a circle. As you
work on a data project, you might find that you need to go back and revisit earlier
steps. For example, after analyzing your data, you might realize you need to refine
your questions or gather more data.
2. Flexible: CRISP-DM can be used for different types of projects, whether they are big
or small. You can adjust the process to fit the specific needs of your project, making it
versatile for various situations.
3. Industry-Neutral: This approach can be used in any industry, whether it’s healthcare,
finance, marketing, or any other field. It’s designed to be useful no matter what kind of
data you’re working with.
4. Focus on Business Value: At the heart of CRISP-DM is the idea of understanding
what the business needs. It’s important to make sure that your data analysis is
aligned with the goals of the organization. This way, your work provides real value and
helps the business succeed.
5. Structured Framework: CRISP-DM provides a clear framework for managing data
mining projects. It outlines specific steps to follow, making it easier for teams to
collaborate and stay organized. This structure helps ensure that all important aspects
of the project are covered, from understanding the problem to evaluating the results.
CRISP-DM: Data Mining Operations
1. Business Understanding:
   1. Determine business objectives and requirements.
   2. Assess situation and resources.
   3. Determine data mining goals.
2. Data Understanding:
   1. Collect initial data.
   2. Describe data.
   3. Explore data.
   4. Verify data quality.
3. Data Preparation:
   1. Select and Clean data.
   2. Construct data.
4. Data Modeling:
   1. Select modeling techniques.
   2. Generate test design.
   3. Build and Assess models.
5. Evaluation:
   1. Evaluate results.
   2. Review process.
   3. Determine next steps.
6. Deployment:
   1. Plan deployment.
   2. Plan monitoring and maintenance.
   3. Produce final report.
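
The iterative nature of the six phases can be sketched as a loop that may jump back to an earlier phase. This is a simplified illustration under stated assumptions: the `run_project` function and its `evaluation_passes_on_attempt` parameter are hypothetical, and a failed evaluation is assumed to send the project back to Business Understanding, which is only one of the possible loops in a real project.

```python
PHASES = [
    "Business Understanding",
    "Data Understanding",
    "Data Preparation",
    "Data Modeling",
    "Evaluation",
    "Deployment",
]


def run_project(evaluation_passes_on_attempt, max_steps=50):
    """Walk the phases in order and return the list of phases visited.

    evaluation_passes_on_attempt is a hypothetical stand-in for a real
    review: the evaluation fails until that attempt number is reached.
    """
    history = []
    attempt = 1
    i = 0
    while i < len(PHASES) and len(history) < max_steps:
        phase = PHASES[i]
        history.append(phase)
        if phase == "Evaluation" and attempt < evaluation_passes_on_attempt:
            attempt += 1
            i = 0  # results not good enough: revisit the earlier phases
        else:
            i += 1
    return history


history = run_project(evaluation_passes_on_attempt=2)
for step, phase in enumerate(history, start=1):
    print(step, phase)
```

With a passing evaluation on the second attempt, the first five phases are visited twice before Deployment is reached.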


Components of Data Mining
 Data Source: This is the origin of the data, which can be databases,
data warehouses, or other repositories.
 Data Warehouse Server: This component retrieves relevant data
from the data source based on user requests.
 Data Mining Engine: The heart of the data mining process, it
applies various algorithms and techniques to extract patterns from
the data.
 Pattern Evaluation Module: Assesses the discovered patterns
based on predefined criteria to determine their significance and
usefulness.
 Graphical User Interface (GUI): This provides a user-friendly
interface for interaction with the data mining system.
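
How these components hand data to one another can be sketched with a few small classes. Everything here is an assumption made for illustration: the class names mirror the component names above, the sales rows are invented, the "pattern" is just a frequency count, and the GUI is left out.

```python
from collections import Counter


class DataSource:
    """Origin of the data (here simply an in-memory list of rows)."""
    def __init__(self, rows):
        self.rows = rows


class DataWarehouseServer:
    """Retrieves the rows relevant to a user request."""
    def __init__(self, source):
        self.source = source

    def fetch(self, predicate):
        return [r for r in self.source.rows if predicate(r)]


class DataMiningEngine:
    """Applies an algorithm to the fetched rows (here: counting pairs)."""
    def mine(self, rows):
        return Counter((r["region"], r["product"]) for r in rows)


class PatternEvaluator:
    """Keeps only patterns that meet a predefined criterion."""
    def __init__(self, min_count):
        self.min_count = min_count

    def evaluate(self, patterns):
        return {p: n for p, n in patterns.items() if n >= self.min_count}


# Hypothetical sales records.
rows = [
    {"region": "north", "product": "tea", "year": 2024},
    {"region": "north", "product": "tea", "year": 2024},
    {"region": "north", "product": "coffee", "year": 2024},
    {"region": "south", "product": "coffee", "year": 2024},
    {"region": "south", "product": "coffee", "year": 2023},
]

server = DataWarehouseServer(DataSource(rows))
relevant = server.fetch(lambda r: r["year"] == 2024)
patterns = DataMiningEngine().mine(relevant)
interesting = PatternEvaluator(min_count=2).evaluate(patterns)
print(interesting)
```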
Data Mining Architecture / Components of Data Mining
Predictive Analytics

 It is the use of data to predict future trends and events.


 Attempts to answer the question, “What might happen next?”
 It leverages historical data, statistical modeling, and machine
learning algorithms to identify patterns and make forecasts.
 It works by identifying correlations between different
elements in selected datasets.
 There are broadly two types of predictive analytics models:
 classification models
 regression models.
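
One simple way to quantify a correlation between two elements of a dataset is the Pearson correlation coefficient. The sketch below implements it directly; the temperature and sales figures are invented and deliberately perfectly linear so the result is easy to check by hand.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily observations.
temperature = [20, 22, 24, 26, 28]
ice_cream_sales = [200, 220, 240, 260, 280]  # rises with temperature
umbrella_sales = [50, 45, 40, 35, 30]        # falls with temperature

r_pos = pearson(temperature, ice_cream_sales)
r_neg = pearson(temperature, umbrella_sales)
print(r_pos, r_neg)
```

A coefficient near +1 or -1 suggests a variable may be useful as a predictor, but it does not by itself show that one variable causes the other.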
Predictive Analytics Challenges
 Data Quality: Inaccurate, incomplete, or biased data can lead to
unreliable models.
 Data Availability: Insufficient or limited data can hinder model
development.
 Model Complexity: Complex models can be difficult to interpret and
explain.
 Overfitting: Models that are too closely fitted to the training data
may not perform well on new data.
 Ethical Considerations: Concerns about privacy, bias, and fairness
in model development and deployment.
 Computational Resources: Handling large datasets and complex
models requires significant computational power.
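
The overfitting challenge can be made concrete with a small experiment. The sketch below assumes an invented one-dimensional dataset whose true rule is "label 1 when x is at least 5", with two deliberately mislabelled training points. A 1-nearest-neighbour model memorises the training set, noise included, while a single-threshold model cannot. Both models and all data are illustrative assumptions.

```python
def accuracy(predict, data):
    """Fraction of (x, y) pairs for which predict(x) equals y."""
    return sum(1 for x, y in data if predict(x) == y) / len(data)


def fit_one_nn(train):
    """Very flexible model: return the label of the closest training point."""
    def predict(x):
        _nearest_x, nearest_y = min(train, key=lambda p: abs(p[0] - x))
        return nearest_y
    return predict


def fit_threshold(train):
    """Very simple model: pick the cut point with the best training accuracy."""
    xs = sorted(x for x, _ in train)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), train))
    return lambda x: int(x >= best)


# Hypothetical data; the points at x=2.0 and x=8.0 carry noisy labels.
train = [(1.0, 0), (2.0, 1), (3.0, 0), (4.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 0), (9.0, 1)]
test = [(1.9, 0), (2.2, 0), (3.4, 0), (4.4, 0),
        (6.4, 1), (7.8, 1), (8.1, 1), (9.5, 1)]

one_nn = fit_one_nn(train)
threshold = fit_threshold(train)

results = {
    "1-NN": (accuracy(one_nn, train), accuracy(one_nn, test)),
    "threshold": (accuracy(threshold, train), accuracy(threshold, test)),
}
for name, (train_acc, test_acc) in results.items():
    print(name, "train:", train_acc, "test:", test_acc)
```

On this toy data the memorising model is perfect on the training set but does worse on the new points, while the simpler model shows the opposite pattern. Real datasets are rarely this clear-cut.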
Predictive Analytics Applications
 Finance: Fraud detection, credit risk assessment, investment
portfolio optimization, market trend prediction.
 Healthcare: Disease outbreak prediction, patient risk assessment,
drug discovery, personalized medicine.
 Retail: Customer segmentation, demand forecasting, inventory
management, recommendation systems.
 Marketing: Customer churn prediction, campaign optimization,
targeted advertising.
 Manufacturing: Predictive maintenance, supply chain optimization,
quality control.
 Insurance: Risk assessment, fraud detection, customer churn
prediction.
