Introduction To Data Mining
[Figure: a model relating X and Y]
Problem Definition:
• Focuses on understanding the project objectives and requirements from a
business perspective.
• E.g.: How can I sell more of my product to customers? Which customers are most
likely to purchase the product?
Data Gathering and Preparation:
• Data collection and exploration.
• Identify data quality problems and patterns in the data.
• The data preparation phase covers all the tasks involved in getting the data ready to build the model.
• Data preparation tasks are likely to be performed multiple times and not in any prescribed
order.
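As an illustration of the exploration and preparation tasks above, here is a minimal sketch that counts missing values per field and drops exact duplicate rows. The records, field names, and helper functions are hypothetical and are not taken from the slides; real preparation work usually involves many more steps.

```python
from collections import Counter

# Hypothetical customer records; None marks a missing value.
records = [
    {"customer_id": 1, "age": 34, "city": "Pune"},
    {"customer_id": 2, "age": None, "city": "Delhi"},
    {"customer_id": 3, "age": 29, "city": None},
    {"customer_id": 3, "age": 29, "city": None},  # exact duplicate of the previous row
]


def missing_counts(rows):
    """Count missing (None) values per field: a basic data quality check."""
    counts = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None:
                counts[field] += 1
    return dict(counts)


def drop_duplicates(rows):
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        # Sort by field name only, so None values are never compared.
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


quality_report = missing_counts(records)
clean_records = drop_duplicates(records)
print(quality_report)
print(len(clean_records))
```

Because preparation tasks tend to be repeated in no fixed order, small independent helpers like these can be rerun whenever new data arrives.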
Data Mining Process (cont.)
Model Building and Evaluation:
Knowledge Deployment:
• Use data mining within a target environment.
• Insight and actionable information can be derived from data.
Why Data Mining?
[Figure: data mining at the center, surrounded by machine learning, visualization, pattern recognition, algorithms, and other disciplines]
Data Mining Vs. Query Tools
• SQL answers well-defined queries against the database, such as
"What is the average turnover?", whereas data mining tools find
interesting patterns and facts, such as "What are the important
trends in sales?"
• Data mining is much faster than SQL for trend and pattern
analysis because it uses algorithms such as machine learning and
genetic algorithms.
• If we know exactly what we are looking for, we use SQL, but if
we know only vaguely what we are looking for, we use data
mining.
• Hybrid information cannot easily be traced using SQL.
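The contrast above can be sketched in a few lines. The first part asks a precise question with SQL (average turnover); the second part looks for item pairs that are frequently bought together without knowing in advance which pairs matter. The table, the baskets, and the `frequent_pairs` helper are hypothetical, and simple pair counting is used here only as a stand-in for a real pattern-mining algorithm.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Query tool: we know exactly what we want (the average turnover).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (branch TEXT, turnover REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("south", 300.0), ("east", 200.0)],
)
avg_turnover = conn.execute("SELECT AVG(turnover) FROM sales").fetchone()[0]
conn.close()

# Data mining: we only vaguely know what we want ("what sells together?").
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for items in transactions:
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


pairs = frequent_pairs(baskets, min_support=3)
print(avg_turnover)
print(pairs)
```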
Data Warehouse
• Most organizations run large databases for normal daily
transactions; these are called operational databases.
• A data warehouse is a large database built from the operational
databases.
• In computing, a data warehouse (DW or DWH), also known as
an enterprise data warehouse (EDW), is a system used for
reporting and data analysis, and is considered a core
component of business intelligence. DWs are central
repositories of integrated data from one or more disparate
sources.
Data Warehouse
• A data warehouse is a database, which is kept separate from the
organization's operational database.
• There is no frequent updating done in a data warehouse.
• It possesses consolidated historical data, which helps the
organization to analyze its business.
• A data warehouse helps executives to organize, understand, and
use their data to make strategic decisions.
• Data warehouse systems help integrate diverse application
systems.
• A data warehouse system supports analysis of consolidated
historical data.
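One way to picture a warehouse that is kept separate from the operational database and holds consolidated history is a periodic load step. The sketch below is an assumption-laden illustration: the `orders` and `monthly_sales` tables, the sample rows, and the `load_monthly_totals` function are all hypothetical, and two in-memory SQLite databases stand in for the real systems.

```python
import sqlite3
from collections import defaultdict

# Operational database: one row per daily transaction (hypothetical data).
operational = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
operational.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2023-01-05", 120.0), ("2023-01-20", 80.0), ("2023-02-03", 50.0)],
)

# Data warehouse: a separate database holding consolidated history.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE monthly_sales (month TEXT PRIMARY KEY, total REAL)")


def load_monthly_totals(source, target):
    """Consolidate daily orders into per-month totals and load them."""
    totals = defaultdict(float)
    for order_date, amount in source.execute("SELECT order_date, amount FROM orders"):
        totals[order_date[:7]] += amount  # "YYYY-MM" prefix identifies the month
    target.executemany("INSERT INTO monthly_sales VALUES (?, ?)", sorted(totals.items()))
    target.commit()


load_monthly_totals(operational, warehouse)
history = warehouse.execute(
    "SELECT month, total FROM monthly_sales ORDER BY month"
).fetchall()
print(history)
```

Analysts then query `monthly_sales` without touching the operational tables, which is why the warehouse rarely needs frequent updates.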
Data Warehouse
[Figure: data warehouse architecture. Components shown: operational data, external data, load manager, warehouse manager, detailed information, summary information, meta data, query manager]
Example of Star Schema
• Sales Fact Table: time_key, item_key, branch_key, location_key
• Measures: units_sold, dollars_sold, avg_sales
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city, state_or_province, country
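A star schema like the sales example above can be sketched in SQLite: the fact table holds keys and measures, and each dimension is reached with a single join. This is only an illustrative sketch. The column types and sample rows are assumptions, and the branch and location dimensions are omitted to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                   month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                   type TEXT, supplier_type TEXT);
-- Fact table: foreign keys to the dimensions plus the measures.
CREATE TABLE sales (
    time_key INTEGER REFERENCES time(time_key),
    item_key INTEGER REFERENCES item(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO time VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 15, "Sunday", 1, "Q1", 2023),
    (2, 10, "Monday", 4, "Q2", 2023),
])
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)", [
    (1, "Widget", "Acme", "tool", "wholesale"),
    (2, "Gadget", "Bolt", "tool", "retail"),
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, 1, 10, 100.0),
    (1, 2, 5, 50.0),
    (2, 1, 2, 20.0),
    (1, 1, 3, 30.0),
])

# Each dimension is one join away from the fact table.
sales_by_quarter_brand = conn.execute("""
    SELECT t.quarter, i.brand, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN time AS t ON t.time_key = s.time_key
    JOIN item AS i ON i.item_key = s.item_key
    GROUP BY t.quarter, i.brand
    ORDER BY t.quarter, i.brand
""").fetchall()
conn.close()
print(sales_by_quarter_brand)
```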
Example of Snowflake Schema
• Sales Fact Table: time_key, item_key, branch_key, location_key
• Measures: units_sold, dollars_sold, avg_sales
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_key
• supplier: supplier_key, supplier_type
• branch: branch_key, branch_name, branch_type
• location: location_key, street, city_key
• city: city_key, city, state_or_province, country
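In a snowflake schema, dimension tables are normalized, so reaching an attribute such as country takes an extra join (sales to location to city). The sketch below illustrates that with SQLite. The column types and sample rows are assumptions, and only the location branch of the schema is modeled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized location dimension: city attributes live in their own table.
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT,
                   state_or_province TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
CREATE TABLE sales (location_key INTEGER REFERENCES location(location_key),
                    dollars_sold REAL);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO city VALUES (?, ?, ?, ?)", [
    (1, "Vancouver", "British Columbia", "Canada"),
    (2, "Chicago", "Illinois", "USA"),
])
conn.executemany("INSERT INTO location VALUES (?, ?, ?)", [
    (1, "1 Main St", 1),
    (2, "2 Oak Ave", 1),
    (3, "3 Lake Rd", 2),
])
conn.executemany("INSERT INTO sales VALUES (?, ?)", [
    (1, 100.0),
    (2, 50.0),
    (3, 25.0),
])

# Two joins are needed to get from the fact table to the country attribute.
sales_by_country = conn.execute("""
    SELECT c.country, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN location AS l ON l.location_key = s.location_key
    JOIN city AS c ON c.city_key = l.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
conn.close()
print(sales_by_country)
```

The trade-off is less redundancy in the dimension tables at the cost of more joins per query.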
Example of Fact Constellation
• time: time_key, day, day_of_the_week, month, quarter, year
• item: item_key, item_name, brand, type, supplier_type
• Shipping Fact Table: time_key, item_key, shipper_key, from_location
• Sales Fact Table: time_key, item_key, branch_key