Data Mining UNIT - 1 (Important)

Uploaded by deepakjami27

💽 Data Mining

Important Questions :

1. What is Data Mining and Data Architecture?


ANS :
Data Mining :
Data mining is the process of extracting or mining knowledge from large
amounts of data. It is also known as :

Knowledge mining from data.

Knowledge extraction

Data Pattern Analysis.

Information Harvesting.

Data Archeology.

Data Dredging.

Data Architecture :
Data architecture refers to the design and structure of data systems, specifying
how data is collected, stored, integrated, and managed across an organization.

2. Explain KDD Process with a neat diagram.


ANS :
KDD ( Knowledge Discovery in Databases) Process :

KDD is an iterative process of converting raw data into useful information
and knowledge through data mining.

The mining step can be refined iteratively.

New data can be integrated and transformed in order to get different and
more appropriate results.

Preprocessing of databases consists of data cleaning and data integration.

The following steps are included in the KDD process :

1. Data Cleaning : Removal of noisy and irrelevant data from the collection.

Cleaning in case of missing values.

Cleaning noisy data, where noise is a random error or variance in a measured variable.

Cleaning with data discrepancy detection and data transformation tools.
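Two of these cleaning steps can be sketched in plain Python (the price list and the bin size are invented for illustration): filling missing values with the attribute mean, and smoothing noisy values by equal-depth bin means.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def smooth_by_bin_means(values, bin_size):
    """Smooth noisy data: sort, partition into equal-depth bins,
    and replace each value by its bin's mean."""
    ordered = sorted(values)
    smoothed = []
    for i in range(0, len(ordered), bin_size):
        bin_ = ordered[i:i + bin_size]
        smoothed.extend([sum(bin_) / len(bin_)] * len(bin_))
    return smoothed

prices = [4, 8, None, 15, 21, 21, 24, 25, None, 34]  # invented data
filled = fill_missing_with_mean(prices)
print(smooth_by_bin_means(filled, bin_size=5))
```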

2. Data Integration : It is defined as combining heterogeneous data from
multiple sources into a common source ( Data Warehouse ).

Data integration using data migration tools.

Data integration using data synchronization tools.

Data integration using the ETL (Extract-Transform-Load) process.
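A toy ETL pass can make the idea concrete. All the source names and fields below (`crm_source`, `billing_source`, `cust_id`, `amount_paise`) are invented for illustration: records are extracted from two heterogeneous sources, transformed into one schema, and loaded into a unified store.

```python
# Hypothetical source records with inconsistent schemas.
crm_source = [{"cust_id": 1, "name": "Asha"}, {"cust_id": 2, "name": "Ravi"}]
billing_source = [{"customer": 1, "amount_paise": 49900},
                  {"customer": 2, "amount_paise": 129900}]

def extract_transform_load(crm, billing):
    """Combine heterogeneous records under one schema (a tiny ETL pass)."""
    warehouse = {}
    for row in crm:                       # Extract from source 1
        warehouse[row["cust_id"]] = {"name": row["name"], "total_rupees": 0.0}
    for row in billing:                   # Extract from source 2
        cid = row["customer"]
        # Transform: unify key names and convert paise to rupees.
        warehouse[cid]["total_rupees"] += row["amount_paise"] / 100
    return warehouse                      # Load: the unified store

print(extract_transform_load(crm_source, billing_source))
```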

3. Data Selection : It is defined as the process where data relevant to the
analysis is decided and retrieved from the data collection.

Data Selection is done using :

Neural Network.

Decision Tree.

Naive Bayes.

Clustering, Regression, etc.

4. Data Transformation : It is defined as the process of transforming data
into a form appropriate for the mining procedure.

Data transformation is a two step process :

Data Mapping : Assigning elements from the source base to the
destination to capture transformations.

Code Generation : Creation of the actual transformation program.
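The two-step idea above can be sketched as follows, with a hypothetical mapping spec (`cust_name`, `amt`, and the converters are invented) standing in for data mapping, and `make_transform` standing in for code generation:

```python
# A hypothetical mapping from source fields to warehouse fields,
# with a converter per field; "code generation" turns this spec
# into an executable transformation function.
mapping = {
    "cust_name": ("name", str.title),
    "amt":       ("amount", float),
}

def make_transform(spec):
    """Generate a transformation function from a data-mapping spec."""
    def transform(record):
        return {dest: conv(record[src]) for src, (dest, conv) in spec.items()}
    return transform

transform = make_transform(mapping)
print(transform({"cust_name": "asha rao", "amt": "499.50"}))
```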

5. Data Mining : Clever techniques are applied to extract data patterns.

Transforms task-relevant data into patterns.

Decides the purpose of the model, using classification or characterization.

6. Pattern Evaluation : It is used to identify the truly interesting patterns
representing knowledge, using interestingness measures.

Finds an interestingness score for each pattern.

Uses summarization and visualization to make data understandable by
the user.

7. Knowledge Representation : It is defined as a technique which utilizes
visualization tools to represent data mining results.

Generate Reports.

Generate Tables.

Generate discriminant rules, classification rules, characterization rules,
etc.

3. Explain Data Mining Architecture with a neat
diagram.
ANS :

1. Database : It is a collection of organized data.

2. Data Warehouse : A data warehouse is a large, centralized repository that
stores and manages data from multiple sources, designed for query,
reporting, and analysis.

3. WWW (World Wide Web) : This is a system of interconnected documents
and resources that can be accessed through the internet using web
browsers.

4. Data cleaning, integration and selection :

Data cleaning and integration can be performed on the data, with the
resulting data stored in the data warehouse server.

Sometimes data transformation and consolidation are performed before
the data selection process.

5. DWS (Data Warehouse Server) :

Responsible for fetching the relevant data based on the user's mining
request.

Knowledge Base : This is the domain knowledge that is used to guide the
search and to evaluate the interestingness of resulting patterns.

Its main objective is to make results more accurate and reliable.

6. DMS (Data Mining System) : It consists of a set of functional modules
such as :

Characterization and classification.

Cluster analysis.

Prediction.

Evaluation.

Deviation analysis.

7. Pattern Evaluation :

It employs interestingness measures and interacts with the DM modules
to focus the search towards interesting patterns for efficient data mining.

It is highly recommended to push the evaluation of interestingness as
deep as possible into the mining process, to get only interesting patterns.

8. User Interface : This helps the user to interact with the data mining system
by submitting a data mining query.

4. Explain Data Mining Functionalities with examples.


ANS :

Data mining functionalities refer to the different types of tasks and techniques
used to extract useful patterns, knowledge, and insights from large datasets.

Below are the key functionalities:

Multidimensional Concept Description : It includes characterization
(summarizing data) and discrimination (comparing and contrasting data).

Example : dry vs. wet regions.

Frequent pattern, association, correlation vs causality :

1. Frequent Pattern: Identifies items, sequences, or structures that occur
frequently in a dataset.

Example : In market basket analysis, finding that "milk" and "bread" are
frequently bought together.

2. Association: Discovers rules that describe relationships between items.

Example : If a customer buys bread, they are 80% likely to buy butter.
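Where such a number comes from can be sketched with a toy basket dataset (the transactions below are invented): support is the fraction of baskets containing an itemset, and the confidence of "bread, so butter" is support(bread and butter) divided by support(bread).

```python
# Invented transactions for a market-basket illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain the whole itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "butter"}))
print(confidence({"bread"}, {"butter"}))
```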

3. Correlation: Measures how two items are related in terms of their
occurrence together.

Example : Correlating weather conditions with product sales.
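One standard way to measure such a relationship is the Pearson coefficient; a minimal from-scratch sketch (the temperature and sales figures are invented):

```python
def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Invented daily temperatures vs. cold-drink sales.
temps = [30, 32, 35, 28, 40]
sales = [200, 220, 260, 180, 320]
print(pearson(temps, sales))   # close to 1: strong positive correlation
```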

4. Causality: Explains cause-and-effect relationships, unlike correlation,
which only measures association.

Example : A rise in ice cream sales coincides with a rise in ice
cream-related injuries, but one need not cause the other (correlation ≠
causation).

Classification and Prediction :

1. Classification: Assigns items to predefined classes or groups based on
certain attributes.

Example : Classifying customers as high-risk or low-risk based on their
credit history.

2. Prediction: Predicts a future or unknown outcome using historical data,
often in numerical form (e.g., regression).

Example : Predicting a customer’s future spending behavior based on
past transactions.
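Classification can be illustrated with a 1-nearest-neighbour rule; the training examples, attributes, and labels below are invented, not a prescribed method:

```python
# Invented training data: (age, debt ratio) -> risk label.
train = [
    ((25, 0.9), "high-risk"),
    ((30, 0.8), "high-risk"),
    ((45, 0.2), "low-risk"),
    ((50, 0.1), "low-risk"),
]

def classify(point):
    """Assign the label of the closest training example (1-NN)."""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(train, key=lambda ex: dist2(ex[0], point))[1]

print(classify((28, 0.85)))   # lands near the high-risk examples
print(classify((48, 0.15)))   # lands near the low-risk examples
```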

Cluster Analysis : Groups data objects into clusters or categories based on
their similarity, without predefined labels.

Example : Clustering houses to find distribution patterns.
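One common clustering method is k-means; a tiny one-dimensional sketch (the house prices and starting centres are invented):

```python
def kmeans_1d(values, centers, rounds=10):
    """Tiny k-means on one attribute: assign each value to its nearest
    centre, then recompute each centre as the mean of its cluster."""
    for _ in range(rounds):
        clusters = {c: [] for c in centers}
        for v in values:
            nearest = min(centers, key=lambda c: abs(c - v))
            clusters[nearest].append(v)
        centers = [sum(g) / len(g) for g in clusters.values() if g]
    return sorted(centers)

# Invented house prices (in lakhs) with two visible groups.
prices = [20, 22, 25, 80, 85, 90]
print(kmeans_1d(prices, centers=[20, 90]))
```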

Outlier Analysis :

1. Detects data points that significantly differ from other data points in the
dataset.

2. Outliers often represent anomalies or rare events.

Example : Detecting fraudulent credit card transactions that don’t match
typical spending behavior.
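A simple way to flag such points is the z-score rule: values lying more than a chosen number of standard deviations from the mean are treated as outliers. A sketch with invented transaction amounts:

```python
import statistics

def zscore_outliers(values, threshold=2.0):
    """Flag values whose z-score magnitude exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / sd > threshold]

# Invented card transaction amounts; one does not match typical spending.
amounts = [40, 55, 48, 60, 52, 45, 58, 950]
print(zscore_outliers(amounts))
```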

Trend and Evolution Analysis : Studies patterns over time to understand
how data evolves and detects trends, patterns, or regularities.

Example : Analyzing stock market data to identify upward or downward
trends over time.
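A moving average is a simple way to expose such a trend under day-to-day noise; a sketch with invented closing prices:

```python
def moving_average(series, window):
    """Smooth a time series with a sliding-window mean to expose its trend."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Invented closing prices: noisy day-to-day, but trending upward.
closes = [100, 98, 103, 105, 102, 108, 110, 107, 113]
print(moving_average(closes, window=3))   # the smoothed values rise steadily
```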

Other pattern-directed or statistical analysis : This includes other
techniques used to discover patterns.

5. Difference between OLAP and OLTP in data mining and data
warehouse systems.
ANS :

| OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing) |
| --- | --- |
| Designed for data analysis and decision-making. | Primarily focused on managing and processing day-to-day transactions. |
| Utilizes a multidimensional data model (data cubes). | Utilizes a relational data model. |
| Handles large volumes of data. | Handles smaller, real-time transaction data that is current and frequently updated. |
| Supports complex queries. | Optimized for simple, quick operations. |
| Queries are typically complex and involve aggregations, slicing, dicing, and drill-down operations. | Queries are generally straightforward and involve single-row operations. |
| Response times can be longer due to the complexity of the queries. | Response times are very fast due to simple queries. |
| Uses a multidimensional model for query, reporting, and aggregation. | Uses a normalized data model for faster database operations. |
| Example : Business intelligence tools. | Example : Banking and e-commerce systems. |
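The OLAP side can be illustrated with a tiny roll-up: aggregating a fact table along one dimension of a made-up region × product cube (all the rows and dimension names are invented).

```python
from collections import defaultdict

# Invented fact rows: (region, product, sales) — a mini data cube.
facts = [
    ("North", "TV", 10), ("North", "Phone", 5),
    ("South", "TV", 7),  ("South", "Phone", 12),
]

def roll_up(rows, dim):
    """Aggregate sales along one dimension (an OLAP-style roll-up)."""
    totals = defaultdict(int)
    for region, product, sales in rows:
        key = region if dim == "region" else product
        totals[key] += sales
    return dict(totals)

print(roll_up(facts, "region"))   # totals per region
print(roll_up(facts, "product"))  # totals per product
```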

7. Preprocessing Techniques in Data Mining.


ANS :
REFER TO THE VIDEO BELOW FOR THE EXPLANATION :

https://youtu.be/us0JWCywAng
