
UNIT III

Data Mining

stages and techniques:

Data mining involves several stages and techniques to uncover useful patterns and insights
from large datasets. Here’s an overview:

Stages of Data Mining

1. Data Collection:
o Objective: Gather relevant data from various sources.
o Activities: Data sourcing, data integration, and data preparation.
2. Data Cleaning and Preparation:
o Objective: Ensure data quality and format it for analysis.
o Activities: Handling missing values, removing duplicates, and normalizing
data.
3. Exploratory Data Analysis (EDA):
o Objective: Understand the data’s characteristics and patterns.
o Activities: Statistical summaries, visualizations, and data profiling.
4. Data Transformation:
o Objective: Convert data into a suitable format for mining.
o Activities: Feature extraction, data reduction, and dimensionality reduction.
5. Data Mining:
o Objective: Apply algorithms to discover patterns and relationships.
o Activities: Choosing and applying appropriate algorithms.
6. Pattern Evaluation:
o Objective: Assess the usefulness and validity of discovered patterns.
o Activities: Validation, testing, and interpretation of results.
7. Knowledge Representation:
o Objective: Present the mined knowledge in a comprehensible manner.
o Activities: Reporting, visualization, and documentation.
8. Deployment:
o Objective: Implement the insights into practical applications.
o Activities: Integration into decision-making processes or systems.

Techniques in Data Mining

1. Classification:
o Objective: Predict categorical labels for data.
o Techniques: Decision Trees, Random Forests, Support Vector Machines
(SVM), Naive Bayes.
2. Regression:
o Objective: Predict continuous values.
o Techniques: Linear Regression, Polynomial Regression, Ridge Regression.
3. Clustering:
o Objective: Group similar data points into clusters.
o Techniques: K-Means, Hierarchical Clustering, DBSCAN.
4. Association Rule Learning:
o Objective: Discover relationships between variables.
o Techniques: Apriori Algorithm, Eclat Algorithm, FP-Growth.
5. Anomaly Detection:
o Objective: Identify unusual data points that do not conform to expected
patterns.
o Techniques: Statistical Methods, Isolation Forest, One-Class SVM.
6. Text Mining:
o Objective: Extract useful information from textual data.
o Techniques: Natural Language Processing (NLP), Sentiment Analysis, Topic
Modelling.
7. Sequential Pattern Mining:
o Objective: Identify regular sequences or patterns in data over time.
o Techniques: GSP Algorithm, PrefixSpan Algorithm.
8. Dimensionality Reduction:
o Objective: Reduce the number of features in the data.
o Techniques: Principal Component Analysis (PCA), t-Distributed Stochastic
Neighbour Embedding (t-SNE).

Each of these stages and techniques plays a crucial role in transforming raw data into
actionable insights.
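
As a brief illustration of two of the techniques above, the following minimal sketch uses scikit-learn (assumed to be installed) and its bundled Iris dataset; the parameter choices are illustrative only.

# Classification and clustering with scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Classification: predict categorical labels with a decision tree
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
print("Decision tree accuracy:", tree.score(X_test, y_test))

# Clustering: group similar points with K-Means (class labels are not used)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster sizes:", [list(kmeans.labels_).count(c) for c in range(3)])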

knowledge representation methods:

Knowledge representation in data mining involves organizing and encoding information to facilitate analysis, interpretation, and decision-making. Here are some common methods:

1. Decision Trees: These are hierarchical models used to make decisions based on
different conditions. They represent data in a tree-like structure where each node
represents a decision based on the value of a particular attribute.
2. Neural Networks: Inspired by biological neural networks, these models consist of
interconnected nodes (neurons) that process information through weighted
connections. They are used for complex pattern recognition tasks.
3. Bayesian Networks: These graphical models represent probabilistic relationships
among variables. They use Bayesian inference to make predictions or understand the
dependencies between different factors.
4. Association Rules: These are used to identify relationships or patterns between
variables in large datasets. A common example is market basket analysis, where rules
like "if a customer buys bread, they are likely to buy butter" are discovered.
5. Clustering: Techniques like K-means or hierarchical clustering group similar data
points together based on their features. This helps in identifying patterns and
structures within the data.
6. Support Vector Machines (SVM): These are used for classification tasks by finding
the hyperplane that best separates different classes of data.
7. Graph-based Models: These represent data in the form of graphs where nodes
represent entities and edges represent relationships. They are useful for analyzing
networks, social connections, and other interconnected data.
8. Rule-based Systems: These use a set of if-then rules to represent knowledge and
make decisions. They are often used in expert systems and decision support systems.

Each method has its strengths and is chosen based on the nature of the data and the specific
goals of the analysis.
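
To make the idea concrete, the minimal sketch below (scikit-learn assumed installed, Iris dataset) trains a shallow decision tree and prints it as nested if-then rules, combining the decision-tree and rule-based views described above.

# Representing learned knowledge as readable if-then rules
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

# export_text prints the tree as nested conditions on attribute values
print(export_text(tree, feature_names=list(iris.feature_names)))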

data mining approaches (OLAP, DBMS, Statistics and ML):

In data mining, various approaches and techniques are used to extract valuable insights from
large datasets. Here’s a brief overview of the main approaches:

1. OLAP (Online Analytical Processing)

 Purpose: OLAP is used for complex queries and analysis of multidimensional data. It
allows users to interactively analyze data from multiple perspectives.
 Key Features: Data is organized into multidimensional cubes, which makes it easier
to perform ad hoc queries, aggregations, and trend analyses.
 Applications: Business intelligence, reporting, and data visualization.
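
As a rough illustration of the cube idea (a minimal sketch using pandas, assumed installed, on hypothetical sales figures), a pivot table behaves like a small two-dimensional cube:

# OLAP-style aggregation with pandas
import pandas as pd

sales = pd.DataFrame({
    "year":    [2023, 2023, 2024, 2024],
    "region":  ["East", "West", "East", "West"],
    "revenue": [100, 150, 120, 180],
})

# Rows and columns act as dimensions; the aggregated revenue is the measure
cube = sales.pivot_table(values="revenue", index="year", columns="region", aggfunc="sum")
print(cube)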

2. DBMS (Database Management Systems)

 Purpose: DBMSs manage and structure data within databases, ensuring efficient data
storage, retrieval, and management.
 Key Features: Provides a way to interact with relational data using SQL (Structured
Query Language), supports data integrity, security, and concurrent access.
 Applications: General data management, transaction processing, and basic querying.
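
A minimal sketch of this kind of interaction, using Python's built-in sqlite3 module and a hypothetical orders table:

# Querying relational data with SQL through sqlite3
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (customer, amount) VALUES (?, ?)",
                 [("Alice", 25.0), ("Bob", 40.0), ("Alice", 15.0)])

# A basic query: total order amount per customer
for row in conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer"):
    print(row)
conn.close()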

3. Statistics

 Purpose: Statistical methods are used to analyze and interpret data, identify patterns,
and make inferences.
 Key Features: Includes descriptive statistics (e.g., mean, median), inferential
statistics (e.g., hypothesis testing), and probability theory.
 Applications: Data summarization, hypothesis testing, predictive modeling, and trend
analysis.
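
A minimal sketch of descriptive and inferential statistics on two hypothetical samples (NumPy and SciPy assumed installed):

# Descriptive summary and a two-sample t-test
import numpy as np
from scipy import stats

group_a = np.array([23.0, 25.5, 22.1, 24.8, 26.0])
group_b = np.array([27.2, 28.1, 26.5, 29.0, 27.8])

# Descriptive statistics: summarize each sample
print("mean A:", group_a.mean(), "median A:", np.median(group_a))
print("mean B:", group_b.mean(), "median B:", np.median(group_b))

# Inferential statistics: test for a difference in means
result = stats.ttest_ind(group_a, group_b)
print("t =", round(result.statistic, 3), "p =", round(result.pvalue, 4))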

4. Machine Learning (ML)

 Purpose: ML involves algorithms that can learn from data and make predictions or
decisions without being explicitly programmed for each task.
 Key Features: Includes supervised learning (e.g., classification, regression),
unsupervised learning (e.g., clustering, dimensionality reduction), and reinforcement
learning.
 Applications: Predictive modeling, anomaly detection, recommendation systems, and
pattern recognition.

Integration of Approaches

 OLAP and DBMS: OLAP systems often rely on DBMS for data storage and
retrieval. The data is often pre-aggregated or structured in a way that facilitates fast
querying.
 Statistics and Machine Learning: Machine learning models often use statistical
methods for evaluation and interpretation. Statistical techniques can help in
understanding the underlying data patterns before applying machine learning
algorithms.

Each approach has its own strengths and is often used in conjunction with others to achieve
comprehensive data analysis.

data warehouse and DBMS:

In data mining, understanding the difference between a data warehouse and a Database
Management System (DBMS) is crucial for effectively managing and analyzing data. Here’s
a breakdown of each:

Data Warehouse

 Purpose: A data warehouse is designed for analytical processing and reporting. It stores large amounts of historical data from multiple sources and is optimized for querying and data analysis rather than transaction processing.
 Structure: It uses a multidimensional schema (such as star or snowflake schema) that
organizes data into facts and dimensions. This structure supports complex queries and
reporting.
 Data Integration: Data warehouses integrate data from various operational systems
and sources, transforming and cleaning the data before loading it into the warehouse.
 Usage: Ideal for business intelligence, complex queries, and data mining tasks, such
as pattern recognition and trend analysis.
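
A minimal sketch of a star schema (hypothetical fact and dimension tables, queried through Python's built-in sqlite3 module):

# One fact table joined to two dimension tables
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE fact_sales  (product_id INTEGER, time_id INTEGER, revenue REAL);
    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
    INSERT INTO dim_time    VALUES (1, 2023), (2, 2024);
    INSERT INTO fact_sales  VALUES (1, 1, 100), (2, 1, 60), (1, 2, 130);
""")

# A typical analytical query: revenue by category and year
query = """
    SELECT p.category, t.year, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_time    t ON f.time_id = t.time_id
    GROUP BY p.category, t.year
"""
for row in conn.execute(query):
    print(row)
conn.close()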

Database Management System (DBMS)

 Purpose: A DBMS is designed for managing and manipulating data in real-time applications. It supports transaction processing and ensures data integrity and security.
 Structure: It typically uses a relational model with tables, rows, and columns. This
model is suitable for day-to-day operations and transactional tasks.
 Data Management: A DBMS focuses on data insertion, updating, and deletion, and it
handles concurrent user access and transaction management.
 Usage: Best for managing current operational data and supporting routine
transactions, like customer orders or inventory management.

In Data Mining

 Data Warehouse: Data warehouses are often the source for data mining because they
consolidate and store large volumes of historical data that can be analyzed to discover
patterns, trends, and insights.
 DBMS: While a DBMS can be used in data mining, it's typically more suited for
handling operational data. Data mining processes might involve exporting data from a
DBMS into a data warehouse or specialized data mining tools.

In summary, a data warehouse supports in-depth analysis and data mining by providing a
consolidated, historical view of data, while a DBMS manages real-time transactional data.

multidimensional data model:

A multidimensional data model is a framework used in data mining and data warehousing to
represent data in multiple dimensions, making it easier to analyze and visualize complex data
sets. This model is particularly useful in the context of Online Analytical Processing (OLAP),
which involves querying and analyzing large volumes of data.

Key Concepts

1. Dimensions: These are perspectives or entities by which data can be categorized and
analyzed. Common dimensions include time, geography, and product. For example,
sales data might be analyzed by time (year, quarter, month), location (city, state,
country), and product (category, brand).
2. Measures: These are the quantitative data points that are analyzed. Measures are
typically numerical values such as sales revenue, quantity sold, or profit.
3. Cubes: Data is organized into multidimensional cubes, which are structures that allow
data to be viewed from multiple perspectives. Each cell in a cube represents a
measure at a specific intersection of dimensions. For instance, a sales cube might
show the total revenue for each product category in each region and month.
4. Hierarchies: Dimensions often have hierarchies that represent different levels of
granularity. For example, a time dimension might have hierarchies like year > quarter
> month > day. Hierarchies help in drilling down into more detailed data or rolling up
to more aggregated views.
5. Slice, Dice, Roll-up, and Drill-down: These are operations used to navigate and
analyze multidimensional data:
o Slice: A subset of data obtained by selecting a single dimension value.
o Dice: A subset of data obtained by selecting specific values across multiple
dimensions.
o Roll-up: Aggregating data along a dimension to a higher level in the
hierarchy.
o Drill-down: Breaking down data to a more detailed level in the hierarchy.

Example

Imagine a retail company analyzing sales data. They might use a multidimensional data
model to examine:

 Dimensions: Time (year, quarter, month), Product (category, brand), Location (region, store).
 Measures: Sales revenue, units sold.

They can create a data cube where each cell shows sales revenue for a specific combination
of time, product, and location. By slicing the cube, they can view sales data for a particular
month. By dicing, they can look at sales data for specific products in a particular region. Roll-
up might show total sales by year, while drill-down could reveal detailed sales data by day.

This model allows for flexible and efficient querying, helping businesses to uncover trends,
make informed decisions, and generate reports.
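
A minimal sketch of this retail cube with pandas (assumed installed) on hypothetical figures; the pivot table's index and columns play the role of dimensions, and the aggregated revenue is the measure:

# Building a small data cube with pandas
import pandas as pd

sales = pd.DataFrame({
    "year":     [2023, 2023, 2023, 2024, 2024, 2024],
    "category": ["Books", "Toys", "Books", "Toys", "Books", "Toys"],
    "region":   ["East", "East", "West", "West", "East", "West"],
    "revenue":  [100, 80, 60, 90, 130, 70],
})

cube = sales.pivot_table(values="revenue",
                         index=["year", "category"],
                         columns="region",
                         aggfunc="sum")
print(cube)

# A slice: fix one dimension value (year 2023) and view the rest of the cube
print(cube.xs(2023, level="year"))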

OLAP operations:

OLAP (Online Analytical Processing) operations are crucial in data mining and business
intelligence for analyzing data from multiple perspectives. They help in summarizing and
examining large amounts of data quickly and interactively. The key OLAP operations
include:

1. Roll-Up: Aggregates data along a dimension. For example, summarizing sales data
from a daily level to a monthly level.
2. Drill-Down: Provides more detailed data. For example, drilling down from annual
sales data to monthly or daily sales data.
3. Slice: Selects a single value along one dimension, creating a sub-cube. For instance,
viewing sales data for a specific region or time period.
4. Dice: Creates a sub-cube by selecting specific values from multiple dimensions. For
example, analyzing sales data for a particular product in a specific region and time
period.
5. Pivot (or Rotate): Changes the orientation of the data cube to view it from different
perspectives. For example, switching the axes to compare different metrics.

These operations allow users to interact with and explore multidimensional data to uncover
insights and make informed decisions.
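
The minimal sketch below (pandas assumed installed, hypothetical sales data) shows rough equivalents of the five operations on a small DataFrame:

# OLAP operations expressed with pandas
import pandas as pd

sales = pd.DataFrame({
    "year":    [2023, 2023, 2024, 2024],
    "month":   ["Jan", "Feb", "Jan", "Feb"],
    "region":  ["East", "West", "East", "West"],
    "revenue": [100, 150, 120, 180],
})

rollup = sales.groupby("year")["revenue"].sum()             # roll-up: months aggregated to years
drill  = sales.groupby(["year", "month"])["revenue"].sum()  # drill-down: back to monthly detail
slice_ = sales[sales["region"] == "East"]                   # slice: fix one dimension value
dice   = sales[(sales["year"] == 2024) & (sales["region"] == "West")]  # dice: values on several dimensions
pivot  = sales.pivot_table(values="revenue", index="year",
                           columns="region", aggfunc="sum").T          # pivot/rotate: swap the axes
print(rollup, drill, slice_, dice, pivot, sep="\n\n")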

Data processing:

cleaning:

Data cleaning is a crucial step in the data mining process. It involves identifying and
correcting errors or inconsistencies in data to improve its quality. Here are some common
tasks in data cleaning:

1. Removing Duplicates: Identifying and eliminating duplicate records to ensure that each data entry is unique.
2. Handling Missing Values: Addressing gaps in data either by imputing missing
values using statistical methods or removing records with missing values, depending
on the extent and nature of the missing data.
3. Correcting Errors: Fixing inaccuracies or inconsistencies in data entries, such as
typos, incorrect formats, or outdated information.
4. Standardizing Data: Converting data into a consistent format, such as ensuring that
dates are all in the same format or that categorical variables use consistent naming
conventions.
5. Filtering Outliers: Identifying and addressing outliers or extreme values that may
skew analysis or indicate errors.
6. Normalization: Scaling numerical data to a standard range to ensure that variables
contribute equally to the analysis.

Effective data cleaning improves the reliability of the data and the accuracy of any insights
derived from it.
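
A minimal sketch of a few of these cleaning steps with pandas (assumed installed) on a hypothetical customer table:

# Common cleaning steps with pandas
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Ann", "Bob", "Cara"],
    "age":  [34, 34, None, 29],
    "city": ["NY", "NY", "la", "Boston"],
})

df = df.drop_duplicates()                       # remove duplicate records
df["age"] = df["age"].fillna(df["age"].mean())  # impute missing values with the mean
df["city"] = df["city"].str.upper()             # standardize inconsistent formats
print(df)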

Transformation:

Transformation in data mining involves converting data from its original format into a format
that is more suitable for analysis. This step is crucial for preparing data for mining tasks such
as classification, clustering, and regression. Here are some common transformation
techniques:

1. Normalization: Adjusting the scale of numerical data to a common range, often [0, 1]
or [-1, 1], to ensure that features contribute equally to the analysis. This can involve
min-max scaling or z-score normalization.
2. Aggregation: Combining multiple data records into a single summary record. For
example, aggregating sales data by month instead of analyzing daily records.
3. Discretization: Converting continuous data into discrete bins or categories. For
instance, age groups might be categorized into "18-24", "25-34", etc.
4. Feature Engineering: Creating new features from existing data to enhance the
predictive power of models. This might involve combining features, extracting
meaningful parts of data (like extracting the year from a date), or generating new
variables.
5. Encoding Categorical Variables: Converting categorical data into numerical format
for machine learning algorithms. Common methods include one-hot encoding and
label encoding.
6. Log Transformation: Applying a logarithmic function to skewed data to reduce the
impact of extreme values and make the data more normally distributed.
7. Data Smoothing: Applying techniques such as moving averages or exponential
smoothing to reduce noise and highlight trends in time series data.
8. Data Reduction: Reducing the complexity of the data, which can involve techniques
like Principal Component Analysis (PCA) to reduce the number of dimensions while
retaining important information.

Transformation helps in making the data more suitable for analysis and improving the
performance of data mining algorithms.
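
A minimal sketch of several of these transformations with pandas and NumPy (assumed installed) on hypothetical data:

# Normalization, log transform, discretization and encoding
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [30000.0, 45000.0, 250000.0],
                   "segment": ["basic", "premium", "basic"]})

# Min-max normalization to the [0, 1] range
df["income_scaled"] = (df["income"] - df["income"].min()) / (df["income"].max() - df["income"].min())

# Log transformation to dampen the extreme income value
df["income_log"] = np.log(df["income"])

# Discretization into three bins and one-hot encoding of a categorical variable
df["income_band"] = pd.cut(df["income"], bins=3, labels=["low", "mid", "high"])
df = pd.get_dummies(df, columns=["segment"])
print(df)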

Reduction:

Reduction in data mining refers to techniques used to reduce the volume of data while
retaining its essential characteristics. This process helps in improving the efficiency and
effectiveness of data mining tasks by simplifying the data and reducing computational costs.
Key techniques in data reduction include:

1. Dimensionality Reduction:
o Principal Component Analysis (PCA): Transforms data into a set of linearly
uncorrelated variables called principal components, capturing the most
variance in the data with fewer dimensions.
o Linear Discriminant Analysis (LDA): Focuses on maximizing the separation
between classes by projecting data into a lower-dimensional space.
o t-Distributed Stochastic Neighbor Embedding (t-SNE): A technique for
reducing dimensions while preserving the relative distances between data
points, often used for visualization.
2. Feature Selection:
o Filter Methods: Use statistical techniques to select features based on their
relevance to the target variable, such as correlation coefficients or chi-square
tests.
o Wrapper Methods: Evaluate subsets of features based on the performance of
a specific model, using techniques like forward selection or recursive feature
elimination.
o Embedded Methods: Perform feature selection as part of the model training
process, such as with Lasso regression which incorporates feature selection
into the training phase.
3. Data Compression:
o Lossless Compression: Reduces the size of data without losing any
information, such as using algorithms like Huffman coding or Run-Length
Encoding.
o Lossy Compression: Reduces data size by approximating the original data,
which can be acceptable in certain applications where some loss of
information is permissible.
4. Sampling:
o Random Sampling: Selecting a representative subset of the data randomly,
which can be useful for handling large datasets.
o Stratified Sampling: Ensures that each class or subgroup is appropriately
represented in the sample, improving the quality of the analysis.
5. Aggregation: Summarizes detailed data into more general forms, such as combining
daily sales data into monthly aggregates.

Data reduction techniques are crucial for managing large datasets, improving model
performance, and making data processing more feasible.
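
A minimal sketch of dimensionality reduction and random sampling (scikit-learn and NumPy assumed installed, using the bundled Iris dataset):

# PCA and simple random sampling
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Project the four original features onto two principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print("reduced shape:", X_reduced.shape)
print("variance captured:", pca.explained_variance_ratio_.sum())

# Random sampling: keep a representative 20% subset of the rows
rng = np.random.default_rng(0)
sample = X[rng.choice(len(X), size=len(X) // 5, replace=False)]
print("sample shape:", sample.shape)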

filters and discretization with weka:

In Weka, a popular data mining and machine learning tool, filters and discretization are
crucial for preprocessing data and improving model performance. Here's a brief overview of
each:

Filters in Weka:

Filters in Weka are used to preprocess data before applying machine learning algorithms.
They help in cleaning, transforming, or selecting features from the dataset. Weka provides
several types of filters, categorized into different classes:

1. Supervised Filters: Use class labels to guide the transformation. Examples include:
o AttributeSelection: Selects a subset of attributes based on their relevance.
o Discretize: Converts continuous attributes into discrete ones.
2. Unsupervised Filters: Do not use class labels and include operations such as:
o Normalize: Scales attributes to a common range (e.g., [0,1]).
o Standardize: Centers attributes around zero with unit variance.
o Remove: Deletes specified attributes or instances.

To apply a filter in Weka:

1. Open Weka and load your dataset.
2. Go to the "Preprocess" tab.
3. Choose the "Filter" option and select the desired filter from the list.
4. Configure the filter settings as needed.
5. Apply the filter to the dataset.

Discretization in Weka

Discretization is the process of converting continuous data into discrete intervals or bins. This
can be useful when dealing with algorithms that perform better with categorical data. Weka
provides a built-in discretization filter:

 Discretize Filter: Converts continuous attributes into discrete values. You can choose
different discretization methods such as equal-width or equal-frequency binning.

To use the Discretize filter:

1. In the "Preprocess" tab, select the "Filter" option.


2. Choose "Supervised" → "Discretize".
3. Configure the discretization method and parameters.
4. Apply the filter to your data.

By preprocessing your data effectively with filters and discretization, you can improve the
performance of your machine learning models and ensure better results in your data mining
tasks.
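
Weka's Discretize filter runs inside Weka itself; as a rough Python equivalent, the minimal sketch below uses scikit-learn's KBinsDiscretizer (assumed installed) to apply equal-width and equal-frequency binning to hypothetical age values:

# Equal-width and equal-frequency binning with scikit-learn
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

ages = np.array([[18], [22], [25], [31], [40], [58], [63]])

equal_width = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
equal_freq  = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")

print("equal-width bins:    ", equal_width.fit_transform(ages).ravel())
print("equal-frequency bins:", equal_freq.fit_transform(ages).ravel())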
