Data Mining

1. Describe the issues of data mining.

Ans:-
1. Data Quality:
Poor data quality, such as incomplete, noisy, or inconsistent data, can lead to
inaccurate analysis. Proper data cleaning and preprocessing are essential to
ensure reliable results.

2. Data Security and Privacy:


Mining sensitive information without consent can violate privacy laws. Ensuring
data security and compliance with regulations like GDPR is crucial to protect
personal or confidential data.

3. Data Integration from Multiple Sources:


Combining data from various sources can be challenging due to differences in
formats and structures. Addressing these discrepancies is vital to maintain data
consistency and quality.

4. Scalability and Efficiency:


As data volumes grow, traditional data mining algorithms may struggle with
performance. Optimizing techniques to handle large datasets efficiently is
essential for scalability.

5. Interpretation of Results:
Some data mining models, especially complex ones, can be hard to interpret.
Ensuring the results are understandable to non-experts is important for gaining
meaningful insights.
6. Overfitting:
Overfitting occurs when models perform well on training data but poorly on new
data. Proper validation techniques are needed to ensure models generalize
effectively.

2. Describe in detail the applications of data mining.


Ans:-
1. Market Basket Analysis:
Helps retailers identify product purchasing patterns to recommend items
frequently bought together, optimizing sales and inventory management.

2. Fraud Detection:
Detects unusual patterns and anomalies in financial transactions, enabling the
identification of fraudulent activities in banking and insurance sectors.

3. Customer Relationship Management (CRM):


Assists businesses in understanding customer preferences and behavior,
improving targeted marketing and customer retention strategies.

4. Healthcare:
Analyzes patient data to predict disease outbreaks, improve treatment plans,
and enhance healthcare decision-making for better outcomes.

5. Risk Management:
Helps financial institutions assess and predict risks in lending and investment,
improving decision-making for minimizing losses.
6. Web Mining:
Extracts useful information from web data, such as user behavior and trends, to
improve website structure, content personalization, and targeted advertising.

3. Describe the steps involved in Knowledge discovery in databases (KDD).


Ans:-
Steps Involved in Knowledge Discovery in Databases (KDD)

1. Data Selection:
The relevant data is selected from large datasets based on the goals of the
analysis, ensuring that the data is suitable for mining.

2. Data Preprocessing:
Involves cleaning the data by handling missing values, removing noise, and
resolving inconsistencies to improve data quality.

3. Data Transformation:
Data is transformed into an appropriate format or structure through
normalization, aggregation, or generalization for effective mining.

4. Data Mining:
The core step where intelligent methods and algorithms are applied to discover
patterns, trends, or knowledge from the data.

5. Pattern Evaluation:
Extracted patterns are evaluated and interpreted to identify meaningful insights
and ensure their relevance to the given problem.

6. Knowledge Representation:
The discovered knowledge is presented in a user-friendly manner, often through
visualization, reports, or other interpretation techniques for decision-making.
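A minimal end-to-end sketch of these steps, assuming Python with pandas and scikit-learn (the column names, values, and the choice of k-means as the mining algorithm are hypothetical, for illustration only):

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    # Hypothetical raw data with an irrelevant column and some missing values
    raw = pd.DataFrame({
        "age": [25, 32, None, 41, 38, 29],
        "income": [30000, 52000, 45000, None, 61000, 39000],
        "comments": ["ok", "good", "bad", "ok", "good", "ok"],
    })

    # 1. Data selection: keep only the attributes relevant to the analysis goal
    selected = raw[["age", "income"]]

    # 2. Data preprocessing: handle missing values (simple mean imputation)
    cleaned = selected.fillna(selected.mean())

    # 3. Data transformation: normalize attributes to a comparable scale
    transformed = StandardScaler().fit_transform(cleaned)

    # 4. Data mining: apply an algorithm (here k-means) to discover groupings
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(transformed)

    # 5-6. Pattern evaluation and knowledge representation: inspect the result
    print(cleaned.assign(cluster=labels))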

4. Define data mining. List out the steps in data mining.


Ans:-
Data mining is the process of extracting useful patterns, trends, and insights from
large sets of data using techniques from statistics, machine learning, and database
systems. It aims to discover hidden relationships in data that can be used for
decision-making and prediction.

Steps in Data Mining

1. Data Cleaning:
Remove noise and handle missing data to improve data quality.
2. Data Integration:
Combine data from multiple sources into a coherent dataset.
3. Data Selection:
Select the relevant data for analysis from the larger dataset.
4. Data Transformation:
Convert the data into a suitable format for mining, such as normalization or
aggregation.
5. Data Mining:
Apply algorithms and techniques to extract patterns or knowledge from the
data.
6. Pattern Evaluation:
Evaluate the mined patterns to identify meaningful and useful insights.
7. Knowledge Representation:
Present the discovered knowledge using visualization or other techniques for
easy understanding.

5. Analyse the issues in data mining techniques.


Ans:-
Scalability:
Data mining techniques may struggle to process large-scale datasets efficiently.
With increasing data volume, optimizing algorithms to handle big data is crucial
for performance.
Complexity of Data:
Handling complex and heterogeneous data types, such as multimedia, text, or
unstructured data, poses a challenge for many traditional data mining techniques.
Data Quality:
Inaccurate or inconsistent data affects the reliability of mining results. Data
cleaning and preprocessing become vital to address issues like missing, noisy, or
duplicated data.
Algorithm Selection:
Choosing the right algorithm for a given problem is challenging. Different
algorithms work better for specific types of data or patterns, and improper
selection can lead to poor outcomes.
Privacy and Security:
Mining sensitive data without proper authorization can lead to privacy violations.
Techniques must ensure data privacy and comply with regulations like GDPR.
Interpretability of Results:
Some advanced models, such as deep learning, produce results that are difficult to
interpret. This "black box" problem makes it hard to understand how decisions are
made.

6. Evaluate the major tasks in data mining.


Ans:-
Classification:
Classification assigns items in a dataset to predefined categories or classes. It's
used for tasks like spam detection or diagnosing diseases, where models predict
the category based on input features.
Clustering:
Clustering groups similar data points into clusters based on their characteristics,
without predefined labels. This is useful in customer segmentation and market
research, where hidden patterns are identified.
Regression:
Regression predicts numerical values based on input data, often used in
forecasting. Examples include predicting sales, prices, or stock market trends
based on historical data.
Association Rule Mining:
This task discovers relationships between variables in large datasets, commonly
used for market basket analysis to find products frequently bought together.
Anomaly Detection:
Detects outliers or unusual data patterns that do not conform to expected
behavior. It's widely used in fraud detection, network security, and fault detection.
Summarization:
Summarization provides a compact representation of data, often through
generating reports or visual summaries that capture key insights, making large
datasets easier to interpret.
7. State and explain the various classification techniques in data mining, with examples.
Ans:-
Types of Classification in Data Mining
Classification in data mining refers to the process of predicting the class or
category of an object based on its features. Various types of classification
techniques are used depending on the nature of the problem and the dataset.
Below are the key types:
1. Decision Tree Classification:
A decision tree uses a tree-like structure where internal nodes represent
features, branches represent decisions, and leaf nodes represent classes.
Example: Predicting whether a customer will default on a loan based on
their income, credit score, and employment history.
2. Naive Bayes Classification:
Based on Bayes' Theorem, this method assumes independence among
features. It is fast and effective for large datasets, especially for text
classification.
Example: Classifying emails as spam or not spam based on word
occurrences.
3. K-Nearest Neighbors (KNN):
KNN classifies an object by considering the majority class among its K
nearest neighbors in the feature space. It is simple and works well for
smaller datasets.
Example: Classifying the type of flower (species) based on sepal and petal
measurements using a known labeled dataset.
4. Support Vector Machine (SVM):
SVM finds a hyperplane that best separates data into different classes. It's
effective in high-dimensional spaces and works well for both linear and non-
linear classification tasks.
Example: Identifying whether a tumor is malignant or benign based on
patient data.
5. Random Forest Classification:
A random forest consists of multiple decision trees. The classification result
is determined by majority voting among the trees. It reduces overfitting and
improves accuracy.
Example: Predicting customer churn based on various attributes like usage
patterns, customer service calls, and payment history.
6. Logistic Regression:
Logistic regression estimates the probability of a categorical outcome (often
binary) based on the input features. It is commonly used for binary
classification.
Example: Predicting whether a patient has heart disease (yes/no) based on
features like age, blood pressure, and cholesterol levels.
7. Neural Networks:
Neural networks are powerful models that mimic the human brain to
perform classification by learning from data. They are highly effective for
complex and large datasets.
Example: Image recognition tasks, such as classifying handwritten digits in
the MNIST dataset.
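As a brief illustration of one of these techniques, the sketch below (assuming Python with scikit-learn) applies K-Nearest Neighbors to the classic Iris flower measurements mentioned above:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Sepal and petal measurements for three flower species
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Each test flower is assigned the majority class of its 5 nearest neighbors
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    print("Test accuracy:", knn.score(X_test, y_test))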

8. Define association and correlation.


Ans:-
Association
Association refers to discovering interesting relationships or patterns between
variables in large datasets. It is primarily used in Association Rule Mining, which
helps identify rules that explain the occurrence of one item with another in a
dataset. A common use of association is Market Basket Analysis, where retailers
identify items that are frequently bought together.
Example:
In a supermarket, the rule {Bread} => {Butter} might indicate that customers who
buy bread are likely to buy butter.
Correlation
Correlation is a statistical measure that quantifies the strength and direction of a
relationship between two continuous variables. It helps determine whether and
how strongly pairs of variables are related. The value of correlation ranges from -1
to +1, where:
• +1 indicates a perfect positive relationship,
• -1 indicates a perfect negative relationship, and
• 0 indicates no relationship.
Example:
If the height and weight of people show a correlation of +0.85, it indicates a
strong positive correlation—meaning as height increases, weight also tends to
increase.
In summary, association deals with discovering relationships in large datasets,
often categorical, while correlation measures the linear relationship between
continuous variables.
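A small sketch of the correlation side, assuming Python with NumPy (the height and weight values are hypothetical):

    import numpy as np

    # Hypothetical height (cm) and weight (kg) measurements for six people
    height = np.array([150, 160, 165, 170, 180, 190])
    weight = np.array([50, 56, 61, 66, 75, 84])

    # Pearson correlation coefficient, ranging from -1 to +1
    r = np.corrcoef(height, weight)[0, 1]
    print(f"Correlation: {r:.2f}")  # close to +1, i.e. a strong positive relationship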

9. Define decision tree induction.


Ans:-
Decision Tree Induction
Decision tree induction is a supervised learning technique used for both
classification and regression tasks. It constructs a tree-like model where:
• Internal nodes represent attributes or features.
• Branches represent decisions based on those attributes.
• Leaf nodes represent the output class or value.
The tree is built by recursively splitting the dataset using the most significant
attribute (selected through criteria like Information Gain or Gini Index) to separate
the data into distinct classes. The primary aim is to create a model that accurately
predicts the class or outcome of new instances based on their attributes.
Example:
A decision tree can classify whether a customer will purchase a product based on
factors like age, income, and purchase history.
Steps in Decision Tree Induction:
1. Select the Best Attribute:
Choose the attribute that best divides the data using a splitting criterion
(e.g., Information Gain).
2. Split the Dataset:
Partition the dataset based on the selected attribute into subsets.
3. Recursive Splitting:
Continue splitting the subsets recursively until all instances in a node belong
to the same class or meet a stopping criterion.
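A minimal sketch of these steps, assuming Python with scikit-learn (the customer attributes and labels are hypothetical); criterion="entropy" makes the splits follow information gain:

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical training data: [age, income in thousands, past purchases]
    X = [[25, 30, 0], [45, 80, 5], [35, 60, 2], [50, 90, 7], [23, 25, 0], [40, 70, 3]]
    y = [0, 1, 0, 1, 0, 1]  # 1 = purchased, 0 = did not purchase

    # Splits are chosen by information gain; depth is limited as a stopping criterion
    tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0).fit(X, y)

    # Internal nodes test attributes, branches are decisions, leaves are classes
    print(export_text(tree, feature_names=["age", "income", "past_purchases"]))
    print(tree.predict([[30, 55, 1]]))  # classify a new customer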

10. Name the features of decision tree induction.


Ans:-
Features of Decision Tree Induction
1. Simplicity and Interpretability:
Decision trees are easy to understand and visualize, allowing clear decision-
making paths that are intuitive for users.
2. Handles Both Numerical and Categorical Data:
They can work with both types of data, making them flexible for various
datasets in classification and regression tasks.
3. No Data Distribution Assumptions:
Decision trees do not require assumptions about the underlying data
distribution, making them suitable for complex or unstructured data.
4. Captures Non-Linear Relationships:
They can effectively model non-linear relationships between features,
making them powerful for real-world scenarios.
5. Automatic Feature Selection:
Decision trees select the most important features during the splitting
process, reducing the need for manual feature selection.
6. Handles Missing Data:
Many decision tree algorithms can manage missing data by making
decisions based on surrogate splits or probable values.
11. What is the difference between supervised and unsupervised classification?
Provide examples of algorithms used in each type.
Ans:-

Definition:
• Supervised Classification: involves learning from labeled data where the output classes are known.
• Unsupervised Classification: involves learning from unlabeled data without predefined output classes.

Data Requirement:
• Supervised: requires a training dataset with input-output pairs.
• Unsupervised: does not require labeled data; uses input data only.

Objective:
• Supervised: to predict the output class for new, unseen instances.
• Unsupervised: to discover patterns or groupings within the data.

Common Algorithms:
• Supervised: Decision Trees, Random Forest, Support Vector Machines (SVM), Neural Networks, Naive Bayes.
• Unsupervised: K-Means Clustering, Hierarchical Clustering, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), Principal Component Analysis (PCA), t-SNE (t-Distributed Stochastic Neighbor Embedding).

Examples:
• Supervised: classifying emails as spam or not spam based on labeled examples.
• Unsupervised: grouping customers based on purchasing behavior without predefined categories.

Performance Evaluation:
• Supervised: can be evaluated using metrics like accuracy, precision, recall, and F1-score.
• Unsupervised: evaluation is often subjective; techniques like silhouette score or cluster validation indices are used.

12. Explain the concept of overfitting in classification models. What techniques can be used to prevent overfitting?
Ans:-
Overfitting in Classification Models
Overfitting occurs when a classification model learns the training data too well,
including noise and outliers, resulting in high accuracy on training data but poor
performance on unseen data. This inability to generalize leads to low accuracy on
validation or test sets.
Techniques to Prevent Overfitting
1. Cross-Validation:
Utilize k-fold cross-validation to evaluate the model on multiple data
subsets, ensuring more reliable performance estimates and improved
generalization.
2. Simplifying the Model:
Choose simpler models with fewer parameters, such as limiting decision
tree depth, to reduce complexity and enhance generalization.
3. Regularization:
Implement L1 or L2 regularization to add penalties for large coefficients,
which discourages model complexity and prevents overfitting.
4. Pruning:
In decision trees, prune branches that do not significantly contribute to
predictions, simplifying the model and improving generalization.
5. Early Stopping:
Monitor validation performance during training and stop when
performance starts to decline, preventing the model from fitting to noise.
6. Data Augmentation:
Increase the training dataset's diversity by creating variations of existing
data (e.g., rotating images), providing more examples for the model to learn
from.
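A short sketch of two of these remedies, cross-validation and limiting model complexity, assuming Python with scikit-learn and its built-in breast-cancer dataset:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    # An unconstrained tree can memorize the training data (overfitting);
    # limiting its depth reduces complexity and usually generalizes better.
    for name, model in [("unconstrained", DecisionTreeClassifier(random_state=0)),
                        ("max_depth=3", DecisionTreeClassifier(max_depth=3, random_state=0))]:
        scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
        print(f"{name}: mean CV accuracy = {scores.mean():.3f}")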
13. What is a confusion matrix? Describe how it is used to evaluate the
performance of a classification model, and define the terms accuracy, precision,
recall, and F1-score.
Ans:-
Confusion Matrix
A confusion matrix is a table used to evaluate the performance of a classification
model by comparing the predicted classifications to the actual classifications. It
summarizes the results of a classification problem in a matrix format, showing the
counts of true positives, true negatives, false positives, and false negatives.
Components of a Confusion Matrix
• True Positives (TP): The number of instances correctly predicted as positive.
• True Negatives (TN): The number of instances correctly predicted as
negative.
• False Positives (FP): The number of instances incorrectly predicted as
positive (Type I error).
• False Negatives (FN): The number of instances incorrectly predicted as
negative (Type II error).
Evaluation Metrics
The confusion matrix is used to calculate several important metrics that evaluate
the performance of a classification model:
1. Accuracy:
Accuracy measures the proportion of correctly classified instances (both
positive and negative) out of the total instances.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
2. Precision:
Precision measures the proportion of true positive predictions out of all
positive predictions. It indicates the accuracy of the positive predictions.
Precision = TP / (TP + FP)
3. Recall (Sensitivity):
Recall measures the proportion of true positive predictions out of all actual
positive instances. It reflects the model's ability to identify all relevant
instances.
Recall = TP / (TP + FN)
4. F1-Score:
The F1-score is the harmonic mean of precision and recall, providing a
balance between the two. It is particularly useful when dealing with
imbalanced datasets.
F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
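A short sketch computing these metrics, assuming Python with scikit-learn (the labels and predictions are hypothetical):

    from sklearn.metrics import (confusion_matrix, accuracy_score,
                                 precision_score, recall_score, f1_score)

    # Hypothetical true labels and model predictions (1 = positive, 0 = negative)
    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("TP, TN, FP, FN:", tp, tn, fp, fn)

    print("Accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / total
    print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
    print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
    print("F1-score :", f1_score(y_true, y_pred))         # harmonic mean of the two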

14. Explain the concept of the silhouette score in clustering. How is it calculated, and what does it indicate about the clustering performance?
Ans:-
Silhouette Score in Clustering
The silhouette score is a metric used to evaluate the quality of clustering in
unsupervised machine learning. It measures how similar an object is to its own
cluster compared to other clusters, providing insight into the effectiveness of the
clustering.
Calculation of Silhouette Score
The silhouette score for an individual data point is calculated using the following
formula:
Silhouette Score (s) = (b − a) / max(a, b)
Where:
• a: The average distance between the data point and all other points in the
same cluster (intra-cluster distance).
• b: The average distance between the data point and all points in the nearest
cluster (inter-cluster distance).
Steps to Calculate the Silhouette Score:
1. Calculate Intra-cluster Distance (a):
For each point, compute the average distance to all other points in the
same cluster.
2. Calculate Inter-cluster Distance (b):
For each point, compute the average distance to all points in the nearest
neighboring cluster.
3. Compute the Silhouette Score:
Use the formula to calculate the silhouette score for each point. The overall
silhouette score for the clustering can be obtained by averaging the
individual scores.
Interpretation of Silhouette Score
• Score Range: The silhouette score ranges from -1 to +1.
o +1: Indicates that the data point is well clustered, as it is far from
neighboring clusters.
o 0: Suggests that the data point is on or very close to the boundary
between two neighboring clusters.
o -1: Indicates that the data point may have been assigned to the
wrong cluster, as it is closer to a neighboring cluster than its own.
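A short sketch, assuming Python with scikit-learn and synthetic data, that uses the average silhouette score to compare different numbers of k-means clusters:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    # Synthetic data with three well-separated groups
    X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

    for k in (2, 3, 4, 5):
        labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
        # The average silhouette score over all points; higher is better
        print(f"k={k}: silhouette = {silhouette_score(X, labels):.3f}")
    # The best-fitting k (here 3) typically gives the highest score.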
Outlier
An outlier is a data point that significantly differs from the other observations in a
dataset. It may represent variability in the data, measurement errors, or
experimental errors. Outliers can skew and mislead the interpretation of results,
affecting statistical analyses and models.
Detection of Outliers
Outliers can be detected using various methods, including:
1. Statistical Methods:
o Z-Score: Calculates the number of standard deviations a data point is
from the mean. A common threshold is a Z-score of ±3, indicating an
outlier.
o IQR Method: Calculates the interquartile range (IQR). Outliers are
defined as points lying below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR,
where Q1 and Q3 are the first and third quartiles, respectively.
2. Visualization Techniques:
o Box Plots: Visual representations show the distribution of data points
and highlight outliers.
o Scatter Plots: These can visually indicate points that lie far from
clusters of normal data.
3. Machine Learning Algorithms:
o Isolation Forest: An algorithm that identifies anomalies by isolating
observations in a random forest.
o DBSCAN (Density-Based Spatial Clustering of Applications with
Noise): This algorithm can identify points that do not belong to any
cluster as outliers.
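A small sketch of the two statistical detection methods above, assuming Python with NumPy (the data values are hypothetical, with 95 planted as an outlier):

    import numpy as np

    data = np.array([10, 12, 11, 13] * 5 + [95])  # 95 is the planted outlier

    # Z-score method: points more than 3 standard deviations from the mean
    z = (data - data.mean()) / data.std()
    print("Z-score outliers:", data[np.abs(z) > 3])

    # IQR method: points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    print("IQR outliers:", data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)])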
Remedies for Outliers
Outliers can be managed using various strategies, depending on their cause and
impact on the analysis:
1. Removal:
If the outlier is determined to be a data entry error or not relevant to the
analysis, it can be removed from the dataset.
2. Transformation:
Applying transformations (e.g., logarithmic, square root) can reduce the
influence of outliers by compressing the range of data.
3. Imputation:
Replace outliers with a value that makes sense based on the context, such
as the mean or median of the non-outlier data points.
4. Robust Statistical Techniques:
Use algorithms or statistical methods that are less sensitive to outliers, such
as median-based measures instead of mean-based measures.
5. Segmentation:
Analyze outliers separately to understand their significance and how they
affect the overall dataset.

15. Define skewness.
Ans:-
Skewness
Skewness measures the asymmetry of a dataset's distribution, indicating how
much and in which direction the distribution deviates from symmetry.
Types of Skewness
1. Positive Skewness (Right Skew):
The right tail of the distribution is longer, indicating that most data points
are concentrated on the left with a few higher values. An example is income
distribution, where a small number of individuals have significantly higher
incomes.
2. Negative Skewness (Left Skew):
The left tail is longer, showing that most data points are concentrated on
the right with a few lower values. An example can be exam scores where
most students score high but a few score very low.
3. Zero Skewness:
Indicates a perfectly symmetrical distribution with equal tails on both sides,
as seen in a normal distribution.
Measurement of Skewness
Skewness can be calculated using formulas such as Pearson's first and second
coefficients or through statistical software that provides skewness values for
datasets.
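A quick sketch, assuming Python with SciPy (the income-like values are hypothetical):

    import numpy as np
    from scipy.stats import skew

    # Most values are modest, a few are very large -> right (positive) skew
    incomes = np.array([25, 28, 30, 32, 35, 36, 38, 40, 42, 250])
    print("Skewness:", skew(incomes))  # clearly positive

    # A symmetric sample gives a skewness near zero
    symmetric = np.array([1, 2, 3, 4, 5, 6, 7])
    print("Skewness:", skew(symmetric))  # approximately 0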

16. Bias and Variance


Ans:-
Bias
Bias refers to the error introduced by approximating a real-world problem, which
may be complex, with a simplified model. It represents the systematic deviation of
the model's predictions from the actual values.
• High Bias: Models with high bias tend to underfit the data, making strong
assumptions about the data's structure and leading to poor performance on
both training and testing datasets. For example, using a linear model for a
nonlinear relationship results in high bias.
Variance
Variance refers to the model's sensitivity to fluctuations in the training dataset. It
measures how much the model's predictions change when trained on different
subsets of the data.
• High Variance: Models with high variance tend to overfit the training data,
capturing noise and outliers. Such models perform well on training data but
poorly on unseen data. For example, a complex model that perfectly fits the
training data but fails to generalize to new data points exhibits high
variance.
Bias-Variance Tradeoff
The bias-variance tradeoff is the balance between bias and variance that affects a
model's performance. The goal in machine learning is to minimize both bias and
variance to achieve optimal model performance. Too much bias can lead to
underfitting, while too much variance can lead to overfitting. Finding the right
balance is crucial for creating effective predictive models.
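A rough sketch of the tradeoff, assuming Python with scikit-learn: polynomial models of increasing degree are fit to noisy synthetic data, moving from underfitting (high bias) to overfitting (high variance):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.RandomState(0)
    X = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=40)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    for degree in (1, 4, 15):  # high bias, balanced, high variance
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
        print(f"degree {degree}: train MSE = {mean_squared_error(y_tr, model.predict(X_tr)):.3f}, "
              f"test MSE = {mean_squared_error(y_te, model.predict(X_te)):.3f}")
    # Degree 1 does poorly on both splits (underfit); degree 15 typically does far
    # worse on the test split than on training (overfit); the middle degree balances the two.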

17. Define Clustering and Clustering Matrices.


Ans:-
Clustering
Clustering is an unsupervised machine learning technique used to group a set of
objects or data points into clusters based on their similarities or distances from
one another. The goal is to partition the data such that objects within the same
cluster are more similar to each other than to those in different clusters.
Clustering is widely used in various applications, such as customer segmentation,
image recognition, and pattern recognition.
Types of Clustering Algorithms
1. K-Means Clustering:
Partitions data into k clusters by minimizing the distance between data
points and their respective cluster centroids.
2. Hierarchical Clustering:
Builds a tree of clusters by either merging smaller clusters into larger ones
(agglomerative) or dividing larger clusters into smaller ones (divisive).
3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
Groups together points that are closely packed while marking points that lie
alone in low-density regions as outliers.
Clustering Matrices
Clustering matrices are tools used to evaluate the quality of clustering results.
They help in comparing the actual classifications of data points to the results
produced by a clustering algorithm. Common types of clustering matrices include:
1. Confusion Matrix:
Used to compare the predicted clusters against the true labels, summarizing
the correct and incorrect classifications. It provides insights into true
positives, false positives, true negatives, and false negatives.
2. Adjusted Rand Index (ARI):
A similarity measure that quantifies the agreement between two different
clustering assignments. It considers the number of pairs of data points that
are clustered together or apart in both clusterings.
3. Silhouette Matrix:
Evaluates how well each data point is clustered by measuring the average
distance between a point and others in the same cluster compared to the
nearest neighboring cluster.
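A short sketch of two of these evaluation measures, assuming Python with scikit-learn and synthetic data whose true groupings are known:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import adjusted_rand_score, silhouette_score

    # Synthetic data with known "true" group labels
    X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

    # Adjusted Rand Index: agreement with the true labels (1.0 = perfect match)
    print("ARI:", adjusted_rand_score(y_true, labels))
    # Silhouette score: internal quality measure that needs no true labels
    print("Silhouette:", silhouette_score(X, labels))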
