Datamining notes
Uploaded by Abhishek Ghosh

DATA MINING

Q1. What are the different tasks of Data Mining? (5 marks)

Data mining involves various tasks aimed at discovering patterns,
relationships, and valuable information from large datasets. The
key tasks include:

Classification: Assigning predefined labels or categories to
instances based on their attributes.
Example: Predicting whether an email is spam or not.

Regression: Predicting a continuous numerical value based on
other attributes.
Example: Predicting the price of a house based on its features.
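As a minimal sketch, a one-feature regression can be fit with ordinary least squares; the sizes and prices below are made-up numbers, chosen to be exactly linear so the result is easy to check:

```python
# Ordinary least squares for one feature: fit price = slope * size + intercept.
sizes = [50.0, 75.0, 100.0, 125.0]    # house size (illustrative)
prices = [150.0, 200.0, 250.0, 300.0]  # house price (illustrative)

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# slope = covariance(x, y) / variance(x)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) / \
        sum((x - mean_x) ** 2 for x in sizes)
intercept = mean_y - slope * mean_x

def predict(size):
    """Predict a continuous value (price) from the fitted line."""
    return slope * size + intercept

print(predict(90.0))  # 230.0 for this exactly linear data
```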

Clustering: Grouping similar instances or data points together
based on their characteristics.
Example: Segmenting customers into groups with similar
purchasing behavior.

Association Rule Mining: Discovering relationships and
associations between different variables in a dataset.
Example: Finding associations between products frequently
bought together.
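The support and confidence measures behind association rules can be sketched in a few lines of Python; the transactions and item names below are invented purely for illustration:

```python
# Support: fraction of transactions containing an itemset.
# Confidence of A -> B: support(A and B) / support(A).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent, transactions) / \
           support(antecedent, transactions)

print(support({"bread", "butter"}, transactions))       # 0.5
print(confidence({"bread"}, {"butter"}, transactions))  # ~0.667
```

Full algorithms such as Apriori or FP-Growth enumerate frequent itemsets efficiently rather than testing rules one by one.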

Anomaly Detection: Identifying unusual patterns or outliers in the
data that may indicate errors or fraud.
Example: Detecting unusual activity in credit card transactions.
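A minimal sketch of anomaly detection using a simple z-score rule; the transaction amounts and the 2-standard-deviation threshold are illustrative, and real fraud systems use much richer models:

```python
# Flag values whose distance from the mean exceeds a z-score threshold.
from statistics import mean, stdev

amounts = [20.0, 35.0, 18.0, 25.0, 30.0, 22.0, 950.0]  # illustrative

def anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations
    from the mean of the sample."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) / s > threshold]

print(anomalies(amounts))  # [950.0]
```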

Q2. Explain the Decision Tree Classifier. (5 marks)

The Decision Tree Classifier is a supervised machine learning
algorithm used for both classification and regression tasks. It
works by recursively partitioning the dataset into subsets based
on the values of input features, ultimately assigning a class label
or predicting a target value for each instance. The tree-like
structure is composed of nodes, where each node represents a
decision or a test on a specific feature. Here's an overview of how
the Decision Tree Classifier operates:
Root Node: The topmost node in the tree, representing the entire
dataset. It is associated with the feature deemed most significant
for the first decision.

Internal Nodes: Nodes within the tree, representing tests or
decisions based on specific features. Each internal node has
branches leading to child nodes, corresponding to the possible
outcomes of the associated test.

Branches: Each branch emanating from an internal node
corresponds to a possible value or range of values for the
associated feature.

Leaf Nodes: Terminal nodes at the end of the branches,
representing the final predicted class label or regression value.
Instances reaching a leaf node are assigned the class label or
value associated with that leaf.

Decision Making: To classify a new instance, start at the root node
and navigate the tree based on the instance's feature values.
Follow the branches according to the test outcomes until a leaf
node is reached; the class label or value associated with that leaf
is then assigned.
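The root/internal-node/branch/leaf structure above can be illustrated with a small hand-built tree for the spam example; the features `contains_link` and `sender_known` and the tree itself are hypothetical, not a learned model:

```python
# A decision tree as nested dicts: internal nodes test a feature,
# branches map feature values to child nodes, leaves hold a label.
tree = {
    "feature": "contains_link",            # root node
    "branches": {
        True: {                            # internal node
            "feature": "sender_known",
            "branches": {
                True: {"label": "not spam"},   # leaf node
                False: {"label": "spam"},      # leaf node
            },
        },
        False: {"label": "not spam"},      # leaf node
    },
}

def classify(node, instance):
    """Walk from the root, following the branch matching each
    feature value, until a leaf supplies the class label."""
    while "label" not in node:
        node = node["branches"][instance[node["feature"]]]
    return node["label"]

email = {"contains_link": True, "sender_known": False}
print(classify(tree, email))  # spam
```

In practice the tree structure is learned from training data (e.g. by choosing splits that maximize information gain), rather than written by hand.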

Q3. Define Clustering. (5 marks)

Clustering is a data mining technique that involves grouping a set
of data points or objects based on their similarity. The goal is to
create clusters or groups where objects within the same cluster
are more similar to each other than to those in other clusters.
Clustering is an unsupervised learning approach, meaning it
doesn't require predefined labels for the data.

Key Characteristics:

Unsupervised Learning: No predefined categories; the algorithm
discovers patterns on its own.
Similarity Measure: Clusters are formed based on the similarity or
distance between data points.

Noisy Data Handling: Some algorithms (e.g., DBSCAN) are robust
to noise and outliers.


Applications:

Market Segmentation: Grouping customers with similar
preferences.

Image Segmentation: Grouping pixels with similar characteristics.

Anomaly Detection: Identifying unusual patterns by recognizing
deviations from normal clusters.

Algorithms:

K-Means: Divides data into k clusters based on centroids.

Hierarchical Clustering: Builds a tree of clusters by merging or
splitting them.

DBSCAN (Density-Based Spatial Clustering of Applications with
Noise): Clusters dense regions of data points.
Clustering is valuable for exploratory data analysis and pattern
recognition, helping to uncover hidden structures within datasets.
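The assignment/update loop at the heart of K-Means can be sketched in pure Python; the one-dimensional points and fixed initial centroids below are illustrative and make the run deterministic (real implementations such as scikit-learn's KMeans add random restarts and convergence tolerances):

```python
# K-Means on 1-D points: repeat (1) assign each point to its nearest
# centroid, (2) move each centroid to the mean of its cluster.
def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to its cluster's mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 11.0]
print(clusters)   # [[1.0, 1.5, 2.0], [10.0, 11.0, 12.0]]
```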
