DATA MINING
Q1. What are the different tasks of Data Mining? (5 marks)
Data mining involves various tasks aimed at discovering patterns,
relationships, and valuable information from large datasets. The key tasks include:
Classification: Assigning predefined labels or categories to
instances based on their attributes. Example: Predicting whether an email is spam or not.
Regression: Predicting a continuous numerical value based on
other attributes. Example: Predicting the price of a house based on its features.
Clustering: Grouping similar instances or data points together
based on their characteristics. Example: Segmenting customers into groups with similar purchasing behavior.
Association Rule Mining: Discovering relationships and
associations between different variables in a dataset. Example: Finding associations between products frequently bought together.
Anomaly Detection: Identifying unusual patterns or outliers in the
data that may indicate errors or fraud. Example: Detecting unusual activity in credit card transactions.
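Two of the tasks above, classification and clustering, can be sketched in a few lines. This is a minimal illustration assuming scikit-learn is available; the toy feature values and labels are invented for the example.

```python
# Illustrative sketch of two data mining tasks with scikit-learn.
# The toy data below is made up: each row is [link_count, has_greeting].
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

X = [[0, 1], [1, 1], [5, 0], [6, 0]]
y = ["ham", "ham", "spam", "spam"]

# Classification: assign a predefined label to a new instance.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = clf.predict([[5, 0]])[0]

# Clustering: group the same points without using the labels at all.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_  # cluster index assigned to each point
```

Note that the classifier needs the labels `y` (supervised), while K-Means groups the rows using only their feature values (unsupervised), which is exactly the distinction drawn in Q3 below.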
Q2. Explain the Decision Tree Classifier. (5 marks)
The Decision Tree Classifier is a supervised machine learning algorithm used for both classification and regression tasks. It works by recursively partitioning the dataset into subsets based on the values of input features, ultimately assigning a class label or predicting a target value for each instance. The tree-like structure is composed of nodes, where each node represents a decision or a test on a specific feature. Here's an overview of how the Decision Tree Classifier operates:
Root Node: The topmost node in the tree, representing the entire dataset. It is associated with the feature that, at this level, is deemed the most significant for making decisions.
Internal Nodes: Nodes within the tree, representing tests or decisions based on specific features. Each internal node has branches leading to child nodes, corresponding to the possible outcomes of the associated test.
Branches: Each branch emanating from an internal node
corresponds to a possible value or range of values for the associated feature.
Leaf Nodes: Terminal nodes at the end of the branches, representing the final predicted class label or regression value. Instances reaching a leaf node are assigned the class label or value associated with that leaf.
Decision Making: To classify a new instance, start at the root node and navigate the tree based on the instance's feature values. Follow the branches according to the test outcomes until a leaf node is reached; the class label or value associated with that leaf is then assigned.
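The decision-making walk described above can be written as a short loop. This is a hand-rolled sketch, not a real library's API: the tree, feature names, and thresholds are invented for illustration.

```python
# Hand-rolled sketch of decision-tree prediction.
# Internal nodes hold a feature test; leaf nodes hold only a "label".

def predict(node, instance):
    """Start at the root and follow branches until a leaf is reached."""
    while "label" not in node:                 # internal node: apply its test
        feature, threshold = node["feature"], node["threshold"]
        branch = "left" if instance[feature] <= threshold else "right"
        node = node[branch]                    # descend along the branch
    return node["label"]                       # leaf: final class label

# Illustrative tree: the root tests 'income', one internal node tests 'age'.
tree = {
    "feature": "income", "threshold": 50,
    "left": {"label": "deny"},
    "right": {
        "feature": "age", "threshold": 25,
        "left": {"label": "review"},
        "right": {"label": "approve"},
    },
}

print(predict(tree, {"income": 80, "age": 40}))  # prints "approve"
```

The structure mirrors the node roles above: the dict at the top is the root, nested dicts with a "feature" key are internal nodes, their "left"/"right" entries are branches, and dicts with only a "label" are leaves.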
Q3. Define Clustering. (5 marks)
Clustering is a data mining technique that involves grouping a set
of data points or objects based on their similarity. The goal is to create clusters or groups where objects within the same cluster are more similar to each other than to those in other clusters. Clustering is an unsupervised learning approach, meaning it doesn't require predefined labels for the data.
Key Characteristics:
Unsupervised Learning: No predefined categories; the algorithm discovers patterns on its own.
Similarity Measure: Clusters are formed based on the similarity or distance between data points.
Noise Handling: Some algorithms (e.g., DBSCAN) are robust to noise and outliers.
Applications:
Market Segmentation: Grouping customers with similar
preferences.
Image Segmentation: Grouping pixels with similar characteristics.
Anomaly Detection: Identifying unusual patterns by recognizing deviations from normal clusters.
Algorithms:
K-Means: Divides data into k clusters based on centroids.
Hierarchical Clustering: Builds a tree of clusters by merging or splitting them.
DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Clusters dense regions of data points.
Clustering is valuable for exploratory data analysis and pattern recognition, helping to uncover hidden structures within datasets.
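As a concrete example, the K-Means algorithm listed above can be run in a few lines. This is a minimal sketch assuming scikit-learn is available; the 2-D points are invented so the two groups are obvious.

```python
# Minimal K-Means sketch: divide points into k=2 clusters based on centroids.
import numpy as np
from sklearn.cluster import KMeans

# Two visibly separate groups of 2-D points (illustrative data).
X = np.array([[1.0, 1.0], [1.2, 0.8],
              [8.0, 8.0], [8.2, 7.9]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

labels = km.labels_            # cluster index assigned to each point
centers = km.cluster_centers_  # one centroid per cluster
```

No labels are supplied to `fit`, which is what makes this unsupervised: the algorithm places the two centroids and assigns each point to its nearest one.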