DM Vsaq
Unit-3
1. What is the Difference Between Classification and Prediction?
Classification: Assigns data to predefined categories (e.g., spam or not spam).
Prediction: Forecasts continuous values or trends (e.g., predicting house prices).
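The contrast can be sketched in a few lines of Python. This is only an illustration: the spam word list, the threshold of two hits, and the area/price figures are made-up assumptions, not data from these notes.

```python
def classify(message, spam_words=("free", "winner", "prize")):
    """Classification: return one of two predefined labels."""
    hits = sum(word in spam_words for word in message.lower().split())
    return "spam" if hits >= 2 else "not spam"


def fit_line(xs, ys):
    """Prediction: least-squares fit of y = a*x + b, returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return a, mean_y - a * mean_x


# Classification output is a category.
label = classify("free prize inside")

# Prediction output is a continuous value (hypothetical area -> price data).
a, b = fit_line([50, 100, 150], [100, 200, 300])
predicted_price = a * 120 + b
print(label, predicted_price)
```

The classifier can only ever answer with one of its fixed labels, while the fitted line can return any number, including values never seen in the training data.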
2. What is Bayes' Theorem? Explain.
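Bayes' theorem gives the probability of a hypothesis H after observing evidence E: P(H|E) = P(E|H) * P(H) / P(E). A minimal sketch for a two-outcome hypothesis follows; the function name and the spam probabilities are illustrative assumptions, not figures from these notes.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Return P(H|E) for a binary hypothesis H using Bayes' theorem."""
    # Total probability of the evidence:
    # P(E) = P(E|H) * P(H) + P(E|not H) * P(not H)
    evidence = p_e_given_h * prior_h + p_e_given_not_h * (1.0 - prior_h)
    return p_e_given_h * prior_h / evidence


# Assumed numbers: 20% of mail is spam; the word "offer" appears in
# 60% of spam messages and in 5% of non-spam messages.
p_spam_given_offer = posterior(0.2, 0.6, 0.05)
print(round(p_spam_given_offer, 3))
```

With these assumed inputs the evidence term is 0.12 + 0.04 = 0.16, so the posterior is 0.12 / 0.16 = 0.75.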
Unit-4
1. Explain About Types of Data in Cluster Analysis.
In cluster analysis, data can be:
Numerical Data: Continuous or discrete values (e.g., sales figures).
Categorical Data: Non-numeric data representing categories (e.g., gender, product types).
Mixed Data: A combination of numerical and categorical data (e.g., customer demographics).
2. Differentiate Between AGNES and DIANA Algorithms.
AGNES (Agglomerative Nesting): A bottom-up approach where each data point starts in its
own cluster and merges clusters iteratively based on similarity.
DIANA (Divisive Analysis): A top-down approach that starts with all data points in one
cluster and recursively splits the clusters.
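The bottom-up direction of AGNES can be sketched as follows. This is a simplified illustration on 1-D values that assumes single linkage (distance between the closest pair of members) and stops at a requested number of clusters; the function name and sample values are hypothetical. DIANA would proceed in the opposite direction, starting from one cluster and splitting.

```python
def agnes(points, target_clusters):
    """Agglomerative clustering of 1-D values using single linkage."""
    clusters = [[p] for p in points]  # every point starts in its own cluster
    while len(clusters) > target_clusters:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into i
        del clusters[j]
    return clusters


result = agnes([1.0, 1.5, 5.0, 5.2, 9.0], 3)
print([sorted(c) for c in result])
```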
3. Write K-means Clustering Algorithm.
1. Select k initial centroids randomly.
2. Assign each data point to the nearest centroid.
3. Calculate the new centroids by averaging the points in each cluster.
4. Repeat steps 2 and 3 until the centroids no longer change (or a maximum number of iterations is reached).
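The four steps above can be sketched in plain Python. This is a minimal illustration, not a tuned implementation: it assumes 2-D points, squared Euclidean distance, a fixed random seed, and a max_iter cap, all of which are choices made here for the example.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) for a list of point tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 1: random initial centroids
    labels = []
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
centroids, labels = kmeans(points, 2)
print(centroids, labels)
```

Because the initial centroids are random, different seeds can give different cluster numbering, and on less well-separated data they can give different clusters.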
4. Explain Grid-based Clustering Methods.
Grid-based clustering divides the data space into a grid structure and performs clustering on
these grid cells. Examples include:
STING: Uses a hierarchical grid structure to organize the data and create clusters.
CLIQUE: Combines grid-based and density-based methods to form clusters.
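The general grid idea can be sketched as follows. This is a simplified illustration and not an implementation of STING or CLIQUE: it assumes 2-D points, square cells of a chosen size, a density threshold min_points, and merging of dense cells that share an edge. All names and sample values are hypothetical.

```python
from collections import defaultdict


def grid_clusters(points, cell_size, min_points):
    """Group 2-D points by merging adjacent dense grid cells."""
    # Map each point to the grid cell that contains it.
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    # A cell is dense when it holds at least min_points points.
    dense = {cell for cell, pts in cells.items() if len(pts) >= min_points}
    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        # Flood fill across dense cells that share an edge.
        stack, group = [start], []
        seen.add(start)
        while stack:
            cx, cy = stack.pop()
            group.extend(cells[(cx, cy)])
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(group)
    return clusters


sample = [(0.1, 0.2), (0.4, 0.3), (1.2, 0.5), (1.4, 0.6),
          (5.1, 5.2), (5.3, 5.4), (9.0, 0.5)]
found = grid_clusters(sample, cell_size=1.0, min_points=2)
print([len(c) for c in found])
```

The clustering work is done on cells rather than on individual points, which is why grid-based methods tend to scale with the number of cells instead of the number of points. The lone point at (9.0, 0.5) falls in a sparse cell and is left out as noise.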
5. Write the Key Issues in Hierarchical Clustering Algorithm.
Scalability: Hierarchical clustering can be computationally expensive for large datasets.
Choice of Distance Measure: The performance is sensitive to the similarity measure used.
Irreversible Steps: Once two clusters are merged (or, in divisive methods, split), the step cannot be undone.
6. What is the Use of Clustering?
Clustering is used to:
Discover hidden patterns and groups within data.
Segment customers for targeted marketing.
Detect anomalies (e.g., fraud detection).
Organize large datasets for better analysis.
7. Explain Interclustering.
Interclustering refers to the relationship or distance between different clusters. It measures how
distinct or similar the clusters are. Effective clustering methods aim to maximize inter-cluster distance
while minimizing intra-cluster distance.
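One common way to quantify these two quantities is with centroids. The sketch below assumes Euclidean distance, measures inter-cluster distance as the distance between centroids, and measures intra-cluster distance as the mean distance of members to their own centroid; other definitions exist, and the function names are hypothetical.

```python
import math


def centroid(cluster):
    """Mean point of a cluster given as a list of coordinate tuples."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def intra_cluster_distance(cluster):
    """Mean distance from each member to the cluster centroid."""
    c = centroid(cluster)
    return sum(math.dist(p, c) for p in cluster) / len(cluster)


def inter_cluster_distance(cluster_a, cluster_b):
    """Distance between the centroids of two clusters."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))


a = [(0, 0), (0, 2)]
b = [(10, 0), (10, 2)]
print(intra_cluster_distance(a), inter_cluster_distance(a, b))
```

Here each cluster is tight (mean distance 1.0 to its centroid) and the two centroids are far apart (distance 10.0), which is the pattern a good clustering aims for.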
8. What is Clustering? How is it Useful to Business?
Clustering groups similar data points together. It helps businesses by:
Segmenting customers based on behavior for personalized marketing.
Identifying market trends and new product opportunities.
Improving customer service by categorizing customer needs.
9. Explain Unsupervised Data.
Unsupervised data refers to datasets without predefined labels or target variables. In this type
of data, algorithms like clustering or dimensionality reduction are used to identify patterns or
structures within the data.
10. What is Intracluster?
Intracluster refers to the similarity or cohesion within a single cluster. It measures how close
or similar the data points within the same cluster are to each other. The higher the intracluster
similarity, the better the clustering.
Unit-5
1. What is Web Mining?
Web mining is the process of extracting useful patterns and insights from web data. It
includes web content mining, web structure mining, and web usage mining.
Text mining extracts meaningful insights, patterns, and knowledge from unstructured textual
data using natural language processing (NLP) and statistical methods. Example applications
include sentiment analysis and document classification.
Text clustering groups similar documents or textual data based on content similarity.
Techniques include:
K-means clustering
Hierarchical clustering
Applications: Topic modeling, document summarization.
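Content similarity between documents is often measured with cosine similarity over word counts. The sketch below assumes a simple whitespace tokenizer and raw term frequencies (no stemming, stop-word removal, or TF-IDF weighting); the sample sentences are made up. A clustering algorithm such as K-means or hierarchical clustering would then group documents using scores like these.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two documents using raw word counts."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)  # shared words contribute to the dot product
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


docs = ["cheap phone deals", "phone deals today", "football match results"]
sim_related = cosine_similarity(docs[0], docs[1])
sim_unrelated = cosine_similarity(docs[0], docs[2])
print(round(sim_related, 3), sim_unrelated)
```

The first two documents share two of their three words and score well above the unrelated pair, which shares none and scores 0.0.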
Web content mining focuses on extracting and analyzing textual, image, video, and structured
data (e.g., HTML, XML) from web pages. Example: Extracting product reviews from e-commerce sites.
Root: Electronics
Sub-category: Mobiles
Sentiment analysis, user behavior analysis, SEO, classification