DM - SHORTS
1. Advantages of using decision trees:
○ Easy to understand and interpret.
○ Can handle both numerical and categorical data.
○ Non-parametric and requires little data preparation.
○ Can model nonlinear relationships.
2. Two approaches to hierarchical clustering:
○ Agglomerative (Bottom-up): Start with each point as its own cluster and merge clusters iteratively.
○ Divisive (Top-down): Start with one large cluster and split it iteratively into smaller clusters.
3. Applications of cluster analysis:
○ Market segmentation.
○ Image segmentation.
○ Social network analysis.
○ Biological data analysis (e.g., gene expression).
○ Document classification.
4. Define data stream mining:
○ The process of extracting knowledge structures from continuous, rapid data records.
5. Taxonomy of web mining:
○ Web Content Mining: Extracting useful information from web content.
○ Web Structure Mining: Analyzing the structure of hyperlinks within the web.
○ Web Usage Mining: Understanding user behavior through web logs and interactions.
6. Effectiveness of Bayesian classifiers:
○ Bayesian classifiers are effective on small and medium-sized datasets, model probabilistic relationships directly, and are fast enough for real-time prediction. The naive Bayes variant assumes the features are conditionally independent given the class, which can limit accuracy when features are correlated.
7. What is an outlier?
○ An observation or data point that deviates significantly from the other observations in the dataset.
8. Weaknesses of hierarchical clustering:
○ High computational complexity.
○ Difficulty handling large datasets.
○ Merge and split decisions cannot be undone, so a point cannot be reassigned later.
9. Define unstructured text:
○ Text data that does not have a predefined data model or is not organized in a structured format (e.g., free-form text).
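The agglomerative (bottom-up) approach from item 2 can be illustrated with a minimal sketch. It assumes one-dimensional points and single linkage (the distance between two clusters is the smallest gap between their members); the function names and the sample data are hypothetical, chosen only for illustration.

```python
def single_linkage_distance(a, b):
    """Smallest absolute gap between any point in cluster a and any point in cluster b."""
    return min(abs(x - y) for x in a for y in b)


def agglomerative(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain (num_clusters >= 1)."""
    clusters = [[p] for p in points]  # bottom-up: every point starts as its own cluster
    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # a merge is never undone
        del clusters[j]
    return [sorted(c) for c in clusters]


print(agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], num_clusters=3))
# [[1.0, 1.2], [5.0, 5.1], [9.0]]
```

The nested search over all cluster pairs on every merge is what makes the method expensive on large datasets, as noted in item 8.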
1. Characteristics of k-nearest neighbor algorithm:
● Instance-based and lazy learning algorithm.
● Simple and easy to implement.
● Sensitive to the choice of k and distance metric.
● Computationally expensive for large datasets.
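These characteristics can be seen in a minimal sketch of k-nearest neighbor classification. It assumes Euclidean distance and a plain majority vote; the function name `knn_predict` and the toy training set are hypothetical.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    train is a list of (features, label) pairs with equal-length feature tuples.
    """
    # Lazy learning: no model is built, all work happens at prediction time,
    # and every training point is compared against the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((5.0, 5.0), "B"),
         ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]
print(knn_predict(train, (1.1, 0.9), k=3))  # A
print(knn_predict(train, (1.1, 0.9), k=5))  # B
```

The same query receives a different label when k changes from 3 to 5, which shows the sensitivity to k mentioned above.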
2. Brief discussion of the clustering problem definition:
● Clustering involves grouping a set of objects such that objects in the same group (or cluster) are more similar to each other than to those in other groups. It is an unsupervised learning task used to identify natural groupings in data.
3. Need for outlier detection and two applications:
● Need: To identify and handle anomalies that could indicate errors, fraud, or significant events.
● Applications: Fraud detection in financial transactions and network security for intrusion detection.
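As one possible illustration of outlier detection, the sketch below flags values that lie far from the median, measured in units of the median absolute deviation (MAD). The MAD rule, the threshold of 3, the function name, and the sample amounts are all assumptions made for this example; the notes do not prescribe a particular detection method.

```python
import statistics


def mad_outliers(values, threshold=3.0):
    """Return the values whose distance from the median exceeds threshold * MAD."""
    med = statistics.median(values)
    deviations = [abs(v - med) for v in values]
    mad = statistics.median(deviations)  # median absolute deviation
    if mad == 0:
        # More than half the values equal the median, so there is no spread
        # to measure against; this sketch flags nothing in that case.
        return []
    return [v for v in values if abs(v - med) > threshold * mad]


# Hypothetical transaction amounts with one suspicious entry.
amounts = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0, 10.2]
print(mad_outliers(amounts))  # [25.0]
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value shifts them far less.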
4. Define web mining:
● The process of using data mining techniques to automatically discover and extract information from web documents and services.
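A small example of extracting information from a web document is collecting its hyperlinks, which is the raw input for web structure mining. The sketch below uses Python's standard `html.parser` module; the class name `LinkExtractor`, the helper `extract_links`, and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the hyperlink targets found in an HTML string, in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


page = '<p><a href="https://example.com/a">A</a> and <a href="/b">B</a></p>'
print(extract_links(page))  # ['https://example.com/a', '/b']
```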
5. Define text clustering:
● The process of grouping a set of text documents into clusters based on their content similarity, enabling efficient organization and retrieval of information.
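Content similarity can be made concrete with a minimal sketch that represents each document as a bag of words and compares documents by cosine similarity. The grouping step is a simple greedy, threshold-based pass written for illustration, not a standard algorithm such as k-means; the function names, the threshold of 0.3, and the sample documents are assumptions.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Word counts for a document, using lowercase whitespace tokenization."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(docs, threshold=0.3):
    """Greedily place each document in the first cluster whose first member is similar enough."""
    clusters = []  # each cluster is a list of document indices
    for i, doc in enumerate(docs):
        vec = bag_of_words(doc)
        for cluster in clusters:
            representative = bag_of_words(docs[cluster[0]])
            if cosine_similarity(vec, representative) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found: start a new one
    return clusters


docs = [
    "stock market prices fall",
    "market prices rise on stock news",
    "team wins football match",
    "football team loses final match",
]
print(cluster_documents(docs))  # [[0, 1], [2, 3]]
```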