DM Answers
o Fraud Detection
o Customer Segmentation
o Healthcare Diagnostics
o Recommendation Systems
8. Compute the similarity between Chicken and Bird using SMC coefficient.
SMC = (Number of matching attributes) / (Total number of attributes)
Matching attributes: 7 out of 10
Total attributes: 10
SMC = 7/10 = 0.7
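A minimal Python sketch of this computation. The Chicken/Bird attribute table itself is not reproduced above, so the two binary vectors below are hypothetical, constructed to agree in 7 of 10 positions:

```python
# Simple Matching Coefficient (SMC) for two binary attribute vectors.
# These vectors are hypothetical stand-ins for the Chicken/Bird table.
chicken = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
bird    = [1, 0, 1, 1, 1, 0, 0, 1, 0, 0]

matches = sum(1 for c, b in zip(chicken, bird) if c == b)
smc = matches / len(chicken)
print(f"SMC = {matches}/{len(chicken)} = {smc}")  # SMC = 7/10 = 0.7
```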
o Dimensionality Reduction
o Numerosity Reduction
o Data Compression
o Structured Data
o Unstructured Data
o Semi-structured Data
5-Marker Questions
19. Explain the Knowledge Discovery in Databases (KDD) process with a neat diagram.
Step-by-Step Explanation:
1. Data Cleaning:
o Remove noise, handle missing values, and correct inconsistencies in the data.
2. Data Integration:
3. Data Selection:
4. Data Transformation:
5. Data Mining:
6. Pattern Evaluation:
7. Knowledge Presentation:
Diagram:
Raw Data → Data Cleaning → Data Integration → Data Selection → Data Transformation →
Data Mining → Pattern Evaluation → Knowledge Presentation
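The early KDD stages can also be sketched in code. Below is a minimal pandas illustration of the cleaning, selection, and transformation steps on a tiny invented table (column names and values are made up for the sketch):

```python
import pandas as pd

# Hypothetical raw customer data with missing values.
raw = pd.DataFrame({
    "age": [25, None, 47, 31],
    "income": [50000, 62000, 62000, None],
    "city": ["NY", "NY", "LA", "LA"],
})

# Data Cleaning: fill missing values.
clean = raw.fillna({"age": raw["age"].mean(), "income": raw["income"].median()})

# Data Selection: keep only task-relevant attributes.
selected = clean[["age", "income"]]

# Data Transformation: min-max scale each attribute to [0, 1].
transformed = (selected - selected.min()) / (selected.max() - selected.min())
print(transformed)
```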
Step-by-Step Explanation:
1. Classification:
2. Clustering:
3. Regression:
4. Association Rule Mining:
5. Anomaly Detection:
Step-by-Step Explanation:
1. Data Collection:
2. Data Preprocessing:
3. Model Selection:
o Choose appropriate algorithms (e.g., decision trees, neural networks).
4. Pattern Discovery:
5. Evaluation:
6. Deployment:
Step-by-Step Explanation:
1. Customer Insights:
2. Market Trends:
3. Operational Efficiency:
4. Risk Management:
5. Strategic Decisions:
Step-by-Step Explanation:
1. Prediction:
2. Description:
3. Classification:
o Categorize data into predefined classes (e.g., spam detection).
4. Clustering:
5. Association:
Step-by-Step Explanation:
1. Statistics:
2. Machine Learning:
3. Database Systems:
4. Domain Knowledge:
5. Visualization:
27. Illustrate the Typical View in ML and Statistics with a Neat Diagram
Step-by-Step Explanation:
1. Machine Learning:
2. Statistics:
3. Diagram:
Step-by-Step Explanation:
1. Fraud Detection:
2. Customer Segmentation:
3. Healthcare Diagnostics:
4. Recommendation Systems:
1. Data Collection:
2. Data Preprocessing:
Step-by-Step Explanation:
1. Data Quality:
2. Scalability:
4. Interpretability:
5. Ethical Concerns:
Step-by-Step Explanation:
1. Quantitative Data:
o Numerical and measurable.
2. Qualitative Data:
3. Comparison:
Step-by-Step Explanation:
1. Filter Methods:
2. Wrapper Methods:
3. Embedded Methods:
4. Example:
o In a dataset with age, income, and education, select income and education as the most relevant features for predicting loan approval.
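A short sketch of filter-style feature selection with scikit-learn's SelectKBest. The loan dataset is not given above, so the arrays below are synthetic, built so that income and education drive the label:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # columns: age, income, education (synthetic)
y = (X[:, 1] + X[:, 2] > 0).astype(int)  # approval depends on income + education

selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print(selector.get_support())  # typically [False, True, True]: income, education kept
```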
33. How to Perform Correlation Analysis Between Categorical Variables Using the Chi-Square Test
Step-by-Step Explanation:
χ² = ∑ (O − E)² / E, where O is the observed frequency and E the expected frequency.
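In practice the test can be run with scipy. A minimal sketch of a test of independence between two categorical variables, on an invented 2×2 contingency table:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = gender, columns = preference.
table = np.array([[30, 10],
                  [20, 40]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
# A p-value below 0.05 suggests the two variables are associated.
```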
Step-by-Step Explanation:
1. Given Data:
o One car: 73
o Two cars: 38
o Three or more: 18
o Total: 129
2. Expected Frequencies:
o 77.4, 36.12, and 15.48 (60%, 28%, and 12% of the 129 households).
χ² = (73 − 77.4)²/77.4 + (38 − 36.12)²/36.12 + (18 − 15.48)²/15.48 = 0.25 + 0.10 + 0.41 ≈ 0.76
o Since 0.76 < 5.99 (the critical value at α = 0.05 with 2 degrees of freedom), we fail to reject H₀. The data supports the study.
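The worked test above can be checked with scipy's goodness-of-fit function:

```python
from scipy.stats import chisquare

observed = [73, 38, 18]
expected = [77.4, 36.12, 15.48]  # 60%, 28%, 12% of 129

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")  # chi-square ≈ 0.76
# 0.76 < 5.99 (critical value, alpha = 0.05, df = 2), so H0 is not rejected.
```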
35. Calculate Covariance for Stocks A and B
Step-by-Step Explanation:
1. Given Data:
o Stock A: 2, 3, 5, 4, 6
2. Calculate Means:
o Mean of A = (2 + 3 + 5 + 4 + 6) / 5 = 4
3. Calculate Covariance:
Covariance = ∑(Aᵢ − Ā)(Bᵢ − B̄) / (n − 1)
4. Interpretation:
o A positive covariance indicates the two stocks tend to move together; a negative one indicates they move in opposite directions.
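A NumPy sketch of the sample-covariance computation. Stock B's values did not survive in this copy of the answer, so the B series below is a placeholder used only to illustrate the formula:

```python
import numpy as np

A = np.array([2, 3, 5, 4, 6])     # Stock A (given above)
B = np.array([5, 8, 10, 11, 14])  # hypothetical Stock B values

cov = np.sum((A - A.mean()) * (B - B.mean())) / (len(A) - 1)
print(cov)                 # 5.0 for these placeholder values
print(np.cov(A, B)[0, 1])  # same result via NumPy's sample covariance
```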
36. What is Dimensionality Reduction? Explain Methods Used for Reducing Dimensionality
Step-by-Step Explanation:
1. Definition:
2. Methods:
3. Example:
o Reducing a dataset with 100 features to 10 principal components using PCA.
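A minimal scikit-learn sketch of that PCA example, run on synthetic data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 100))  # 500 samples, 100 features (synthetic)

pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                      # (500, 10)
print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```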
Step-by-Step Explanation:
1. Data Cleaning:
2. Data Integration:
3. Data Transformation:
4. Data Reduction:
5. Impact:
Step-by-Step Explanation:
1. Given Data:
o Salaries: 25, 30, 28, 55, 60, 42, 70, 75, 50, 48
2. Binning:
3. Smoothing:
4. Result:
o Smoothed data: 28, 28, 28, 51, 51, 51, 72.5, 72.5, 51, 51.
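A short Python sketch reproducing this smoothing-by-bin-means result (the bins are the ones implied by the answer above; note the answer rounds 27.67 up to 28):

```python
salaries = [25, 30, 28, 55, 60, 42, 70, 75, 50, 48]

# Bins implied by the stated result, after sorting the salaries.
bins = [[25, 28, 30], [42, 48, 50, 55, 60], [70, 75]]

# Replace every value with the mean of its bin.
bin_mean = {v: round(sum(b) / len(b), 2) for b in bins for v in b}
smoothed = [bin_mean[v] for v in salaries]
print(smoothed)  # [27.67, 27.67, 27.67, 51.0, 51.0, 51.0, 72.5, 72.5, 51.0, 51.0]
```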
Step-by-Step Explanation:
1. Similarity:
o Properties:
2. Dissimilarity:
o Properties:
3. Example: For vectors A = (1, 2, 3) and B = (4, 5, 6):
Cosine Similarity:
Similarity = (A · B) / (||A|| ||B||) = 32 / (√14 × √77) ≈ 0.974
Euclidean Distance:
Distance = √((4 − 1)² + (5 − 2)² + (6 − 3)²) = √27 ≈ 5.196
41. Define Noisy Data. Explain How Noisy Data Can Be Handled in Data Mining
Step-by-Step Explanation:
1. Noisy Data:
2. Handling Methods:
o Binning (smooth values by bin means or boundaries), regression (fit the data to a function), and outlier analysis (e.g., clustering to detect noise).
3. Example:
o For a dataset with noisy salary values, use binning to replace values with bin means.
Step-by-Step Explanation:
1. Given Vectors:
o d1 = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0)
o d2 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2)
2. Dot Product (d1 · d2):
(3×1) + (2×0) + (0×0) + (5×0) + (0×0) + (0×0) + (0×0) + (2×1) + (0×0) + (0×2) = 3 + 2 = 5
3. Magnitude of d1 (||d1||):
√(3² + 2² + 0² + 5² + 0² + 0² + 0² + 2² + 0² + 0²) = √(9 + 4 + 25 + 4) = √42 ≈ 6.48
4. Magnitude of d2 (||d2||):
√(1² + 0² + 0² + 0² + 0² + 0² + 0² + 1² + 0² + 2²) = √(1 + 1 + 4) = √6 ≈ 2.45
5. Cosine Similarity:
Similarity = (d1 · d2) / (||d1|| ||d2||) = 5 / (6.48 × 2.45) ≈ 5 / 15.88 ≈ 0.315
Step-by-Step Explanation:
1. Data Cleaning:
2. Data Integration:
3. Data Transformation:
4. Data Reduction:
5. Impact:
Step-by-Step Explanation:
1. Accuracy:
2. Completeness:
3. Consistency:
4. Timeliness:
5. Relevance:
Step-by-Step Explanation:
1. Data Cleaning:
2. Data Integration:
3. Data Transformation:
4. Data Reduction:
5. Data Discretization:
Step-by-Step Explanation:
1. Given Data:
2. Min-Max Normalization:
o Formula: v′ = (v − min) / (max − min)
o For v = 200: (200 − 200) / (1000 − 200) = 0
3. Z-Score Normalization:
o Formula: v′ = (v − mean) / σ
o For v = 200: (200 − 500) / 316.23 ≈ −0.948
4. Decimal Scaling:
o Formula: v′ = v / 10^j
o For v = 200 with j = 3: 200 / 1000 = 0.2
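All three normalizations in a short Python sketch. The original data values are not listed above; the set 200, 300, 400, 600, 1000 is assumed here because it matches the stated min (200), max (1000), mean (500), and sample standard deviation (316.23):

```python
import statistics

data = [200, 300, 400, 600, 1000]  # assumed, consistent with the stated statistics
v = 200

min_max = (v - min(data)) / (max(data) - min(data))             # 0.0
z_score = (v - statistics.mean(data)) / statistics.stdev(data)  # ≈ -0.949
decimal = v / 10 ** 3  # 0.2, dividing by 10^3 as in the answer above

print(min_max, round(z_score, 3), decimal)
```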
Step-by-Step Explanation:
1. Definition:
o Summarizes data across multiple dimensions.
2. Operations:
3. Example:
o A data cube for sales data might include dimensions like time, location, and product, with measures like total sales and profit.
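A data-cube-style aggregation can be sketched with a pandas pivot table; the sales records below are invented, and the pivot plays the role of one cuboid (time × location, with product rolled up):

```python
import pandas as pd

sales = pd.DataFrame({
    "time":     ["Q1", "Q1", "Q2", "Q2"],
    "location": ["NY", "LA", "NY", "LA"],
    "product":  ["TV", "TV", "PC", "TV"],
    "sales":    [100, 80, 120, 90],
})

cube = sales.pivot_table(index="time", columns="location",
                         values="sales", aggfunc="sum")
print(cube)  # total sales by time x location
```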
48. Calculate Covariance for Economic Growth and S&P 500 Returns
Step-by-Step Explanation:
1. Given Data:
2. Calculate Means:
o Mean of bi = (8 + 12 + 14 + 10) / 4 = 11
3. Calculate Covariance:
Covariance = ∑(aᵢ − ā)(bᵢ − b̄) / (n − 1)
4. Interpretation:
o Positive covariance indicates that economic growth and S&P 500 returns tend to rise or fall together.
49. Explain Data Discretization in Detail, Supervised and Unsupervised Discretization
Step-by-Step Explanation:
1. Definition:
2. Supervised Discretization:
3. Unsupervised Discretization:
4. Example:
o For a dataset with ages, discretize into intervals like 0-20, 20-40, 40-60.
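A one-line unsupervised (equal-width style) discretization with pandas, using the age intervals from the example; the ages themselves are invented:

```python
import pandas as pd

ages = pd.Series([5, 18, 25, 33, 41, 59])  # hypothetical ages
binned = pd.cut(ages, bins=[0, 20, 40, 60], labels=["0-20", "20-40", "40-60"])
print(binned.tolist())  # ['0-20', '0-20', '20-40', '20-40', '40-60', '40-60']
```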
Step-by-Step Explanation:
1. Definition:
2. Example:
Step-by-Step Explanation:
1. Definition:
2. Example:
Step-by-Step Explanation:
1. Similarity:
2. Dissimilarity:
3. Properties:
4. Example:
10-Marker Questions
Step-by-Step Explanation:
1. Given Data:
S = (16, n), (0, y), (4, y), (12, y), (16, n), (26, n), (18, y), (24, n), (28, n)
2. Split Points:
o The candidate split points evaluated are 14 and 21 (midpoints of the adjacent ages 12/16 and 18/24 where the class changes).
o Entropy Formula:
Entropy(S) = −∑ᵢ pᵢ log₂(pᵢ)
o Split at 14:
S1: (0, y), (4, y), (12, y) → all "y" → Entropy = 0.
S2: (16, n), (16, n), (18, y), (24, n), (26, n), (28, n) → 5 "n" and 1 "y" → Entropy ≈ 0.65.
Weighted entropy = (3/9)(0) + (6/9)(0.65) ≈ 0.43.
o Split at 21:
S1: (0, y), (4, y), (12, y), (16, n), (16, n), (18, y) → 3 "y" and 3 "n" → Entropy = 1.
S2: (24, n), (26, n), (28, n) → all "n" → Entropy = 0.
Weighted entropy = (6/9)(1) + (3/9)(0) ≈ 0.67.
o Split point 14 has the lower weighted entropy (0.43) and is chosen as the best split.
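A short Python sketch that recomputes both weighted split entropies from the data above:

```python
from math import log2

S = [(16, "n"), (0, "y"), (4, "y"), (12, "y"), (16, "n"),
     (26, "n"), (18, "y"), (24, "n"), (28, "n")]

def entropy(records):
    n = len(records)
    probs = [sum(1 for _, c in records if c == label) / n for label in ("y", "n")]
    return -sum(p * log2(p) for p in probs if p > 0)

def split_entropy(records, point):
    s1 = [r for r in records if r[0] < point]
    s2 = [r for r in records if r[0] >= point]
    n = len(records)
    return len(s1) / n * entropy(s1) + len(s2) / n * entropy(s2)

print(round(split_entropy(S, 14), 2))  # 0.43 -> best split
print(round(split_entropy(S, 21), 2))  # 0.67
```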
Step-by-Step Explanation:
1. Given Points:
o x1 = (0, 2), x2 = (2, 0)
2. Euclidean Distance:
o Formula:
Distance = √((x₂ − x₁)² + (y₂ − y₁)²)
o For x1 = (0, 2) and x2 = (2, 0): √((2 − 0)² + (0 − 2)²) = √(4 + 4) = √8 ≈ 2.83
3. Minkowski Distance:
o Formula:
Distance = (∑ᵢ |xᵢ − yᵢ|^p)^(1/p)
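A small sketch of the Minkowski distance, checked against the Euclidean result above for x1 = (0, 2) and x2 = (2, 0):

```python
import numpy as np

x1, x2 = np.array([0, 2]), np.array([2, 0])

def minkowski(a, b, p):
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

print(round(minkowski(x1, x2, 2), 2))  # 2.83 (Euclidean, p = 2)
print(round(minkowski(x1, x2, 1), 2))  # 4.0  (Manhattan, p = 1)
```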
Step-by-Step Explanation:
1. Dimensionality Reduction:
2. Numerosity Reduction:
3. Data Compression:
4. Feature Selection: