Data Mining
Ans:-
1. Data Quality:
Poor data quality, such as incomplete, noisy, or inconsistent data, can lead to
inaccurate analysis. Proper data cleaning and preprocessing are essential to
ensure reliable results.
5. Interpretation of Results:
Some data mining models, especially complex ones, can be hard to interpret.
Ensuring the results are understandable to non-experts is important for gaining
meaningful insights.
6. Overfitting:
Overfitting occurs when a model performs well on training data but poorly on new
data. Proper validation techniques, such as cross-validation, are needed to ensure
models generalize effectively (a short sketch follows this list).
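A minimal sketch of how overfitting can be checked, assuming scikit-learn and its
bundled iris dataset are available: the model's accuracy on its own training data is
compared with its cross-validated accuracy, and a large gap between the two suggests
the model is memorizing rather than generalizing.

# Minimal overfitting check (assumes scikit-learn is installed).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# A deep, unpruned tree can memorize the training data.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)
train_acc = model.score(X, y)                 # accuracy on the data it was trained on

# 5-fold cross-validation estimates how well the model generalizes.
cv_acc = cross_val_score(model, X, y, cv=5).mean()

print(f"Training accuracy:        {train_acc:.2f}")
print(f"Cross-validated accuracy: {cv_acc:.2f}")
# A large gap between the two scores is a typical sign of overfitting.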
2. Fraud Detection:
Detects unusual patterns and anomalies in financial transactions, enabling the
identification of fraudulent activities in the banking and insurance sectors (an
anomaly-detection sketch follows this list).
4. Healthcare:
Analyzes patient data to predict disease outbreaks, improve treatment plans,
and enhance healthcare decision-making for better outcomes.
5. Risk Management:
Helps financial institutions assess and predict risks in lending and investment,
improving decision-making for minimizing losses.
6. Web Mining:
Extracts useful information from web data, such as user behavior and trends, to
improve website structure, content personalization, and targeted advertising.
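As an illustration of the fraud-detection idea above, here is a minimal
anomaly-detection sketch, assuming scikit-learn is available; the transaction
amounts are made up, and IsolationForest stands in for whatever detector a real
system would use.

# Minimal anomaly detection on made-up transaction amounts.
import numpy as np
from sklearn.ensemble import IsolationForest

# Mostly ordinary transaction amounts, with two extreme values mixed in.
amounts = np.array([[25.0], [30.5], [22.0], [28.0], [31.0],
                    [27.5], [24.0], [29.0], [5000.0], [7500.0]])

# contamination is the assumed fraction of anomalous records.
detector = IsolationForest(contamination=0.2, random_state=0)
labels = detector.fit_predict(amounts)   # -1 marks an anomaly, 1 marks a normal point

for amount, label in zip(amounts.ravel(), labels):
    status = "anomaly" if label == -1 else "normal"
    print(f"{amount:>8.2f}  ->  {status}")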
1. Data Selection:
The relevant data is selected from large datasets based on the goals of the
analysis, ensuring that the data is suitable for mining.
2. Data Preprocessing:
Involves cleaning the data by handling missing values, removing noise, and
resolving inconsistencies to improve data quality.
3. Data Transformation:
Data is transformed into an appropriate format or structure through
normalization, aggregation, or generalization for effective mining (steps 2 and 3
are sketched in code after this list).
4. Data Mining:
The core step where intelligent methods and algorithms are applied to discover
patterns, trends, or knowledge from the data.
5. Pattern Evaluation:
Extracted patterns are evaluated and interpreted to identify meaningful insights
and ensure their relevance to the given problem.
6. Knowledge Representation:
The discovered knowledge is presented in a user-friendly manner, often through
visualization, reports, or other interpretation techniques for decision-making.
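Steps 2 and 3 above can be illustrated with a small sketch, assuming pandas is
available; the column names and values are made up.

# Minimal preprocessing (missing-value handling) and transformation (normalization).
import pandas as pd

raw = pd.DataFrame({
    "age":    [25, None, 47, 52, 31],            # missing value to be handled
    "income": [32000, 41000, 58000, None, 45000],
})

# Preprocessing: fill missing values with the column mean (one simple strategy).
clean = raw.fillna(raw.mean(numeric_only=True))

# Transformation: min-max normalization rescales each column to the range [0, 1].
normalized = (clean - clean.min()) / (clean.max() - clean.min())

print(normalized)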
1. Data Cleaning:
Remove noise and handle missing data to improve data quality.
2. Data Integration:
Combine data from multiple sources into a coherent dataset.
3. Data Selection:
Select the relevant data for analysis from the larger dataset.
4. Data Transformation:
Convert the data into a suitable format for mining, such as normalization or
aggregation.
5. Data Mining:
Apply algorithms and techniques to extract patterns or knowledge from the
data.
6. Pattern Evaluation:
Evaluate the mined patterns to identify meaningful and useful insights.
7. Knowledge Representation:
Present the discovered knowledge using visualization or other techniques for
easy understanding.
Common Algorithms:
- Unsupervised (clustering and dimensionality reduction): K-Means Clustering,
Hierarchical Clustering, DBSCAN (Density-Based Spatial Clustering of Applications
with Noise), Principal Component Analysis (PCA), t-SNE (t-Distributed Stochastic
Neighbor Embedding)
- Supervised (classification): Decision Trees, Random Forest, Support Vector
Machines (SVM), Neural Networks, Naive Bayes
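A minimal sketch, assuming scikit-learn is available, showing one algorithm from
each group above: K-Means groups unlabelled points, while Naive Bayes is trained on
labelled examples and then predicts; the points and labels are made up.

# One unsupervised and one supervised algorithm on tiny made-up data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import GaussianNB

# Two obvious groups of 2-D points.
points = np.array([[1, 2], [1, 4], [2, 3],
                   [8, 8], [9, 10], [8, 9]])

# Unsupervised: K-Means finds the groups without being given labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print("Cluster labels:", kmeans.labels_)

# Supervised: Naive Bayes learns from labelled examples, then predicts a new point.
labels = np.array([0, 0, 0, 1, 1, 1])
classifier = GaussianNB().fit(points, labels)
print("Prediction for [2, 2]:", classifier.predict([[2, 2]]))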
15. Define Skewness.
Ans:-
Skewness
Skewness measures the asymmetry of a dataset's distribution, indicating how
much and in which direction the distribution deviates from symmetry.
Types of Skewness
1. Positive Skewness (Right Skew):
The right tail of the distribution is longer, indicating that most data points
are concentrated on the left with a few higher values. An example is income
distribution, where a small number of individuals have significantly higher
incomes.
2. Negative Skewness (Left Skew):
The left tail is longer, showing that most data points are concentrated on
the right with a few lower values. An example is exam scores, where most
students score high but a few score very low.
3. Zero Skewness:
Indicates a perfectly symmetrical distribution with equal tails on both sides,
as seen in a normal distribution.
Measurement of Skewness
Skewness can be calculated using formulas such as Pearson's first coefficient,
(mean - mode) / standard deviation, and Pearson's second coefficient,
3 x (mean - median) / standard deviation, or obtained from statistical software
that reports skewness values for datasets.
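A minimal sketch, assuming NumPy and SciPy are available, computing both the
moment-based skewness that statistical software typically reports and Pearson's
second coefficient for a small made-up dataset with a long right tail.

# Skewness of a small right-skewed dataset.
import numpy as np
from scipy.stats import skew

data = np.array([2, 3, 3, 4, 4, 4, 5, 5, 6, 20])   # the value 20 stretches the right tail

# Moment-based (Fisher-Pearson) skewness.
print("Skewness:", skew(data))

# Pearson's second coefficient: 3 * (mean - median) / standard deviation.
pearson2 = 3 * (data.mean() - np.median(data)) / data.std()
print("Pearson's second coefficient:", pearson2)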