Types of Attributes-1
Qualitative Attributes:
1. Nominal Attributes:
Nominal attributes ("nominal" means "relating to names") hold categorical data
whose values are categories or labels with no inherent order or ranking. They
are typically used to name or label objects, entities, or concepts.
Example :
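As an illustrative sketch (the attribute hair_color and its values are
hypothetical, not taken from these notes), the Python snippet below shows which
operations make sense on a nominal attribute: counting categories, finding the
mode, and testing equality. Ordering or averaging the values has no meaning.

```python
from collections import Counter

# Hypothetical nominal attribute: the values are labels with no inherent order.
hair_color = ["black", "brown", "black", "blonde", "brown", "black"]

# Counting how often each category occurs is meaningful.
counts = Counter(hair_color)

# The mode (most frequent category) is the only sensible "central" value.
mode_value, mode_count = counts.most_common(1)[0]

# Equality tests are meaningful; a comparison such as "black" < "brown" would
# only reflect alphabetical order, not any ranking of the categories.
same_category = hair_color[0] == hair_color[2]

print(counts)
print(mode_value, mode_count)
print(same_category)
```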
Quantitative Attributes:
Numeric: A numeric attribute is quantitative because it is a measurable
quantity, represented by integer or real values. Numeric attributes are of
two types: interval-scaled and ratio-scaled.
An interval-scaled attribute has values whose differences are interpretable,
but the scale has no true zero point; its zero is an arbitrary reference
point. Values on an interval scale can be added and subtracted, but they cannot
be meaningfully multiplied or divided.
Example: temperature in degrees Celsius (centigrade). If the temperature on one
day is twice that of another day, we cannot say that the first day is twice as
hot as the other.
A ratio-scaled attribute is a numeric attribute with a fixed zero point. If a
measurement is ratio-scaled, we can speak of one value as being a multiple (or
ratio) of another value.
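The difference between the two scales can be checked with a small calculation.
The sketch below uses two hypothetical daily temperatures (the values are
assumptions chosen for illustration) and compares differences and ratios in
degrees Celsius, an interval scale, with the same temperatures in kelvin, a
ratio scale with a true zero.

```python
# Hypothetical temperatures on two days, in degrees Celsius (interval scale).
day1_c = 10.0
day2_c = 20.0

# Differences are interpretable on an interval scale.
diff_c = day2_c - day1_c

# The raw ratio is 2.0, but it depends on where the scale places its zero.
ratio_c = day2_c / day1_c

# Kelvin has a true zero point, so it is ratio-scaled.
day1_k = day1_c + 273.15
day2_k = day2_c + 273.15
diff_k = day2_k - day1_k
ratio_k = day2_k / day1_k

print(f"difference: {diff_c:.2f} C, {diff_k:.2f} K")
print(f"ratio in Celsius: {ratio_c:.3f}")
print(f"ratio in Kelvin:  {ratio_k:.3f}")
```

The difference is 10 degrees on both scales, but the ratio is 2 in Celsius and
only about 1.035 in kelvin, so "twice as hot" is not a valid reading of the
Celsius values.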
Discrete: Discrete data can take only specific, separate values rather than
any value in a continuous range. The values are distinct from one another, and
they can be either numerical or categorical in nature.
Data Cleaning: Data cleaning ensures the data is of high quality and appropriate
for analysis. It is the process of locating and fixing errors in a dataset. Data
cleaning is crucial for handling the missing and noisy values in real-world data,
which can otherwise reduce the system's accuracy. It improves overall data
quality.
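A minimal sketch of the two cleaning tasks named above, handling missing and
noisy values, is shown below. The readings, the choice of the median as a fill
value, and the plausible range LOW..HIGH are all assumptions made for
illustration; real projects choose these per data set.

```python
from statistics import median

# Hypothetical sensor readings; None marks a missing value and 250.0 is noise.
readings = [21.5, 22.0, None, 21.8, 250.0, 22.3, None, 21.9]

# Step 1: locate the values that are present.
present = [r for r in readings if r is not None]

# Step 2: fill missing values with the median, which is robust to outliers.
fill_value = median(present)
filled = [fill_value if r is None else r for r in readings]

# Step 3: treat values outside an assumed plausible range as noise and
# replace them with the median as well.
LOW, HIGH = 0.0, 50.0
cleaned = [r if LOW <= r <= HIGH else fill_value for r in filled]

print(cleaned)
```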
Data Integration: The process of merging data from various sources into a single,
complete view is known as data integration. Real-world data is frequently
distributed over numerous databases, servers and files, making it challenging to
analyse it and draw valid conclusions without integrating it. Data integration offers
the framework for efficient analysis and knowledge discovery throughout the
KDD process. It helps data scientists draw in-depth results from various
distributed data sources.
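As a rough sketch of integration, the snippet below merges two hypothetical
sources (crm_records and billing_records, both invented for this example) into
a single view keyed by customer id. The helper integrate is an illustrative
name, not part of any library.

```python
# Two hypothetical sources describing the same customers, keyed by customer id.
crm_records = {
    101: {"name": "Asha", "city": "Pune"},
    102: {"name": "Ravi", "city": "Delhi"},
}
billing_records = {
    101: {"total_spent": 2500.0},
    103: {"total_spent": 400.0},
}


def integrate(*sources):
    """Merge several keyed sources into one unified view (an outer join)."""
    unified = {}
    for source in sources:
        for key, fields in source.items():
            # Create the record on first sight, then add this source's fields.
            unified.setdefault(key, {}).update(fields)
    return unified


unified_view = integrate(crm_records, billing_records)
for customer_id in sorted(unified_view):
    print(customer_id, unified_view[customer_id])
```

Customers that appear in only one source are kept with the fields that are
available, which leaves the gaps visible for a later cleaning step.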
Data Selection: One of the primary steps in the KDD process is data selection. It
involves choosing the appropriate data source, data type, and instruments for
gathering the data. It lays the foundation for the later data transformation,
mining, and knowledge presentation steps of the KDD process.
Data Transformation: Data transformation entails converting and altering the
original data to make it suitable for analysis and knowledge discovery.
It transforms raw data into a form that may be used for modelling and analysis.
Often very project-specific, this step can be crucial for the overall KDD project's
success.
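One common transformation is min-max normalization, which rescales a numeric
attribute to a fixed range. It is shown here only as an example of the step;
these notes do not prescribe a particular method. The function name and the
income values are assumptions.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they fall in [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        # All values identical: map everything to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (old_max - old_min)
    return [new_min + (v - old_min) * scale for v in values]


# Hypothetical raw attribute: annual income.
incomes = [20000, 35000, 50000, 80000]
normalized = min_max_normalize(incomes)
print(normalized)
```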
Data Mining: Data mining is the process of sorting through vast data sets to find
patterns and associations that can be used to solve problems through data
analysis. It is commonly used in several fields: customer segmentation, fraud
detection, and market analysis in business and marketing, and disease
evaluation and diagnosis in healthcare.
Pattern Evaluation
Pattern evaluation is the process of identifying the truly interesting patterns
that represent knowledge, based on specific metrics. Not all patterns are equally
valuable; some might be useless, and others might be highly valuable and
informative. Pattern evaluation methods are used to tell them apart.
Knowledge Presentation
It is the final step of the KDD process. When knowledge is presented to a user
visually through tables, graphs, charts, trees, matrices, etc., it is known as
knowledge presentation. It is used to facilitate well-informed decision-making
and problem-solving. The main objective of knowledge presentation is to explain
the insights and conclusions produced through data mining clearly.
Advantages of KDD in Data Mining
Some of the advantages of KDD are as follows:
• KDD helps in data-driven decision-making.
• It is also used for pattern recognition and fraud detection systems.
• It improves the performance of firms and organisations.
Disadvantages of KDD in Data Mining
Some of the disadvantages of KDD are as follows:
• KDD is a complex process.
• It heavily depends on the quality of the data. So, data quality maintenance
is required for the KDD process.
• Analysing large amounts of data can raise security and privacy issues.
• Overfitting the data during the KDD process can reduce the system's
performance on new data.
Functionalities of Data Mining
Data mining tasks are designed to run semi-automatically or fully automatically
on large data sets to uncover patterns such as groups or clusters (cluster
analysis), unusual records (anomaly detection), and dependencies (association
and sequential pattern mining). Once patterns are uncovered, they can be thought
of as a summary of the input data, and further analysis may be carried out using
machine learning and predictive analytics.
Descriptive Data Mining: It aims to understand what is happening within the data
without any prior hypothesis. It highlights the common features of the data
set, for example the count, the average, etc.
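For instance, the count and average mentioned above can be computed directly.
The sales figures below are hypothetical and serve only to illustrate a
descriptive summary.

```python
from statistics import mean

# Hypothetical data set: monthly sales figures.
sales = [120, 135, 150, 110, 160]

# A descriptive summary highlights common features of the data.
summary = {
    "count": len(sales),
    "average": mean(sales),
    "minimum": min(sales),
    "maximum": max(sales),
}
print(summary)
```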
I. Data Characterization: This refers to the summary of general
characteristics or features of the class, resulting in specific rules that
define a target class. A data analysis technique called Attribute-oriented
Induction is employed on the data set for achieving characterization.
Example: studying the characteristics of software products whose sales
increased by 10% in the previous years.
Data Discrimination: Discrimination is used to separate distinct data sets
based on the disparity in attribute values. It compares features of a class
with features of one or more contrasting classes. The results can be
presented with, e.g., bar charts, curves and pie charts.
II. Mining Frequent Patterns: One of the functions of data mining is finding
data patterns. Frequent patterns are patterns that occur most often in the
data. Several kinds of frequent patterns can be found in a data set.
• Frequent item set: This term refers to a group of items that are
commonly found together, such as milk and sugar.
• Frequent substructure: It refers to structural forms, such as trees and
graphs, that occur frequently and may be combined with item sets or
subsequences.
• Frequent subsequence: A sequence of events that occurs frequently, such
as buying a phone followed by a cover.
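A frequent item set can be found by counting how often groups of items occur
together. The sketch below is a brute-force count over hypothetical
market-basket transactions; it is for illustration only and is not the Apriori
algorithm or any other specific method from these notes. The function name and
the min_support threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"milk", "sugar", "bread"},
    {"milk", "sugar"},
    {"milk", "bread"},
    {"sugar", "tea"},
    {"milk", "sugar", "tea"},
]


def frequent_itemsets(transactions, size, min_support):
    """Return itemsets of a given size whose support is >= min_support.

    Support is the fraction of transactions that contain the itemset.
    """
    counts = Counter()
    for basket in transactions:
        # Sort so that each itemset has one canonical tuple form.
        for itemset in combinations(sorted(basket), size):
            counts[itemset] += 1
    n = len(transactions)
    return {
        itemset: count / n
        for itemset, count in counts.items()
        if count / n >= min_support
    }


pairs = frequent_itemsets(transactions, size=2, min_support=0.4)
print(pairs)
```

With these transactions, milk and sugar appear together in three of the five
baskets, so the pair is reported with a support of 0.6.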
Association Analysis: This process uncovers relationships among data items and
derives the association rules that describe them. It is a way of discovering
which items tend to occur together.
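Association rules are commonly scored with support and confidence. The sketch
below computes both for the hypothetical rule "milk implies sugar" over invented
transactions; the helper names support and confidence are illustrative, not a
library API.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"milk", "sugar", "bread"},
    {"milk", "sugar"},
    {"milk", "bread"},
    {"sugar", "tea"},
    {"milk", "sugar", "tea"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) from the transactions.

    Assumes the antecedent occurs at least once, otherwise the ratio
    is undefined.
    """
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


rule_support = support({"milk", "sugar"}, transactions)
rule_confidence = confidence({"milk"}, {"sugar"}, transactions)
print(f"support(milk, sugar)     = {rule_support:.2f}")
print(f"confidence(milk -> sugar) = {rule_confidence:.2f}")
```

Here milk occurs in four baskets and sugar accompanies it in three of them, so
the rule has a support of 0.60 and a confidence of 0.75.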