Types of Attributes-1

Types of attributes:

Attributes can be broadly classified into two main types:


1. Qualitative (Nominal (N), Ordinal (O), Binary (B))
2. Quantitative (Numeric, Discrete, Continuous)

Qualitative Attributes:
1. Nominal Attributes:
Nominal attributes relate to names: the values represent different categories or labels without any inherent order or ranking. These attributes are often used to represent names or labels associated with objects, entities, or concepts.
Example: hair colour (black, brown, blond) or marital status (single, married, divorced).

2. Binary Attributes: Binary attributes are a type of qualitative attribute where the data can take on only two distinct values or states. These attributes are often used to represent yes/no, presence/absence, or true/false conditions within a dataset.
Symmetric: In a symmetric attribute, both values or states are considered equally important or interchangeable, for example the attribute “Gender” with values “Male” and “Female”.
Asymmetric: In an asymmetric attribute, the two values or states are not equally important or interchangeable, for instance the attribute “Result” with values “Pass” and “Fail”.

3. Ordinal Attributes: Ordinal attributes are a type of qualitative attribute where the values possess a meaningful order or ranking, but the magnitude of the difference between values is not precisely quantified.

Quantitative Attributes:
Numeric: A numeric attribute is quantitative because it is a measurable quantity, represented in integer or real values. Numeric attributes are of two types: interval-scaled and ratio-scaled.
An interval-scaled attribute has values whose differences are interpretable, but it does not have a true reference point, or zero point. Interval-scaled data can be added and subtracted, but cannot meaningfully be multiplied or divided. For example, for temperature in degrees Centigrade, if one day's temperature is numerically twice that of another day, we cannot say that the first day is twice as hot as the second.
A ratio-scaled attribute is a numeric attribute with a fixed zero point. If a measurement is ratio-scaled, we can speak of a value as being a multiple (or ratio) of another value.
Discrete: Discrete data can take on specific, separate values rather than a continuous range. These values are distinct from one another, and they can be either numerical or categorical in nature.
Continuous: Continuous data, unlike discrete data, can take on an infinite number of possible values within a given range. It can assume any value within a specified interval, often including fractional or decimal values.
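The distinctions above can be made concrete with a small toy record in Python (the field names and values are invented for illustration, not taken from any particular dataset):

# Toy record illustrating the attribute types discussed above.
record = {
    # Qualitative attributes
    "hair_colour": "brown",      # nominal: a label with no inherent order
    "grade": "B",                # ordinal: ordered (A > B > C), but gaps are not quantified
    "gender": "Male",            # binary, symmetric: both states equally important
    "test_result": "Pass",       # binary, asymmetric: the two states are not equally important
    # Quantitative attributes
    "temperature_c": 30.0,       # numeric, interval-scaled: no true zero, ratios are meaningless
    "height_cm": 172.5,          # numeric, ratio-scaled: true zero, ratios are meaningful
    "num_children": 2,           # discrete: countable, separate values
    "weight_kg": 68.4,           # continuous: any value within a range
}

for name, value in record.items():
    print(f"{name}: {value}")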

KDD in Data Mining


What is KDD (Knowledge Discovery in Databases) in Data Mining?
KDD is a method of extracting relevant, previously unknown, and useful information from massive databases (also known as big data). It is useful for researchers in various fields, such as machine learning, artificial intelligence, pattern recognition and data visualisation.
Knowledge Discovery in Databases (KDD) refers to the process of extracting
useful insights, patterns, and knowledge from large volumes of data.
KDD Process
KDD is an iterative method that extracts valuable information after numerous repetitions of its steps. KDD involves several steps, each advancing the goal of extracting useful information from data. These steps are as follows:
• Data cleaning
• Data integration
• Data selection
• Data transformation
• Data mining
• Pattern evaluation
• Knowledge presentation

Data Cleaning: Data cleaning ensures the data is of high quality and appropriate for analysis. It is the process of locating and fixing errors in a dataset. Data cleaning is crucial for handling the missing and noisy values of real-world data, which can otherwise negatively affect the system's accuracy. It helps in improving overall data quality.
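As a rough sketch of this step in Python (the pandas library and the column names are assumptions made for the example, not part of the text above):

import pandas as pd

# Hypothetical raw data with missing and noisy values.
raw = pd.DataFrame({
    "age":    [25, None, 31, 250, 42],        # 250 is an implausible (noisy) value
    "income": [30000, 45000, None, 52000, 61000],
})

cleaned = raw.copy()
cleaned.loc[cleaned["age"] > 120, "age"] = None            # treat impossible ages as missing
cleaned["age"] = cleaned["age"].fillna(cleaned["age"].median())
cleaned["income"] = cleaned["income"].fillna(cleaned["income"].mean())
print(cleaned)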
Data Integration: The process of merging data from various sources into a single, complete view is known as data integration. Real-world data is frequently distributed over numerous databases, servers and files, making it challenging to analyse and derive valid conclusions without integrating them. Data integration provides the framework for efficient analysis and knowledge discovery throughout the KDD process and helps data scientists draw in-depth results from various distributed data sources.
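A minimal sketch of integration with pandas (the table and column names are invented for illustration):

import pandas as pd

# Two hypothetical sources describing the same customers.
crm = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
sales = pd.DataFrame({"customer_id": [1, 2, 4], "total_spend": [120.0, 85.5, 40.0]})

# Merge into a single view keyed on customer_id; customers missing from one
# source appear with NaN, so gaps are visible rather than silently dropped.
integrated = crm.merge(sales, on="customer_id", how="outer")
print(integrated)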
Data Selection: One of the primary steps in the KDD process is data selection. It is described as selecting the proper data sources, data types, and instruments with which to gather the data. It prepares the foundation for the subsequent data transformation, mining, and knowledge presentation steps of the KDD process.
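At the level of a single table, selection often amounts to keeping only the task-relevant rows and columns; a small hedged example (column names and values are invented):

import pandas as pd

transactions = pd.DataFrame({
    "region":  ["north", "south", "north", "east"],
    "amount":  [200, 150, 320, 90],
    "channel": ["web", "store", "web", "web"],
    "comment": ["ok", "late", "ok", "ok"],     # free text, not needed for mining
})

# Keep only the rows (web sales) and columns (region, amount) relevant to the task.
selected = transactions.loc[transactions["channel"] == "web", ["region", "amount"]]
print(selected)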
Data Transformation: Data transformation entails converting and altering the data to make the original data acceptable for analysis and knowledge discovery. It transforms raw data into a form that can be used for modelling and analysis. Often highly project-specific, this step can be crucial to the success of the overall KDD project.
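One common transformation is min-max normalisation, sketched below (the attribute names and values are illustrative; other transformations such as discretisation or aggregation are equally typical):

import pandas as pd

df = pd.DataFrame({"age": [25, 31, 42, 58], "income": [30000, 45000, 52000, 61000]})

# Min-max normalisation rescales each attribute to the [0, 1] range so that
# attributes measured on very different scales contribute comparably to mining.
normalised = (df - df.min()) / (df.max() - df.min())
print(normalised)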
Data Mining: Data mining is the process of sorting through vast data sets to find patterns and associations that can be used to solve problems through data analysis. It is commonly used in several fields, for example consumer segmentation, fraud detection and market analysis in business and marketing, and disease evaluation and diagnosis in healthcare.
Pattern Evaluation
Pattern evaluation is the process of identifying the truly interesting patterns that represent knowledge, based on specific interestingness measures. Not all patterns are equally valuable; some patterns may be useless, while others are highly valuable and informative. Pattern evaluation methods are used to separate the two.
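As a toy illustration of evaluating patterns against interestingness thresholds (the rules, support and confidence values below are made up):

# Hypothetical mined patterns: (rule, support, confidence).
patterns = [
    ("milk -> sugar",   0.40, 0.85),
    ("pen -> umbrella", 0.01, 0.30),
    ("phone -> cover",  0.25, 0.90),
]

MIN_SUPPORT, MIN_CONFIDENCE = 0.05, 0.60

# Keep only the patterns that pass both interestingness thresholds.
interesting = [
    (rule, sup, conf)
    for rule, sup, conf in patterns
    if sup >= MIN_SUPPORT and conf >= MIN_CONFIDENCE
]
print(interesting)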
Knowledge Presentation
It is the final step of the KDD process. When knowledge is presented to a user visually through tables, graphs, charts, trees, matrices, etc., it is known as knowledge presentation. It is used to facilitate well-informed decision-making and problem-solving. The main objective of knowledge presentation is to explain the insights and conclusions produced through data mining clearly.
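A minimal sketch of presenting a mining result as a chart (matplotlib assumed; the segment names and counts are invented):

import matplotlib.pyplot as plt

# Hypothetical mining result: number of customers per discovered segment.
segments = ["budget", "regular", "premium"]
counts = [540, 320, 140]

plt.bar(segments, counts)
plt.title("Customers per segment")
plt.xlabel("Segment")
plt.ylabel("Number of customers")
plt.show()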
Advantages of KDD in Data Mining
Some of the advantages of KDD are as follows:
• KDD helps in data-driven decision-making.
• It is also used for pattern recognition and fraud detection systems.
• It improves the performance of firms and organisations.
Disadvantages of KDD in Data Mining
Some of the disadvantages of KDD are as follows:
• KDD is a complex process.
• It heavily depends on the quality of the data. So, data quality maintenance
is required for the KDD process.
• Analysing large amounts of data can raise security and privacy issues.
• Overfitting data in the KDD process can decrease the system's
performance.
Functionalities of Data Mining
Data mining tasks are designed to be semi-automatic or fully automatic and are run on large data sets to uncover patterns such as groups or clusters, unusual or extreme records (anomaly detection), and dependencies such as associations and sequential patterns. Once patterns are uncovered, they can be thought of as a summary of the input data, and further analysis may be carried out using machine learning and predictive analytics.
Descriptive Data Mining: It describes what is happening within the data without any prior hypothesis. The common features of the data set are highlighted, for example counts and averages.
I. Data Characterization: This refers to the summary of the general characteristics or features of a target class, resulting in specific rules that define that class. A data analysis technique called attribute-oriented induction is employed on the data set to achieve the characterization. Example: studying the characteristics of software products whose sales increased by 10% in the previous year.
Data Discrimination: Discrimination is used to separate distinct data sets based on the disparity in attribute values. It compares the features of a class with the features of one or more contrasting classes; the results can be presented as, e.g., bar charts, curves and pie charts.
II. Mining Frequent Patterns: One of the functions of data mining is finding data patterns. Frequent patterns are things that are discovered to be most common in the data. Various types of frequent patterns can be found in a dataset:
• Frequent item set: a group of items that are commonly found together, such as milk and sugar.
• Frequent substructure: a structural form, such as a tree or graph, that occurs frequently, possibly in combination with item sets or subsequences.
• Frequent subsequence: a regularly occurring series of events, such as buying a phone followed by a cover.
Association Analysis: The process involves uncovering relationships in the data and deriving the rules of association. It is a way of discovering the relationships between various items.
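A toy sketch of finding frequent item sets and one association rule by hand, without any mining library (the transactions are invented):

from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"milk", "sugar", "bread"},
    {"milk", "sugar"},
    {"bread", "butter"},
    {"milk", "sugar", "butter"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / n

# Frequent item sets: pairs appearing in at least half of the transactions.
items = set().union(*transactions)
frequent_pairs = [set(p) for p in combinations(sorted(items), 2) if support(set(p)) >= 0.5]
print("Frequent pairs:", frequent_pairs)

# Association rule milk -> sugar and its confidence.
confidence = support({"milk", "sugar"}) / support({"milk"})
print("confidence(milk -> sugar) =", confidence)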

Correlation Analysis: Correlation is a statistical technique that can show whether, and how strongly, pairs of attributes are related to each other. For example, taller people tend to have more weight.
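A minimal sketch of computing a correlation coefficient (NumPy assumed; the height and weight values are made up):

import numpy as np

# Hypothetical paired measurements: height (cm) and weight (kg).
height = np.array([150, 160, 165, 172, 180, 188])
weight = np.array([50, 58, 61, 68, 77, 84])

# Pearson correlation coefficient: close to +1 means the attributes rise together,
# close to -1 means one falls as the other rises, near 0 means no linear relation.
r = np.corrcoef(height, weight)[0, 1]
print(f"correlation(height, weight) = {r:.2f}")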
Predictive Data Mining: It is used to predict values or labels for attributes that are not yet known. With previously available or historical data, data mining can be used to make predictions about critical business metrics, for example by exploiting linear trends in the data.
I. Classification
Classification is a data mining technique that categorises items in a collection based on some predefined properties. It uses methods like if-then rules, decision trees or neural networks to predict a class, that is, to classify a collection of items. A training set containing items whose classes are known is used to train the system, which then predicts the category of items in a new, unlabelled collection.
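A minimal sketch with a decision tree (scikit-learn assumed; the attributes, values and class labels are invented):

from sklearn.tree import DecisionTreeClassifier

# Hypothetical training set: [age, income] -> credit class (0 = risky, 1 = safe).
X_train = [[22, 20000], [25, 32000], [47, 25000], [52, 110000], [46, 98000], [56, 75000]]
y_train = [0, 0, 0, 1, 1, 1]

# Fit a decision tree on items whose class is known...
model = DecisionTreeClassifier(max_depth=2, random_state=0)
model.fit(X_train, y_train)

# ...then predict the class of previously unseen items.
print(model.predict([[30, 28000], [50, 90000]]))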
II. Prediction
Prediction estimates unavailable data values or future trends. An object's value can be anticipated based on the attribute values of the object and the attribute values of the classes. It can be a prediction of missing numerical values or of increasing or decreasing trends in time-related information. There are primarily two types of prediction in data mining: numeric predictions and class predictions, described below with a short sketch after them.
• Numeric predictions are made by creating a linear regression model based on historical data. Prediction of numeric values helps businesses prepare for a future event that might impact the business positively or negatively.
• Class predictions are used to fill in missing class information for products using a training data set where the class of each product is known.
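A minimal sketch of a numeric prediction with linear regression (scikit-learn assumed; the spend and sales figures are invented):

from sklearn.linear_model import LinearRegression

# Hypothetical historical data: advertising spend (thousands) -> monthly sales (units).
spend = [[10], [20], [30], [40], [50]]
sales = [105, 195, 310, 405, 490]

# Fit a linear regression model on the historical records...
model = LinearRegression()
model.fit(spend, sales)

# ...and predict sales for a planned spend level not seen before.
print(model.predict([[60]]))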
III. Cluster Analysis
Clustering is a popular data mining functionality in image processing, pattern recognition and bioinformatics. It is similar to classification, but the classes are not predefined; they are derived from the data attributes themselves.
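A minimal clustering sketch with k-means (scikit-learn assumed; the points are invented and form two obvious groups):

from sklearn.cluster import KMeans

# Hypothetical two-dimensional points with no class labels.
points = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
          [5.0, 5.2], [5.1, 4.8], [4.9, 5.0]]

# k-means derives the groups from the data itself, unlike classification,
# where the classes are predefined before training.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)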
IV. Outlier Analysis
Outlier analysis is important for understanding the quality of the data. If there are too many outliers, you cannot trust the data or draw reliable patterns from it. An outlier analysis determines whether there is something unusual in the data and whether it indicates a situation that a business needs to consider and take measures to mitigate.
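One simple way to flag outliers is the interquartile-range rule, sketched here with invented measurements (NumPy assumed):

import numpy as np

values = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 42.0, 10.3])

# Interquartile-range rule: values far outside the middle 50% of the data
# are flagged as outliers and deserve a closer look before trusting any patterns.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]
print("Outliers:", outliers)   # the 42.0 measurement stands out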
V. Evolution and Deviation Analysis
Evolution analysis pertains to the study of data sets that change over time. Evolution analysis models are designed to capture evolutionary trends in data, helping to characterise, classify, cluster or discriminate time-related data.
