DM
insights. Here are the primary types of data involved in data mining:
1. Structured Data
Definition: Structured data refers to data that is highly organized and stored in a tabular format (such as
spreadsheets or databases). It is typically easy to search, query, and analyze.
Examples:
CSV files
Types:
Numeric data: Data represented by numbers (e.g., sales figures, temperature readings).
Categorical data: Data represented by categories (e.g., gender, region, product type).
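The numeric/categorical split above can be sketched with Python's standard csv module. This is only an illustration: the sample rows, the column names (region, product, sales), and the helper classify_columns are hypothetical, and the rule "every value parses as a number" is a simplifying assumption.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
SAMPLE_CSV = """region,product,sales
North,Widget,120.5
South,Gadget,98
North,Gadget,143.25
"""

def is_number(text):
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def classify_columns(csv_text):
    """Label a column 'numeric' if every value parses as a number, else 'categorical'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    kinds = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        kinds[column] = "numeric" if all(is_number(v) for v in values) else "categorical"
    return kinds

column_kinds = classify_columns(SAMPLE_CSV)
print(column_kinds)
```

Real datasets often store categories as numeric codes, so a check like this is a starting point rather than a reliable classifier.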
2. Unstructured Data
Definition: Unstructured data lacks a predefined structure and can be difficult to store, process, or
analyze using traditional data models.
Examples:
Types:
Textual data: Natural language data that requires techniques like text mining, sentiment analysis, and
natural language processing (NLP).
Multimedia data: Data in image, video, and audio formats, which requires specialized algorithms (e.g.,
image recognition, speech-to-text).
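Text mining usually starts by turning free text into something countable. A minimal sketch using only the standard library is a bag-of-words count. The regex tokenizer, the helper name bag_of_words, and the sample sentence are assumptions made for illustration; real NLP pipelines handle punctuation, stemming, and stop words with more care.

```python
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Hypothetical product review used as unstructured input.
review = "Great phone. The battery is great, the screen is not."
counts = bag_of_words(review)
print(counts.most_common(3))
```

The resulting counts are structured (token, frequency) pairs, which is the form most mining algorithms expect.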
3. Semi-structured Data
Definition: Semi-structured data is a blend of structured and unstructured data. It does not follow the rigid
schema of a relational database but still contains tags or markers that separate elements, making it
easier to analyze than fully unstructured data.
Examples:
XML files
JSON files
Types:
XML-based data: Data stored in XML format, often used for transmitting data between systems.
JSON-based data: Common format for data exchange in web services, especially in APIs.
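To illustrate how tags and markers make such data easy to parse, the sketch below reads the same hypothetical user record from JSON and from XML with Python's standard json and xml.etree.ElementTree modules. The field names (id, name, tags) are invented for the example.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in two semi-structured formats.
JSON_TEXT = '{"id": 7, "name": "Ada", "tags": ["admin", "dev"]}'
XML_TEXT = '<user id="7"><name>Ada</name><tag>admin</tag><tag>dev</tag></user>'

# JSON maps directly onto Python dicts and lists.
record_from_json = json.loads(JSON_TEXT)

# XML needs explicit navigation through elements and attributes.
root = ET.fromstring(XML_TEXT)
record_from_xml = {
    "id": int(root.attrib["id"]),
    "name": root.findtext("name"),
    "tags": [tag.text for tag in root.findall("tag")],
}

print(record_from_json == record_from_xml)
```

Note that XML carries no type information here, so the id has to be converted to an integer by hand, while JSON distinguishes numbers from strings on its own.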
4. Time-Series Data
Definition: Time-series data consists of observations recorded in time order, typically at regular intervals,
allowing for trend analysis, forecasting, and anomaly detection.
Examples:
Types:
Temporal data: Data points that are time-stamped and can be analyzed for trends, seasonality, and
forecasting.
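Trend analysis and anomaly detection can be sketched with two small functions: a moving average to smooth the series and a z-score rule to flag unusual points. The readings, the window size, and the threshold of 2 standard deviations are all assumptions chosen for illustration, not recommended settings.

```python
from statistics import mean, pstdev

def moving_average(values, window):
    """Average each run of `window` consecutive observations to expose the trend."""
    return [mean(values[i - window + 1 : i + 1]) for i in range(window - 1, len(values))]

def flag_anomalies(values, threshold=2.0):
    """Return indexes of points more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers under this rule
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical temperature readings taken at equal intervals.
readings = [20, 21, 19, 20, 22, 21, 45, 20, 19, 21]
trend = moving_average(readings, window=3)
anomalies = flag_anomalies(readings)
print(trend)
print(anomalies)
```

A global z-score is a crude detector: on series with strong seasonality, comparing each point against a local window is usually more appropriate.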
5. Spatial Data
Definition: Spatial data, also known as geospatial data, is related to the location and shape of objects in
space, typically used in geographical and mapping applications.
Examples:
GPS coordinates
Types:
Vector data: Data represented by points, lines, and polygons (e.g., boundaries of a city).
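Two common operations on spatial data are measuring the distance between GPS coordinates and computing the area of a polygon. The sketch below uses the haversine formula for the first and the shoelace formula for the second. It assumes a spherical Earth with a mean radius of 6371 km and, for the area, planar (x, y) coordinates; the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def polygon_area(points):
    """Area of a simple polygon given as (x, y) vertices in order (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

one_degree_km = haversine_km(0.0, 0.0, 0.0, 1.0)
square_area = polygon_area([(0, 0), (2, 0), (2, 2), (0, 2)])
print(round(one_degree_km, 2), square_area)
```

For real city boundaries stored as latitude/longitude, the coordinates would first need to be projected to a planar system before the shoelace formula gives a meaningful area.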
6. Transactional Data
Definition: Transactional data captures the details of transactions, often involving the exchange of goods,
services, or information.
Examples:
Types:
Itemset data: Describes items bought together in retail data mining (e.g., market basket analysis).
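Market basket analysis starts from counting how often items appear together. As a minimal sketch, the code below computes the support of every item pair, that is, the fraction of transactions containing both items. The baskets and the helper name pair_support are made up for illustration; full algorithms such as Apriori add pruning and larger itemsets on top of this idea.

```python
from collections import Counter
from itertools import combinations

# Hypothetical retail transactions, each a set of purchased items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair has one canonical (alphabetical) ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

support = pair_support(transactions)
print(support[("bread", "milk")])
```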
7. Categorical Data
Definition: Categorical data refers to variables that can take on a limited number of distinct values, often
representing different categories or groups.
Examples:
Types:
Ordinal data: Categories with a natural order (e.g., education level: High School, Bachelor's, Master's,
Doctorate).
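Because ordinal categories have a natural order, they can be mapped to integer ranks that preserve it. The sketch below encodes the education levels from the example above; the sample list and the names RANK and encode_ordinal are illustrative.

```python
# Levels listed from lowest to highest; the position defines the rank.
EDUCATION_ORDER = ["High School", "Bachelor's", "Master's", "Doctorate"]
RANK = {level: i for i, level in enumerate(EDUCATION_ORDER)}

def encode_ordinal(values):
    """Replace each category with its integer rank."""
    return [RANK[v] for v in values]

# Hypothetical survey responses.
people = ["Master's", "High School", "Doctorate", "Bachelor's"]
codes = encode_ordinal(people)
sorted_people = sorted(people, key=RANK.__getitem__)
print(codes)
print(sorted_people)
```

The ranks support comparisons and sorting, but the gaps between them are arbitrary, so arithmetic such as averaging the codes should be interpreted with caution.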
8. Relational Data
Definition: Relational data is stored in a structured way in relational databases, with tables that
represent different entities and relationships between them.
Examples:
Types:
One-to-many: A single record in one table corresponds to many records in another table.
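A one-to-many relationship can be sketched with Python's built-in sqlite3 module: one customer row is referenced by many order rows through a foreign key. The table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# Join the "one" side to the "many" side and summarize orders per customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id), SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()
conn.close()
print(rows)
```

Here customer 1 owns two orders while each order points back to exactly one customer, which is what makes the relationship one-to-many.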
9. Graph Data
Definition: Graph data consists of nodes and edges, used to represent relationships between entities. It
is used to model networks, social connections, or hierarchical structures.
Examples:
Types:
Directed graphs: Graphs where the edges have a direction (e.g., web pages with links pointing to each
other).
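A directed graph such as a set of linked web pages can be stored as an adjacency list, and a breadth-first search then finds every page reachable from a starting page. The page names and the helper reachable are invented for this sketch.

```python
from collections import deque

# Hypothetical site: each key links to the pages in its list (edges are one-way).
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post"],
    "post": [],
    "orphan": ["home"],
}

def reachable(graph, start):
    """Return the set of nodes reachable from `start` by following edge directions."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

from_home = reachable(links, "home")
from_post = reachable(links, "post")
print(sorted(from_home))
print(sorted(from_post))
```

Direction matters: "orphan" links to "home", but nothing links back to it, so it never appears in the set reached from "home".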
10. Big Data
Definition: Big data refers to massive datasets that are too large or complex to be processed by
traditional data processing methods.
Examples:
Types: