Unit V 1
Complex data objects allow you to create data structures that group together different
types of data. Complex data objects are based on complex data types. Complex data types
allow you to create data structures based on basic data objects.
For example, you can create a complex data object called employee that contains different
types of data for employee like id, name, and age. The relationship between complex data
types and complex data objects is analogous to the relationship between classes and
instances in the Java programming language.
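The type-versus-object analogy above can be sketched in a few lines. This is only an illustration, written in Python rather than Java; the `Employee` class and its field values are hypothetical and simply mirror the id, name, and age example.

```python
from dataclasses import dataclass

# Hypothetical complex data type: groups basic values (int, str) into one structure.
@dataclass
class Employee:
    id: int
    name: str
    age: int

# A complex data object is one instance of that type.
alice = Employee(id=1, name="Alice", age=30)
print(alice.name, alice.age)
```

The class plays the role of the complex data type and `alice` plays the role of a complex data object built from it.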
Figure 13-1 shows the relationship between basic and complex data objects.
Before creating a complex data object, you must first define the complex data type that
defines the data structure. For more information about using complex data types, see
Using Complex Data Types to Define Data Structures.
Spatial databases
Spatial databases in data mining handle geographical or spatial information, enabling the
analysis of data with location-based attributes. These databases store and manage spatial
data, allowing for efficient retrieval and processing of information related to specific
locations. In data mining, spatial databases play a crucial role in tasks like spatial
clustering, spatial association rule mining, and spatial outlier detection, contributing to
insights derived from spatial relationships within the data.
Tools for spatial databases in data mining include:
PostGIS: An extension for PostgreSQL that adds support for spatial features.
Oracle Spatial: Part of Oracle Database, it provides spatial data management capabilities.
ESRI ArcGIS: A comprehensive suite of tools for mapping and spatial analysis.
Spatial Data Warehouses: Store large volumes of spatial data for efficient querying and
analysis.
Spatial Indexing Structures: Enhance retrieval speed by organizing spatial data in a way that
supports fast queries.
Spatial Data Mining Algorithms: Specialized algorithms for extracting patterns and
knowledge from spatial data.
Geographic Information System (GIS) Databases: Integrated systems for capturing, storing,
managing, and analyzing spatial or geographic data.
These tools and types facilitate the effective use of spatial information in data mining tasks.
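To make the idea of a spatial indexing structure concrete, here is a minimal sketch of a uniform grid index. It is a toy and does not represent how PostGIS, Oracle Spatial, or ArcGIS index data; the class name `GridIndex`, the cell size, and the sample points are all assumptions made for illustration.

```python
from collections import defaultdict
from math import floor

class GridIndex:
    """Toy uniform-grid spatial index (illustrative only)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> list of (x, y, label)

    def _cell(self, x, y):
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, x, y, label):
        self.cells[self._cell(x, y)].append((x, y, label))

    def query(self, xmin, ymin, xmax, ymax):
        """Return labels of points inside the box, visiting only overlapping cells."""
        c0, r0 = self._cell(xmin, ymin)
        c1, r1 = self._cell(xmax, ymax)
        hits = []
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                for x, y, label in self.cells.get((c, r), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(label)
        return hits

index = GridIndex(cell_size=10.0)
index.insert(2.0, 3.0, "cafe")
index.insert(12.0, 4.0, "library")
index.insert(55.0, 60.0, "airport")
nearby = index.query(0.0, 0.0, 15.0, 10.0)
print(nearby)
```

The query touches only the grid cells that overlap the search box, so the distant point is never examined. That is the basic reason an index speeds up spatial retrieval.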
Temporal databases
Valid Time Databases: Store the time period during which a fact is true in the real world.
Transaction Time Databases: Track the time period during which a particular version of a
fact is stored as current in the database.
Bitemporal Databases: Combine both valid time and transaction time aspects.
Oracle Temporal Database Features: Oracle provides features like Temporal Validity and
Flashback Query for managing temporal data.
Microsoft SQL Server Temporal Tables: SQL Server offers temporal tables to track historical
data changes effectively.
PostgreSQL Temporal Tables: PostgreSQL supports the storage of historical data through
range types and extensions such as temporal_tables.
IBM Db2 Temporal Tables: Db2 supports temporal tables, allowing users to manage
historical data with ease.
Teradata Temporal Database: Teradata provides temporal database support for managing
time-varying data.
Temporal Query Languages: SQL extensions or temporal query languages like TSQL2 are
used for querying temporal databases effectively.
These tools and databases enable data miners to explore time-centric patterns, trends,
and anomalies in datasets, contributing to a more comprehensive analysis of temporal
aspects in data mining tasks.
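The difference between valid time and transaction time is easiest to see in a small bitemporal example. The sketch below is plain Python, not the syntax of any of the products listed; `SalaryFact`, `as_of`, and the sample dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # open-ended period marker (an assumption of this sketch)

@dataclass(frozen=True)
class SalaryFact:
    employee: str
    salary: int
    valid_from: date   # valid time: when the fact is true in the real world
    valid_to: date
    tx_from: date      # transaction time: when the database held this version
    tx_to: date

def as_of(facts, employee, valid_on, known_on):
    """Salary true on `valid_on`, according to what was recorded on `known_on`."""
    for f in facts:
        if (f.employee == employee
                and f.valid_from <= valid_on < f.valid_to
                and f.tx_from <= known_on < f.tx_to):
            return f.salary
    return None

facts = [
    # Recorded on 2024-01-10: salary 5000 from 2024-01-01 onward.
    SalaryFact("ann", 5000, date(2024, 1, 1), FOREVER, date(2024, 1, 10), date(2024, 3, 1)),
    # Correction recorded on 2024-03-01: the salary was 5500 all along.
    SalaryFact("ann", 5500, date(2024, 1, 1), FOREVER, date(2024, 3, 1), FOREVER),
]

before_fix = as_of(facts, "ann", date(2024, 2, 1), known_on=date(2024, 2, 15))
after_fix = as_of(facts, "ann", date(2024, 2, 1), known_on=date(2024, 4, 1))
print(before_fix, after_fix)
```

Both queries ask about the same real-world date, but they differ in which state of the database they consult. A valid-time-only table could not answer the first question once the correction had been stored.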
Multimedia databases
TensorFlow and PyTorch: Popular for deep learning applications, these frameworks are
valuable for tasks like image and speech recognition.
OpenCV: A library for computer vision tasks, OpenCV is instrumental in multimedia data
processing and analysis.
Image Databases: Store and retrieve images, often employing techniques like feature
extraction and content-based image retrieval (CBIR).
Video Databases: Manage and analyze video data, incorporating methods for shot
detection, object tracking, and action recognition.
Audio Databases: Deal with audio content, using techniques such as signal processing and
pattern recognition for tasks like speech recognition and music analysis.
Text and Image Databases: Combine text and image data for comprehensive analysis,
common in applications like multimedia content summarization.
These tools and database types help leverage multimedia data for knowledge discovery
through data mining techniques.
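Content-based image retrieval, mentioned above under image databases, can be sketched with a very simple feature: a gray-level histogram compared by histogram intersection. This is one possible choice of feature and similarity, not a description of any particular system. The toy "images" are flat lists of gray values invented for the example.

```python
def gray_histogram(pixels, bins=4):
    """Normalized histogram of 0-255 gray levels; the feature vector for one image."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy "images": flat lists of gray values (hypothetical data).
database = {
    "dark": [10, 20, 30, 40],
    "bright": [200, 220, 240, 250],
    "mixed": [10, 100, 150, 250],
}
query = [15, 25, 35, 60]

query_hist = gray_histogram(query)
ranked = sorted(
    database,
    key=lambda name: histogram_intersection(query_hist, gray_histogram(database[name])),
    reverse=True,
)
print(ranked)
```

Feature extraction turns each image into a short vector, and retrieval is then a ranking by similarity to the query's vector.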
TIME SERIES
A time series is a collection of values obtained from sequential measurements over time.
Time series data mining makes use of our natural ability to visualize the shape of data.
A time series is an ordered sequence of data points at uniform time intervals.
Time series analysis comprises methods for analyzing time series data in order to extract
meaningful statistics, rules, and patterns. These rules and patterns can be used to build
forecasting models that predict future developments.
A time series database consists of a sequence of values or events that change with time.
Data are recorded at regular intervals.
1. Financial:
2. Industry:
3. Scientific:
4. Meteorological: Concerned with the processes and phenomena of the atmosphere, chiefly
for forecasting weather
1. Trend
2. Cycle
3. Seasonal
4. Irregular
Trend: The general direction in which a time series moves over a long interval of time. It
shows the general tendency of the data to increase or decrease over a long period of time.
Cycle: Long-term oscillations about a trend line or curve, for example business cycles. This
oscillatory movement has a period of more than a year.
Seasonal: Almost identical patterns that a time series appears to follow during corresponding
months of successive years. This variation is present in a time series if the data are
recorded hourly, daily, weekly, or monthly.
Irregular: Fluctuations that are unforeseen, uncontrollable, and unpredictable. They are not
regular variations and are purely random.
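The components described above can be separated with a simple additive decomposition. The sketch below assumes a model of the form value = trend + seasonal + irregular and leaves out the cyclical component for brevity. The function names and the synthetic series are illustrative.

```python
def centered_moving_average(series, window):
    """Trend estimate: mean of `window` points centered on each position (odd window)."""
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = sum(series[i - half:i + half + 1]) / window
    return trend

def seasonal_effects(series, trend, period):
    """Average detrended value at each position within the period."""
    buckets = [[] for _ in range(period)]
    for i, (value, t) in enumerate(zip(series, trend)):
        if t is not None:
            buckets[i % period].append(value - t)
    return [sum(b) / len(b) for b in buckets]

# Synthetic series: linear trend plus a repeating 3-step pattern (-1, 0, +1).
pattern = [-1.0, 0.0, 1.0]
series = [10.0 + 2.0 * i + pattern[i % 3] for i in range(12)]

trend = centered_moving_average(series, window=3)
seasonal = seasonal_effects(series, trend, period=3)
irregular = [
    v - t - seasonal[i % 3]
    for i, (v, t) in enumerate(zip(series, trend))
    if t is not None
]
print(seasonal)
```

Because the synthetic series contains no noise, the irregular component comes out as zero. On real data it would hold whatever the trend and seasonal estimates fail to explain.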
Sequence data in data mining is data in which each point in the dataset depends on other
points in the dataset. A time series, such as a stock price or sensor data, is an example:
each point represents an observation at a specific point in time.
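One common way to measure how strongly a point depends on earlier points is the autocorrelation at a given lag. The function below is a minimal sketch of the usual sample estimate, and the two input sequences are made up for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation: how strongly values relate to values `lag` steps earlier."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance

trend_like = list(range(1, 11))   # each value is close to its predecessor
alternating = [1, -1] * 4         # each value is the opposite of its predecessor

print(autocorrelation(trend_like, 1), autocorrelation(alternating, 1))
```

A value near +1 or -1 signals strong dependence between neighboring points, which is what sets sequence data apart from a set of independent records.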
Text Mining
Using text mining, unstructured text data can be transformed into structured data
that can be used for data mining tasks such as classification, clustering, and association
rule mining. This allows organizations to gain insights from a wide range of data sources,
such as customer feedback, social media posts, and news articles.
Text mining is widely used in various fields, such as natural language processing,
information retrieval, and social media analysis. It has become an essential tool for
organizations to extract insights from unstructured text data and make data-driven
decisions.
Text mining is the process of extracting useful information and nontrivial patterns from a
large volume of text. Various techniques and tools exist to mine text and find the data that
matter for prediction and decision-making. Selecting an appropriate text mining procedure
also improves speed and reduces time complexity.
Pre-processing and data cleansing tasks are performed to detect and eliminate
inconsistency in the data. The data cleansing process makes sure to capture the genuine
text. It removes stop words, applies stemming (the process of identifying the root of a
certain word), and indexes the data.
Processing and controlling tasks are applied to review and further clean the data set.
Information processed in the above steps is utilized to extract important and applicable
data for a powerful and convenient decision-making process and trend analysis.
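The cleansing steps described above (stop-word removal, stemming, and indexing) can be sketched as follows. The stop-word list is deliberately tiny, and `crude_stem` is a simplistic stand-in for a real stemmer such as Porter's. Both are assumptions made for illustration.

```python
import re

STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "was"}  # tiny sample

def crude_stem(word):
    """Strip a few common suffixes; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize, drop stop words, and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

def build_index(docs):
    """Inverted index: stem -> set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index.setdefault(term, set()).add(doc_id)
    return index

docs = {
    "d1": "The customer was praising the delivery",
    "d2": "Customers praised fast deliveries",
}
index = build_index(docs)
print(sorted(index))
```

After stemming, "customer" and "customers" share one index entry, as do "praising" and "praised". The crude rules still miss pairs such as "delivery" and "deliveries", which is why real pipelines use a proper stemmer or lemmatizer.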
Procedures for Analyzing Text Mining
Text Summarization: To automatically extract part of a text's content so that it reflects
the whole.
Text Categorization: To assign the text to one of the categories predefined by users.
Text Clustering: To segment texts into several clusters according to how closely their
content is related.
Information Retrieval
In information retrieval, we process the available documents and text data into a
structured form so that we can apply different pattern recognition and analytical
processes. It is a process of extracting relevant and associated patterns according to a
given set of words or text documents. For this, we use processes such as tokenization of
the document or stemming, in which we extract the base (root) form of each word.
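As an illustration of retrieving documents for a given set of words, the sketch below ranks documents against a query using TF-IDF weights and cosine similarity. TF-IDF is a common choice rather than something the notes prescribe. The smoothed IDF formula, the function names, and the three sample documents are assumptions.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def tf_idf_vectors(docs):
    """Return per-document TF-IDF vectors and the IDF table."""
    tokenized = {d: tokenize(t) for d, t in docs.items()}
    n = len(docs)
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = {
        d: {term: tf * idf[term] for term, tf in Counter(toks).items()}
        for d, toks in tokenized.items()
    }
    return vectors, idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query, docs):
    """Rank document ids by cosine similarity to the query."""
    vectors, idf = tf_idf_vectors(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    return sorted(docs, key=lambda d: cosine(q, vectors[d]), reverse=True)

docs = {
    "d1": "data mining finds patterns in data",
    "d2": "web usage logs record user visits",
    "d3": "text mining extracts patterns from text",
}
results = search("text patterns", docs)
print(results)
```

The document that shares both query terms ranks first, the one that shares a single term ranks second, and the unrelated document ranks last.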
Information Extraction
Feature Extraction – In this process, we try to develop some new features from existing
ones. This objective can be achieved by parsing an existing feature or combining two or
more features based on some mathematical operation.
Feature Selection – In this process, we reduce the dimensionality of the dataset, which
tends to be high for text data, by selecting a subset of features from the whole dataset.
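Both steps can be sketched on a tiny corpus. Here feature extraction combines adjacent words into bigram features, and feature selection keeps only the features that occur in at least `min_df` documents. The document-frequency threshold is just one simple selection criterion, and the function names and corpus are illustrative.

```python
from collections import Counter

def extract_features(tokens):
    """Feature extraction: derive bigram features by combining adjacent unigrams."""
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

def select_features(docs_features, min_df):
    """Feature selection: keep only features present in at least `min_df` documents."""
    df = Counter(f for feats in docs_features for f in set(feats))
    return {f for f, count in df.items() if count >= min_df}

corpus = [
    "data mining is fun".split(),
    "text mining is useful".split(),
    "data mining is useful".split(),
]
features = [extract_features(doc) for doc in corpus]
vocabulary = set().union(*features)
selected = select_features(features, min_df=2)
print(len(vocabulary), len(selected))
```

Extraction enlarges the feature set with derived features, and selection then shrinks it again by discarding features that are too rare to be informative.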
Graph Mining
Graph Clustering:
Grouping nodes or subgraphs based on similarity, revealing communities or clusters within
the graph.
Anomaly Detection:
Graph Classification:
NetworkX:
A Python library for creating, analyzing, and visualizing complex networks and graphs.
Gephi:
An open-source software for exploring and visualizing networks, offering various algorithms
for graph analysis.
Igraph:
A library for creating, manipulating, and analyzing large-scale graphs in languages like R,
Python, and C.
Neo4j:
A graph database that allows for efficient storage and querying of graph-structured data.
Cytoscape:
A platform for visualizing molecular interaction networks and integrating with various data
sources for analysis.
Graph-tool:
These tools cater to different aspects of graph mining, providing functionalities ranging
from basic analysis to advanced mining techniques.
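Graph clustering, described earlier in this section, can be illustrated without any of the libraries listed. The sketch below uses one very simple strategy: discard edges whose similarity falls below a threshold, then treat each connected component as a cluster. The function name, the threshold, and the sample graph are assumptions. Dedicated tools offer far more refined community-detection algorithms.

```python
def cluster_by_similarity(nodes, weighted_edges, threshold):
    """Keep edges with similarity >= threshold; connected components are the clusters."""
    adjacency = {n: set() for n in nodes}
    for u, v, similarity in weighted_edges:
        if similarity >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)

    clusters, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters

nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.1), ("d", "e", 0.7)]
clusters = cluster_by_similarity(nodes, edges, threshold=0.5)
print(clusters)
```

The weak edge between "c" and "d" is dropped, so the graph splits into two groups of mutually similar nodes.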
Web Mining
Web mining is the application of data mining techniques to automatically discover and
extract information from web documents and services. Its main purpose is to discover useful
information from the World Wide Web and its usage patterns.
Web mining is the process of discovering patterns, structures, and relationships in web
data. It involves using data mining techniques to analyze web data and extract valuable
insights. The applications of web mining are wide-ranging and include:
Personalized marketing:
Web mining can be used to analyze customer behavior on websites and social media
platforms. This information can be used to create personalized marketing campaigns that
target customers based on their interests and preferences.
E-commerce:
Web mining can be used to analyze customer behavior on e-commerce websites. This
information can be used to improve the user experience and increase sales by
recommending products based on customer preferences.
Search engine optimization:
Web mining can be used to analyze search engine queries and search engine results pages
(SERPs). This information can be used to improve the visibility of websites in search engine
results and increase traffic to the website.
Fraud detection:
Web mining can be used to detect fraudulent activity on websites. This information can be
used to prevent financial fraud, identity theft, and other types of online fraud.
Sentiment analysis:
Web mining can be used to analyze social media data and extract sentiment from posts,
comments, and reviews. This information can be used to understand customer sentiment
towards products and services and make informed business decisions.
Content analysis:
Web mining can be used to analyze web content and extract valuable information such as
keywords, topics, and themes. This information can be used to improve the relevance of
web content and optimize search engine rankings.
Customer service:
Web mining can be used to analyze customer service interactions on websites and social
media platforms. This information can be used to improve the quality of customer service
and identify areas for improvement.
Healthcare:
Web mining can be used to analyze health-related websites and extract valuable
information about diseases, treatments, and medications. This information can be used to
improve the quality of healthcare and inform medical research.
Web mining can be broadly divided into three types of mining techniques: Web Content
Mining, Web Structure Mining, and Web Usage Mining. These are explained below.
Web Content Mining: Web content mining is the extraction of useful information from the
content of web documents. Web content consists of several types of data, such as text,
images, audio, and video. Content data is the group of facts that a web page is designed to
convey. It can provide effective and interesting patterns about user needs. Mining text
documents draws on text mining, machine learning, and natural language processing, so this
kind of mining is also known as text mining. It scans and mines the text, images, and groups
of web pages according to the content of the input.
Web Structure Mining: Web structure mining is the discovery of structure information from
the web. The structure of the web graph consists of web pages as nodes and hyperlinks as
edges connecting related pages. Structure mining produces a structured summary of a
particular website. It identifies relationships between web pages linked by information or
by direct link connections. Web structure mining can be very useful for determining the
connection between two commercial websites.
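A well-known example of mining the web graph is PageRank, which scores each page by the links pointing to it. The notes do not name a specific algorithm, so PageRank is brought in here purely as an illustration. The sketch is a basic power iteration that assumes every link target also appears as a key in `links`, and the page names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home"],
    "orphan": ["home"],
}
rank = pagerank(links)
top_page = max(rank, key=rank.get)
print(top_page)
```

Pages are the nodes and hyperlinks are the edges, exactly as in the web graph described above. A page that many others link to ends up with the highest score.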
Web Usage Mining: Web usage mining is the discovery of interesting usage patterns from
large data sets. These patterns help you understand user behavior. In web usage mining,
data about user access to the web is collected in the form of logs, so web usage mining is
also called log mining.
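Log mining can be sketched by parsing a few access-log lines and counting page visits. The lines below are invented samples in the widely used Common Log Format, and the regular expression covers only that simple shape. Real server logs often carry extra fields.

```python
import re
from collections import Counter, defaultdict

# host ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$')

def parse(line):
    """Return (ip, method, path, status) or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if not match:
        return None
    ip, method, path, status = match.groups()
    return ip, method, path, int(status)

log_lines = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2024:10:00:05 +0000] "GET /products HTTP/1.1" 200 2048',
    '10.0.0.2 - - [01/Jan/2024:10:01:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:01:30 +0000] "GET /missing HTTP/1.1" 404 0',
]

page_hits = Counter()
paths_by_visitor = defaultdict(list)
for line in log_lines:
    record = parse(line)
    if record is None:
        continue
    ip, _method, path, status = record
    if status == 200:  # count successful requests only
        page_hits[path] += 1
        paths_by_visitor[ip].append(path)

print(page_hits.most_common(1))
```

The per-visitor path lists are the raw material for usage patterns such as common navigation sequences.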
Applications and Trends in Data Mining
Retail Industry
Data mining has great application in the retail industry because the
industry collects large amounts of data on sales, customer purchasing
history, goods transportation, consumption, and services. The quantity
of data collected will naturally continue to expand rapidly because of
the increasing ease, availability, and popularity of the web.
Customer Retention.
Telecommunication Industry
Today the telecommunication industry is one of the fastest-emerging
industries, providing various services such as fax, pager, cellular
phone, internet messenger, images, e-mail, web data transmission,
etc. Due to the development of new computer and communication
technologies, the telecommunication industry is rapidly expanding.
This is why data mining has become very important in helping to
understand the business.
Application Exploration.
Web mining.