Unit V 1

The document discusses various types of databases used in data mining, including complex, spatial, temporal, multimedia, and time series databases, each serving specific analytical purposes. It also covers text mining techniques for extracting insights from unstructured text data and graph mining for analyzing relationships in graph-structured data. Additionally, web mining is explored, detailing its applications and processes for discovering patterns from web data.

Uploaded by

Aflah Sidhik
Copyright
© © All Rights Reserved

UNIT V DATA WAREHOUSING AND DATA MINING SOFTWARE AND APPLICATIONS

Complex Data Objects

Complex data objects allow you to create data structures that group together different
types of data. Complex data objects are based on complex data types. Complex data types
allow you to create data structures based on basic data objects.

For example, you can create a complex data object called employee that contains different
types of data about an employee, such as id, name, and age. The relationship between complex
data types and complex data objects is analogous to the relationship between classes and
instances in the Java programming language.
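
The class/instance analogy can be sketched in a few lines. The example below uses a Python dataclass rather than Java; the field names come from the employee example above, while the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Employee:
    """Complex data type: groups basic values of different types."""
    id: int
    name: str
    age: int


# Complex data objects are instances of the complex data type.
# The values below are hypothetical sample data.
alice = Employee(id=1, name="Alice", age=30)
bob = Employee(id=2, name="Bob", age=41)

print(alice.name, alice.age)
```

Here Employee plays the role of the complex data type, and alice and bob are complex data objects built from it.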

Figure 13-1 shows the relationship between basic and complex data objects.

Figure 13-1 Relationship Between Basic and Complex Data Objects

Before creating a complex data object, you must first define the complex data type that
defines the data structure. For more information about using complex data types, see
Using Complex Data Types to Define Data Structures.

Spatial databases

Spatial databases in data mining handle geographical or spatial information, enabling the
analysis of data with location-based attributes. These databases store and manage spatial
data, allowing for efficient retrieval and processing of information related to specific
locations. In data mining, spatial databases play a crucial role in tasks like spatial
clustering, spatial association rule mining, and spatial outlier detection, contributing to
insights derived from spatial relationships within the data.
Tools for spatial databases in data mining include:

PostGIS: An extension for PostgreSQL that adds support for spatial features.

Oracle Spatial: Part of Oracle Database, it provides spatial data management capabilities.

GeoMesa: A geospatial analytics toolkit designed for distributed computing frameworks
like Apache Hadoop and Apache Accumulo.

Spatialite: A lightweight spatial database engine that works as a self-contained library.

ESRI ArcGIS: A comprehensive suite of tools for mapping and spatial analysis.

Types of spatial databases in data mining:

Spatial Data Warehouses: Store large volumes of spatial data for efficient querying and
analysis.

Spatial Indexing Structures: Enhance retrieval speed by organizing spatial data in a way that
supports fast queries.

Spatial Data Mining Algorithms: Specialized algorithms for extracting patterns and
knowledge from spatial data.

Geographic Information System (GIS) Databases: Integrated systems for capturing, storing,
managing, and analyzing spatial or geographic data.
These tools and types facilitate the effective use of spatial information in data mining tasks.
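
As a rough illustration of a location-based query, the sketch below computes great-circle distances with the haversine formula and returns the points within a given radius of a centre. It is a plain-Python stand-in, not how PostGIS or any other tool listed above works internally; the store names, coordinates, and the function names haversine_km and neighbours_within are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def neighbours_within(points, centre, radius_km):
    """Return the names of points that lie within radius_km of centre."""
    clat, clon = centre
    return [name for name, (lat, lon) in points.items()
            if haversine_km(clat, clon, lat, lon) <= radius_km]


# Hypothetical sample locations (name -> (latitude, longitude)).
stores = {
    "A": (0.0, 0.0),
    "B": (0.0, 0.5),
    "C": (0.0, 5.0),
}

nearby = neighbours_within(stores, centre=(0.0, 0.0), radius_km=100.0)
print(nearby)
```

A real spatial database would answer the same query with a spatial index instead of scanning every point.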

Temporal databases in data mining


Temporal databases play a crucial role in data mining by incorporating time-related
information into the analysis. They allow for the storage and retrieval of data at different
points in time, enabling a more comprehensive understanding of trends, patterns, and
changes over time. In data mining, temporal databases can be utilized to identify temporal
patterns, detect anomalies, and make predictions based on historical data. Temporal
aspects enhance the accuracy and relevance of mining results, especially in dynamic
environments where data evolves over time.

Types and tools for temporal databases in data mining:

Types of Temporal Databases:

Valid Time Databases: Store the time period during which a fact is true in the real world.

Transaction Time Databases: Track the time period during which a fact is stored as current
in the database.

Bitemporal Databases: Combine both valid time and transaction time aspects.
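
The two time dimensions can be made concrete with a small sketch. It assumes a hypothetical salary table in which each row carries a valid-time period and a transaction-time period; the class SalaryFact, the function salary_as_of, and all dates and amounts are invented for illustration and do not correspond to any particular product.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # stands in for "no end date yet"


@dataclass
class SalaryFact:
    employee: str
    salary: int
    valid_from: date   # valid time: when the fact holds in the real world
    valid_to: date
    tx_from: date      # transaction time: when the database held this row
    tx_to: date


def salary_as_of(rows, employee, valid_on, known_on):
    """Salary valid on `valid_on`, according to what the database stored on `known_on`."""
    for r in rows:
        if (r.employee == employee
                and r.valid_from <= valid_on < r.valid_to
                and r.tx_from <= known_on < r.tx_to):
            return r.salary
    return None


# Hypothetical history: the salary was first recorded as 50000 and later corrected to 52000.
rows = [
    SalaryFact("alice", 50000, date(2023, 1, 1), FOREVER, date(2023, 1, 10), date(2023, 3, 1)),
    SalaryFact("alice", 52000, date(2023, 1, 1), FOREVER, date(2023, 3, 1), FOREVER),
]

before_fix = salary_as_of(rows, "alice", date(2023, 2, 1), known_on=date(2023, 2, 15))
after_fix = salary_as_of(rows, "alice", date(2023, 2, 1), known_on=date(2023, 4, 1))
print(before_fix, after_fix)
```

Both queries ask about the same real-world date; only the transaction-time point differs, which is why the answers differ.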

Tools for Temporal Databases in Data Mining:

Oracle Temporal Database Features: Oracle provides features like Temporal Validity and
Flashback Query for managing temporal data.

Microsoft SQL Server Temporal Tables: SQL Server offers system-versioned temporal tables to
track historical data changes.

PostgreSQL Temporal Tables: PostgreSQL can store historical data through extensions such as
temporal_tables, with range types available to describe validity periods.

IBM Db2 Temporal Tables: Db2 supports temporal tables, allowing users to manage
historical data with ease.

Teradata Temporal Database: Teradata provides temporal database support for managing
time-varying data.

Temporal Database Management Systems (TDBMS): Some systems, like bitemporal
databases, are specifically designed for managing temporal data efficiently.

Temporal Query Languages: SQL extensions or temporal query languages like TSQL2 are
used for querying temporal databases effectively.

These tools and databases enable data miners to explore time-centric patterns, trends,
and anomalies in datasets, contributing to a more comprehensive analysis of temporal
aspects in data mining tasks.

Multimedia databases in data mining


Multimedia databases in data mining involve analyzing and extracting patterns from diverse
forms of data, such as images, videos, audio, and more. Techniques like content-based
retrieval, clustering, and classification are adapted to handle multimedia content for tasks
like image recognition, video analysis, and audio processing. Efficient storage and retrieval
methods, as well as feature extraction algorithms, play crucial roles in managing
multimedia data within the context of data mining.

Tools and types of Multimedia databases in data mining

Tools for Multimedia Databases in Data Mining:


MATLAB: Widely used for image and signal processing, MATLAB offers powerful tools for
multimedia data analysis.

TensorFlow and PyTorch: Popular for deep learning applications, these frameworks are
valuable for tasks like image and speech recognition.

OpenCV: A library for computer vision tasks, OpenCV is instrumental in multimedia data
processing and analysis.

Types of Multimedia Databases in Data Mining:

Image Databases: Store and retrieve images, often employing techniques like feature
extraction and content-based image retrieval (CBIR).

Video Databases: Manage and analyze video data, incorporating methods for shot
detection, object tracking, and action recognition.

Audio Databases: Deal with audio content, using techniques such as signal processing and
pattern recognition for tasks like speech recognition and music analysis.

Text and Image Databases: Combine text and image data for comprehensive analysis,
common in applications like multimedia content summarization.

3D Model Databases: Handle three-dimensional models, important in fields like
computer-aided design (CAD) and virtual reality.

These tools and database types help leverage multimedia data for knowledge discovery
through data mining techniques.
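
A very small sketch of content-based image retrieval is given below. It assumes each image is just a flat list of grey levels and compares images by histogram intersection, which is only one of many possible similarity measures. The image data and the function names grey_histogram and intersection are hypothetical.

```python
def grey_histogram(pixels, bins=4, max_value=256):
    """Normalised histogram of grey levels (0..max_value-1) for a flat list of pixels."""
    counts = [0] * bins
    width = max_value / bins
    for p in pixels:
        counts[min(int(p / width), bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical tiny "images", each a flat list of grey levels.
database = {
    "dark": [10, 20, 30, 40],
    "bright": [200, 220, 240, 250],
    "mixed": [10, 100, 150, 250],
}
query = [15, 25, 35, 60]

q_hist = grey_histogram(query)
# Rank stored images from most to least similar to the query image.
ranked = sorted(database,
                key=lambda name: intersection(q_hist, grey_histogram(database[name])),
                reverse=True)
print(ranked)
```

In practice, libraries such as OpenCV extract much richer features, but the retrieval idea (extract features, then rank by similarity) stays the same.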

TIME SERIES
A time series is a collection of values obtained from sequential measurements over time.
Time series data mining builds on our natural ability to visualize the shape of data. A time
series is an ordered sequence of data points, typically recorded at uniform time intervals.

Time Series Analysis comprises methods for analyzing time-series data in order to extract
meaningful statistics, rules and patterns. These rules and patterns might be used to build
forecasting models that are able to predict future developments.

Does the database play a vital role in time series mining?

A database is a collection of data gathered from different sources, in which the data are
stored in structured or unstructured form.

Time Series database consists of a sequence of values or events changing with time. Data
are recorded at regular intervals.

Application of Time Series Mining:

1. Financial:

1.1 Used for stock price evaluation

1.2 For the measurement of Inflation

2. Industry:

2.1 Determine the power consumption

3. Scientific:

3.1 Used for experiment results

4. Meteorological:

4.1 Concerned with the processes and phenomena of the atmosphere, basically for
forecasting weather

Components of a time series:

1. Trend

2. Cycle

3. Seasonal

4. Irregular

Category of Time-Series Movements:


1. Long-term or trend movements :

The general direction in which a time series is moving over a long interval of time. It shows
the general tendency of the data to increase or decrease over a long period of time.

2. Cyclic movements or cycle variations:

Long-term oscillations about a trend line or curve, for example business cycles. These
oscillations have a period of more than a year.

3. Seasonal movements or seasonal variations:

Almost identical patterns that a time series appears to follow during corresponding months
of successive years. This variation will be present in a time series if the data are recorded
hourly, daily, weekly or monthly.

4. Irregular or random movements:

These fluctuations are unforeseen, uncontrollable and unpredictable. They are not regular
variations and are purely random or irregular.
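
The movements above can be separated with a simple additive decomposition. The sketch below assumes the series is the sum of trend, seasonal, and irregular parts, estimates the trend with a centred moving average, and averages the detrended values per season. The data are synthetic, the cyclic component is ignored, and the function names are hypothetical.

```python
def centred_moving_average(values, window):
    """Trend estimate: mean of `window` points centred on each position (odd window)."""
    half = window // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        trend[i] = sum(values[i - half:i + half + 1]) / window
    return trend


def seasonal_effects(values, trend, period):
    """Average detrended value for each position within the seasonal period."""
    buckets = [[] for _ in range(period)]
    for i, (v, t) in enumerate(zip(values, trend)):
        if t is not None:
            buckets[i % period].append(v - t)
    return [sum(b) / len(b) for b in buckets]


# Synthetic series: linear trend (0, 1, 2, ...) plus a repeating season of (+2, 0, -2).
values = [2, 1, 0, 5, 4, 3, 8, 7, 6]
trend = centred_moving_average(values, window=3)
seasonal = seasonal_effects(values, trend, period=3)
# Whatever is left after removing trend and season is the irregular movement.
irregular = [v - t - seasonal[i % 3]
             for i, (v, t) in enumerate(zip(values, trend)) if t is not None]

print(trend)
print(seasonal)
```

Using a window equal to the seasonal period makes the seasonal effects cancel inside each average, which is why the moving average isolates the trend here.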

What is Sequence Data in Data Mining?

Sequence data in data mining is data in which the points in the dataset depend on other
points in the dataset. A time series, such as a stock price or sensor data, is an example
of this, where each point represents an observation at a specific point in time.
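
One way to see this dependence between neighbouring points is the lag-1 autocorrelation, which measures how strongly each value is related to the one before it. The sketch below is a minimal illustration with made-up numbers; lag1_autocorrelation is a hypothetical helper name.

```python
def lag1_autocorrelation(values):
    """Correlation between each value and its predecessor (lag 1)."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


rising = lag1_autocorrelation([1, 2, 3, 4, 5])      # steadily rising readings
alternating = lag1_autocorrelation([1, -1, 1, -1])  # values that flip sign each step

print(rising, alternating)
```

A positive value means neighbouring points tend to move together, a negative value means they tend to alternate, and a value near zero suggests little dependence on order.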

What is Text Mining?


Text mining is a component of data mining that deals specifically with unstructured text
data. It involves the use of natural language processing (NLP) techniques to extract useful
information and insights from large amounts of unstructured text data. Text mining can be
used as a preprocessing step for data mining or as a standalone process for specific tasks.

By using text mining, the unstructured text data can be transformed into structured data
that can be used for data mining tasks such as classification, clustering, and association
rule mining. This allows organizations to gain insights from a wide range of data sources,
such as customer feedback, social media posts, and news articles.

What is the common usage of Text Mining?

Text mining is widely used in various fields, such as natural language processing,
information retrieval, and social media analysis. It has become an essential tool for
organizations to extract insights from unstructured text data and make data-driven
decisions.

“Extraction of interesting information or patterns from data in large databases is known as
data mining.”

Text mining is the process of extracting useful information and nontrivial patterns from a
large volume of text. Various strategies and tools exist to mine text and find important data
for prediction and decision-making. Selecting the right text mining procedure also improves
the speed and time complexity of the analysis. This section briefly discusses text mining
and its applications in diverse fields.

“Text Mining is the procedure of synthesizing information, by analyzing relations, patterns,
and rules among textual data.”

As discussed above, the volume of information is expanding at an exponential rate. Today
institutes, companies, and other organizations store their information electronically. A
huge collection of data is available on the internet and stored in digital libraries,
database repositories, and other textual sources such as websites, blogs, social media
networks, and e-mails. It is a difficult task to determine appropriate patterns and trends
to extract knowledge from this large volume of data. Text mining is a part of data mining
that extracts valuable information from a text database repository. It is a
multi-disciplinary field based on information retrieval, data mining, AI, statistics,
machine learning, and computational linguistics.

Conventional Process of Text Mining

Gathering unstructured information from various sources, available in various document
formats, for example plain text, web pages, PDF records, etc.

Pre-processing and data cleansing tasks are performed to detect and eliminate
inconsistency in the data. The data cleansing process makes sure to capture the genuine
text; it includes removing stop words, stemming (the process of identifying the root of a
certain word), and indexing the data.

Processing and controlling tasks are applied to review and further clean the data set.

Pattern analysis is implemented in a Management Information System.

Information processed in the above steps is utilized to extract important and applicable
data for a powerful and convenient decision-making process and trend analysis.
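
The pre-processing step described above can be sketched as follows. The stop-word list is deliberately tiny, and naive_stem is a crude suffix stripper standing in for a real stemming algorithm such as Porter's; both are simplifying assumptions, and the sample sentence is made up.

```python
import re

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in"}

# Naive suffix stripping, a stand-in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, remove stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The customers are returning the damaged products.")
print(tokens)
```

The stems need not be dictionary words; what matters is that different forms of a word map to the same token.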
Procedures for Analyzing Text Mining

Text Summarization: To automatically extract part of a text's content so that it reflects
the whole content.

Text Categorization: To assign a category to the text among categories predefined by users.

Text Clustering: To segment texts into several clusters according to the similarity of
their content.

Text Mining Techniques

Information Retrieval
In information retrieval, we process the available documents and text data into a
structured form so that different pattern recognition and analytical processes can be
applied. It is the process of extracting relevant and associated patterns according to a
given set of words or text documents. Typical steps include tokenization of the document
and stemming, in which we extract the base (root) form of each word.

Information Extraction

It is a process of extracting meaningful words from documents.

Feature Extraction – In this process, we try to develop some new features from existing
ones. This objective can be achieved by parsing an existing feature or combining two or
more features based on some mathematical operation.

Feature Selection – In this process, we try to reduce the dimensionality of the dataset
which is generally a common issue while dealing with the text data by selecting a subset of
features from the whole dataset.
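
The two ideas can be illustrated on a toy corpus. In this sketch, feature extraction turns each tokenized document into bag-of-words counts, and feature selection keeps only the terms that occur in a minimum number of documents. The documents, the threshold, and the function names are assumptions chosen for illustration; document frequency is only one of several possible selection criteria.

```python
from collections import Counter


def term_counts(docs):
    """Feature extraction: one bag-of-words Counter per tokenized document."""
    return [Counter(doc) for doc in docs]


def select_by_document_frequency(docs, min_df):
    """Feature selection: keep terms that appear in at least min_df documents."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return sorted(term for term, n in df.items() if n >= min_df)


# Hypothetical, already tokenized product reviews.
docs = [
    ["cheap", "phone", "good", "battery"],
    ["good", "camera", "phone"],
    ["bad", "battery", "phone"],
]

vocab = select_by_document_frequency(docs, min_df=2)
# Each document becomes a vector of counts over the selected vocabulary.
vectors = [[counts[t] for t in vocab] for counts in term_counts(docs)]

print(vocab)
print(vectors)
```

The six distinct words in the corpus are reduced to three features, which shows how selection lowers the dimensionality of the dataset.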

Graph mining in data mining


Graph mining in data mining involves analyzing and extracting patterns, relationships, and
structures from graph-structured data. It’s commonly used to uncover insights in various
domains such as social networks, biological networks, and transportation systems.
Techniques include frequent subgraph mining, community detection, and graph clustering
to reveal meaningful information within complex interconnected data sets.

Types and tools of graph mining in data mining

Types of Graph Mining:

Frequent Subgraph Mining:

Identifying recurring patterns or subgraphs that appear frequently in a dataset.

Graph Clustering:
Grouping nodes or subgraphs based on similarity, revealing communities or clusters within
the graph.

Graph Pattern Matching:

Locating specific patterns or structures within a graph, aiding in pattern recognition.

Anomaly Detection:

Detecting irregularities or outliers in graphs that deviate from the norm.

Graph Classification:

Assigning labels to entire graphs based on certain characteristics or properties.
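
As a minimal illustration of grouping nodes, the sketch below finds the connected components of an undirected graph with breadth-first search. This is the simplest possible form of graph clustering, not a full community detection algorithm; the edge list and the function name connected_components are hypothetical.

```python
from collections import deque


def connected_components(edges):
    """Group the nodes of an undirected graph into connected components using BFS."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(group))
    return components


# Hypothetical friendship edges in a small social network.
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
groups = connected_components(edges)
print(groups)
```

Libraries such as NetworkX provide this and more advanced clustering methods ready-made.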

Tools for Graph Mining:

NetworkX:

A Python library for creating, analyzing, and visualizing complex networks and graphs.

Gephi:

An open-source software for exploring and visualizing networks, offering various algorithms
for graph analysis.

Igraph:

A library for creating, manipulating, and analyzing large-scale graphs in languages like R,
Python, and C.

Neo4j:

A graph database that allows for efficient storage and querying of graph-structured data.
Cytoscape:

A platform for visualizing molecular interaction networks and integrating with various data
sources for analysis.

Graph-tool:

A Python library for efficient manipulation and statistical analysis of graphs.

These tools cater to different aspects of graph mining, providing functionalities ranging
from basic analysis to advanced mining techniques.

Web Mining
Web mining is the application of data mining techniques to automatically discover and
extract information from Web documents and services. The main purpose of web mining is
discovering useful information from the World-Wide Web and its usage patterns.

Applications of Web Mining:

Web mining is the process of discovering patterns, structures, and relationships in web
data. It involves using data mining techniques to analyze web data and extract valuable
insights. The applications of web mining are wide-ranging and include:

Personalized marketing:

Web mining can be used to analyze customer behavior on websites and social media
platforms. This information can be used to create personalized marketing campaigns that
target customers based on their interests and preferences.

E-commerce
Web mining can be used to analyze customer behavior on e-commerce websites. This
information can be used to improve the user experience and increase sales by
recommending products based on customer preferences.

Search engine optimization:

Web mining can be used to analyze search engine queries and search engine results pages
(SERPs). This information can be used to improve the visibility of websites in search engine
results and increase traffic to the website.

Fraud detection:

Web mining can be used to detect fraudulent activity on websites. This information can be
used to prevent financial fraud, identity theft, and other types of online fraud.

Sentiment analysis:

Web mining can be used to analyze social media data and extract sentiment from posts,
comments, and reviews. This information can be used to understand customer sentiment
towards products and services and make informed business decisions.

Web content analysis:

Web mining can be used to analyze web content and extract valuable information such as
keywords, topics, and themes. This information can be used to improve the relevance of
web content and optimize search engine rankings.

Customer service:
Web mining can be used to analyze customer service interactions on websites and social
media platforms. This information can be used to improve the quality of customer service
and identify areas for improvement.

Healthcare:

Web mining can be used to analyze health-related websites and extract valuable
information about diseases, treatments, and medications. This information can be used to
improve the quality of healthcare and inform medical research.

Process of Web Mining:

Web mining can be broadly divided into three different types of mining techniques: Web
Content Mining, Web Structure Mining, and Web Usage Mining. These are explained below.

Web Content Mining: Web content mining is the process of extracting useful information
from the content of web documents. Web content consists of several types of data: text,
image, audio, video, etc. Content data is the collection of facts that a web page is
designed to convey. It can provide effective and interesting patterns about user needs.
Mining text documents draws on text mining, machine learning, and natural language
processing, so web content mining of text is closely related to text mining. This type of
mining scans and mines the text, images, and groups of web pages according to the content
of the input.

Web Structure Mining: Web structure mining is the process of discovering structure
information from the web. The structure of the web graph consists of web pages as nodes
and hyperlinks as edges connecting related pages. Structure mining produces a structured
summary of a particular website. It identifies relationships between web pages that are
linked by related information or by a direct hyperlink. Web structure mining can be very
useful for determining the connection between two commercial websites.
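
To illustrate mining the web graph, the sketch below ranks pages by their link structure using a simplified PageRank computed by power iteration. PageRank is not named in the text above; it is used here only as one well-known example of link analysis. The three-page site, the damping factor, and the iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each page shares its rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: pages are nodes, hyperlinks are directed edges.
links = {
    "home": ["products", "about"],
    "products": ["home"],
    "about": ["home"],
}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best, ranks)
```

The page that receives links from both other pages ends up with the highest score.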

Web Usage Mining: Web usage mining is the process of identifying or discovering
interesting usage patterns from large data sets. These patterns help in understanding
user behavior. In web usage mining, users access data on the web, and these accesses are
recorded in the form of logs. Web usage mining is therefore also called log mining.
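
A first step in log mining is counting which pages are requested successfully. The sketch below assumes the server writes lines in the Common Log Format; the log lines, addresses, and page names are invented, and page_hits is a hypothetical helper.

```python
import re
from collections import Counter

# Matches the host, request path, and status code of a Common Log Format line.
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+) [^"]*" (\d{3})')


def page_hits(log_lines):
    """Count successful (status 200) requests per page."""
    hits = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match and match.group(3) == "200":
            hits[match.group(2)] += 1
    return hits


# Hypothetical access log entries.
log = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /products.html HTTP/1.1" 200 2048',
    '10.0.0.1 - - [01/Jan/2024:10:00:09 +0000] "GET /products.html HTTP/1.1" 200 2048',
    '10.0.0.3 - - [01/Jan/2024:10:00:12 +0000] "GET /missing.html HTTP/1.1" 404 512',
]

hits = page_hits(log)
print(hits.most_common(1))
```

Further usage mining, such as grouping requests into sessions per visitor, builds on the same parsed fields.
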

Applications and Trends in Data Mining

Data mining is widely used in diverse areas. A number of commercial data mining systems
are available today, yet many challenges remain in this field. This section discusses
the applications and trends of data mining.

Data Mining Applications


Here is the list of areas where data mining is widely used:

Financial Data Analysis
Retail Industry
Telecommunication Industry
Biological Data Analysis
Other Scientific Applications
Intrusion Detection

Financial Data Analysis
The financial data in banking and financial industry is generally
reliable and of high quality which facilitates systematic data
analysis and data mining. Some of the typical cases are as follows –

Design and construction of data warehouses for multidimensional data analysis and data
mining.

Loan payment prediction and customer credit policy analysis.

Classification and clustering of customers for targeted marketing.

Detection of money laundering and other financial crimes.

Retail Industry
Data mining has great application in the retail industry because the industry collects
large amounts of data on sales, customer purchasing history, goods transportation,
consumption, and services. It is natural that the quantity of data collected will
continue to expand rapidly because of the increasing ease, availability, and popularity
of the web.

Data mining in the retail industry helps in identifying customer buying patterns and
trends that lead to improved quality of customer service and good customer retention and
satisfaction. Here is the list of examples of data mining in the retail industry:

Design and construction of data warehouses based on the benefits of data mining.

Multidimensional analysis of sales, customers, products, time and region.

Analysis of effectiveness of sales campaigns.

Customer Retention.

Product recommendation and cross-referencing of items.
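
Product recommendation and cross-referencing of items are often based on association rules. The sketch below computes support and confidence for item pairs from a handful of baskets; it is a simplified, pair-only illustration rather than a full algorithm such as Apriori. The baskets, the threshold, and the function name pair_rules are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support):
    """Return {(a, b): (support, confidence)} for rules a -> b over item pairs."""
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_count.update(items)
        pair_count.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of baskets containing both items
        if support >= min_support:
            rules[(a, b)] = (support, count / item_count[a])  # confidence of a -> b
            rules[(b, a)] = (support, count / item_count[b])  # confidence of b -> a
    return rules


# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]

rules = pair_rules(baskets, min_support=0.5)
print(rules[("butter", "bread")])
```

In this toy data, every basket with butter also contains bread, so the rule butter -> bread has confidence 1.0 and could drive a recommendation.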

Telecommunication Industry
Today the telecommunication industry is one of the fastest-growing industries, providing
various services such as fax, pager, cellular phone, internet messenger, images, e-mail,
web data transmission, etc. Due to the development of new computer and communication
technologies, the telecommunication industry is rapidly expanding. This is why data
mining has become very important in helping to understand the business.

Data mining in the telecommunication industry helps in identifying telecommunication
patterns, catching fraudulent activities, making better use of resources, and improving
quality of service. Here is the list of examples for which data mining improves
telecommunication services:

Multidimensional Analysis of Telecommunication data.

Fraudulent pattern analysis.

Identification of unusual patterns.

Multidimensional association and sequential patterns analysis.

Mobile Telecommunication services.

Use of visualization tools in telecommunication data analysis.

Biological Data Analysis


In recent times, we have seen tremendous growth in fields of biology such as genomics,
proteomics, functional genomics, and biomedical research. Biological data mining is a
very important part of bioinformatics. The following are the aspects in which data mining
contributes to biological data analysis:

Semantic integration of heterogeneous, distributed genomic and proteomic databases.

Alignment, indexing, similarity search and comparative analysis of multiple nucleotide
sequences.

Discovery of structural patterns and analysis of genetic networks and protein pathways.

Association and path analysis.

Visualization tools in genetic data analysis.


Trends in Data Mining
Data mining concepts are still evolving and here are the latest trends
that we get to see in this field –

Application Exploration.

Scalable and interactive data mining methods.

Integration of data mining with database systems, data warehouse systems and web
database systems.

Standardization of data mining query language.


Visual data mining.

New methods for mining complex types of data.

Biological data mining.

Data mining and software engineering.

Web mining.

Distributed data mining.

Real time data mining.

Multi database data mining.

Privacy protection and information security in data mining.
