Multimedia Data Mining
Introduction
Data management lies at the heart of a multimedia information system. The spatial,
temporal, storage, retrieval, integration and presentation requirements of multimedia data
differ significantly from those of traditional data. Hence the goal of a multimedia data
management system is to allow efficient storage, manipulation and use of multimedia data in
all its varied forms.
There are four types of multimedia data: audio data, which includes sound, speech,
and music; image data (black-and-white and colour images); video data, which includes
time-aligned sequences of images; and electronic (digital) ink, which consists of time-aligned
sequences of 2D or 3D coordinates of a stylus, a light pen, data glove sensors, or a similar
device. All this data is generated by specific kinds of sensors.
Three broad approaches are used to annotate multimedia data. The first approach
assigns keywords manually or classifies the data. The second approach performs automatic
annotation through clustering: the multimedia documents are clustered first, and the resulting
clusters are then assigned keywords by an annotator. The third approach does not rely on a
manual annotator; it tries to mine concepts from the available contextual information.
Multimedia Data Mining (MDM) is a part of multimedia technology that applies data
mining techniques to multimedia data.
The data mining process consists of several stages, which are related to each other and
interactive. The main stages of the data mining process are: (1) domain understanding;
(2) data selection; (3) cleaning and preprocessing; (4) pattern discovery; (5) interpretation;
and (6) reporting and using the discovered knowledge.
2. The data selection stage requires the user to target a database or select a subset of
fields or data records to be used for data mining. A proper domain understanding at this
stage helps in the identification of useful data. This is the most time-consuming stage
of the entire data mining process for business applications; data are never clean and in
a form suitable for data mining. For multimedia data mining, this stage is generally
not an issue, because the data are not in relational form and there are no subsets of
fields to choose from.
3. The next stage in a typical data mining process is the preprocessing step that involves
integrating data from different sources and making choices about representing or
coding certain data fields that serve as inputs to the pattern discovery stage. Such
representation choices are needed because certain fields may contain data at levels of
details not considered suitable for the pattern discovery stage. The preprocessing stage
is of considerable importance in multimedia data mining, given the unstructured
nature of multimedia data. The pattern discovery stage is the heart of the entire data
mining process. It is the stage where the hidden patterns and trends in the data are
actually uncovered. There are several approaches to the pattern discovery stage. These
include association, classification, clustering, regression, time-series analysis and
visualization. Each of these approaches can be implemented through one of several
competing methodologies, such as statistical data analysis, machine learning, neural
networks and pattern recognition. It is because of the use of methodologies from
several disciplines that data mining is often viewed as a multidisciplinary field.
4. The interpretation stage of the data mining process evaluates the quality of the
discovery and its value, to determine whether a previous stage should be revisited.
Proper domain understanding is crucial at this stage to put a value on discovered
patterns.
5. The final stage of the data mining process consists of reporting and putting to use the
discovered knowledge to generate new actions or products and services or marketing
strategies as the case may be.
Advantages of Multimedia Data Mining
Multimedia Miner
Multi-Dimensional Analysis in Multimedia Databases
Features and Standards of Multimedia Data Mining
Different image attributes, such as colour, edges, shape and texture, are used to
extract features for mining. Feature extraction based on these attributes may be
performed at the global or local level. For example, colour histograms may be used as
features to characterize the spatial distribution of colour in an image. Similarly, the shape of a
segmented region may be represented as a feature vector of Fourier descriptors to capture
global shape property of the segmented region or a shape could be described in terms of
salient points or segments to provide localized descriptions. Global descriptors are generally
easy to compute, provide a compact representation, and are less prone to segmentation errors.
However such descriptors may fail to uncover subtle patterns or changes in shape because
global descriptors tend to integrate the underlying information. Local descriptors, on the
other hand, tend to generate more elaborate representations and can yield useful results even
when part of the underlying attribute, for example the shape of a partially occluded region, is
missing. In the case of video, additional attributes resulting from object and camera motion
are used.
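As a concrete illustration of a global colour feature, a colour histogram can be sketched in plain Python; the 8-bin quantization and the toy all-red image below are assumptions chosen for illustration, not part of any standard:

```python
def color_histogram(pixels, bins=8):
    """Global colour descriptor: per-channel histogram of RGB pixels.

    `pixels` is a list of (r, g, b) tuples with values in 0..255; the result
    concatenates one normalized `bins`-bin histogram per channel.
    """
    width = 256 // bins  # size of each intensity bin
    feature = []
    for channel in range(3):
        hist = [0] * bins
        for px in pixels:
            hist[min(px[channel] // width, bins - 1)] += 1
        total = sum(hist)
        feature.extend(h / total for h in hist)  # normalize each channel
    return feature

# A tiny all-red "image": all mass lands in the top bin of the R channel.
pixels = [(255, 0, 0)] * 16
feat = color_histogram(pixels)
print(len(feat))  # 24 bins: 8 per channel
```

Because the histogram discards pixel positions, it is compact and insensitive to segmentation errors, but, as noted above, it cannot distinguish images that differ only in spatial arrangement.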
In the case of audio, both temporal and spectral domain features have been
employed. Examples of the features used include short-time energy, pause rate, zero-crossing
rate, normalized harmonicity, fundamental frequency, frequency spectrum, bandwidth,
spectral centroid, spectral roll-off frequency and band energy ratio. Many researchers have
found cepstral-based features, Mel-Frequency Cepstral Coefficients (MFCC) and Linear
Predictive Coefficients (LPC), very useful, especially in mining tasks involving speech
recognition. The MPEG-7 standard provides a good representative set of features for
multimedia data. The features are referred to as descriptors in MPEG-7. The
MPEG-7 Visual description tools describe visual data such as images and videos
while the Audio description tools account for audio data. The MPEG-7 visual description
defines the following main features for color attributes: Color Layout Descriptor, Color
Structure Descriptor, Dominant Color Descriptor and Scalable Color Descriptor. The Color
Layout Descriptor is a compact, resolution-invariant descriptor defined in the YCbCr colour
space that captures the spatial distribution of colour over major image regions. The Color
Structure Descriptor captures both color content and information about its spatial
arrangement using a structuring element that is moved over the image. The Dominant Color
Descriptor characterizes an image or an arbitrarily shaped region by a small number of
representative colors. The Scalable Color Descriptor is a color histogram in the HSV Color
Space encoded by Haar transform to yield a scalable representation. While the above features
are defined with respect to an image or its part, the feature Group of Frames-Group of
Pictures Color (GoFGoPColor) describes the color histogram aggregated over multiple
frames of a video.
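Two of the temporal audio features mentioned earlier, short-time energy and zero-crossing rate, are simple enough to sketch directly in Python; the frame values below are invented for illustration:

```python
def short_time_energy(frame):
    """Mean squared amplitude over one analysis frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    pairs = list(zip(frame, frame[1:]))
    crossings = sum(1 for a, b in pairs if (a >= 0) != (b >= 0))
    return crossings / len(pairs)

# A made-up frame that alternates sign every sample: maximal ZCR.
frame = [1.0, -1.0] * 8
print(short_time_energy(frame))   # 1.0
print(zero_crossing_rate(frame))  # 1.0
```

In practice these features are computed over short overlapping windows of the signal; a high zero-crossing rate with low energy is a common heuristic for unvoiced speech or noise.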
MPEG-7 provides for two main shape descriptors; others are based on these and
additional semantic information. The Region Shape Descriptor describes the shape of a
region using the Angular Radial Transform (ART). The description is provided in terms of 40
coefficients and is suitable for complex objects consisting of multiple disconnected regions
and for simple objects with or without holes. The Contour Shape Descriptor describes the
shape of an object based on its outline. The descriptor uses the curvature scale space
representation of the contour.
The motion descriptors in MPEG-7 are defined to cover a broad range of applications.
The motion activity descriptor captures the intuitive notion of intensity or pace of action in a
video clip. The descriptor provides information for intensity, direction, and spatial and
temporal distribution of activity in a video segment. The spatial distribution of activity
indicates whether the activity is spatially limited or not. Similarly, the temporal distribution
of activity indicates how the level of activity varies over the entire segment. The
CameraMotion Descriptor specifies the camera motion types and their quantitative
characterization over the entire video segment. The Motion Trajectory Descriptor describes
the motion trajectory of a moving object based on the spatiotemporal localization of
trajectory points. The description provided is at a fairly high level, as each moving object is
indicated by one representative point at any time instant. The Parametric Motion Descriptor
describes motion,
global and object motion, in a video segment by describing the evolution of arbitrarily shaped
regions over time using a two-dimensional geometric transform.
The MPEG-7 Audio standard defines two sets of audio descriptors. The first set is of
low-level features, which are meant for a wide range of applications. The descriptors in this
set include Silence, Power, Spectrum, and Harmonicity. The Silence Descriptor simply
indicates that there is no significant sound in the audio segment. The Power Descriptor
measures temporally smoothed instantaneous signal power. The Spectrum Descriptor
captures properties such as the audio spectrum envelope, spectrum centroid, spectrum spread,
spectrum flatness, and fundamental frequency. The second set of audio descriptors is of high-
level features, which are meant for specific applications. The features in this set include Audio
Signature, Timbre, and Melody. The Signature Descriptor is designed to generate a unique
identifier for identifying audio content. The Timbre Descriptor captures perceptual features of
instrument sound. The Melody Descriptor captures monophonic melodic information and is
useful for matching of melodies. In addition, the high-level descriptors in MPEG-7 Audio
include descriptors for automatic speech recognition, sound classification and indexing.
This application helps the manager of a video library to group customers according to
purchase attributes such as language, rating, cast, etc.
Working
1. Loading The Data
After WEKA 3.6 has been installed, we launch WEKA's Explorer application. Now we
need to load the dataset we have created with a .csv extension. Click on the choose-file
button, change the file type to .csv, browse to the desired location, and select the file, as
shown in the figure below.
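Outside WEKA, the same loading step can be sketched with Python's csv module; the file contents below are a hypothetical two-record excerpt of the video-library dataset, not the actual file:

```python
import csv
import io

# Hypothetical excerpt; WEKA would read the real .csv from disk instead.
raw = io.StringIO(
    "F_id,Language,Lead,Certification,Duration\n"
    "1,English,ActorA,U,120\n"
    "2,Hindi,ActorB,A,95\n"
)
records = list(csv.DictReader(raw))  # one dict per data row
print(len(records))            # 2
print(records[0]["Language"])  # English
```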
Our dataset contains the following attributes, which are referenced in the steps below:
F_id, Language, Lead, Certification, and Duration.
2. Basic Statistics
Once the data set has been loaded Weka will recognize the attributes and during the
scan of the data will compute some basic statistics on each attribute. The left panel in the
figure below shows the list of recognized attributes, while the top panels indicate the names of the
base relation (or table) and the current working relation.
Clicking on any attribute in the left panel will show the basic statistics on that
attribute. For categorical attributes, the frequency for each attribute value is shown, while for
continuous attributes we can obtain the min, max, mean, standard deviation, etc. The figure
below illustrates this. It shows the type of each attribute, be it numeric, nominal, etc. For the
nominal attribute “Lead” shown below, it tells us the number of distinct values and lists
the number of occurrences of each value.
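The per-attribute statistics that WEKA displays can be reproduced with Python's standard library; the sample values below are invented for illustration:

```python
from collections import Counter
from statistics import mean, pstdev

# Invented sample values for one numeric and one nominal attribute.
durations = [120, 95, 150, 95, 110]      # numeric attribute: Duration
leads = ["ActorA", "ActorB", "ActorA"]   # nominal attribute: Lead

print(min(durations), max(durations))    # 95 150
print(mean(durations))                   # 114
print(round(pstdev(durations), 2))       # population standard deviation
print(Counter(leads))                    # occurrences of each nominal value
```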
The visualization graphs shown below are cross tabulations between two attributes.
The figure below shows the cross tabulation between the certification and the language of the
films.
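A cross tabulation of this kind amounts to counting attribute-value pairs; the (certification, language) pairs below are hypothetical:

```python
from collections import Counter

# Hypothetical (certification, language) pair per film.
films = [
    ("U", "English"), ("A", "English"), ("U", "Hindi"),
    ("U", "English"), ("A", "Hindi"),
]
crosstab = Counter(films)  # maps each pair to its count
print(crosstab[("U", "English")])  # 2
print(crosstab[("A", "Hindi")])    # 1
```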
3. Attribute Filtering
In our sample data file, each record is uniquely identified by F_id (the "id" attribute).
We need to remove this attribute before the data mining step. We can do this by using the
Attribute filters in WEKA. In the "Filter" panel, click on the "Choose" button. This will show
a popup window with a list of available filters. Expand filters, then unsupervised, then
attribute, and select the "Remove" filter, as shown in the figure below.
After this, click on the text box immediately to the right of the "Choose" button. In the
resulting dialog box, enter the index of the attribute to be filtered out; in this case we enter 1,
which is the index of the "F_id" attribute. Make sure that the "invertSelection" option is set to
False (otherwise everything except attribute 1 will be filtered out). Then click "OK", as
illustrated in the figure below.
Now, in the filter box, you will see "Remove -R 1". Click on the Apply button. The
resulting filtering removes the F_id attribute, as shown in the figure below. We now save
the resulting intermediate dataset as "media2.arff".
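The effect of the Remove filter, dropping the F_id field from every record, can be sketched as follows; the records are hypothetical:

```python
# Hypothetical records; the Remove filter drops F_id from each one.
records = [
    {"F_id": 1, "Language": "English", "Duration": 120},
    {"F_id": 2, "Language": "Hindi", "Duration": 95},
]
filtered = [{k: v for k, v in r.items() if k != "F_id"} for r in records]
print(filtered[0])  # {'Language': 'English', 'Duration': 120}
```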
4. Discretization
Now we open this media4.arff file in WordPad and change the discrete interval labels
assigned to Duration to more meaningful values. We replace all occurrences of "(-inf-93]"
with the value "0_93"; this can be done by clicking the Replace All option in the figure below.
The same procedure is followed for all the values of “Duration” attribute. Then we
save the file as media-final.arff. The values that the duration attribute can have are shown in
the following two figures.
5. Association Mining
Now that all the attributes have been discretized we can perform association mining
on the dataset. The most commonly used algorithm is Apriori, which we will also be using.
Go to the Associate tab in WEKA. In that tab, click on Choose and select Apriori as the
associator. Then click on the text box next to the Choose button; a dialog box appears. Here
we change the default number of rules to 20, which indicates that the program will report no
more than the top 20 rules. The upper bound for minimum support is set to 1.0 (100%) and the
lower bound to 0.1 (10%). Apriori in WEKA starts with the upper bound support and
incrementally decreases support (by delta increments which by default is set to 0.05 or 5%).
The algorithm halts when either the specified number of rules has been generated or the
lower bound for minimum support is reached. The significance testing option is applicable
only in the case of confidence and is by default not used (-1.0). The figure below shows the
final dialog box for Apriori.
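WEKA's decreasing-support schedule described above can be sketched as follows; this shows only the sequence of minimum-support values tried, not the rule mining itself, which in WEKA also stops early once enough rules have been found:

```python
def support_schedule(upper=1.0, lower=0.1, delta=0.05):
    """Minimum-support values tried: start at `upper` and decrease by
    `delta` until `lower` is reached."""
    values = []
    support = upper
    while support >= lower - 1e-9:  # tolerate floating-point drift
        values.append(round(support, 2))
        support -= delta
    return values

schedule = support_schedule()
print(schedule[0], schedule[-1])  # 1.0 0.1
print(len(schedule))              # 19 support levels from 1.0 down to 0.1
```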
Now we click OK and then click Start. The results are displayed as shown below.
6. Visualization
The relationship between attributes can be shown in terms of graphs by plotting the X and Y
coordinates of a graph with the attributes whose relation should be visualized, using the
"Visualize" tab in the top panel of the WEKA Explorer.
Suppose we need to visualize the relation between the Language and Lead attributes: select
the corresponding plot by clicking on its red square, as shown above. The following screen
appears, showing the relation between Lead and Language.
Conclusion
Multimedia data mining is an active and growing area of research. In digital library
projects, there is a need for multimedia data mining for the conversion and preservation of
multimedia information, and a data mining strategy is needed for converting the multimedia
files held in libraries. Digital libraries, which are to a large extent accessible through the web,
must present multimedia information effectively; only then is the purpose of these libraries
served properly. To serve this purpose, a data mining strategy should be formed that takes
into account the available standards, features and techniques.