
International Journal of Control and Automation

Vol. 12, No. 6, (2019), pp. 276-295

An Empirical Study on Methods, Metrics and Evaluation on Feature Extraction in Big Data Analytics

1*. J. Jebamalai Robinson, Research Scholar, Bharathiar University, Coimbatore
2. Dr. V. Saravanan, Dean – Computer Studies, Dr. SNS College of Arts and Science, Coimbatore

ABSTRACT:
Big data usually refers to data whose high accumulation rate and complexity make it very difficult, or even impossible, to process with traditional techniques. Big data is often described as data of many varieties, accumulated in large volumes at exceptional velocity. Often, too much information is itself a cause of inefficiency in data mining. Irrelevant attributes add noise and can degrade the accuracy of the data model, and attributes may also be redundant, measuring the same feature. Such anomalies in the data are prone to skew the logic of data mining algorithms and can adversely affect model accuracy. Data with many such attributes is difficult to process when data mining algorithms are applied. The attributes in a data model determine the dimensionality of the processing space used by a particular algorithm: the greater the dimensionality, the higher the cost of computation in algorithm design and processing. To minimize this noise and high dimensionality, dimensionality reduction techniques are required for data mining to be effective. Feature selection and feature extraction are common approaches to the problem: the former selects the most relevant attributes, while the latter combines attributes into a reduced set of features. Feature extraction is thus a process of attribute reduction. Feature selection ranks the existing attributes by their predictive significance, whereas feature extraction actually transforms the attributes. Numerous studies have proposed methods and techniques for effective feature extraction. This paper is the outcome of a study on the methods, metrics and evaluation techniques for feature extraction in Big Data analytics.
Keyterms: Big data, Velocity, Variance, Attributes, Feature Extraction and Dimensionality

1. INTRODUCTION
Big data refers to data that is available in large volumes. It can be either structured or unstructured and floods in on a daily basis; the amount of data accumulated is not in itself what matters, but rather what the organization does with it. Big data analytics supports better decisions and strategic planning in most businesses [1]. Effective use of big data helps a company outperform its competitors. In many industries, incumbents and new entrants alike use strategies derived from analyzed data to compete and innovate. Big data supports vertical growth within an organization and also gives rise to new categories of organizations that can merge and analyze industry data. Such companies may hold vast information on products and services, buyers and marketers, and consumer behavior and preferences that can be analyzed.

The term “Big Data” is comparatively new, but the practice of collecting information and storing it in large volumes for analysis is very old. The essential character of Big Data lies in the 3 Vs (Variety, Volume and Velocity) [2]. The fundamental value of big data lies in how an organization uses its data, not in how much data it accumulates. Every company uses its data in its own way, and the more efficiently an organization uses it, the greater its chances of growth. An organization can draw on data from any source and analyze it to find answers that comprehensively improve growth in terms of cost reduction, reduced time complexity, better planning, reputation management and more.

ISSN: 2005-4297 IJCA
276
Copyright ⓒ 2019 SERSC
Feature extraction is a form of dimensionality reduction through which a raw data set is reduced to manageable groups for easier processing. One characteristic of such huge data is the presence of numerous variables, which drives up the cost of computation and resource utilization [3]. Feature extraction can be defined as the method by which variables are selected and combined, transforming them into features that effectively reduce the amount of data to be processed while maintaining the accuracy and completeness of the original data. It is typically used when the resources needed for processing must be reduced while the important information is kept intact, and it also reduces the amount of redundant data in a given analysis.
The speed of learning and the quality of generalization improve when the data is reduced by combining variables, and a feature subset can be sufficient for constructing data models. Because feature selection keeps subsets of the original attributes, its chief merit is that their physical meaning remains unchanged, which improves the readability and interpretability of the model [4]. For this reason, these techniques are widely adopted in real-time applications. Removing redundant and irrelevant features greatly reduces time complexity and storage cost without loss of significant information or degradation of learning performance [5].
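To make the distinction between the two approaches concrete, the following sketch (an illustration constructed for this survey, not taken from any of the reviewed papers; the variance-based ranking and PCA choices are our own assumptions) selects a subset of the original attributes versus extracting new combined features:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 100 samples, 5 attributes. Two informative attributes, one
# redundant copy of the first, and two low-variance noise attributes.
informative = rng.normal(0, 3.0, size=(100, 2))
X = np.hstack([informative,
               informative[:, :1] + 0.01 * rng.normal(size=(100, 1)),  # redundant
               rng.normal(0, 0.1, size=(100, 2))])                     # noise

# Feature selection keeps a subset of the ORIGINAL attributes, here the
# k with the largest variance; their physical meaning is preserved.
k = 2
keep = np.argsort(X.var(axis=0))[::-1][:k]

# Feature extraction TRANSFORMS the attributes into k new combined
# features, here the top principal components from an SVD of centered data.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_new = Xc @ Vt[:k].T

print(sorted(keep.tolist()), X_new.shape)
```

Note that the selected columns are original attributes and remain interpretable, whereas the extracted components are linear combinations of all attributes.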
The recent popularity of big data presents challenges for conventional feature extraction methods; at the same time, some unique characteristics of big data open up new opportunities for feature extraction research. This paper is intended to give a lucid account of the feature extraction methods used for each data category: structured, unstructured, multi-label and multi-view data. The metrics used for performance studies and the way those metrics are evaluated are also discussed. The rest of the paper is organized as follows. Section II reviews feature extraction for structured data, unstructured data, multi-label data and multi-view data respectively. Section III presents comparative results and an evaluation of the methods discussed, and Section IV concludes the paper with open research opportunities.

SECTION II
A. Feature Extraction methods for Big unstructured data
Unstructured data may have an internal structure but lacks a pre-defined schema. It may be textual or non-textual, and generated by either humans or machines. Many organizations have turned to a variety of software solutions for extracting important information from unstructured data. The essential benefit of such tools is their ability to extract useful information that can contribute to a company's success. As the rate of data accumulation is very high, industries constantly look for better ways to handle it efficiently. The following methods from the literature handle feature extraction in unstructured big data.


1. J. Wan et al [2019] proposed a Multi-Feature Extraction (MFE) method and, on this basis, designed an MFE scheme for AI-driven big data, applied to a hot-event detection algorithm. A multi-feature clustering model based on user attention is developed in two stages. In the first stage, the MFKE model is created to evaluate keywords and is combined with term frequency, through which the keywords are extracted. In the second stage, the hot events are captured by the algorithm while forming news clusters. Different parameter settings are analyzed to find the optimal configuration. Experiments on a large corpus show a significant increase in sensitivity, specificity and F-score [6].

2. Jinrong He et al [2017] proposed DGOD (“Decision Graph Based Outlier Detection”). The method first computes a decision graph score (DGS) for every sample, where the DGS is the ratio between the discriminant distance and the local density. The samples are then ranked by their DGS values, and the top r largest are returned as outliers. Experiments on both synthetic and real-time datasets confirmed significant effectiveness in detecting outliers while preserving shape and keeping dimensionality reduced, as evidenced by low chi-square results [7].
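The score described above can be sketched as follows. This is a loose reconstruction of the density/distance idea (the exact density estimator and discriminant distance used in [7] are not specified here; the k-nearest-neighbour density is our assumption):

```python
import numpy as np

def dgod_outliers(X, r, k=5):
    """Decision-graph-style score: for each sample, the ratio of its
    discriminant distance (distance to the nearest denser sample) over
    its local density. The top-r scores are flagged as outliers."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Local density: inverse of the mean distance to the k nearest neighbours.
    knn = np.sort(D, axis=1)[:, 1:k + 1]
    density = 1.0 / (knn.mean(axis=1) + 1e-12)
    # Discriminant distance: distance to the nearest sample of higher density.
    n = len(X)
    delta = np.empty(n)
    for i in range(n):
        higher = np.where(density > density[i])[0]
        delta[i] = D[i, higher].min() if len(higher) else D[i].max()
    dgs = delta / density            # large when a point is isolated AND sparse
    return np.argsort(dgs)[::-1][:r]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, size=(60, 2)),       # one dense cluster
               np.array([[5.0, 5.0], [-4.0, 6.0]])])   # two planted outliers
print(sorted(dgod_outliers(X, r=2).tolist()))  # the planted points: [60, 61]
```

Points far from any dense region get both a large discriminant distance and a low density, so their ratio dominates the ranking.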

3. Jundong Li et al [2018] provided a lucid overview of recent research in feature extraction. Motivated by present difficulties, they revisited feature extraction from a data perspective and presented the available solutions for unstructured, structured, heterogeneous and streaming types of data. To clarify the similarities and differences among the existing algorithms for unstructured data, they categorized them into four groups: similarity-based, information-theoretic, sparse-learning-based and statistics-based. Open research challenges and evaluation metrics are also discussed [8].

4. Mingkui Tan et al [2014] introduced an adaptive FSS (“Feature Scaling Scheme”) for very-high-dimensional feature extraction in big data, reformulated as a convex SIP (“Semi-Infinite Problem”). To solve it, an effective feature generation paradigm is proposed. Unlike conventional gradient-based approaches, which optimize over all input features, the proposed method iterates, activating only a group of features at a time by solving MKL (“Multiple Kernel Learning”) subproblems. To speed up training, the MKL problems are solved in the primal form through a modified, accelerated proximal gradient technique, which also led to new caching techniques. The feature generation is guaranteed to converge globally under mild conditions and can achieve lower bias. The method is intended to solve two issues: group-based feature extraction for complex structures, and nonlinear feature extraction with explicit feature mappings. Experiments on a range of synthetic and real-time data with over a million data points showed that the proposed method had a


comprehensive performance measured in terms of Throughput and Time complexity when


compared with that of the other state-of-the art algorithms [9].

5. Kui Yu et al [2016] introduced SAOLA for feature extraction. Based on a theoretical analysis of bounds on pairwise correlations among features, the method deploys a novel comparison scheme and maintains a parsimonious model over time in an online fashion. Further, to deal with features that arrive in groups, the method is extended to group-SAOLA for online feature extraction; the extended model can stay sparse at both the group and individual levels. An empirical study on benchmark data showed that both algorithms scale to data of high dimension and outperform existing solutions in terms of sensitivity, specificity and F-score [10].

6. Jun Lee et al [2017] proposed the SIE model for classifying unstructured textual data from the web into five sensation-based features: sight, hearing, touch, smell and taste. Although sensation underlies all other human experience of the environment, sensational information is usually neglected, owing to the scarcity of sensory expressions and knowledge compared with sentiment analysis or opinion mining. First, a sensation measurement is assigned to every feature; then the model identifies which measurement is assigned to which feature; finally, the sensational feature with the strongest influence on human perceptual experience is identified. Evaluation is done with metrics such as correlation and entropy [11].

7. Thee Zin Win et al [2018] introduced MIM (“Mutual Information Measure”) based feature extraction for removing irrelevant and redundant features. Massive data demands efficient and effective mining techniques, and researchers are developing scalable algorithms that reduce mountains of accumulated data to nuggets. Memory usage and computational cost grow with data dimensionality, so reducing the dimension of the data improves performance considerably. The proposed methodology outperforms conventional algorithms, which can eliminate only irrelevant features but not redundant ones, as measured by entropy and correlation [12].
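As a minimal sketch of the measure behind this approach (the estimator below is the standard empirical plug-in mutual information; the thresholds and exact estimator used in [12] are not reproduced here), mutual information with the label exposes irrelevant features, while mutual information between features exposes redundant ones:

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits for two discrete vectors."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        p_ab = c / n
        mi += p_ab * np.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)            # class label
relevant  = y ^ (rng.random(1000) < 0.1)     # mostly tracks the label
redundant = relevant.copy()                  # duplicates an existing feature
noise     = rng.integers(0, 2, size=1000)    # irrelevant

# Rank by MI with the label to drop irrelevant features, then drop features
# whose MI with an already-kept feature is high (redundant).
print(mutual_information(relevant, y) > mutual_information(noise, y))   # True
print(mutual_information(redundant, relevant))  # near 1 bit: flags redundancy
```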
The table 2.1 gives an overview of the various methods and the metrics used for the Feature
selection of Unstructured Big data.

Sno  Method  Metrics
1    MFKE    Sensitivity, Specificity and F-score
2    FSA     Chi-square
3    DGOD    Correlation and Entropy
4    SIP     Throughput and Time complexity
5    SAOLA   Sensitivity, Specificity and F-score
6    SIE     Correlation and Entropy
7    MIM     Chi-square

Table 2.1 – Methods and Metrics for Feature Selection in Unstructured data

B. Feature Extraction methods for Big structured data

Structured data adheres to a pre-defined model and is therefore straightforward to analyze. It is normally tabular, with inter-relations among rows and columns, and is considered the conventional form of data storage, since earlier DBMSs could store, process and access only structured data. This section describes the various feature extraction methods for structured big data.

1. Tang et al [2015] studied the novel problem of unsupervised feature extraction for social media big data. They analyze how social media data differs from traditional attribute-value data and how hidden relationships extracted from linked data can help feature extraction, proposing LUFS for linked social media data. Systematic experiments on real-time data from social media sites show that the proposed framework is effective in terms of correlation and entropy when compared with other methods [13].

2. Huan Liu et al [2016] proposed a unified platform as an intermediary, together with an example demonstrating how the available feature extraction methods can be combined under a meta-algorithm that subsumes the individual methods. This lets a user deploy a suitable technique without detailed knowledge of each algorithm. A few real-time applications are included to demonstrate feature extraction in the mining process. The proposed method showed significant results in terms of sensitivity, specificity and F-score, and challenges and trends in feature selection are also discussed [14].

3. Lianzhi Li et al [2019] proposed a method for building an evolution model of educational service quality (EEM) in colleges and higher-education institutions, oriented to college resources on a big data platform. Experiments verify the model on evolving data from a 360-degree encyclopedia of the educational domain, and the analysis shows that the model can efficiently evaluate education quality in higher-education institutions. The metrics used for the experimental analysis include sensitivity, specificity and F-score [15].

4. Mehmet Burak Çatalkaya et al [2017] presented a software architecture that identifies features with high estimation power. The software is presented as a prototype, and the methods, techniques and algorithms used to develop it are explained. The prototype is applied to banking data and the results analyzed: information gain, chi-square and information value, when used together, yield better results, which is confirmed by the low chi-square value obtained and by a low logarithmic loss in the experiments [16].
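One of the scoring methods combined above can be sketched as follows. This is the generic chi-square feature score computed from an observed-vs-expected contingency table (the prototype's actual pipeline combining information gain and information value is not reproduced here):

```python
import numpy as np

def chi_square_score(feature, label):
    """Chi-square statistic between a discrete feature and a class label:
    sum of (observed - expected)^2 / expected over the contingency table.
    Larger scores indicate stronger feature-label association."""
    f_vals, f_idx = np.unique(feature, return_inverse=True)
    l_vals, l_idx = np.unique(label, return_inverse=True)
    observed = np.zeros((len(f_vals), len(l_vals)))
    np.add.at(observed, (f_idx, l_idx), 1)
    expected = observed.sum(1, keepdims=True) * observed.sum(0) / observed.sum()
    return ((observed - expected) ** 2 / expected).sum()

rng = np.random.default_rng(0)
label = rng.integers(0, 2, size=500)
predictive  = np.where(rng.random(500) < 0.8, label, 1 - label)  # tracks label
random_feat = rng.integers(0, 2, size=500)                       # independent

# A predictive feature scores far higher than an independent one.
print(chi_square_score(predictive, label) > chi_square_score(random_feat, label))
```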


5. J. A. Sáez et al [2019] introduced a novel feature extraction technique for unsupervised learning, where the difficulty arises because class labels cannot be used to choose the most discriminative features as is customary in supervised learning. The proposed system, Kolmogorov-Smirnov test based Unsupervised Feature Extraction (KSUFS), relies on computing estimated feature distributions that are then compared with the original ones using non-parametric statistical tests, so as to select the most representative input variables. Two versions of KSUFS are presented in the study: one designed for standard data and the other for big data problems. KSUFS is compared with other state-of-the-art unsupervised feature selection systems in a thorough experimental study covering both standard and big data problems. The results show that the proposed technique can beat the rest of the reference unsupervised FE strategies in terms of throughput and time complexity [17].

6. Sergio Ramírez-Gallego et al [2018] introduced an approach in which feature extraction methods are parallelized on big data platforms such as Apache Spark to boost performance and accuracy. A distributed framework over conventional feature selection, incorporating a variety of well-known measures, is developed. Experiments on broad, high-dimensional datasets as well as data with large sample counts show that the proposed method outperforms the sequential versions in terms of throughput and time complexity [18].

7. Ke Gong et al [2018] proposed a rationale-type feature extraction method with O(|U|) time complexity, growing linearly with the number of instances. Experiments used very-high-dimensional, large-scale data from the UCI repository, with nearly three million features. The results show that the proposed model is efficient and effective in terms of chi-square value and logarithmic loss. Moreover, the technique is also suitable for feature extraction on large-scale, gigantic-dimensional data that is very difficult to process with conventional methods [19].

8. Makoto Yamada et al [2018] proposed MPF (“Maximally Predictive Features”), which yields minimally redundant, highly predictive and more interpretable features. Its effectiveness is demonstrated by classifying phenotypes based on gene expression in patients diagnosed with prostate cancer and by detecting enzymes from protein structures. High accuracy is achieved while extracting only 20 features out of one million, a dimensionality reduction of 99.9 percent, measured by correlation and entropy. The algorithm is also made flexible enough to be applied on cloud platforms and can serve as a strong foundation for sophisticated predictive models in health-care applications [20].


The table 2.2 depicts the methods used and the metrics considered for feature selection in structured big data.

Sno  Method              Metrics
1    LUFS                Correlation and Entropy
2    Meta-Algorithm      Sensitivity, Specificity and F-score
3    EEM                 Sensitivity, Specificity and F-score
4    Chi-Square method   Chi-square value, Logarithmic loss
5    KSUFS               Throughput, Time complexity
6    FS-Apache Spark     Throughput, Time complexity
7    O(|U|) method       Chi-square value, Logarithmic loss
8    MPF                 Correlation and Entropy

Table 2.2 Method and Metrics for feature selection in Structured Big data

C. Feature Extraction methods for Big Multi-label data

Multi-label data assigns every sample a set of pre-defined labels. These labels are predictive and not mutually exclusive, for example the several topics relevant to a single document: the text may be on any subject and from various domains, yet it carries multiple labels at once. The available feature extraction methods for multi-label data are explained as follows.

1. Konstantinos Sechidis et al [2011] investigated the problem of stratification in multi-label data. The work considered a couple of stratification methods for multi-label data and compared them empirically with random sampling on numerous datasets under defined evaluation conditions. The results show interesting patterns in the utility of each method for certain types of multi-label data, evaluated using the entropy, correlation and RMSE metrics [21].

2. Conor Fahy et al [2019] proposed a dynamic feature mask for clustering very-high-dimensional data streams. Redundant features are masked, and clustering is performed only on the unmasked, relevant features; the masks are updated whenever the importance of a feature changes. The method is independent of the algorithm considered and can be applied to any density-based clustering technique that lacks a mechanism for handling feature drift. Evaluation on four high-dimensional data streams proved efficient in terms of sensitivity, specificity, RMSE and F-score, with improved cluster quality and reduced time complexity [22].
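The masking idea can be sketched as follows. This is a toy reconstruction: the sliding-window variance used as the importance estimate here is our own assumption, not the measure used in [22]:

```python
import numpy as np

class DynamicFeatureMask:
    """Maintain a running per-feature importance estimate over a sliding
    window and expose only the top-k features to a downstream clusterer,
    re-masking as importance drifts."""
    def __init__(self, k, window=50):
        self.k, self.window = k, window
        self.buffer = []
        self.mask = None

    def update(self, x):
        self.buffer.append(x)
        if len(self.buffer) > self.window:
            self.buffer.pop(0)                          # slide the window
        importance = np.var(self.buffer, axis=0)        # re-estimate importance
        self.mask = np.argsort(importance)[::-1][:self.k]  # adapt to drift
        return x[self.mask]                             # masked sample

rng = np.random.default_rng(0)
fm = DynamicFeatureMask(k=3)
# Stream: features 0-2 vary widely; features 3-9 are almost constant and
# therefore irrelevant to the cluster structure.
for _ in range(60):
    x = np.concatenate([rng.normal(0, 5.0, 3), rng.normal(0, 0.01, 7)])
    reduced = fm.update(x)
print(sorted(fm.mask.tolist()))  # the high-variance features: [0, 1, 2]
```

Because the mask is recomputed on every update, a feature whose variability changes mid-stream is picked up or dropped automatically.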

3. Wenjun Ke et al [2018] introduced the SCF method (“Score based Criteria for Fusion Feature Extraction”) for cancer prediction, aimed at improving the quality of the classification model. The method is evaluated on five large micro-array datasets and three very-low-dimensional datasets and shows a significant improvement in metrics such as Hamming loss and logarithmic loss. The experiments also show that, unlike the competing techniques, the proposed method can identify discriminative features, and it can be used efficiently as a pre-processing step in combination with other techniques [23].

4. Jun Huang et al [2018] introduced JFSC, a novel method performing joint feature extraction and classification. It learns both shared and label-specific features by considering pairwise label correlations, and the multi-label classifier is then built on the learned low-dimensional data in parallel. Its performance, in terms of throughput and time complexity, was found to be better than that of the state-of-the-art algorithms for multi-label data in the literature [24].
5. J. González-López et al [2019] proposed a distributed MI (Mutual Information) model for multi-label data using Apache Spark. Two approaches are proposed: MI maximization, and minimum redundancy with maximum relevance; the first selects a feature subset, while the second additionally reduces redundancy between the selected features. Experiments compare the distributed multi-label model on 10 different datasets, validated using chi-square and correlation values; MIM performs well and reduces time complexity by orders of magnitude [25].
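The second approach, minimum redundancy with maximum relevance, can be illustrated with a generic greedy mRMR sketch (this is the textbook formulation, not the distributed Spark implementation of [25]):

```python
import numpy as np

def mi(x, y):
    """Empirical mutual information (bits) between two discrete vectors."""
    xv, xi = np.unique(x, return_inverse=True)
    yv, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xv), len(yv)))
    np.add.at(joint, (xi, yi), 1)
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return (p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum()

def mrmr(X, y, k):
    """Greedily pick the feature maximizing MI with the label minus its
    mean MI with the already-selected features (relevance - redundancy)."""
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            red = np.mean([mi(X[:, j], X[:, s]) for s in selected]) if selected else 0.0
            score = mi(X[:, j], y) - red
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
y  = rng.integers(0, 2, 800)
f0 = np.where(rng.random(800) < 0.9, y, 1 - y)    # strongly relevant
f1 = f0.copy()                                     # redundant duplicate of f0
f2 = np.where(rng.random(800) < 0.75, y, 1 - y)   # weakly relevant, not redundant
f3 = rng.integers(0, 2, 800)                       # irrelevant
X = np.column_stack([f0, f1, f2, f3])

# Plain MI maximization would pick the duplicate f1 second;
# mRMR penalizes the redundancy and prefers f2.
print(mrmr(X, y, 2))  # [0, 2]
```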

6. Jianhua Xu et al [2019] approximated the Moore-Penrose inverse of the kernel matrix for computing the feature space, and a kernel delta for computing the label space. The whole matrix is symmetrized within the trace operation, yielding an effective approximation and symmetrized representations. Based on orthogonal projections, maximizing a modified form of the model leads to a new eigenvalue problem whose solution gives the linear feature extraction. Experiments on 12 different datasets show that the proposed method outperforms seven existing methods in terms of entropy and correlation, and it also outperformed three other statistical tests in the same domain [26].

7. Lin Sun et al [2019] proposed a novel comprehensive evaluation function for correlation-based feature selection (NCFS). The NCFS is introduced as a fitness function into the original BPSO and an improved BPSO algorithm to advance multi-label classification in the early and later stages, respectively, and the optimization procedure terminates when the maximum number of iterations is reached. Next, the Lebesgue measure of the neighborhood class is produced to examine the neighborhood approximation precision and the dependence degree based on MNRS. Various properties are derived, and the connections among these measures are used to assess the uncertainty and correlations among the labels of multi-label data. Finally, a hybrid filter-wrapper feature selection algorithm using NCFS-BPSO is designed to first eliminate redundant features and reduce complexity, and a heuristic forward multi-label feature selection algorithm is proposed to improve multi-label classification performance. Experimental results on fifteen multi-label datasets show that the proposed algorithms are effective, in terms of sensitivity, specificity and F-score, at selecting significant features and achieving excellent classification performance in multi-label neighborhood decision systems [27].
8. A. A. Bidgoli et al [2018] proposed a many-objective optimization model that selects feature subsets for multi-label data based on four objectives: the number of features, two error measures (Hamming loss and logarithmic loss) and the time taken to extract the features. The many-objective problem is solved using a binary version of NSGA-III. Experiments on several multi-label benchmark datasets, under several many-objective assessments, show a marked improvement over its peer NSGA-II in terms of lower loss-metric values [28].

9. Jayaraman K Valadi et al [2017] introduced an efficient modification of the multi-label feature extraction methods available in the literature, consisting of two phases. In the first phase, the output label space is decomposed into smaller dimensions with the aid of simple matrix factorization; the feature extraction methods are then deployed directly in the reduced space. Simulated experiments with real-time data showed greater efficiency in terms of chi-square values [29].

10. Ali El-Zaart, Ziad Abdallah et al [2017] examined the two most significant existing methods (MIML-BOOST and MIML-SVM), whose drawback is that they do not take into consideration (a) the depiction of the basic characteristics of the image and (b) the correlations between labels. To overcome these issues, a novel algorithm (MIML-GABORLPP) is proposed that handles both limitations simultaneously: it uses a Gabor filter bank as feature descriptor to address the first, and applies Label Priority Power-set as the multi-label transformation to deal with label correlation. The experimental work shows that the results of MIML-GABORLPP are better, in terms of three assessment metrics (F-measure, F-score and RMSE), than other existing strategies [30].
The table 2.3 gives the methods and metrics used for feature extraction in multi-label big data.


Sno  Method                        Metrics
1    Random Sampling               Entropy, Correlation and RMSE
2    Dynamic Feature Mask          Sensitivity, Specificity, RMSE and F-score
3    SCF                           Hamming loss, Logarithmic loss
4    JFSC                          Throughput, Time complexity
5    MI Maximization               Chi-square, Correlation
6    Moore-Penrose inverse matrix  Entropy, Correlation
7    NCFS                          Sensitivity, Specificity and F-score
8    NSGA-III                      Hamming loss, Ranking loss
9    MLFS                          Chi-square
10   MIML-GABORLPP                 F-measure, F-score and RMSE

Table 2.3 Method and Metrics for Feature Selection in Multilabel Big data

D. Feature Extraction methods for Big Multi-View data

Multi-view data has a variety of feature sets extracted from the same raw data. Such data serves to investigate the results of multiple clusterings run against each other and across different feature sets, so as to define a notion of “freshness and interestingness”. Feature extraction on such data can surface both interesting and uninteresting results, which are useful either way. The available methods and techniques for feature extraction on multi-view data are discussed below.

1. Z. Wang et al [2015] proposed a novel feature extraction method using multi-view NMF combined with graph regularization, where the within-view relationships among the data are taken into consideration. The matrix factorization is performed by constructing a k-nearest-neighbor graph to integrate the local geometric information of each view, and a pair of update rules is applied in an iterative fashion to solve the optimization problem. The experimental results prove its effectiveness in terms of Entropy, Correlation and RMSE when compared with other methods [31].
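The graph-regularized factorization at the heart of such methods can be sketched with the standard multiplicative updates of graph-regularized NMF. The snippet below is a simplified single-view illustration on assumed toy data, not the exact multi-view algorithm of [31]:

```python
import numpy as np

def gnmf(X, A, k, lam=0.1, iters=200, seed=0):
    """Graph-regularized NMF sketch: X (m x n) ~ W (m x k) @ H (k x n),
    where the graph given by adjacency A (n x n) regularizes H.
    Illustrative single-view building block only."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    D = np.diag(A.sum(axis=1))  # degree matrix of the sample graph
    eps = 1e-9
    for _ in range(iters):
        # Multiplicative updates keep W and H non-negative.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X + lam * H @ A) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H

# Toy data: 6 samples (columns) with 5 features, fully connected toy graph.
X = np.random.default_rng(1).random((5, 6))
A = (np.ones((6, 6)) - np.eye(6)) / 5.0
W, H = gnmf(X, A, k=2)
print(np.linalg.norm(X - W @ H))  # reconstruction error after the updates
```

The multi-view method of [31] additionally couples one such factorization per view; only a single view is shown here.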

2. ZALL R et al [2016] proposed a two-view semi-supervised learning strategy called semi-supervised random correlation ensemble based on spectral clustering (SS_RCE). SS_RCE uses a multi-view technique based on spectral clustering that exploits the discriminative information in the different views to estimate the labeling of unlabeled samples. In order to enhance the discriminative power of the CCA features, the label information of both unlabeled and labeled samples is incorporated into CCA. Then, random correlation between within-class samples across views is used to extract multiple correlated features for training the component classifiers. A general model, namely SSMV_RCE, is also extended to build an ensemble technique that handles semi-supervised learning in the presence of multiple views. The proposed techniques are compared with existing multi-view feature extraction strategies using multi-view semi-supervised ensembles. Experimental results on various multi-view data sets are presented to demonstrate the adequacy of the proposed techniques as far as Sensitivity, Specificity, RMSE and F-Score are considered [32].

3. Zhiqiang Zuo et al [2014] proposed the MVMTFE framework, which handles features that are multi-view in nature, and applied it to image classification. The method learns the feature extraction matrix for every view together with the combination of view coefficients. In this way, it is not only intended to handle correlated and noisy features, but also to make use of the complementarity of the various views, which can further help in reducing the redundancy within each view. A specific algorithm is also developed for the optimization, through which each sub-problem can be solved. The experiments produced very low Hamming Loss and Logarithmic Loss, which makes the proposed method more stable than other techniques [33].

4. Hongfu Liu et al [2016] aimed to find the discriminative features in each view for better interpretation and representation, the intention being to provide a correct understanding of multi-view feature selection. Unlike existing work, which either incorrectly concatenates the features from different views or incurs huge time complexity to learn pseudo labels, they proposed a novel algorithm, Robust Multi-view Feature Selection (RMFS), which applies robust multi-view K-means to obtain robust and high-quality pseudo labels for sparse feature selection in an efficient manner. Non-trivially, the solution is derived by taking derivatives, and a K-means-like optimization is further provided to update several variables in a unified framework with a convergence guarantee. Extensive experiments on three real-world multi-view data sets illustrate the effectiveness and efficiency of RMFS in terms of Throughput and Time Complexity [34].

5. Xuan Wu et al [2019] presented a unique approach called SIMM for multi-view multi-label feature extraction. The method is intended to jointly exploit a shared subspace and view-specific information. For the shared subspace, SIMM minimizes the confusion loss and the multi-label loss, while view-specific discriminative features are also utilized. Intensive experiments carried out on real-world data clearly depict the performance improvement in terms of Chi-Square and Correlation [35].

6. Yasser Elmanzalawi et al [2018] introduced a novel multi-view feature extraction method based on CCA ("canonical correlation analysis"), which is used to extract unique features from multi-view data sets. The results obtained demonstrate that the model is effective in predicting KIRC ("kidney renal clear cell carcinoma") disease. The proposed method can also be used jointly with other methods such as CAN, RNA-Sequence and reversed-phase protein arrays. The results outperform the other models trained using a single view by achieving low values of Entropy and Correlation, and an integrated model is also brought in using the data-fusion method based on CCA feature extraction [36].


7. Wenzhang Zhuge et al [2018] introduced a unique framework, FESG, which is intended to learn both the transformation matrix and an ideal structured graph that contains the cluster information. A novel method is also proposed for extending FESG to multi-view feature extraction. The extension is named MFESG ("Multiple-view Feature Extraction with Structured Graph"), and it aims to learn the optimal weight for every view in an automated way. The experimental results show that the proposed methods are effective in terms of Sensitivity, Specificity and F-Score [37].

8. Olcay Kursun et al [2017] proposed a CCA-based technique backed by LDA for multi-view feature extraction on high-dimensional data. Canonical correlation analysis is applied to two sets of interrelated variables and the linear projections are calculated, after which the maximally correlated variates are refined using Fisher's Linear Discriminant Analysis. The results show considerable performance, as the method produced very low Hamming Loss and Ranking Loss [38].

9. Michele Volpi et al [2013] proposed an unsupervised multi-view feature extraction method that is applied before the classification process. A technique that automatically extracts blocks based on the global spectral correlation matrix is applied, and the correlation analysis is carried out with conventional kernel canonical correlation analysis. The proposed method is implemented in a multi-view setting (MVkCCA) in order to identify the projections in the data blocks. Experiments were carried out using LDA, and the proposed method shows increased appropriateness by producing a low Chi-Square value when compared to other methods [39].

Table 2.4 gives a lucid description of the methods and metrics used in the feature extraction of Multi-view big data.

Sno Method Metrics
1 Multi-view NMF Entropy, Correlation and RMSE
2 SSMV_RCE Sensitivity, Specificity, RMSE and F-Score
3 MVMTFE Hamming Loss, Logarithmic Loss
4 RMFS Throughput, Time Complexity
5 SIMM Chi-Square, Correlation
6 KIRC Entropy, Correlation
7 MFESG Sensitivity, Specificity and F-Score
8 LDA-CCA Hamming Loss, Ranking Loss
9 MVkCCA Chi-Square

Table 2.4 Methods and Metrics for the feature selection of Multi-View Big Data

Section III
A. COMPARATIVE RESULT ANALYSIS
The comparative analysis of the results obtained from the various methods, based on the metrics adopted, is given below. The metrics are grouped into sets irrespective of the data type considered in the methodologies. Set 1 is defined by S1 = {Sensitivity, Specificity, F-Score}, Set 2 by S2 = {Entropy, Correlation, RMSE}, Set 3 by S3 = {Throughput, Time Complexity}, Set 4 by S4 = {Hamming Loss, Ranking Loss, Logarithmic Loss} and Set 5 by S5 = {Chi-Square}.

Method Sensitivity Specificity F-Score
MFKE 0.83 0.78 0.85
SAOLA 0.62 0.85 0.82
META-ALGORITHM 0.94 0.96 0.91
EEM 0.81 0.72 0.82
DFM 0.94 0.69 0.76
NCFS 0.86 0.81 0.80
SSMV_RCE 0.74 0.90 0.79
MFESG 0.89 0.79 0.75

Table 3.1 Experimental results based on metric set 1
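For reference, the metrics of Set 1 follow directly from the confusion-matrix counts; a minimal Python sketch on a hypothetical binary example:

```python
def set1_metrics(y_true, y_pred):
    """Sensitivity (recall), Specificity and F-Score for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)          # true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f_score

# Hypothetical toy predictions: all three metrics come out to 2/3 here.
sens, spec, f1 = set1_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(sens, spec, f1)
```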

Figure 3.1 shows the graphical representation of Table 3.1, and it is identified from the graph that the Meta-Algorithm obtains the highest values for the metrics considered.

Figure 3.1 – Experimental Results based on Metric Set1


Table 3.2 shows the overview of the results obtained based on the metrics set2.

Method Entropy Correlation RMSE
DGOD 0.74 0.84 0.89
SIE 0.98 0.92 0.57
LUFS 0.65 0.76 0.74
MPF 0.87 0.94 0.95
RS 0.80 0.72 0.58
MOORE 0.74 0.85 0.78
NMF 0.96 0.68 0.84
KIRC 0.61 0.79 0.92

Table 3.2 Experimental results based on metric set 2

It is obvious from the table that the NMF method has a high Entropy value, which makes it less stable as the number of records in the data increases. The MPF method has a high Correlation, which makes it superior in handling dimensionality reduction, although its RMSE value is also high, indicating a larger prediction error. Figure 3.2 gives the graphical representation of the results in Table 3.2.
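For reference, the Set 2 metrics can be computed as follows (a minimal sketch on hypothetical values):

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error between two numeric sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def entropy(probs):
    """Shannon entropy (bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3)
print(pearson([1, 2, 3], [2, 4, 6]))           # 1.0 (perfectly correlated)
print(entropy([0.5, 0.5]))                     # 1.0 bit
```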

Figure 3.2 – Experimental Results based on Metric Set 2


Table 3.3 gives the empirical analysis of the results obtained based on Metric Set 3. The throughput and time complexity are measured in milliseconds. From the table it is evident that RMFS has the highest throughput and the lowest time complexity.

Method Throughput (ms) Time Complexity (ms)
SIP 658 451
KUFS 542 393
FS_APACHE 482 308
JFSC 854 691
RMFS 901 219

Table 3.3 Experimental results based on Metric Set 3

Figure 3.3 – Experimental Result based on Metric Set 3


Figure 3.3 gives the graphical representation of the results based on Metric Set 3, from the values listed in Table 3.3. Table 3.4 lists the experimental results based on Metric Set 4. The null values in the table indicate that the method did not consider the particular metric.

Method Hamming Loss Ranking Loss Logarithmic Loss
Chi-Square - - 0.265
O(|U|) Method - - 0.542
SCF 0.218 - 0.368
NGSA III 0.658 0.451 -
MVMTFE 0.521 - 0.129
LDA-CCA 0.458 0.468 -

Table 3.4 Experimental Results based on Metric Set 4


Figure 3.4 – Experimental results based on Metric Set 4


The graphical representation of the results shows that the SCF method has the least Hamming Loss of 0.218, NGSA III has the least Ranking Loss, and the lowest Logarithmic Loss is maintained by the MVMTFE method. Table 3.5 depicts the results obtained based on Metric Set 5.
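The Set 4 losses for multi-label predictions can be sketched as follows (hypothetical toy labels and scores; libraries such as scikit-learn provide equivalent `hamming_loss` and `label_ranking_loss` functions):

```python
def hamming_loss(Y_true, Y_pred):
    """Fraction of label positions predicted incorrectly (multi-label)."""
    wrong = sum(t != p
                for row_t, row_p in zip(Y_true, Y_pred)
                for t, p in zip(row_t, row_p))
    return wrong / (len(Y_true) * len(Y_true[0]))

def ranking_loss(Y_true, scores):
    """Average fraction of (relevant, irrelevant) label pairs per sample
    that the predicted scores order wrongly."""
    total = 0.0
    for truth, s in zip(Y_true, scores):
        rel = [i for i, t in enumerate(truth) if t == 1]
        irr = [i for i, t in enumerate(truth) if t == 0]
        bad = sum(1 for i in rel for j in irr if s[i] <= s[j])
        total += bad / (len(rel) * len(irr))
    return total / len(Y_true)

# Hypothetical 2-sample, 3-label example.
Y_true = [[1, 0, 1], [0, 1, 0]]
Y_pred = [[1, 1, 1], [0, 1, 1]]
scores = [[0.9, 0.8, 0.1], [0.2, 0.7, 0.4]]

print(hamming_loss(Y_true, Y_pred))  # 2 wrong positions out of 6 -> 0.333...
print(ranking_loss(Y_true, scores))  # 0.25 for this toy example
```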

Method Chi-Square
FSA 0.34
MIM 0.27
Chi-Square 0.62
O(|U|) Method 0.54
MI-Max 0.38
MLFS 0.23
SIMM 0.48
MVkCCA 0.67

Table 3.5 Experimental Results based on Metric Set 5
Figure 3.5 represents the graphical notation of the results tabulated in Table 3.5. From the figure it is understood that the MLFS method has the lowest Chi-Square value (0.23), which implies that it has the best data fitness when compared to all the other methods.
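The Chi-Square metric of Set 5 can be read as a goodness-of-fit score; a minimal sketch of Pearson's statistic on hypothetical counts:

```python
def chi_square(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical feature/class contingency counts vs. the counts expected
# under independence; a smaller statistic means a better fit.
stat = chi_square([30, 20, 10, 40], [25, 25, 25, 25])
print(stat)  # 20.0
```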


Figure 3.5 – Experimental Results based on Metric Set 5

SECTION IV

CONCLUSION
Big data analytics is one of the most needed and active research areas of the information era. The exponential growth in data accumulation makes it highly necessary to have a suitable technology to analyze the data and to bring out patterns and hidden knowledge for the business and strategic growth of any organization. One of the important and critical issues faced during analysis is the dimensionality of the large data. Feature selection and extraction are the two methods through which this issue is addressed. This paper is intended to produce a clear description of the available methods for feature extraction in Big Data, the metrics that were used, and a comparison of their results. The metrics used were segregated into five different sets and a cumulative representation was made through facts and figures. From the study, it is clear that there is still a large research gap and that there is more scope for improvement in the Feature Extraction domain of Big Data.

REFERENCES
1. M. Viceconti, P. Hunter and R. Hose, "Big Data, Big Knowledge: Big Data for Personalized
Healthcare," in IEEE Journal of Biomedical and Health Informatics, vol. 19, no. 4, pp. 1209-1215,
July 2015
2. Elgendy, Nada & Elragal, Ahmed. (2014). Big Data Analytics: A Literature Review Paper. Lecture Notes in Computer Science. 8557. 214-227. 10.1007/978-3-319-08976-8_16
3. Kong, X., Chang, J., Niu, M. et al. Int J Adv Manuf Technol (2018) 99: 1101. https://doi.org/10.1007/s00170-016-9864-x


4. Li, Jundong & Liu, Huan. (2016). Challenges of Feature Selection for Big Data Analytics. IEEE Intelligent Systems. 32. 10.1109/MIS.2017.38
5. Li, Jundong & Cheng, Kewei & Wang, Suhang & Morstatter, Fred & Trevino, Robert & Tang, Jiliang & Liu, Huan. (2016). Feature Selection: A Data Perspective. ACM Computing Surveys. 50. 10.1145/3136625
6. J. Wan, P. Zheng, H. Si, N. N. Xiong, W. Zhang and A. V. Vasilakos, "An Artificial Intelligence
Driven Multi-Feature Extraction Scheme for Big Data Detection," in IEEE Access, vol. 7, pp. 80122-
80132, 2019
7. J. He, N. Xiong, "An effective information detection method for social big data", Multimedia Tools
Appl., vol. 77, pp. 11277-11305, 2018
8. Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P. Trevino, Jiliang Tang, and
Huan Liu. 2017. Feature Selection: A Data Perspective. ACM Comput. Surv. 50, 6, Article 94
(December 2017)
9. Mingkui Tan, Ivor W. Tsang, and Li Wang. 2014. Towards ultrahigh dimensional feature selection
for big data. J. Mach. Learn. Res. 15, 1 (January 2014), 1371-1429.
10. Kui Yu, Xindong Wu, Wei Ding, and Jian Pei. 2016. Scalable and Accurate Online Feature Selection
for Big Data. ACM Trans. Knowl. Discov. Data 11, 2, Article 16 (December 2016
11. Jun Lee, Kyoung-Sook Kim, YongJin Kwon, and Hirotaka Ogawa. 2017. Understanding human
perceptual experience in unstructured data on the web. In Proceedings of the International Conference
on Web Intelligence (WI '17). ACM, New York, NY, USA, 491-498
12. Thee Zin Win and Nang Saing Moon Kham. 2018. Mutual Information-based Feature Selection
Approach to Reduce High Dimension of Big Data. In Proceedings of the 2018 International
Conference on Machine Learning and Machine Intelligence (MLMI2018). ACM, New York, NY,
USA, 3-7
13. Tang, Jiliang & Liu, Huan. (2014). An Unsupervised Feature Selection Framework for Social Media Data. IEEE Transactions on Knowledge and Data Engineering. 26. 2914-2927.
14. Liu H, Yu L. Toward integrating feature selection algorithms for classification and clustering. IEEE
Transactions on Knowledge & Data Engineering. Apr 1(4):491--502, (2015).
15. L. Li, "Evaluation Model of Education Service Quality Satisfaction in Colleges and Universities
Dependent on Classification Attribute Big Data Feature Selection Algorithm," 2019 International
Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Changsha, China, 2019,
pp. 645-649
16. M. B. Çatalkaya, O. Kalipsiz, M. S. Aktas and U. O. Turgut, "Data Feature Selection Methods on
Distributed Big Data Processing Platforms," 2018 3rd International Conference on Computer Science
and Engineering (UBMK), Sarajevo, 2018, pp. 133-138
17. J. A. Sáez and E. Corchado, "KSUFS: A Novel Unsupervised Feature Selection Method Based on
Statistical Tests for Standard and Big Data Problems," in IEEE Access, vol. 7, pp. 99754-99770, 2019
18. S. Ramírez-Gallego et al., "An Information Theory-Based Feature Selection Framework for Big Data
Under Apache Spark," in IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 48, no.
9, pp. 1441-1453, Sept. 2018
19. K. Gong, Y. Wang, M. Xu and Z. Xiao, "BSSReduce an $O(\left|U\right|)$ Incremental Feature
Selection Approach for Large-Scale and High-Dimensional Data," in IEEE Transactions on Fuzzy
Systems, vol. 26, no. 6, pp. 3356-3367, Dec. 2018


20. M. Yamada et al., "Ultra High-Dimensional Nonlinear Feature Selection for Big Biological Data," in
IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 7, pp. 1352-1365, 1 July 2018
21. Sechidis K., Tsoumakas G., Vlahavas I. (2011) On the Stratification of Multi-label Data. In:
Gunopulos D., Hofmann T., Malerba D., Vazirgiannis M. (eds) Machine Learning and Knowledge
Discovery in Databases. ECML PKDD 2011. Lecture Notes in Computer Science, vol 6913.
Springer, Berlin, Heidelberg
22. Fahy and S. Yang, "Dynamic Feature Selection for Clustering High Dimensional Data Streams," in
IEEE Access, vol. 7, pp. 127128-127140, 2019
23. W. Ke, C. Wu, Y. Wu and N. N. Xiong, "A New Filter Feature Selection Based on Criteria Fusion for
Gene Microarray Data," in IEEE Access, vol. 6, pp. 61065-61076, 2018
24. J. Huang, G. Li, Q. Huang and X. Wu, "Joint Feature Selection and Classification for Multilabel
Learning," in IEEE Transactions on Cybernetics, vol. 48, no. 3, pp. 876-889, March 2018
25. J. González-López, S. Ventura and A. Cano, "Distributed Selection of Continuous Features in
Multilabel Classification Using Mutual Information," in IEEE Transactions on Neural Networks and
Learning Systems,2019
26. J. Xu and Z. Mao, "Multilabel Feature Extraction Algorithm via Maximizing Approximated and
Symmetrized Normalized Cross-Covariance Operator," in IEEE Transactions on Cybernetics,2019
27. L. Sun, T. Yin, W. Ding and J. Xu, "Hybrid Multilabel Feature Selection Using BPSO and
Neighborhood Rough Sets for Multilabel Neighborhood Decision Systems," in IEEE Access, vol. 7,
pp. 175793-175815, 2019
28. A. A. Bidgoli, H. Ebrahimpour-Komleh and S. Rahnamayan, "A Many-objective Feature Selection
Algorithm for Multi-label Classification Based on Computational Complexity of Features," 2019
29. J. K. Valadi, P. T. Ovhal and K. J. Rathore, "A Simple Method of Solution For Multi-label Feature
Selection," 2019 IEEE International Conference on Electrical, Computer and Communication
Technologies (ICECCT), Coimbatore, India, 2019, pp. 1-4
30. E. Ziad Abdallah and M. Oueidat, "An Improved Framework For Image Multi-label Classification Using Gabor Feature Extraction," 2017 International Conference on Computer and Applications (ICCA), Doha, 2017, pp. 151-157
31. Z. Wang, X. Kong, H. Fu, M. Li and Y. Zhang, "Feature extraction via multi-view non-negative
matrix factorization with local graph regularization," 2015 IEEE International Conference on Image
Processing (ICIP), Quebec City, QC, 2015, pp. 3500-3504
32. ZALL R., KEYVANPOUR, "Semi-Supervised Multi-View Ensemble Learning Based On Extracting Cross-View Correlation", Advances in Electrical and Computer Engineering, Volume 16, Issue 2, Year 2016, On page(s): 111-124
33. Zhiqiang Zuo, Yong Luo, Dacheng Tao, and Chao Xu. 2014. Multi-view Multi-task Feature Extraction for Web Image Classification. In Proceedings of the 22nd ACM International Conference on Multimedia (MM '14). ACM, New York, NY, USA, 1137-114
34. Hongfu Liu, Haiyi Mao and Yun Fu,"Robust Multi-View Feature Selection ",2016 IEEE 16th
International Conference on Data Mining
35. Xuan Wu, Qing-Guo Chen, Yao Hu, Dengbao Wang, Xiaodong Chang, Multi-View Multi-Label Learning with View-Specific Information Extraction, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19)


36. Yasser Elmanzalawi, CCA based multi-view feature selection for multiomics data integration, 2018 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2018
37. W. Zhuge, F. Nie, C. Hou and D. Yi, "Unsupervised Single and Multiple Views Feature Extraction
with Structured Graph," in IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 10,
pp. 2347-2359, 1 Oct. 2017
38. Kursun O., Alpaydin E. (2010) Canonical Correlation Analysis for MultiviewSemisupervised Feature
Extraction. In: Rutkowski L., Scherer R., Tadeusiewicz R., Zadeh L.A., Zurada J.M. (eds) Artificial
Intelligence and Soft Computing. ICAISC 2010. Lecture Notes in Computer Science, vol 6113.
Springer, Berlin, Heidelberg
39. Michele Volpi, Giona Matasci, Mikhail Kanevski, Devis Tuia, Multi-view feature extraction for hyperspectral image classification, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges (Belgium), 24-26 April 2013
