Assignment 4

The document outlines a research proposal on big data classification. It discusses using a MapReduce framework with a proposed optimization algorithm called Exponential Bat (E-Bat) for feature selection and classification. The proposal includes background information, a statement of the problem, aims and objectives, assumptions, hypotheses, research questions, a literature review, and proposed research methodology, data analysis, expected findings, implications, and conclusion. It also provides a table comparing 8 existing works on data classification methods.

Uploaded by

Nayeema Shaik


A.

Write a research proposal including the following details:

1. Background to the study:

The background of big data classification is discussed in Section 1, and Section 2
reviews the literature on existing big data classification methods. Section 3
presents the proposed big data classification method, and Section 4 discusses its
results. Finally, Section 5 concludes the paper.

2. Statement of the problem

The need to solve multimodal optimization objectives with highly complex,
non-linear constraints compels researchers to develop better optimization methods
that guarantee globally optimal solutions without conflicting constraints.
Metaheuristics offer a way to tackle multi-objective problems, which do not
conclude with a single best solution; instead, metaheuristics generate a set of
solutions that gives a better approximation. However, most metaheuristic
algorithms are designed for single-objective rather than multi-objective
optimization, and they convert multiple objectives into a single objective using
weights. Generating solutions with good diversity is another challenge faced by
existing metaheuristics. Additionally, real-world issues such as uncertainty and
noise should not affect the algorithm: it should be robust enough to tolerate
inhomogeneity and should offer decision-makers good options for effective
decision-making. Despite these challenges, metaheuristic algorithms contribute
much to multi-objective global optimization. Keeping all of this in mind, a novel
metaheuristic search algorithm, called the Exponential Bat (E-Bat) algorithm, is
developed. The proposed E-Bat algorithm integrates the exponentially weighted
moving average (EWMA) [22] with the Bat Algorithm (BA) [2].
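The proposal does not spell out how the EWMA is combined with the BA. Purely as an illustration, the sketch below runs the standard Bat Algorithm updates (frequency f_i, velocity v_i, loudness A_i, pulse rate r_i) and, as a hypothetical E-Bat step, smooths each bat's candidate position with an EWMA of its previous position, z_t = λx_t + (1 - λ)z_{t-1}. The function names `ewma` and `ebat_minimize`, the smoothing factor `lam`, and all parameter defaults are assumptions for this sketch, not the author's published method.

```python
import numpy as np


def ewma(previous, current, lam):
    """Exponentially weighted moving average: z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    return lam * current + (1.0 - lam) * previous


def ebat_minimize(objective, dim, n_bats=20, n_iter=200, f_min=0.0, f_max=2.0,
                  alpha=0.9, gamma=0.9, lam=0.5, bounds=(-5.0, 5.0), seed=0):
    """Minimise `objective` with Bat Algorithm updates plus a HYPOTHETICAL EWMA
    smoothing of each candidate position (an assumption, not the published E-Bat)."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    x = rng.uniform(low, high, size=(n_bats, dim))   # bat positions
    v = np.zeros((n_bats, dim))                       # bat velocities
    loudness = np.ones(n_bats)                        # A_i
    pulse0 = rng.uniform(0.0, 1.0, size=n_bats)       # initial pulse rates r_i^0
    pulse = pulse0.copy()                             # current pulse rates r_i
    fitness = np.array([objective(b) for b in x])
    best_idx = int(np.argmin(fitness))
    best, best_fit = x[best_idx].copy(), fitness[best_idx]

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.uniform()   # frequency f_i
            v[i] = v[i] + (x[i] - best) * freq               # velocity update
            candidate = x[i] + v[i]                          # position update
            if rng.uniform() > pulse[i]:
                # Local random walk around the best: x_best + eps * mean(A), eps in [-1, 1]
                candidate = best + rng.uniform(-1.0, 1.0, dim) * loudness.mean()
            # Hypothetical E-Bat step: EWMA of the candidate and the bat's previous position
            candidate = np.clip(ewma(x[i], candidate, lam), low, high)
            f_new = objective(candidate)
            if f_new <= fitness[i] and rng.uniform() < loudness[i]:
                x[i], fitness[i] = candidate, f_new                 # accept the move
                loudness[i] *= alpha                                # A_i <- alpha * A_i
                pulse[i] = pulse0[i] * (1.0 - np.exp(-gamma * t))   # r_i update
                if f_new < best_fit:
                    best, best_fit = candidate.copy(), f_new
    return best, float(best_fit)
```

For example, `ebat_minimize(lambda z: float(np.sum(z ** 2)), dim=3)` searches for the minimum of the sphere function. Using the same loop for feature selection would additionally require mapping continuous positions to a binary feature mask, which this sketch does not show.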

3. Aims and objectives

The main aim of the research is to establish a big data classification model using
an optimization algorithm. Big data classification is carried out using the
MapReduce framework with the proposed optimization algorithm, named the
Exponential Bat (E-Bat) algorithm, which is obtained by integrating the EWMA with
the BA. Big data obtained from distributed sources is fed to the mapper phase,
which performs feature selection using the E-Bat algorithm. Effective feature
selection improves the classification accuracy. The selected features are fed to
the reducer for data classification; the reducer uses a Neural Network (NN),
trained using the proposed E-Bat algorithm, to classify the data into the various
classes.
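The map and reduce roles described above can be sketched in-process as follows. This is a sequential stand-in for a real MapReduce cluster, not a Hadoop job: `mapper`, `reducer`, `nn_predict` and `run_mapreduce` are hypothetical names, the binary feature mask is assumed to have been produced already (by E-Bat in the proposal), and the NN weights are assumed to be already trained.

```python
import numpy as np


def mapper(partition, mask):
    """Map phase: keep only the columns flagged in the binary feature mask."""
    X, y = partition
    return X[:, mask.astype(bool)], y


def nn_predict(X, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer network; returns the predicted class index."""
    hidden = np.tanh(X @ W1 + b1)
    return np.argmax(hidden @ W2 + b2, axis=1)


def reducer(mapped, weights):
    """Reduce phase: merge the mapper outputs and classify them with the NN."""
    X = np.vstack([features for features, _ in mapped])
    y = np.concatenate([labels for _, labels in mapped])
    return nn_predict(X, *weights), y


def run_mapreduce(X, y, mask, weights, n_mappers=3):
    """Split the data into partitions, map each one, then reduce (sequential stand-in)."""
    partitions = zip(np.array_split(X, n_mappers), np.array_split(y, n_mappers))
    mapped = [mapper(p, mask) for p in partitions]   # on a cluster these would run in parallel
    return reducer(mapped, weights)
```

Because `np.array_split` preserves row order, the reducer sees the rows in their original order, so its predictions match classifying the masked data in one pass; on a real cluster, keys would be needed to restore that order.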

4. Assumptions

5. Hypothesis

6. Research questions

7. Literature review

Eight works related to data classification frameworks in the big data environment
are presented in Table 1.

Table 1. Existing data classification methods in the big data environment

H. Ke, et al. [1]
  Methodology: Big data classification with a lightweight VGGNet
  Pros: Achieves its performance without the need to denoise the EEG. Requires
  only one hyperparameter, which avoids potential errors caused by excessive
  parameter settings. Cost-effective solution.
  Cons: Neglected the use of deep learning and optimal partitioning.

P. Ezatpoor, et al. [2]
  Methodology: MapReduce Enhanced Bitmap Index Guided Algorithm
  Pros: Faster processing time.
  Cons: Data classification for incomplete data has been left unaddressed.
  Suitable only for data environments with a moderate miss rate.

Elkano, M. et al. [3]
  Methodology: Distributed MapReduce prototype generation method (CHI-PG)
  Pros: Improves execution time without compromising accuracy and reduction rates.
  Cons: Fails to provide better results in high-dimensional environments.

Mikel Elkano, et al. [4]
  Methodology: Fuzzy Rule-Based Classification System
  Pros: Improved classification accuracy regardless of the total number of
  computing nodes.
  Cons: The problem of sizeup is left unaddressed. The algorithm has a linear
  relation on execution time and scaleup.

S. Ramírez-Gallego, et al. [5]
  Methodology: Nearest Neighbor Classification
  Pros: Alleviates the time problem occurring in high-dimensional scenarios.
  Cons: Drift changes in the data are neglected during data classification.

Zhai, J., et al. [6]
  Methodology: Fuzzy integral-based ELM ensemble
  Pros: Simple structure and easy implementation. Deals with the particle
  problem of medium size.
  Cons: Not suitable for multi-class classification of imbalanced data.

Murugan, N.S. and Devi, G.U. [7]
  Methodology: LR-PCA hybridization
  Pros: Achieved increased detection rate and accuracy.
  Cons: Inefficient on large datasets, and neglected the effects of noise in
  the data.

R. Varatharajan et al. [8]
  Methodology: LDA with an enhanced SVM
  Pros: The reduced feature set enhanced data classification.
  Cons: Large data environments may result in reduced performance.

8. Research Methodology
a) Sample
b) Research Design
c) Tools for data collection

9. Data Analysis (Methods)

10. Expected research findings

11. Expected research implications

12. Conclusion
The paper deals with the proposed big data classification approach, which aims
to meet the rising demands of high volume, high velocity, high value, high
veracity, and huge variety. Big data classification is performed using the
MapReduce framework so that data from distributed sources is handled in
parallel. The MapReduce framework analyzes the big data in two steps to yield
the classified results. The first step is feature selection, which selects the
optimal features from the data using the proposed E-Bat algorithm in the
mappers. In contrast, classification is performed in the reducers, which are
provided with the NN. The weights of the NN are optimally tuned using the
proposed EBatNN algorithm. The final output of the MapReduce framework is the
classified big data, which groups the whole big data into clusters. The proposed
big data classification is evaluated experimentally on four standard databases
taken from the UCI Machine Learning Repository.
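For a metaheuristic such as E-Bat to tune NN weights, each bat's position must encode the complete set of weights, and the optimizer must score positions with a fitness function. The sketch below shows one common encoding, assuming a one-hidden-layer tanh network with softmax outputs and mean squared error as the fitness; `layer_shapes`, `n_params`, `decode` and `fitness` are hypothetical helpers, and the proposal does not state which error measure EBatNN actually minimizes.

```python
import numpy as np


def layer_shapes(n_in, n_hidden, n_out):
    """Shapes of W1, b1, W2, b2 for a one-hidden-layer network."""
    return [(n_in, n_hidden), (n_hidden,), (n_hidden, n_out), (n_out,)]


def n_params(shapes):
    """Length of the flat position vector that one bat must carry."""
    return sum(int(np.prod(s)) for s in shapes)


def decode(position, shapes):
    """Split a flat position vector (one bat) into the network's weight arrays."""
    arrays, start = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        arrays.append(position[start:start + size].reshape(shape))
        start += size
    return arrays


def fitness(position, X, y, shapes):
    """Mean squared error between softmax outputs and one-hot labels (lower is better)."""
    W1, b1, W2, b2 = decode(position, shapes)
    logits = np.tanh(X @ W1 + b1) @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability for exp
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    targets = np.eye(shapes[-1][0])[y]           # one-hot encoding of integer labels
    return float(np.mean((probs - targets) ** 2))
```

Any population-based minimizer, such as a Bat Algorithm variant, can then search over vectors of length `n_params(shapes)` using `fitness` as its objective. For instance, an all-zero position gives uniform outputs over two classes and therefore a fitness of 0.25.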
13. References

[1] A. Alexandrov et al., The Stratosphere platform for big data analytics, The VLDB Journal, 23(6) (2014), 939-964.

[2] A. Fernandez et al., Fuzzy rule based classification systems for big data with MapReduce: granularity analysis, Advances in Data Analysis and Classification, 11(4) (2017), 711-730.

[3] A.J.C. Slooter et al., Seizure detection in adult ICU patients based on changes in EEG synchronization likelihood, Neurocritical Care, 5(3) (2006), 186-192.

[4] B. Xue, M. Zhang and W. N. Browne, Particle Swarm Optimization for Feature Selection in Classification: A Multi-Objective Approach, IEEE Transactions on Cybernetics, 43(6) (2013), 1656-1671.

[5] Breast cancer dataset, http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+%28diagnostic%29, accessed March 2018.

[6] D. Singh, D. Roy and C. K. Mohan, DiP-SVM: Distribution Preserving Kernel Support Vector Machine for Big Data, IEEE Transactions on Big Data, 3(1) (2017), 79-90.

[7] D. Cui et al., Estimation of genuine and random synchronization in multivariate neural series, Neural Networks, 23(6) (2010), 698-704.

[8] G. Chatzigeorgakidis et al., FML-kNN: scalable machine learning on Big Data using k-nearest neighbor joins, Journal of Big Data, 5(1) (2018), p. 4.

[9] G. Manogaran and D. Lopez, Spatial cumulative sum algorithm with big data analytics for climate change detection, Computers & Electrical Engineering, 2017.

[10] H. Ke et al., Towards Brain Big Data Classification: Epileptic EEG Identification with a Lightweight VGGNet on Global MIC, IEEE Access, 99, 1-1.
