Efficient Algorithm For Big Data Application
Volume 6 Issue 11
Assistant Professor, Department of Computer Science and Engineering, Paavai Engineering College, Namakkal
ABSTRACT: Data mining applications play an important role in IT firms, where energy wastage is a major problem. Increases in workload and computation lead to high energy cost. The MapReduce scheduling algorithm is a model developed for processing and storing large volumes of data at the same time. EMRSA is an algorithm that gives reliable energy use together with a reduction in map tasks; arrangement-based priority scheduling is applied to test utilization, and system performance is easily improved by the reduction in maps. One-step and three-step iterative algorithms are evaluated through various calculations for efficient mining characteristics and energy.
1 INTRODUCTION
Big data is data – both structured and unstructured – that overwhelms a business on a day-to-day basis. It is what organizations do with the data that matters. Big data can be analysed for insights that lead to better decisions and strategic business moves. The major areas covered include finance, banking, education, e-commerce and so on.
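The MapReduce model referenced throughout this paper splits a computation into a map phase that emits key-value pairs, a shuffle that groups them by key, and a reduce phase that aggregates each group. A minimal word-count sketch in plain Python illustrates the idea (the function names are ours, not the paper's system):

```python
from collections import defaultdict

# map phase: emit (word, 1) pairs from each input record
def map_phase(records):
    for record in records:
        for word in record.split():
            yield (word.lower(), 1)

# shuffle phase: group intermediate pairs by key
def shuffle(pairs):
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

# reduce phase: aggregate the grouped values for each key
def reduce_phase(grouped):
    return {key: sum(values) for key, values in grouped.items()}

records = ["big data big insight", "data mining"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts)  # {'big': 2, 'data': 2, 'insight': 1, 'mining': 1}
```

In a real cluster the map and reduce calls run in parallel across machines; the single-process pipeline above only shows the data flow.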
Jobs were run with various Hadoop configuration settings and their performance recorded. The idea is to use multivariate regression modelling on the data collected from the energy readings of Hadoop MapReduce to generate models for control. The parameters are added to a model whose output is obtained through a small factorial analysis of the energy-profiling results, carried out using the maximum and minimum possible values of all the parameters mentioned above. Stochastic Markov chain models of the MapReduce systems are then built and verified to calculate performance and energy, making use of the data collected from the energy profiling.

2.3 ENERGY MAP REDUCE SCHEDULING ALGORITHM (EMRSA)

The support vector machine (SVM) is a supervised classification method. It is used for:
• Classification and regression (binary and multi-class problems)
• Anomaly detection (one-class problems)
Given labelled training examples, the SVM training algorithm builds a model that categorizes each new example into one of the categories, making it a non-probabilistic binary linear classifier. In the SVM model, examples are represented as points in space, mapped so that the separate categories are divided by a gap that is as wide as possible. Support vector machines have been developed as robust tools for noisy, complex classification and regression domains. Two important ingredients of the support vector machine are generalization theory, which leads to a principled way of choosing a hypothesis, and kernel functions, which introduce non-linearity into the hypothesis space without explicitly requiring a nonlinear algorithm.
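The maximal-margin idea above can be sketched with a tiny linear SVM trained by sub-gradient descent on the regularized hinge loss (Pegasos-style updates). This is an illustrative toy, not the paper's implementation; the data and hyperparameters are made up:

```python
def train_linear_svm(xs, ys, lam=0.1, epochs=500):
    """Train a linear SVM on (+1/-1)-labelled points by sub-gradient
    descent on the regularized hinge loss (Pegasos-style updates)."""
    dim = len(xs[0])
    w, b, t = [0.0] * dim, 0.0, 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # sub-gradient of lam/2*||w||^2 + max(0, 1 - margin)
                w = [wi - eta * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += eta * y
            else:
                w = [wi * (1 - eta * lam) for wi in w]
    return w, b

def predict(w, b, x):
    # sign of the decision function w.x + b
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# toy linearly separable data: one cluster near (3, 3), one near (0, 0)
xs = [[3.0, 3.0], [3.5, 2.8], [2.8, 3.4], [0.0, 0.2], [0.3, 0.0], [-0.2, 0.1]]
ys = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(xs, ys)
print([predict(w, b, x) for x in xs])
```

Replacing the dot product with a kernel function would give the non-linear hypothesis space mentioned above; the linear case is enough to show the margin-driven updates.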
2.5.2 SCHEDULING

Fig. 4 Scheduling Process

Zero-frequency problem: if an attribute value (e.g., Outlook = Overcast) never occurs together with a class value (e.g., Play Golf = no), add 1 to the count of every attribute value/class combination (the Laplace estimator). Numerical predictor variables must be converted to categorical variables (binning) before building the frequency table. Another option is to use the distribution of the numerical variable to estimate the probabilities; one common approach is to assume a normal distribution for the numeric variable, whose probability density function is defined by two parameters (the mean and the standard deviation).
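The Laplace estimator and the normal-distribution option described above can be sketched as follows; the Outlook/Play Golf counts are illustrative, not taken from the paper:

```python
import math
from collections import Counter

def laplace_prob(attr_value, attr_values, class_examples):
    """Laplace estimator: add 1 to every attribute-value/class count so
    that unseen combinations never get probability zero."""
    counts = Counter(class_examples)
    return (counts[attr_value] + 1) / (len(class_examples) + len(attr_values))

def gaussian_pdf(x, mean, std):
    """Gaussian likelihood for a numeric attribute, defined by the
    class-conditional mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

# illustrative data: Outlook values observed for the class Play Golf = no;
# "overcast" never occurs with this class
no_examples = ["sunny", "sunny", "rainy"]
outlook_values = ["sunny", "overcast", "rainy"]
p = laplace_prob("overcast", outlook_values, no_examples)
print(p)  # (0 + 1) / (3 + 3) ≈ 0.167 instead of 0
```

Without the added 1, the zero count for Overcast/no would zero out the whole product of conditional probabilities in the naive Bayes classifier.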
Process scheduling is a fundamental part of a multiprogramming operating system. Such an operating system allows multiple processes to be loaded into executable memory at one time, and the loaded processes share the CPU using time multiplexing. In priority scheduling, the basic idea is simple: a priority is assigned to each process, and the process with the highest priority is executed first. Equal-priority processes are scheduled in FCFS order. The shortest-job-first (SJF) algorithm is a special case of the general priority scheduling algorithm.
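The priority discipline just described can be sketched with a heap, breaking priority ties in FCFS (arrival) order; the process names and priority values are made up. Passing each process's burst time as its priority turns the same function into the SJF special case:

```python
import heapq
from itertools import count

def priority_schedule(processes):
    """Return the run order under priority scheduling: lowest priority
    number first, with arrival order (FCFS) breaking ties."""
    arrival = count()  # monotonically increasing tie-breaker
    heap = [(prio, next(arrival), name) for name, prio in processes]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order

# illustrative processes as (name, priority); P2 and P3 share a priority,
# so FCFS decides between them
procs = [("P1", 3), ("P2", 1), ("P3", 1), ("P4", 2)]
print(priority_schedule(procs))  # ['P2', 'P3', 'P4', 'P1']
```

For SJF, call `priority_schedule` with `(name, burst_time)` pairs: the shortest job then has the numerically smallest "priority" and runs first.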
5. CONCLUSION:
This chapter presented the classification methods of support vector machines and naive Bayes for effective data-analysis results, along with a set of efficient techniques for repetitive iterative computation. In a real-time experiment, the described classification methods together with EMRSA significantly reduce the time needed to refresh large volumes of data mining results, compared with simple recomputation in MapReduce, while maintaining consistent and efficient energy use.