Cloud Comp Techno
ISSN: 2278-0181
Vol. 3 Issue 4, April - 2014
Abstract- Big data is a new and much-discussed topic in today's scenario. Big data is a set of data so large that a conventional database cannot capture, store, manage and analyze it. Big data is implemented using Hadoop, and Hadoop is in demand in the cloud nowadays.

The big data business ecosystem and the trends that provide the basis for big data are explained. An effective solution to the issue of data volume is needed in order to enable feasible, cost-effective and scalable storage and processing of enormous quantities of data. Thus big data and cloud go hand in hand, and Hadoop is a rapidly growing technology for organizations. The steps required for setting up a distributed, single-node Hadoop cluster backed by HDFS running on Ubuntu (steps only) are given.

... technology, hence big data was introduced with new enhancements.

Section I introduces big data and scalable database management systems. Section II covers aspects of big data and its challenges. Limitations and issues are in Section III. Section IV helps to choose between Hadoop and a data warehouse. Section V is Hadoop for big data. Big data management, scalability and performance are outlined in Section VI. The big data business ecosystem is described in Section VII. Running Hadoop on Ubuntu Linux (Single-Node Cluster) is in Section VIII. Finally, the conclusion is in Section XI.

BIG DATA definition- A set of data so large that a traditional database cannot capture, store, manage and analyze it.
1 petabyte=1000 TB (terabyte)
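To give a feel for the scale implied by this conversion, the following minimal Python sketch converts between petabytes and terabytes using the decimal factor stated above. The helper names `petabytes_to_terabytes` and `terabytes_to_petabytes` are hypothetical and introduced only for this illustration.

```python
# Decimal conversion factor taken from the text: 1 petabyte = 1000 terabytes.
TB_PER_PB = 1000


def petabytes_to_terabytes(petabytes: float) -> float:
    """Convert petabytes to terabytes (hypothetical helper for illustration)."""
    return petabytes * TB_PER_PB


def terabytes_to_petabytes(terabytes: float) -> float:
    """Convert terabytes to petabytes (hypothetical helper for illustration)."""
    return terabytes / TB_PER_PB


if __name__ == "__main__":
    print(petabytes_to_terabytes(1))     # 1 PB expressed in TB
    print(terabytes_to_petabytes(2500))  # 2500 TB expressed in PB
```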
II. Aspects of big data and its challenges

Data Policies: For example, storage, computing and analytical software are all required anew for big data.
Technology and techniques: Privacy and security are required for data.
Access to Data: To access the data, multiple data sources need to be integrated together.

IV. Which one to use? Hadoop or Data warehouse

Table 1: Which one to use, Hadoop or data warehouse [2]

V. HADOOP FOR BIG DATA

Overview: In 2002, Doug Cutting developed an open-source web crawler project. Google published MapReduce in 2004, and in 2006 Doug Cutting developed the open-source MapReduce and HDFS.
Hadoop is a framework that allows distributed processing of large data sets across clusters of computers using simple programming models. It is an open-source library, with application programs written in the Java language. Hadoop implements HDFS (Hadoop Distributed File System).

Hadoop clusters running the same software can range in size from a single server to as many as thousands of servers.

MapReduce is a programming paradigm. It has 2 phases for solving a query in HDFS:
Map
Reduce

Map is responsible for reading data from the input location and, based on the input, generating key-value pairs, i.e. an intermediate output on the local machine. Reduce is responsible for processing the intermediate output received from the mapper and generating the final output.
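The two phases can be illustrated with a minimal, single-process Python sketch. It does not use the Hadoop API; it only imitates the data flow described above. The word-count task, the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions chosen for illustration, and the explicit `shuffle` step stands in for the grouping by key that a framework would perform between the two phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: read each input line and emit an intermediate (word, 1) pair."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (assumed framework step between phases)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the list of values for each key into one output value."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on in-memory input."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input standing in for files at an input location.
    sample = ["big data and cloud", "big data and Hadoop"]
    print(word_count(sample))
```

In a real cluster the map calls would run in parallel on the machines holding the input, and the intermediate pairs would be written locally before being sent to the reducers; this sketch keeps everything in memory to show only the key-value flow.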
Advantages of Hadoop:
Disadvantages of Hadoop:
REFERENCES
7. http://www.baselinemag.com/cloud-computing/managing-big-data-in-the-cloud
8. Divyakant Agrawal, Sudipto Das, Amr El Abbadi, Department of Computer Science, University of California, Santa Barbara.
9. https://cloudsecurityalliance.org/research/big-data/#_news
10. The Apache Hadoop Project, http://Hadoop.apache.org/core/, 2009.
11. http://www.ibm.com/developerworks/library/bd-bigdatacloud/
12. D. Agrawal, S. Das, and A. E. Abbadi. Big data and cloud computing: New wine or just new bottles? PVLDB, 3(2):1647–1648, 2010.
13. http://www.edupristine.com/courses/big-data-Hadoop-program/?jscfct=1
14. http://www.edureka.in/blog/category/big-data-and-Hadoop/
15. www.forbes.com/big-data
16. www.michael-noll.com/tutorials/running-Hadoop-on-ubuntu-linux-single-node-cluster/
17. Hadoop: The Definitive Guide, Second Edition, by Tom White. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.