Introduction To Big Data
Volume 3 Issue 2
*Corresponding Author
E-mail ID:- [email protected]
ABSTRACT
Big data is an approach used to store, distribute and analyze massive datasets at high velocity. Big data may come in structured, unstructured or semi-structured form, which conventional data management techniques are unable to handle. Data is generated from various different sources and may arrive in the system at various rates. In order to process these massive quantities of data in a cheap and efficient way, parallelism is used. Hadoop is an open source software project that enables the distributed processing of large data sets with a very high degree of fault tolerance. This paper deals with the technology of big data and the issues related to processing it, and it also presents a solution to those issues, the Hadoop framework, and its applications.
Big data can be described and characterized by the 5 V's, as shown in Figure 1 [7]:

Volume
Volume refers to the quantity of data. The size of the data can range from megabytes and gigabytes to petabytes.

Variety
Variety is what makes the data so large. The data comes in numerous formats and of any type; it may be structured or unstructured, including text, audio, videos, log files and more.

Velocity
Velocity describes the speed of data activities. The data arrives at a high rate and is time-sensitive.

Veracity
The data may not be totally correct; there may be dirty records.[5]

ADVANTAGES
Reduce Charges
Both the Syncsort and the New Vantage surveys found that big data analytics have been helping organizations lower their costs. Nearly six out of ten (59.4 percent) respondents reported that Syncsort big data tools had helped them increase operational efficiency and decrease expenses, and approximately two thirds (66.7 percent) of respondents to the New Vantage survey stated they had started using big data to decrease expenses. Interestingly, however, only 13.0 percent of respondents selected cost reduction as their primary aim for big data analytics, suggesting that for many this is simply a very welcome side benefit.[8]
…before the cardholder even knows that something is wrong.[8]

Improved Customer Service
One of the most common goals among big data analytics programs is improving customer service. Today's organizations capture large amounts of data from different sources like customer relationship management (CRM) systems and social media, together with other points of customer contact. By analyzing this huge amount of data they get to know the tastes and preferences of a user, and with the help of big data technologies they become capable of creating experiences that are more responsive, personal, and accurate than ever before.[9]

PROBLEMS ASSOCIATED WITH BIG DATA PROCESSING
Immediate attention is required for the obstacles, which are nothing but the difficulties in big data. If any kind of implementation is done while ignoring the problems in big data, then it may affect the technology implementation and lead to undesirable results.

Size
The first thing everyone thinks of with big data is its size. Managing large and quickly growing volumes of data has been a troublesome issue for many decades. In the past, this problem was mitigated by processors getting faster, following Moore's law, to provide us with the resources needed to address growing volumes of data. But there is a fundamental shift in progress now: the amount of data is scaling faster than compute resources, and CPU speeds are static.

Privacy and Security
This is the most important challenge with big data, because it is sensitive. Private data (for example, in the database of a social networking website) of a person, when combined with external large data sets, leads to the inference of new facts about that person, and it is possible that these kinds of facts are secret and the individual may not want the data owner, or anyone else, to know them. Information concerning people is collected and used in order to add value to the business of the organization. Another critical consequence arises on social sites, where one person may take advantage of big data predictive analysis while, on the other hand, the underprivileged can be easily identified and treated worse. Big data enlarges the chances of certain tagged people suffering negative consequences without the ability to fight back, or even the knowledge that they are being discriminated against.

Data Access and Sharing Information
Due to the large quantity of records, the data management and governance process is somewhat complicated, including the necessity to make data open and available to government agencies in a standardized way, with standardized APIs, metadata and formats. Expecting data sharing between companies is awkward because of the need to gain an edge in business: sharing data about their customers and operations threatens the culture of secrecy and competitiveness.

Analytical Challenges
The major analytical questions are:
What if the data volume gets so large and varied that it is not known how to deal with it?
How can the data be used to best advantage?
Does all the data need to be analyzed?
How can the relevant data points be found among the huge amount of irrelevant data, so that better outcomes and conclusions may be drawn?
This further leads to various questions, like how it can be ensured that the data is relevant, how much data would be enough for decision making, and whether or not the stored data is accurate enough to draw conclusions from, and so on.

Heterogeneous Data
Unstructured data represents nearly every kind of data being produced: social media interactions, recorded meetings, PDF files, fax transfers, emails and more. Working with unstructured data is a cumbersome problem, and of course costly too. Converting all this unstructured data into structured data is also not feasible. Structured data is always organized in a highly mechanized and manageable way and integrates well with databases, whereas unstructured data is completely raw and unorganized.[4]

HADOOP: SOLUTION FOR BIG DATA PROCESSING
Hadoop is a programming framework written in Java and used to support the processing of large data sets in a distributed computing environment. Hadoop was developed based on Google's MapReduce, a software framework in which an application is broken down into numerous parts [10]. In Hadoop, the modules are designed with the fundamental assumption that hardware failures occur commonly and are handled automatically by the framework [11].
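The divide-and-conquer idea behind Hadoop, in which a job is broken into parts and the parts run in parallel, can be illustrated with a minimal, Hadoop-free sketch in Python. The chunking scheme and the per-part work (a simple sum) are illustrative choices, not Hadoop's own mechanics:

```python
from concurrent.futures import ProcessPoolExecutor

def process_chunk(chunk):
    """Stand-in for the per-part work: here, just sum the numbers in the part."""
    return sum(chunk)

def split(data, parts):
    """Divide the dataset into roughly equal parts, one per worker."""
    size = (len(data) + parts - 1) // parts
    return [data[i:i + size] for i in range(0, len(data), size)]

if __name__ == "__main__":
    data = list(range(1, 101))           # toy "large" dataset
    chunks = split(data, parts=4)        # break the problem and data into parts...
    with ProcessPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(process_chunk, chunks))  # ...and run them in parallel
    print(sum(partials))                 # combine the partial results: 5050
```

Real Hadoop adds what this sketch omits: distributing the parts across machines and re-running any part whose hardware fails.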
Hadoop Features
Economical
Hadoop utilizes commodity hardware (like your PC or laptop). The cost of ownership of a Hadoop-based project is therefore limited. It is simpler to maintain a Hadoop environment, and it is cost effective as well. Also, Hadoop is open-source software, and hence there is no licensing cost.

Scalability
Hadoop has the built-in capability of integrating seamlessly with cloud-based services. So, if you are installing Hadoop on a cloud, you don't need to worry about the scalability factor, because you can go ahead and procure more hardware and expand your setup within minutes whenever required.
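The effect of scaling out can be shown with a back-of-the-envelope capacity calculation. The function below is an illustrative sketch, not part of any Hadoop API; the replication factor of 3 mirrors HDFS's default block replication:

```python
def usable_capacity_tb(nodes, disk_per_node_tb, replication=3):
    """Usable cluster storage: raw capacity divided by the replication
    factor, since every block is stored `replication` times.
    replication=3 mirrors the HDFS default; adjust for your own setup."""
    return nodes * disk_per_node_tb / replication

print(usable_capacity_tb(10, 12))   # 40.0 TB usable
print(usable_capacity_tb(20, 12))   # 80.0 TB -- doubling the nodes doubles capacity
```

Because capacity grows linearly with node count, expanding the setup is a matter of adding machines rather than replacing them with bigger ones.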
Name Node
…it stores all of the metadata and attributes, and the particular locations of files and data blocks within the data nodes. The name node acts as the master node, because it stores all of the information about the system and records which data is newly added, modified or removed from the data nodes.

Data Node
It functions as a slave node. A Hadoop environment may contain more than one data node, depending on capacity and performance. This node performs two main duties: storing blocks in HDFS, and acting as the platform for running jobs.

HDFS Clients/Edge Node
HDFS clients, sometimes also known as edge nodes, act as the link between the name node and the data nodes. A Hadoop cluster may have only one client, but there can also be many, depending on performance needs [5].

MapReduce Architecture
The processing pillar in the Hadoop ecosystem is the MapReduce framework. The framework allows the specification of an operation to be applied to a massive data set, then divides the problem and data and runs it in parallel. From an analyst's point of view, this can occur on multiple dimensions. For example, a very large dataset can be reduced into a smaller subset where analytics can be applied. In a traditional data warehousing scenario, this might involve applying an ETL operation on the data to deliver something usable by the analyst. In Hadoop, these kinds of operations are written as MapReduce jobs in Java. There are a number of higher-level languages, like Hive and Pig, that make writing these programs easier. The outputs of these jobs can be written back either to HDFS or to a traditional data warehouse. MapReduce provides the following functions:
map – the function takes key/value pairs as input and generates an intermediate set of key/value pairs
reduce – the function merges all the intermediate values associated with the same intermediate key[4]
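The map and reduce functions can be sketched as a single-process word-count simulation. This is a toy model of the programming model, not the Hadoop API; the shuffle step, which in a real cluster moves intermediate pairs between machines, is made explicit as an in-memory grouping:

```python
from collections import defaultdict

def map_fn(offset, line):
    """map -- takes a key/value pair (line offset / line text) as input and
    generates intermediate key/value pairs: one (word, 1) per word."""
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    """reduce -- merges all intermediate values sharing the same key."""
    return key, sum(values)

def run_job(records):
    groups = defaultdict(list)
    # shuffle/sort phase: group intermediate pairs by intermediate key
    for offset, line in enumerate(records):
        for key, value in map_fn(offset, line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

counts = run_job(["big data big hadoop", "hadoop big"])
print(counts)  # {'big': 3, 'data': 1, 'hadoop': 2}
```

In real Hadoop, map and reduce run as distributed tasks, and each reducer sees only the keys assigned to its partition; the logic per key, however, is exactly as above.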
Fig. 4: Architecture of MapReduce [20]
Fig. 5: Architecture of YARN [21]
The elements of YARN consist of:
1) Resource Manager (one per cluster)
2) Application Master (one per application)
3) Node Managers (one per node)

Resource Manager
The Resource Manager manages resource allocation within the cluster and is responsible for tracking how many resources are available in the cluster, and each node manager's contribution. It has two important components:
Scheduler – allocates resources to the various running applications and schedules resources based on the requirements of the applications; it doesn't monitor or track the status of the applications
Application Manager – accepts job submissions from the client, and tracks and restarts application masters in case of failure
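The Scheduler's allocate-without-monitoring role can be sketched as a toy bookkeeping class. This is an illustrative simulation, not the YARN API; the class name, the use of vcores as the only resource, and the method names are all assumptions for the example:

```python
class Scheduler:
    """Toy YARN-style scheduler: grants resources to applications based on
    their requests, but does not monitor or track application status."""

    def __init__(self, total_vcores):
        self.available = total_vcores

    def allocate(self, app_id, vcores):
        """Grant a container if enough capacity remains, else refuse."""
        if vcores <= self.available:
            self.available -= vcores
            return True    # container granted
        return False       # request must wait for capacity

    def release(self, vcores):
        """Return capacity when an application finishes."""
        self.available += vcores

sched = Scheduler(total_vcores=8)
print(sched.allocate("app-1", 5))   # True  -- 3 vcores left
print(sched.allocate("app-2", 4))   # False -- only 3 vcores left
sched.release(5)                    # app-1 finishes
print(sched.allocate("app-2", 4))   # True
```

Note what the class deliberately lacks: it never checks whether "app-1" is healthy or making progress. In YARN, that monitoring belongs to the per-application Application Master, not to the Scheduler.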
…1536922141203/Map-Reduce-architecture-4.png
23. https://d2h0cx97tjks2p.cloudfront.net/blogs/wp-content/uploads/sites/2/2019/02/Yarn-Architecture.png