
B M S COLLEGE OF ENGINEERING

(An Autonomous Institution Affiliated to VTU, Belagavi)

Post Box No.: 1908, Bull Temple Road, Bengaluru – 560 019

DEPARTMENT OF MACHINE LEARNING

Academic Year: 2022-2023 (Session: April 2023 - July 2023)

SOCIAL MEDIA ANALYTICS (22AM6PESMA)

ALTERNATIVE ASSESSMENT TOOL (AAT)

AAT
HADOOP
Submitted by

Student Name:
Team Member 1: A Vaishnavi
Team Member 2: Abitha Bala Subramani
USN:
Team Member 1: 1BM20AI053
Team Member 2: 1BM20AI065
Semester & Section: 6A
Student Signature:
Team Member 1:
Team Member 2:

Valuation Report (to be filled by the faculty)

Score:
Faculty In-charge: Dr. Arun
Faculty Signature with date:
Comments:


Introduction

Hadoop is an open-source framework designed for distributed storage and processing
of large datasets across clusters of computers. It provides a scalable, fault-tolerant
solution for handling big data. Hadoop allows you to process and analyze massive
amounts of structured, semi-structured, and unstructured data in a distributed
computing environment.

The key components of Hadoop are:

1. Hadoop Distributed File System (HDFS): HDFS is a distributed file system
that stores data across multiple machines in a cluster. It provides high-throughput
access to data and ensures fault tolerance by replicating data across multiple
nodes.

2. Yet Another Resource Negotiator (YARN): YARN is the resource
management framework in Hadoop. It manages cluster resources and schedules
tasks for processing data. YARN allows different data processing engines, such as
MapReduce, Spark, and Hive, to run on the same cluster.

3. MapReduce: MapReduce is a programming model and processing engine for
distributed processing of large datasets in Hadoop. It divides the data into smaller
chunks and processes them in parallel across multiple nodes in the cluster.
MapReduce consists of two phases: the map phase, which processes the input
data and produces intermediate results, and the reduce phase, which aggregates
the intermediate results to produce the final output.

Hadoop has gained popularity due to its ability to handle massive volumes of data
and its scalability. It allows organizations to store, process, and analyze data that
would be impractical or infeasible to handle using traditional databases or single-
node systems. Hadoop is widely used in various industries, including finance,
healthcare, retail, social media, and more, for tasks like log processing, data
warehousing, recommendation systems, and large-scale data analytics.
Working Principle

The working principle of Hadoop revolves around its key components: the Hadoop
Distributed File System (HDFS) and the MapReduce processing framework. Let's
break down the working principles of each component:

1. Hadoop Distributed File System (HDFS):


• Data Storage: HDFS divides large datasets into blocks and distributes them across
multiple machines in a cluster. Each block is replicated across different nodes for
fault tolerance.
• Master-Slave Architecture: HDFS follows a master-slave architecture. The master
node, called the NameNode, manages the file system namespace and tracks the
location of each data block. The slave nodes, known as DataNodes, store and
manage the actual data blocks.
• Data Replication: HDFS replicates data blocks across multiple DataNodes. By
default, each block is replicated three times to ensure data reliability. The
replication factor and block placement policies can be configured based on the
desired level of fault tolerance and data locality.

2. MapReduce Processing Framework:


• Data Processing: MapReduce is a programming model that allows distributed
processing of large datasets across a Hadoop cluster. It consists of two stages: the
map stage and the reduce stage.
• Map Stage: In the map stage, data is processed in parallel across the nodes in the
cluster. Each node processes a portion of the input data and produces intermediate
key-value pairs.
• Shuffle and Sort: After the map stage, the intermediate key-value pairs are grouped
and sorted based on the keys. This process is known as shuffle and sort.
• Reduce Stage: In the reduce stage, the sorted intermediate data is processed to
produce the final output. Each node processes a subset of the sorted data and
produces the final output key-value pairs.
• Fault Tolerance: MapReduce provides fault tolerance by automatically handling
failures. If a node fails during processing, the framework redistributes the failed
task to another node in the cluster.

The overall working principle of Hadoop involves storing large datasets across a
distributed file system (HDFS) and processing the data using the MapReduce
framework. HDFS ensures data reliability and fault tolerance, while MapReduce
enables parallel processing of the data for distributed computing. This combination
allows Hadoop to handle large-scale data processing tasks efficiently and reliably
across a cluster of machines.
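
To make the map and reduce phases concrete, the sketch below shows the classic
word-count job written against the Hadoop MapReduce Java API. It follows the
standard WordCount example distributed with Hadoop; the class name and the
input/output paths supplied on the command line are placeholders. The mapper
emits a (word, 1) pair for every token, and the reducer sums the counts for each
word after the shuffle and sort.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: split each input line into words and emit (word, 1)
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts collected for each word after shuffle and sort
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

A compiled jar of this class can be submitted with a command such as
hadoop jar wordcount.jar WordCount /input /output, where /input and /output are
hypothetical HDFS directories.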

Installation

Step 1: Download the Java 8 Package. Save this file in your home
directory.
Step 2: Extract the Java Tar File.
Command: tar -xvf jdk-8u101-linux-i586.tar.gz

Fig: Hadoop Installation – Extracting Java Files

Step 3: Download the Hadoop 2.7.3 Package.

Command: wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz

Fig: Hadoop Installation – Downloading Hadoop

Step 4: Extract the Hadoop tar File.


Command: tar -xvf hadoop-2.7.3.tar.gz

Fig: Hadoop Installation – Extracting Hadoop Files

Step 5: Add the Hadoop and Java paths in the bash file (.bashrc).
Open the .bashrc file and add the Hadoop and Java paths as shown below.


Command: vi .bashrc

Fig: Hadoop Installation – Setting Environment Variable
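
Since the screenshot is not reproduced here, the lines typically added to .bashrc
look like the following. The paths assume the JDK and Hadoop archives from the
earlier steps were extracted into the home directory as jdk1.8.0_101 and
hadoop-2.7.3; adjust them to the actual locations on your system.

export JAVA_HOME=$HOME/jdk1.8.0_101
export HADOOP_HOME=$HOME/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin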

Then, save the bash file and close it.

To apply all these changes to the current Terminal, execute the source
command.

Command: source .bashrc

Fig: Hadoop Installation – Refreshing environment variables

To make sure that Java and Hadoop have been properly installed
on your system and can be accessed through the Terminal, execute
the java -version and hadoop version commands.

Command: java -version

Fig: Hadoop Installation – Checking Java Version

Command: hadoop version

Fig: Hadoop Installation – Checking Hadoop Version

Step 6: Edit the Hadoop Configuration files.


Command: cd hadoop-2.7.3/etc/hadoop/

Command: ls

All the Hadoop configuration files are located in the hadoop-2.7.3/etc/hadoop
directory, as you can see in the snapshot below:

Fig: Hadoop Installation – Hadoop Configuration Files


Step 7: Open core-site.xml and edit the property mentioned below inside
configuration tag:
core-site.xml informs Hadoop daemon where NameNode runs in the
cluster. It contains configuration settings of Hadoop core such as I/O
settings that are common to HDFS & MapReduce.

Command: vi core-site.xml

Fig: Hadoop Installation – Configuring core-site.xml
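
The screenshot is not reproduced here; a typical single-node (pseudo-distributed)
entry inside the configuration tag is shown below. The NameNode URI
hdfs://localhost:9000 is a common choice for a local setup and is an assumption,
not a value dictated by Hadoop. (In newer configurations the equivalent property
is fs.defaultFS.)

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>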

Step 8: Edit hdfs-site.xml and edit the property mentioned below inside
configuration tag:
hdfs-site.xml contains configuration settings of HDFS daemons (i.e.
NameNode, DataNode, Secondary NameNode). It also includes the
replication factor and block size of HDFS.

Command: vi hdfs-site.xml

Fig: Hadoop Installation – Configuring hdfs-site.xml
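
As a representative example for a single-node cluster, the replication factor is
usually lowered to 1, since only one DataNode is available to hold block copies:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>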

Step 9: Edit the mapred-site.xml file and edit the property mentioned
below inside configuration tag:
mapred-site.xml contains configuration settings of MapReduce application
like number of JVM that can run in parallel, the size of the mapper and the
reducer process, CPU cores available for a process, etc.

In some cases, the mapred-site.xml file is not available, so we have to create it
from the mapred-site.xml.template file.

Command: cp mapred-site.xml.template mapred-site.xml

Command: vi mapred-site.xml

Fig: Hadoop Installation – Configuring mapred-site.xml
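
On Hadoop 2.x, the essential property tells MapReduce to run on top of YARN; a
minimal example looks like this:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>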

Step 10: Edit yarn-site.xml and edit the property mentioned below inside
configuration tag:
yarn-site.xml contains configuration settings of ResourceManager and
NodeManager like application memory management size, the operation
needed on program & algorithm, etc.

Command: vi yarn-site.xml

Fig: Hadoop Installation – Configuring yarn-site.xml
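
A common minimal yarn-site.xml enables the MapReduce shuffle service on the
NodeManager, for example:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>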


Step 11: Edit hadoop-env.sh and add the Java Path as mentioned below:
hadoop-env.sh contains the environment variables that are used in the
script to run Hadoop like Java home path, etc.

Command: vi hadoop-env.sh

Fig: Hadoop Installation – Configuring hadoop-env.sh
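
The added line typically points JAVA_HOME at the directory where the JDK was
extracted in Step 2 (the path below is an assumption; use your actual JDK
location):

export JAVA_HOME=$HOME/jdk1.8.0_101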

Step 12: Go to Hadoop home directory and format the NameNode.


Command: cd

Command: cd hadoop-2.7.3

Command: bin/hadoop namenode -format

Fig: Hadoop Installation – Formatting NameNode


This formats HDFS via the NameNode. This command is executed only the
first time. Formatting the file system means initializing the directory specified
by the dfs.name.dir property.

Never format a running Hadoop file system; you will lose all the data stored in
HDFS.

Step 13: Once the NameNode is formatted, go to the hadoop-2.7.3/sbin directory
and start all the daemons.

Command: cd hadoop-2.7.3/sbin

You can either start all the daemons with a single command or start them
individually.

Command: ./start-all.sh

The above command is a combination of start-dfs.sh, start-yarn.sh &
mr-jobhistory-daemon.sh.

Or you can run all the services individually as below:

Start NameNode:
The NameNode is the centerpiece of an HDFS file system. It keeps the
directory tree of all files stored in HDFS and tracks where the file data is kept
across the cluster.

Command: ./hadoop-daemon.sh start namenode

Fig: Hadoop Installation – Starting NameNode


Start DataNode:
On startup, a DataNode connects to the NameNode and responds to requests
from the NameNode for different operations.

Command: ./hadoop-daemon.sh start datanode

Fig: Hadoop Installation – Starting DataNode

Start ResourceManager:
ResourceManager is the master that arbitrates all the available cluster
resources and thus helps in managing the distributed applications running
on the YARN system. Its job is to manage each NodeManager and each
application's ApplicationMaster.

Command: ./yarn-daemon.sh start resourcemanager

Fig: Hadoop Installation – Starting ResourceManager

Start NodeManager:
The NodeManager is the per-machine agent responsible for managing
containers, monitoring their resource usage, and reporting the same to the
ResourceManager.

Command: ./yarn-daemon.sh start nodemanager


Fig: Hadoop Installation – Starting NodeManager

Start JobHistoryServer:
JobHistoryServer is responsible for servicing all job-history-related requests
from clients.

Command: ./mr-jobhistory-daemon.sh start historyserver

Step 14: To check that all the Hadoop services are up and running, run the
below command.
Command: jps

Fig: Hadoop Installation – Checking Daemons
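
If all the daemons started correctly, jps lists one Java process per daemon,
similar to the representative output below (the process IDs will differ on every
machine):

2445 NameNode
2583 DataNode
2970 ResourceManager
3110 NodeManager
3365 JobHistoryServer
3512 Jps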

Step 15: Now open the Mozilla browser and go to
localhost:50070/dfshealth.html to check the NameNode interface.

Fig: Hadoop Installation – Starting WebUI

CASE STUDY

Title: Big Data Analytics Using Apache Hadoop: A Case Study on Different
Fertiliser Requirements and Availability in Different States of India from
2012-2013 to 2014-2015

Abstract:
This case study explores the application of big data analytics using Apache
Hadoop to analyse the fertiliser requirements and availability across
various states in India from 2012-2013 to 2014-2015. The study aims to
identify patterns, trends, and insights that can help policymakers and
stakeholders make informed decisions regarding fertiliser allocation and
distribution.

1. Introduction:
The agriculture sector plays a crucial role in India's economy, and fertiliser
usage is a critical factor in achieving higher agricultural productivity. By
harnessing big data analytics, this study seeks to understand the fertiliser
requirements and availability in different states of India during the
specified time period.

2. Methodology:

2.1 Data Collection:


The study gathers data from various sources, including government
reports, agricultural surveys, and fertiliser production and distribution
records. The dataset includes information on fertiliser types, quantities
used, states, and years.

2.2 Data Preprocessing:


The collected data is cleaned, integrated, and transformed to ensure
consistency and compatibility. Missing values, outliers, and
inconsistencies are handled appropriately.

2.3 Data Storage and Processing:


Apache Hadoop, a widely used big data framework, is employed to store
and process the large-scale dataset. Hadoop Distributed File System
(HDFS) stores the data, while Hadoop MapReduce facilitates distributed
processing.
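
As an illustration of this step, loading the dataset into HDFS and submitting a
MapReduce job could look like the commands below. The file name
fertiliser_2012_2015.csv, the HDFS paths, and the analysis jar are hypothetical
placeholders used only for illustration.

Command: hdfs dfs -mkdir -p /data/fertiliser
Command: hdfs dfs -put fertiliser_2012_2015.csv /data/fertiliser/
Command: hadoop jar fertiliser-analysis.jar FertiliserAnalysis /data/fertiliser /output/fertiliser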

3. Analysis and Results:


3.1 Fertiliser Requirements:
The study examines the fertiliser requirements across different states
during the specified time period. It analyses the demand for various
fertiliser types, such as nitrogen, phosphorus, and potassium-based
fertilisers, and identifies the states with the highest requirements.

3.2 Fertiliser Availability:


The availability of different fertilisers in each state is investigated. The
study explores the distribution patterns and identifies any supply-demand
gaps in specific regions.

3.3 Temporal Analysis:


Temporal analysis is conducted to observe the trends and changes in
fertiliser requirements and availability over the three-year period. Seasonal
variations and long-term patterns are studied to understand the dynamics
of fertiliser usage.

4. Insights and Recommendations:


Based on the analysis, the study provides valuable insights into the fertiliser
requirements and availability in different states of India. These insights can be
used by policymakers, agricultural experts, and fertiliser manufacturers to
optimise allocation, distribution, and production strategies.

5. Conclusion:
The case study demonstrates the application of big data analytics using
Apache Hadoop for analysing fertiliser requirements and availability in
various states of India. The study's findings offer valuable insights into
optimising fertiliser allocation and distribution, ultimately contributing to
improved agricultural productivity and sustainability.

6. Limitations and Future Work:


The study acknowledges potential limitations such as data quality,
representativeness, and the scope of analysis. Future work can focus on
incorporating more recent data, including additional variables such as crop
types, rainfall, and soil quality, to enhance the analysis and provide more
comprehensive recommendations.

By leveraging big data analytics, this case study contributes to data-driven
decision-making in the agricultural domain, enabling stakeholders to make
informed choices for a more efficient and sustainable fertiliser management
system in India.

In the above case study, Apache Hadoop is used as the primary big data
framework to store, process, and analyse the large-scale dataset related to
fertiliser requirements and availability in different states of India from
2012-2013 to 2014-2015. Here's how Hadoop is utilised:

1. Data Storage:
Hadoop Distributed File System (HDFS) is employed to store the dataset.
HDFS is a distributed file system designed to store large volumes of data
across multiple machines in a cluster. It provides fault tolerance and high
scalability, allowing the dataset to be partitioned and distributed across the
cluster.

2. Data Processing:

Hadoop MapReduce, a programming model and processing framework, is used
for distributed data processing. MapReduce allows for parallel processing of
data across the Hadoop cluster, enabling efficient computation on large
datasets.

3. Data Preprocessing:
Before analysis, the collected data is preprocessed using Hadoop. This
involves cleaning the data, handling missing values, removing outliers, and
transforming the data to ensure consistency and compatibility. Hadoop's
distributed processing capability helps handle the preprocessing tasks
efficiently.

4. Analysis:
Hadoop's MapReduce is leveraged to perform various analytical tasks on
the dataset. MapReduce divides the analysis into two phases: the Map
phase and the Reduce phase. During the Map phase, the dataset is
processed in parallel across the cluster, generating intermediate results. In
the Reduce phase, the intermediate results are combined to produce the
final analysis output.

5. Scalability and Performance:


Hadoop's distributed nature allows for scalability and improved
performance in handling large datasets. By distributing the data and
computation across multiple nodes, Hadoop can process vast amounts of
data in a parallel and distributed manner, reducing the overall processing
time.

6. Insights and Recommendations:


The analysis performed using Hadoop helps derive insights and
recommendations regarding fertilizer requirements and availability. The
results obtained from Hadoop-based analysis can guide policymakers,
agricultural experts, and fertilizer manufacturers in making informed
decisions regarding allocation, distribution, and production strategies.

Overall, Hadoop serves as a foundational technology in this case study,
enabling efficient storage, processing, and analysis of big data related to
fertilizer requirements and availability in India. It empowers data-driven
decision-making in the agricultural domain by handling the complexity and
scale of the dataset effectively.

OUTPUT
