Big Data Hadoop
Answer:
Hadoop is an open-source framework that allows for the distributed processing
of large data sets across clusters of computers using simple programming
models. It is designed to scale up from a single server to thousands of
machines.
10. What is the difference between Hadoop 1.x and Hadoop 2.x?
Answer:
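The main difference is how cluster resources are managed. In Hadoop 1.x, a
single JobTracker handles both resource management and job scheduling and
monitoring. This makes it a scalability bottleneck and a single point of
failure, and the cluster can run only MapReduce jobs. Hadoop 2.x introduces
YARN (Yet Another Resource Negotiator), which splits these duties between a
global ResourceManager and a per-application ApplicationMaster. As a result, the
cluster scales further and can run processing frameworks other than MapReduce.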
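The split of responsibilities in YARN can be sketched as a toy model. The `Node`, `ResourceManager` and `ApplicationMaster` classes below are hypothetical stand-ins, not the YARN API. The "resource manager" only tracks capacity and grants containers using a simple first-fit rule. Real YARN uses pluggable schedulers such as the CapacityScheduler or FairScheduler. Each "application master" requests containers for its own tasks, which is why a MapReduce job and a Spark application can share the same cluster.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_mb: int  # memory still available for containers on this node


class ResourceManager:
    """Hypothetical global scheduler: tracks capacity and grants containers only."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, memory_mb):
        # First-fit (a simplification): use the first node with enough free memory
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                return (app_id, node.name, memory_mb)
        return None  # no capacity left; in this toy the request is simply refused


class ApplicationMaster:
    """Hypothetical per-application manager: requests containers for its own tasks."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def request_containers(self, n_tasks, memory_mb):
        granted = []
        for _ in range(n_tasks):
            container = self.rm.allocate(self.app_id, memory_mb)
            if container is None:
                break  # stop asking once the cluster is full
            granted.append(container)
        return granted


rm = ResourceManager([Node("node1", 4096), Node("node2", 4096)])
mr_containers = ApplicationMaster("mapreduce-job", rm).request_containers(3, 2048)
spark_containers = ApplicationMaster("spark-app", rm).request_containers(2, 2048)
print(mr_containers)
print(spark_containers)
```

With two 4096 MB nodes, the MapReduce job receives three 2048 MB containers. The Spark application receives the one remaining container on node2.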
13. What is the difference between a Map and Reduce task in Hadoop?
Answer:
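Map task: Processes one input split record by record and emits intermediate
key-value pairs.
Reduce task: Receives all intermediate values grouped by key (after the
framework's shuffle and sort phase) and aggregates or otherwise processes them
into the final output.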
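The classic word-count example shows the two roles. The sketch below simulates the map, shuffle/sort and reduce phases in a single Python process. The function names are hypothetical, and this is not the Hadoop Java API. On a real cluster, each phase runs as many parallel tasks on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit (word, 1) for every word in one input record
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Shuffle/sort: group all intermediate values by key, in key order
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    # Reduce: aggregate all values that share the same key
    return key, sum(values)


def word_count(lines):
    intermediate = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


result = word_count(["the quick fox", "the lazy dog", "The fox"])
print(result)  # {'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```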
16. What is the difference between HDFS and traditional file systems?
Answer:
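HDFS is designed for distributed storage. It splits files into large blocks
(128 MB by default in Hadoop 2.x) and replicates each block (three copies by
default) across DataNodes. This gives fault tolerance, because data survives the
loss of individual machines, and scalability, because capacity grows by adding
nodes. Traditional file systems are typically limited to a single server, with a
higher risk of data loss and lower scalability. HDFS is optimized for
high-throughput sequential reads of large files rather than low-latency random
access.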
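Block splitting and replication can be sketched as follows. The constants mirror the Hadoop 2.x defaults of the `dfs.blocksize` and `dfs.replication` settings. The functions are hypothetical illustrations, not HDFS code. Replica placement here is a simple round-robin over DataNodes, whereas real HDFS uses a rack-aware placement policy.

```python
import math

BLOCK_SIZE_MB = 128  # Hadoop 2.x default for dfs.blocksize
REPLICATION = 3      # default for dfs.replication


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    # The last block holds only the remaining bytes; it is not padded
    n = math.ceil(file_size_mb / block_size_mb)
    return [min(block_size_mb, file_size_mb - i * block_size_mb) for i in range(n)]


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    # Simplified round-robin placement (not HDFS's rack-aware policy)
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def surviving_replicas(placement, failed_node):
    # Replicas still readable for each block after one DataNode fails
    return {b: [n for n in nodes if n != failed_node] for b, nodes in placement.items()}


blocks = split_into_blocks(300)  # a 300 MB file
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(blocks)     # [128, 128, 44]
print(placement)  # {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
print(surviving_replicas(placement, "dn2"))
```

With three replicas on distinct nodes, losing any single DataNode still leaves at least two readable copies of every block.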
34. What are the main differences between Hadoop and Spark?
Answer:
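Hadoop processes data with MapReduce, which writes intermediate data to disk
between the map and reduce phases and between chained jobs. This makes it
slower, especially for iterative and interactive workloads.
Spark performs in-memory computing: it keeps intermediate data in memory where
possible, which makes processing faster. It can still spill to disk when the
data does not fit in memory.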
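The difference in I/O patterns can be illustrated with a toy iterative pipeline. In the "MapReduce-style" version, every stage writes its output to a temporary file and the next stage reads it back. In the "Spark-style" version, intermediate results stay in a Python list. Both functions and the `step` transformation are hypothetical and use neither the MapReduce nor the Spark API. The sketch also ignores shuffle files, which real Spark does write to disk.

```python
import json
import os
import tempfile


def step(data):
    # One hypothetical iteration of an algorithm
    return [x * 2 for x in data]


def mapreduce_style(data, iterations):
    """Each stage persists its output to disk; the next stage reads it back."""
    disk_writes = 0
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.json")
        with open(path, "w") as f:
            json.dump(data, f)  # input already stored, not counted
        for i in range(iterations):
            with open(path) as f:
                current = json.load(f)
            path = os.path.join(tmp, f"stage_{i}.json")
            with open(path, "w") as f:
                json.dump(step(current), f)
            disk_writes += 1  # one intermediate write per stage
        with open(path) as f:
            return json.load(f), disk_writes


def spark_style(data, iterations):
    """Intermediate results stay in memory between stages."""
    current = list(data)
    for _ in range(iterations):
        current = step(current)
    return current, 0  # no intermediate disk writes in this toy


print(mapreduce_style([1, 2, 3], 4))  # ([16, 32, 48], 4)
print(spark_style([1, 2, 3], 4))      # ([16, 32, 48], 0)
```

Both versions compute the same result. Only the number of intermediate disk round trips differs, and that cost grows with the number of iterations.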