Drawbacks of Fixed Partitioning in Hadoop

Fixed partitioning in Hadoop requires knowing the number of partitions in advance, which may not align with actual data distribution and can lead to unevenly sized partitions and inefficient resource usage. It is generally better to let the cluster determine the number of partitions to allow for efficient resource utilization, though in some special cases like zero or one reducer, fixed partitioning may be appropriate. The MultipleOutputs class can be used to write multiple output files from each reducer when using a partitioner like HashPartitioner.


Drawbacks of Fixed Partitioning in Hadoop:

Two drawbacks of this approach:

 You need to know the number of partitions (reducers) in advance, which may not align with the actual number of weather stations.
 Fixed partitioning can lead to unevenly sized partitions: a few busy stations can dominate their reducers while others finish quickly, wasting cluster resources.
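To make the first drawback concrete, here is a hypothetical plain-Java sketch (no Hadoop classes; the `FixedStationPartitioner` name, the station IDs, and the fallback-to-partition-0 rule are all illustrative assumptions, not the book's actual code) of a partitioner that hard-codes one partition per known station:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: a "fixed" partitioner that assumes the station list
// is known up front. Stations missing from the list are forced into
// partition 0, and partition sizes mirror the per-station record counts,
// however skewed those happen to be.
public class FixedStationPartitioner {
    private final List<String> knownStations;

    public FixedStationPartitioner(List<String> knownStations) {
        this.knownStations = knownStations;
    }

    // One partition per known station; unknown stations fall back to 0.
    public int getPartition(String stationId) {
        int idx = knownStations.indexOf(stationId);
        return idx >= 0 ? idx : 0;
    }

    public static void main(String[] args) {
        FixedStationPartitioner p =
            new FixedStationPartitioner(Arrays.asList("011990", "012650"));
        System.out.println(p.getPartition("011990")); // 0
        System.out.println(p.getPartition("012650")); // 1
        System.out.println(p.getPartition("999999")); // unknown station -> 0
    }
}
```

A new station appearing in the data silently piles onto partition 0 — exactly the misalignment the drawback describes.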

Special Cases for Setting the Number of Partitions (Reducers):

Two special cases where setting the number of partitions makes sense:

 Zero reducers: no partitions; only map tasks are run.
 One reducer: useful for small jobs when combining the output of previous jobs into a single file.

Letting the Cluster Determine Partitions:


 It is generally better to let the cluster determine the number of partitions, allowing for efficient resource utilization.
 The default HashPartitioner works well here: it spreads keys across however many reducers are configured, which tends to produce more evenly sized partitions.

Using MultipleOutputs:
 When using HashPartitioner, each partition contains records from multiple stations.
 To create a file per station, you can use the MultipleOutputs class, which allows each reducer to write multiple files with different names.
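The real MultipleOutputs class only runs inside a MapReduce job, so the following plain-Java sketch (the `PerStationFanout` class and the sample records are illustrative assumptions) merely simulates the fan-out a reducer would perform, with an in-memory map standing in for the per-station files a real reducer would produce via `MultipleOutputs.write`:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative stand-in for a MultipleOutputs reducer: groups a partition's
// mixed records into one output list per station, keyed by station ID.
public class PerStationFanout {
    // Each record is a {stationId, value} pair; the result maps each
    // station ID to its own "file" of values.
    public static Map<String, List<String>> fanOut(List<String[]> records) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String[] rec : records) {
            files.computeIfAbsent(rec[0], k -> new ArrayList<>()).add(rec[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> partition = Arrays.asList(
            new String[] {"011990", "1950-06 temp=22"},
            new String[] {"012650", "1950-06 temp=18"},
            new String[] {"011990", "1950-07 temp=25"});
        // Two stations in one partition -> two output files.
        System.out.println(fanOut(partition).keySet()); // [011990, 012650]
    }
}
```

In an actual job, the loop body would instead write each value to an output named after the station, leaving one file per station regardless of how the partitioner grouped them.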
