Q3 - To Run A Basic Word Count MapReduce

This document provides a simple Word Count program in R as an example of how the MapReduce paradigm works. The program takes input text from a file, maps it by counting the occurrences of each word, reduces the counts, and outputs the results. While this example runs locally, a true MapReduce system would distribute the map and reduce steps across multiple nodes for processing large datasets.


Running a Word Count MapReduce program typically involves using a distributed

computing framework like Apache Hadoop. However, since you're asking for a basic
example, I'll provide a simple Word Count program in R, which won't be distributed
but will give you an idea of how the MapReduce paradigm works.

First, you need R installed on your system. If you haven't installed it yet, you can
download it from the official R website: [R Project](https://www.r-project.org/).

Here's a basic Word Count program in R:

```R
# Word Count MapReduce program in R

# Map step: split a line of text into words and emit (word, 1) pairs,
# represented as a named numeric vector (names = words, values = 1)
map <- function(text) {
  words <- unlist(strsplit(text, "\\s+"))
  words <- words[words != ""]  # drop empty tokens from repeated whitespace
  setNames(rep(1, length(words)), words)
}

# Reduce step: sum the counts emitted for a single word
# (tapply passes only the grouped values, so no key argument is needed)
reduce <- function(values) {
  sum(values)
}

# Read input text from a file and normalize case
input_file <- "input.txt"
text <- tolower(readLines(input_file))

# Map step: one call per line, then flatten the (word, 1) pairs
mapped_data <- unlist(lapply(text, map))

# Reduce step: group the pairs by word (their names) and sum the counts
result <- tapply(mapped_data, names(mapped_data), reduce)

# Print the word count
cat("Word Count:\n")
print(result)

# Save the result to a tab-separated output file
write.table(data.frame(word = names(result), count = result),
            file = "output.txt",
            quote = FALSE, row.names = FALSE, sep = "\t")
```

This is a basic example, and it assumes you have a file named `input.txt` in the
same directory with the text you want to analyze.
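For instance, `input.txt` (an illustrative example, not a required input) might contain a couple of lines like:

```
the quick brown fox
the lazy dog jumps over the lazy dog
```

Each line is mapped independently, which is what makes the map step easy to parallelize.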

To run this program:

1. Save the code to a file, e.g., `wordcount.R`.
2. Create an input file (`input.txt`) with the text you want to analyze.
3. Open R in your terminal or RStudio.
4. Run the script using `source("wordcount.R")`.

The word count result will be printed, and an output file (`output.txt`) will be
created with the word count information.

Please note that this example is for educational purposes and doesn't leverage the
parallel processing capabilities of a true MapReduce system. In a real distributed
environment, such as Apache Hadoop, the Map and Reduce steps would be executed
across multiple nodes to handle large-scale data.
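As a rough local illustration of how the map step parallelizes, R's built-in `parallel` package can fan the map calls out across CPU cores. This is only a sketch with a small inline input, not true Hadoop-style distribution, and it repeats the `map`/`reduce` functions so it runs stand-alone:

```R
library(parallel)

# Same map/reduce logic as above, repeated so this sketch is self-contained
map <- function(text) {
  words <- unlist(strsplit(tolower(text), "\\s+"))
  words <- words[words != ""]
  setNames(rep(1, length(words)), words)
}
reduce <- function(values) sum(values)

# A small inline input standing in for the file
text <- c("the quick brown fox", "the lazy dog")

# Fan the map calls out across local cores; forking is unavailable
# on Windows, so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else detectCores()
mapped_data <- unlist(mclapply(text, map, mc.cores = cores))

# Group by word and reduce, exactly as in the serial version
result <- tapply(mapped_data, names(mapped_data), reduce)
print(result)
```

In a real framework this fan-out happens across machines rather than cores, and a shuffle phase routes all pairs for the same word to the same reducer, but the map/group/reduce shape is the same.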
