Word frequency count example

Ilias Tsagklis, November 11th, 2012. Last Updated: August 14th, 2013

With this example we are going to demonstrate how to count the frequency of words in a file. In short, you should:

Create a new FileInputStream with a given String path by opening a connection to a file.
Get the FileChannel object associated with the FileInputStream, with getChannel() API method of FileInputStream.
Get the current size of this channel’s file, using size() API method of FileChannel.
Create a MappedByteBuffer, using map(MapMode mode, long position, long size) API method of FileChannel that maps a region of this channel’s file directly into memory.
Convert the byte buffer to a character buffer. Get a Charset for a specified charset name, using forName(String charsetName) API method of Charset, and then create a new CharsetDecoder, using newDecoder() API method of Charset. Then use decode(ByteBuffer in) API method of CharsetDecoder to decode the remaining content of the byte buffer into a newly-allocated character buffer.
Create a line pattern and a word-break pattern, by compiling given String regular expressions to a Pattern, using compile(String regex, int flags) and compile(String regex) API methods of Pattern.
Match the line pattern to the buffer, using matcher(CharSequence input) API method of Pattern.
For each line, get the line using find() and group() API methods of the Matcher created for the line pattern, and get the array of words in the line using split(CharSequence input) API method of the word-break Pattern.
Then for each non-empty word, add it to a TreeMap with a count of 1, or increment its count if it is already there.
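
The splitting and counting steps can also be tried on an in-memory string, without touching a file. The following is only a minimal sketch: the class name WordCountSketch and the helper countWords are hypothetical names introduced for illustration, and the sample text is made up.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCountSketch {

    // One match per line of input
    private static final Pattern LINE = Pattern.compile(".*$", Pattern.MULTILINE);

    // Words are separated by a punctuation or whitespace character
    private static final Pattern WORD_BREAK = Pattern.compile("[\\p{Punct}\\s]");

    // Hypothetical helper (not part of the original example):
    // counts how often each word occurs in the given text
    public static Map<String, Integer> countWords(CharSequence text) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        Matcher lineM = LINE.matcher(text);
        while (lineM.find()) {
            for (String word : WORD_BREAK.split(lineM.group())) {
                // split() yields empty strings between adjacent separators; skip them
                if (word.length() > 0) {
                    Integer frequency = counts.get(word);
                    counts.put(word, frequency == null ? 1 : frequency + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {The=1, cat=1, dog=1, end=1, the=2}
        System.out.println(countWords("the cat, the dog\nThe end."));
    }
}
```

Note that the count is case-sensitive and that the TreeMap sorts its keys, with uppercase letters ordered before lowercase ones, which is why "The" and "the" are separate entries and "The" is printed first.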

Let’s take a look at the code snippet that follows:

package com.javacodegeeks.snippets.core;

import java.io.FileInputStream;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordFreq {

    public static void main(String[] args) throws Exception {

        String filePath = "C:/Users/nikos7/Desktop/file.odt";

        CharBuffer charBuf;

        // Map the file to a byte buffer; the stream and channel are closed automatically
        try (FileInputStream in = new FileInputStream(filePath);
             FileChannel filech = in.getChannel()) {

            long fileLen = filech.size();
            MappedByteBuffer buf = filech.map(FileChannel.MapMode.READ_ONLY, 0, fileLen);

            // Convert to character buffer
            Charset chars = Charset.forName("ISO-8859-1");
            CharsetDecoder dec = chars.newDecoder();
            charBuf = dec.decode(buf);
        }

        // Create line pattern
        Pattern linePatt = Pattern.compile(".*$", Pattern.MULTILINE);

        // Create word-break pattern: a punctuation or whitespace character
        Pattern wordBrkPatt = Pattern.compile("[\\p{Punct}\\s]");

        // Match line pattern to buffer
        Matcher lineM = linePatt.matcher(charBuf);

        // Sorted map from word to number of occurrences
        Map<String, Integer> m = new TreeMap<String, Integer>();

        // For each line
        while (lineM.find()) {

            // Get line
            CharSequence lineSeq = lineM.group();

            // Get array of words on line
            String[] words = wordBrkPatt.split(lineSeq);

            // For each word
            for (String word : words) {
                // Skip the empty strings produced by adjacent separators
                if (word.length() > 0) {
                    Integer frequency = m.get(word);
                    if (frequency == null) {
                        frequency = 1;
                    } else {
                        frequency = frequency + 1;
                    }
                    m.put(word, frequency);
                }
            }
        }

        System.out.println(m);
    }
}

Output:

WordPress=2, Working=1, Your=3, YouÃ¢Â€Â™ll=1, a=136, able=1, about=8, above=2, absolutely=1, absurd=1, accept=.....
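
The garbled entry beginning with "You" in the output is what one would expect if the input contained UTF-8 text (for instance "You’ll" with a typographic apostrophe) and was decoded as ISO-8859-1, which turns every byte of a multi-byte character into a separate character. That the file was UTF-8 is an assumption; the example does not say. The sketch below only illustrates the effect on a hard-coded string; CharsetSketch and its decode helper are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSketch {

    // Hypothetical helper: decodes the given bytes with the given charset
    public static String decode(byte[] bytes, Charset charset) {
        return charset.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // "You’ll" with a typographic apostrophe (U+2019), encoded as UTF-8:
        // 5 one-byte letters plus a three-byte apostrophe, 8 bytes in total
        byte[] utf8Bytes = "You\u2019ll".getBytes(StandardCharsets.UTF_8);

        String asLatin1 = decode(utf8Bytes, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(utf8Bytes, StandardCharsets.UTF_8);

        System.out.println(asLatin1.length());            // prints 8
        System.out.println(asUtf8.length());              // prints 6
        System.out.println(asUtf8.equals("You\u2019ll")); // prints true
    }
}
```

If the input is known to be UTF-8, passing "UTF-8" to Charset.forName in the example should keep such words intact.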

This was an example of how to count the frequency of words in a file in Java.
