Java

Java does not natively support reading Microsoft Word documents, but the Apache POI package provides this capability. The document provides a code sample using Apache POI to read a Word document, extract the text from its paragraphs, and print the number of paragraphs and length of each paragraph to the console.

Uploaded by

Praween Kumar
Copyright
© Attribution Non-Commercial (BY-NC)

Java has no built-in classes for reading Microsoft Office Word documents, but the Apache POI package, developed by the Apache Software Foundation, lets you read Microsoft Word documents in Java. More information on the package can be found on the Apache POI project site.

import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.extractor.*;
import java.io.*;

public class ReadDoc
{
    public static void main( String[] args )
    {
        String filename = "Hello.doc";

        // try-with-resources closes the input stream even if reading fails
        try ( FileInputStream in = new FileInputStream( filename ) )
        {
            // Open the binary .doc file as an OLE2 file system
            POIFSFileSystem fs = new POIFSFileSystem( in );

            HWPFDocument doc = new HWPFDocument( fs );

            WordExtractor we = new WordExtractor( doc );

            // One array entry per paragraph in the document
            String[] paragraphs = we.getParagraphText();

            System.out.println( "Word Document has " + paragraphs.length
                    + " paragraphs" );
            for ( int i = 0; i < paragraphs.length; i++ ) {
                // Strip the trailing line ending before measuring the length
                paragraphs[i] = paragraphs[i].replaceAll( "\\cM?\r?\n", "" );
                System.out.println( "Length:" + paragraphs[i].length() );
            }
        }
        catch ( Exception e ) {
            e.printStackTrace();
        }
    }
}
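
The line-ending cleanup inside the loop can be tried on its own, without Apache POI on the classpath. The following is a minimal sketch that reuses the same regular expression; the class name ParagraphCleaner, the method clean, and the sample strings are hypothetical and only stand in for the paragraph text that WordExtractor would return, assuming each paragraph ends in a carriage return and/or line feed.

```java
import java.util.regex.Pattern;

final class ParagraphCleaner {
    // Same pattern as in the listing: \cM is Ctrl-M (a carriage return),
    // so this matches an optional CR, another optional CR, then a line feed.
    private static final Pattern LINE_END = Pattern.compile("\\cM?\r?\n");

    // Removes every matched line ending from the given paragraph text.
    static String clean(String paragraph) {
        return LINE_END.matcher(paragraph).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for extracted paragraph strings.
        String[] samples = { "Hello\r\n", "World\n", "" };
        for (String s : samples) {
            System.out.println("Length:" + clean(s).length());
        }
    }
}
```

Note that the pattern requires a line feed, so a paragraph ending in a lone carriage return would be left unchanged by this cleanup.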
