Dinesh's Blog - Image Processing Using OpenCV On Hadoop Using HIPI
Monday, May 18, 2015
Problem
This project tackles the problem of processing large volumes of image data on
Apache Hadoop, using the Hadoop Image Processing Interface (HIPI) for storage and
efficient distributed processing, combined with OpenCV, an open source library of
rich image processing algorithms. A program that counts the number of faces in a
collection of images is demonstrated.
Background
Processing a large set of images on a single machine can be very time consuming
and costly. HIPI is an image processing library designed to be used with
Apache Hadoop MapReduce, a software framework for sorting and processing
big data in a distributed fashion on large clusters of commodity hardware. HIPI
facilitates efficient and high-throughput image processing with MapReduce-style
parallel programs typically executed on a cluster. It provides a solution for how to
store a large collection of images on the Hadoop Distributed File System (HDFS)
and make them available for efficient distributed processing.
OpenCV (Open Source Computer Vision) is an open source library of rich image
processing algorithms, mainly aimed at real-time computer vision. Starting with
version 2.4.4, OpenCV supports Java development, which makes it usable with
Apache Hadoop.
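As a quick illustration, here is a minimal sketch of a Java program using the OpenCV bindings, closely following the Java development introduction linked in the references below (the class name HelloOpenCV is mine): the native library only needs to be loaded once before any OpenCV call.

import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;

public class HelloOpenCV {
    public static void main(String[] args) {
        // Load the native library that backs the Java bindings (e.g. libopencv_java2411.so)
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        // Create a small identity matrix to confirm the bindings work
        Mat m = Mat.eye(3, 3, CvType.CV_8UC1);
        System.out.println("m = " + m.dump());
    }
}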
Goal
This project demonstrates how HIPI and OpenCV can be used together to count
the total number of faces in a large image dataset.
Overview of Steps
The downloaded images were in GIF format; I used the Mac OS X Preview
application to convert them to PNG format.
Technologies Used:
Cloudera Quickstart VM: single-node Hadoop cluster for testing and running map/reduce programs
IntelliJ IDEA 14 CE: Java IDE for editing and compiling Java code
Hadoop Image Processing Interface (HIPI): image processing library designed to be used with the Apache Hadoop MapReduce parallel programming framework, for storing large collections of images on HDFS and processing them efficiently in a distributed fashion
References:
http://hipi.cs.virginia.edu/index.html
http://docs.opencv.org/doc/tutorials/introduction/desktop_java/java_dev_intro.html
http://radar.oreilly.com/2013/12/how-to-analyze-100-million-images-for-624.html
Steps
VMware Fusion is a software hypervisor for Mac OS X that runs guest operating
systems such as Linux in virtual machines without recompilation
(http://en.wikipedia.org/wiki/VMware_Fusion).
Download and install VMware Fusion from the following URL; it will be used to run
the Cloudera Quickstart VM.
https://my.vmware.com/web/vmware/details?downloadGroup=FUS-711&productId=450&rPId=7446
Download the Cloudera Quickstart VM (CDH 5.4.x) from:
http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-4-x.html
Setup Hadoop
The Cloudera Quickstart VM 5.4.x comes pre-installed with Hadoop 2.6, which is
needed for running HIPI.
Check the installed Hadoop version (output of hadoop version):
c788a14a5de9ecd968d1e2666e8765c5f018c271
Compiled by jenkins on 2015-04-21T19:18Z
Compiled with protoc 2.5.0
From source with checksum cd78f139c66c13ab5cee96e15a629025
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.4.0.jar
The best way to check and verify that your system is properly set up is to clone
the official HIPI GitHub repository and build the tools and example programs.
After cloning HIPI and extracting the CDH Hadoop tarball (hadoop-2.6.0-cdh5.4.0.tar.gz),
the Project directory contains both:
[cloudera@quickstart Project]$ ls
hadoop-2.6.0-cdh5.4.0 hipi
Edit the Hadoop path and version settings in the HIPI build.xml so they point to this
CDH installation, then build HIPI with ant:
hipi:
    [javac] Compiling 30 source files to /home/cloudera/Project/hipi/lib
      [jar] Building jar: /home/cloudera/Project/hipi/lib/hipi-2.0.jar
     [echo] Hipi library built.
compile:
    [javac] Compiling 1 source file to /home/cloudera/Project/hipi/bin
      [jar] Building jar: /home/cloudera/Project/hipi/examples/covariance.jar
     [echo] Covariance built.
all:
BUILD SUCCESSFUL
Total time: 36 seconds
[cloudera@quickstart hipi]$ ls
3rdparty  build.xml  doc       lib     license.txt  release  util
bin       data       examples  libsrc  README.md    tool
[cloudera@quickstart hipi]$ ls tool/
hibimport.jar
[cloudera@quickstart hipi]$ ls examples/
covariance.jar          hipi              runCreateSequenceFile.sh
createsequencefile.jar  jpegfromhib.jar   runDownloader.sh
downloader.jar          rumDumpHib.sh     runJpegFromHib.sh
dumphib.jar             runCovariance.sh  testimages.txt
SampleProgram.java
import hipi.image.FloatImage;
import hipi.image.ImageHeader;
import hipi.imagebundle.mapreduce.ImageBundleInputFormat;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
} // If (value != null...
} // map()
}
if (total > 0) {
// Normalize sum to obtain average
avg.scale(1.0f / total);
// Assemble final output as string
float[] avgData = avg.getData();
String result = String.format("Average pixel value: %f %f %f", avgData[0], avgData[1], avgData[2]);
// Emit output of job which will be written to HDFS
} // reduce()
}
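Only fragments of the program body survive above. Here is a minimal sketch of the map and reduce methods they belong to, assuming the HIPI 2.0 FloatImage API implied by the imports (in particular, constructors taking width, height and bands, plus add(), scale() and getData()); the class names are mine and HIPI's own sample program may differ in detail.

public static class AveragePixelMapper
        extends Mapper<ImageHeader, FloatImage, IntWritable, FloatImage> {
    @Override
    public void map(ImageHeader key, FloatImage value, Context context)
            throws IOException, InterruptedException {
        if (value != null && value.getWidth() > 1 && value.getHeight() > 1) {
            // Reduce each image to its mean pixel value (a 1x1, 3-band image)
            float[] data = value.getData();
            float[] sum = new float[3];
            int pixels = value.getWidth() * value.getHeight();
            for (int i = 0; i < pixels; i++) {
                sum[0] += data[i * 3];
                sum[1] += data[i * 3 + 1];
                sum[2] += data[i * 3 + 2];
            }
            FloatImage mean = new FloatImage(1, 1, 3,
                    new float[]{sum[0] / pixels, sum[1] / pixels, sum[2] / pixels});
            // A single key sends every per-image mean to one reducer
            context.write(new IntWritable(0), mean);
        } // If (value != null...
    } // map()
}

public static class AveragePixelReducer
        extends Reducer<IntWritable, FloatImage, IntWritable, Text> {
    @Override
    public void reduce(IntWritable key, Iterable<FloatImage> values, Context context)
            throws IOException, InterruptedException {
        FloatImage avg = new FloatImage(1, 1, 3);
        int total = 0;
        for (FloatImage img : values) {
            avg.add(img);   // accumulate per-image means (assumed element-wise add)
            total++;
        }
        if (total > 0) {
            // Normalize sum to obtain average
            avg.scale(1.0f / total);
            float[] avgData = avg.getData();
            String result = String.format("Average pixel value: %f %f %f",
                    avgData[0], avgData[1], avgData[2]);
            // Emit output of job which will be written to HDFS
            context.write(key, new Text(result));
        }
    } // reduce()
}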
build.xml
<target name="sample">
<antcall target="compile">
<param name="srcdir" value="sample" />
<param name="jarfilename" value="sample.jar" />
<param name="jardir" value="sample" />
<param name="mainclass" value="SampleProgram" />
</antcall>
</target>
...
Build SampleProgram
BUILD SUCCESSFUL
Total time: 16 seconds
Create a sample.hib file on HDFS from the sample images provided with HIPI using
the hibimport tool; this will be the input to the MapReduce program.
The average pixel value calculated across all the images is:
The zip bundle of the OpenCV 2.4.11 source can be downloaded from the following URL
and unzipped to the ~/Project/opencv directory.
http://sourceforge.net/projects/opencvlibrary/files/opencv-unix/2.4.11/
Configure and generate the build files with cmake; the configuration step ends with:
-- Generating done
Build OpenCV
This will create a jar containing the Java interface (bin/opencv-2411.jar) and a
native dynamic library containing the Java bindings and all of the OpenCV
functionality (lib/libopencv_java2411.so). We'll use these files to build and run the
OpenCV program.
The following steps are run to make sure OpenCV is set up correctly and works as
expected.
Create a new directory named sample and create an ant build.xml file in it.
build.xml
<target name="clean">
<delete dir="${build.dir}"/>
</target>
<target name="compile">
<mkdir dir="${classes.dir}"/>
    <javac includeantruntime="false" srcdir="${src.dir}" destdir="${classes.dir}" classpathref="classpath"/>
</target>
</project>
DetectFaces.java
import org.opencv.core.Core;
import org.opencv.core.Mat;
import org.opencv.core.Scalar;
import org.opencv.highgui.*;
import org.opencv.core.MatOfRect;
import org.opencv.core.Point;
import org.opencv.core.Rect;
import org.opencv.objdetect.CascadeClassifier;
import java.io.File;
/**
* Created by dmalav on 4/30/15.
*/
public class DetectFaces {
System.out.println(String.format("Detected %s faces", faceDetections.toArray().length));
}
}
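The body of the run() method is not preserved above. Here is a minimal sketch of what it does, following the standard OpenCV 2.4 Java face-detection example; the rectangle colour and the output file name written to the current directory are assumptions consistent with the "Writing addams-family.png" output shown below.

public void run(String imagePath) {
    // Load the LBP frontal-face classifier shipped with OpenCV
    CascadeClassifier faceDetector =
            new CascadeClassifier("lbpcascade_frontalface.xml");
    Mat image = Highgui.imread(imagePath);

    // Detect faces and report how many were found
    MatOfRect faceDetections = new MatOfRect();
    faceDetector.detectMultiScale(image, faceDetections);
    System.out.println(String.format("Detected %s faces",
            faceDetections.toArray().length));

    // Draw a rectangle around each detection and write the annotated copy
    for (Rect rect : faceDetections.toArray()) {
        Core.rectangle(image, new Point(rect.x, rect.y),
                new Point(rect.x + rect.width, rect.y + rect.height),
                new Scalar(0, 255, 0));
    }
    String outputName = new File(imagePath).getName();
    System.out.println("Writing " + outputName);
    Highgui.imwrite(outputName, image);
}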
Main.java
import org.opencv.core.Core;
import java.io.File;
System.loadLibrary(Core.NATIVE_LIBRARY_NAME);
if (args.length == 0) {
System.err.println("Usage Main /path/to/images");
System.exit(1);
}
} else {
System.out.println("File: " + file.getAbsolutePath());
faces.run(file.getAbsolutePath());
}
}
}
}
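For completeness, a sketch of how the surviving pieces of Main.java fit together; the directory-listing loop around the fragment above is an assumption.

public class Main {
    public static void main(String[] args) {
        // Load the OpenCV native library before any OpenCV call
        System.loadLibrary(Core.NATIVE_LIBRARY_NAME);

        if (args.length == 0) {
            System.err.println("Usage Main /path/to/images");
            System.exit(1);
        }

        DetectFaces faces = new DetectFaces();
        File dir = new File(args[0]);
        File[] files = dir.listFiles();
        if (files == null) {
            System.err.println(args[0] + " is not a readable directory");
            System.exit(1);
        }
        for (File file : files) {
            if (file.isDirectory()) {
                continue;  // skip sub-directories
            } else {
                System.out.println("File: " + file.getAbsolutePath());
                faces.run(file.getAbsolutePath());
            }
        }
    }
}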
compile:
    [mkdir] Created dir: /home/cloudera/Project/opencv/sample/build/classes
    [javac] Compiling 2 source files to /home/cloudera/Project/opencv/sample/build/classes
jar:
    [mkdir] Created dir: /home/cloudera/Project/opencv/sample/build/jar
      [jar] Building jar: /home/cloudera/Project/opencv/sample/build/jar/Main.jar
BUILD SUCCESSFUL
Total time: 3 seconds
This build creates a build/jar/Main.jar file which can be used to detect faces in
images stored in a directory:
Running DetectFaceDemo
/home/cloudera/Project/opencv/sample/lbpcascade_frontalface.xml
Detected 7 faces
addams-family.png
Writing addams-family.png
OpenCV does a fairly good job detecting front-facing faces when the
lbpcascade_frontalface.xml classifier is used. There are other classifiers
provided by OpenCV which can detect rotated faces and other face orientations.
if [ -d "${HADOOP_PREFIX}/build/native" -o -d "${HADOOP_PREFIX}/$HADOOP_COMMON_LIB_NATIVE_DIR" ]; then
  if [ -d "${HADOOP_PREFIX}/$HADOOP_COMMON_LIB_NATIVE_DIR" ]; then
    if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
      JAVA_LIBRARY_PATH=${JAVA_LIBRARY_PATH}:${HADOOP_PREFIX}/$HADOOP_COMMON_LIB_NATIVE_DIR
    else
      JAVA_LIBRARY_PATH=${HADOOP_PREFIX}/$HADOOP_COMMON_LIB_NATIVE_DIR
    fi
  fi
fi
.
.
This step details the Java code for combining HIPI with OpenCV.
return mat;
}
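Only the tail of the FloatImage-to-Mat conversion helper appears above. Here is a sketch of what such a helper might look like, assuming band-interleaved float pixels in [0,1] and an 8-bit, 3-channel target Mat; the class name, band order and scaling are assumptions, and getting this mapping wrong is exactly the conversion issue described in the Issues section at the end.

import hipi.image.FloatImage;
import org.opencv.core.CvType;
import org.opencv.core.Mat;

public class ImageConverter {
    // Convert a HIPI FloatImage into an OpenCV 8-bit, 3-channel Mat.
    // Band order and the [0,1] -> [0,255] scaling are assumptions.
    public static Mat floatImageToMat(FloatImage floatImage) {
        int w = floatImage.getWidth();
        int h = floatImage.getHeight();
        float[] data = floatImage.getData();

        Mat mat = new Mat(h, w, CvType.CV_8UC3);
        byte[] rgb = new byte[3];
        for (int j = 0; j < h; j++) {
            for (int i = 0; i < w; i++) {
                int offset = (j * w + i) * 3;
                rgb[0] = (byte) (data[offset] * 255);
                rgb[1] = (byte) (data[offset + 1] * 255);
                rgb[2] = (byte) (data[offset + 2] * 255);
                mat.put(j, i, rgb);
            }
        }
        return mat;
    }
}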
....
....
if (mappingFileUri != null) {
faceDetector = new CascadeClassifier("./lbpcascade_frontalface.xml");
} else {
System.out.println(">>>>>> NO MAPPING FILE");
}
} else {
System.out.println(">>>>>> NO CACHE FILES AT ALL");
}
super.setup(context);
} // setup()
....
}
Mapper:
1. Load OpenCV native library
2. Create CascadeClassifier
3. Convert HIPI FloatImage to OpenCV Mat
4. Detect and count faces in the image
5. Write number of faces detected to context
Reducer:
1. Count number of files processed
2. Count number of faces detected
3. Output number of files and faces detected
FaceCount.java
import hipi.image.FloatImage;
import hipi.image.ImageHeader;
import hipi.imagebundle.mapreduce.ImageBundleInputFormat;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.opencv.core.*;
import org.opencv.objdetect.CascadeClassifier;
import java.io.IOException;
import java.net.URI;
mat.put(j, i, rgb);
}
}
return mat;
}
return faceDetections.toArray().length;
}
if (mappingFileUri != null) {
faceDetector = new CascadeClassifier("./lbpcascade_frontalface.xml");
} else {
System.out.println(">>>>>> NO MAPPING FILE");
}
} else {
System.out.println(">>>>>> NO CACHE FILES AT ALL");
}
super.setup(context);
} // setup()
} // If (value != null...
} // map()
}
total);
// Emit output of job which will be written to HDFS
context.write(new IntWritable(images), new Text(result));
} // reduce()
}
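Since only fragments of FaceCount.java survive, here is a minimal sketch of map() and reduce() methods that would implement the numbered steps above, reusing the FloatImage-to-Mat helper sketched earlier; the class names, the key value and the exact output string are assumptions.

public static class FaceCountMapper
        extends Mapper<ImageHeader, FloatImage, IntWritable, IntWritable> {

    // Created in setup() from the lbpcascade_frontalface.xml file in the distributed cache
    private CascadeClassifier faceDetector;

    @Override
    public void map(ImageHeader key, FloatImage value, Context context)
            throws IOException, InterruptedException {
        if (value != null && value.getWidth() > 1 && value.getHeight() > 1) {
            // Convert the HIPI image to an OpenCV Mat and count the faces in it
            Mat image = ImageConverter.floatImageToMat(value);
            MatOfRect faceDetections = new MatOfRect();
            faceDetector.detectMultiScale(image, faceDetections);
            int faces = faceDetections.toArray().length;

            // A single key groups every per-image count under one reducer
            context.write(new IntWritable(1), new IntWritable(faces));
        } // If (value != null...
    } // map()
}

public static class FaceCountReducer
        extends Reducer<IntWritable, IntWritable, IntWritable, Text> {

    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int images = 0;   // number of files processed
        int total = 0;    // number of faces detected
        for (IntWritable faces : values) {
            images++;
            total += faces.get();
        }
        String result = String.format("Total faces detected: %d", total);
        // Emit output of job which will be written to HDFS
        context.write(new IntWritable(images), new Text(result));
    } // reduce()
}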
Create a new facecount directory in the hipi folder (where HIPI was built) and copy
FaceCount.java from the previous step into it.
Modify the HIPI build.xml ant script to link against the OpenCV jar file and add a
new build target, facecount.
build.xml
<target name="setup">
....
</target>
....
<target name="facecount">
<antcall target="compile">
<param name="srcdir" value="facecount" />
<param name="jarfilename" value="facecount.jar" />
<param name="jardir" value="facecount" />
<param name="mainclass" value="FaceCount" />
</antcall>
</target>
<target name="all"
depends="hipi,hibimport,downloader,dumphib,jpegfromhib,createsequenc
efile,covariance" />
</project>
Build FaceCount.java
facecount:
setup:
[echo] Setting properties for build task...
[echo] Properties set.
test_settings:
[echo] Confirming that hadoop settings are set...
[echo] Properties are specified properly.
hipi:
[echo] Building the hipi library...
hipi:
    [javac] Compiling 30 source files to /home/cloudera/Project/hipi/lib
      [jar] Building jar: /home/cloudera/Project/hipi/lib/hipi-2.0.jar
     [echo] Hipi library built.
compile:
      [jar] Building jar: /home/cloudera/Project/hipi/facecount/facecount.jar
BUILD SUCCESSFUL
Total time: 12 seconds
Create HIB
Run MapReduce
run-facecount.sh
#!/bin/bash
hadoop fs -rm -R project/output
hadoop jar facecount/facecount.jar project/input.hib project/output
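The job driver itself is not shown in what survives of FaceCount.java. Here is a sketch of a run() method that would wire the mapper and reducer to HIPI's ImageBundleInputFormat; the class names follow the sketches above, the paths are the ones used by run-facecount.sh, and shipping the classifier file and the OpenCV native library to the tasks (via the distributed cache, as in the setup() fragments) is omitted.

public class FaceCount extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "FaceCount");
        job.setJarByClass(FaceCount.class);

        // Read images straight out of the .hib bundle created by hibimport
        job.setInputFormatClass(ImageBundleInputFormat.class);
        job.setMapperClass(FaceCountMapper.class);
        job.setReducerClass(FaceCountReducer.class);

        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));   // e.g. project/input.hib
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. project/output

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new FaceCount(), args));
    }
}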
Bytes Written=27
Check results:
9. Summary
OpenCV provides a very rich set of tools for image processing; combined with
HIPI's efficient, high-throughput parallel image processing, it can be a great
solution for processing very large image datasets very fast. These tools can help
researchers and engineers alike achieve high-performance image processing.
10. Issues
The wrapper function to convert a HIPI FloatImage to an OpenCV Mat did not work
for some reason and was not producing the correct image after conversion. This
caused a bug where no faces were detected. I contacted the HIPI developers but did
not receive a reply in time before finishing this project. Because of this bug my
results show 0 faces detected.
Pros: HIPI is a great tool for processing very large volumes of images on a Hadoop
cluster, and when combined with OpenCV it can be very powerful.
Cons: Converting the image format (HIPI FloatImage) to the OpenCV Mat format is
not straightforward, and this caused issues with OpenCV processing the images
correctly.
11 comments:
Yes, I like it. I have a question for Dinesh Malav: can you use the SIFT detector
(OpenCV) to find the same image on Hadoop using HIPI?
I am not familiar with the SIFT detector, but a wrapper to convert HIPI FloatImage
to any required format can be written and used with OpenCV.
Hi Dinesh Malav,
I want to know whether it is possible to search for an image in the bundle that
matches the image I give. Thanks
Hello Dinesh
Did you try to run this MapReduce program in HIPI using Eclipse with separate classes
for the driver, mapper and reducer? If yes, please give me the steps for Eclipse.
Creating it with a single .java file works fine for me. Any help!
-Prasad
The Hadoop tutorial you have explained is most useful for beginners who are taking
Hadoop administrator online training.
Thank you for sharing such a good tutorial on Hadoop image processing.
Apart from learning more about Hadoop at hadoop online training, this blog adds to my
learning platforms. Great work done by the webmasters. Thanks for your research and
experience sharing on a platform like this.
Some topics covered, maybe it helps someone: HDFS is a Java-based file system that
provides scalable and reliable data storage, and it was designed to span large
clusters of commodity servers. HDFS has demonstrated production scalability of up to
200 PB of storage and a single cluster of 4500 servers, supporting close to a billion
files and blocks.
http://www.computaholics.in/2015/12/hdfs.html
http://www.computaholics.in/2015/12/mapreduce.html
http://www.computaholics.in/2015/11/hadoop-fs-commands.html
Hi Dinesh,
I have a project in which I give an image of one person and it should search a video
frame by frame, match it, and give results like the count of face detections, timing
of appearance, etc. Is this possible with HIPI and OpenCV?
I have my HIPI with a build.gradle file. Where do I need to specify the OpenCV
dependencies in HIPI instead of build.xml?