Final Project: Face Recognition: Objectives of This Project
Instructions
This project can be conducted in groups of at most two people. It is acceptable to discuss the project with, and to help, other groups. However, each group must write its own report, and no copying of code between groups is allowed. You should use the last lab session on December 2, 2010 to ask the TA any questions you may have and to work on the project. You must attend the December 2, 2010 lab session to receive full points.
Deliverables
Please provide a group report presenting the following:
o A picture of the *.mdl and/or *.m file(s) that you used to generate the results. Please note that you can use either MATLAB or Simulink to conduct this project.
o A picture of your test and reference image(s) that you used to generate the results.
o Documentation and explanation of your results.
o A discussion of the challenges you encountered and any changes in models you made to obtain good results.
You will be graded on the completeness, accuracy and presentation quality of your report.
[Figure 1: Face recognition system block diagram, with blocks for Feature Extraction, Image Registration, and Comparison.]
Before you dive into the details, note that most face recognition systems that search huge databases are implemented in software, so that the software can access face-image databases stored locally or over a network. Since software implementations allow greater computational power (compare your Pentium to the DSP you're using for your labs!), we must water down the algorithms substantially, and thus your implementation will lose the efficacy enjoyed by commercial face recognition software. For example, the face recognition software used in Las Vegas casinos can recognize the faces of patrons while they move across the gambling floor or stand at the craps table. For our algorithm, we must capture only the face, looking straight at the camera, with no make-up, and of course no smile! This guarantees a higher detection rate and a lower false-alarm rate. In addition, the background should be completely black¹, and the camera should always be at the same distance from the face, pretty much like taking a photo for your driver's license or passport. After learning the basics in this project, you can take these ideas and implement more complex and more effective algorithms.
Images as Matrices
Before we begin describing each block used in this project, let's take a moment to review images. Recall that a sampled time-series signal can be considered a 1-D vector. Naturally, a digital grayscale image can be considered a matrix. If you want to get fancier and include color images, then you would have a 2-D matrix for every color plane: red, green, and blue (RGB). Different formats represent color images differently. For example, instead of the RGB format, one may use the YCbCr format, which also represents color using three planes: luminance (which describes the intensity) and the blue and red chrominance. In this project we'll deal only with grayscale images, and thus consider a single plane.
¹ You may want to take the picture initially with a white background, and then use photo-editing software such as Photoshop to make the background black. The reason is that lighting from various sources, including your flash, may give the background an unwanted luster. The reason we want a black background will become apparent soon.
An 8-bit grayscale image is thus a matrix whose values take on the integers 0 to 255, in other words $256 = 2^8$ shades of gray. The minimum value 0 corresponds to black, since zero intensity is darkness, while the maximum 255 corresponds to white, the brightest scenario.
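To make this concrete, here is a tiny MATLAB sketch; the pixel values are arbitrary illustrative numbers:

    A = uint8([0 128 255; 64 192 32]);   % a 2-by-3 grayscale "image"
    imshow(A)                            % 0 displays as black, 255 as white
    whos A                               % class uint8, i.e., 8 bits per pixel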
More advanced face recognition software will also compute the geometry between sets of features to test whether the two faces in question may be the same or completely different. For example, your eyes and nose can be considered the vertices of a triangle. Some triangles may be equilateral, while others will have different angles. Of course, two different faces may share the same triangle type, so this cannot be the only test performed.
Even if the two faces are the same, when we compute the average of all difference pixels, it is unlikely that the resulting value will be identical to 0. Say we compute a sample average of 1.5. Do we decide same face or different face? Where does the threshold lie for the comparison stage? For this, we rely on statistics. Let X denote the reference feature-extracted (i.e., edge-detected) image, and Y denote the test feature-extracted (i.e., edge-detected) image after registration, as shown in Figure 1. Assume the pixels of each image are denoted $X_{i,j}$ and $Y_{i,j}$. Let both images be $M \times N$ matrices, and define the sample mean (a.k.a. average) and sample variance of the difference image $X - Y$ as in Equations (1) and (2), respectively:
$$\mathrm{mean}(X - Y) = \frac{\sum_{i,j} \left( X_{i,j} - Y_{i,j} \right)}{MN} \qquad (1)$$

$$\mathrm{var}(X, Y) = \frac{\sum_{i,j} \left( X_{i,j} - Y_{i,j} - \mathrm{mean}(X - Y) \right)^2}{MN - 1} \qquad (2)$$
Notice that the minus one in the denominator of Equation (2) is not a typo. It makes the sample variance an unbiased estimate of the true variance; the details are beyond the scope of this project. The question we ask during this comparison phase of the facial recognition is whether the sample mean of Equation (1) is close enough to zero that the faces in the reference and test images appear to be the same. Our approach is to model the image pixels as random variables and then apply a confidence interval test, giving a probability measure that can be used to compare the facial similarities. If we model the pixels $X_{i,j}$ and $Y_{i,j}$ as independent random variables, then the sample mean and sample variance, denoted $\mathrm{mean}(X - Y)$ and $\mathrm{var}(X, Y)$ respectively, are also random variables. By the Central Limit Theorem, we can assume that $\mathrm{mean}(X - Y)$ is a Gaussian random variable, with its own mean equal to the true mean of the difference image $X - Y$, denoted $m$, and its variance equal to the true variance of $X - Y$ divided by $MN$ (don't worry about why this is true, but we will use it in the next paragraph).
Without access to the true mean and true variance, we instead employ the sample mean of Equation (1), i.e., Equation (1) with the actual image pixel values substituted, which we denote $\mathrm{mean}(x - y)$ so as not to confuse it with the random-variable version, and we substitute the sample variance appropriately for the true variance of $X - Y$ divided by $MN$. So now we have a random variable $\mathrm{mean}(X - Y)$ that has a Gaussian distribution with estimates of its mean and variance. We can use probability theory to test whether it is likely that 0 is the true mean. We form a confidence interval and check whether 0 is inside this confidence interval. Specifically, we ask the question: is 0 in the interval $[\mathrm{mean}(x - y) - b,\ \mathrm{mean}(x - y) + b]$? As discussed before, $\mathrm{mean}(x - y)$ is the actual sample mean computed using the registered feature-image data. The parameter $b$ is a number chosen so that, for a user-defined $p$:

$$\Pr\Big( \mathrm{mean}(x - y) - b \,\le\, \mathrm{mean}(X - Y) \,\le\, \mathrm{mean}(x - y) + b \Big) = 1 - p \qquad (3)$$
For example, if we choose p = 0.05 and the corresponding parameter b is such that 0 is not in the interval $[\mathrm{mean}(x - y) - b,\ \mathrm{mean}(x - y) + b]$, then we can say with probability 0.95 (or 95% confidence) that the two faces are not the same. Another way to look at this: if p = 0.05, then the corresponding b value is set so that, with probability 0.95, all occurrences of sample means should fall in this interval. Thus if we get a sample mean outside of this interval, we know this had only a 0.05 chance of happening, so we treat it as an anomaly and reject the notion that the true mean could be 0. Let us consider the pictorial representation in Figure 2.
Figure 2 shows the Gaussian probability density function of the estimated-mean random variable $\mathrm{mean}(X - Y)$. Recall from probability theory that integrating a density function over a portion of the horizontal axis, say [a, c], gives the probability of the random variable falling in that particular range of values. Recall also that integrating over the entire probability density function gives one. The shaded region of Figure 2, called the tail probabilities, corresponds to the event that $\mathrm{mean}(X - Y)$ does not fall in the interval $[\mathrm{mean}(x - y) - b,\ \mathrm{mean}(x - y) + b]$; the interval itself has probability $1 - p$. Why is this? Recall our definition of the relationship between p and b in Equation (3).
You can see in Figure 2 that the shaded region corresponds precisely to p (because the un-shaded region corresponds to 1 - p and the overall area under the function is one), and it is thus commonly called the p-value. The smaller the p-value, the more likely 0 is an anomaly when it is not in the interval $[\mathrm{mean}(x - y) - b,\ \mathrm{mean}(x - y) + b]$. Analytically, the p-value is given by Equation (4):

$$p = 1 - \operatorname{erf}\!\left( \frac{\left| \mathrm{mean}(x - y) \right|}{\sqrt{2 \, \mathrm{var}(x, y) / (MN)}} \right) \qquad (4)$$
The erf, or error function, is used to find probabilities for the Gaussian distribution. It is calculated from a look-up table.
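As a quick sanity check of Equation (4), here is a small MATLAB sketch; the sample mean of 1.5 echoes the earlier example, while the variance and image size are made-up numbers for illustration only:

    m_hat = 1.5;       % sample mean of the difference image, Equation (1)
    v_hat = 400;       % sample variance, Equation (2) (made-up value)
    M = 64; N = 64;    % image dimensions (made-up values)
    sigma = sqrt(v_hat / (M*N));                % std. dev. of the sample mean (CLT)
    p = 1 - erf(abs(m_hat) / (sqrt(2)*sigma))   % p-value as in Equation (4)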
To implement the face recognition in Simulink, you'll have to write a couple of MATLAB function m-files to help you out. You'll also have to do some preliminary work in the MATLAB command window to prepare the images for the Simulink model. In addition, keep in mind some general guidelines for doing the face recognition.
The imread command reads a picture from a file and stores it in an MxNx3 array, where MxN is the dimension of the image. The third dimension (3) represents the red, green, and blue intensities (the image is read as a color image). The array is stored in the variable called anchor. We call this picture the anchor because it is the reference face to which all others will be compared.
3. Use the imread command again to read your second picture into the MATLAB workspace. Save it in a variable named target. The target picture will be compared to the anchor picture.
4. The first thing you need to do to the pictures is convert them into grayscale. This is quite easy. Type anchor = rgb2gray(anchor) into the command window; do the same for the target. This is the red-green-blue to grayscale command. If you type whos into the command window, you will see that the third dimension of the arrays has been removed; anchor and target are now MxN matrices. Also, note that the data type is an unsigned 8-bit integer.
5. In anticipation of the edge detection that will be done later, you need to convert the image matrices from unsigned 8-bit integers to double-precision floating-point numbers (MATLAB's edge command only operates on double-precision numbers). Type anchor = double(anchor); do the same for the target.
6. In order to load the image matrices into Simulink, they must be put into a structure format. In MATLAB, a structure is a data type that stores more elementary data types (i.e., strings, double-precision floating-point numbers, etc.) in an organized fashion. Specifically, structures consist of fields and values. A field is like a category, and values are the actual data within those categories. If you're confused, here's an example that should make more sense. Suppose you wanted to store the data for a sine wave that lasts over a period of ten seconds. You could make a structure called S that has two fields: the first field, called signal, could store a vector of the actual values of the sine wave (let's call this vector x); the second field, called time, could store a vector of the corresponding time values from 0 to 10 seconds (let's call this vector t). If you wanted to create this structure, the syntax is pretty straightforward (don't actually type this command); see the sketch below.
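One way to write that command, assuming the vectors x and t described above (a sketch; struct() is just one of several equivalent syntaxes):

    % Create a structure S with fields 'signal' and 'time'.
    S = struct('signal', x, 'time', t);   % S.signal returns x; S.time returns t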
A structure, then, is just an organized way of storing data types that you already know about. The complexity of structures is almost limitless, because you can even put structures within other structures (which is what we're about to do). If you've learned a programming language like C, you might have recognized by now that a MATLAB structure is much like a struct in C.
7. In order to be compatible with Simulink, the structures you make must be exactly the way Simulink wants. Here's Simulink's way. Within the structure are two fields. The first field is called time and contains, as its name suggests, a vector of time values; for you, of course, the time vector is a single value: 0. The second field is called signals; it actually contains another structure. This substructure has two fields of its own: the first, called values, contains the actual data in question (in your case, the image matrix); the second, called dimensions, specifies the dimensions of the data (in your case, this would be a vector [M N]). To make this a little clearer, here's a visual depiction of the structure's organization:
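A sketch of that organization, using the anchor structure I1 from step 8 as the example:

    I1
       time:       0
       signals:
          values:     the M-by-N image matrix
          dimensions: [M N]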
8. Based on the description in step 7 and the discussion of structures in step 6, create a structure for the anchor and a structure for the target that are compatible with Simulink. Call the anchor's structure I1 and the target's structure I2; a minimal sketch follows this list.
9. Now you are ready to load the images into Simulink. In your Simulink model, add two From Workspace blocks from Simulink → Sources. In the Block Parameters window, change the Data parameter to I1 and I2, respectively.
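Here is a minimal sketch of step 8, assuming anchor and target are the double-precision matrices from step 5:

    % Simulink-compatible structure for the anchor image.
    I1.time = 0;
    I1.signals.values = anchor;
    I1.signals.dimensions = size(anchor);   % [M N]
    % Same layout for the target image.
    I2.time = 0;
    I2.signals.values = target;
    I2.signals.dimensions = size(target);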
Edge Detection
The first step in our face recognition process is to extract the facial features (represented by the blocks labeled Extraction of Facial Features in Figure 1). As stated in the Introduction, you're going to do this with MATLAB's edge function. In Simulink, the easiest way to use a function that you would normally use in the command window is an Fcn block from Simulink → User-Defined Functions. The Fcn block is a one-input, one-output block in which you can specify pretty much any one-input, one-output function that is defined in MATLAB. It can even be used for functions that you have written yourself in an m-file. In the Block Parameters of the Fcn block, all you have to do for edge detection is type double(edge(u)) into the MATLAB function parameter. Here, u stands for the input to the block. Also, the double() command is necessary because, due to a peculiarity in Simulink, the block will not support a binary output (the edge command produces a binary output, as explained in the Introduction). Make two such Fcn blocks and place them in your Simulink model according to Figure 1.
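If you are working in plain MATLAB rather than Simulink (the Deliverables section allows either), the equivalent of this block is a sketch like:

    % Edge-detect both images; edge returns a logical (binary) matrix,
    % so cast back to double, mirroring double(edge(u)) in the Fcn block.
    anchor_edges = double(edge(anchor));
    target_edges = double(edge(target));
    imshow(anchor_edges)   % white pixels mark detected edges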
3. Open a new m-file and write a one-input, one-output function called register.m. Here are some ideas on how to proceed (a sketch of the whole function follows this list):
a. Hardcode the step and angle inputs within your function. Namely, define angle = [-2:0.01:2] and step = 50. This gets rid of two of the four inputs to im_reg_MI.
b. In your Simulink model, concatenate the anchor and target images into one Mx2N matrix (use the edge-detected images, NOT the original anchor and target). This combined matrix will be the only input to your function. You'll have to do this with a Matrix Concatenate block (in the Block Parameters, set Number of inputs to 2, Mode to Multidimensional array, and Concatenate dimension to 2).
c. Inside your function, extract the anchor and target images from the concatenated input of part b (i.e., into two separate matrices).
d. Inside your function, call im_reg_MI, using the angle and step variables you defined in part a, as well as the anchor and target images you extracted in part c.
e. Make the single output of your function equal to the second output of im_reg_MI; you can ignore the other outputs.
4. Save your function m-file as register.m in the same directory as your Simulink model.
5. In your Simulink model, place an Fcn block that calls the register function you just created.
6. Connect the blocks that you have so far to reflect the flow of data shown in Figure 1.
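Here is a minimal sketch of register.m following the steps above. The argument order of im_reg_MI is an assumption (check its help text before relying on it):

    function registered = register(combined)
    % REGISTER  Register the target image to the anchor image.
    % combined is the M-by-2N concatenation of the edge-detected
    % anchor (left half) and target (right half).
    angle = -2:0.01:2;   % hardcoded rotation search range (part a)
    step  = 50;          % hardcoded step input (part a)
    N = size(combined, 2) / 2;
    anchor = combined(:, 1:N);       % left half (part c)
    target = combined(:, N+1:end);   % right half (part c)
    % Argument order below is an assumption; see help im_reg_MI.
    [~, registered] = im_reg_MI(anchor, target, angle, step);
    % Only the second output, the registered image, is kept (part e).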
Comparison of Images
1. The last step in the face recognition process is to compare the anchor to the registered target image. To do this, of course, you need to calculate the difference image. In your Simulink model, subtract the edge-detected, registered target image from the anchor (the edge-detected anchor, NOT the original anchor).
2. Now you need to calculate the p-value of the difference image. To do this, you'll once again need an Fcn block. And, once more, you'll have to write your own function m-file to put in this Fcn block.
3. Write a one-input, one-output function m-file that calculates the p-value from a difference image. As you may have guessed, the input to this function should be the difference image and the output should be the p-value itself. Here are some things to consider as you write your function (a sketch follows this list):
a. Notice that to calculate the p-value in Equation (4), you need the two-dimensional sample mean and sample variance, as defined in Equations (1) and (2). CAUTION: MATLAB's built-in functions mean and var are written for one-dimensional data and will not work as you want on your two-dimensional images. Instead, you'll have to write some code to implement those formulae before you can compute Equation (4). You may consider writing separate function m-files for Equations (1) and (2) and have your p-value function call on these functions.
b. The error function, erf, in Equation (4) can be computed in MATLAB using, you guessed it, the erf() command.
4. If you haven't already done so, implement your p-value function from step 3 in an Fcn block in your Simulink model. At the output of this Fcn block, attach a Display block from Simulink → Sinks to view the p-value.
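A minimal sketch of the p-value function, assuming the form of Equation (4) reconstructed earlier:

    function p = pvalue(D)
    % PVALUE  Compute the p-value of a difference image D (edge-detected
    % anchor minus registered target), per Equations (1), (2), and (4).
    [M, N] = size(D);
    m = sum(D(:)) / (M*N);                 % sample mean, Equation (1)
    v = sum((D(:) - m).^2) / (M*N - 1);    % sample variance, Equation (2)
    % Gaussian tail probability via erf, as in Equation (4).
    p = 1 - erf(abs(m) / sqrt(2 * v / (M*N)));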
Viewing Images
Although your face recognition algorithm is now complete, it would be nice to view some of the images at various stages in the process. To do this, use a Matrix Viewer block from Signal Processing Blockset → Signal Processing Sinks. In the Block Parameters window, set the Colormap matrix to gray(256) and uncheck the box labeled Display colorbar. Also:
1. If you want to view any image after edge detection, set the Minimum input value to 0 and the Maximum input value to 1. This is because edge detection produces a binary output.
2. If you want to view any image before edge detection, set the Minimum input value to 0 and the Maximum input value to 255. This is because before edge detection, the intensity values of the image are integers between 0 and 255.
When you run your Simulink model, one window will appear for each Matrix Viewer you have in your model.