EE368: Digital Image Processing Project Report
Ian Downes
[email protected]
Stanford University

Abstract: An algorithm to detect and decode visual code markers in medium-resolution images is presented. The algorithm uses adaptive methods to segment the image and identify objects. These objects are then used to form candidate markers, which are tested against several criteria. Potential markers are then sampled, and the guide information present in the marker is used to verify the data. The algorithm is invariant to scale and rotation and is robust to motion blur and varying illumination. The algorithm is implemented in C and takes approximately 100 ms to execute per image on a 3.40 GHz Pentium 4 computer.

I. INTRODUCTION

Cameras in cell phones are becoming commonplace as the cost of including them falls. As more people are equipped with these cameras, it becomes feasible to develop a range of applications that use the camera for purposes other than simply taking a snapshot. One such application is to use the camera to sample the data contained in a visual code marker and to use this data as a code to reference information. A typical use case might be to include a visual code marker next to the advertisement for a movie. By taking an image of the marker, the phone can decode the data and then query a backend database for the local screening times of the movie.

To support this type of application, the camera must be able to reliably identify and decode visual markers. This report details the development and implementation of an algorithm that can be used for this purpose. The report first analyses the problem and establishes the requirements for the algorithm. It then examines the major steps of the algorithm and how each meets the requirements. The results of testing the algorithm are discussed before the report concludes with a summary.

II. ALGORITHM DESIGN

Before the design of the algorithm was started, the problem was analysed to develop ideas about how the visual code markers could be detected effectively and to identify the issues that would need to be solved.
Once a basic high-level method was established, the lower-level stages of the proposed algorithm were designed and tested. The following sections detail how the problem was analysed and describe the resulting algorithm.

A. Problem Analysis

Figure 1 shows the format of the visual code marker. The marker is composed of an 11×11 grid of binary elements. A black square represents a ‘1’ and a white square represents a ‘0’. The origin of the array is defined to be the top left corner (that is, the corner opposite the intersection of the guide bars) and the data is ordered in column-major order. The central region of the marker contains 83 bits of data. The remaining 38 bits are located at the corners and form the guide elements. The guide elements consist of three corner square elements and two guide bars. The guide bars are perpendicular to each other and are of different lengths. In addition to the 11×11 array, the marker has a white border extending at least one element width around the array. This border serves the same purpose as the border around barcodes and ensures that the marker is separated from any background it is placed on.

The image source is a digital image from a camera-equipped cell phone. The image is of VGA resolution (640 × 480) and is provided as a compressed 8-bit RGB JPEG file. The compression ratio is approximately 18:1. This resolution is comparatively low, but it is adequate to sample and extract the data from the markers under the right conditions.

Of more importance are the conditions under which the image is obtained. Specifically, the camera is most likely to be handheld, leading to a degree of motion blur, and to be positioned somewhat haphazardly, leading to arbitrary changes in angle, translation and perspective. Furthermore, the lighting conditions are not controlled: the illumination may not be constant over the image and will certainly vary between images.
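
As an illustration of the marker layout described above, the following is a minimal C sketch of column-major indexing into the 11×11 array and of reading the 83 data elements in that order. It is not taken from the report's implementation. The names `marker_cell_index`, `marker_read_data` and `is_guide` are hypothetical, and because the exact positions of the guide elements are not specified in this section, they are assumed to be supplied by the caller as a mask.

```c
#include <assert.h>

enum {
    MARKER_SIDE  = 11,                        /* 11 x 11 grid (from the report) */
    MARKER_CELLS = MARKER_SIDE * MARKER_SIDE, /* 121 elements in total */
    DATA_BITS    = 83,                        /* data elements (from the report) */
    GUIDE_BITS   = 38                         /* guide elements (from the report) */
};

/* Column-major index of the element at (row, col); the origin (0, 0) is
 * the top left corner, so consecutive indices run down a column first. */
int marker_cell_index(int row, int col)
{
    assert(row >= 0 && row < MARKER_SIDE);
    assert(col >= 0 && col < MARKER_SIDE);
    return col * MARKER_SIDE + row;
}

/* Copy the data elements out of a sampled marker in column-major order.
 *
 * cells    : MARKER_CELLS values stored in column-major order, nonzero
 *            for a black element ('1') and zero for a white one ('0').
 * is_guide : hypothetical caller-supplied mask with the same layout,
 *            nonzero where an element belongs to the guide elements.
 * out      : receives DATA_BITS values, each 0 or 1.
 *
 * Returns DATA_BITS on success, or -1 if the mask does not leave exactly
 * DATA_BITS data elements. */
int marker_read_data(const unsigned char *cells,
                     const unsigned char *is_guide,
                     unsigned char *out)
{
    int n = 0;

    for (int i = 0; i < MARKER_CELLS; i++) {
        if (is_guide[i])
            continue;           /* skip guide elements */
        if (n == DATA_BITS)
            return -1;          /* mask marks too few guide elements */
        out[n++] = cells[i] ? 1 : 0;
    }
    return (n == DATA_BITS) ? n : -1;
}
```

Storing the sampled elements in a flat column-major array means that a single linear scan visits them in the order the data is defined, and the return value doubles as a check that the 83 data bits and 38 guide bits together account for all 121 elements.
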
The high contrast between the binary elements (black vs. white) will help to differentiate the binary values under a wide range of lighting conditions.

The marker is likely to occupy a small fraction of the total image. It is expected that the marker will be between approximately 100 × 100 and 150 × 150 pixels. According to the project brief, an image may contain one to three markers.

Although it may be possible to locate markers by, for example, looking for regions of mixed dark and light elements, it is essential that the guide elements be used to orient the marker. Correct orientation is required so that the data can be read in a meaningful way. The perpendicular guide bars allow any rotation of the marker to be detected. In addition, if necessary, the observed change in the 90° angle between the bars could be used to estimate the perspective of the camera view, opening up the possibility of rectifying the image. The corner guide elements can be used to accurately determine the extent of the marker. This will be particularly important when the viewing perspective distorts the shape of the marker.

This analysis has shown that the algorithm must be able to