A Visual Keyboard
ABSTRACT
Information and communication technologies can play a key role in helping people with special educational needs, whether physical or cognitive. Eye-scanning cameras mounted on computers, used in place of a keyboard or mouse, have become necessary tools for people without limbs or those affected by paralysis. The camera scans the image of the character, allowing users to "type" on a monitor as they look at the Visual Keyboard. This paper describes an input device, based on eye-scanning techniques, that allows people with severe motor disabilities to use gaze to select specific areas on a computer screen. Compared to existing solutions, the strong points of this approach are simplicity and novelty.
KEYWORDS:
Vis-Key, ICT, Cornea, Sclera, Choroid, Retina, Digitizer, Pixel, Equalization, Contrast, Digital image.
1. INTRODUCTION
Vis-Key aims at replacing the conventional hardware keyboard with a Visual Keyboard. It employs scanning and pattern matching algorithms to achieve this objective, and it exploits the eye's natural ability to navigate and spot familiar patterns. Eye typing research extends over twenty years; however, there is little research on the design issues. Recent research indicates that the type of feedback affects typing speed, error rate, and the user's need to switch gaze between the visual keyboard and the monitor.
2. Description of Concepts
2.1 The Eye
Fig 2.1 shows the horizontal cross section of the human eye. The eye is nearly a sphere with an average diameter of approximately 20 mm. Three membranes enclose the eye: the cornea and sclera outer cover, the choroid, and the retina. When the eye is properly focused, light from an object is imaged on the retina. Pattern vision is afforded by the distribution of discrete light receptors over the surface of the retina. There are two classes of receptors: cones and rods. The cones, located primarily in the central portion of the retina called the fovea, are highly sensitive to color. The number of cones in the human eye is between 6 and 7 million. Cones can resolve fine details because each one is connected to its own nerve end. Cone vision is also known as photopic or bright-light vision. The rods are far more numerous than the cones (75 to 150 million). Several rods are connected to a single nerve end, which reduces the amount of detail discernible by these receptors. Rods give a general, overall picture of the field of view and are not involved in color recognition. Rod vision is also known as scotopic or dim-light vision.
As illustrated in Fig 2.1, the radius of curvature of the anterior surface of the lens is greater than the radius of its posterior surface. The shape of the lens is controlled by the tension in the fibers of the ciliary body. To focus on distant objects, the controlling muscles cause the lens to be relatively flattened. Similarly, to focus on nearer objects, the muscles allow the lens to become thicker. The distance between the center of the lens and the retina (the focal length) varies from 17 mm to 14 mm as the refractive power of the lens increases from its minimum to its maximum.
The Vis-Key system (Fig 3.1) comprises a high-resolution camera that constantly scans the eye in order to capture the character image formed on the eye. The camera gives continuous streaming video as output. The idea is to capture individual frames at regular intervals (say, some fraction of a second). These frames are then compared with the base frames stored in the repository. If the probability of a successful match exceeds the threshold value, the corresponding character is displayed on the screen. The hardware requirements are simply a personal computer, the Vis-Key layout (chart) and a webcam connected to the USB port. The system design, which refers to the software level, relies on the construction, design and implementation of image processing algorithms applied to the captured images of the user.
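The capture-and-compare loop described above can be illustrated with a minimal sketch. It assumes that frames and base frames are small binary images of equal size (lists of rows, 0 = black, 1 = white) and that the "probability of success" is approximated by the fraction of agreeing pixels. The function names, the scoring rule and the threshold values are illustrative assumptions, not part of the Vis-Key design.

```python
# Hypothetical sketch: all names and the pixel-agreement score are assumptions.

def match_score(frame, base):
    """Fraction of pixels on which two same-sized binary images agree."""
    total = 0
    same = 0
    for frame_row, base_row in zip(frame, base):
        for a, b in zip(frame_row, base_row):
            total += 1
            same += (a == b)
    return same / total if total else 0.0


def recognize(frame, repository, threshold=0.9):
    """Return the label of the best-matching base frame, or None when
    no score exceeds the threshold."""
    best_label, best_score = None, 0.0
    for label, base in repository.items():
        score = match_score(frame, base)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score > threshold else None


def sample_frames(stream, step):
    """Keep every `step`-th frame of a stream (regular-interval capture)."""
    return [frame for i, frame in enumerate(stream) if i % step == 0]


# Toy repository of 3x3 base frames (assumed data, for illustration only).
REPOSITORY = {
    "L": [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    "T": [[0, 0, 0], [1, 0, 1], [1, 0, 1]],
}
```

With this toy repository, a frame that differs from the "L" base frame in one pixel scores 8/9 against it, so it is accepted at a threshold of 0.8 and rejected at 0.95. A real system would need a more robust similarity measure than raw pixel agreement.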
4. System Architecture
4.1. Calibration
The calibration procedure aims at initializing the system. The first algorithm, whose goal is to identify the face position, is applied only to the first image, and its result is reused for processing the successive images in order to speed up the process. This choice is acceptable since the user is supposed to make only minor movements. If the background is completely black (which is easy to obtain), the user's face appears as a white spot, and its borders can be found where the number of black pixels decreases. The camera is positioned below the PC monitor; if it were above, the iris would be partially covered by the eyelid whenever the user looks at the bottom of the screen, making the identification of the pupil very difficult. The user should not be far from the camera, so that the image contains little besides his or her face. The algorithms that identify the face, the eye and the pupil are all based on scanning the image to find the concentration of black pixels: the more complex the image, the slower the algorithm. Besides, the resolution of the face in the image will be lower. The suggested distance is about 30 cm. The user's face should also be very well illuminated, and therefore two lamps were placed at the sides of the computer screen. Since the identification algorithms work on black and white images, shadows should not be present on the user's face.
4.2. Image Acquisition
Camera image acquisition is implemented via the functions of the AVICap window class, which is part of Video for Windows (VFW). The entire image of the problem domain is scanned every 1/30 of a second. The output of the camera is fed to an analog-to-digital converter (digitizer), which digitizes it. Individual frames can then be extracted from the video stream for further analysis and processing.
4.3. Filtering of the eye component
The chosen algorithms work on a binary (black and white) image and are based on extracting the concentration of black pixels. Three algorithms are applied to the first acquired image, while from the second image on only the third one is applied.
The first algorithm (the face algorithm) converts the image to black and white and zooms in to obtain an image that contains only the user's face. This is done by scanning the original image and identifying the top, bottom, left and right borders of the face (Fig 4.3.1). Starting from the resulting image, the second algorithm extracts the information about the eye position (both left and right): the band of the image with the highest concentration of black pixels is the one that contains the eyes. The algorithm uses this information to determine the top and bottom borders of the eye area (Fig 4.3.2), so that it can be extracted from the image. The new image is then analyzed to identify each eye: the algorithm finds the right and left borders and generates separate images of the left and right eyes.
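The border scan used by the face and eye algorithms can be sketched as follows. This is a minimal illustration that assumes a binary image stored as a list of rows (0 = black, 1 = white), a black background, and an arbitrary cut-off of 90% black pixels to decide that a row or column belongs to the background. The helper names and the cut-off are assumptions made for this example.

```python
# Hypothetical sketch: helper names and the 0.9 cut-off are assumptions.

def black_counts(image, axis):
    """Number of black (0) pixels per row (axis=0) or per column (axis=1)."""
    lines = image if axis == 0 else list(zip(*image))
    return [sum(1 for p in line if p == 0) for line in lines]


def borders(counts, length, max_black_fraction=0.9):
    """First and last index whose share of black pixels is below the limit.

    `length` is the number of pixels in each scanned line."""
    hits = [i for i, c in enumerate(counts) if c / length < max_black_fraction]
    return (hits[0], hits[-1]) if hits else None


def crop_face(image):
    """Crop the image to the top, bottom, left and right borders of the face."""
    rows = borders(black_counts(image, 0), len(image[0]))
    cols = borders(black_counts(image, 1), len(image))
    if rows is None or cols is None:
        return None  # nothing but background was found
    (top, bottom), (left, right) = rows, cols
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def eye_row(face):
    """Index of the face row with the highest concentration of black pixels."""
    counts = black_counts(face, 0)
    return counts.index(max(counts))


# Toy 6x6 frame: white face on a black background, two black "eye" pixels.
FRAME = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
```

On the toy frame, `crop_face(FRAME)` keeps rows 1 to 4 and columns 2 to 4, and `eye_row` then points at the second row of the cropped face, where the two dark eye pixels sit. Extending the single row to a band with top and bottom borders would follow the same counting idea.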
The procedure described so far is applied only to the first image of the sequence; the data related to the right eye position are stored in a buffer and reused for the following images. This is done to speed up the process, and it is acceptable if the user makes only minor head movements. The third algorithm extracts the position of the center of the pupil from the right eye image. The iris identification procedure uses the same approach as the previous algorithm. First, the left and right borders of the iris are found; the resulting region, which contains the iris, is the one that has the highest concentration of black pixels. The center of this image also represents the center of the pupil. The result of this phase is the coordinates of the center of the pupil for each image in the sequence.
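A minimal sketch of the pupil-center step is given below. It assumes a binary eye image (0 = black, 1 = white) and treats a column or row as part of the iris when its black-pixel count reaches at least half of the peak count. That 50% rule and the function names are assumptions chosen for illustration; the paper does not specify them.

```python
# Hypothetical sketch: the 0.5 "keep" ratio and all names are assumptions.

def dense_span(counts, keep=0.5):
    """First and last index whose black-pixel count is at least keep * peak."""
    peak = max(counts, default=0)
    if peak == 0:
        return None  # no black pixels at all
    hits = [i for i, c in enumerate(counts) if c >= keep * peak]
    return hits[0], hits[-1]


def pupil_center(eye):
    """Estimate the pupil center (x, y) as the middle of the densest black region."""
    col_counts = [sum(1 for p in col if p == 0) for col in zip(*eye)]
    cols = dense_span(col_counts)
    if cols is None:
        return None
    left, right = cols
    # Count black pixels per row, restricted to the iris columns found above.
    row_counts = [sum(1 for p in row[left:right + 1] if p == 0) for row in eye]
    top, bottom = dense_span(row_counts)
    return ((left + right) / 2, (top + bottom) / 2)


def track(eye_images):
    """Pupil center for each eye image in a sequence."""
    return [pupil_center(eye) for eye in eye_images]


# Toy 5x7 eye image: a 3x3 black iris plus one stray black pixel (top right).
EYE = [
    [1, 1, 1, 1, 1, 1, 0],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
```

The stray pixel in the toy image has a column count below half the peak, so it is ignored and the estimated center falls in the middle of the 3x3 block.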
4.4. Preprocessing
The key function of preprocessing is to improve the image in ways that increase the chances of success of the other processes. Here preprocessing deals with four important techniques:
- To enhance the contrast of the image.
- To eliminate or minimize the effect of noise on the image.
- To isolate regions whose texture indicates a likelihood of alphanumeric information.
- To provide equalization for the image.
4.5. Segmentation
Segmentation broadly refers to partitioning an input image into its constituent parts or objects. In general, autonomous segmentation is one of the most difficult tasks in digital image processing. A rugged segmentation procedure brings the process a long way towards a successful solution of the imaging problem. In terms of character recognition, the key role of segmentation is to extract individual characters from the problem domain. The output of the segmentation stage is raw pixel data, constituting either the boundary of a region or all points in the region itself. In either case, converting the data into a form suitable for computer processing is necessary. The first decision is whether the data should be represented as a boundary or as a complete region. Boundary representation is appropriate when the focus is on external shape characteristics such as corners and inflections. Regional representation is appropriate when the focus is on internal shape characteristics such as texture and skeletal shape. Description, also called feature selection, deals with extracting features that yield some quantitative information of interest or that are basic for differentiating one class of objects from another.
4.6. Recognition and Interpretation
Recognition is the process that assigns a label to an object based on the information provided by its descriptors. This process allows characters to be recognized based on the knowledge base. Interpretation involves assigning a meaning to an ensemble of recognized (labeled) objects. For example, to identify a character, say 'C', we need to associate the descriptors for that character with the label 'C'.
4.7. Knowledge Base
Knowledge about a particular problem domain can be coded into an image processing system in the form of a knowledge database. The knowledge may be as simple as detailing regions of an image where the information of interest is known to be located, thus limiting the search for that information. Or it can be quite complex, such as an image database where all image entries are of high resolution. The key distinction of this knowledge base is that, in addition to guiding the operation of the various components, it facilitates feedback operations between the various modules of the system. The depiction in Fig 4.1 indicates that communication between processing modules is based on prior knowledge of what a result should be.
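The link between descriptors, labels and the knowledge base in sections 4.6 and 4.7 can be made concrete with a small sketch. It assumes a very simple regional descriptor (the share of black pixels in each quadrant of a segmented binary character, 0 = black) and a nearest-match rule with a rejection distance. Both choices, and all names below, are assumptions for illustration rather than the descriptors intended for Vis-Key.

```python
# Hypothetical sketch: the quadrant descriptor, the distance rule and all
# names are assumptions, not the method specified in the paper.
import math


def zone_descriptor(char_img):
    """Share of black pixels in each quadrant (image must be at least 2x2)."""
    h, w = len(char_img), len(char_img[0])
    zones = []
    for r0, r1 in ((0, h // 2), (h // 2, h)):
        for c0, c1 in ((0, w // 2), (w // 2, w)):
            cells = [char_img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            zones.append(sum(1 for p in cells if p == 0) / len(cells))
    return zones


def classify(descriptor, knowledge_base, max_distance=0.5):
    """Label of the closest stored descriptor, or None if none is close enough."""
    best_label, best_dist = None, math.inf
    for label, reference in knowledge_base.items():
        dist = math.dist(descriptor, reference)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else None


# Toy 4x4 character images used to build a two-entry knowledge base.
CHAR_C = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1], [0, 0, 0, 0]]
CHAR_I = [[1, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 1], [1, 0, 0, 1]]
KNOWLEDGE_BASE = {
    "C": zone_descriptor(CHAR_C),
    "I": zone_descriptor(CHAR_I),
}
```

A 'C' with one flipped pixel still lands closest to the stored 'C' descriptor, while a blank image is farther than the rejection distance from every entry and is left unlabeled. Four quadrant values are far too coarse for a full alphabet; the sketch only shows how a label is attached to a descriptor through the knowledge base.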
5. Unique Features
This model is a novel idea and the first of its kind in the making. It also opens a new dimension to how we perceive the world and should prove to be a critical technological breakthrough, considering that there has not been sufficient research in this field of eye scanning. If implemented, it will be one of the awe-inspiring technologies to hit the market.
6. Design Constraints:
Though this model is thought-provoking, we need to address the design constraints as well.
- R & D constraints severely hamper our cause for a full-fledged working model of the Vis-Key system.
- The need for a very high resolution camera calls for a high initial investment.
- The accuracy and the processing capabilities of the algorithm depend heavily on the quality of the input.
Given these design constraints, the plan we can chalk out covers modules 4, 5, 6 and 7 (Preprocessing, Segmentation and Representation, Recognition and Interpretation, and the Knowledge Base). The preferred software for implementing these algorithms is C++ and MATLAB 6.0.
7. Alternatives/Related References
The approaches to date have centered only on eye tracking theory, which places more emphasis on the use of the eye as a cursor than as a data input device. An eye-tracking device lets users select letters from a screen. Dasher, the prototype program, taps into the natural gaze of the eye and makes predictable words and phrases simpler to write, said David MacKay, project coordinator and physics professor at Cambridge University. Dasher calculates the probability of one letter coming after another. It then presents the letters required as if contained on infinitely expanding bookshelves. Researchers say people will be able to write up to 25 words per minute with Dasher, compared to on-screen keyboards, which they say average about 15 words per minute. Eye-tracking devices are still problematic. "They need re-calibrating each time you look away from the computer," says Willis.
- Ward, D. J. & MacKay, D. J. C. Fast hands-free writing by gaze direction. Nature, 418, 838 (2002).
- Daisheng Luo. Pattern Recognition and Image Processing. Horwood Series in Engineering Science.
Bibliography:
- http://www.cs.uta.fi/~curly/publications/ECEM12-Majaranta.html
- www.inference.phy.cam.ac.uk/djw30/dasher/eye.html
- http://www.inference.phy.cam.ac.uk/dasher/
- http://www.cs.uta.fi/hci/gaze/eyetyping.php
- http://www.acm.org/sigcaph