A Tutorial Review of Automatic Image Tagging Technique Using Text Mining
Department of Computer Science and Engineering, University College of Science, Technology and Agriculture, University of Calcutta, 2Dept. of Computer Science & Engineering, University of Calcutta, 92 A.P.C. Road, Kolkata 700009, India, [email protected], [email protected]
Abstract
With the advent of time, the number of images being captured and shared online has grown exponentially. The images which are captured are later accessed for searching, classification and retrieval operations. Hence these images must be labelled with appropriate words, phrases or keywords so that the requisite operation can be performed efficiently. Automatic Image Tagging is a technique which associates an appropriate keyword, from a given set of words or phrases, with an image based on its relevance to the content of the image. This selection of the appropriate keyword can be performed by Text Mining, which is concerned with the extraction of appropriate information from a given text. The main objective of this paper is to depict how Text Mining techniques can be applied in the process of Automatic Image Tagging. In order to annotate an image, techniques like Content Based Image Retrieval can be used, which emphasise the content of the image. However, due to several constraints of the above-mentioned technique, Automatic Image Tagging is used instead, which chooses a tag from a given set of tags to annotate an image. The selection of the appropriate tag can be implemented using the classification logic of Text Mining algorithms, which assigns the given set of keywords or tags to predefined classes. In this way the most relevant tags can be selected and assigned to the given image.
Keywords: Tagging, Image Annotation, Automatic Image Tagging, Linguistic Indexing, Content Based Image Retrieval, Automatic Annotation.

1. INTRODUCTION
A picture is a resemblance of past memories cherished by every individual all their life. Over the years the number of pictures being captured and shared has grown exponentially. Several factors are responsible for this growth. Firstly, present-day digital cameras allow people to capture, edit, store and share high-quality images with far greater ease than old film cameras. Secondly, memory and hard disk drives have become available at low cost. Thirdly, the popularity of social networking sites like Facebook and MySpace has given users an additional incentive to share photos online with their friends across the globe. With this rapid growth, there arises the need to perform effective manipulation (such as searching and retrieval) of images. Several search engines retrieve relevant images by text-based searching without using any content information. However, recent research shows that there is a semantic gap between content-based image retrieval and image semantics as understood by humans. As a result, research in this area has shifted towards bridging the semantic gap between low-level image features and high-level semantics. Thus, assigning relevant keywords is significant and can improve image search quality. This is known as Image Annotation, defined as the technique of assigning semantically relevant keywords to an image. The typical method of bridging the semantic gap is automatic image annotation (AIA), which extracts semantic features using machine learning techniques. Automatic image annotation (also known as automatic image tagging or linguistic indexing) is the process by which a computer system automatically assigns metadata, in the form of captions or keywords, to a digital image. This application of computer vision techniques is used in image retrieval systems to organise and locate images of interest from a database. Apart from automatic image annotation, annotation can also be done manually. However, the latter technique is time consuming and involves considerable overhead, which motivates the use of the former, i.e. Automatic Image Tagging.
1.2 Characteristics:

Figure 1: This picture can be tagged as sea, mountain, nature.

A tag has the following characteristics (from Figure 1): Tags are chosen by the creator and/or by the viewer of the tagged item. Tags are not part of a formal subject indexing term set. Tags are informal and personal.

1.3 Objective:

The main objective is to simplify the retrieval and search operation for an image in a photo library. Besides, in the commercial world, assigning a meaningful label or keywords to an image increases the efficiency of satisfying consumer needs, as an incorrectly or insufficiently labelled image is unlikely to be found, particularly within the stringent deadlines commonly experienced in the commercial world, thereby leading to a loss in operational efficiency.

2. CONTENT BASED IMAGE RETRIEVAL (CBIR)

Content-based image retrieval (CBIR), also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision techniques to the image retrieval problem, that is, the problem of searching for digital images in large databases. "Content-based" means that the search analyses the actual contents of the image rather than metadata such as keywords, tags, and/or descriptions associated with the image. The term 'content' in this context might refer to colours, shapes, textures, or any other information that can be derived from the image itself.

Content-based image retrieval uses the visual contents of an image such as colour, shape, texture, and spatial layout to represent and index the image. In typical content-based image retrieval systems (Figure), the visual contents of the images in the database are extracted and described by multi-dimensional feature vectors. The feature vectors of the images in the database form a feature database. To retrieve images, users provide the retrieval system with example images or sketched figures. The system then converts these examples into its internal representation of feature vectors. The similarities/distances between the feature vectors of the query example or sketch and those of the images in the database are then calculated, and retrieval is performed with the aid of an indexing scheme. The components of such a system are as follows:
1. The input component is the query image. This image is to be matched against the image database.
2. The feature extraction module extracts features of the input image. Feature extraction is normally based on the colour, texture and shape of an image, with colour content being the most widely used feature in CBIR. These three types of image features are utilised in various CBIR applications, ranging from scene/object and fingerprint identification and matching to face and pattern recognition.
3. Similarity matching is performed between the extracted features of the query image and the features of the images stored in the image feature database. Similarity matching, through metrics called similarity measures, determines the degree of relevance of an image in a collection to a query. It is a key component of a CBIR system because finding a set of images similar to the image the user had in mind is the system's primary goal.
4. The feature database contains the extracted features of the images in the collection.
5. Finally, an image or a set of images is returned as output if the similarity matching between the query image and the images in the image database is successful.
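To make this pipeline concrete, the following Python sketch (not taken from the paper; the function names extract_features and retrieve are illustrative) assumes images are available as RGB NumPy arrays and uses a simple global colour histogram as the feature vector with Euclidean distance as the similarity measure.

# Minimal CBIR sketch: colour-histogram features + distance-based matching.
# Assumes RGB images as NumPy arrays with values in [0, 255]; names are illustrative.
import numpy as np

def extract_features(image, bins=8):
    """Global colour histogram: `bins` bins per RGB channel, L1-normalised."""
    hist = []
    for channel in range(3):
        h, _ = np.histogram(image[..., channel], bins=bins, range=(0, 255))
        hist.append(h)
    vec = np.concatenate(hist).astype(float)
    return vec / (vec.sum() + 1e-9)

def retrieve(query_image, feature_db, top_k=5):
    """Return the names of the top_k database images closest to the query image."""
    q = extract_features(query_image)
    distances = {name: np.linalg.norm(q - feat) for name, feat in feature_db.items()}
    return sorted(distances, key=distances.get)[:top_k]

# Usage sketch: build the feature database offline, then match a query image.
# images = {"beach.jpg": np.asarray(Image.open("beach.jpg")), ...}   (using PIL's Image)
# feature_db = {name: extract_features(img) for name, img in images.items()}
# print(retrieve(query_img, feature_db))

In a full CBIR system the feature database would be built offline and an indexing scheme would replace the linear scan shown here.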
1. Segmentation
Image segmentation is useful in many applications. It can identify the regions of interest in a scene or annotate the data. The segmentation component partitions images into local contents via either a block-based or a region-based method. The segmentation of an image involves both global and local features. Global features consider the visual features of the whole image, so they cannot completely describe the different parts of an image. On the other hand, segmenting an image into local contents (i.e. different regions or areas) is able to provide more detailed information about the image.
2. Feature Extraction
Feature extraction is based on global and local features. A global feature involves all the pixels of an image; such extraction is used, for example, to represent the global colour of an image. There are two strategies for extracting local features: the first partitions the image into a set of fixed-size blocks or tiles, and the second segments it into a number of variable-shaped regions of interest. After performing block- and/or region-based segmentation, low-level features can be extracted from the tiles or regions for local feature representation. Low-level features such as colour, texture, shape, and spatial relationships are extracted to represent image features.
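As a minimal illustration of the difference between global and block-based local features, the following sketch (illustrative names, assuming RGB NumPy arrays) computes a global mean-colour feature for the whole image and a local mean-colour feature for each fixed-size block.

# Sketch of global vs. block-based local feature extraction (names are illustrative).
import numpy as np

def global_mean_colour(image):
    """Global feature: mean colour over all pixels of the image."""
    return image.reshape(-1, image.shape[-1]).mean(axis=0)

def local_block_features(image, block=64):
    """Local features: mean colour of each fixed-size block (tile)."""
    h, w = image.shape[:2]
    feats = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            tile = image[y:y + block, x:x + block]
            feats.append(tile.reshape(-1, image.shape[-1]).mean(axis=0))
    return np.array(feats)   # one row of low-level features per tile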
3. Annotation
Annotation is a technique of associating comments, notes, keywords, or other forms of external words with a document without causing any change in the document itself. Annotations are metadata that add information about an existing piece of data. There are three types of image annotation: manual, automatic and semi-automatic. Manual annotation requires users to enter descriptive keywords while browsing images.
Flickr is one of the most comprehensive image resources on the web. As a photo management and sharing application, it provides users with the ability to tag, organise and share their photos online. Flickr offers a public API which can be used to write applications that use Flickr in some way. The objective of automatic image annotation here is to search over user-contributed photo sites like Flickr, which have accumulated rich human knowledge and billions of photos, and then associate surrounding tags from visually similar Flickr photos with the unlabelled image.
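As a hedged sketch of how such an application might query Flickr, the snippet below uses Flickr's public REST endpoint and the documented flickr.photos.search method; the API key is a placeholder and the helper name search_photo_tags is illustrative, not part of the Flickr API.

# Sketch: querying Flickr's REST API for photos and their tags (requires an API key).
import requests

API_URL = "https://api.flickr.com/services/rest/"
API_KEY = "YOUR_FLICKR_API_KEY"  # placeholder

def search_photo_tags(text, per_page=10):
    """Search public photos matching `text` and return their tag lists."""
    params = {
        "method": "flickr.photos.search",
        "api_key": API_KEY,
        "text": text,
        "extras": "tags",
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,
    }
    response = requests.get(API_URL, params=params, timeout=10)
    photos = response.json()["photos"]["photo"]
    return [p.get("tags", "").split() for p in photos]

# Example: collect candidate tags from photos related to "yellow rose".
# print(search_photo_tags("yellow rose"))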
CASE 1. Keywords: Red, Green, Flower, Yellow, Rose, Jasmine, Lotus. Automatic Annotation: Yellow Rose Flower.
CASE 2. Keywords: Red, Green, Yellow, Jasmine, Lotus, Sunflower, Fruit, Rose. Automatic Annotation: Yellow Rose.
CASE 3. Keywords: Red, Green, Jasmine, Lotus, Sunflower, Fruit. Automatic Annotation: Insufficient keywords for annotation.
The aim is to build models that are capable of automatically tagging images from unrestricted domains with good accuracy. Given a target image and its surrounding text, we extract those words and phrases that are most likely to represent meaningful tags.
Proposal
Consider an image of nature. The input image is segmented into several components and features are extracted from each component. To perform this feature extraction, a multi-resolution grid-based framework for image content representation is used: the images are partitioned into a set of regular grids of different sizes, and multi-modal visual features are automatically extracted for each image grid.
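One possible reading of this grid-based framework is sketched below (illustrative only, assuming RGB NumPy arrays): the image is partitioned into regular grids of several sizes and a simple mean-colour feature is extracted per grid cell; a real system would extract richer multi-modal features for each cell.

# Sketch of multi-resolution grid-based feature extraction (illustrative names).
import numpy as np

def grid_features(image, grid_sizes=(2, 4, 8)):
    """Partition the image into regular g x g grids for each size in grid_sizes
    and extract a mean-colour feature for every grid cell."""
    h, w = image.shape[:2]
    features = []
    for g in grid_sizes:
        ys = np.linspace(0, h, g + 1, dtype=int)
        xs = np.linspace(0, w, g + 1, dtype=int)
        for i in range(g):
            for j in range(g):
                cell = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                features.append(cell.reshape(-1, image.shape[-1]).mean(axis=0))
    return np.array(features)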
Assumptions:
With every class, a set of keywords is associated which is relevant to that class. When performing the classification, pick the first keyword from the query and associate it with the best match(es) that can be found among all the subclasses at level 1. It might be possible that more than one subclass at level 1 has a keyword that matches the query string, since the same keyword (for example, Green) can appear in the keyword sets of several classes.
Design:

[Design figure: the input image, considered as a whole, is classified into the classes Water, Air, Soil, Mountain, Flower, Fruit and Others, each associated with a keyword set K1 to K7 respectively, as defined below.]
Keywords K1: The K1 set is associated with the class Water and contains all keywords relevant to that class. K1 = {River, Sea, Pond, Lake, Blue}.
Keywords K2: The K2 set is associated with the class Air and contains all keywords relevant to that class. K2 = {Windy, Odour}.
Keywords K3: The K3 set is associated with the class Soil and contains all keywords relevant to that class. K3 = {Tree, Plant, Grass, Green, Vegetables}.
Keywords K4: The K4 set is associated with the class Mountain and contains all keywords relevant to that class. K4 = {Hill}.
Keywords K5: The K5 set is associated with the class Flower and contains all keywords relevant to that class. K5 = {Red, Yellow, Pink, Blue, Green, Rose, Jasmine, Lotus, Sunflower, Leaf}.
Keywords K6: The K6 set is associated with the class Fruit and contains all keywords relevant to that class. K6 = {Red, Green, Yellow, Orange, Pink, Apple, Mango, Guava, Strawberry, Banana, Watermelon}.
Keywords K7: The K7 set is associated with the class Others and contains all keywords that do not belong to the other classes.
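A minimal sketch of this classification idea, using the keyword sets K1 to K7 defined above, is given below. The overlap-counting rule and the minimum-keyword threshold are assumptions made for illustration and will not necessarily reproduce the exact outputs of the CASE examples given earlier.

# Sketch of selecting tags by matching query keywords against per-class keyword sets.
# The overlap rule and min_overlap threshold are illustrative assumptions.
# K7 ("Others") has no fixed keyword list and is therefore omitted from the mapping.
CLASS_KEYWORDS = {
    "Water":    {"River", "Sea", "Pond", "Lake", "Blue"},
    "Air":      {"Windy", "Odour"},
    "Soil":     {"Tree", "Plant", "Grass", "Green", "Vegetables"},
    "Mountain": {"Hill"},
    "Flower":   {"Red", "Yellow", "Pink", "Blue", "Green", "Rose",
                 "Jasmine", "Lotus", "Sunflower", "Leaf"},
    "Fruit":    {"Red", "Green", "Yellow", "Orange", "Pink", "Apple", "Mango",
                 "Guava", "Strawberry", "Banana", "Watermelon"},
}

def annotate(query_keywords, min_overlap=2):
    """Pick the class whose keyword set best overlaps the query keywords and
    return the overlapping keywords plus the class name as the annotation."""
    best_class, best_overlap = None, set()
    for cls, keywords in CLASS_KEYWORDS.items():
        overlap = keywords & set(query_keywords)
        if len(overlap) > len(best_overlap):
            best_class, best_overlap = cls, overlap
    if best_class is None or len(best_overlap) < min_overlap:
        return "Insufficient keywords for annotation."
    return " ".join(sorted(best_overlap)) + " " + best_class

# Example usage:
# print(annotate(["Red", "Green", "Flower", "Yellow", "Rose", "Jasmine", "Lotus"]))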
Example:
Consider the above image: it is an image of a yellow rose with green leaves. Let us assume that this is the image that is to be tagged with the appropriate keyword(s). The image is segmented and its components are matched against the keyword sets defined above; the best-matching class is Flower (K5), so keywords such as Yellow and Rose are assigned as the annotation (cf. CASE 1 above).
CONCLUSION
People love to take images, but are not so willing to annotate them afterwards with relevant tags. A requirement for effective searching and retrieval of images in rapidly growing online image databases is that each image has accurate and useful annotation. Handling large volumes of digital information becomes vital as online resources and their usage continuously grow at high speed. Online image sharing applications are extremely popular; Flickr is one of the most popular of these applications, hosting over 4 billion images. The idea of automatic image tagging is that tags are automatically generated and assigned to the digital image. These tags should describe every important part or aspect of the image and its context. Automatic image tagging can be done based on the visual content of the image, on contextual information, or using a mixture of these two approaches. By looking at the visual content of an image, it could, for example, be possible to predict that an image where most edges are vertical or horizontal contains a building. Another approach is to find a set of images that are visually similar to the query image in existing image databases consisting of already tagged images, and then pick the most relevant tags from the set of similar images. The context in which the image was taken can also be used to tag images.